index2df
Scrapes the index pages of a board ("看板") and extracts the information into a data frame.
index2df(board, newest = 1, pages = NA, search_term = NA, search_page = 1)
board | Character. Either a URL or a board name, such as "Gossiping", "Baseball", or "LoL". The board name is case-insensitive. See Examples for details. |
---|---|
newest | Integer. Number of pages to scrape, starting from the most recent page. Defaults to 1. |
pages | Integer vector. A vector of index page number(s), which lets you scrape specific index pages by number. Be careful not to provide numbers outside the range of current index pages. Defaults to NA. |
search_term | Character. A term to search for in the index, such as "魯蛇". There are also some advanced search methods: |
search_page | Integer vector. A vector of index page number(s). With argument |
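The pages argument expects numbers within the current range of index pages. One way to guard against out-of-range values before calling index2df might look like the sketch below. It assumes the latest index page number is already known; keep_valid_pages and max_page are hypothetical names used for illustration and are not part of the package.

```r
# Hypothetical helper (not part of the package): keep only page numbers
# that fall between 1 and an assumed known latest index page, `max_page`.
keep_valid_pages <- function(pages, max_page) {
  pages <- as.integer(pages)
  ok <- !is.na(pages) & pages >= 1L & pages <= max_page
  if (any(!ok)) {
    warning("Dropping out-of-range page number(s): ",
            paste(pages[!ok], collapse = ", "))
  }
  pages[ok]
}

# Example with an assumed latest page of 100; 101 is dropped.
valid <- suppressWarnings(keep_valid_pages(c(98, 99, 100, 101), max_page = 100))
print(valid)
```

The filtered vector could then be passed on as pages = valid.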
A data frame with one row per post.
Do not request too many pages at one time; doing so places a heavy load on the server.
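One possible way to follow this advice is to scrape in small batches with a pause between them. The sketch below is an assumption, not part of the package: scrape_in_batches, batch_size, and pause are hypothetical names, and a stand-in scraper is used so that the example runs offline.

```r
# Hypothetical helper (not part of the package): scrape `pages` in small
# batches, pausing between batches to reduce load on the server.
# `scrape` is any function that accepts a `pages` vector and returns a data frame.
scrape_in_batches <- function(pages, scrape, batch_size = 5, pause = 2) {
  batches <- split(pages, ceiling(seq_along(pages) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- scrape(pages = batches[[i]])
    if (i < length(batches)) Sys.sleep(pause)  # wait before the next batch
  }
  do.call(rbind, results)
}

# Stand-in scraper so the sketch runs offline; it returns one row per page.
fake_scrape <- function(pages) data.frame(page = pages)

batched <- scrape_in_batches(1:7, fake_scrape, batch_size = 3, pause = 0)
print(nrow(batched))
```

With the real function, scrape could be a small wrapper such as function(pages) index2df("Gossiping", pages = pages), with a non-zero pause.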
get_index_info
get_index_info extracts data from one index page, while index2df handles several. In addition, index2df has more functionality for multi-page extraction.
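The relationship between a single-page extractor and a multi-page wrapper can be sketched as follows. This only illustrates the general pattern and is not the package's implementation; extract_one_page is a hypothetical stand-in for a single-page function such as get_index_info, and the page identifiers are placeholders.

```r
# Hypothetical stand-in for a single-page extractor: one data frame per page.
extract_one_page <- function(url) {
  data.frame(source = url, title = paste("post from", url),
             stringsAsFactors = FALSE)
}

# Multi-page wrapper: apply the single-page extractor to each page
# and stack the per-page data frames into one.
extract_pages <- function(urls, extract_one = extract_one_page) {
  do.call(rbind, lapply(urls, extract_one))
}

combined <- extract_pages(c("page-1", "page-2", "page-3"))
print(combined)
```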
# Get data from 'Gossiping'
index_df <- index2df("Gossiping")
head(index_df)

if (FALSE) {
  # Or use URL directly
  link <- "https://www.ptt.cc/bbs/Gossiping/index"
  index_df <- index2df(link)
}