Extract data from multiple index pages of a PTT board.

index2df scrapes the index pages of a board ("看板") and extracts the information into a data frame.

index2df(board, newest = 1, pages = NA, search_term = NA,
  search_page = 1)

Arguments

board	Character. Either a URL or a board name, such as "Gossiping", "Baseball", or "LoL". The board name is case-insensitive. See Examples for details. `board` has different requirements when used with the argument `search_term` (see below).
newest	Integer. Number of pages to scrape, starting from the most recent page. Defaults to `1`, which scrapes only the newest page. If set to `2`, the newest and second-newest pages are scraped, and so forth.
pages	Integer vector. A vector of index page number(s). This parameter lets you scrape specific index pages by number. Be careful not to provide numbers outside the range of the board's current index pages. Defaults to `NA`.
search_term	Character. A term to search for in the index, such as "魯蛇". Two advanced search forms are also available:
  Post thread: prepend "thread:" to the post title, e.g., "thread:<post-title>".
  Posts by an author: prepend "author:" to the author's ID, e.g., "author:Plumage".
search_page	Integer vector. A vector of index page number(s). When `search_term` is set, `search_page` lets you scrape the index pages related to that term. Defaults to `1`, which scrapes only the newest page.

Value

A data frame with one row per post.

Warning

Do not request too many pages at once; doing so places a heavy load on the server.
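
One way to keep requests small is to split a long page range into batches and pause between them. The following is only a sketch: `chunk_pages` is a hypothetical helper (not part of the package), the page numbers and the two-second pause are arbitrary, and combining the results with `rbind` assumes every call returns a data frame with the same columns.

```r
# Hypothetical helper: split a vector of page numbers into batches
# of at most `chunk_size` pages each.
chunk_pages <- function(pages, chunk_size = 5) {
  split(pages, ceiling(seq_along(pages) / chunk_size))
}

if (FALSE) {
# Scrape twelve pages in batches of four, pausing between requests.
# The page numbers are illustrative and must exist on the board.
results <- lapply(chunk_pages(2000:2011, chunk_size = 4), function(chunk) {
  Sys.sleep(2)  # arbitrary pause to ease the load on the server
  index2df("Gossiping", pages = chunk)
})
index_df <- do.call(rbind, results)
}
```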

Examples

# Get data from 'Gossiping'
index_df <- index2df("Gossiping")
head(index_df)

if (FALSE) {
# Or use URL directly
link <- "https://www.ptt.cc/bbs/Gossiping/index"

index_df <- index2df(link)
}
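
The remaining arguments can be used along the same lines. The calls below are a sketch based only on the argument descriptions above: the page numbers are illustrative, a plain board name is passed with `search_term` as an assumption, and the scraping calls are wrapped in `if (FALSE)` because they need network access.

```r
# Build advanced search terms by prepending the documented prefixes
author_term <- paste0("author:", "Plumage")
thread_term <- paste0("thread:", "<post-title>")  # placeholder title

if (FALSE) {
# Scrape the three newest index pages
recent_df <- index2df("Gossiping", newest = 3)

# Scrape specific index pages by number (illustrative; must exist)
pages_df <- index2df("Gossiping", pages = c(100, 101))

# Search the board for a term (first page of results by default)
search_df <- index2df("Gossiping", search_term = "魯蛇")

# Posts by one author, first two pages of results
author_df <- index2df("Gossiping", search_term = author_term,
  search_page = 1:2)
}
```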
