Data Retrieval

Functions for extracting information from PTT Web.

as_url()

Convert a PTT board name to a URL

read_html2()

Read PTT pages that require "over18" confirmation

down_html()

Download HTML files to a local directory

index2df()

Extract data from multiple index pages of a PTT board.

post2df()

Extract information from PTT posts

Word Segmentation

Functions for performing word segmentation on post content.

seg_content() seg_comment()

Word segmentation for PTT post content and comments.

Corpus Construction

Functions for converting data frames to corpus objects.

comment2qcorp() comment2tmcorp()

Convert 'comment' list-column to 'corpus' list-column

post2qcorp() post2tmcorp()

Convert post data frame to corpus objects

Handy Helpers

Functions for getting useful information about pttR or PTT.

ptt()

Get PTT info

example_posts()

Retrieve an example data frame of posts

get_ptt_dict()

Get PTT dictionary

hotboards()

Return a data frame with information about popular boards

ping2zh()

Pinyin-to-character translation