Word segmentation for PTT post content and comments.

seg_content(x, words = NULL, tags = NULL, user = NULL)

seg_comment(x, words = NULL, tags = NULL, user = NULL)

Arguments

x

Column 'content' or 'comment' from a data frame returned by post2df.

words

Character vector. Words to add to the jiebaR dictionary. See new_user_word for details.

tags

Character vector. Tags giving the lexical category of each word in `words`. Defaults to `n` (noun). See new_user_word for details.

user

Character. Path to a user-defined dictionary. Defaults to the pttR built-in dictionary. See https://qinwenfeng.com/jiebaR/worker-.html#user- for details.

Source

For details about the built-in PTT dictionary, see https://liao961120.github.io/PTT-scrapy/.
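
Examples

The sketch below illustrates how `words` and `tags` pair up, using jiebaR directly rather than pttR. It assumes that seg_content and seg_comment hand these arguments to jiebaR's new_user_word, as the argument descriptions suggest; the sample words, the sample sentence, and the data frame `df` in the final comment are hypothetical.

```r
library(jiebaR)

# Custom words and their lexical tags; the two vectors are parallel,
# so tags[i] describes words[i]. Both values here are illustrative.
words <- c("鄉民", "推文")
tags  <- c("n", "n")

# Create a default jiebaR worker and register the custom words on it.
seg <- worker()
new_user_word(seg, words, tags)

# Segment a sample sentence into a character vector of tokens.
tokens <- segment("鄉民喜歡在文章下面推文", seg)
print(tokens)

# With pttR, the corresponding call would presumably look like the line
# below, where `df` is a data frame returned by post2df (not run here):
# seg_content(df$content, words = words, tags = tags)
```

Leaving `user` at its default keeps the pttR built-in dictionary, so `words` and `tags` are most useful for a handful of extra terms that do not justify a separate dictionary file.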