3. Corpus Readers

yft.corp_readers.read_rawtext_as_words(fp: str, tk_sep='\u3000')

Read a corpus from a raw-text file, yielding one word (token) at a time.

Parameters
fp : str

Path to the corpus file.

tk_sep : str, optional

Token separator, by default '\u3000' (the ideographic space, U+3000).

Yields
str

A word
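For illustration, the behaviour described above can be approximated by the minimal sketch below. It is not the yft source: the name read_rawtext_as_words_sketch is hypothetical, and the sketch assumes a UTF-8 file, splits each line on tk_sep, and skips the empty strings produced by blank lines or repeated separators.

```python
from typing import Iterator


def read_rawtext_as_words_sketch(fp: str, tk_sep: str = "\u3000") -> Iterator[str]:
    """Yield tokens from a raw-text corpus, line by line.

    Hypothetical re-implementation for illustration only; the real
    yft function may differ (e.g. in encoding or empty-token handling).
    """
    # Assumption: the corpus file is UTF-8 encoded.
    with open(fp, encoding="utf-8") as f:
        for line in f:
            # Drop the trailing line terminator before splitting.
            line = line.rstrip("\r\n")
            for tk in line.split(tk_sep):
                # Assumption: empty strings (blank lines, doubled separators) are skipped.
                if tk:
                    yield tk
```

Because the function is a generator, the corpus is streamed rather than loaded into memory at once. For a corpus whose tokens are separated by ordinary ASCII spaces, pass tk_sep=' '.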

Notes

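Expected layout of the corpus file with the default separator: tokens on each line are separated by tk_sep (shown here as a space), and lines are separated by newlines.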

    <tk> <tk> <tk> <tk> ... <tk>
    <tk> <tk> <tk> <tk> ... <tk>
    ...
    <tk> <tk> <tk> <tk> ... <tk>
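Conversely, a corpus in this layout can be produced from already tokenized text. The helper write_rawtext_corpus below is hypothetical (not part of yft); it assumes UTF-8 output, writes one tokenized sequence per line joined by tk_sep, and rejects tokens that contain the separator or a newline, since those would be split apart when the file is read back.

```python
from typing import Iterable, Sequence


def write_rawtext_corpus(
    fp: str, lines: Iterable[Sequence[str]], tk_sep: str = "\u3000"
) -> None:
    """Write tokenized lines as '<tk> <tk> ... <tk>' rows.

    Hypothetical helper for illustration; not part of yft.
    """
    # Assumption: UTF-8 output, one tokenized sequence per line.
    with open(fp, "w", encoding="utf-8") as f:
        for tokens in lines:
            # Validate the whole line before writing it, so a bad token
            # never produces a half-written row.
            for tk in tokens:
                if tk_sep in tk or "\n" in tk:
                    raise ValueError(f"token {tk!r} contains the separator or a newline")
            f.write(tk_sep.join(tokens) + "\n")
```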