3. Corpus Readers
yft.corp_readers.read_rawtext_as_words(fp: str, tk_sep='\u3000')
Read a corpus from raw text, yielding one word at a time.
- Parameters
- fp : str
File path
- tk_sep : str, optional
Token separator, by default '\u3000' (U+3000, the ideographic space)
- Yields
- str
A word
Notes
The structure of the (default) corpus file:
<tk> <tk> <tk> <tk> ...<tk>
<tk> <tk> <tk> <tk> ...<tk>
...
<tk> <tk> <tk> <tk> ...<tk>
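The file layout above can be illustrated with a minimal sketch. This is not the yft source: `read_words_sketch` and `demo_words` are hypothetical names, and the UTF-8 encoding and the skipping of empty tokens are assumptions made for the example. It only shows how a generator might split each line on `tk_sep` and yield the words one by one.

```python
import os
import tempfile


def read_words_sketch(fp, tk_sep="\u3000"):
    """Hypothetical stand-in for read_rawtext_as_words; not the yft source."""
    # Assumption: the corpus file is UTF-8 encoded.
    with open(fp, encoding="utf-8") as f:
        for line in f:
            # Drop the trailing newline, then split on the token separator.
            for tk in line.rstrip("\n").split(tk_sep):
                if tk:  # assumption: skip empty strings from blank lines
                    yield tk


def demo_words():
    # Write a tiny two-line corpus, then read it back word by word.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as tmp:
        tmp.write("我\u3000喜歡\u3000貓\n你\u3000好\n")
        path = tmp.name
    try:
        return list(read_words_sketch(path))
    finally:
        os.remove(path)
```

Because the reader is a generator, words are produced lazily, so a large corpus does not have to fit in memory at once. If the real function behaves as documented, it would be called the same way, for example `for word in read_rawtext_as_words(path): ...`.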