queryParser
KWIC.queryParser.querySpecificity(queryObj={'tk': '^我們$', 'pos': 'N%', 'tk.regex': True})
Score a token object for specificity.
- Parameters
    queryObj (dict): A token object from the list returned by tokenize().
- Returns
    float: A score indicating the specificity of the token. A higher score means the token is more specific and is likely to yield fewer results in the corpus. This score is used to choose the seed token of an ngram to search for in the corpus, which improves performance.
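The seed-token idea described above can be sketched as follows. This is a minimal illustration only: `toy_specificity` and `pick_seed` are hypothetical names, and the scoring heuristic is an assumption made for the example, since this reference does not show the actual rules used by `querySpecificity`.

```python
def toy_specificity(token):
    """Toy stand-in for querySpecificity (assumed heuristic, not KWIC's rules)."""
    score = 0.0
    if token.get('tk'):
        # Assumption: a literal word is more selective than a regex pattern.
        score += 1.0 if token.get('tk.regex') else 2.0
    if token.get('pos'):
        # Assumption: a pos constraint adds a little selectivity.
        score += 0.5
    return score


def pick_seed(tokens, score=toy_specificity):
    """Return the index of the highest-scoring token (first one wins ties)."""
    return max(range(len(tokens)), key=lambda i: score(tokens[i]))


# Token objects in the shape documented for tokenize().
tokens = [
    {'tk': '們$', 'tk.regex': True, 'pos': 'N.*'},
    {'tk': '他們', 'tk.regex': False, 'pos': 'N.*'},
    {'tk': None, 'tk.regex': False, 'pos': 'V.*'},
]
seed_index = pick_seed(tokens)
print(seed_index)  # 1
```

In practice the `score` argument would presumably be `querySpecificity` itself, so that the most specific token is searched for first and the remaining tokens are only checked around its matches.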
KWIC.queryParser.tokenize(string)
Parse an ngram query string into token objects.
- Parameters
    string (str): Query string with each token enclosed in a pair of square brackets. Within a token, the tags word and pos can be given as [word="他們" pos="N.*"]. To search word with a regex, append .regex to word: [word.regex="們$" pos="N.*"]. pos uses regex search by default.
- Returns
    list: A list of token objects (dictionaries), one for each token in the query string (i.e. each token enclosed in brackets). Each token has three key-value pairs:
      - tk (str): The pattern of the word to search for.
      - tk.regex (bool): Whether to use regex search for the word.
      - pos (str): The pattern of the pos tag to search for.
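To make the mapping from query string to token objects concrete, here is a minimal sketch of a parser for the syntax described above. It is not KWIC's implementation: `tokenize_sketch` and both regular expressions are hypothetical, and using `None` for a tag that is absent from a token is an assumption, since this reference does not state the default.

```python
import re

# One bracketed token; quoted values may themselves contain "]".
TOKEN_RE = re.compile(r'\[((?:[^\]"]|"[^"]*")*)\]')
# One tag inside a token: word="...", word.regex="..." or pos="...".
ATTR_RE = re.compile(r'(word|pos)(\.regex)?\s*=\s*"([^"]*)"')


def tokenize_sketch(query):
    """Parse a bracketed query string into token dictionaries (illustrative)."""
    tokens = []
    for body in TOKEN_RE.findall(query):
        # Assumption: a missing tag is represented as None.
        token = {'tk': None, 'tk.regex': False, 'pos': None}
        for name, regex_flag, value in ATTR_RE.findall(body):
            if name == 'word':
                token['tk'] = value
                token['tk.regex'] = bool(regex_flag)
            else:
                token['pos'] = value
        tokens.append(token)
    return tokens


result = tokenize_sketch('[word="他們" pos="N.*"][word.regex="們$" pos="N.*"]')
print(result)
# [{'tk': '他們', 'tk.regex': False, 'pos': 'N.*'},
#  {'tk': '們$', 'tk.regex': True, 'pos': 'N.*'}]
```

The sketch does not handle escaped quotes inside a value; the real parser may treat such input, and malformed queries in general, differently.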