New interrogation options
This release reflects a shift away from purpose-built interrogator() search functions towards the search and show arguments, which are much more powerful. Users can construct a dict with one or more dependency criteria to match, and choose whether a token must match all of the criteria (searchmode = 'all') or any one of them (searchmode = 'any').
>>> criteria = {'lemma': ['think', 'feel', 'want'],
... 'pos': r'^V',
... 'function': 'root'}
>>> r = interrogator(corpus, search = criteria, show = ['word'], searchmode = 'all')
>>> list(r.results.columns)[:5]
might return:
['think', 'thinking', 'want', 'wants', 'feel']
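The difference between the two search modes can be shown with a minimal stand-alone sketch of the matching logic. This is not corpkit's implementation. The token dicts and the helpers criterion_matches and token_matches are hypothetical. The sketch also assumes that a list of values means an exact match against any one of them, while a string is treated as a regular expression.

```python
import re

# Hypothetical token representation (not corpkit's internals): one dict per
# token, keyed by the same attribute names used in the criteria dict.
tokens = [
    {'word': 'think', 'lemma': 'think', 'pos': 'VBP', 'function': 'root'},
    {'word': 'thought', 'lemma': 'thought', 'pos': 'NN', 'function': 'dobj'},
    {'word': 'wants', 'lemma': 'want', 'pos': 'VBZ', 'function': 'ccomp'},
]

def criterion_matches(value, pattern):
    """Assumption: a list matches by exact membership; a string is a regex."""
    if isinstance(pattern, (list, tuple, set)):
        return value in pattern
    return re.search(pattern, value) is not None

def token_matches(token, criteria, mode):
    """Return True if the token meets all (or any) of the criteria."""
    hits = [criterion_matches(token[attr], pat) for attr, pat in criteria.items()]
    if mode == 'all':
        return all(hits)
    if mode == 'any':
        return any(hits)
    raise ValueError("mode must be 'all' or 'any'")

criteria = {'lemma': ['think', 'feel', 'want'],
            'pos': r'^V',
            'function': 'root'}

all_hits = [t['word'] for t in tokens if token_matches(t, criteria, 'all')]
any_hits = [t['word'] for t in tokens if token_matches(t, criteria, 'any')]
print(all_hits)  # ['think']
print(any_hits)  # ['think', 'wants']
```

With 'all', "wants" is rejected because its function is not root. With 'any', matching the lemma and POS criteria is enough.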
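Passing a list of several values to the show argument determines which attributes appear in the output, and in what order. Single-letter abbreviations can be used, as with 'f' (function), 'p' (pos) and 'l' (lemma) here: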
>>> r = interrogator(corpus, search = criteria, show = ['f', 'p', 'l'], searchmode = 'all')
>>> list(r.results.columns)[:3]
will produce column names made of the function, POS tag and lemma, joined by slashes:
['root/vbp/think', 'root/vbg/thinking', 'root/vb/want']
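The following rough sketch shows how a show list could be turned into a column name. The abbreviation table and the show_key helper are hypothetical. They are inferred only from the example above ('f', 'p', 'l'). Lowercasing the values mirrors the sample output but is an assumption about the real behaviour.

```python
# Hypothetical abbreviation table, inferred from the example above only.
ABBREVIATIONS = {'f': 'function', 'p': 'pos', 'l': 'lemma'}

def show_key(token, show):
    """Join the requested attributes, in the order given, with slashes."""
    parts = []
    for item in show:
        attr = ABBREVIATIONS.get(item, item)  # full attribute names also accepted
        parts.append(token[attr].lower())     # assumption: output is lowercased
    return '/'.join(parts)

token = {'word': 'thinking', 'lemma': 'think', 'pos': 'VBG', 'function': 'root'}
print(show_key(token, ['f', 'p', 'l']))       # root/vbg/think
print(show_key(token, ['lemma', 'function'])) # think/root
print(show_key(token, ['word']))              # thinking
```

Because the key is built in the order of the show list, reordering the list reorders the parts of every column name.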
Another improvement is the exclude argument, which replaces blacklist, function_filter and pos_filter. Used together with excludemode = 'all' or excludemode = 'any', it works just like search, letting the user exclude results that match one or more criteria:
>>> excs = {'pos': r'^V', 'word': r'ing$'}
>>> r = interrogator(corpus, search = criteria, show = ['f', 'p', 'l'],
... searchmode = 'all', exclude = excs, excludemode = 'all')
would remove any verbal token ending in -ing. Changing excludemode to 'any' would instead remove all verbs and all words ending in -ing.
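The effect of excludemode can be sketched in the same stand-alone way. The token dicts and the helpers matches and apply_exclude are hypothetical and are not corpkit's own code. The sketch covers only the exclusion step, assuming search matching has already happened, and treats every criterion as a regular expression.

```python
import re

def matches(token, criteria, mode):
    """True if the token meets all (or any) of the regex criteria."""
    hits = [re.search(pattern, token[attr]) is not None
            for attr, pattern in criteria.items()]
    if mode == 'all':
        return all(hits)
    if mode == 'any':
        return any(hits)
    raise ValueError("mode must be 'all' or 'any'")

def apply_exclude(tokens, exclude, excludemode):
    """Hypothetical helper: drop every token matching the exclusion criteria."""
    return [t for t in tokens if not matches(t, exclude, excludemode)]

tokens = [
    {'word': 'thinking', 'pos': 'VBG'},
    {'word': 'want', 'pos': 'VB'},
    {'word': 'morning', 'pos': 'NN'},
    {'word': 'idea', 'pos': 'NN'},
]
excs = {'pos': r'^V', 'word': r'ing$'}

kept_all = [t['word'] for t in apply_exclude(tokens, excs, 'all')]
kept_any = [t['word'] for t in apply_exclude(tokens, excs, 'any')]
print(kept_all)  # ['want', 'morning', 'idea']  (only the -ing verb is removed)
print(kept_any)  # ['idea']  (all verbs and all -ing words are removed)
```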
The release also includes various bug fixes, code cleanup and other miscellaneous changes, such as a function for turning results into pandas MultiIndex DataFrames. Full API documentation is forthcoming.