Release New interrogation options · interrogator/corpkit

This release reflects a move away from purpose-built interrogator() search functions towards the new search and show arguments, which are much more powerful. Users can construct a dict object with one or more dependency criteria to match, then set searchmode = 'all' to require every criterion to match, or searchmode = 'any' to accept a match on any one of them.

>>> criteria = {'lemma': ['think', 'feel', 'want'],
...             'pos': r'^V',
...             'function': 'root'}

>>> r = interrogator(corpus, search = criteria, show = ['word'], searchmode = 'all')
>>> list(r.results.columns)[:5]

might return:

['think', 'thinking', 'want', 'wants', 'feel']
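To make the searchmode semantics concrete, here is a minimal sketch in plain Python. It is not corpkit's implementation. It assumes each token is a dict of attributes (word, lemma, pos, function), that a list value must match the attribute exactly, and that a string value is a regular expression searched within the attribute. The helper name matches and the token data are hypothetical, and corpkit's own list handling may differ.

```python
import re

def matches(token, criteria, mode='all'):
    """Return True if token satisfies the criteria (hypothetical helper).

    List values must match the attribute exactly; string values are
    treated as regular expressions searched within the attribute.
    """
    if mode not in ('all', 'any'):
        raise ValueError("mode must be 'all' or 'any'")

    def check(attr, pattern):
        value = token.get(attr, '')
        if isinstance(pattern, (list, tuple, set)):
            return value in pattern
        return re.search(pattern, value) is not None

    results = (check(attr, pattern) for attr, pattern in criteria.items())
    return all(results) if mode == 'all' else any(results)

criteria = {'lemma': ['think', 'feel', 'want'],
            'pos': r'^V',
            'function': 'root'}

# Hypothetical parsed tokens, not real corpus data
tokens = [
    {'word': 'think', 'lemma': 'think', 'pos': 'VBP', 'function': 'root'},
    {'word': 'thoughts', 'lemma': 'thought', 'pos': 'NNS', 'function': 'dobj'},
    {'word': 'wants', 'lemma': 'want', 'pos': 'VBZ', 'function': 'ccomp'},
]

all_hits = [t['word'] for t in tokens if matches(t, criteria, 'all')]
any_hits = [t['word'] for t in tokens if matches(t, criteria, 'any')]
```

With these tokens, all_hits keeps only 'think' (the 'wants' token fails the function criterion), while any_hits keeps both 'think' and 'wants'.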

Passing a longer list to the show argument controls which attributes appear in the output, and in what order. Here 'f', 'p' and 'l' stand for function, POS and lemma:

>>> r = interrogator(corpus, search = criteria, show = ['f', 'p', 'l'], searchmode = 'all')
>>> list(r.results.columns)[:3]

will produce column names that join the function, POS tag and lemma with slashes:

['root/vbp/think', 'root/vbg/thinking', 'root/vb/want']
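How show shapes the output labels can be sketched as follows. This assumes the single-letter codes map to token attributes ('f' to function, 'p' to POS, 'l' to lemma, and 'w' to word, the last being an assumption), that values are lowercased as in the column names above, and that they are joined with a slash. ABBREVIATIONS and label are hypothetical names, not part of corpkit.

```python
# Hypothetical mapping from show abbreviations to token attributes
ABBREVIATIONS = {'w': 'word', 'l': 'lemma', 'p': 'pos', 'f': 'function'}

def label(token, show):
    """Build an output label from token attributes, in the order given by show."""
    parts = []
    for item in show:
        attr = ABBREVIATIONS.get(item, item)  # accept either 'p' or 'pos'
        parts.append(token[attr].lower())      # assumption: labels are lowercased
    return '/'.join(parts)

# Hypothetical token
token = {'word': 'wants', 'lemma': 'want', 'pos': 'VBZ', 'function': 'root'}

print(label(token, ['f', 'p', 'l']))  # root/vbz/want
```

Reordering show reorders the label: label(token, ['l', 'p']) gives 'want/vbz', and label(token, ['word']) gives just 'wants'.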

Another improvement is the exclude argument, which replaces blacklist, function_filter and pos_filter. Used together with excludemode = 'any'/'all', it works just like search, allowing the user to exclude results that match one or more criteria:

>>> excs = {'pos': r'^V', 'word': r'ing$'}
>>> r = interrogator(corpus, search = criteria, show = ['f', 'p', 'l'],
...     searchmode = 'all', exclude = excs, excludemode = 'all')

would remove any verb token ending in -ing. Changing excludemode to 'any' would instead remove all verbs and all words ending in -ing.
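The difference between the two exclude modes can be sketched in plain Python. This is a hypothetical illustration, not corpkit code: it assumes exclusion criteria are regular expressions over token attributes, and for clarity it applies exclusion on its own rather than after a search step. The helper names hits and keep and the tokens are made up.

```python
import re

def hits(token, criteria, mode):
    """Hypothetical helper: True if token meets all/any regex criteria."""
    checks = (re.search(pattern, token.get(attr, '')) is not None
              for attr, pattern in criteria.items())
    return all(checks) if mode == 'all' else any(checks)

def keep(token, excludes, excludemode='all'):
    """Keep a token unless it matches the exclusion criteria."""
    return not hits(token, excludes, excludemode)

excs = {'pos': r'^V', 'word': r'ing$'}

# Hypothetical tokens
tokens = [
    {'word': 'thinking', 'pos': 'VBG'},  # verb ending in -ing
    {'word': 'wants', 'pos': 'VBZ'},     # verb, no -ing
    {'word': 'morning', 'pos': 'NN'},    # -ing, not a verb
    {'word': 'house', 'pos': 'NN'},      # neither
]

kept_all = [t['word'] for t in tokens if keep(t, excs, 'all')]
kept_any = [t['word'] for t in tokens if keep(t, excs, 'any')]
```

Here kept_all drops only 'thinking', while kept_any keeps only 'house'. In the release example itself, search already requires pos to match ^V, so every match is a verb and excludemode = 'any' with these exclusions would leave no results.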

The release also includes various bugfixes, code cleanup and miscellaneous additions, such as a function for turning results into pandas MultiIndex DataFrames. Full API documentation is forthcoming.
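The release notes don't name the new MultiIndex function, so as a hedged illustration using plain pandas only: slash-joined labels like 'root/vbp/think' can be split into MultiIndex levels with pandas.MultiIndex.from_tuples. The helper name to_multiindex, the level names and the counts are hypothetical, not real results.

```python
import pandas as pd

def to_multiindex(df, names=('function', 'pos', 'lemma'), sep='/'):
    """Hypothetical helper: split 'a/b/c' column labels into MultiIndex levels."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [tuple(col.split(sep)) for col in df.columns], names=list(names))
    return out

# Illustrative counts per subcorpus, using column labels like those shown above
df = pd.DataFrame({'root/vbp/think': [3, 5],
                   'root/vbg/thinking': [1, 2],
                   'root/vb/want': [4, 0]},
                  index=['chapter1', 'chapter2'])

multi = to_multiindex(df)

# Select every column whose POS level is 'vbp' (the 'pos' level is dropped)
vbp_only = multi.xs('vbp', level='pos', axis=1)
```

Splitting the labels into levels means results can be sliced by any one attribute, such as POS, without string matching on column names.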
