Major release

@interrogator interrogator released this 20 Feb 23:49
· 894 commits to master since this release

In this major release, stability and performance have been improved in dozens of ways:

  • Python 2/3 compatibility
  • Smart multiprocessing
  • Useful documentation, ReadTheDocs site generation
  • Much smaller repository size
  • Compatible with multiple versions of CoreNLP
  • Increased object orientation generally
  • Nose tests
  • Travis CI integration
  • Faster save/load via cPickle
  • Countless bugfixes
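
As an illustration of how the Python 2/3 compatibility and cPickle items can fit together, here is a minimal sketch of a version-agnostic save/load pair. It is not corpkit's actual implementation: the helper names `save_result` and `load_result` are hypothetical, and the choice of pickle protocol is an assumption.

```python
try:
    import cPickle as pickle  # Python 2: the C-accelerated pickler
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically


def save_result(obj, path):
    """Hypothetical helper: serialise obj to path in binary mode."""
    with open(path, "wb") as handle:
        # Protocol 2 is the newest protocol readable by both Python 2 and 3.
        pickle.dump(obj, handle, protocol=2)


def load_result(path):
    """Hypothetical helper: load an object previously written by save_result."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Pinning the protocol to 2 trades a little speed and file size for files that load under either interpreter; using `pickle.HIGHEST_PROTOCOL` instead would be faster on Python 3 but would produce files that Python 2 cannot read.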

New levels of abstraction have been added above Corpus and Interrogation: Corpora and Interrodict respectively, each with useful methods attached. Interrogation and concordancing are now two sides of the same coin rather than separate tasks, which helps to build computational workflows that investigate functional linguistic notions of probabilistic grammar and lexis as delicate grammar.

One emerging part of corpkit is the configurations() method, which automatically analyses the behaviour of a lexical item or items in the corpus. This will be very useful in automated workflows that seek to identify key participants and processes, and then to generate an overview of how each behaves. A little more work is still needed here, however. Also on the horizon are multilingual support and the use of spaCy ... but perhaps some of this needs to wait until I've made peace with my thesis.