
Releases: inpho/vsm

v0.2-alpha

21 Oct 13:10

Key changes in version v0.2

  • A Cythonized version of the collapsed Gibbs sampling loop used by the sequential and multiprocessing LDA models is now used by default, which substantially shortens training times.
  • The methods that quantify distances between numerical representations of semantic features of the data (words, documents, topics) now default to metric functions. In particular, distances between probability distributions are computed as the Jensen-Shannon distance, and other kinds of vectors (e.g., from LSA or BEAGLE) are compared using angular distance. vsm.spatial also includes a wrapper for any distance or similarity function in scipy.spatial.distance.
  • Most of the plotting and clustering functionality has been moved to an extension, vsm.extension.clustering. There are many possible directions here, and the core of vsm should limit itself to providing a stable source of data for them.
  • Likewise, the corpus-building tools have been moved to an extension, vsm.extension.corpusbuilders. There are many ways to build a corpus, and corpus data and metadata arrive in many different forms. The core of vsm should limit itself to providing a stable target data structure for the corpus-preparation stage of the workflow.
  • Importing the classes that vsm provides is now much simpler. In the style of numpy, either import vsm or from vsm import * brings in most of the commonly used classes and functions.
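For readers unfamiliar with the loop that was Cythonized, here is a minimal pure-Python/NumPy sketch of one standard form of collapsed Gibbs sampling for LDA. It is not vsm's implementation. It assumes symmetric Dirichlet priors alpha and beta, documents given as arrays of integer word ids, and the hypothetical helper names gibbs_sweep and train_lda.

```python
import numpy as np


def gibbs_sweep(docs, z, doc_topic, topic_word, topic_totals, alpha, beta, rng):
    """One pass of collapsed Gibbs sampling over every token (hypothetical helper).

    docs: list of integer arrays of word ids; z: parallel list of topic arrays.
    The count arrays are updated in place.
    """
    K, V = topic_word.shape
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            k = z[d][i]
            # Remove the token's current topic assignment from the counts.
            doc_topic[d, k] -= 1
            topic_word[k, w] -= 1
            topic_totals[k] -= 1
            # Full conditional p(z = k | all other assignments), up to a constant.
            p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) / (topic_totals + V * beta)
            k = rng.choice(K, p=p / p.sum())
            # Record the newly sampled assignment.
            z[d][i] = k
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_totals[k] += 1


def train_lda(docs, K, V, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Randomly initialise topic assignments, then run n_iter Gibbs sweeps."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), K), dtype=np.int64)
    topic_word = np.zeros((K, V), dtype=np.int64)
    topic_totals = np.zeros(K, dtype=np.int64)
    z = []
    for d, words in enumerate(docs):
        zd = rng.integers(K, size=len(words))
        z.append(zd)
        for w, k in zip(words, zd):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_totals[k] += 1
    for _ in range(n_iter):
        gibbs_sweep(docs, z, doc_topic, topic_word, topic_totals, alpha, beta, rng)
    return doc_topic, topic_word
```

The per-token Python loop in gibbs_sweep runs once per token per iteration, so it dominates training time. That makes it a natural candidate for compilation with Cython.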

v0.1-develop

24 Jun 14:57
Pre-release

The key differences between this branch and v0.1 are:

  • The LDA viewer's sim_* functions take the similarity or distance function as a parameter. The module includes an implementation of the Jensen-Shannon divergence, which is the default for LDA.
  • The distance matrix methods return a Manifold object, which facilitates clustering. As a result, this branch also requires sklearn and matplotlib in order to import the viewer classes.
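The notes above describe two design points. Similarity functions take the distance function as a parameter, and the defaults are the Jensen-Shannon divergence (v0.1-develop), then the Jensen-Shannon distance for distributions and angular distance for other vectors (v0.2). The NumPy sketch below illustrates both points. It is not vsm's API: js_divergence, js_distance, angular_distance and sim_rows are hypothetical names. The code also assumes natural logarithms and angular distance scaled to [0, 1] by dividing by π, which is one common convention.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms where a == 0 contribute 0 by convention; b > 0 wherever a > 0.
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_distance(p, q):
    """Square root of the JS divergence; unlike the divergence, it is a metric."""
    return np.sqrt(max(js_divergence(p, q), 0.0))  # clamp tiny negative rounding error


def angular_distance(u, v):
    """Angle between two nonzero vectors, divided by pi so it lies in [0, 1]."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi


def sim_rows(query, rows, dist_fn=js_distance):
    """Rank the rows of a matrix by distance to query (hypothetical helper)."""
    dists = np.array([dist_fn(query, r) for r in rows])
    order = np.argsort(dists)
    return order, dists[order]
```

Because sim_rows accepts any two-argument callable, the same helper can rank topic distributions with js_distance or LSA-style vectors with angular_distance. It also works with any other function of the same shape, such as those in scipy.spatial.distance.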