Package for constructing paths of embeddings obtained from transformers.

datasig-ac-uk/nlpsig

nlpsig

NLPSig (nlpsig) is a Python package for constructing streams/paths of embeddings obtained from transformers. The key contributions are:

  • A simple API for taking streams of textual data and constructing streams of embeddings from transformers
  • A simple API for performing dimensionality reduction on the embeddings obtained from transformers with nlpsig.DimReduce, which provides simple wrappers over popular dimensionality reduction algorithms such as PCA, UMAP and t-SNE.
    • This is particularly useful if we wish to use path signatures in any downstream model, since the dimensionality of the embeddings obtained from transformers is usually very high.
    • We present some Signature Network models for longitudinal NLP tasks in the sig-networks library, which uses the paths constructed by this library as inputs to neural networks that utilise path signature methodology.
  • Simple classes for constructing train/test splits of the data and for K-fold cross-validation. These are general-purpose and are applied to the Signature Network examples in the sig-networks library.
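To make the K-fold cross-validation idea in the last bullet concrete, here is a minimal standard-library sketch of how sample indices can be partitioned into folds. It is a generic illustration only: the function name k_fold_indices and its parameters are hypothetical and are not part of the nlpsig API, whose own splitting classes may behave differently (for example, by grouping samples that belong to the same stream).

```python
import random


def k_fold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_indices, test_indices) pairs for K-fold cross-validation.

    Hypothetical helper for illustration; not an nlpsig function.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")

    # Shuffle once with a fixed seed so the folds are reproducible.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    fold_sizes = [base + (1 if i < extra else 0) for i in range(n_splits)]

    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, n_splits=3))
for fold_number, (train, test) in enumerate(folds):
    print(f"fold {fold_number}: {len(train)} train / {len(test)} test")
```

Each sample appears in exactly one test fold, so every sample is used for evaluation once and for training in the remaining folds.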

NLPSig is used by the sig-networks library, as detailed in our EACL demo paper, Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling.

Installation

NLPSig is available on PyPI and can be installed with pip:

pip install nlpsig
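To confirm that the installation succeeded, one option is to query the installed distribution's metadata with the Python standard library. This is a small sketch: the helper name installed_version is hypothetical, and it assumes only that the distribution is named nlpsig, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="nlpsig"):
    """Return the installed version string of a package, or None if absent.

    Hypothetical helper for illustration; not part of nlpsig.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version() or "nlpsig is not installed in this environment")
```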

Contributing

To take advantage of pre-commit, which automatically formats your code and runs some basic checks before you commit, install it and set up the git hook:

pip install pre-commit  # or brew install pre-commit on macOS
pre-commit install  # will install a pre-commit hook into the git repo

After doing this, linters and formatters will be applied to the codebase each time you commit. You can also run pre-commit run --all-files to run the checks manually on every file.

See CONTRIBUTING.md for more information on running the test suite using nox.