
Releases: hopsparser/hopsparser

v0.7.1 – Bug fix

16 Nov 20:08
4efca26

Fixed

  • Update to pydantic 2 to fix breaking changes and pin it

v0.7.0 – Extra annotations prediction

09 Mar 15:32
99c8e3f

Added

  • HOPS now provides a custom spaCy component to use in spaCy pipelines (see the usage sketch
    after this list).
  • Options for using weighted multitask losses, including the adaptive strategy used in Candito
    (2022); a minimal illustration follows this list.
  • HOPS will learn and predict token-level labels encoded in the MISC column (as key=value) if
    you give it the name of the key in the extra_annotations part of the config. See the
    mDeBERTa-polyglot config for an example.

v0.6.0 – Partial supervision

29 Jul 11:13
9b8b784

This release mainly adds support for training on partially annotated data (materialized in the CoNLL-U files by _ in a column), along with some tooling improvements and bumps to the upper bounds of dependency versions.

Added

  • hopsparser evaluate now accepts an optional output argument, allowing results to be written
    directly to a file if needed.
  • A new script to help catch performance regressions on released models.

Changed

  • We now accept partially annotated CoNLL-U files as input for training: any learnable cell
    (UPOS, HEAD, DEPREL) whose value is _ will not contribute to the loss (see the sketch after
    this list).
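
As a rough illustration of the idea (this is not HOPS's actual implementation; the ignore index and tagset size below are made up), unannotated cells can simply be excluded from the cross-entropy:

```python
import torch
import torch.nn.functional as F

# Hedged sketch: gold cells containing `_` are mapped to an ignore index,
# so they add nothing to the loss.
IGNORE = -100
gold_upos = torch.tensor([3, IGNORE, 7])  # the second token had `_` in its UPOS cell
logits = torch.randn(3, 17)               # (n_tokens, tagset_size), dummy scores
loss = F.cross_entropy(logits, gold_upos, ignore_index=IGNORE)
```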

Full Changelog: v0.5.0...v0.6.0

v0.5.0

13 May 11:27
39dc80e

The performance of the contemporary models in this release is improved, most notably for models
not using BERT.

Added

  • The scripts/zenodo_upload.py script, a helper for uploading files to a Zenodo deposit.

Changed

  • The CharRNN lexer now represents words with the last hidden (instead of cell) state of the
    LSTM and no longer runs on padding.
  • The minimal PyTorch version is now 1.9.0
  • The minimal Transformers version is now 4.19.0
  • Use torch.inference_mode instead of torch.no_grad over all the parser methods (see the sketch
    after this list).
  • BERT lexer batches no longer have an obsolete, always zero word_indices attribute
  • DependencyDataset no longer has the lexicon attributes (ito(lab|tag) and their inverses) since
    we don't need them anymore.
  • The train_model script now skips incomplete runs with a warning.
  • The train_model script has nicer logging, including progress bars to help keep track of the
    experiments.
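
A quick illustration of the inference_mode change (generic PyTorch, not parser code):

```python
import torch

# torch.inference_mode (available since PyTorch 1.9) is a stricter, slightly
# faster alternative to torch.no_grad when no autograd state is needed at all.
model = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)
with torch.inference_mode():  # previously: with torch.no_grad():
    y = model(x)
```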

Fixed

  • The first word in the word embeddings lexer vocabulary is not used as padding anymore and has a
    real embedding.
  • BERT embeddings are now correctly computed with an attention mask to ignore padding (see the
    sketch after this list).
  • The root token embedding coming from BERT lexers is now an average of non-padding words'
    embeddings
  • FastText embeddings are now computed by averaging over non-padding subwords' embeddings.
  • In server mode, models are now correctly in eval mode and processing is done
    in torch.inference_mode.
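
To illustrate the attention-mask fix (a generic transformers example, not the HOPS lexer itself; the model name is only a placeholder):

```python
from transformers import AutoModel, AutoTokenizer

# When sentences in a batch are padded, the attention mask tells BERT to
# ignore the padding positions when computing contextual embeddings.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
batch = tokenizer(
    ["a short sentence", "another, slightly longer sentence"],
    padding=True,
    return_tensors="pt",
)
outputs = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
```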

Full Changelog: v0.4.2...v0.5.0

v0.4.2 — Loading fix

07 Apr 22:08
b45462b

Fixed

  • Model cross-device loading (e.g. loading on CPU a model trained on GPU) works now (#65)


Full Changelog: v0.4.1...v0.4.2

v0.4.1 — Metadata bugfix

25 Mar 20:20
102a15d

Changed

  • Remove the dependency on click_pathlib (#63)

Fixed

  • Compatibility with setuptools 61's parsing of PEP 621 specs

See the full list of changes: v0.4.0...v0.4.1

v0.4.0 — Interfaces modernisation

23 Mar 11:18
89b882f

Highlights

  • Support for modular lexers: you can now customize which lexers you use and in what number (if you want to add 5 muppets, it's your call), and BERT and word-level embeddings are now fully decoupled.
  • Lexer configuration and saving are now much more portable: models are fully self-contained (instead of phoning home to Hugging Face for BERT, for instance) and also much smaller.
  • Many kinks have been ironed out all over.
  • More decoupling of the configuration saved in a model (which is internal API) from the configurations used for training (which are public).
  • Support for Torch up to 1.11 and Python 3.
  • Many more and better tests
  • Removal of a lot of old code and renaming of many things, including a full port to the hopsparser name.
  • We also now have new UD 2.9 models, with more to come.

v0.3.3 — Transformers version fix

24 Nov 10:12
c12cb97

Setting a stricter upper version limit on transformers for model compatibility reasons.

v0.3.2

09 Jun 14:43

This is the first release to land on PyPI, and it is mostly meant for backward compatibility. It maintains most of the former API (which is nevertheless deprecated and will raise warnings), including the old name of the package npdependency.

v0.2.0 (pre-release)

01 Dec 10:33

This release mainly serves to host the non-BERT models while we work on stabilizing v0.1.0.