Releases · hopsparser/hopsparser
v0.7.1 – Bug fix
Fixed
- Update to pydantic 2 to fix breaking changes and pin it
v0.7.0 – Extra annotations prediction
Added
- HOPS now provides a custom spaCy component to use in spaCy pipelines (see the usage sketch after this list).
- Options for using weighted multitask losses, including the adaptive strategy used in Candito (2022).
- HOPS will learn and predict token-level labels encoded in the MISC column (as `key=value`) if you give it the name of the key in the `extra_annotations` part of the config. See the `mDeBERTa-polyglot` config for an example of such a config.
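A minimal usage sketch for the new spaCy component; the module path, factory name and config key used here (`spacy_component`, `"hops_parser"`, `"model_path"`) are assumptions for illustration rather than names taken verbatim from the HOPS docs, so check the project documentation for the actual ones:

```python
import spacy

# Importing the component module registers the factory with spaCy
# (module name assumed for illustration).
from hopsparser import spacy_component  # noqa: F401

nlp = spacy.blank("fr")
# Factory name and config key are hypothetical placeholders.
nlp.add_pipe("hops_parser", config={"model_path": "path/to/a/hops/model"})

doc = nlp("Le petit chat dort .")
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)
```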
v0.6.0 – Partial supervision
This release mainly adds support for training on partially annotated data (materialized in the CoNLL-U files by `_` in a column), along with some improvements to tooling and bumps to the upper version bounds of dependencies.
Added
- `hopsparser evaluate` now accepts an optional output argument, making it possible to write directly to a file if needed.
- A new script to help catch performance regressions on released models.
Changed
- We now accept partially annotated CoNLL-U files as input for training: any learnable cell (UPOS, HEAD, DEPREL) whose value is `_` will not contribute to the loss (see the sketch below).
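As a minimal sketch of what this means in practice (not HOPS's actual implementation), positions whose gold cell is `_` can simply be excluded from the cross-entropy loss, e.g. via PyTorch's `ignore_index`:

```python
import torch
import torch.nn.functional as F

IGNORE = -100  # stand-in label id for cells whose CoNLL-U value was "_"

def partial_label_loss(logits: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, n_labels) scores for e.g. UPOS or DEPREL
    # gold:   (batch, seq_len) gold label ids, IGNORE where the cell was "_"
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        gold.reshape(-1),
        ignore_index=IGNORE,  # unannotated cells contribute nothing to the loss
    )

# One sentence of three tokens, the second one unannotated.
logits = torch.randn(1, 3, 5)
gold = torch.tensor([[2, IGNORE, 4]])
print(partial_label_loss(logits, gold))
```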
Full Changelog: v0.5.0...v0.6.0
v0.5.0
The performance of the models released alongside this version is improved, most notably for models not using BERT.
Added
- The `scripts/zenodo_upload.py` script, a helper for uploading files to a Zenodo deposit.
Changed
- The CharRNN lexer now represents words with the last hidden (instead of cell) state of the LSTM and no longer runs on padding.
- The minimal PyTorch version is now 1.9.0.
- The minimal Transformers version is now 4.19.0.
- Use `torch.inference_mode` instead of `torch.no_grad` over all the parser methods (see the illustration after this list).
- BERT lexer batches no longer have an obsolete, always-zero `word_indices` attribute.
- `DependencyDataset` no longer has lexicon attributes (`ito(lab|tag)` and their inverses) since we don't need these anymore.
- The `train_model` script now skips incomplete runs with a warning.
- The `train_model` script has nicer logging, including progress bars to help keep track of the experiments.
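For context, `torch.inference_mode` is used just like `torch.no_grad` (as a context manager or a decorator) but additionally disables autograd's view and version-counter tracking, which makes inference slightly cheaper; a generic illustration (not HOPS code):

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for a parser module
model.eval()

# As a context manager around prediction code.
with torch.inference_mode():
    scores = model(torch.randn(1, 4))

# Or as a decorator on prediction methods.
@torch.inference_mode()
def predict(batch: torch.Tensor) -> torch.Tensor:
    return model(batch)

print(predict(torch.randn(1, 4)))
```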
Fixed
- The first word in the word embeddings lexer vocabulary is not used as padding anymore and has a real embedding.
- BERT embeddings are now correctly computed with an attention mask to ignore padding.
- The root token embedding coming from BERT lexers is now an average of non-padding words' embeddings.
- FastText embeddings are now computed by averaging over non-padding subwords' embeddings.
- In server mode, models are now correctly in eval mode and processing is done in `torch.inference_mode`.
Full Changelog: v0.4.2...v0.5.0
v0.4.2 — Loading fix
Fixed
- Model cross-device loading (e.g. loading on CPU a model trained on GPU) works now (#65)
Merged PRs
- Fix GPU→CPU cross-loading by @LoicGrobol in #66
Full Changelog: v0.4.1...v0.4.2
v0.4.1 — metadata bugfix
Changed
- Remove the dependency on `click_pathlib` (#63)
Fixed
- Compatibility with setuptools 61 parsing of PEP 621 specs
See the full list of changes: v0.4.0...v0.4.1
v0.4.0 — Interfaces modernisation
Highlights
- Support for modular lexers: you can now customize which lexers you use and how many (if you want to add 5 muppets, it's your call), and BERT and word-level embeddings are now fully decoupled.
- Lexer configuration and saving are now much more portable (so models are fully self-contained instead of phoning home to Hugging Face for BERT, for instance), and the models are also much smaller now.
- Many kinks have been ironed out all over, but in particular in
- More decoupling of the configuration saved in a model (which is internal API) and the configurations used for training (which is public), with the latter becoming more
- Support for Torch up to 1.11 and Python 3.
- Many more and better tests
- Removal of a lot of old code and renaming of many things, including a full port to the name
- We also now have new UD 2.9 models, with more to come.
v0.3.3 — Transformers version fix
Setting a stricter upper version limit on transformers for model compatibility reasons.
v0.3.2
v0.2.0 (pre-release)
This release mainly serves to host the non-BERT models while we work on stabilizing v0.1.0.