Skip to content

v3.6.0: New span finder component and pipelines for Slovenian

Compare
Choose a tag to compare
@adrianeboyd adrianeboyd released this 07 Jul 08:32
· 178 commits to master since this release
6fc153a

✨ New features and improvements

  • NEW: span_finder pipeline component to identify overlapping, unlabeled spans (#12507).
  • Language updates:
    • Add initial support for Malay (#12602).
    • Update Latin defaults to support noun chunks, update lexical/tokenizer defaults and add example sentences (#12538).
  • Add option to return scores separately keyed by component name with spacy evaluate --per-component, Language.evaluate(per_component=True) and Scorer.score(per_component=True) (#12540).
  • Support custom token/lexeme attribute for vectors (#12625).
  • Support spancat_singlelabel in spacy debug data CLI (#12749).
  • Typing updates for PhraseMatcher and SpanGroup (#12642, #12714).

πŸ”΄ Bug fixes

  • #12569: Require that all SpanGroup spans come from the current doc.

πŸ“¦ Trained pipelines updates

We have added new pipelines for Slovenian that use the trainable lemmatizer and floret vectors.

Package UPOS Parser LAS NER F
sl_core_news_sm 96.9 82.1 62.9
sl_core_news_md 97.6 84.3 73.5
sl_core_news_lg 97.7 84.3 79.0
sl_core_news_trf 99.0 91.7 90.0
  • πŸ™ Special thanks to @orglce for help with the new pipelines!

The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize "get" as a passive auxiliary.

The Danish pipeline da_core_news_trf has been updated to use vesteinn/DanskBERT with performance improvements across the board.

⚠️ Backwards incompatibilities

  • SpanGroup spans are now required to be from the same doc. When initializing a SpanGroup, there is a new check to verify that all added spans refer to the current doc. Without this check, it was possible to run into string store or other errors.

πŸ“– Documentation and examples

πŸ‘₯ Contributors

@adrianeboyd, @bdura, @danieldk, @davidberenstein1957, @diyclassics, @essenmitsosse, @honnibal, @ines, @isabelizimm, @jmyerston, @kadarakos, @KennethEnevoldsen, @khursani8, @ljvmiranda921, @rmitsch, @shadeMe, @svlandeg, @tomaarsen, @victorialslocum, @vin-ivar, @ZiadAmerr