
Releases: huspacy/huspacy

huspacy-v0.9.0

23 May 11:17

Changed

  • Added support for new models (hu_core_news_md-v3.5.2, hu_core_news_lg-v3.5.2, hu_core_news_trf-v3.5.2, hu_core_news_trf_xl-v3.5.2)
  • Updated the documentation to cover benepar usage and noun chunking
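The model identifiers in these notes follow a `<name>-v<version>` pattern (the older `hu_core_ud_lg` tags omit the `v`). When checking which release of each pipeline is newest, compare the version numerically rather than as a string. The following is a minimal sketch in plain Python. `parse_model_id` and `latest_versions` are hypothetical helpers, not part of huspacy, and the naming pattern is inferred from the identifiers listed here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern inferred from identifiers such as "hu_core_news_md-v3.5.2"
# and "hu_core_ud_lg-0.3.1"; the leading "v" is optional.
MODEL_ID = re.compile(r"^(?P<name>[a-z_]+)-v?(?P<version>\d+(?:\.\d+)*)$")


def parse_model_id(model_id: str) -> Tuple[str, Tuple[int, ...]]:
    """Split e.g. 'hu_core_news_md-v3.5.2' into ('hu_core_news_md', (3, 5, 2))."""
    match = MODEL_ID.match(model_id)
    if match is None:
        raise ValueError(f"unrecognised model identifier: {model_id!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version


def latest_versions(model_ids: List[str]) -> Dict[str, Tuple[int, ...]]:
    """Keep the highest version per model name, comparing integer tuples."""
    latest: Dict[str, Tuple[int, ...]] = {}
    for model_id in model_ids:
        name, version = parse_model_id(model_id)
        # Any non-empty tuple compares greater than the empty default.
        if version > latest.get(name, ()):
            latest[name] = version
    return latest
```

With integer tuples, a hypothetical `3.10.0` correctly sorts after `3.9.0`. A plain string comparison would get this order wrong.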

huspacy-v0.8.1

24 Mar 07:57

Fixed

  • Replaced the bogus transformer model versions with fixed ones (hu_core_news_trf-v3.5.1, hu_core_news_trf_xl-v3.5.1)

huspacy-v0.8.0

23 Mar 22:51

New

  • Added support for new models (hu_core_news_md-v3.5.1, hu_core_news_lg-v3.5.1, hu_core_news_trf_xl-v3.5.0, hu_core_news_trf_xl-v3.5.1)

huspacy-v0.7.0

08 Feb 10:09

New

  • Added support for new models (hu_core_news_md-v3.5.0, hu_core_news_lg-v3.5.0, hu_core_news_trf_xl-v3.4.0)
  • Updated documentation

huspacy-v0.6.0

11 Nov 09:17

New

  • Added a lookup component for sentiment lexicons
  • Added integration for novakat's onpp NER model (nerpp)
  • Added support for new models (hu_core_news_trf-v3.4.0, hu_core_news_md-v3.4.2, hu_core_news_lg-v3.4.4)
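These notes do not describe how the sentiment lookup component is configured. As a rough illustration of lexicon lookup in general, the sketch below scores lemmas against a small, made-up polarity lexicon. `TOY_LEXICON`, `lexicon_scores` and `document_polarity` are hypothetical names, not the component's API. Looking up lemmas rather than surface forms is an assumption on our part. For an agglutinative language like Hungarian, it lets one entry such as `jó` cover inflected forms like `jók` or `jónak`.

```python
from typing import Dict, List, Optional, Sequence

# Made-up polarity lexicon keyed by lemma (hypothetical values).
TOY_LEXICON: Dict[str, float] = {
    "jó": 1.0,        # good
    "szép": 1.0,      # beautiful
    "rossz": -1.0,    # bad
    "szomorú": -1.0,  # sad
}


def lexicon_scores(lemmas: Sequence[str], lexicon: Dict[str, float]) -> List[Optional[float]]:
    """Per-token lexicon score, or None when the lemma is not listed."""
    return [lexicon.get(lemma.lower()) for lemma in lemmas]


def document_polarity(lemmas: Sequence[str], lexicon: Dict[str, float]) -> float:
    """Sum of the scores of all tokens that were found in the lexicon."""
    return sum((s for s in lexicon_scores(lemmas, lexicon) if s is not None), 0.0)


# Lemmas of "A film szép és jó volt." ("The film was beautiful and good.")
print(document_polarity(["a", "film", "szép", "és", "jó", "van", "."], TOY_LEXICON))  # 2.0
```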

huspacy-v0.5.1

27 Oct 07:53
Bump version: 0.5.0 → 0.5.1

hu_core_ud_lg-0.3.1

04 Oct 20:18
9cebbb5

Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse, named entity tags and lemmata.

| Feature | Description |
| --- | --- |
| Name | hu_core_ud_lg |
| Version | 0.3.1 |
| spaCy | >=2.2.1 |
| Model size | 1360 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer, ner |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia, Hunnerwiki, Szeged NER corpora |
| License | CC BY-NC-SA 4.0 |

Pipeline details

| | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer | NER |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Model | Word2Vec CBOW, dim=300, minfreq=10 | Rule-based (implemented in spaCy) | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) | CNN |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CoNLL'17 training data | CoNLL'17 training data | UD-converted Szeged Corpus | Hunnerwiki, Szeged NER Business & Criminal |
| Test data | Hungarian analogical questions | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | Szeged NER Business & Criminal |
| Accuracy | ACC 20.95 | F1 99.89 | F1 96.97 | ACC 94.81 | UAS 76.18, LAS 66.58 | ACC 95.51 | F1 93.95 |
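The parser row reports two attachment scores. UAS counts the tokens whose predicted head is correct. LAS additionally requires the dependency label to match, so LAS can never exceed UAS. The following is a minimal sketch that assumes the gold and predicted analyses share the same tokenization. The official CoNLL 2017 scorer also aligns system tokens to gold tokens, which this omits. `attachment_scores` is a hypothetical helper.

```python
from typing import Sequence, Tuple

# One (head, label) pair per token; head is a 1-based token index and 0 marks
# the root, following the CoNLL-U convention.
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token sequence")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    labeled_hits = sum(g == p for g, p in zip(gold, pred))     # head and label
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# "A macska alszik ." (The cat sleeps.): one wrong label, one wrong head.
SENT_GOLD = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "punct")]
SENT_PRED = [(2, "det"), (3, "obj"), (0, "root"), (2, "punct")]
print(attachment_scores(SENT_GOLD, SENT_PRED))  # (75.0, 50.0)
```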

hu_core_ud_lg-0.3.0

26 Sep 21:36

Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse, named entity tags and lemmata.

| Feature | Description |
| --- | --- |
| Name | hu_core_ud_lg |
| Version | 0.3.0 |
| spaCy | >=2.1.8 |
| Model size | 1360 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer, ner |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia, Hunnerwiki, Szeged NER corpora |
| License | CC BY-NC-SA 4.0 |

Pipeline details

| | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer | NER |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Model | Word2Vec CBOW, dim=300, minfreq=10 | Rule-based (implemented in spaCy) | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) | CNN |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CoNLL'17 training data | CoNLL'17 training data | UD-converted Szeged Corpus | Hunnerwiki, Szeged NER Business & Criminal |
| Test data | Hungarian analogical questions | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | Szeged NER Business & Criminal |
| Accuracy | ACC 20.95 | F1 99.89 | F1 96.97 | ACC 94.91 | UAS 75.73, LAS 66.16 | ACC 95.49 | F1 93.95 |
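NER F1 is most commonly computed over exact entity matches. A predicted entity counts as correct only if both its span boundaries and its label match a gold entity. These notes do not say which scorer produced the F1 93.95 figure. The following sketch therefore illustrates only that common definition. It uses a hypothetical `entity_f1` helper and token-offset spans.

```python
from typing import Iterable, Tuple

# (start_token, end_token_exclusive, label)
Entity = Tuple[int, int, str]


def entity_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1, as percentages."""
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)  # same boundaries AND same label
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 100 * precision, 100 * recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f1


# "Kovács János Budapesten dolgozik az OTP-nél": the LOC entity is mislabeled.
GOLD_ENTS = [(0, 2, "PER"), (2, 3, "LOC"), (5, 6, "ORG")]
PRED_ENTS = [(0, 2, "PER"), (2, 3, "ORG"), (5, 6, "ORG")]
print(entity_f1(GOLD_ENTS, PRED_ENTS))  # precision, recall and F1 are each about 66.67
```

A right span with the wrong label earns no partial credit here. Under this definition, a boundary error and a type error cost the same.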

hu_core_ud_lg-0.2.0

01 Jun 21:47
Pre-release

Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse and lemmata.

| Feature | Description |
| --- | --- |
| Name | hu_core_ud_lg |
| Version | 0.2.0 |
| spaCy | >=2.1.0 |
| Model size | 1360 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia |
| License | CC BY-NC-SA 4.0 |

Pipeline details

| | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer |
| --- | --- | --- | --- | --- | --- | --- |
| Model | Word2Vec CBOW, dim=300, minfreq=10 | Rule-based (implemented in spaCy) | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CoNLL'17 training data | CoNLL'17 training data | UD-converted Szeged Corpus |
| Test data | Hungarian analogical questions | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data |
| Accuracy | ACC 20.95 | F1 99.89 | F1 96.97 | ACC 94.82 | UAS 78.02, LAS 67.92 | ACC 95.60 |
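The vectors are evaluated on Hungarian analogy questions of the form "a is to b as c is to ?". A standard way to answer these is 3CosAdd. It picks the vocabulary word whose unit vector is most similar to b - a + c, excluding the three query words. These notes do not state the exact method behind the ACC 20.95 score. The sketch below assumes 3CosAdd and uses made-up 3-dimensional toy vectors instead of the real 300-dimensional ones. `solve_analogy`, `analogy_accuracy` and `TOY_VECTORS` are hypothetical names.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def _unit(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def solve_analogy(vectors: Dict[str, Vector], a: str, b: str, c: str) -> Optional[str]:
    """3CosAdd: 'a is to b as c is to ?' -> word closest to b - a + c."""
    unit = {word: _unit(vec) for word, vec in vectors.items()}
    target = [xb - xa + xc for xa, xb, xc in zip(unit[a], unit[b], unit[c])]
    best_word, best_score = None, -math.inf
    for word, vec in unit.items():
        if word in (a, b, c):  # the query words themselves are excluded
            continue
        # The dot product with the unnormalized target ranks candidates exactly
        # as cosine similarity does, because the target's norm is shared by all.
        score = sum(x * y for x, y in zip(vec, target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(vectors: Dict[str, Vector], questions: Sequence[Tuple[str, str, str, str]]) -> float:
    """Percentage of (a, b, c, expected) questions answered exactly."""
    hits = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in questions)
    return 100.0 * hits / len(questions)


# Toy vectors (hypothetical); dimensions roughly "person", "royal", "female".
TOY_VECTORS = {
    "férfi": [1.0, 0.0, 0.0],     # man
    "nő": [1.0, 0.0, 1.0],        # woman
    "király": [1.0, 1.0, 0.0],    # king
    "királynő": [1.0, 1.0, 1.0],  # queen
    "alma": [0.1, 0.0, 0.9],      # apple
}
print(solve_analogy(TOY_VECTORS, "férfi", "király", "nő"))  # királynő
```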

hu_core_ud_lg-0.1.0

04 Jan 23:11
Pre-release

Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse and lemmata.

| Feature | Description |
| --- | --- |
| Name | hu_core_ud_lg |
| Version | 0.1.0 |
| spaCy | >=2.0.0 |
| Model size | 1350 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia |
| License | CC BY-NC-SA 4.0 |

Pipeline details

| | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer |
| --- | --- | --- | --- | --- | --- | --- |
| Model | Word2Vec CBOW, dim=300, minfreq=10 | Rule-based (implemented in spaCy) | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CoNLL'17 training data | CoNLL'17 training data | UD-converted Szeged Corpus |
| Test data | Hungarian analogical questions | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data |
| Accuracy | ACC 20.95 | F1 99.88 | F1 96.64 | ACC 95.11 | UAS 77.52, LAS 68.45 | ACC 95.60 |
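The lemmatizer is listed as Lemmy, a CST-like lemmatizer. The core idea behind this family of lemmatizers is to learn suffix-rewriting rules from (form, lemma) pairs. The sketch below is a heavily simplified, hypothetical illustration of that idea, not Lemmy's actual algorithm or API. It learns `(POS, form suffix) -> lemma suffix` rules and applies the longest matching one. `ToySuffixLemmatizer`, `suffix_rule` and the training pairs are all made up for illustration.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple


def suffix_rule(form: str, lemma: str) -> Tuple[str, str]:
    """After the longest common prefix: (suffix to strip from form, suffix to add)."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return form[i:], lemma[i:]


class ToySuffixLemmatizer:
    """Hypothetical, simplified suffix-rewriting lemmatizer (not Lemmy's API)."""

    def __init__(self) -> None:
        # (POS, form suffix) -> how often each lemma suffix was observed
        self.rules: DefaultDict[Tuple[str, str], Counter] = defaultdict(Counter)

    def fit(self, examples: Iterable[Tuple[str, str, str]]) -> "ToySuffixLemmatizer":
        for form, pos, lemma in examples:
            strip, add = suffix_rule(form.lower(), lemma.lower())
            self.rules[(pos, strip)][add] += 1
        return self

    def lemmatize(self, form: str, pos: str) -> str:
        form = form.lower()
        # cut=0 tries the whole word, so longer suffixes are tried first.
        for cut in range(len(form) + 1):
            counts = self.rules.get((pos, form[cut:]))
            if counts:
                add, _ = counts.most_common(1)[0]  # most frequent rewrite wins
                return form[:cut] + add
        return form  # no rule matched: fall back to the form itself


# Made-up Hungarian training pairs: (form, POS, lemma).
TRAINING = [
    ("házak", "NOUN", "ház"),    # houses -> house
    ("almák", "NOUN", "alma"),   # apples -> apple
    ("kertek", "NOUN", "kert"),  # gardens -> garden
]
lemmatizer = ToySuffixLemmatizer().fit(TRAINING)
print(lemmatizer.lemmatize("fák", "NOUN"))     # fa (trees -> tree)
print(lemmatizer.lemmatize("székek", "NOUN"))  # szék (chairs -> chair)
```

Conditioning rules on the POS tag keeps noun endings from being applied to verbs. Real CST-style systems also weigh longer contexts and exceptions, which this toy omits.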