
Releases: estnltk/estnltk

EstNLTK 1.7.2

11 Aug 13:29

Release of version 1.7.2 || Installation || Changelog || Tutorials

EstNLTK 1.7.1

01 Sep 11:10

Release of version 1.7.1 || Installation || Changelog || Tutorials

EstNLTK 1.7.0

17 Jun 13:47

Release of version 1.7.0 || Installation || Changelog || Tutorials

Estnltk 1.6.2beta

16 Apr 21:09

[1.6.2-beta] - 2018-04-16

Changed

  • Moved command line scripts for processing etTenTen and koondkorpus from estnltk/corpus_processing to corpus_processing;
  • The command line scripts for processing etTenTen and koondkorpus were reworked so that both now use the version 1.6 JSON format for storing intermediate results;
  • Restructured tutorials: basic_nlp_toolchain.ipynb was split into 7 separate tutorials and moved to tutorials/nlp_pipeline. Morphology and syntax-related tutorials were also moved to tutorials/nlp_pipeline;
  • Indexing of Text and Layer objects.
  • Banned equal spans in non-ambiguous layers.
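
The ban on equal spans can be pictured with a minimal sketch. SimpleLayer and add_span below are hypothetical names made up for illustration; they are not the EstNLTK Layer API, and the real check may work differently.

```python
class SimpleLayer:
    """Hypothetical stand-in for a layer; NOT the EstNLTK Layer class."""

    def __init__(self, name, ambiguous=False):
        self.name = name
        self.ambiguous = ambiguous
        self.spans = []  # list of (start, end, annotation) tuples

    def add_span(self, start, end, annotation):
        # A non-ambiguous layer may hold only one span per (start, end) pair.
        already_present = any((s, e) == (start, end) for s, e, _ in self.spans)
        if not self.ambiguous and already_present:
            raise ValueError(
                f"layer {self.name!r} is not ambiguous: "
                f"span ({start}, {end}) already exists"
            )
        self.spans.append((start, end, annotation))


# Non-ambiguous layer: the second span with the same boundaries is rejected.
words = SimpleLayer("words")
words.add_span(0, 4, {"note": "first"})
try:
    words.add_span(0, 4, {"note": "second"})
    rejected = False
except ValueError:
    rejected = True

# Ambiguous layer: equal spans are allowed to coexist.
morph = SimpleLayer("morph", ambiguous=True)
morph.add_span(0, 4, {"note": "first"})
morph.add_span(0, 4, {"note": "second"})
```

Raising an error at insertion time, rather than silently overwriting, is one possible design; it keeps the mistake visible to the code that produced the duplicate span.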

Added

  • Functionality to store and query text objects in the Postgres database.
  • Tagger AddressGrammarTagger to extract address information from text.
  • Tutorial demonstrating how to extract addresses from text using AddressGrammarTagger and store results in the Postgres database (tutorials/postgres/storing_text_objects_in_postgres.ipynb).
  • Module parse_koondkorpus.py, which can be used for loading texts from XML TEI files of the Estonian Reference Corpus as EstNLTK Text objects. The module was ported from the version 1.4.1.1 and improved upon. Improvements: default encoding is now 'utf-8', and there is a working option to preserve the original sentence and paragraph tokenization from the XML files;
  • Tutorial about loading XML TEI files with EstNLTK;
  • Added more helpful scripts for processing large corpora (a script for random selection and clean-up of files);
  • Added AdjectivePhraseTagger (ported from version 1.4.1.1);
  • DisambiguatingTagger to disambiguate ambiguous layers.
  • EnvelopingSpan to replace SpanList in enveloping layers.
  • Attribute lists to hold and represent attribute values extracted from layers.

Estnltk 1.6.1beta

27 Mar 11:35

Changed

  • Redesigned the Tagger base class. The deprecated TaggerOld remains in use for now.
  • Moved morphology-related modules from estnltk/taggers/ to estnltk/taggers/morph/;
  • Moved functions that convert between Vabamorf dicts and EstNLTK's Spans to estnltk/taggers/morph/morf_common.py;
  • Updated make_resolver: default parameters for morphological analysis are now taken from morf_common.py;
  • Updated SentenceTokenizer: base_sentence_tokenizer is now customizable (e.g. LineTokenizer can be used to split into sentences by newlines);
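
As a rough illustration of what splitting into sentences by newlines means, here is a standard-library sketch. The function split_sentences_by_newlines is a hypothetical helper, not part of EstNLTK, and it does not show how base_sentence_tokenizer is actually passed to SentenceTokenizer.

```python
def split_sentences_by_newlines(text):
    """Return (start, end) character spans of the non-empty lines in text.

    Hypothetical helper for illustration; not an EstNLTK function.
    """
    spans = []
    offset = 0
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped:
            # Skip leading whitespace so the span covers only visible text.
            start = offset + line.index(stripped)
            spans.append((start, start + len(stripped)))
        offset += len(line)
    return spans


text = "Tere!\nKuidas läheb?\n\nHästi.\n"
spans = split_sentences_by_newlines(text)
sentences = [text[start:end] for start, end in spans]
```

Returning character offsets instead of strings keeps the result compatible with span-based annotation, where every sentence must point back into the original text.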

Added

  • Finite grammar module and GrammarParsingTagger.
  • New taggers GapTagger, EnvelopingGapTagger, PhraseTagger, SpanTagger and vocabulary reading methods for PhraseTagger and SpanTagger.
  • Added command line scripts that can be used for processing etTenTen and Koondkorpus;
  • Added JavaProcess (ported from version 1.4.1.1);
  • Added ClauseSegmenter (ported from version 1.4.1.1). Layer 'clauses' can now be added to the Text object. Note: this adds a Java dependency to EstNLTK: Java SE Runtime Environment (version >= 1.8) must be installed on the system and available via the PATH environment variable;
  • Added UserDictTagger, which can be used to provide dictionary-based post-corrections to morphological analyses;
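
The idea of dictionary-based post-corrections can be sketched as follows. This is only an illustration under assumptions: apply_user_dict, the field names, and the sample words and tags are all invented here and do not reflect the UserDictTagger interface or real Vabamorf output.

```python
def apply_user_dict(analyses, user_dict):
    """Return a corrected copy of analyses; the inputs are not modified.

    analyses:  list of dicts, each with at least a 'text' key (assumed shape).
    user_dict: maps a lowercased word form to the fields that should be
               overwritten (assumed shape).
    """
    corrected = []
    for analysis in analyses:
        overrides = user_dict.get(analysis["text"].lower(), {})
        # Fields from the user dictionary take precedence over the originals.
        corrected.append({**analysis, **overrides})
    return corrected


# Invented sample data, for illustration only.
analyses = [
    {"text": "pxl", "lemma": "pxl", "partofspeech": "Y"},
    {"text": "on", "lemma": "olema", "partofspeech": "V"},
]
user_dict = {"pxl": {"lemma": "piksel", "partofspeech": "S"}}

corrected = apply_user_dict(analyses, user_dict)
```

Words missing from the dictionary pass through unchanged, so the correction step can be appended to an existing pipeline without affecting other analyses.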

Fixed

  • Bugfix in PostMorphAnalysisTagger: postcorrections are no longer applied to empty spans;
  • Bugfix in VabamorfTagger: layer_name can now be changed without running into errors;
  • Fix in GTMorphConverter: added the missing disambiguation step. Clause annotations are now used to resolve the ambiguities related to conversion of sid, ksid, nuksid forms;
  • SyntaxIgnoreTagger: improved detection of parenthesized acronyms;
  • CompoundTokenTagger: improved detection of numbers with percentages;

Estnltk 1.6.0beta

26 Mar 10:40
updated .travis.yml

Estnltk 1.4.1.1

06 Dec 18:42

Changed

  • Removed estner/estner.json file
  • Removed unnecessary resource /maltparser/estnltkBasedDep2.mco

Fixed

  • Fixed an encoding bug in event_tagger when running tests on Windows;

Estnltk 1.4.1

23 Nov 13:30

Added

  • Improved NER performance using __slots__ in estner data model;
  • Added sent_tokenizer_for_koond.py: a sentence tokenizer for processing 'koondkorpus' text files (as found in http://ats.cs.ut.ee/keeletehnoloogia/estnltk/koond.zip), which provides several post-processing fixes to known sentence-splitting problems;
  • Updated 'koondkorpus' processing scripts teicorpus.py and convert_koondkorpus.py: added the option to specify the encoding of the input files;
  • Added terminalprettyprinter.py module, which provides a pretty-printer method that can be used for graphically formatting annotated texts in terminal;
  • Added gt_conversion.py module that can be used for converting morphological analysis categories from Vabamorf's format to the Giellatekno's (gt) format;
  • Added basic support for syllable extraction
  • Added EventTagger, KeywordTagger and RegexTagger and fixed basic Tagger API for creating new layers;
  • Added adjective phrase tagger (marks fragments such as "väga hea" and "küllalt tore")

Changed

  • Updated Temporal expression tagger's and Clause segmenter's jar files to Java version 1.8;
  • A major change: re-implementation of syntactic parsing interface:
    • pre-processing scripts of the VISLCG3-based syntactic analyser were rewritten in Python to ensure platform-independent processing;
    • "estnltk.syntax.tagger.SyntaxTagger" was reimplemented in two modules ("SyntaxPreprocessing" and "VISLCG3Pipeline"), and the modules were made available as a common pipeline in "estnltk.syntax.parsers.VISLCG3Parser";
    • added a possibility to use custom rules in VISLCG3Parser, or to load rules from a custom location;
    • updated MaltParser's model so that surface-syntactic labels are now also generated during the parsing;
    • moved MaltParser-based syntactic analysis and VISLCG3-based syntactic analysis to a common interface; both parsers are now available in the module "estnltk.syntax.parsers";
    • changed how syntactic information is stored in Text: syntactic analyses are now attached in a separate layer (and different layers are created for MaltParser's analyses and VISLCG3's analyses);
    • added "estnltk.syntax.utils.Tree", which provides an interface for making queries over a syntactic tree, and allows exporting syntactic analyses as nltk's DependencyGraphs and Trees;
    • added methods for importing syntactically analysed Texts from CG3 and CONLL format files;
  • Improved NounPhraseChunker: made it compatible with the new interface of syntactic parsing;
  • Converted tutorials to jupyter notebooks to make them runnable and testable;
  • Tested and validated tutorials;
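
To give a feel for what importing from CONLL format files involves, here is a minimal reader for the ten-column CoNLL-X layout. It is a sketch, not the EstNLTK import method: read_conll is a hypothetical name, and the sample sentence and its labels are simplified for illustration.

```python
# Column names of the CoNLL-X format, in order.
CONLL_FIELDS = (
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
)


def read_conll(lines):
    """Group tab-separated token lines into sentences (lists of dicts).

    Hypothetical helper; blank lines are treated as sentence separators.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        token = dict(zip(CONLL_FIELDS, line.split("\t")))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])  # 0 marks the root of the sentence
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


# Simplified, illustrative input; lemma and tag columns are left unspecified.
sample = (
    "1\tTa\t_\t_\t_\t_\t2\t@SUBJ\t_\t_\n"
    "2\tmagab\t_\t_\t_\t_\t0\tROOT\t_\t_\n"
    "\n"
    "1\tTere\t_\t_\t_\t_\t0\tROOT\t_\t_\n"
)
sentences = read_conll(sample.splitlines())
roots = [tok["form"] for sent in sentences for tok in sent if tok["head"] == 0]
```

The head column is what turns the flat token list into a dependency tree: each token points to the id of its governor within the same sentence.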

Fixed

  • Fixed a bug in the NER feature extraction module with Python 3.4;
  • Fix in MaltParser's interface: temporary files are now kept in the system-specific temporary files directory (to avoid permission errors);
  • Updated Temporal expression tagger:
    • fixed a TIMEX normalization bug: verb tense information is now properly used;
    • improved TIMEX extraction: re-implemented phrase level joining to provide more accurate extraction of long phrases;
  • Fixed OS X installs;
  • Updated Vabamorf to fix #55;
  • Fixed too restrictive package dependencies;
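
The MaltParser fix above relies on a general technique: writing scratch files to the system temporary directory instead of the package directory, which may be read-only. A standard-library sketch follows; write_parser_input is a hypothetical helper and the file contents are invented, so this does not show how EstNLTK's MaltParser interface is implemented.

```python
import os
import tempfile


def write_parser_input(conll_text):
    """Write text to a new file in the system temp directory; return its path.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".conll", prefix="parser_input_", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(conll_text)
    return path


path = write_parser_input("1\tTere\t_\n")
try:
    # The file lives under the directory reported by tempfile.gettempdir().
    in_temp_dir = os.path.realpath(os.path.dirname(path)) == os.path.realpath(
        tempfile.gettempdir()
    )
    with open(path, encoding="utf-8") as handle:
        content = handle.read()
finally:
    os.remove(path)  # clean up so no stray files are left behind
```

tempfile.mkstemp creates the file with a unique name and restrictive permissions, which also avoids collisions when several processes run the parser at once.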

Release 1.4.0

26 Apr 14:27
fixed long description

Release 1.3.0

08 Jan 11:15
Release 1.3.0