Moved command line scripts for processing etTenTen and koondkorpus from estnltk/corpus_processing to corpus_processing;
The command line scripts for processing etTenTen and koondkorpus were reworked so that both store intermediate results in the version 1.6 JSON format;
Restructured tutorials: basic_nlp_toolchain.ipynb was split into 7 separate tutorials and moved to tutorials/nlp_pipeline. Morphology and syntax-related tutorials were also moved to tutorials/nlp_pipeline;
Indexing of Text and Layer objects.
Banned equal spans in non-ambiguous layers.
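
The ban on equal spans can be pictured with a minimal sketch. SimpleLayer below is a hypothetical stand-in, not EstNLTK's Layer class; it only assumes that a span is a (start, end) pair and that a layer knows whether it is ambiguous.

```python
class SimpleLayer:
    """Hypothetical stand-in for a layer; not EstNLTK's Layer class."""

    def __init__(self, name, ambiguous=False):
        self.name = name
        self.ambiguous = ambiguous
        self.spans = []  # list of (start, end) tuples, in insertion order

    def add_span(self, start, end):
        span = (start, end)
        # A non-ambiguous layer may hold each (start, end) position only once.
        if not self.ambiguous and span in self.spans:
            raise ValueError(
                f"layer {self.name!r} is not ambiguous: span {span} already exists"
            )
        self.spans.append(span)


words = SimpleLayer("words")
words.add_span(0, 4)
try:
    words.add_span(0, 4)  # same position again: rejected
    duplicate_accepted = True
except ValueError:
    duplicate_accepted = False

morph = SimpleLayer("morph_analysis", ambiguous=True)
morph.add_span(0, 4)
morph.add_span(0, 4)  # allowed: an ambiguous layer can hold alternatives
```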

Added

Functionality to store and query text objects in the Postgres database.
Tagger AddressGrammarTagger to extract address information from text.
Tutorial demonstrating how to extract addresses from text using AddressGrammarTagger and store results in the Postgres database (tutorials/postgres/storing_text_objects_in_postgres.ipynb).
Module parse_koondkorpus.py, which can be used for loading texts from XML TEI files of the Estonian Reference Corpus as EstNLTK Text objects. The module was ported from the version 1.4.1.1 and improved upon. Improvements: default encoding is now 'utf-8', and there is a working option to preserve the original sentence and paragraph tokenization from the XML files;
Tutorial about loading XML TEI files with EstNLTK;
Added more helpful scripts for processing large corpora (a script for random selection and clean-up of files);
Added AdjectivePhraseTagger (ported from version 1.4.1.1);
DisambiguatingTagger to disambiguate ambiguous layers.
EnvelopingSpan to replace SpanList in enveloping layers.
Attribute lists to hold and represent attribute values extracted from layers.
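
As an illustration of loading text from XML while preserving the original sentence and paragraph tokenization, here is a rough sketch using only the standard library. It does not use parse_koondkorpus.py; the function name and the sample fragment are made up, and the assumption that paragraphs are `p` elements and sentences are `s` elements may not hold for every corpus file.

```python
import xml.etree.ElementTree as ET

# Made-up TEI-like fragment; real corpus files are larger and may differ.
SAMPLE = """<text><body>
<p><s>Tere hommikust!</s> <s>Kuidas läheb?</s></p>
<p><s>Hästi.</s></p>
</body></text>"""


def load_paragraphs(xml_bytes):
    """Return a list of paragraphs, each a list of sentence strings."""
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter("p"):
        sentences = ["".join(s.itertext()).strip() for s in p.iter("s")]
        paragraphs.append(sentences)
    return paragraphs


# Pass UTF-8 bytes, mirroring the 'utf-8' default mentioned above.
paragraphs = load_paragraphs(SAMPLE.encode("utf-8"))
# Join sentences with spaces and paragraphs with blank lines.
plain_text = "\n\n".join(" ".join(sents) for sents in paragraphs)
```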

27 Mar 11:35

AleksTk

1.6.1beta

96662f9

Estnltk 1.6.1beta

Changed

Redesigned the Tagger base class. The deprecated TaggerOld is still in use for now.
Moved morphology-related modules from estnltk/taggers/ to estnltk/taggers/morph/;
Moved functions that convert between Vabamorf dicts and EstNLTK's Spans to estnltk/taggers/morph/morf_common.py;
Updated make_resolver: default parameters for morphological analysis are now taken from morf_common.py;
Updated SentenceTokenizer: base_sentence_tokenizer is now customizable (e.g. LineTokenizer can be used to split into sentences by newlines);
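
The idea of a customizable base sentence tokenizer can be sketched as follows. This is not EstNLTK's SentenceTokenizer: tokenize_sentences and both splitter functions are hypothetical stand-ins, and the only assumption taken from the note above is that the initial split is delegated to a replaceable component.

```python
import re


def split_on_newlines(text):
    """Return (start, end) spans of non-empty lines; a stand-in base tokenizer."""
    return [m.span() for m in re.finditer(r"[^\n]+", text)]


def split_on_punctuation(text):
    """Return spans ending at '.', '!' or '?'; another stand-in base tokenizer."""
    return [m.span() for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]+", text)]


def tokenize_sentences(text, base_sentence_tokenizer=split_on_punctuation):
    """Hypothetical wrapper: delegates the initial split to a pluggable callable."""
    return [text[start:end] for start, end in base_sentence_tokenizer(text)]


sample = "Esimene rida\nTeine rida. Ikka teine.\nKolmas rida."
by_punctuation = tokenize_sentences(sample)
by_lines = tokenize_sentences(sample, base_sentence_tokenizer=split_on_newlines)
```

With the newline-based splitter, each line becomes one sentence even when it contains sentence-final punctuation in the middle.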

Added

Finite grammar module and GrammarParsingTagger.
New taggers GapTagger, EnvelopingGapTagger, PhraseTagger, SpanTagger and vocabulary reading methods for PhraseTagger and SpanTagger.
Added command line scripts that can be used for processing etTenTen and Koondkorpus;
Added JavaProcess (ported from version 1.4.1.1);
Added ClauseSegmenter (ported from version 1.4.1.1). Layer 'clauses' can now be added to the Text object. Note: this adds a Java dependency to EstNLTK: Java SE Runtime Environment (version >= 1.8) must be installed on the system and available from the PATH environment variable;
Added UserDictTagger, which can be used to provide dictionary-based post-corrections to morphological analyses;
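
Dictionary-based post-correction can be pictured roughly like this. The sketch does not use UserDictTagger; apply_user_dict, the shape of the analysis records and the correction itself are all assumptions made for illustration.

```python
def apply_user_dict(analyses, user_dict):
    """Return a corrected copy of the analyses; the input is left unchanged.

    analyses  -- list of dicts with at least a 'text' key (hypothetical format)
    user_dict -- maps a lowercased word form to the fields to override
    """
    corrected = []
    for analysis in analyses:
        fix = user_dict.get(analysis["text"].lower(), {})
        # Fields from the user dictionary win over the automatic analysis.
        corrected.append({**analysis, **fix})
    return corrected


analyses = [
    {"text": "Pariisi", "lemma": "Pariis", "partofspeech": "S"},
    {"text": "linn", "lemma": "linn", "partofspeech": "S"},
]
# Hypothetical correction: override the part of speech of one word form.
user_dict = {"pariisi": {"partofspeech": "H"}}
corrected = apply_user_dict(analyses, user_dict)
```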

Fixed

Bugfix in PostMorphAnalysisTagger: postcorrections are no longer applied to empty spans;
Bugfix in VabamorfTagger: layer_name can now be changed without running into errors;
Fix in GTMorphConverter: added the missing disambiguation step. Clause annotations are now used to resolve the ambiguities related to conversion of sid, ksid, nuksid forms;
SyntaxIgnoreTagger: improved detection of parenthesized acronyms;
CompoundTokenTagger: improved detection of numbers with percentages;

26 Mar 10:40

AleksTk

1.6.0beta

5dc0c27

Estnltk 1.6.0beta

updated .travis.yml

06 Dec 18:42

AleksTk

1.4.1.1

eef59a7

Estnltk 1.4.1.1

Changed

Removed estner/estner.json file
Removed unnecessary resource /maltparser/estnltkBasedDep2.mco

Fixed

Fixed an encoding bug in event_tagger when running tests on Windows;
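
The release note does not say what the actual fix was. As general background, encoding bugs of this kind often appear when UTF-8 data is decoded with a platform default such as cp1252; the sketch below reproduces that mismatch with a made-up string and shows that naming the encoding explicitly avoids it.

```python
text = "jõuluõhtu"
raw = text.encode("utf-8")

# Decoding UTF-8 bytes with a Windows code page garbles non-ASCII letters.
wrong = raw.decode("cp1252")

# Naming the encoding explicitly restores the original text.
right = raw.decode("utf-8")
```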

23 Nov 13:30

urdvr

1.4.1

8cbb90c

Estnltk 1.4.1

Added

Improved NER performance using __slots__ in estner data model;
Added sent_tokenizer_for_koond.py: a sentence tokenizer for processing 'koondkorpus' text files (as found in http://ats.cs.ut.ee/keeletehnoloogia/estnltk/koond.zip), which provides several post-processing fixes for known sentence-splitting problems;
Updated 'koondkorpus' processing scripts teicorpus.py and convert_koondkorpus.py: added the option to specify the encoding of the input files;
Added terminalprettyprinter.py module, which provides a pretty-printer method that can be used for graphically formatting annotated texts in terminal;
Added gt_conversion.py module that can be used for converting morphological analysis categories from Vabamorf's format to the Giellatekno's (gt) format;
Added basic support for syllable extraction
Added EventTagger, KeywordTagger and RegexTagger and fixed basic Tagger API for creating new layers;
Added adjective phrase tagger (marks fragments such as "väga hea" and "küllalt tore")
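
For readers unfamiliar with `__slots__`, the following sketch shows the general Python mechanism mentioned in the NER item above. The two classes are made up and are not part of the estner data model.

```python
class TokenPlain:
    def __init__(self, word, label):
        self.word = word
        self.label = label


class TokenSlots:
    # Declaring __slots__ removes the per-instance __dict__.
    __slots__ = ("word", "label")

    def __init__(self, word, label):
        self.word = word
        self.label = label


plain = TokenPlain("Tartu", "LOC")
slotted = TokenSlots("Tartu", "LOC")

has_dict_plain = hasattr(plain, "__dict__")      # True
has_dict_slotted = hasattr(slotted, "__dict__")  # False

try:
    slotted.extra = 1  # attributes outside __slots__ are rejected
    extra_allowed = True
except AttributeError:
    extra_allowed = False
```

Dropping the per-instance dictionary reduces memory use when many small objects are created, which is the usual reason for this technique.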

Changed

Updated Temporal expression tagger's and Clause segmenter's jar files to Java version 1.8;
A major change: re-implementation of syntactic parsing interface:
- pre-processing scripts of the VISLCG3-based syntactic analyser were rewritten in Python to ensure platform-independent processing;
- "estnltk.syntax.tagger.SyntaxTagger" was reimplemented in two modules ("SyntaxPreprocessing" and "VISLCG3Pipeline"), and the modules were made available as a common pipeline in "estnltk.syntax.parsers.VISLCG3Parser";
- added a possibility to use custom rules in VISLCG3Parser, or to load rules from a custom location;
- updated MaltParser's model so that surface-syntactic labels are now also generated during the parsing;
- moved MaltParser-based syntactic analysis and VISLCG3-based syntactic analysis to a common interface; both parsers are now available in the module "estnltk.syntax.parsers";
- changed how syntactic information is stored in Text: syntactic analyses are now attached in a separate layer (and different layers are created for MaltParser's analyses and VISLCG3's analyses);
- added "estnltk.syntax.utils.Tree", which provides an interface for making queries over a syntactic tree and allows exporting syntactic analyses as nltk's DependencyGraphs and Trees;
- added methods for importing syntactically analysed Texts from CG3 and CONLL format files;
Improved NounPhraseChunker: made it compatible with the new interface of syntactic parsing;
Converted tutorials to jupyter notebooks to make them runnable and testable;
Tested and validated tutorials;
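
To give a feel for importing a CONLL-format analysis and querying the resulting tree, here is a standalone sketch. It does not use estnltk.syntax.utils.Tree or EstNLTK's import methods; the function names and the sample analysis are made up, and it assumes the CoNLL-X column layout (ID in column 1, FORM in column 2, HEAD in column 7, DEPREL in column 8).

```python
from collections import defaultdict

# Made-up CoNLL-X style analysis (10 tab-separated columns per token).
CONLL = "\n".join([
    "1\tKoer\tkoer\tS\tS\t_\t2\t@SUBJ\t_\t_",
    "2\thaugub\thaukuma\tV\tV\t_\t0\tROOT\t_\t_",
    "3\tvaljult\tvaljult\tD\tD\t_\t2\t@ADVL\t_\t_",
])


def read_conll(block):
    """Parse one sentence into a list of token dicts."""
    tokens = []
    for line in block.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


def children_index(tokens):
    """Map each head id to the ids of its direct dependents."""
    index = defaultdict(list)
    for tok in tokens:
        index[tok["head"]].append(tok["id"])
    return index


tokens = read_conll(CONLL)
children = children_index(tokens)
root_id = children[0][0]                 # the token whose head is 0
root_form = tokens[root_id - 1]["form"]
subjects = [t["form"] for t in tokens if t["deprel"] == "@SUBJ"]
```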

Fixed

Fixed a bug in the NER feature extraction module under Python 3.4;
Fix in MaltParser's interface: temporary files are now maintained in system specific temp files dir (to avoid permission errors);
Updated Temporal expression tagger:
- fixed a TIMEX normalization bug: verb tense information is now properly used;
- improved TIMEX extraction: re-implemented phrase level joining to provide more accurate extraction of long phrases;
Fixed OS X installs;
Updated Vabamorf to fix #55;
Fixed too restrictive package dependencies;
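
Keeping temporary files in the system temp directory, as in the MaltParser fix above, can be done with Python's tempfile module. The sketch below is only an illustration of that approach, not the code used in EstNLTK; run_with_temp_input is a hypothetical helper and the "processing" step is a placeholder.

```python
import os
import tempfile


def run_with_temp_input(text):
    """Write text to a temp file, 'process' it, and clean up afterwards."""
    # Without an explicit dir argument, the file is created in the
    # system-specific temp directory (see tempfile.gettempdir()).
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".conll", encoding="utf-8", delete=False
    ) as tmp:
        tmp.write(text)
        path = tmp.name
    try:
        # Placeholder for calling an external tool on `path`.
        with open(path, encoding="utf-8") as f:
            result = f.read().upper()
    finally:
        os.remove(path)
    return path, result


path, result = run_with_temp_input("koer haugub")
```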

26 Apr 14:27

AleksTk

1.4.0

0e9d0b8

Release 1.4.0

fixed long description

08 Jan 11:15

urdvr

1.3.0

d0957cc

Release 1.3.0

Release 1.3.0

[1.6.2-beta] - 2018-04-16
