Fixed loading error for customized pipelines and added a function for converting trankit outputs to CoNLL-U format

@minhhdvn minhhdvn released this 19 Jun 22:52
· 24 commits to master since this release
  • Issue #17, an error when loading customized pipelines, has been fixed in this release. Please check it out here.
  • Trankit now supports converting its JSON-format outputs to CoNLL-U format. The conversion is done by the new function trankit2conllu, which can be used as follows:
from trankit import Pipeline, trankit2conllu

p = Pipeline('english')

# document level
json_doc = p('''Hello! This is Trankit.''')
conllu_doc = trankit2conllu(json_doc)
print(conllu_doc)
#1       Hello   hello   INTJ    UH      _       0       root    _       _
#2       !       !       PUNCT   .       _       1       punct   _       _
#
#1       This    this    PRON    DT      Number=Sing|PronType=Dem        3       nsubj   _       _
#2       is      be      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   3       cop     _       _
#3       Trankit Trankit PROPN   NNP     Number=Sing     0       root    _       _
#4       .       .       PUNCT   .       _       3       punct   _       _

# sentence level
json_sent = p('''This is Trankit.''', is_sent=True)
conllu_sent = trankit2conllu(json_sent)
print(conllu_sent)
#1       This    this    PRON    DT      Number=Sing|PronType=Dem        3       nsubj   _       _
#2       is      be      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   3       cop     _       _
#3       Trankit Trankit PROPN   NNP     Number=Sing     0       root    _       _
#4       .       .       PUNCT   .       _       3       punct   _       _
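
To work with the converted text downstream, it can help to split it back into per-token fields. The sketch below is not part of trankit: `parse_conllu` and `CONLLU_FIELDS` are hypothetical names introduced here for illustration. It assumes the string returned by trankit2conllu follows the standard CoNLL-U layout, that is, ten tab-separated columns per token line and a blank line between sentences. The sample string is copied from the document-level output shown above, so the snippet runs without loading a pipeline.

```python
# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def parse_conllu(text):
    """Hypothetical helper: split CoNLL-U text into sentences of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Skip CoNLL-U comment lines such as "# text = ...".
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Sample copied from the output above; tabs are assumed as the column separator.
sample = (
    "1\tHello\thello\tINTJ\tUH\t_\t0\troot\t_\t_\n"
    "2\t!\t!\tPUNCT\t.\t_\t1\tpunct\t_\t_\n"
    "\n"
    "1\tThis\tthis\tPRON\tDT\tNumber=Sing|PronType=Dem\t3\tnsubj\t_\t_\n"
    "2\tis\tbe\tAUX\tVBZ\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t3\tcop\t_\t_\n"
    "3\tTrankit\tTrankit\tPROPN\tNNP\tNumber=Sing\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
)

sentences = parse_conllu(sample)
for sent in sentences:
    print([(tok["form"], tok["upos"], tok["deprel"]) for tok in sent])
```

In real use, `sample` would be replaced by the value returned from `trankit2conllu(json_doc)`. All fields are kept as strings, so `head` and `id` need an explicit `int(...)` conversion before any arithmetic.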