mukatee/little-pos-tagger

little-pos-tagger

A simple part-of-speech (POS) tagger in Java, ported from a Python version explained and linked here.

I wrote this to get a better understanding of POS taggers and to build one I could use with some of my own Java code.

There are also several more mature POS taggers for Java, such as the OpenNLP and Stanford ones.

This one is a bit simpler, and the Python link above has a useful explanation you can follow, so it may be an easier way to get an idea of at least the basic concepts.

I originally tried this with the Finnish language and used the FinnTreeBank data to train the tagger. However, any language with a similarly annotated dataset should probably work.

There is a Python script in the source tree that was used to convert the FinnTreeBank into the format this tagger reads. Another Python script there takes the same data and outputs a format suitable for OpenNLP, so you can try the different approaches if you like, and use the scripts as a basis for handling other treebanks. I couldn't quite figure out a good configuration for the Stanford tagger, but it should be able to take one of the above inputs as well if you can create the config.
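
To give a rough idea of what such a conversion produces, here is a minimal Java sketch of the output side only. It assumes the OpenNLP POS training format of one sentence per line with each token written as `token_tag`. The class `OpenNlpLineFormat`, its `toLine` method, and the example tags are hypothetical and only for illustration; they are not part of this repository, and the actual conversion scripts here are written in Python.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of this repository.
class OpenNlpLineFormat {

    /** Joins parallel token and tag lists into one "token_tag token_tag ..." line. */
    static String toLine(List<String> tokens, List<String> tags) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringJoiner line = new StringJoiner(" ");
        for (int i = 0; i < tokens.size(); i++) {
            // Each token is paired with its tag using an underscore.
            line.add(tokens.get(i) + "_" + tags.get(i));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // The tags below are made-up placeholders, not FinnTreeBank tags.
        List<String> tokens = List.of("the", "dog", "barks");
        List<String> tags = List.of("DET", "NOUN", "VERB");
        System.out.println(toLine(tokens, tags));
    }
}
```

A real converter would also need to parse the treebank's own file layout and decide what to do with tokens that contain underscores or spaces, which this sketch ignores.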

There are examples in the examples package showing how to train the tagger and how to use it for predictions.
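
For readers who just want to see the general train-then-predict shape before opening the examples package, here is a toy sketch. It is not this project's API or algorithm: `BaselineTagger` is a hypothetical most-frequent-tag baseline that simply remembers which tag each word had most often in training, and falls back to the overall most common tag for unseen words. The real tagger in this repository follows the approach described in the Python write-up linked above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical most-frequent-tag baseline; NOT the tagger implemented in this repository.
class BaselineTagger {
    // word -> (tag -> how often the word carried that tag in training)
    private final Map<String, Map<String, Integer>> wordTagCounts = new HashMap<>();
    // tag -> how often the tag occurred overall, used for unseen words
    private final Map<String, Integer> tagTotals = new HashMap<>();

    /** Adds one tagged sentence, given as parallel lists, to the counts. */
    void train(List<String> tokens, List<String> tags) {
        for (int i = 0; i < tokens.size(); i++) {
            wordTagCounts.computeIfAbsent(tokens.get(i), k -> new HashMap<>())
                         .merge(tags.get(i), 1, Integer::sum);
            tagTotals.merge(tags.get(i), 1, Integer::sum);
        }
    }

    /** Predicts one tag per token; unseen words get the overall most common tag. */
    List<String> predict(List<String> tokens) {
        String fallback = mostFrequent(tagTotals);
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            Map<String, Integer> counts = wordTagCounts.get(token);
            result.add(counts == null ? fallback : mostFrequent(counts));
        }
        return result;
    }

    // Returns the tag with the highest count; ties go to the alphabetically
    // smaller tag so the result does not depend on map iteration order.
    // Returns null if the map is empty (that is, nothing has been trained yet).
    private static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int c = e.getValue();
            if (c > bestCount || (c == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BaselineTagger tagger = new BaselineTagger();
        // Made-up training data with placeholder tags.
        tagger.train(List.of("the", "dog", "barks"), List.of("DET", "NOUN", "VERB"));
        tagger.train(List.of("the", "cat"), List.of("DET", "NOUN"));
        System.out.println(tagger.predict(List.of("the", "cat", "barks")));
    }
}
```

A baseline like this ignores context entirely, which is the main thing a proper tagger adds, but the two-step flow of feeding in tagged sentences and then asking for tags on new tokens is the same idea.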

The process of building this and trying to figure out what it is all about is explained in more detail here.
