A robust and easy-to-use toolkit for part-of-speech (POS) tagging in NLP. Its approach is to automatically construct tagging rules in the form of a binary tree. Supports pre-trained UPOS and XPOS tagging models for about 80 languages.

swelcker/cmd.csp.postagger

cmd.csp.postagger

License: MIT

A robust and easy-to-use toolkit for part-of-speech (POS) tagging in NLP. Its approach is to automatically construct tagging rules in the form of a binary tree. Supports pre-trained UPOS and XPOS tagging models for about 80 languages. See the Models folder for more details. Used in the Cognitive Service Platform cmd.csp.
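To make the "rules as a binary tree" idea concrete, here is a minimal sketch of how such a tree might be evaluated. It assumes a ripple-down-rules layout in which every node has an "except" child (tried when the node's condition holds) and an "if-not" child (tried when it fails), and the last node whose condition held is the one that fires. All class, field, and method names below are hypothetical and chosen for illustration; they are not the classes shipped in this library, and the real rule conditions are read from the model files.

```java
import java.util.function.Predicate;

/** Illustrative only: the names here are hypothetical, not the library's classes. */
final class RuleTreeSketch {

    /** What a rule may inspect: the word, its initial tag, and the previous word's tag. */
    static final class Context {
        final String word;
        final String initialTag;
        final String previousTag;

        Context(String word, String initialTag, String previousTag) {
            this.word = word;
            this.initialTag = initialTag;
            this.previousTag = previousTag;
        }
    }

    /** One rule: a condition, a conclusion (tag), and two child branches. */
    static final class Node {
        final Predicate<Context> condition;
        final String conclusion; // null means "keep the initial tag" (assumption of this sketch)
        Node exceptChild;        // tried next when this node's condition holds
        Node ifNotChild;         // tried next when this node's condition fails

        Node(Predicate<Context> condition, String conclusion) {
            this.condition = condition;
            this.conclusion = conclusion;
        }
    }

    /** Walks the tree and returns the last node whose condition was satisfied. */
    static Node findFiredNode(Node root, Context context) {
        Node fired = null;
        Node current = root;
        while (current != null) {
            if (current.condition.test(context)) {
                fired = current;
                current = current.exceptChild;
            } else {
                current = current.ifNotChild;
            }
        }
        return fired;
    }

    /** Returns the fired node's conclusion, or the initial tag if there is none. */
    static String tag(Node root, Context context) {
        Node fired = findFiredNode(root, context);
        return (fired == null || fired.conclusion == null) ? context.initialTag : fired.conclusion;
    }

    /** A tiny hand-written tree; real trees are learned and loaded from rule files. */
    static Node exampleTree() {
        Node root = new Node(c -> true, null); // default rule, always satisfied
        Node noun = new Node(c -> "NN".equals(c.initialTag), "NN");
        Node verbAfterTo = new Node(c -> "TO".equals(c.previousTag), "VB");
        Node verb = new Node(c -> "VB".equals(c.initialTag), "VB");
        Node nounAfterDeterminer = new Node(c -> "DT".equals(c.previousTag), "NN");
        root.exceptChild = noun;
        noun.exceptChild = verbAfterTo;
        noun.ifNotChild = verb;
        verb.exceptChild = nounAfterDeterminer;
        return root;
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        System.out.println(tag(root, new Context("run", "NN", "TO"))); // prints VB
        System.out.println(tag(root, new Context("dog", "NN", "DT"))); // prints NN
    }
}
```

In this sketch a word that no specific rule covers simply keeps its initial tag, because only the always-true root node fires for it.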

Prerequisites

There are no prerequisites or dependencies other than the Java core libraries.

Installing/Usage

To use, merge the following into your Maven POM (or the equivalent into your Gradle build script):

<repository>
  <id>github</id>
  <name>GitHub swelcker Apache Maven Packages</name>
  <url>https://maven.pkg.github.com/swelcker</url>
</repository>

<dependency>
  <groupId>cmd.csp</groupId>
  <artifactId>csppostagger</artifactId>
  <version>1.0.0</version>
</dependency>

Then, `import cmd.csp.postagger.*;` in your application:

// Example
import java.util.HashMap;

import cmd.csp.postagger.*;

private CSPPOSTagger posTagger = new CSPPOSTagger();
private HashMap<String, String> FREQDICT = null;

// Build the rule tree from the rules file for the sentence language
posTagger.constructTreeFromLanguage(senLanguage);
// Load the frequency dictionary for the same language
FREQDICT = utl.getDictionaryByLanguage(senLanguage);
...

... = posTagger.tagSentence(FREQDICT, "your string or sentence");

or
  // Assign an initial tag to every word of the sentence `sen`
  wordtags = CSPPOSInitialTagger.InitTagger4Sentence(FREQDICT, sen);

  int size = wordtags.size();
  wt = new String[size];
  for (int ti = 0; ti < size; ti++) {
    String word = wordtags.get(ti).word;
    // Count how often each word occurs
    tokenizer.BagOfTags.put(word, tokenizer.BagOfTags.getInteger(word, 0) + 1);
    // Build the context object for position ti and find the rule that fires for it
    CSPPOSFWObject object = Utils.getObject(wordtags, size, ti);
    CSPPOSNode firedNode = posTagger.findFiredNode(object);
    // Record the conclusion of the fired rule as the tag for this word
    maptags.put(word, firedNode.conclusion);
    tokenizer.WordTagList.put(word, firedNode.conclusion);
    wt[ti] = word;
  }
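Both call styles above pass FREQDICT, a word-to-tag dictionary, to the tagger before any rules are applied. The sketch below shows one plausible way a dictionary-based initial tagging step could work: look each token up, retry in lower case, and fall back to a default tag. This is an assumption made for illustration, not the actual behaviour of CSPPOSInitialTagger; the class name, the fallback tags, and the whitespace tokenization are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a hypothetical dictionary-based initial tagger. */
final class InitialTaggerSketch {

    // Hypothetical fallback tags; a real model would define its own defaults.
    static final String UNKNOWN_TAG = "NOUN";
    static final String NUMBER_TAG = "NUM";

    /** Returns {word, tag} pairs for a whitespace-separated sentence. */
    static List<String[]> initialTags(Map<String, String> freqDict, String sentence) {
        List<String[]> result = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty sentence yields a single empty token
            }
            String tag = freqDict.get(token);
            if (tag == null) {
                tag = freqDict.get(token.toLowerCase()); // retry case-insensitively
            }
            if (tag == null) {
                // Unknown word: tokens containing a digit get the number tag
                tag = token.matches(".*\\d.*") ? NUMBER_TAG : UNKNOWN_TAG;
            }
            result.add(new String[] {token, tag});
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> freqDict = new HashMap<>();
        freqDict.put("the", "DET");
        freqDict.put("dog", "NOUN");
        freqDict.put("barks", "VERB");
        for (String[] pair : initialTags(freqDict, "The dog barks 3 times")) {
            System.out.println(pair[0] + "/" + pair[1]);
        }
    }
}
```

The initial tags produced this way are only a starting point; in the loop above, the rule tree then decides the tag that is stored for each word.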

Built With

  • Maven - Dependency Management

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

  • Stefan Welcker - Modifications based on RDRPOSTagger

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Find more information about RDRPOSTagger at: http://rdrpostagger.sourceforge.net/

The general architecture and results of the original RDRPOSTagger can be found in the following papers:
