
languagetool-org/german-pos-dict


german-pos-dict

A German part-of-speech dictionary that can be used from Java. This repository contains no code, only Morfologik binary files for looking up part-of-speech data. As a developer, consider using LanguageTool instead. If you really want to use these files directly, see the unit tests for examples.

You can also use LanguageTool to export the data in these dictionaries, as documented here.

The POS tags are documented here.
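As an illustration only: German tags in LanguageTool are colon-separated strings such as SUB:NOM:SIN:MAS (noun, nominative, singular, masculine). The sketch below splits such a tag into its category and features, assuming that format. The `PosTag` class is hypothetical and not part of this repository; the tag documentation remains the authoritative source for the possible values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of german-pos-dict.
// ASSUMPTION: a tag is a colon-separated string like "SUB:NOM:SIN:MAS",
// where the first field is the word category and the rest are features.
final class PosTag {
    final String category;       // first field, e.g. "SUB"
    final List<String> features; // remaining fields, e.g. [NOM, SIN, MAS]

    private PosTag(String category, List<String> features) {
        this.category = category;
        this.features = features;
    }

    static PosTag parse(String tag) {
        String[] parts = tag.split(":");
        // subList(1, length) is empty for tags without features, e.g. "ADV"
        return new PosTag(parts[0], Arrays.asList(parts).subList(1, parts.length));
    }

    // True if the tag carries the given feature, e.g. has("SIN")
    boolean has(String feature) {
        return features.contains(feature);
    }
}
```

For example, `PosTag.parse("SUB:NOM:SIN:MAS").has("SIN")` is true, while `has("PLU")` is false.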

Internal

If you update the tagger dictionary, always make sure to also update the synth dictionary with the same data and vice versa. LanguageTool expects the two to be in sync.
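One way to check that the two dictionaries are still in sync is to export both (for example with LanguageTool, as mentioned above) and compare the exported entries. The following is a minimal sketch that assumes both exports use the same one-entry-per-line text format; that format and the `DictSyncCheck` class are assumptions, not tooling that ships with this repository. Differences are reported rather than fixed, because some readings may be excluded from synthesis on purpose and need a manual look.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sync check, not part of german-pos-dict.
// ASSUMPTION: the tagger and synth dictionaries were both exported to text
// files with one entry per line in the same format.
final class DictSyncCheck {

    // Entries contained in `first` but not in `second`, sorted for stable output.
    static Set<String> onlyInFirst(Collection<String> first, Collection<String> second) {
        Set<String> diff = new TreeSet<>(first);
        diff.removeAll(new HashSet<>(second));
        return diff;
    }

    // True if both exports contain exactly the same set of entries.
    static boolean inSync(Collection<String> tagger, Collection<String> synth) {
        return onlyInFirst(tagger, synth).isEmpty() && onlyInFirst(synth, tagger).isEmpty();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java DictSyncCheck tagger-export.txt synth-export.txt
        List<String> tagger = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> synth = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        for (String entry : onlyInFirst(tagger, synth)) {
            System.out.println("only in tagger: " + entry);
        }
        for (String entry : onlyInFirst(synth, tagger)) {
            System.out.println("only in synth:  " + entry);
        }
        // Exit with a non-zero status so a calling script can detect a mismatch.
        if (!inSync(tagger, synth)) {
            System.exit(1);
        }
    }
}
```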

To prepare a release (note: this only adds forms; it does not remove them):

  • (optional) move readings from do-not-synthesize.txt to filter-archaic.txt (in the execution path of SynthDictionaryBuilder)
  • call ./download-data.sh
  • set DBUSER, DBPASS, and LT_PASS in ./data-to-dict.sh
  • call ./data-to-dict.sh
  • increase version in pom.xml
  • call mvn install
  • test it from the software that integrates it (including a regression test)

To make a release:

  • set the version in pom.xml to not include SNAPSHOT
  • rm src/main/resources/org/languagetool/resource/de/SynthDictionaryBuilder*tags.txt
  • mvn clean test
  • mvn clean deploy -P release
  • go to https://oss.sonatype.org/#stagingRepositories
  • scroll to the bottom, select latest version, and click Release
  • git tag vx.y
  • git push origin vx.y
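The release steps drop the SNAPSHOT suffix from the pom.xml version and then create a vx.y git tag. Assuming the tag mirrors the released pom.xml version and the snapshot version uses the usual Maven `-SNAPSHOT` suffix, the mapping can be sketched as follows. The `ReleaseVersion` helper is hypothetical and not part of this repository or of Maven.

```java
// Hypothetical helper, not part of german-pos-dict or Maven.
// ASSUMPTION: snapshot versions end in "-SNAPSHOT" and the git tag is
// "v" followed by the released version.
final class ReleaseVersion {
    private static final String SNAPSHOT = "-SNAPSHOT";

    // "1.2-SNAPSHOT" -> "1.2"; versions without the suffix are returned unchanged.
    static String releaseVersion(String pomVersion) {
        return pomVersion.endsWith(SNAPSHOT)
                ? pomVersion.substring(0, pomVersion.length() - SNAPSHOT.length())
                : pomVersion;
    }

    // "1.2-SNAPSHOT" -> "v1.2", the name used with git tag / git push.
    static String gitTag(String pomVersion) {
        return "v" + releaseVersion(pomVersion);
    }
}
```

For example, `ReleaseVersion.gitTag("1.2-SNAPSHOT")` returns `"v1.2"`.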