TurkuNLP/turku-ner-corpus

Turku NER corpus

An open, broad-coverage corpus for Finnish named entity recognition, presented in Luoma et al. (2020), "A Broad-coverage Corpus for Finnish Named Entity Recognition" (PDF).

[Figure: Example of Turku NER corpus annotation]

Releases

Version 1.0:

Recommended. This is the first complete, stable release of the corpus and the version used in our experiments with the data.

Quickstart

A version of the corpus data is available in a CoNLL-like format in the following files:

  • data/conll/train.tsv: training data
  • data/conll/dev.tsv: development data
  • data/conll/test.tsv: test data

These files are in a simple two-column tab-separated format with IOB2 tags:

Turun       B-ORG
yliopiston  I-ORG
entinen     O
kansleri    O
Eero        B-PER
Vuorio      I-PER
on          O
palkittu    O
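
As a minimal sketch of how these files could be read, the following Python function parses the two-column format into sentences. It assumes that sentences are separated by blank lines, as is conventional in CoNLL-style data; the function name `read_conll` is hypothetical and not part of this repository.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Parse two-column, tab-separated token/tag lines into sentences.

    Assumption: a blank line marks a sentence boundary.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split("\t")
        current.append((token, tag))
    if current:  # The file may not end with a blank line.
        sentences.append(current)
    return sentences


# Small in-memory example using tokens from the excerpt above.
sample = ["Turun\tB-ORG\n", "yliopiston\tI-ORG\n", "entinen\tO\n", "\n",
          "Eero\tB-PER\n", "Vuorio\tI-PER\n"]
for sentence in read_conll(sample):
    print(sentence)
```

To read a corpus file, an open file handle can be passed directly, for example `read_conll(open("data/conll/train.tsv", encoding="utf-8"))`.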

The corpus annotation marks mentions of person (PER), organization (ORG), location (LOC), product (PRO) and event (EVENT) names as well as dates (DATE).
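
To work with whole mentions rather than per-token tags, the IOB2 tags can be grouped into typed spans. The sketch below is one way to do this; the function name `iob2_spans` is hypothetical, and the lenient handling of an `I-` tag that does not continue a span of the same type (it starts a new span) is an assumption rather than something the corpus documentation specifies.

```python
from typing import List, Sequence, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def iob2_spans(tags: Sequence[str]) -> List[Span]:
    """Group IOB2 tags into (type, start, end) spans over token indices."""
    spans: List[Span] = []
    start = None
    etype = None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        # Close the open span unless this tag continues it.
        if etype is not None and tag != "I-" + etype:
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and etype is None:
            # Assumption: an I- tag with no matching open span starts a new one.
            start, etype = i, tag[2:]
    return spans


tokens = ["Turun", "yliopiston", "entinen", "kansleri",
          "Eero", "Vuorio", "on", "palkittu"]
tags = ["B-ORG", "I-ORG", "O", "O", "B-PER", "I-PER", "O", "O"]
for etype, start, end in iob2_spans(tags):
    print(etype, " ".join(tokens[start:end]))
# prints:
# ORG Turun yliopiston
# PER Eero Vuorio
```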

Most NER taggers can be straightforwardly trained and evaluated with this data. In our experiments with the corpus, the best performance was achieved with the keras-bert-ner tagger using the FinBERT model.
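
For a quick check of tagger output, NER results are conventionally scored by exact match of entity mentions. The following is a minimal sketch of such a score, assuming gold and predicted mentions are given as hashable tuples such as `(type, start, end)`; the function name `span_prf` is hypothetical, and this is not necessarily the evaluation script used in the paper.

```python
from typing import Dict, Hashable, Iterable


def span_prf(gold: Iterable[Hashable], pred: Iterable[Hashable]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 over entity mentions.

    A predicted mention counts as correct only if an identical tuple
    (same type and same boundaries) appears in the gold set.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Example: the ORG mention matches exactly, the PER mention has a wrong boundary.
gold = [("ORG", 0, 2), ("PER", 4, 6)]
pred = [("ORG", 0, 2), ("PER", 5, 6)]
print(span_prf(gold, pred))
```

When scoring a whole file, each tuple should also carry a sentence index, for example `(sentence_id, type, start, end)`, so that mentions from different sentences are not confused.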

Guidelines

The Turku NER corpus annotation guidelines are available in PDF format.

Reference

The corpus is described in the following paper (https://www.aclweb.org/anthology/2020.lrec-1.567.pdf):

@inproceedings{luoma-etal-2020-broad,
    title = "A Broad-coverage Corpus for {F}innish Named Entity Recognition",
    author = {Luoma, Jouni and Oinonen, Miika and Pyyk{\"o}nen, Maria and Laippala, Veronika and Pyysalo, Sampo},
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    year = "2020",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.567",
    pages = "4615--4624",
}
