Ontology-Based Information Extraction for Scholarly Data in Biodiversity Research

fusion-jena/BiodivTagger

BiodivTagger and QEMP corpus Repository

This repository contains two resources. The QEMP corpus is a metadata corpus from biodiversity research, comprising 50 metadata files selected from 5 different repositories and biodiversity-related projects. The BiodivTagger is a text mining pipeline that extracts biological entities.

Structure

  • Pipeline contains the text mining pipeline that annotates biological named entities.
  • Evaluation contains the Python script that evaluates the pipeline against the gold standard, together with the evaluation results.
  • QEMP Corpus contains the raw metadata XML files, grouped by data repository, and the gold standard in JSON format.
  • Ontological Issues List lists missing ontological entries and ontological conflicts.
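
As a rough illustration of what evaluating a pipeline against a gold standard involves, the following minimal sketch computes exact-match precision, recall, and F1 over two sets of annotations. It is not the script shipped in the Evaluation folder: the tuple layout (document id, start offset, end offset, label), the function name `precision_recall_f1`, and the sample labels are all assumptions made up for this example, and the actual gold standard JSON schema may differ.

```python
from typing import Iterable, Tuple

# Hypothetical annotation layout (an assumption, not the repository's schema):
# (document id, start offset, end offset, entity label)
Annotation = Tuple[str, int, int, str]


def precision_recall_f1(
    gold: Iterable[Annotation], predicted: Iterable[Annotation]
) -> Tuple[float, float, float]:
    """Exact-match scores: a prediction counts only if all four fields match."""
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up sample data for illustration only.
gold = [
    ("doc1", 0, 7, "ORGANISM"),
    ("doc1", 20, 26, "ENVIRONMENT"),
    ("doc2", 5, 12, "QUALITY"),
]
predicted = [
    ("doc1", 0, 7, "ORGANISM"),   # matches the gold standard exactly
    ("doc2", 5, 12, "MATERIAL"),  # right span, wrong label: counted as an error
]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is the strictest choice; a real evaluation might also credit partially overlapping spans or report scores per entity category.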

Licenses

Citation

Löffler, F., Abdelmageed, N., Babalou, S., Kaur, P., König-Ries, B.: Tag Me If You Can! Semantic Annotation of Biodiversity Metadata with the QEMP Corpus and the BiodivTagger. Language Resources and Evaluation Conference (LREC), Marseille, France, 2020.