Scholarly entity usage detection

Abstract

We introduce a new method for extracting named entities from scientific publications. Unlike other named entity recognition tasks, we extract only those named entities that have actually been used in a paper, not merely mentioned or proposed. We train our classification model on method and data set names and show that it performs equally well for both entity types. We show that our model can be applied to any entity type with minimal human interaction. We further create an extension to the Microsoft Academic Graph containing the used entities, which we use to analyze how methods and data sets are used.

Summary of our approach

Our classification pipeline consists of three steps: named entity recognition using a TSE-NER approach, sentence-level usage classification using SciBERT, and aggregation of the sentence-level usage classification results to the document level.
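The final aggregation step can be illustrated with a minimal sketch. It is not the code from the classification-pipeline module: the `SentencePrediction` type, the `aggregate_to_documents` function, the 0.5 threshold, and the "any sentence above the threshold" rule are all assumptions chosen for illustration, and the scores are made-up values.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class SentencePrediction(NamedTuple):
    """One sentence-level result (hypothetical structure)."""
    doc_id: str
    entity: str
    p_used: float  # assumed classifier score in [0, 1] for the "used" class


def aggregate_to_documents(
    predictions: Iterable[SentencePrediction],
    threshold: float = 0.5,  # assumed cut-off, not taken from the project
) -> Dict[str, Dict[str, bool]]:
    """Mark an entity as used in a document if any sentence reaches the threshold."""
    best: Dict[str, Dict[str, float]] = defaultdict(dict)
    for pred in predictions:
        # Keep the highest usage score seen for each (document, entity) pair.
        previous = best[pred.doc_id].get(pred.entity, 0.0)
        best[pred.doc_id][pred.entity] = max(previous, pred.p_used)
    return {
        doc_id: {entity: score >= threshold for entity, score in entities.items()}
        for doc_id, entities in best.items()
    }


# Illustrative input: two sentences mention SciBERT in the same paper.
predictions = [
    SentencePrediction("paper-1", "SciBERT", 0.91),
    SentencePrediction("paper-1", "SciBERT", 0.20),
    SentencePrediction("paper-1", "word2vec", 0.12),
    SentencePrediction("paper-2", "CRF", 0.67),
]
document_usage = aggregate_to_documents(predictions)
print(document_usage)
```

Taking the maximum score per entity means a single confident "used" sentence is enough to mark the entity as used in the document. Other rules, such as majority voting over sentences, would fit the same interface.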

Structure of this project

This project is divided into several submodules. A detailed description can be found in the respective module subdirectories.

SmartPub-TSENER: For named entity recognition, we train a CRF using TSE-NER. This module is a fork of mvallet91/SmartPub-TSENER that uses SciBERT instead of word2vec embeddings.
annotation-set-extraction: Creates the annotation data set used to train our usage classifier.
annotators-agreement: Calculates the annotator agreement on the created data set.
usage-classificator: Trains four different models for classifying whether an entity in a sentence has been used or proposed.
classification-pipeline: Applies both the TSE-NER model for named entity recognition and a trained usage classification model to a corpus of documents.
studies: Contains several Jupyter notebooks for analyzing the results of the classification pipeline.
mag-extension: Contains our extensions to the Microsoft Academic Graph.
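To make the annotator agreement step more concrete, here is a minimal sketch of Cohen's kappa for two annotators. The README does not state which agreement measure the annotators-agreement module uses, so the choice of Cohen's kappa is an assumption, as are the `cohens_kappa` function, the label names, and the example annotations.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations for six sentences.
annotator_1 = ["used", "used", "not_used", "not_used", "used", "not_used"]
annotator_2 = ["used", "not_used", "not_used", "not_used", "used", "used"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

In this example the annotators agree on four of six sentences (observed agreement 0.667) while chance agreement is 0.5, so kappa is about 0.333.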

Contact

The system has been designed and implemented by Michael Färber, Alexander Albers, and Felix Schüber. Feel free to reach out to us:

Michael Färber, michael.faerber@kit.edu

How to Cite

Please cite our work as follows:

@inproceedings{Faerber2021SDU,
  author    = {Michael F{\"{a}}rber and
               Alexander Albers and
               Felix Sch{\"{u}}ber},
  title     = "{Identifying Used Methods and Datasets in Scientific Publications}",
  booktitle = "{Proceedings of the AAAI-21 Workshop on Scientific Document Understanding (SDU'21)@AAAI'21}",
  location  = "{Virtual Event}",
  year      = {2021}
}
