
NLP Contribution Graph

This repo contains data and code for solving SemEval-2021 Task 11: NLP Contribution Graph.
For a detailed description of our method, please see the paper "UIUC_BioNLP at SemEval-2021 Task 11: A Cascade of Neural Models for Structuring Scholarly NLP Contributions".

Dependencies

  • This repo requires simpletransformers/ - the customized Simple Transformers package
    • It includes a customized model for subtask 1 that incorporates additional features
    • It extends Simple Transformers version 0.51.10 and remains compatible with common usage
    • To install it, first install the base package:
      • pip install simpletransformers==0.51.10
      • then find the installation directory and replace its simpletransformers folder with this folder (a helper for locating the directory is sketched below)
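
If you are unsure where pip placed the package, the following prints the path of the installed simpletransformers folder, i.e. the one to replace:

```python
import os
import simpletransformers

# Path of the installed simpletransformers package folder;
# replace this folder with the customized one from this repo.
print(os.path.dirname(simpletransformers.__file__))
```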

Data

  • training_data/ - the training data merged with the trial data, with full annotation.
  • interim/ - intermediate data files converted from the training data
    • all_sent.csv - contains all the sentences, each with its section header, positional features, paper topic and index, BIO tags, etc.
    • pos_sent.csv - a subset of all_sent.csv consisting of all the positive sentences.
    • triples.csv - contains each positive sentence with the predicates and terms in it, and the corresponding triples of different types.
  • test_data/ - the test data, with sentence and phrase annotation released.
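
A minimal sketch for inspecting the interim files with pandas; the paths follow the layout above, but the exact column names are assumptions based on the descriptions:

```python
import pandas as pd

all_sent = pd.read_csv("interim/all_sent.csv")
pos_sent = pd.read_csv("interim/pos_sent.csv")
triples = pd.read_csv("interim/triples.csv")

# Column names (section header, positional features, BIO tags, ...) vary;
# inspect them rather than hard-coding.
print(all_sent.columns.tolist())
print(f"{len(pos_sent)} positive sentences out of {len(all_sent)}")
```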

Scripts

  • pre.py - preprocesses the training data, reports potential errors, and produces all_sent.csv and pos_sent.csv

  • ext.py - preprocesses the training data and produces triples.csv

  • train_sent/ - Note that all scripts in this folder require the customized Simple Transformers package.

    • A binary classifier is trained for subtask 1: contribution sentence classification (see the sketch after this item)
    • A multi-class classifier is trained to classify sentences into information units
    • A filename ending in '_ens' indicates that submodels are trained for ensembling.
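
For orientation, here is a minimal sketch of the subtask 1 classifier using the stock Simple Transformers API. The actual scripts depend on the customized package to incorporate the additional features; the model type, toy data, and hyperparameters below are assumptions, not the repo's settings.

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Toy data with the layout ClassificationModel expects:
# a "text" column and a binary "labels" column
# (1 = contribution sentence, 0 = not).
train_df = pd.DataFrame({
    "text": ["We propose a cascade of neural models.",
             "Section 2 reviews related work."],
    "labels": [1, 0],
})

model = ClassificationModel(
    "bert", "bert-base-uncased", num_labels=2,
    args={"num_train_epochs": 3, "overwrite_output_dir": True},
)
model.train_model(train_df)
predictions, _ = model.predict(["Our model achieves state-of-the-art results."])
```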
  • train_ner/ - The models are trained for subtask 2: key phrase extraction.

    • In the 'specific_bio' scheme, we use specific BIO tags to indicate phrase types and train an NER model directly.
    • In the 'simple_bio' scheme, we first identify the phrases and then classify them into predicates and terms. A script for ensembling the models is also provided.
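
A minimal sketch of the 'specific_bio' idea with the stock NERModel; the label set below is an assumption for illustration, not the repo's actual tag set.

```python
from simpletransformers.ner import NERModel

# Assumed 'specific_bio' label set: BIO tags that also encode the phrase type.
labels = ["O", "B-predicate", "I-predicate", "B-term", "I-term"]

model = NERModel(
    "bert", "bert-base-uncased", labels=labels,
    args={"num_train_epochs": 3, "overwrite_output_dir": True},
)
# model.train_model("train.txt")  # CoNLL-style file, one "token tag" pair per line
predictions, _ = model.predict(["We propose a novel attention mechanism"])
```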
  • train_rel/ - For subtask 3 (triple extraction), four models are trained to extract triples of types A, B, C and D respectively.

    • For type A triples, two schemes are implemented: pairwise classification and direct triple classification. Only the latter scheme was used in the evaluation phases.
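
One way to read direct triple classification is as binary classification over candidate triples rendered as a single sequence. Below is a hedged sketch using the stock classification model; the separator and toy examples are assumptions, not the repo's actual encoding:

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Assumed encoding: the three triple elements joined into one sequence,
# labeled 1 for a valid triple and 0 for an invalid candidate.
candidates = pd.DataFrame({
    "text": ["our approach || uses || attention mechanism",
             "our approach || uses || related work"],
    "labels": [1, 0],
})

model = ClassificationModel(
    "bert", "bert-base-uncased", num_labels=2,
    args={"num_train_epochs": 3, "overwrite_output_dir": True},
)
model.train_model(candidates)
```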
  • predict1/ - scripts for Evaluation Phase 1 (end-to-end evaluation). Run the scripts in this order (a small driver is sketched below):

    • pre.py - test data preprocessing
    • sent_binary.py - contribution sentence classification
    • sent_multi.py - information unit classification
    • ner.py - phrase extraction; the 'specific_bio' scheme was used in this phase
    • predict_triples.py - extraction of type A, B, C and D triples, using different models
    • submit.ipynb - output formatting for submission
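
A small driver (not part of the repo) that runs the Phase 1 scripts in the documented order, assuming it is invoked from the repository root; submit.ipynb is then run interactively:

```python
import subprocess

# Run each pipeline stage in order, stopping on the first failure.
for script in ["pre.py", "sent_binary.py", "sent_multi.py",
               "ner.py", "predict_triples.py"]:
    subprocess.run(["python", script], check=True, cwd="predict1")
```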
  • predict2/ - scripts for Evaluation Phase 2 Part 1: given the contribution sentence labels, do the rest.

    • The naming of the scripts largely follows that in predict1/.
    • A filename ending in '-ens' indicates that an ensemble of submodels is used for prediction.
    • In this phase and afterwards, we used the 'simple_bio' scheme for phrase extraction.
  • predict3/ - scripts for Evaluation Phase 2 Part 2: given the labels of contribution sentences and phrases, do the rest.

    • We copied the result of information unit classification from predict2/, so after running pre.py we start directly from phrase classification.

Useful Links