Skip to content

End-to-end integration of HuggingFace's models for sequence labeling.

License

Notifications You must be signed in to change notification settings

avramandrei/BERT-Sequence-Labeling

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BERT-Sequence-Labeling

This repostiory integrates HuggingFaces's models in an end-to-end pipeline for sequence labeling. Here is a complete list of the available models.

If you found this repository helpful, please give it a star.:blush:

Install

git clone https://github.com/avramandrei/BERT-Sequence-Labeling.git
cd BERT-Sequence-Labeling
pip3 install -r requirements.txt

Input Format

The files used for training, validation and testing must be in a format similar to the CoNLL:

# sent_id = email-enronsent20_01-0048
# text = Please let us know if you have additional questions.
1	Please	please	INTJ	UH	_	2	discourse	2:discourse	_
2	let	let	VERB	VB	Mood=Imp|VerbForm=Fin	0	root	0:root	_
3	us	we	PRON	PRP	Case=Acc|Number=Plur|Person=1|PronType=Prs	2	obj	2:obj|4:nsubj:xsubj	_
4	know	know	VERB	VB	VerbForm=Inf	2	xcomp	2:xcomp	_
5	if	if	SCONJ	IN	_	7	mark	7:mark	_
6	you	you	PRON	PRP	Case=Nom|Person=2|PronType=Prs	7	nsubj	7:nsubj	_
7	have	have	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	4	advcl	4:advcl:if	_
8	additional	additional	ADJ	JJ	Degree=Pos	9	amod	9:amod	_
9	questions	question	NOUN	NNS	Number=Plur	7	obj	7:obj	SpaceAfter=No
10	.	.	PUNCT	.	_	2	punct	2:punct	_

Training

To train a model, use the train.py script. This will start training a model that will predict the labels of the column specified by the [predict_column] argument.

python3 train.py [path_train_file] [path_dev_file] [tokens_column] [predict_column] [lang_model_name]

Inference

To predict new values, use the predict.py script. This will create a new file by replacing the predicted column of the test file with the predicted values.

python3 predict.py [path_test_file] [model_path] [tokens_column] [predict_column] [lang_model_name]

Results

English EWT

model upos xpos
bert-base-cased 95.92 95.27
roberta-base 95.77 95.18

Cite

Please consider citing the following paper as a thank you to the authors:

@article{avram2020upb,
  title={UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction},
  author={Avram, Andrei-Marius and Cercel, Dumitru-Clementin and Chiru, Costin-Gabriel},
  journal={arXiv e-prints},
  pages={arXiv--2009},
  year={2020}
}


About

End-to-end integration of HuggingFace's models for sequence labeling.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages