BioNER: Named Entity Recognition in the Biomedical Domain
This repository contains the code for BioNER, an LSTM-based model designed for biomedical named entity recognition (NER).

Download

We provide models trained on the following datasets:

| Dataset | Mirror (Siasky) | Mirror (Mega) |
| --- | --- | --- |
| MedMentions full | Download Model | Download Model |
| MedMentions ST21pv | Download Model | Download Model |
| JNLPBA | Download Model | Download Model |

In addition, the word embeddings trained with fastText on PubMed Baseline 2021 are provided for the following n-gram ranges:

| n-gram range | Mirror (Siasky) | Mirror (Mega) | Mirror (Storj) |
| --- | --- | --- | --- |
| 3-4 | Download | Download | Download |
| 3-6 | Download | Download | Download |

Installation

Install the dependencies.

pip install -r requirements.txt

Because deterministic behaviour is enabled by default, you may need to set the CUBLAS_WORKSPACE_CONFIG environment variable to prevent RuntimeErrors when running on CUDA.

export CUBLAS_WORKSPACE_CONFIG=:4096:8
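If you prefer to set the variable from Python instead of the shell, a minimal sketch (the `torch` line is only a hedged illustration of where the variable matters; BioNER enables determinism itself):

```python
import os

# cuBLAS reads this variable at initialisation, so set it before the
# first CUDA call; setdefault respects a value already exported in the shell.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

# With the variable in place, PyTorch's deterministic mode no longer raises
# a RuntimeError for cuBLAS operations (illustrative, not called here):
# torch.use_deterministic_algorithms(True)
```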

Usage

Dataset Preprocessing

BioNER expects a dataset in the CoNLL-2003 format. We used the tool bconv for preprocessing the MedMentions dataset.
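For orientation, a CoNLL-2003-style file holds one token per line with whitespace-separated columns, the entity tag in IOB notation as the last column, and blank lines separating sentences. An illustrative two-column excerpt (tokens and tags invented for the example):

```
Selegiline	B-Chemical
induced	O
postural	B-Disease
hypotension	I-Disease
.	O
```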

Training

You can train the BioNER model either with the provided Makefile or by executing train_bioner.py directly. If you use the Makefile, fill in its empty fields before the first run.

make train-bioner ngrams=3-4

Annotation

You can annotate a CoNLL-2003 dataset in the following way:

# --embeddings: path to the word embeddings file
# --dataset:    path to the CoNLL-2003 dataset
# --outputFile: path to the output file for storing the annotated dataset
# --model:      path to the trained BioNER model
python annotate_dataset.py \
    --embeddings <embeddings> \
    --dataset <dataset> \
    --outputFile <output-file> \
    --model <model>

Furthermore, you can add the --enableExportCoNLL flag to export an additional file in the same parent folder as the outputFile, which can be used for evaluation with the original conlleval.pl Perl script.
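As a quick way to inspect the annotated output, here is a hedged Python sketch that parses a CoNLL-style file into sentences of (token, tag) pairs. It assumes whitespace-separated columns with the tag in the last column, which matches the usual CoNLL-2003 layout but has not been checked against BioNER's exact output:

```python
from typing import List, Tuple


def read_conll(path: str) -> List[List[Tuple[str, str]]]:
    """Parse a CoNLL-style file into sentences of (token, tag) pairs.

    Assumes one token per line, whitespace-separated columns with the
    entity tag last, and blank lines between sentences.
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # blank line marks a sentence boundary
                if current:
                    sentences.append(current)
                    current = []
                continue
            cols = line.split()
            current.append((cols[0], cols[-1]))
    if current:  # flush a trailing sentence without a final blank line
        sentences.append(current)
    return sentences
```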