Biomedical Interpretable Entity Representations

Diego Garcia-Olano, Yasumasa Onoe, Ioana Baldini, Joydeep Ghosh, Byron Wallace, Kush Varshney
Findings of ACL 2021

@inproceedings{garcia-olano-etal-2021-biomedical,
    title = "Biomedical Interpretable Entity Representations",
    author = "Garcia-Olano, Diego  and
      Onoe, Yasumasa  and
      Baldini, Ioana  and
      Ghosh, Joydeep  and
      Wallace, Byron  and      
      Varshney, Kush",
    booktitle = "Findings of the 59th Annual Meeting of the Association for Computational Linguistics",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}

To use the pre-trained models without re-training BIERs, see the Colab notebooks in the "Replicating downstream task results" section below.

Installing Dependencies

$ git clone https://github.com/diegoolano/biomedical_interpretable_entity_representations.git
$ cd biomedical_interpretable_entity_representations
$ virtualenv --python="$HOME/envs/py37/bin/python" biomed_env
$ source biomed_env/bin/activate
$ pip install -r requirements.txt

How to train BioMed IER models

See ier_model/train.sh

   Make sure to:
   - set goal to "medwiki",
   - set the training and dev sets,
   - set the paths in transformers_constants.py appropriately,
   - use a GPU with plenty of memory (e.g., a V100 with 32 GB), or lower the batch size,
   - set the intervals at which to report training accuracy, evaluation accuracy on dev, etc.,
   - set the log location.
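
The checklist above can be turned into a quick pre-flight check before launching a long training run. The following is a minimal sketch, not part of the repository: `TrainConfig`, its field names, and `check_config` are all hypothetical, and you would need to map them onto whatever variables `ier_model/train.sh` and `transformers_constants.py` actually use.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class TrainConfig:
    # Hypothetical field names; they do not come from train.sh.
    goal: str
    train_path: str
    dev_path: str
    log_dir: str
    batch_size: int
    eval_interval: int


def check_config(cfg: TrainConfig) -> List[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if cfg.goal != "medwiki":
        problems.append(f'goal should be "medwiki", got "{cfg.goal}"')
    for label, path in (("training set", cfg.train_path), ("dev set", cfg.dev_path)):
        if not os.path.isfile(path):
            problems.append(f"{label} not found: {path}")
    if not os.path.isdir(cfg.log_dir):
        problems.append(f"log directory not found: {cfg.log_dir}")
    if cfg.batch_size < 1:
        problems.append("batch_size must be at least 1")
    if cfg.eval_interval < 1:
        problems.append("eval_interval must be at least 1")
    return problems
```

Printing the returned list before starting training catches a wrong path or goal early, rather than partway into a GPU job.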

BIER training data and best models

BIER triples can be found [ here ]
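
For orientation, each training example is a (mention, context, categories) triple. The sketch below shows one way to hold and serialize such a triple. The `Triple` class, the JSON-lines layout, and the example values are illustrative assumptions; the released files may use a different format and field names.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Triple:
    mention: str           # the entity mention string
    context: str           # the text surrounding the mention
    categories: List[str]  # the categories attached to the mention


def to_json_line(triple: Triple) -> str:
    """Serialize one triple as a single JSON line (assumed layout)."""
    return json.dumps(asdict(triple), ensure_ascii=False)


def from_json_line(line: str) -> Triple:
    """Parse a JSON line produced by to_json_line back into a Triple."""
    return Triple(**json.loads(line))


# Made-up example values, for illustration only.
example = Triple(
    mention="aspirin",
    context="The patient was given aspirin for chest pain.",
    categories=["Salicylates", "Antiplatelet drugs"],
)
line = to_json_line(example)
restored = from_json_line(line)
```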

Model files:

See the previous section for how to train BIER models on the training data.

See the Colab notebooks below for how to load and use the models on downstream tasks.

Replicating downstream task results

See experiments/README.md for baselines

  • Clinical NED task using EHR dataset: Open In Colab
  • Entity Linking Classification on Cancer Genetics dataset: Open In Colab

Connecting PubMed entities to Wiki Categories through UMLS to generate training data

  • After generating (mention, context, categories) triples, we learn BIERs as follows:


BioMed IER architecture for learning biomed entity representations with interpretable components
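
To illustrate what "interpretable components" can mean in practice, the sketch below assumes that each dimension of a representation is an independent probability for one named category, obtained by applying a sigmoid to a per-category score. This is a simplified assumption for illustration, not the repository's model code; the function names and the example scores are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_representation(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-category scores into per-category probabilities."""
    return {category: sigmoid(score) for category, score in scores.items()}


def top_categories(rep: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable categories, highest probability first."""
    return sorted(rep.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up scores for a single mention, for illustration only.
scores = {"Salicylates": 2.0, "Antiplatelet drugs": 0.0, "Vaccines": -3.0}
rep = to_representation(scores)
best = top_categories(rep, k=2)
```

Because every dimension carries a category name, the highest-valued dimensions can be read directly as a description of the entity.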

  • After learning BIERs, we can test their efficacy in a zero-shot capacity on different biomedical tasks:


Zero-shot results for varying amounts of supervision
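
One common way to use fixed entity representations without task-specific training is to rank candidate entities by their similarity to the mention's representation. The sketch below uses cosine similarity over plain Python lists. It is an assumption about how such a comparison could look, not the scoring used in the paper or the Colab notebooks; `cosine`, `rank_candidates`, and the example vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    mention_vec: List[float], candidates: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the mention and sort best first."""
    scored = [(name, cosine(mention_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up three-dimensional vectors, for illustration only.
mention = [0.9, 0.1, 0.8]
candidates = {"entity_a": [1.0, 0.0, 1.0], "entity_b": [0.0, 1.0, 0.0]}
ranking = rank_candidates(mention, candidates)
```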

About

code and datasets associated with ACL 2021 paper "Biomedical Interpretable Entity Representations"