yarn

Yarn is a system for creating vectorial concept representations from an ontology containing descriptions of these concepts. These concept representations can then be used to disambiguate terms, and link them to the appropriate concept.

For more information, see the paper Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts by Stéphan Tulkens, Simon Šuster and Walter Daelemans, which was presented at the BioNLP Workshop at ACL 2016.
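As a rough illustration of the idea, the sketch below builds concept representations from descriptions and picks the concept closest to a context. It is not Yarn's actual code: averaging word vectors as the composition function, cosine similarity as the comparison, the toy 3-dimensional vectors, the concept IDs, and the function names `compose`, `cosine` and `disambiguate` are all assumptions made for this example.

```python
import numpy as np

# Toy 3-dimensional word vectors (made up for illustration);
# real experiments would use pretrained embeddings.
WORD_VECTORS = {
    "cold": np.array([0.5, 0.5, 0.0]),
    "virus": np.array([1.0, 0.0, 0.0]),
    "infection": np.array([0.9, 0.1, 0.0]),
    "temperature": np.array([0.0, 1.0, 0.0]),
    "low": np.array([0.0, 0.9, 0.1]),
}


def compose(text, vectors):
    """Average the vectors of all known words in a text (assumed composition)."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        # No known words: fall back to a zero vector of the right size.
        return np.zeros(len(next(iter(vectors.values()))))
    return np.mean(known, axis=0)


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def disambiguate(context, concept_descriptions, vectors):
    """Return the concept ID whose representation is closest to the context."""
    context_vector = compose(context, vectors)
    scores = {}
    for concept_id, descriptions in concept_descriptions.items():
        # One concept vector per concept: the mean of its description vectors.
        concept_vector = np.mean([compose(d, vectors) for d in descriptions], axis=0)
        scores[concept_id] = cosine(context_vector, concept_vector)
    return max(scores, key=scores.get)


# Hypothetical concept IDs for the ambiguous term "cold".
concepts_for_cold = {
    "common_cold": ["virus infection"],
    "cold_temperature": ["low temperature"],
}
best = disambiguate("a cold caused by a virus", concepts_for_cold, WORD_VECTORS)
```

Here `best` holds the ID of the concept whose averaged description vector has the highest cosine similarity with the averaged context vector.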

License

MIT

Contributors

Stéphan Tulkens, Simon Šuster, and Walter Daelemans. If you use this work or build upon it, please cite our paper as follows:

@inproceedings{tulkens2016using,
  title={Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts},
  author={Tulkens, St{\'e}phan and {\v{S}}uster, Simon and Daelemans, Walter},
  booktitle={Proceedings of the 15th Workshop on Biomedical Natural Language Processing},
  pages={77--82},
  year={2016}
}

Requirements

Python 3
NumPy
Reach

All can be installed with pip.

Usage

Yarn requires:

A set of word vectors
A set of concepts, with their descriptions
A set of documents with their ambiguous terms marked

The word vectors we used can be downloaded from the BioASQ website.

If you want to replicate the original experiments, you need to adhere to the formats below. If you want to use Yarn for your own experiments, e.g. just creating concept representations, you can choose your own format.

concepts

Concepts are represented by a nested dictionary: each ambiguous term maps to the concept IDs that pertain to it, and each concept ID maps to a list of descriptions (strings) of that concept.

{"term":
  {"concept_id_1":
    [description_1,
     description_2,
     ...
     description_n],
   "concept_id_2":
    [description_1,
     description_2,
     ...
     description_n]
  }
}
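The nesting described here (term, then concept ID, then a list of description strings) can be checked in a few lines. This is a minimal sketch, assuming the dictionary is stored as JSON, which the format description does not state explicitly. The helper `validate_concepts`, the concept IDs and the descriptions are hypothetical and not part of Yarn.

```python
import json


def validate_concepts(concepts):
    """Check the term -> concept ID -> list-of-strings nesting."""
    for term, by_concept in concepts.items():
        if not isinstance(by_concept, dict):
            raise ValueError(f"{term!r} must map to a dictionary of concept IDs")
        for concept_id, descriptions in by_concept.items():
            is_string_list = isinstance(descriptions, list) and all(
                isinstance(d, str) for d in descriptions
            )
            if not is_string_list:
                raise ValueError(
                    f"{term!r}/{concept_id!r} must map to a list of strings"
                )
    return True


# Hypothetical example data for one ambiguous term.
concepts = {
    "cold": {
        "common_cold": ["A viral infection of the upper respiratory tract."],
        "cold_temperature": ["An absence of heat.", "A low temperature."],
    }
}

# Round-trip through JSON (assumed storage format) and validate the result.
serialized = json.dumps(concepts, indent=2)
restored = json.loads(serialized)
validate_concepts(restored)
```

Validating right after loading catches a mistake such as a concept ID that maps to a single string instead of a list before any vectors are computed.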

documents

Documents to be disambiguated use the same nesting: each term maps to concept IDs, and each concept ID maps to a list of documents classified under that concept. Note that each document must contain at least one occurrence of the ambiguous term under which it is classified.

{"term":
  {"concept_id_1":
    [document_1,
     document_2,
     ...
     document_n],
   "concept_id_2":
    [document_1,
     document_2,
     ...
     document_n]
  }
}
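The requirement that every document contains its ambiguous term can be checked before running an experiment. The sketch below is one possible check, not Yarn's own: it assumes that a case-insensitive substring match counts as an occurrence, and the helper `documents_missing_term` and the example data are hypothetical.

```python
def documents_missing_term(documents):
    """Return (term, concept_id, index) for every document lacking its term.

    Assumption: a case-insensitive substring match counts as an occurrence.
    This is deliberately loose (it would also match "scold" for "cold").
    """
    missing = []
    for term, by_concept in documents.items():
        for concept_id, docs in by_concept.items():
            for index, doc in enumerate(docs):
                if term.lower() not in doc.lower():
                    missing.append((term, concept_id, index))
    return missing


# Hypothetical example data; the second document lacks the term "cold".
documents = {
    "cold": {
        "common_cold": [
            "She caught a cold last week.",
            "The patient has a runny nose.",
        ],
        "cold_temperature": ["Cold exposure lowers skin temperature."],
    }
}

missing = documents_missing_term(documents)
```

An empty `missing` list means every document satisfies the constraint. Otherwise each tuple points at the term, concept ID and list index of an offending document.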

The original Yarn experiments were run with the MSH dataset (Jimeno-Yepes 2011) and the 2015AB release of the UMLS. Because these resources are not freely distributable, they are not included in this package.
