Content-Aware Node2vec

Source code and datasets for the BioNLP 2019 paper "Embedding Biomedical Ontologies by Jointly Encoding Network Structure and Textual Node Descriptors".

Datasets

The folder "datasets" contains the edge lists of the two datasets, Part-of and Is-a, used in Content-Aware Node2vec. For each dataset, the folder data_utilities contains several dictionaries. For example, for the Is-a dataset:

isa_phrase_dic.p (mapping between nodes and textual descriptors; the keys are the textual descriptors, so you must use isa_reversed_dic.p instead)
isa_phrase_vocab.p (the textual descriptors associated with each node)
isa_reversed_dic.p (the reversed dictionary of isa_phrase_dic.p)
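As a minimal sketch of how these dictionaries relate, the snippet below loads a pickled dictionary and inverts it. It assumes the .p files are standard Python pickles and that isa_phrase_dic.p maps each textual descriptor to a node ID; the toy data, the temporary file, and the helper names load_pickle and invert are hypothetical and not part of the repository.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load a pickled Python object from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def invert(mapping):
    """Swap keys and values of a one-to-one dictionary."""
    return {value: key for key, value in mapping.items()}


# Toy stand-in for isa_phrase_dic.p: descriptor -> node ID (assumed layout).
toy_phrase_dic = {"heart disease": 0, "cardiomyopathy": 1}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy_phrase_dic.p"
    with open(toy_path, "wb") as handle:
        pickle.dump(toy_phrase_dic, handle)
    phrase_dic = load_pickle(toy_path)

# Counterpart of the reversed dictionary: node ID -> descriptor.
reversed_dic = invert(phrase_dic)
print(reversed_dic[1])
```

To inspect the real data, pass the path of isa_reversed_dic.p inside data_utilities to load_pickle. Only unpickle files you trust, since loading a pickle can execute arbitrary code.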

Run

First, run the following script to generate the train/test graphs:

python3 create_dataset.py [--input path-to-edgelist] [--dataset [part_of,isa]]
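For intuition about what generating train/test graphs from an edge list involves, here is a rough sketch that parses edges and splits them at random. It is not the logic of create_dataset.py, which may do considerably more; the whitespace-separated "source target" line format, the test_fraction and seed parameters, and both function names are assumptions made for illustration.

```python
import random


def parse_edgelist(lines):
    """Parse whitespace-separated 'source target' lines into edge tuples."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            edges.append((parts[0], parts[1]))
    return edges


def split_edges(edges, test_fraction=0.5, seed=0):
    """Shuffle edges and split them into (train, test) lists."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy edge list; the real files live under the datasets folder.
toy_lines = ["a b", "b c", "c d", "d e"]
all_edges = parse_edgelist(toy_lines)
train_edges, test_edges = split_edges(all_edges, test_fraction=0.25)
print(len(train_edges), len(test_edges))
```

A fixed seed keeps the split reproducible across runs, which matters when comparing models on the same held-out edges.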

Then run the experiments script to train the model:

python3 experiments.py

All parameters can be set in the config file or passed as command-line arguments.
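One common way to combine config-file defaults with command-line overrides is sketched below using argparse. The parameter names epochs and learning_rate, their default values, and the function names are hypothetical; only the dataset choices part_of and isa come from this README, and the repository's own parser may be organized differently.

```python
import argparse

# Hypothetical defaults standing in for values defined in the config file.
DEFAULTS = {"dataset": "isa", "epochs": 1, "learning_rate": 0.001}


def build_parser(defaults):
    """Create a parser whose defaults come from the config values."""
    parser = argparse.ArgumentParser(description="Config override sketch")
    parser.add_argument("--dataset", choices=["part_of", "isa"],
                        default=defaults["dataset"])
    parser.add_argument("--epochs", type=int, default=defaults["epochs"])
    parser.add_argument("--learning_rate", type=float,
                        default=defaults["learning_rate"])
    return parser


def resolve_settings(argv, defaults=DEFAULTS):
    """Return the final settings: config defaults overridden by argv."""
    args = build_parser(defaults).parse_args(argv)
    return vars(args)


# Flags given on the command line win; everything else keeps its default.
settings = resolve_settings(["--dataset", "part_of", "--epochs", "5"])
print(settings)
```

With this pattern, the config file stays the single source of default values, and a one-off experiment only needs the flags that differ.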

Dependencies

pytorch == 1.0.1
networkx == 2.2
scikit_learn == 0.20.2
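To check whether an environment matches these pins before running the scripts, a small sketch using the standard library's importlib.metadata (Python 3.8+) could look like the following. The check_versions helper is hypothetical; the dictionary keys are the PyPI distribution names, where PyTorch is published as "torch" and scikit_learn as "scikit-learn".

```python
from importlib import metadata

# Versions pinned in the Dependencies list, keyed by PyPI distribution name.
PINNED = {"torch": "1.0.1", "networkx": "2.2", "scikit-learn": "0.20.2"}


def check_versions(pinned):
    """Return {package: (installed_or_None, pinned, matches)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, wanted, installed == wanted)
    return report


for name, (installed, wanted, ok) in check_versions(PINNED).items():
    status = "ok" if ok else "mismatch"
    print(f"{name}: installed={installed} pinned={wanted} -> {status}")
```

A mismatch does not necessarily mean the code will fail, but matching the pinned versions is the safest way to reproduce the reported setup.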

Cite

If you use the code, please cite this paper:

S. Kotitsas, D. Pappas, I. Androutsopoulos, R. McDonald and M. Apidianaki, "Embedding Biomedical Ontologies by Jointly Encoding Network Structure and Textual Node Descriptors". Proceedings of the 18th Workshop on Biomedical Natural Language Processing (BioNLP 2019) of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 2019.
