Joint-Retrofitting

Natural Language Processing Laboratory at National Taiwan University

Overview

The joint sense retrofitting model uses contextual and ontological information to derive sense vectors. Sense embeddings are learned iteratively by constraining the distance between each sense vector and its word form vector, its sense neighbors, and its contextual neighbors. You can use this tool to quickly create sense embeddings from any pre-trained word vectors. I also provide an evaluation program and four benchmark datasets so you can easily test your new sense vectors.
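To make the iterative idea concrete, here is a simplified sketch of a retrofitting-style update in the spirit of Faruqui et al. It is not the code in joint_retrofit.py and does not reproduce the paper's exact objective: the contextual-neighbor term is left out, and the function name retrofit, the alpha weight, and the sense labels such as bank%river are assumptions made for illustration. It uses plain Python lists to stay dependency-free.

```python
def retrofit(word_vecs, sense_to_word, neighbors, alpha=1.0, num_iter=10):
    """Pull each sense vector toward its word form vector and its neighbors.

    word_vecs:     {word: [float, ...]} pre-trained word vectors
    sense_to_word: {sense: word} word form of each sense (hypothetical layout)
    neighbors:     {sense: [(neighbor_sense, weight), ...]}
    alpha:         weight of the word form vector (assumed, not from the paper)
    """
    # Start every sense vector as a copy of its word form vector.
    sense_vecs = {s: list(word_vecs[w]) for s, w in sense_to_word.items()}
    for _ in range(num_iter):
        for sense, word in sense_to_word.items():
            total = alpha
            acc = [alpha * x for x in word_vecs[word]]
            for nb, weight in neighbors.get(sense, []):
                if nb not in sense_vecs:
                    continue  # neighbor has no vector, skip it
                total += weight
                acc = [a + weight * x for a, x in zip(acc, sense_vecs[nb])]
            # New sense vector is the weighted average of anchor and neighbors.
            sense_vecs[sense] = [a / total for a in acc]
    return sense_vecs


# Toy data: two senses that list each other as neighbors.
word_vecs = {"bank": [1.0, 0.0], "shore": [0.0, 1.0]}
sense_to_word = {"bank%river": "bank", "shore%1": "shore"}
neighbors = {
    "bank%river": [("shore%1", 1.0)],
    "shore%1": [("bank%river", 1.0)],
}
result = retrofit(word_vecs, sense_to_word, neighbors)
print(result["bank%river"])
```

With equal weights, each sense vector settles between its own word form vector and its neighbor's vector, staying closer to its own word form.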

Requirements

Python 3
NumPy

Data

Word vector file

A file containing a pre-trained word vector model. Each line holds one word followed by the components of its vector, separated by spaces, for example: the -1.0 0.1 0.2
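As a rough illustration of this format, a minimal loader might look like the following. The function name load_word_vectors and the sample lines are assumptions for this sketch and are not taken from joint_retrofit.py. It uses plain Python lists to stay dependency-free, and it does not handle the count-and-dimension header line that some Word2Vec text exports include.

```python
import io


def load_word_vectors(lines):
    """Parse an iterable of text lines into a {word: [float, ...]} dict."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # First token is the word, the rest are vector components.
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# An in-memory stand-in for a word vector file (sample values are made up).
sample = io.StringIO("the -1.0 0.1 0.2\ncat 0.3 0.4 0.5\n")
vectors = load_word_vectors(sample)
print(vectors["the"])
```

For a real file, the same function could be called on an open file handle, for example inside a with open(path, encoding="utf-8") block.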

P.S. You can download pre-trained word vectors from Word2Vec or GloVe.

Lexicon file (provided in thesaurus_ontology/)

An ontology file that contains words and their synonyms. Each line represents a word and all its synonyms. The format is: <wordsense><weight> <neighbor-1><weight> <neighbor-2><weight> ...

P.S. I used Thesaurus-API to parse the ontology.

Word similarity evaluation dataset (provided in eval_data/)

Program Execution

$ python joint_retrofit.py -i word_vec_file -l lexicon_file -n num_iter -o out_vec_file
-i : path of word vectors input file
-l : path of ontology file
-n : number of iterations (default : n=10)
-o : path of output file

Example :

python joint_retrofit.py -i word_vec_file -l ontology_file -n num_iter -o out_vec_file
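For readers who want to see how options like these are typically handled, the sketch below parses the four documented flags with argparse. It is not the parser used in joint_retrofit.py: the function name build_parser, the destination names, the choice to mark -i, -l and -o as required, and the file names in the usage line are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    parser.add_argument("-i", dest="input", required=True,
                        help="path of word vectors input file")
    parser.add_argument("-l", dest="lexicon", required=True,
                        help="path of ontology file")
    parser.add_argument("-n", dest="num_iter", type=int, default=10,
                        help="number of iterations (default: 10)")
    parser.add_argument("-o", dest="output", required=True,
                        help="path of output file")
    return parser


# Placeholder file names, used only to exercise the parser.
args = build_parser().parse_args(
    ["-i", "vectors.txt", "-l", "lexicon.txt", "-o", "out.txt"]
)
print(args.num_iter)
```

Leaving out -n falls back to the documented default of 10 iterations.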

Evaluation

$ python we_sensesim.py word_vec_file

This program shows the cosine similarity score of the word vectors on each dataset. The eval_data/ directory contains the MEN, MTurk, RW, and WS353 datasets. You can add more evaluation datasets to test your word vectors.
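The sketch below shows one way such a score can be computed: cosine similarity for each word pair, then a rank correlation against human ratings. Word similarity benchmarks like these are commonly scored with Spearman correlation, but whether we_sensesim.py does exactly this is an assumption, as are the function names, the toy vectors, and the ratings. Plain Python is used to keep the example dependency-free.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


# Toy vectors and made-up human ratings, for illustration only.
vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
pairs = [("cat", "dog", 9.0), ("cat", "car", 2.0), ("dog", "car", 3.0)]
model = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
human = [score for _, _, score in pairs]
print(spearman(model, human))
```

A value near 1 means the model orders the word pairs the same way the human ratings do.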

Reference

Pennington, J. et al. 2014. GloVe: Global vectors for word representation.
Jauhar, S.K. et al. 2015. Ontologically grounded multi-sense representation learning for semantic vector space models.
Faruqui, M. et al. 2015. Retrofitting word vectors to semantic lexicons.

How to cite this resource

Please cite the following paper when referring to Joint-Retrofitting in academic publications.

Ting-Yu Yen, Yang-Yin Lee, Hen-Hsen Huang and Hsin-Hsi Chen (2018). “That Makes Sense: Joint Sense Retrofitting from Contextual and Ontological Information.” In Proceedings of the Web Conference 2018, poster, 23-27 April 2018, Lyon, France.

Contact

Feel free to contact me if there are any problems.
