Exploiting the Global WordNet Graph in Neural Word Sense Disambiguation by Integrating Personalized PageRank
This repo hosts the code necessary to reproduce the results of our EMNLP 2021 paper, Exploiting the Global WordNet Graph in Neural Word Sense Disambiguation by Integrating Personalized PageRank, by Ahmed ElSheikh, Michele Bevilacqua and Roberto Navigli, which you can read in the ACL Anthology.
This repository relies on Simone's code.
```bibtex
@inproceedings{elsheikh-bevilacqua-navigli-2020-breaking,
    title = "Exploiting the Global WordNet Graph in Neural Word Sense Disambiguation by Integrating Personalized PageRank",
    author = "ElSheikh, Ahmed and Bevilacqua, Michele and Navigli, Roberto",
    year = "2021",
    address = "Online",
    publisher = "Empirical Methods in Natural Language Processing",
}
```
Neural Word Sense Disambiguation (WSD) has recently been shown to benefit from the incorporation of pre-existing knowledge, such as that coming from the WordNet graph. However, state-of-the-art approaches have been successful in exploiting only the local structure of the graph, with only close neighbors of a given synset influencing the prediction. In this work, we improve a classification model by recomputing logits as a function of both the vanilla independently produced logits and the global WordNet graph. We achieve this by incorporating an online neural approximated PageRank, which enables us to refine edge weights as well. This method allows us to exploit the global graph structure while keeping space requirements linear in the number of edges. We obtain strong improvements, matching the current state of the art.
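The propagation mechanism described above builds on Personalized PageRank. As a rough illustration only (a dense power-iteration sketch on a toy graph, not the paper's online neural approximation, and with made-up edge weights):

```python
import numpy as np

def personalized_pagerank(adj, personalization, alpha=0.85, iters=50):
    """Power iteration for Personalized PageRank on a dense adjacency matrix.

    adj: (n, n) nonnegative edge weights, adj[i, j] = weight of edge i -> j.
    personalization: (n,) restart distribution (e.g. one-hot on target nodes).
    """
    # Row-normalize outgoing weights, then transpose to get a
    # column-stochastic transition matrix.
    out = adj.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0  # guard against dangling nodes
    transition = (adj / out).T

    p = personalization / personalization.sum()
    scores = p.copy()
    for _ in range(iters):
        # With prob. alpha follow an edge, with prob. (1 - alpha) restart at p.
        scores = alpha * (transition @ scores) + (1 - alpha) * p
    return scores

# Toy 4-node graph: 0 -> 1, 1 -> {0, 2}, 2 -> 3, 3 -> 0.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0]], dtype=float)
restart = np.array([1.0, 0.0, 0.0, 0.0])  # personalize on node 0
scores = personalized_pagerank(adj, restart)
print(scores)  # probability mass concentrates on node 0 and its neighborhood
```

In the paper's setting the graph is the full WordNet synset graph and the restart distribution comes from the classifier's logits, so the stationary scores let distant synsets influence the final prediction rather than only close neighbors.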
- Make sure miniconda is installed; if not, install it first.
- It is recommended to create a fresh `conda` env to use the repo:

```shell
conda create -n ewiser_ext python=3.6.9 pip
conda activate ewiser_ext
git clone https://github.com/elsheikh21/nlp_thesis.git
pip install -r requirements.txt
pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-sparse torch-scatter -f https://pytorch-geometric.com/whl/torch-1.5.0+cu101.html
```
- If APEX needs to be installed:

```shell
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
- SemCor
- SemCor + untagged glosses
- SemCor + tagged glosses + WordNet Examples
- WSD Evaluation Framework: contains the SemCor training corpus, along with the evaluation datasets from Senseval and SemEval.
Preprocessed SensEmBERT+LMMS embeddings are needed to train your model.
- Edit `predict_eval_script.sh` to add your `<ckpt_dir>/<best_mdl_path>`.
- Then run the following:

```shell
cd wsd_thesis
sh predict_eval_script.sh
# OR, to log the results:
nohup sh predict_eval_script.sh > eval_script.out
```
- All flags related to training & model params can be found in `train.py` & `wsd/models/model.py`.
- Run the following script:

```shell
cd yat_thesis
sh train.sh
```

- Or, to run it in the background:

```shell
nohup sh train.sh > experiment_name.out &
```
This project is released under the CC-BY-NC 4.0 license (see LICENSE.txt). If you use EWISER, please link to this repo.
The authors gratefully acknowledge the support of the ERC Consolidator Grant MOUSSE No. 726487 under the European Union's Horizon 2020 research and innovation programme.
This work was supported in part by the MIUR under the grant "Dipartimenti di eccellenza 2018-2022" of the Department of Computer Science of the Sapienza University of Rome.