BestActionNow/SemiSupBLI

Semi-Supervised Bilingual Lexicon Induction with Two-Way Message Passing Mechanisms

This repository contains the implementation of our two proposed semi-supervised approaches, CSS and PSS, for bilingual lexicon induction (BLI).

Dependencies

  • python 3.7
  • Pytorch
  • Numpy
  • Faiss

How to get the datasets

You need to download the MUSE dataset from here to the ./muse_data directory.

You need to download the VecMap dataset from here to the ./vecmap_data directory.

How to run

Run the following command to evaluate CSS on the MUSE dataset with the "5k all" annotated lexicon:

python main.py --config_file ./configs/config-CSS-muse-en-es-5kall.yaml

Run the following command to evaluate PSS on the VecMap dataset with the "5k all" annotated lexicon:

python main.py --config_file ./configs/config-PSS-vecmap-en-es-5kall.yaml
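The two commands above can also be launched from a small driver script, which may be convenient when evaluating several configurations in a row. The following is only a sketch: the helper names `build_command` and `run_all` are hypothetical and not part of this repository, and only the `main.py` entry point, the `--config_file` flag, and the two config paths come from the commands shown above.

```python
import subprocess
import sys

# Config paths taken from the two commands shown above.
CONFIGS = [
    "./configs/config-CSS-muse-en-es-5kall.yaml",
    "./configs/config-PSS-vecmap-en-es-5kall.yaml",
]


def build_command(config_file):
    """Return the argument list equivalent to
    `python main.py --config_file <config_file>` (hypothetical helper)."""
    return [sys.executable, "main.py", "--config_file", config_file]


def run_all(configs, dry_run=True):
    """Print each command (dry run) or execute it from the repository root.

    Returns the list of commands so callers can inspect them.
    """
    commands = [build_command(path) for path in configs]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing evaluation.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_all(CONFIGS)` only prints the commands; pass `dry_run=False` from the repository root to actually run them.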

Configuration

The most important fields in the configuration file are:

  • "method" specifies the model to evaluate: "CSSBli" for CSS or "PSSBli" for PSS.
  • "src" and "tgt" indicate the source and target languages of the BLI task.
  • "data_params/data_dir" specifies which dataset to use: "./muse_data/" for MUSE or "./vecmap_data/" for VecMap.
  • "supervised/max_count" indicates the size of the annotated lexicon: "-1" for "5k all", "100" for "100 unique" and "5000" for "5000 unique".

Other fields specify the hyperparameters for CSS and PSS.
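To make the fields above concrete, here is a minimal sketch of how such a configuration might look once loaded into Python. It rests on assumptions: the nesting is inferred from the slash-separated field names, the "en" and "es" values are guessed from the config file names, and `get_field` and `describe` are hypothetical helpers that do not exist in this repository. The actual YAML files in ./configs are the authoritative reference.

```python
# Hypothetical in-memory view of a config file. The nested layout is an
# assumption inferred from names such as "data_params/data_dir".
example_config = {
    "method": "CSSBli",                      # "CSSBli" for CSS, "PSSBli" for PSS
    "src": "en",                             # assumed language code
    "tgt": "es",                             # assumed language code
    "data_params": {"data_dir": "./muse_data/"},
    "supervised": {"max_count": -1},         # -1 means "5k all"
}

# Value-to-meaning tables taken from the field descriptions above.
METHODS = {"CSSBli": "CSS", "PSSBli": "PSS"}
DATASETS = {"./muse_data/": "MUSE", "./vecmap_data/": "VecMap"}
LEXICONS = {-1: "5k all", 100: "100 unique", 5000: "5000 unique"}


def get_field(config, path):
    """Resolve a slash-separated path such as "data_params/data_dir"."""
    node = config
    for key in path.split("/"):
        node = node[key]
    return node


def describe(config):
    """Summarize the experiment that a config dictionary selects."""
    method = METHODS[get_field(config, "method")]
    dataset = DATASETS[get_field(config, "data_params/data_dir")]
    lexicon = LEXICONS[get_field(config, "supervised/max_count")]
    pair = f'{get_field(config, "src")}-{get_field(config, "tgt")}'
    return f"{method} on {dataset} ({pair}), lexicon: {lexicon}"
```

With the dictionary above, `describe(example_config)` returns "CSS on MUSE (en-es), lexicon: 5k all". An unknown method, dataset path, or lexicon size raises a `KeyError`, which makes typos in a config visible early.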

About

The implementation of the paper accepted at EMNLP 2020.
