FGraDA


Resources and code for our paper "FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation". This project implements several baselines used in our paper. The implementation is built upon NJUNMT. Please cite our paper if you find this repository helpful in your research:

@article{zhu2021fgrada,
  title={FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation},
  author={Zhu, Wenhao and Huang, Shujian and Pu, Tong and Huang, Pingxuan and Zhang, Xu and Yu, Jian and Chen, Wei and Wang, Yanfeng and Chen, Jiajun},
  journal={arXiv preprint arXiv:2012.15717},
  year={2021}
}

Requirements

  • python==3.8.10
  • pytorch==1.6.0
  • PyYAML==5.4.1
  • tensorboardX==2.4.0
  • sacrebleu==2.0.0
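
These pinned versions can be installed with pip; a minimal sketch, assuming Python 3.8.10 is already available (note that PyTorch is published on PyPI as torch):

pip install torch==1.6.0 PyYAML==5.4.1 tensorboardX==2.4.0 sacrebleu==2.0.0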

Instructions

We use an example to show how to run our code.

Data

For convenience, we provide both the raw data and the pre-processed data of FGraDA, which can be downloaded here.

Train Base Model

bash ../run_scripts/train.sh
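
train.sh is a wrapper around the NJUNMT training entry point. As a hypothetical sketch of what such a wrapper typically contains (the module path, model name, and file paths below are assumptions, not values from this repository; consult run_scripts/train.sh for the actual flags):

# Hypothetical NJUNMT-style training call; all names and paths are placeholders.
python -m src.bin.train \
    --model_name "transformer" \
    --config_path ./configs/train.yaml \
    --log_path ./logs \
    --saveto ./checkpoints \
    --use_gpu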

Finetune Model on Parallel Data

bash ../run_scripts/finetune.sh

Inference with Grid Beam Search

To prepare for grid beam search, run ./scripts/build_constraint.py to generate the JSON constraint file before running the following script.
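
The exact command-line interface of build_constraint.py is defined by the script itself; a hypothetical invocation might look like the following, where every flag and file name is illustrative:

# Hypothetical invocation of the constraint builder; check
# scripts/build_constraint.py for the real arguments. The resulting JSON
# records, per source sentence, the dictionary translations to be enforced
# as lexical constraints during grid beam search.
python ./scripts/build_constraint.py \
    --dict ./data/dict.zh-en.txt \
    --input ./data/test.av.zh \
    --output ./constraints.av.json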

bash ../run_scripts/translate_grid_beam_search.sh

We recommend the following weight hyper-parameters to replicate the results of DictGBS and WikiBT+DictGBS on the four FGraDA domains: autonomous vehicles (AV), AI education (AIE), real-time networks (RTN), and smart phone (SP).

Model            AV    AIE    RTN    SP
DictGBS          0.3   0.35   0.15   0.35
WikiBT+DictGBS   0.4   0.25   0.05   0.35
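
How the weight is passed to the script depends on run_scripts/translate_grid_beam_search.sh; one hypothetical way to sweep the recommended DictGBS weights over the four domains, assuming the script reads DOMAIN and WEIGHT environment variables, is:

# Hypothetical loop; DOMAIN and WEIGHT are assumed environment variables,
# not an interface confirmed by this repository.
for pair in "AV:0.3" "AIE:0.35" "RTN:0.15" "SP:0.35"; do
    DOMAIN="${pair%%:*}" WEIGHT="${pair##*:}" \
        bash ../run_scripts/translate_grid_beam_search.sh
done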

Inference with Beam Search

bash ../run_scripts/translate_beam_search.sh