FGraDA


Resources and code for our paper "FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation". This project implements several baselines used in our paper. The implementation is built upon NJUNMT. Please cite our paper if you find this repository helpful in your research:

@article{zhu2021fgrada,
  title={FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation},
  author={Zhu, Wenhao and Huang, Shujian and Pu, Tong and Huang, Pingxuan and Zhang, Xu and Yu, Jian and Chen, Wei and Wang, Yanfeng and Chen, Jiajun},
  journal={arXiv preprint arXiv:2012.15717},
  year={2021}
}

Requirements

  • python==3.8.10
  • pytorch==1.6.0
  • PyYAML==5.4.1
  • tensorboardX==2.4.0
  • sacrebleu==2.0.0
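
These pinned versions can be installed with pip; a minimal sketch, assuming Python 3.8.10 is already available (note that PyTorch is published on PyPI as torch):

pip install torch==1.6.0 PyYAML==5.4.1 tensorboardX==2.4.0 sacrebleu==2.0.0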

Instructions

We use an example to show how to run our code.

Data

For convenience, we provide both the raw data and the pre-processed data of FGraDA, which can be downloaded here.

Train Base Model

bash ../run_scripts/train.sh
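
train.sh is a wrapper around the NJUNMT training entry point. As a hypothetical sketch of what such a wrapper typically contains (the module path, model name, and file paths below are assumptions, not values from this repository; consult run_scripts/train.sh for the actual flags):

# Hypothetical NJUNMT-style training call; all names and paths are placeholders.
python -m src.bin.train \
    --model_name "transformer" \
    --config_path ./configs/train.yaml \
    --log_path ./logs \
    --saveto ./checkpoints \
    --use_gpu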

Finetune Model on Parallel Data

bash ../run_scripts/finetune.sh

Inference with Grid Beam Search

To prepare for grid beam search, run ./scripts/build_constraint.py to generate the JSON constraint file before running the following script.
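
The exact command-line interface of build_constraint.py is defined by the script itself; a hypothetical invocation might look like the following, where every flag and file name is illustrative:

# Hypothetical invocation of the constraint builder; check
# scripts/build_constraint.py for the real arguments. The resulting JSON
# records, per source sentence, the dictionary translations to be enforced
# as lexical constraints during grid beam search.
python ./scripts/build_constraint.py \
    --dict ./data/dict.zh-en.txt \
    --input ./data/test.av.zh \
    --output ./constraints.av.json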

bash ../run_scripts/translate_grid_beam_search.sh

We recommend the following weight hyper-parameters to replicate the results of DictGBS and WikiBT+DictGBS on the four FGraDA domains: autonomous vehicles (AV), AI education (AIE), real-time networks (RTN), and smart phone (SP).

Model            AV    AIE    RTN    SP
DictGBS          0.3   0.35   0.15   0.35
WikiBT+DictGBS   0.4   0.25   0.05   0.35
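
How the weight is passed to the script depends on run_scripts/translate_grid_beam_search.sh; one hypothetical way to sweep the recommended DictGBS weights over the four domains, assuming the script reads DOMAIN and WEIGHT environment variables, is:

# Hypothetical loop; DOMAIN and WEIGHT are assumed environment variables,
# not an interface confirmed by this repository.
for pair in "AV:0.3" "AIE:0.35" "RTN:0.15" "SP:0.35"; do
    DOMAIN="${pair%%:*}" WEIGHT="${pair##*:}" \
        bash ../run_scripts/translate_grid_beam_search.sh
done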

Inference with Beam Search

bash ../run_scripts/translate_beam_search.sh