ybisk/charNMT-noise

charNMT-noise

Scripts and noise data for Belinkov & Bisk, "Synthetic and Natural Noise Both Break Neural Machine Translation", ICLR 2018.
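As a rough illustration of what "synthetic noise" can mean at the character level, here is a minimal sketch of one such perturbation: swapping two adjacent interior characters of each word while leaving the first and last characters in place. This is an assumption-labeled example, not the repository's script. The function names `swap_noise` and `noisify` are hypothetical, and the four-character minimum length is an illustrative choice.

```python
import random


def swap_noise(word: str, rng: random.Random) -> str:
    """Swap two adjacent interior characters of a word (illustrative sketch).

    The first and last characters stay in place. Words shorter than four
    characters are returned unchanged, since they have fewer than two
    interior characters to swap (an assumed cutoff).
    """
    if len(word) < 4:
        return word
    # Pick i so that both word[i] and word[i + 1] are interior characters.
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def noisify(sentence: str, seed: int = 0) -> str:
    """Apply swap_noise to every whitespace-separated token of a sentence."""
    rng = random.Random(seed)
    return " ".join(swap_noise(tok, rng) for tok in sentence.split())
```

For example, `noisify("the noise breaks translation", seed=3)` leaves "the" untouched and scrambles one interior pair in each longer word. Seeding a private `random.Random` keeps the noised test sets reproducible.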

MT Data

The experiments reported in the paper were conducted on the TED talks corpus prepared for IWSLT 2016, which is available on the WIT3 website.

Pretrained Models

Nematus: http://data.statmt.org/rsennrich/wmt16_systems/

char2char: https://github.com/nyu-dl/dl4mt-c2c

Sources of Natural Noise

French:

Aurélien Max and Guillaume Wisniewski. Mining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History. LREC 2010. (corpus)

German:

Katrin Wisniewski et al. MERLIN: an online trilingual learner corpus empirically grounding the European Reference Levels in authentic learner data. 2013. (corpus1, corpus2)

Czech:

Karel Sebesta et al. CzeSL grammatical error correction dataset (CzeSL-GEC). Technical report, LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics, Charles University. 2017. (corpus)
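The corpora above pair erroneous word forms with their corrections. One way to turn such pairs into natural noise is to build a table from each correct word to its observed errors and substitute from it. The sketch below assumes a hypothetical tab-separated `correct<TAB>erroneous` line format; the actual noise files in this repository may be laid out differently. The names `build_table` and `natural_noise` are illustrative only.

```python
import random
from typing import Dict, Iterable, List


def build_table(pairs: Iterable[str]) -> Dict[str, List[str]]:
    """Map each correct word to the erroneous forms observed for it.

    Assumes hypothetical lines of the form "correct<TAB>erroneous".
    Blank lines and identity pairs (no actual error) are skipped.
    """
    table: Dict[str, List[str]] = {}
    for line in pairs:
        line = line.strip()
        if not line:
            continue
        correct, error = line.split("\t", 1)
        if error != correct:
            table.setdefault(correct, []).append(error)
    return table


def natural_noise(sentence: str, table: Dict[str, List[str]], seed: int = 0) -> str:
    """Replace each word that has recorded errors with one of them, chosen at random."""
    rng = random.Random(seed)
    out = []
    for tok in sentence.split():
        errors = table.get(tok)
        out.append(rng.choice(errors) if errors else tok)
    return " ".join(out)
```

Words without any recorded error pass through unchanged, so the amount of noise depends on how well the error table covers the test vocabulary.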
