Ancient-Modern Chinese Translation with a New Large Training Dataset

This repo contains the dataset built in the following paper:

Ancient-Modern Chinese Translation with a New Large Training Dataset. Dayiheng Liu, Kexin Yang, Qian Qu, Jiancheng Lv, TALLIP 2019 [arXiv]

Overview

We create a new large-scale Ancient-Modern Chinese parallel corpus that contains 1.24M bilingual pairs. To the best of our knowledge, this is the first large, high-quality Ancient-Modern Chinese dataset.

Dataset

We plan to gradually release the dataset.

The dataset can be downloaded at the [link].
