
Partially-Aligned Data-to-Text Generation with Distant Supervision

[PDF] [Video] [Slides]

This is the code for the EMNLP 2020 paper "Partially-Aligned Data-to-Text Generation with Distant Supervision". The traditional data-to-text generation task requires well-aligned data, which is expensive to annotate. We relax this strict restriction and propose a new task that aims to utilize automatically constructed, partially-aligned data. This considerably expands the application domains where only partially-aligned data is available.

Requirements

  • GCC >= 4.8
  • Python >= 3.7
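
A quick, optional sanity check for these requirements (assumes gcc and python3 are on your PATH):

# Check toolchain versions before installing
gcc --version | head -n 1       # should report GCC 4.8 or newer
python3 --version               # should report Python 3.7 or newer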

Install

git clone https://github.com/fuzihaofzh/distant_supervision_nlg.git
cd distant_supervision_nlg
./scripts/setup.sh
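
As an optional check after setup, the helper scripts used in the following steps should now be present:

# Confirm the scripts referenced below exist in the cloned repository
ls scripts/setup.sh scripts/preprocess.sh scripts/train.sh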

Preprocess Data

./scripts/preprocess.sh wita50k

Train Baseline Model (Optional)

The model will be evaluated automatically during training.

# Train S2ST model
./scripts/train.sh wita50k base
# Check Score
tail -n 1 output/eval/wita50k__base/eval.100.txt
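
If training writes one evaluation file per checkpoint (the eval.100.txt name above suggests an eval.<step>.txt pattern, which is an assumption), you can print the final line of every file to follow the score over time:

# Print the last score line of each eval file (assumes eval.<step>.txt naming)
for f in output/eval/wita50k__base/eval.*.txt; do
    echo "$f: $(tail -n 1 "$f")"
done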

Train Our DSG Model

The model will be evaluated automatically during training.

# Step 1. SE Training
./scripts/train.sh wita50k endorsement,pretrain
# Step 2. S2SG Training
./scripts/train.sh wita50k endorsement,beam_endorse
# Check Score
tail -n 1 output/eval/wita50k__endorsement,beam_endorse/eval.100.txt
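
To compare the baseline and DSG runs side by side, you can print both final score lines (assuming both runs have finished and produced the eval.100.txt files shown above):

# Compare the last reported scores of the baseline and DSG models
echo "base: $(tail -n 1 output/eval/wita50k__base/eval.100.txt)"
echo "DSG:  $(tail -n 1 output/eval/wita50k__endorsement,beam_endorse/eval.100.txt)"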

Cite

@inproceedings{fu2020partially,
  title={Partially-Aligned Data-to-Text Generation with Distant Supervision},
  author={Fu, Zihao and Shi, Bei and Lam, Wai and Bing, Lidong and Liu, Zhiyuan},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages={9183--9193},
  year={2020}
}
