Skip to content

AtmaHou/Seq2SeqDataAugmentationForLU

Repository files navigation

Sequence-to-sequence Data Augmentation for Dialogue Language Understanding

用户输入多样性拓展生成

Author: Atma

Update: 2018/6/7

Introduction

This repo is code for the COLING 2018 paper: Sequence-to-sequence Data Augmentation for Dialogue Language Understanding

Data

Get ATIS in Data dir.

Get full StandfordLU data at link, which contains both slot and intent labels for full data. To use this data, please cite:

@inproceedings{hou2018coling,
	author    = {Yutai Hou and
	Yijia Liu and
	Wanxiang Che and
	Ting Liu},
	title     = {Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding},
	booktitle = {Proc. of COLING},
	pages     = {1234--1245},
	year      = {2018},
}

Get started

The following steps show code usage for the ATIS dataset.

  • Step1: Clustering Sentences

    python3 run_clustering.py -d atis

Tips:

To remove clustering effects for baseline setting i.e. cluster all data into one class:
python3 run_clustering.py -cm no_clustering -d atis
  • Step2: Prepare data

    python3 run_onmt_generation.py -gd

Tips:

There are some alternatives for baseline setting:

No clustering, Full connect , no index
python3 run_onmt_generation.py -gd -pm circle -ni -nc

Full connect , no index
python3 run_onmt_generation.py  -gd -pm full_connect -ni

Diverse connect, no index
python3 run_onmt_generation.py  -gd -ni

Diverse connect, no filtering
python3 run_onmt_generation.py  -gd -fr 1
  • Step3: Seq2Seq Generation

    python3 run_onmt_generation.py -t atis_labeled -f

Tips:

Again, alternatives for baseline:

No clustering, Full connect , no index
python3 run_onmt_generation.py -t atis_labeled -f -pm circle -ni -nc

Full connect , no index  ===> running
python3 run_onmt_generation.py -t atis_labeled -f -pm full_connect -ni

Diverse connect, no index
python3 run_onmt_generation.py -t atis_labeled -f -ni

Diverse connect, no filtering
CUDA_VISIBLE_DEVICES="1" python3 run_onmt_generation.py  -t atis_labeled -f -fr 1
  • Step4: Surface Realization

    python3 run_onmt_generation.py -t atis_labeled -rf

Tips:

For surface realization only baseline:
python3 run_thesaurus.py -t atis_labeled -rf
  • Step5: Generate Conll Format Data

    python3 run_slot_filling_evaluation.py -t atis_labeled -gd xiaoming -cd

Tips: For surface realization only baseline: python3 run_slot_filling_evaluation.py -t atis_labeled -gd xiaoming -cd -rfo

Notice

As the slot-filling used by our work is simply Bi-LSTM and our augmentation method suit for all slot-filling algorithm, we only release the seq2seq argumentation part and CONLL format data generation part.

You can add your own slot filling algorithm.

About

This repo is code for the COLING 2018 paper: Sequence-to-sequence Data Augmentation for Dialogue Language Understanding.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages