Hierarchical Context Tagging for Utterance Rewriting

Data Preprocessing

To extract labels and rules automatically, run the following commands. For the Chinese Rewrite dataset, replace data_preprocess_en with data_preprocess_zh. You can replace pipeline_canard.sh with either pipeline_mudoco.sh (for MuDoCo) or pipeline.sh (for Rewrite).

Download preprocessed data

Download RaST_data.tar.gz from Google Drive, then extract it and move each dataset into its preprocessing directory:

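# Unpack the archive into RaST_data/
tar -xzvf RaST_data.tar.gz
# English datasets (CANARD, MuDoCo)
mv RaST_data/canard* data_preprocess_en
mv RaST_data/mudoco* data_preprocess_en
# Chinese dataset (Rewrite)
mv RaST_data/rewrite* data_preprocess_zh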

Training

Return to the root directory and edit line 22 of train.sh so that it points to the model directory under experiments/ that contains params.json. Then run sh train.sh <dataset>. The top two checkpoints are saved in that directory, each in a subdirectory named after its epoch number (e.g., experiments/canard/05).
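Before launching training, it can help to confirm that the model directory really contains params.json. The following is a minimal sketch, not part of the repository: the helper name check_model_dir is hypothetical, and experiments/canard is only an example directory taken from the path mentioned above.

```shell
# Hypothetical helper (not part of the repository): succeeds only if
# the given model directory contains a params.json file.
check_model_dir() {
    [ -f "$1/params.json" ]
}

# Example directory (assumption); use the one you set on line 22 of train.sh.
MODEL_DIR="experiments/canard"

if check_model_dir "$MODEL_DIR"; then
    echo "found $MODEL_DIR/params.json"
else
    echo "missing $MODEL_DIR/params.json" >&2
fi
```

If the check passes, sh train.sh <dataset> can be run as described.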

Model checkpoints

Download a checkpoint archive, then extract it and move the resulting directory into experiments:

tar -xzvf <checkpoint_tar>  # e.g., canard21_03-16.tar.gz
mv <checkpoint_dir> experiments  # e.g., canard21_03-16

Evaluation

Modify line 16 of test.sh to point to the correct model directory. Then run the following command:

sh test.sh <dataset> <epoch_number>  # e.g., sh test.sh canard 05

From checkpoints

The best-performing checkpoint for each dataset is listed below.

| Dataset | Path |
| --- | --- |
| CANARD | `experiments/canard21_03-16/05` |
| MuDoCo | `experiments/mudoco21_03-16/19` |
| Rewrite | `experiments/rewrite21_03_19/19` |

To evaluate an existing checkpoint, modify line 16 of test.sh to point to the corresponding checkpoint directory. The best epoch number for each checkpoint is the last component of its path in the table above (e.g., 05 for CANARD).
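The epoch lookup can be sketched as a small shell helper that builds the evaluation command for a dataset. This is an illustration only: best_epoch is a hypothetical function, the epoch values are copied from the table above, and the lowercase names mudoco and rewrite are assumed to match what test.sh expects (only canard appears in the example command). The sketch prints the command instead of running it, since line 16 of test.sh still has to be edited for each checkpoint.

```shell
# Hypothetical helper (not part of the repository): prints the best
# epoch for a dataset, using the values from the checkpoint table.
best_epoch() {
    case "$1" in
        canard)  echo "05" ;;
        mudoco)  echo "19" ;;   # dataset name is an assumption
        rewrite) echo "19" ;;   # dataset name is an assumption
        *)       echo "unknown dataset: $1" >&2; return 1 ;;
    esac
}

dataset="canard"
# Print the evaluation command rather than executing it.
epoch=$(best_epoch "$dataset") && echo "sh test.sh $dataset $epoch"
# prints: sh test.sh canard 05
```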
