Planned-PTM

This is the codebase of our NAACL 2022 Findings paper Revisiting Generative Commonsense Reasoning: A Pre-Ordering Approach.

We find that in keywords-to-text generation tasks, the order of the input keywords can significantly affect the generation quality of pre-trained LMs. We therefore propose a simple pre-ordering approach that carefully manipulates the order of the input keywords before generation.
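
Conceptually, a plan is an ordering of the input keywords derived from a reference sentence: the gold target sentence for the oracle plans (Step 1.2), or a first-pass generation for the predicted plans (Step 3.2). The snippet below is only a minimal sketch of one way to build such an ordering (sort keywords by first mention), not the actual logic of src/data/create_plan.py; in particular, the real script would also need to match inflected word forms (e.g., "throws" vs. "throw").

def preorder_keywords(keywords, reference):
    """Reorder keywords by their first occurrence in a reference sentence.

    Keywords that are not found keep their original relative order and are
    appended at the end (sorted() is stable).
    """
    tokens = reference.lower().split()

    def first_position(kw):
        return tokens.index(kw) if kw in tokens else len(tokens)

    return sorted(keywords, key=first_position)

# Hypothetical example: with exact token matching, only "frisbee" and "dog"
# are found in the reference, so the result is
# ['frisbee', 'dog', 'catch', 'throw'].
preorder_keywords(["dog", "frisbee", "catch", "throw"],
                  "the man throws a frisbee and his dog catches it")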

The code is inherited from KG-BART, which is based on Huggingface Transformers; we only follow the implementation of the vanilla BART model. We thank the authors for making these implementations public.

Step 1. Data Preparation

1.1. Put the train/dev/test sets under the dataset/ folder.

We use CommonGen as the dataset for the experiments in the paper. We provide toy examples under this folder to demonstrate the data format.
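
The src and tgt files are expected to be parallel line by line: each line of a *.src_new.txt file holds one keyword set, and the corresponding line of the *.tgt.txt file holds a reference sentence. A hypothetical illustration (the values below are made up; see the toy files for the exact format):

commongen.train.src_new.txt:  dog frisbee catch throw
commongen.train.tgt.txt:      A dog catches the frisbee that the boy throws.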

1.2. Create oracle plans for each data subset.

for data_split in train dev test
do
    python src/data/create_plan.py --src_file dataset/commongen.${data_split}.src_new.txt \
                                   --tgt_file  dataset/commongen.${data_split}.tgt.txt \
                                   --plan_file dataset/commongen.${data_split}.src_new.txt.oracle
done
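
For the hypothetical example above, the resulting commongen.train.src_new.txt.oracle line would contain the same keywords, reordered to follow their first mention in the reference sentence, i.e. something like:

dog catch frisbee throw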

Step 2. Model Training

The following two training runs are independent and can be done in parallel.

2.1. Train a random model using the random-order input.

cuda_device=0
exp_name=bart_random

CUDA_VISIBLE_DEVICES=${cuda_device} python src/training/run_seq2seq.py \
        --data_dir dataset \
        --train_src_file commongen.train.src_new.txt \
        --train_tgt_file commongen.train.tgt.txt \
        --dev_src_file commongen.dev.src_new.txt \
        --dev_tgt_file commongen.dev.tgt.txt \
        --output_dir output/${exp_name} \
        --log_dir log/${exp_name} \
        --train_batch_size 96 \
        --eval_batch_size 96 \
        --gradient_accumulation_steps 4 \
        --learning_rate 0.00003 \
        --num_train_epochs 3

2.2. Train a planned model using the oracle-planned input.

cuda_device=1
exp_name=bart_oracle

CUDA_VISIBLE_DEVICES=${cuda_device} python src/training/run_seq2seq.py \
        --data_dir dataset \
        --train_src_file commongen.train.src_new.txt.oracle \
        --train_tgt_file commongen.train.tgt.txt \
        --dev_src_file commongen.dev.src_new.txt.oracle \
        --dev_tgt_file commongen.dev.tgt.txt \
        --output_dir output/${exp_name} \
        --log_dir log/${exp_name} \
        --train_batch_size 96 \
        --eval_batch_size 96 \
        --gradient_accumulation_steps 4 \
        --learning_rate 0.00003 \
        --num_train_epochs 3
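
Both runs are expected to save their best checkpoint to output/${exp_name}/best_model/model.best.bin, which is the path the decoding commands in Step 3 load from.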

Step 3. Model Inference

The following three steps must be run in order.

3.1. Test the random model with the random-order input.

cuda_device=0
exp_name=bart_random
CUDA_VISIBLE_DEVICES=${cuda_device} python src/training/decode_seq2seq.py \
            --input_file dataset/commongen.test.src_new.txt \
            --model_recover_path output/${exp_name}/best_model/model.best.bin \
            --output_dir output/${exp_name}/best_model/Gen \
            --output_file model.best \
            --batch_size 100

3.2. Extract skeletons from the output of the random model to use as predicted plans.

python src/data/create_plan.py --src_file dataset/commongen.test.src_new.txt \
                               --tgt_file  output/${exp_name}/best_model/Gen/model.best \
                               --plan_file dataset/commongen.test.src_new.txt.plan

3.3. Test the oracle model with the predicted plan input.

cuda_device=0
exp_name=bart_oracle
CUDA_VISIBLE_DEVICES=${cuda_device} python src/training/decode_seq2seq.py \
            --input_file dataset/commongen.test.src_new.txt.plan \
            --model_recover_path output/${exp_name}/best_model/model.best.bin \
            --output_dir output/${exp_name}/best_model/Gen \
            --output_file model.best \
            --batch_size 100
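
The generations of the full pipeline (the planned model decoding on predicted plans) should then be under output/bart_oracle/best_model/Gen/model.best.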

Citation

@inproceedings{zhao2022revisiting,
  title={Revisiting Generative Commonsense Reasoning: A Pre-Ordering Approach},
  author={Zhao, Chao and Brahman, Faeze and Huang, Tenghao and Chaturvedi, Snigdha},
  booktitle={Findings of the Association for Computational Linguistics: NAACL 2022},
  pages={1709--1718},
  year={2022}
}
