Chinese Idiom Paraphrasing

Chinese Idiom Paraphrasing (CIP) aims to rephrase the idioms in an input sentence so as to generate a fluent, meaning-preserving sentence that contains no idioms.
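
For example (an illustrative pair written for this README, not taken from the dataset): given the input 他做事总是半途而废 ("he always gives up halfway", which contains the idiom 半途而废), a CIP system should produce something like 他做事总是做到一半就放弃 ("he always quits when he is halfway through"), which preserves the meaning without using the idiom.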

This repository provides the CIP dataset and several approaches:

  • LSTM approach

  • Transformer approach

  • mt5 approach

  • infill approach

Dependencies

  • Python>=3.6
  • torch>=1.7.1
  • transformers==4.8.0
  • fairseq==0.10.2
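
For reference, the pinned dependencies can be installed with pip (a minimal sketch; jieba is also needed for the preprocessing step in the Train section):

pip install "torch>=1.7.1" "transformers==4.8.0" "fairseq==0.10.2" jieba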

Pre-trained models

You can download all pre-trained models here (4s9n) and put them into the model directory.

If you want to train models from scratch, you need the pre-trained language model t5-pegasus (ZhuiyiTechnology); download it and place it under the model directory.
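
The scripts expect a layout roughly like the following (the exact sub-directory names under model are an assumption; follow whatever paths the training scripts reference):

mkdir -p model
# place the downloaded CIP checkpoints here, and e.g.
# model/t5_pegasus/  -- config, vocab and PyTorch weights from ZhuiyiTechnology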

Train

To train the LSTM and Transformer models with fairseq, you first need to preprocess the data: segment sentences with jieba and tokenize them with BPE. We use the scripts from subword-nmt:

git clone https://github.com/rsennrich/subword-nmt

Then run

sh prepare.sh
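
For orientation, prepare.sh is expected to perform roughly the following steps (file names, language suffixes, and the number of BPE merge operations are assumptions, not the script's exact contents):

# 1. word-segment the raw sentences with jieba (space-separated tokens)
python -m jieba -d ' ' < train.src > train.src.seg
python -m jieba -d ' ' < train.tgt > train.tgt.seg
# 2. learn a BPE code on the segmented training data and apply it
python subword-nmt/subword_nmt/learn_bpe.py -s 10000 < train.src.seg > bpe.codes
python subword-nmt/subword_nmt/apply_bpe.py -c bpe.codes < train.src.seg > train.src.bpe
python subword-nmt/subword_nmt/apply_bpe.py -c bpe.codes < train.tgt.seg > train.tgt.bpe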

Train the LSTM, Transformer, t5-pegasus, and infill models:

sh train_lstm.sh
sh train_transformer.sh
sh train_t5_pegasus.sh
sh train_infill.sh
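
As a rough illustration of what the fairseq-based training scripts wrap (all flags, paths, and hyper-parameters below are assumptions; the provided shell scripts are authoritative):

fairseq-train data-bin/cip \
    --arch transformer --share-decoder-input-output-embed \
    --optimizer adam --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --save-dir checkpoints/transformer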

Generate

Run the following commands to generate paraphrases:

sh fairseq_generate.sh
sh generate_t5_pegasus.sh
sh generate_infill.sh
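
For the fairseq models, fairseq_generate.sh presumably wraps a call along these lines (data and checkpoint paths are assumptions):

fairseq-generate data-bin/cip \
    --path checkpoints/transformer/checkpoint_best.pt \
    --beam 5 --remove-bpe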

Citation

@article{qiang2022chinese,
  title={Chinese Idiom Paraphrasing},
  author={Qiang, Jipeng and Li, Yang and Zhang, Chaowei and Li, Yun and Yuan, Yunhao and Zhu, Yi and Wu, Xindong},
  journal={arXiv preprint arXiv:2204.07555},
  year={2022}
}
