asahi-research/script-for-transformer-based-seq2bf


script-for-transformer-based-seq2bf

This is a repository for Transformer-based Lexically Constrained Headline Generation (EMNLP'21). In this repository, we provide the script to preprocess the Japanese News Corpus (JNC) and split it into train/valid/test sets. The JNC is available for a fee (more details). Note that we use the 2019 version of the JNC.

Usage

sh run.sh

Example of run.sh

python ./src/jnc_filter.py \
    --input_path ./data/JNC-corpus.json \
    --output_path ./output/
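For readers adapting the pipeline, the following is a minimal sketch of how a command-line interface accepting the two flags shown above could be parsed with Python's standard argparse module. It is a hypothetical illustration, not the contents of src/jnc_filter.py; the function name parse_args and the help strings are assumptions.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two flags used in the example run.sh.

    Hypothetical sketch: it mirrors only the flag names shown in run.sh and
    is not taken from src/jnc_filter.py.
    """
    parser = argparse.ArgumentParser(
        description="Sketch of a CLI mirroring the flags of src/jnc_filter.py"
    )
    parser.add_argument(
        "--input_path", type=Path, required=True,
        help="Path to the JNC JSON file, e.g. ./data/JNC-corpus.json",
    )
    parser.add_argument(
        "--output_path", type=Path, required=True,
        help="Directory for the processed splits, e.g. ./output/",
    )
    # When argv is None, argparse reads the arguments from sys.argv[1:].
    return parser.parse_args(argv)
```

For example, parse_args(["--input_path", "./data/JNC-corpus.json", "--output_path", "./output/"]) returns a namespace whose input_path and output_path are pathlib.Path objects; omitting either flag makes argparse exit with a usage error.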

Check dataset

The data split used in our paper is provided in the ids directory. Please use it to check your preprocessing results.
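One way to check the processing results against the ids directory is to compare the article IDs in each split. The sketch below assumes, hypothetically, that both the ids directory and your output directory contain train.txt, valid.txt and test.txt files with one article ID per line; the actual file names and format in ids may differ, so adjust them accordingly.

```python
from pathlib import Path


def load_ids(path):
    """Read one ID per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def compare_splits(reference_dir, output_dir, splits=("train", "valid", "test")):
    """Report IDs missing from or extra in each regenerated split.

    Hypothetical layout: both directories hold <split>.txt files with one
    article ID per line. An empty "missing" and "extra" list for every split
    means the regenerated split matches the reference.
    """
    report = {}
    for split in splits:
        ref = load_ids(Path(reference_dir) / f"{split}.txt")
        out = load_ids(Path(output_dir) / f"{split}.txt")
        report[split] = {
            "missing": sorted(ref - out),  # in the reference but not regenerated
            "extra": sorted(out - ref),    # regenerated but not in the reference
        }
    return report
```

Comparing sets rather than ordered lists means the check ignores line order, which is usually irrelevant for a train/valid/test split.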
