Sports Reporter

Python code for "Learning to Select, Track, and Generate for Data-to-Text" (Iso et al., ACL 2019).

Resources

Rotowire-modified dataset

Please refer to the rotowire-modified repository.

Usage

Dependencies

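The code is written for Python 3 and requires DyNet.
The Python dependencies can be installed from requirements.txt (for example, with pip install -r requirements.txt).
To run the information extractor, you also need Torch; the extractor is invoked with the th command in the preprocessing step.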
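
Before starting, it can help to confirm that these pieces are actually present. The sketch below is not part of this repository: check_environment is a hypothetical helper, and it assumes that DyNet's Python bindings import as dynet and that the Torch installation provides the th executable used in the preprocessing commands.

```python
import importlib.util
import shutil


def check_environment(modules=("dynet",), executables=("th",)):
    """Report which Python modules and command-line tools can be found.

    The default names are assumptions: "dynet" as the import name of DyNet
    and "th" as the Torch launcher used to run extractor.lua.
    """
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported.
        status[name] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which returns None when the executable is not on PATH.
        status[name] = shutil.which(name) is not None
    return status


# Print a short report for the default module and executable names.
for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" entry only means the name was not found in the current environment; it does not say which version would be required.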

Preprocessing

Before starting an experiment, run the provided setup.sh:

./setup.sh

After that, create the annotation file for the training data with the information extractor:

cd ./data2text-1
cat ../rotowire_v2/train.json | python -c 'import sys, json, nltk; print("\n".join(" ".join(nltk.word_tokenize(" ".join(x["summary"]))) for x in json.load(sys.stdin)))' > ../rotowire_v2/train_summary.txt
python data_utils.py -mode prep_gen_data -gen_fi ../rotowire_v2/train_summary.txt -dict_pfx "rotowire-modified-ie" -output_fi train_gold.h5 -input_path "../rotowire_v2" -train
th extractor.lua -gpuid 1 -datafile rotowire-modified-ie.h5 -preddata train_gold.h5 -dict_pfx "rotowire-modified-ie" -just_eval
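
The python -c one-liner in the commands above is dense, so here is a more readable sketch of the same tokenization step. It assumes, as the one-liner implies, that train.json holds a list of games whose "summary" field is a list of strings. The function names are hypothetical, and the tokenizer is passed in as an argument so the helpers can be tried without NLTK installed.

```python
import json


def summaries_to_lines(games, tokenize):
    """Return one space-separated, tokenized line per game summary."""
    lines = []
    for game in games:
        # Join the summary pieces into one string so the tokenizer
        # sees the whole text, as the one-liner does.
        text = " ".join(game["summary"])
        lines.append(" ".join(tokenize(text)))
    return lines


def write_summary_file(json_path, output_path, tokenize):
    """Read the games from json_path and write one summary per line."""
    with open(json_path, encoding="utf-8") as handle:
        games = json.load(handle)
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(summaries_to_lines(games, tokenize)) + "\n")
```

To mirror the original command, one would call write_summary_file("../rotowire_v2/train.json", "../rotowire_v2/train_summary.txt", nltk.word_tokenize) from inside ./data2text-1 after importing nltk.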

Once the extractor has run, the annotation file train_gold.h5-tuples.txt is available. Next, make the vocabulary file for training:

cd ..
VOCAB=<path to the vocabulary file>
python make_data.py ./rotowire_v2 ./data2text-1/train_gold.h5-tuples.txt $VOCAB

Train model

python reporter.py train $VOCAB --valid_file ./rotowire_v2/valid.json

Decode

MODEL=<path to the trained model file>
python reporter.py decode $VOCAB $MODEL ./rotowire_v2/test.json
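
When training and decoding are driven from a script rather than typed by hand, the two commands above can be assembled as argument lists. This is only a sketch: the helper names are hypothetical, it uses just the arguments shown above, and it assumes it is run from the repository root where reporter.py lives.

```python
import subprocess
import sys


def train_command(vocab, valid_file="./rotowire_v2/valid.json"):
    """Build the argument list for the training command."""
    return [sys.executable, "reporter.py", "train", vocab, "--valid_file", valid_file]


def decode_command(vocab, model, test_file="./rotowire_v2/test.json"):
    """Build the argument list for the decoding command."""
    return [sys.executable, "reporter.py", "decode", vocab, model, test_file]


def run(command, dry_run=True):
    """Print the command and execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        # check=True raises CalledProcessError if reporter.py exits with an error.
        subprocess.run(command, check=True)
```

Calling run(train_command(vocab_path)) with the default dry_run=True only prints the command, which makes it easy to check the paths before starting a long training run.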

Updated Results for RotoWire-modified

without writer info	RG (P% / #)	CS (P% / R%)	CO	BLEU
Joint+Rec+TVD (B=5)	18.09 / 48.54	23.24 / 28.92	14.47	15.34
Conditional (B=5)	20.28 / 61.76	27.20 / 29.76	15.88	15.26
Puduppully+, AAAI'19	82.55 / 34.05	32.30 / 43.74	16.67	14.82
Puduppully+, ACL'19	91.13 / 32.41	37.05 / 43.06	20.62	15.23
Iso+, ACL'19	91.98 / 31.66	40.44 / 46.63	21.56	15.74

with writer info	RG (P% / #)	CS (P% / R%)	CO	BLEU
Puduppully+, AAAI'19	82.55 / 34.05	32.30 / 43.74	16.67	14.82
+ stage 1	85.54 / 30.26	42.33 / 49.38	21.26	18.01
+ stage 2	83.35 / 32.42	33.28 / 42.92	16.73	16.57
+ stage 1 & 2	84.09 / 28.16	43.63 / 47.75	21.96	18.57
Iso+, ACL'19	91.98 / 31.66	40.44 / 46.63	21.56	15.74
+ writer	93.32 / 29.44	51.76 / 55.21	24.97	20.62

License and References

This code is available under the MIT Licence; see LICENCE.

If you write a paper using this code, please cite the following:

@InProceedings{Iso2019Learning,
    author = {Iso, Hayate
              and Uehara, Yui
              and Ishigaki, Tatsuya
              and Noji, Hiroshi
              and Aramaki, Eiji
              and Kobayashi, Ichiro
              and Miyao, Yusuke
              and Okazaki, Naoaki
              and Takamura, Hiroya},
    title = {Learning to Select, Track, and Generate for Data-to-Text},
    booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
    year = {2019}
  }

Author

@isomap
