isomap/factedit

Fact-based Text Editing


Code and datasets for Fact-based Text Editing (Iso et al., ACL 2020).

Dataset

Datasets are created from publicly available table-to-text datasets. The dataset created from "webnlg" is referred to as "webedit", and the dataset created from "rotowire(-modified)" is referred to as "rotoedit".

To extract the data, run tar -jxvf webedit.tar.bz2 to form a webedit/ directory (and similarly for rotoedit.tar.bz2).
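The extracted directories contain JSON Lines files (e.g. dev.jsonl, used in the Decoding section below). Assuming only that each line is a standalone JSON object (the field layout differs between the WebEdit and RotoEdit data and is not documented here), a minimal loader might look like:

```python
import json

def load_jsonl(path):
    """Read a .jsonl file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Each returned element is a plain dict, so the fields can be inspected interactively before writing any dataset-specific code.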

Model overview

The model, which we call FactEditor, consists of three components: a buffer for storing the draft text and its representations, a stream for storing the revised text and its representations, and a memory for storing the triples and their representations.

FactEditor scans the text in the buffer, copies parts of the text from the buffer into the stream if they are described by the triples in the memory, deletes parts of the text if they are not mentioned in the triples, and inserts into the stream new parts of text that are present only in the triples.
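The scan-copy-delete-insert loop above can be sketched as three surface-level actions, often called Keep, Drop, and Gen. The sketch below is purely illustrative: the real model scores these actions with learned neural representations, whereas here "described by the triples" is approximated by crude token matching, and the function names, stop-word list, and matching rule are all assumptions, not the paper's implementation.

```python
# Function words are kept regardless of the triples (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "is", "was", "are", "were", "and", "in", "of"}

def edit(draft_tokens, triples):
    """Scan the draft (buffer) left to right and build the revised text (stream)."""
    # Tokens that some (subject, relation, object) triple "describes",
    # approximated here by surface overlap with the triple's parts.
    mentioned = {tok for triple in triples for part in triple for tok in part.split()}
    stream = []
    for token in draft_tokens:
        if token in mentioned or token.lower() in STOPWORDS:
            stream.append(token)            # Keep: copy a supported token to the stream
        # else: Drop -- the token is not mentioned in any triple
    for subj, rel, obj in triples:          # Gen: insert facts the draft never covered
        if not any(tok in stream for tok in obj.split()):
            stream.extend([subj, rel, obj])
    return stream
```

For example, with the draft "Alan Bean was a famous pilot" and the triples (Alan Bean, occupation, test pilot) and (Alan Bean, birthPlace, Wheeler), this sketch drops the unsupported "famous" and appends the unmentioned birth-place fact to the stream.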

Usage

Dependencies

  • The code was written for Python 3.x and requires AllenNLP.
  • Dependencies can be installed using requirements.txt.

Training

Set your config file path and serialization directory as environment variables:

export CONFIG=<path to the config file>
export SERIALIZATION_DIR=<path to the serialization_dir>

Then you can train FactEditor:

allennlp train $CONFIG \
            -s $SERIALIZATION_DIR \
            --include-package editor

For example, the following is a sample script for training the model on the WebEdit dataset:

allennlp train config/webedit.jsonnet \
            -s models/webedit \
            --include-package editor 

Decoding

Set the dataset you want to decode and the model checkpoint you want to use as environment variables:

export INPUT_FILE=<path to the dev/test file>
export ARCHIVE_FILE=<path to the model archive file>

Then you can decode with FactEditor:

python predict.py $INPUT_FILE \
                  $ARCHIVE_FILE \
                  --cuda_device -1

To run on a GPU, pass --cuda_device 0 (or another CUDA device ID).

To run the model with a pretrained checkpoint on the development set of the WebEdit data:

python predict.py ./data/webedit/dev.jsonl \
                  ./models/webedit.tar.gz \
                  --cuda_device -1

References

@InProceedings{iso2020fact,
  author    = {Iso, Hayate and
               Qiao, Chao and
               Li, Hang},
  title     = {Fact-based Text Editing},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
  pages     = {171--182},
  year      = {2020}
}