PyTorch code for the Findings of EMNLP 2020 paper "Control, Generate and Augment: A Scalable Framework for Multi-Attributes Controlled Text Generation". The camera-ready version of the paper is accessible here.
Please download the YELP restaurant review data from here and the IMDB 50K movie review data from here. The data can be preprocessed by following the procedure described in the supplementary materials of the paper.
To obtain the multi-attribute dataset, first run
python TenseLabeling.py
and then
python PronounLabeling.py
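The two labeling steps have to run in the order given above. As a minimal sketch, they could be chained from a single Python entry point that stops at the first failure. The helper name `run_in_order` and its `runner` parameter are hypothetical and not part of this repository; only the two script names come from the instructions above.

```python
import subprocess
import sys


def run_in_order(scripts, runner=subprocess.run):
    """Run each script with the current Python interpreter, in the given order.

    `check=True` makes subprocess.run raise CalledProcessError if a script
    exits with a non-zero status, so later steps are skipped on failure.
    """
    for script in scripts:
        runner([sys.executable, script], check=True)


# Intended usage from the repository root (assumed to contain both scripts):
# run_in_order(["TenseLabeling.py", "PronounLabeling.py"])
```

The `runner` argument only exists so that the ordering logic can be checked without actually launching the scripts.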
To train the model, please run
python Analysis.py
All the parameters needed to obtain the results reported in the paper are set as default values. The trained model is saved in the bin folder. Its name is the date and time at which the experiment started.
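Because saved models are named after the start time of the run, it can be handy to pick up the most recent one programmatically. The sketch below is an illustration only: the helper `latest_checkpoint` is hypothetical, and it relies on file modification times rather than on the file name, since the exact date and time format of the names is not specified here.

```python
from pathlib import Path


def latest_checkpoint(folder="bin"):
    """Return the most recently modified entry in `folder`, or None if it is empty."""
    entries = list(Path(folder).iterdir())
    if not entries:
        return None
    # Assumption: the newest file on disk is the model from the latest run.
    return max(entries, key=lambda path: path.stat().st_mtime)
```

Calling `latest_checkpoint()` from the repository root would then return the path of the newest entry in the bin folder.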
To generate new sentences simply run
python generation.py
With the default parameters, this script generates sentences for all possible combinations of attributes. To generate sentences for specific attributes only, please specify the desired examples.
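To make "all possible combinations of attributes" concrete, the sketch below enumerates the Cartesian product of a few attribute values. The attribute names and values here are hypothetical placeholders chosen for illustration; they are not read from generation.py, whose actual attribute set and arguments may differ.

```python
from itertools import product

# Hypothetical attributes and values, for illustration only.
ATTRIBUTES = {
    "sentiment": ["positive", "negative"],
    "tense": ["past", "present"],
    "pronoun": ["singular", "plural", "neutral"],
}


def all_combinations(attributes):
    """Return one dict per combination, taking one value from every attribute."""
    names = list(attributes)
    value_lists = [attributes[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = all_combinations(ATTRIBUTES)
```

With these placeholder values, `combinations` contains 2 x 2 x 3 = 12 entries, one per attribute combination.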
All the evaluation scripts described below are in the Evalution folder.
For the Data Augmentation Evaluation please run
python AugmentData.py
to generate all the combinations of augmented data for each of the starting training set sizes used in the paper. Afterwards, run
python GPU_DAE.py
to obtain the validation and test results for the data augmentation experiment.
Please run the script
python AttrMatch.py
to obtain the matching accuracy of each attribute for the generated sentences.
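As a rough illustration of what an attribute matching accuracy measures, the sketch below computes, for each attribute, the fraction of generated sentences whose predicted label equals the label that was requested at generation time. This is an assumption about the metric, not the code of AttrMatch.py: the function name, the input format, and the way predicted labels are obtained (for example from an attribute classifier) are all hypothetical.

```python
def attribute_accuracy(requested, predicted):
    """Per-attribute share of sentences whose predicted label matches the requested one.

    Both arguments are lists of dicts mapping attribute name to label,
    with one dict per generated sentence, in the same order.
    """
    if len(requested) != len(predicted):
        raise ValueError("requested and predicted must have the same length")
    if not requested:
        return {}
    names = list(requested[0])
    total = len(requested)
    return {
        name: sum(r[name] == p[name] for r, p in zip(requested, predicted)) / total
        for name in names
    }
```

For example, if three out of four generated sentences are classified with the requested sentiment, the returned dictionary holds 0.75 under the `"sentiment"` key.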
python UniversalSentenceEvaluator.py
In the folder Generated you will find examples of our generated sentences, produced with both single-attribute and multi-attribute control. In addition, the model checkpoints for each of these experiments are provided, along with the parameters used for the experiments.
All models presented in this work were implemented in PyTorch, and trained and tested on single Titan XP GPUs with 12GB memory.
The average runtime was 07:26:14 for the model trained with YELP. The average runtime was 04:09:54 for the model trained with IMDB.
Dataset | S-VAE (Generator) | Discriminator |
---|---|---|
YELP | 3,417,176 | 4,452 |
IMDB | 4,433,176 | 4,470 |