GitHub - aitorormazabal/poetry_generation: Code for PoeLM

Code for the paper "PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation"

Creating the augmented corpus

To create the augmented corpus given a regular text corpus, simply use the corresponding pipeline_{lan}.sh command. For example, for Spanish:

./pipeline_es.sh corpus augmented_corpus

This will create various files, and the augmented corpus will be saved to agumented_corpus.pref. The special control codes are saved to augmented_corpus.tag.special_vocab and augmented_corpus.tag.special_vocab.spaces. We can then train a sentencepiece model using these files:

% spm_train  --user_defined_symbols=augmented_corpus.tag.special_vocab.spaces --input=augmented_corpus.pref --model_prefix=spm --vocab_size=50000

Finally, we can encode the text using this sentencepiece model and train a regular language model with it.

Publication

If you use this code for academic research, please cite the PoeLM paper:

@misc{https://doi.org/10.48550/arxiv.2205.12206,
  doi = {10.48550/ARXIV.2205.12206},
  
  url = {https://arxiv.org/abs/2205.12206},
  
  author = {Ormazabal, Aitor and Artetxe, Mikel and Agirrezabal, Manex and Soroa, Aitor and Agirre, Eneko},
  
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
  
  title = {PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation},
  
  publisher = {arXiv},
  
  year = {2022},
  
  copyright = {Creative Commons Attribution 4.0 International}
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
es		es
eu		eu
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

es

es

eu

eu

README.md

README.md

Repository files navigation

Creating the augmented corpus

Publication

About

Releases

Packages

Languages

aitorormazabal/poetry_generation

Folders and files

Latest commit

History

Repository files navigation

Creating the augmented corpus

Publication

About

Resources

Stars

Watchers

Forks

Languages