ychalier/pseudo

Pseudonym Generator with Anagrams

Generates permutations of lists of letters that sound like real proper nouns.

Built With

License

This project is licensed under the GPL-3.0 license.

Contributing

Contributions are welcome. Feel free to open a pull request with your changes!

Training A Model

You will need a working installation of Python 3.

The current corpus contains only the full text of Les Misérables by Victor Hugo, which is more than enough to train a basic model for French. If you want to use more recent text or add support for another language, start by gathering a few megabytes of text data.
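To check whether a corpus folder reaches the "few megabytes" mark, a small helper like the one below can sum the file sizes. This is only a sketch: the function name `corpus_size_megabytes` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def corpus_size_megabytes(corpus_dir: str) -> float:
    """Return the total size of all regular files under corpus_dir, in megabytes."""
    total_bytes = sum(
        path.stat().st_size
        for path in Path(corpus_dir).rglob("*")
        if path.is_file()
    )
    # 1 MB is taken as 1,000,000 bytes here.
    return total_bytes / 1_000_000
```

For example, `corpus_size_megabytes("corpus")` would report the size of the folder used in the training command, assuming the corpus lives in a folder named corpus.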

Run the train.py script and pass your corpus files as arguments. For instance, here is how the default French model was trained:

python train.py --max-token-length 5 --output-path data/tokens.tsv corpus/*

Then, put the generated TSV file in the model.zip archive. The archive.ps1 (PowerShell) and archive.sh (shell) scripts can do that for you.
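If neither script suits your setup, the archive can also be built with Python's standard zipfile module. The sketch below is an assumption about what such a step could look like, not a description of what archive.ps1 or archive.sh actually do: the function name `build_archive` is hypothetical, and storing each file at the root of the archive under its base name is a guess at the expected layout.

```python
import zipfile
from pathlib import Path


def build_archive(data_dir: str, archive_path: str) -> list[str]:
    """Write every regular file directly under data_dir into a fresh ZIP archive.

    Returns the entry names that were written. Assumption: entries are stored
    at the archive root under their base name; check the existing model.zip
    before relying on this layout.
    """
    target = Path(archive_path).resolve()
    names = []
    with zipfile.ZipFile(archive_path, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(data_dir).iterdir()):
            # Skip subfolders, and skip the archive itself if it lives in data_dir.
            if not path.is_file() or path.resolve() == target:
                continue
            archive.write(path, arcname=path.name)
            names.append(path.name)
    return names
```

A call such as `build_archive("data", "model.zip")` overwrites any existing model.zip, so keep a backup of the original archive first.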

Adding Prefix Lists

The model.zip archive contains text files that serve as prefix lists.

You may add your own list to the archive. It should contain one entry per line. Entries are normalized on the fly, so you can leave them in their raw form. As with the TSV file, if you put the list inside the data folder, the archive.ps1 and archive.sh scripts can add it to the archive for you.
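To illustrate why raw entries are acceptable, here is a sketch of what on-the-fly normalization of a prefix list could look like. The exact rules used by this project are not documented here, so everything below is an assumption: lowercasing, stripping accents, and dropping non-letter characters are plausible steps, and the names `normalize_entry` and `load_prefix_list` are hypothetical.

```python
import unicodedata


def normalize_entry(entry: str) -> str:
    """Lowercase an entry, strip its accents, and drop non-letter characters.

    Assumed rules, for illustration only.
    """
    # NFD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", entry.strip().lower())
    # Combining marks, hyphens, spaces and digits are not alphabetic, so they are dropped.
    return "".join(ch for ch in decomposed if ch.isalpha())


def load_prefix_list(text: str) -> list[str]:
    """Read one entry per line, normalize each, and skip lines that end up empty."""
    entries = (normalize_entry(line) for line in text.splitlines())
    return [entry for entry in entries if entry]
```

Under these assumed rules, a list containing "Éloïse" and "Jean-Luc" on separate lines would be read as "eloise" and "jeanluc".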

Then, make sure to add the filename of this list as an option of the prefix select tag in index.html. The option's value should be the filename, including its extension.