Hugging Face Tokenizers

afiaka87 edited this page Apr 15, 2021 · 1 revision

Custom Tokenizer

This repository supports Hugging Face Tokenizers as an alternative to the default simple tokenizer. To use one, pass an extra --bpe_path argument when invoking train_dalle.py and generate.py, set to the path of your BPE JSON file.

The only requirement is that you use 0 as the padding token ID during tokenization.

Example:

$ python train_dalle.py --image_text_folder ./path/to/data --bpe_path ./path/to/bpe.json
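
One way to produce a BPE JSON file that meets the padding requirement is to train a tokenizer with the Hugging Face tokenizers library and list the padding token first among the special tokens, so that it receives ID 0. The following is a minimal sketch under stated assumptions, not the repository's own procedure: the token names [PAD] and [UNK], the vocabulary size, the sample captions, and the output filename bpe.json are all illustrative.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer


def build_tokenizer(captions, vocab_size=1000):
    """Train a small BPE tokenizer whose padding token has ID 0.

    The special token names "[PAD]" and "[UNK]" are assumptions made
    for this sketch; any names work as long as padding maps to ID 0.
    """
    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()

    # Special tokens are assigned IDs in the order listed,
    # so putting "[PAD]" first gives it ID 0.
    trainer = BpeTrainer(
        vocab_size=vocab_size,
        special_tokens=["[PAD]", "[UNK]"],
        show_progress=False,
    )
    tokenizer.train_from_iterator(captions, trainer=trainer)

    # Fail early if the padding ID is not 0.
    if tokenizer.token_to_id("[PAD]") != 0:
        raise ValueError("padding token must have ID 0")

    # Pad batches with ID 0 when encoding through this tokenizer object.
    tokenizer.enable_padding(pad_id=0, pad_token="[PAD]")
    return tokenizer


if __name__ == "__main__":
    # Hypothetical captions; in practice, iterate over your own text files.
    sample_captions = [
        "a red bird sitting on a branch",
        "a blue car parked on the street",
    ]
    tok = build_tokenizer(sample_captions)
    tok.save("bpe.json")  # illustrative filename, passed later as --bpe_path
```

The saved file can then be supplied through --bpe_path as in the command above. Whether the training script relies on the padding settings stored in the JSON file or applies its own padding is not stated here, so checking that the padding token resolves to ID 0 is the part that matters.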