[BUG] Tokenizer config has add_bos_token=true while LLM Studio is training with add_special_tokens=False #644

pascal-pfeiffer · 2024-03-21T07:15:59Z

🐛 Bug

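The generated tokenizer_config.json has add_bos_token=true, while H2O LLM Studio trains with add_special_tokens=False.
A tokenizer loaded with the default AutoTokenizer settings can therefore prepend a BOS token that was not added during training, so the same text is tokenized differently at inference.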

We should be explicit and correct about this and set add_bos_token=false in the generated config.
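
Until this is fixed in the export, an already downloaded model could be patched by hand. The following is only a minimal sketch, not LLM Studio code: the helper name `disable_auto_special_tokens` is hypothetical, and it assumes that tokenizer_config.json is a plain JSON object. It also sets add_eos_token, which is raised further down in this thread.

```python
import json
from pathlib import Path


def disable_auto_special_tokens(config_path, keys=("add_bos_token", "add_eos_token")):
    """Set the given flags to false in a tokenizer_config.json file.

    Hypothetical helper, not part of LLM Studio or transformers.
    Returns the names of the flags that were not already false.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # A missing flag counts as changed, because it is written explicitly below.
    changed = [key for key in keys if config.get(key) is not False]

    for key in keys:
        config[key] = False

    # All other entries of the config are written back untouched.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```

Run against the tokenizer_config.json of a downloaded model folder, this returns the flags that had to be flipped, so an empty list means the export was already consistent with training.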

To Reproduce

Fine-tune a model, then download it or push it to HF.

LLM Studio version

<=1.4.1, b70b04f

psinger · 2024-03-21T07:17:00Z

add_eos_token=false as well

pascal-pfeiffer added the type/bug Bug in code label Mar 21, 2024
