[BUG] Tokenizer config has add_bos_token=true while LLM Studio is training with add_special_tokens=False #644

Open
pascal-pfeiffer opened this issue Mar 21, 2024 · 1 comment
Labels
type/bug Bug in code

Comments

@pascal-pfeiffer
Collaborator

🐛 Bug

The generated tokenizer_config.json has add_bos_token=true, while H2O LLM Studio trains with add_special_tokens=False.
As a result, loading the exported tokenizer with the default AutoTokenizer settings can tokenize inputs differently than during training.

We should be explicit and correct about this and set add_bos_token=false.

To Reproduce

Fine-tune a model, then download it or push it to Hugging Face (HF).

LLM Studio version

<=1.4.1, b70b04f

pascal-pfeiffer added the type/bug (Bug in code) label Mar 21, 2024
@psinger
Collaborator

psinger commented Mar 21, 2024

add_eos_token should be set to false as well.
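
Until the export itself is fixed, the change proposed in this thread could be applied by hand to an already exported model. The following is a minimal sketch, not LLM Studio code: the helper name `disable_auto_special_tokens` and the path in the usage note are hypothetical, and it assumes a tokenizer class that actually reads the `add_bos_token` and `add_eos_token` keys (Llama-style tokenizers do).

```python
import json
from pathlib import Path


def disable_auto_special_tokens(config_path):
    """Set add_bos_token and add_eos_token to False in a tokenizer_config.json.

    Hypothetical helper for illustration, not part of H2O LLM Studio.
    Returns the updated config as a dict.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Align the exported config with training, which tokenizes with
    # add_special_tokens=False (no automatic BOS/EOS tokens).
    config["add_bos_token"] = False
    config["add_eos_token"] = False

    # Write the config back, leaving all other keys untouched.
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

For example, `disable_auto_special_tokens("exported_model/tokenizer_config.json")` would patch the file in place before the model is pushed to HF.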
