
[Bug] Empty token can appear at the beginning of a generated sequence #140

Open
masahi opened this issue Jan 3, 2024 · 4 comments
Labels
bug Something isn't working

Comments

masahi (Member) commented Jan 3, 2024

It seems that, as of #107, which introduced detokenize_incrementally from vllm, we very often (or always?) get a blank token at the beginning of each generation, like this:

Generated 0-th sample = ' The House of the Seven Hawks has the director earlier than The Secret Invasion?
Explanation: As we know that Abhishek'

Generated 1-th sample = ' The Nevidito.
Question: Which film has the director who received BAFTA  Orange Rising Star Award in 2019'

Generated 2-th sample = ' The Secret Invasion

Here is the answer for the above question. A vector is a directed line segment or a directed ray that has a defin'

Apparently, vllm has the same problem. Although this is a minor issue, such a token still counts as one token in the output, so we should fix this behavior.
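
For context, here is a toy incremental-decoding loop showing how a leading token that decodes to nothing but whitespace shows up as a blank piece at the start of the output. This is only an illustration using plain tokenizer.decode, not vllm's detokenize_incrementally; the model ID is illustrative, and the offending token is discussed further down in the thread:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Pretend the model emitted the bare prefix-space piece "▁" first, then "The Secret Invasion".
prefix_space_id = tok.convert_tokens_to_ids("▁")
generated = [prefix_space_id] + tok("The Secret Invasion", add_special_tokens=False).input_ids

prev_text = ""
for i in range(1, len(generated) + 1):
    text = tok.decode(generated[:i])
    delta = text[len(prev_text):]  # the piece attributed to the i-th generated token
    print(i, repr(delta))          # the first delta is typically '' or ' '
    prev_text = text
```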

masahi added the bug label on Jan 3, 2024
masahi changed the title to "[Bug] Empty token can appear at the beginning of a generated sequence" on Jan 3, 2024
@Ailurus1

Looks like this token is actually a "prefix_space" (SPIECE_UNDERLINE) token with index 29871 in the Llama tokenizer vocabulary. There was some discussion in the transformers repository about the tokenizer's behavior around this token (link), but it seems the model itself can generate it.
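
A quick way to check this with transformers (the model ID below is illustrative, and the exact decode output can vary between tokenizer versions):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

print(tok.convert_ids_to_tokens(29871))  # '▁', the SentencePiece prefix-space piece
print(repr(tok.decode([29871])))         # decodes to an empty or whitespace-only string
```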

@vvchernov

I have an idea for a workaround: 1. Greedy case: for the prefill output, if the top-1 token is 29871, replace it with the top-2 token; we observed that this is the token that would come next anyway (but it should be double-checked). 2. Random-sampling case: for the prefill output, if token 29871 is among the top tokens, do not use it and instead take the next token after the top-token set.
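
A rough sketch of this idea in PyTorch (the constant and helper below are hypothetical, not the actual mlc-serve sampling code):

```python
import torch

PREFIX_SPACE_ID = 29871  # the bare prefix-space token "▁" in the Llama vocabulary

def pick_first_token(logits: torch.Tensor, greedy: bool = True) -> int:
    """Choose the first generated (prefill-output) token while skipping the prefix-space token."""
    if greedy:
        # Greedy case: if the argmax is the prefix space, fall back to the runner-up.
        top1, top2 = torch.topk(logits, k=2).indices.tolist()
        return top2 if top1 == PREFIX_SPACE_ID else top1
    # Sampling case: mask the prefix space out before drawing.
    masked = logits.clone()
    masked[PREFIX_SPACE_ID] = float("-inf")
    probs = torch.softmax(masked, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```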

masahi (Member, Author) commented Jan 29, 2024

Oh could this simply be a matter of setting skip_special_tokens=True here?

https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L79

@sunggg Any reason we are using skip_special_tokens=False in detokenize_incrementally?
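
For reference, a simplified illustration of what that flag changes when decoding (this uses plain tokenizer.decode rather than detokenize_incrementally, and the model ID is illustrative):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
ids = tok("The answer").input_ids  # begins with the BOS special token <s>

print(repr(tok.decode(ids, skip_special_tokens=False)))  # keeps special tokens, e.g. '<s> The answer'
print(repr(tok.decode(ids, skip_special_tokens=True)))   # drops them, e.g. 'The answer'
```

Note that the bare prefix-space piece "▁" is not normally registered as a special token, so whether this flag alone removes the blank piece may need checking.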

sunggg (Member) commented Jan 29, 2024

I thought about it briefly and decided to follow the default setting in vllm, since I do not know what other impacts changing it might have. https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer.py#L191

Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this issue Jan 30, 2024