
[Bug] Empty token can appear at the beginning of a generated sequence #140

Open
masahi opened this issue Jan 3, 2024 · 4 comments
Labels
bug Something isn't working

Comments

masahi (Member) commented Jan 3, 2024

It seems that, as of #107, which introduced detokenize_incrementally from vllm, we very often (or always?) get a blank token at the beginning of each generation, like this:

Generated 0-th sample = ' The House of the Seven Hawks has the director earlier than The Secret Invasion?
Explanation: As we know that Abhishek'

Generated 1-th sample = ' The Nevidito.
Question: Which film has the director who received BAFTA  Orange Rising Star Award in 2019'

Generated 2-th sample = ' The Secret Invasion

Here is the answer for the above question. A vector is a directed line segment or a directed ray that has a defin'

Apparently, vllm has the same problem. Although this is a minor issue, such a token still counts as one token in the output, so we should fix this behavior.
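
For context, here is a toy incremental-decoding loop showing how a leading token that decodes to nothing but whitespace shows up as a blank piece at the start of the output. This is only an illustration using plain tokenizer.decode, not vllm's detokenize_incrementally; the model ID is illustrative, and the offending token is discussed further down in the thread:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Pretend the model emitted the bare prefix-space piece "▁" first, then "The Secret Invasion".
prefix_space_id = tok.convert_tokens_to_ids("▁")
generated = [prefix_space_id] + tok("The Secret Invasion", add_special_tokens=False).input_ids

prev_text = ""
for i in range(1, len(generated) + 1):
    text = tok.decode(generated[:i])
    delta = text[len(prev_text):]  # the piece attributed to the i-th generated token
    print(i, repr(delta))          # the first delta is typically '' or ' '
    prev_text = text
```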

masahi added the bug label on Jan 3, 2024
masahi changed the title to "[Bug] Empty token can appear at the beginning of a generated sequence" on Jan 3, 2024
@Ailurus1

Looks like this token is actually a "prefix_space" (SPIECE_UNDERLINE) token with index 29871 in the Llama tokenizer vocabulary. There was some discussion in the transformers repository about the tokenizer's behavior around this token (link), but it seems the model itself can generate it.
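
A quick way to check this with transformers (the model ID below is illustrative, and the exact decode output can vary between tokenizer versions):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

print(tok.convert_ids_to_tokens(29871))  # '▁', the SentencePiece prefix-space piece
print(repr(tok.decode([29871])))         # decodes to an empty or whitespace-only string
```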

@vvchernov

I have an idea for a workaround: 1. Greedy case: for the prefill output, if the top-1 token is 29871, replace it with the top-2 token; we observed that this is the token that would come next anyway (but it should be double-checked). 2. Random-sampling case: for the prefill output, if token 29871 is among the top tokens, do not use it and instead take the next token after the top-token set.
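
A rough sketch of this idea in PyTorch (the constant and helper below are hypothetical, not the actual mlc-serve sampling code):

```python
import torch

PREFIX_SPACE_ID = 29871  # the bare prefix-space token "▁" in the Llama vocabulary

def pick_first_token(logits: torch.Tensor, greedy: bool = True) -> int:
    """Choose the first generated (prefill-output) token while skipping the prefix-space token."""
    if greedy:
        # Greedy case: if the argmax is the prefix space, fall back to the runner-up.
        top1, top2 = torch.topk(logits, k=2).indices.tolist()
        return top2 if top1 == PREFIX_SPACE_ID else top1
    # Sampling case: mask the prefix space out before drawing.
    masked = logits.clone()
    masked[PREFIX_SPACE_ID] = float("-inf")
    probs = torch.softmax(masked, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```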

masahi (Member, Author) commented Jan 29, 2024

Oh could this simply be a matter of setting skip_special_tokens=True here?

https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L79

@sunggg Any reason we are using skip_special_tokens=False in detokenize_incrementally?
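
For reference, a simplified illustration of what that flag changes when decoding (this uses plain tokenizer.decode rather than detokenize_incrementally, and the model ID is illustrative):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
ids = tok("The answer").input_ids  # begins with the BOS special token <s>

print(repr(tok.decode(ids, skip_special_tokens=False)))  # keeps special tokens, e.g. '<s> The answer'
print(repr(tok.decode(ids, skip_special_tokens=True)))   # drops them, e.g. 'The answer'
```

Note that the bare prefix-space piece "▁" is not normally registered as a special token, so whether this flag alone removes the blank piece may need checking.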

sunggg (Member) commented Jan 29, 2024

I thought about it briefly and decided to follow the default setting in vllm, since I do not know what other impacts changing it might have. https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer.py#L191

Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this issue Jan 30, 2024