Remove repeated concat of prompt and decode tokens in detokenization #139

masahi · 2024-01-03T09:33:07Z

As of the parallel sampling work, we maintain prompt and decode tokens separately. However, to preserve the output of detokenization, I had to concatenate them repeatedly:
https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L97-L99

It seems very wasteful to repeatedly concatenate and detokenize the entire token sequence when all we really need is the new delta at the end of the text.
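One possible direction, sketched below under stated assumptions, is to keep a short sliding window of recent tokens and decode only that window on each step. `IncrementalDetokenizer`, `toy_decode`, and `TOY_VOCAB` are hypothetical names for illustration, not part of mlc_serve. The toy byte-level vocabulary stands in for a real tokenizer. The check for a trailing U+FFFD holds back output while a multi-byte character is still incomplete.

```python
from typing import Callable, List

# Hypothetical toy vocabulary: token id -> raw bytes. Tokens 3 and 4 together
# form the UTF-8 encoding of the euro sign, so token 3 alone is incomplete.
TOY_VOCAB = {0: b"Hello", 1: b",", 2: b" world", 3: b"\xe2\x82", 4: b"\xac", 5: b"!"}


def toy_decode(token_ids: List[int]) -> str:
    """Stand-in for a real tokenizer's decode(); invalid bytes become U+FFFD."""
    raw = b"".join(TOY_VOCAB[t] for t in token_ids)
    return raw.decode("utf-8", errors="replace")


class IncrementalDetokenizer:
    """Emits only the text contributed by newly generated tokens."""

    def __init__(self, decode_fn: Callable[[List[int]], str], prompt_tail: List[int]):
        self.decode_fn = decode_fn
        # The window starts with a few trailing prompt tokens as context.
        # It is copied once here, not re-concatenated on every step.
        self.window = list(prompt_tail)
        # Tokens before read_offset have already been emitted as text.
        self.read_offset = len(self.window)

    def step(self, new_tokens: List[int]) -> str:
        self.window.extend(new_tokens)
        prefix_text = self.decode_fn(self.window[: self.read_offset])
        full_text = self.decode_fn(self.window)
        # Hold back output while the tail is an incomplete multi-byte character.
        if len(full_text) > len(prefix_text) and not full_text.endswith("\ufffd"):
            delta = full_text[len(prefix_text):]
            # Drop old context; keep the just-emitted tokens as the next context.
            self.window = self.window[self.read_offset:]
            self.read_offset = len(self.window)
            return delta
        return ""


prompt_tokens = [0, 1]
detok = IncrementalDetokenizer(toy_decode, prompt_tail=prompt_tokens)
pieces = [detok.step([t]) for t in [2, 3, 4, 5]]
print(pieces)
```

With this shape, the per-step cost depends on the window size instead of the full prompt plus decode length. How many prompt tokens a real tokenizer needs as context is an assumption to validate against the existing output.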

masahi added the bug label Jan 3, 2024
