
[BUG] Slow Tokenizer Message is printing when the Fast Tokenizer may be in use #407

Open
david-vectorflow opened this issue May 7, 2024 · 1 comment
Labels
bug Something isn't working

Comments

david-vectorflow commented May 7, 2024

Issue description:
For Llava 1.5 13b, if you run it with the --tokenizer_mode "auto" flag set, it still prints a message that the slow tokenizer is being used. Llava has an image processor and a text tokenizer. It is not possible (to my knowledge) to set a fast image processor, but you can set a fast image tokenizer from the sentence piece library. Setting the flag above should create that fast tokenizer. However the code still prints this message Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead. If you look at the code here since the LlavaTokenizer's text tokenizer is a field, it should actually be:
if not isinstance(tokenizer, PreTrainedTokenizerFast) or (isinstance(tokenizer, LlavaTokenizer) and not isinstance(tokenizer.tokenizer, PreTrainedTokenizerFast)):
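To illustrate the point about the wrapped tokenizer, here is a minimal, self-contained sketch. The classes below (LlavaWrapper and the two tokenizer stubs) are hypothetical stand-ins for lightllm's LlavaTokenizer and Hugging Face's PreTrainedTokenizerFast, not the real implementations; the sketch only shows why the "slow" check must inspect the inner text tokenizer when the object is a multimodal wrapper:

```python
# Stand-in class hierarchy (hypothetical, mirroring the shapes involved).
class PreTrainedTokenizer:                            # "slow" base tokenizer
    pass

class PreTrainedTokenizerFast(PreTrainedTokenizer):   # "fast" variant
    pass

class LlavaWrapper:
    """Multimodal wrapper: holds an image processor plus a text tokenizer
    in a .tokenizer field, analogous to LlavaTokenizer in the report."""
    def __init__(self, text_tokenizer):
        self.tokenizer = text_tokenizer

def is_slow(tokenizer):
    # For a multimodal wrapper, what matters is the inner text tokenizer,
    # not the wrapper object itself (which is never a PreTrainedTokenizerFast).
    if isinstance(tokenizer, LlavaWrapper):
        return not isinstance(tokenizer.tokenizer, PreTrainedTokenizerFast)
    return not isinstance(tokenizer, PreTrainedTokenizerFast)

print(is_slow(LlavaWrapper(PreTrainedTokenizerFast())))  # False: fast inner tokenizer
print(is_slow(LlavaWrapper(PreTrainedTokenizer())))      # True: slow inner tokenizer
```

A naive `not isinstance(tokenizer, PreTrainedTokenizerFast)` check would report the first case as slow too, which is exactly the spurious warning described above.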

Steps to reproduce:

import subprocess

# MODEL_DIR points to the local Llava 1.5 13b checkpoint directory
subprocess.Popen([
        "python", "-m", "lightllm.server.api_server",
        "--host", "0.0.0.0",
        "--port", "8080",
        "--tp", "1",
        "--max_total_token_num", "20000",
        "--trust_remote_code",
        "--enable_multimodal",
        "--cache_capacity", "1000",
        "--model_dir", MODEL_DIR,
        "--tokenizer_mode", "auto",
        "--max_req_total_len", "6000"])

Expected behavior:
No message should be printed, or a message should be printed showing which fast tokenizer is in use.

Error logging:
INFO 05-07 21:41:31 [tokenizer.py:76] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.

@david-vectorflow david-vectorflow added the bug Something isn't working label May 7, 2024
@hiworldwzj
Collaborator

@david-vectorflow thanks, we will fix this.
