Issue description:
For LLaVA 1.5 13B, if you run it with the `--tokenizer_mode "auto"` flag set, it still prints a message that the slow tokenizer is being used. LLaVA has an image processor and a text tokenizer. It is not possible (to my knowledge) to use a fast image processor, but you can use a fast text tokenizer from the SentencePiece library. Setting the flag above should create that fast tokenizer. However, the code still prints this message: "Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead." If you look at the code here, since the `LlavaTokenizer`'s text tokenizer is stored in a field, the check should actually be: `if not isinstance(tokenizer, PreTrainedTokenizerFast) or (isinstance(tokenizer, LlavaTokenizer) and not isinstance(tokenizer.tokenizer, PreTrainedTokenizerFast)):`
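The intent of the proposed change can be sketched as follows. This is a minimal, self-contained illustration, not vLLM's actual code: `PreTrainedTokenizerFast` and `LlavaTokenizer` here are stand-in classes mirroring the wrapper pattern described above, where the wrapper keeps its text tokenizer in a `tokenizer` field.

```python
class PreTrainedTokenizerFast:
    """Stand-in for transformers.PreTrainedTokenizerFast."""


class LlavaTokenizer:
    """Stand-in wrapper: pairs an image processor with a text tokenizer.

    The wrapper itself is never a PreTrainedTokenizerFast, so a naive
    isinstance check on the outer object always reports "slow".
    """

    def __init__(self, text_tokenizer):
        self.tokenizer = text_tokenizer


def uses_slow_tokenizer(tokenizer) -> bool:
    """Return True if the slow-tokenizer warning should be emitted."""
    # Unwrap the multimodal wrapper and inspect the inner text tokenizer,
    # instead of only checking the type of the outer object.
    if isinstance(tokenizer, LlavaTokenizer):
        return not isinstance(tokenizer.tokenizer, PreTrainedTokenizerFast)
    return not isinstance(tokenizer, PreTrainedTokenizerFast)
```

With this shape of check, a `LlavaTokenizer` wrapping a fast text tokenizer no longer triggers the warning, while a genuinely slow tokenizer (wrapped or not) still does.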
Steps to reproduce:
Expected behavior:
Either no message should print, or a message should print showing which fast tokenizer is in use.
Error logging: INFO 05-07 21:41:31 [tokenizer.py:76] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.