question:
The trained multimodal model can only take one image at a time. Is there any way to support multiple images and queries in a single chat session?
For example:
double enter to end input (EXIT: exit chat, RESET: reset history) >>> **image input**: xxx/test.jpg or None
double enter to end input (EXIT: exit chat, RESET: reset history) >>> **query:** describe this image.
xxxxxxxxxxxxxx
double enter to end input (EXIT: exit chat, RESET: reset history) >>>
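The interaction proposed above could be sketched roughly as follows. This is only a hypothetical illustration, not xtuner's actual chat loop: `Turn` and `parse_turns` are made-up names, and the sketch only shows how each turn might pair an optional image path with a query before anything is passed to the model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One chat turn: an optional image path plus a text query (hypothetical type)."""
    image: Optional[str]
    query: str


def parse_turns(lines: List[str]) -> List[Turn]:
    """Pair consecutive (image, query) inputs; 'None' or an empty line means no image."""
    turns = []
    # Walk the inputs two at a time: image line first, query line second.
    # A trailing image line without a query is ignored.
    for i in range(0, len(lines) - 1, 2):
        raw_image = lines[i].strip()
        image = None if raw_image in ("", "None") else raw_image
        turns.append(Turn(image=image, query=lines[i + 1].strip()))
    return turns


if __name__ == "__main__":
    demo = ["xxx/test.jpg", "describe this image.", "None", "what colour is it?"]
    for turn in parse_turns(demo):
        print(turn)
```

Whether the model can actually attend to several images in one conversation depends on how it was trained and on how image tokens are inserted into the prompt; the sketch covers only the input side.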
@ztfmars Hi, I start the chat with the following command:
xtuner chat LLM-Research/Meta-Llama-3-8B-Instruct \
  --visual-encoder ./clip-vit-large-patch14-336 \
  --llava ./LLM-Research/llava-llama-3-8b \
  --prompt-template llama3_chat \
  --image ./test.jpg
After I type a question and press Enter twice, the following error occurs:
double enter to end input (EXIT: exit chat, RESET: reset history) >>> what is this photo about?
Traceback (most recent call last):
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/chat.py", line 491, in <module>
    main()
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/chat.py", line 469, in main
    generate_output = llm.generate(
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2390, in _sample
    model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1321, in _get_initial_cache_position
    past_length = model_kwargs["past_key_values"][0][0].shape[2]
TypeError: 'NoneType' object is not subscriptable
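The last frame suggests that `model_kwargs["past_key_values"]` was `None` at the moment it was subscripted. The following standalone sketch reproduces that failure mode without importing transformers. `FakeTensor`, `unsafe_past_length` and `safe_past_length` are hypothetical names used only for illustration; this is not the library's code and not a fix for xtuner, just a demonstration of why the expression raises and what a `None` check would look like.

```python
class FakeTensor:
    """Stand-in for a tensor; only the .shape attribute matters here (hypothetical)."""

    def __init__(self, shape):
        self.shape = shape


def unsafe_past_length(model_kwargs):
    # Same expression as the failing line in the traceback:
    # raises TypeError if the value stored under the key is None.
    return model_kwargs["past_key_values"][0][0].shape[2]


def safe_past_length(model_kwargs):
    # Treat a missing or None cache as "no tokens cached yet".
    past = model_kwargs.get("past_key_values")
    if past is None:
        return 0
    return past[0][0].shape[2]


if __name__ == "__main__":
    empty = {"past_key_values": None}
    try:
        unsafe_past_length(empty)
    except TypeError as err:
        print("unsafe:", err)
    print("safe:", safe_past_length(empty))

    # One layer with (key, value) tensors holding 7 cached positions.
    cached = {"past_key_values": [(FakeTensor((1, 8, 7, 64)), FakeTensor((1, 8, 7, 64)))]}
    print("cached length:", safe_past_length(cached))
```

In the real stack, the `None` value would have to be put into `model_kwargs` somewhere between `chat.py` and `generate`, so comparing the installed transformers version with the one xtuner was tested against is a reasonable first check.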
cmd:
xtuner chat LLM-Research/Meta-Llama-3-8B-Instruct \
  --visual-encoder ./clip-vit-large-patch14-336 \
  --llava ./LLM-Research/llava-llama-3-8b \
  --prompt-template llama3_chat \
  --image ./test001.png