multi image inputs supports for xtuner chat in llava-llama3? #655

ztfmars · 2024-05-07T08:46:00Z

cmd:
xtuner chat LLM-Research/Meta-Llama-3-8B-Instruct \
  --visual-encoder ./clip-vit-large-patch14-336 \
  --llava ./LLM-Research/llava-llama-3-8b \
  --prompt-template llama3_chat \
  --image ./test001.png
question:
The trained multimodal model can only take one image at a time. Is there any way to support multiple images and queries at once?
For example:

double enter to end input (EXIT: exit chat, RESET: reset history) >>> **image input:** xxx/test.jpg or None

double enter to end input (EXIT: exit chat, RESET: reset history) >>> **query:** describe these images.

xxxxxxxxxxxxxx

double enter to end input (EXIT: exit chat, RESET: reset history) >>>


pppppM · 2024-05-07T11:38:55Z

xtuner chat is a simple command-line tool developed for analyzing training results.

If you want to chat with multiple images, you can use inference tools such as ollama or lmdeploy:

https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf#chat-by-ollama
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf#chat-by-lmdeploy
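
For readers who want to try the lmdeploy route, here is a minimal sketch rather than a tested recipe. It assumes lmdeploy is installed and that its VLM pipeline accepts a `(text, list_of_images)` tuple for this model, which the thread does not confirm. `parse_image_field`, `ask`, and `demo` are hypothetical helper names, the comma-separated image field only mirrors the "image input: xxx/test.jpg or None" prompt proposed in the issue, and `./test002.png` is a made-up second file.

```python
def parse_image_field(field: str) -> list[str]:
    """Split a comma-separated image field; 'None' or blank means no images."""
    text = field.strip()
    if not text or text.lower() == "none":
        return []
    return [part.strip() for part in text.split(",") if part.strip()]


def ask(pipe, query: str, image_sources: list[str]):
    """Send one query with zero or more images through an lmdeploy pipeline."""
    if not image_sources:
        # Text-only turn: pass the query straight through.
        return pipe(query)
    # Imported lazily so the parsing helper works without lmdeploy installed.
    from lmdeploy.vl import load_image

    images = [load_image(src) for src in image_sources]
    # Assumption: the pipeline takes a (text, image-or-list-of-images) tuple.
    return pipe((query, images))


def demo():
    """Not called automatically: needs lmdeploy, a GPU, and the model weights."""
    from lmdeploy import ChatTemplateConfig, pipeline

    pipe = pipeline(
        "xtuner/llava-llama-3-8b-v1_1-hf",
        chat_template_config=ChatTemplateConfig(model_name="llama3"),
    )
    # ./test002.png is a hypothetical second image for illustration.
    sources = parse_image_field("./test001.png, ./test002.png")
    print(ask(pipe, "describe these images.", sources).text)
```

Calling `demo()` runs one multi-image turn. Whether the answers are useful depends on how the model was trained, since a model trained on single-image samples may not handle several images well.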

J0eky · 2024-05-28T07:38:12Z

@ztfmars Hi, when I press Enter twice after starting the chat with:
xtuner chat LLM-Research/Meta-Llama-3-8B-Instruct \
  --visual-encoder ./clip-vit-large-patch14-336 \
  --llava ./LLM-Research/llava-llama-3-8b \
  --prompt-template llama3_chat \
  --image ./test.jpg
the following error occurs:
double enter to end input (EXIT: exit chat, RESET: reset history) >>> what is this photo about?

Traceback (most recent call last):
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/chat.py", line 491, in <module>
    main()
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/chat.py", line 469, in main
    generate_output = llm.generate(
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2390, in _sample
    model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1321, in _get_initial_cache_position
    past_length = model_kwargs["past_key_values"][0][0].shape[2]
TypeError: 'NoneType' object is not subscriptable

Have you run into the same problem?
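
The last frame subscripts `model_kwargs["past_key_values"]`, so one plausible reading is that the key is present but holds `None`. The thread does not confirm the root cause. The sketch below reproduces that failure mode with a plain dict. `initial_past_length` is a simplified stand-in for the failing frame, not the transformers source, and `drop_none_kwargs` is a hypothetical helper, not an xtuner or transformers API.

```python
def initial_past_length(model_kwargs: dict) -> int:
    """Simplified stand-in for the failing frame (not the library code)."""
    past_length = 0
    if "past_key_values" in model_kwargs:
        # The membership check passes even when the stored value is None,
        # so the subscript below raises TypeError in that case.
        past_length = model_kwargs["past_key_values"][0][0].shape[2]
    return past_length


def drop_none_kwargs(model_kwargs: dict) -> dict:
    """Hypothetical helper: remove entries whose value is None."""
    return {key: value for key, value in model_kwargs.items() if value is not None}


# Assumed shape of the kwargs that reach generate(); values are placeholders.
kwargs = {"inputs_embeds": "placeholder", "past_key_values": None}

try:
    initial_past_length(kwargs)
except TypeError as err:
    print(f"reproduced: {err}")

# With the None entry removed, the membership check is skipped.
print(initial_past_length(drop_none_kwargs(kwargs)))
```

If this reading is right, the mismatch is between what the caller passes to `generate` and what the installed transformers version expects, so comparing the installed transformers version against the one xtuner was tested with may be a reasonable first check.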
