multi image inputs supports for xtuner chat in llava-llama3? #655

Open
ztfmars opened this issue May 7, 2024 · 2 comments

ztfmars commented May 7, 2024

  • cmd:
    xtuner chat LLM-Research/Meta-Llama-3-8B-Instruct \
      --visual-encoder ./clip-vit-large-patch14-336 \
      --llava ./LLM-Research/llava-llama-3-8b \
      --prompt-template llama3_chat \
      --image ./test001.png

  • question:
    The trained multimodal model accepts only one image at a time. Is there a way to supply multiple images and queries within a single chat session?
    For example:

double enter to end input (EXIT: exit chat, RESET: reset history)  >>> **image input**:  xxx/test.jpg or None

double enter to end input (EXIT: exit chat, RESET: reset history) >>> **query:**  describe this image.

xxxxxxxxxxxxxx

double enter to end input (EXIT: exit chat, RESET: reset history) >>>
Collaborator

pppppM commented May 7, 2024

xtuner chat is a simple command-line tool developed for analyzing training results.

If you want to chat with multiple images, you can use inference tools such as ollama and lmdeploy.

https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf#chat-by-ollama
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf#chat-by-lmdeploy
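As a rough illustration of the lmdeploy route mentioned above, here is a minimal sketch of sending several images with one query. It assumes lmdeploy's vision-language `pipeline` accepts a `(prompt, images)` tuple and that the `llama3` chat template applies to this model, as the linked model card suggests. The helper names `build_multi_image_prompt` and `describe_images` are hypothetical, and how well this particular model handles more than one image is not confirmed in this thread.

```python
from typing import Any, List, Tuple


def build_multi_image_prompt(query: str, images: List[Any]) -> Tuple[str, List[Any]]:
    """Pair one text query with a list of images as a (prompt, images) tuple."""
    if not images:
        raise ValueError("at least one image is required")
    return (query, list(images))


def describe_images(query: str, image_paths: List[str]) -> str:
    """Run one multi-image query through lmdeploy (needs lmdeploy, weights, a GPU)."""
    # Imported lazily so the helper above works without lmdeploy installed.
    from lmdeploy import ChatTemplateConfig, pipeline
    from lmdeploy.vl import load_image

    # Model name and chat template are taken from the linked model card.
    pipe = pipeline(
        "xtuner/llava-llama-3-8b-v1_1-hf",
        chat_template_config=ChatTemplateConfig(model_name="llama3"),
    )
    # load_image accepts a local path or a URL.
    images = [load_image(path) for path in image_paths]
    response = pipe(build_multi_image_prompt(query, images))
    return response.text
```

A call such as `describe_images("Describe these images.", ["./test001.png", "./test002.png"])` would then send both files in a single request; the second file name is a placeholder.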


J0eky commented May 28, 2024

@ztfmars Hi, I start the chat with the command below. After I type a question and press Enter twice, the following error occurs:
    xtuner chat LLM-Research/Meta-Llama-3-8B-Instruct \
      --visual-encoder ./clip-vit-large-patch14-336 \
      --llava ./LLM-Research/llava-llama-3-8b \
      --prompt-template llama3_chat \
      --image ./test.jpg
double enter to end input (EXIT: exit chat, RESET: reset history) >>> what is this photo about?

Traceback (most recent call last):
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/chat.py", line 491, in <module>
    main()
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/chat.py", line 469, in main
    generate_output = llm.generate(
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2390, in _sample
    model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
  File "/home/huangjun/.conda/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1321, in _get_initial_cache_position
    past_length = model_kwargs["past_key_values"][0][0].shape[2]
TypeError: 'NoneType' object is not subscriptable

Have you run into the same problem?
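The last frame of the traceback indexes `model_kwargs["past_key_values"]`, so the error means that this value, or its first element, is `None`. One plausible reading is that a `past_key_values` key is passed to `generate` explicitly with the value `None`. The following is a minimal sketch under that assumption, not a confirmed fix: it filters out `None`-valued keyword arguments before they are passed on. The names `drop_none_kwargs` and `mm_inputs` are hypothetical, and the dictionary is a stand-in for whatever inputs the chat tool actually prepares.

```python
from typing import Any, Dict


def drop_none_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs without the entries whose value is None."""
    return {key: value for key, value in kwargs.items() if value is not None}


# Stand-in for prepared multimodal inputs; real values would be tensors.
mm_inputs = {
    "inputs_embeds": "<tensor placeholder>",
    "attention_mask": None,
    "past_key_values": None,  # present but None, as the traceback suggests
}

cleaned = drop_none_kwargs(mm_inputs)
# "past_key_values" is no longer a key, so code that only checks for the
# key's presence would not try to index into None.
print(sorted(cleaned))
```

Whether this applies depends on the installed xtuner and transformers versions, which the thread does not state.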
