
Can I input 2688×672 images? #14

Open
mvsoom opened this issue Apr 16, 2024 · 0 comments


mvsoom commented Apr 16, 2024

First of all, this looks amazing. Thanks for open-sourcing it!

I have a general question about LLaVA-UHD. The paper's conclusion says:

Conclusion

In this work, we present LLaVA-UHD, a large multimodal model that efficiently perceives any aspect ratio and high-resolution images. [...] In this work, we limit the resolution of LLaVA-UHD to maximum of 672×1008. In future, considering the promising efficiency and scalability, we will explore higher-resolution images and more challenging tasks such as small object detection and segmentation. [...]

Does this mean that the implementation in this repo is also limited to a maximum resolution of 672×1008, or can I input larger images with an arbitrary aspect ratio? I am specifically interested in 2688×672 (2 rows and 8 columns of 336×336 patches).
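For concreteness, here is a minimal sketch of the arithmetic behind the question. It only counts how many 336×336 patches are needed to cover a given resolution; it is not the repo's actual slicing logic, and `slice_grid` is a hypothetical helper name introduced for illustration. Resolutions are assumed to be written as width×height.

```python
import math

PATCH = 336  # patch side length in pixels, as stated in the question


def slice_grid(width, height, patch=PATCH):
    """Return (columns, rows) of patch-sized tiles needed to cover the image.

    Hypothetical helper for illustration only; not part of the LLaVA-UHD code.
    """
    cols = math.ceil(width / patch)
    rows = math.ceil(height / patch)
    return cols, rows


# Compare the paper's stated maximum with the resolution asked about here.
for w, h in [(672, 1008), (2688, 672)]:
    cols, rows = slice_grid(w, h)
    print(f"{w}x{h}: {cols} columns x {rows} rows = {cols * rows} patches")
```

Under these assumptions, 672×1008 needs 6 patches while 2688×672 needs 16, so the question is whether the implementation can handle that larger patch count and the 8×2 layout.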
