[Question] Proof about Range of Slice Aspect Ratios #15

JJJYmmm · 2024-04-17T09:36:28Z

It seems that $|\log r|$ should be $\left|\log \frac{W_I}{H_I} + \log \frac{n}{m}\right|$.
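A short derivation of why, assuming the $W_I \times H_I$ image is split into $m$ columns and $n$ rows, so that each slice is $\frac{W_I}{m}$ wide and $\frac{H_I}{n}$ tall (this is my reading of the notation, not something confirmed by the authors):

$$
r = \frac{W_I / m}{H_I / n} = \frac{W_I}{H_I} \cdot \frac{n}{m}
\quad\Longrightarrow\quad
|\log r| = \left|\log \frac{W_I}{H_I} + \log \frac{n}{m}\right|
$$

Under that assumption the two log terms add, because the partition contributes the factor $\frac{n}{m}$ to the slice aspect ratio.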

JJJYmmm · 2024-04-17T09:52:06Z

Another question: the README says the model can be efficiently trained in academic settings, within 23 hours on 8 A100 GPUs (vs. 26 hours for LLaVA-1.5). Why is the training more efficient?

It seems that the image modularization strategy would cost more time or memory in the image encoding stage, since one image is divided into several parts.

So is the efficiency due to fewer visual tokens (a perceiver rather than an MLP projection)? Looking forward to your reply :)
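To make the token-count intuition concrete, here is a rough sketch of the comparison I have in mind. The function names are hypothetical, and the numbers (336 px input, 14 px patches, 6 slices plus one overview image, 64 queries per slice) are my own illustrative assumptions, not values taken from the README.

```python
def mlp_projection_tokens(image_size: int, patch_size: int) -> int:
    """Visual tokens when every ViT patch is projected to one LLM token."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2


def resampler_tokens(num_slices: int, queries_per_slice: int,
                     include_overview: bool = True) -> int:
    """Visual tokens when each slice is compressed to a fixed number of queries.

    Assumes (hypothetically) that an optional downscaled overview image is
    encoded alongside the slices and compressed the same way.
    """
    views = num_slices + (1 if include_overview else 0)
    return views * queries_per_slice


# Illustrative numbers only (assumptions, see the note above).
mlp_count = mlp_projection_tokens(image_size=336, patch_size=14)
resampled_count = resampler_tokens(num_slices=6, queries_per_slice=64)
print(mlp_count, resampled_count)
```

If the numbers are roughly like this, the language model would see fewer visual tokens even though the image encoder runs once per slice, which might explain the shorter training time.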
