✨✨Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
Updated: May 11, 2024
- Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
- MoE-LLaVA: Mixture-of-Experts for Large Vision-Language Models
- InternLM-XComposer2: a vision-language large model (VLLM) for free-form text-image composition and comprehension.
- MMStar: evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"