Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task by providing strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
😎 Up-to-date, curated list of awesome LMM hallucination papers, methods & resources.
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
Linear mixed model genome scans for many traits
Tools and Statistical Procedures in Plant Science
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
LLaVA inference with multiple images at once for cross-image analysis.
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Curated list of sources about multilevel models
Semantic model of LMM analysis
Use Gemini to auto-label images for use with Autodistill.
weighted GWAS (heteroskedastic GWAS)