Official code for Paper "Mantis: Multi-Image Instruction Tuning"
Tools and Statistical Procedures in Plant Science
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
LLaVA inference with multiple images at once for cross-image analysis.
😎 up-to-date & curated list of awesome LMM hallucinations papers, methods & resources.
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
A glossary of terms in AI and their corresponding papers.
Financial Engineering in IRFX in C++
An AI Chatbot that Interacts With the Solana Blockchain
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
A Streamlit web application powered by the Gemini API for question answering, chat, and image generation.
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Use Gemini to auto-label images for use with Autodistill.