🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Updated May 27, 2024 - HTML
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Multimodal Access and Interactive Data Representation
This repository is used to collect papers and code in the field of AI.
A Comparative Framework for Multimodal Recommender Systems
DANCE: a deep learning library and benchmark platform for single-cell analysis
A Python framework accelerating ML-based discovery in the medical field by encouraging code reuse. Batteries included :)
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
Automated modeling and machine learning framework FEDOT
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
TriDFusion (3DF) Medical Imaging Viewer
[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"