multimodality

florencejt / fusilli

A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸

machine-learning cnn pytorch attention-mechanism imaging multimodality multivariate-analysis variational-autoencoder data-fusion multimodal multimodal-deep-learning multi-view-learning multi-view graph-neural-network pytorch-lightning

Updated May 16, 2024
Python

dicomtools / TriDFusion

TriDFusion (3DF) Medical Imaging Viewer

machine-learning matlab dicom medical-imaging registration segmentation dicom-rt multimodality radiomics dosimetry dicom-viewer

Updated May 15, 2024
MATLAB

declare-lab / Sealing

[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

multimodality video-understanding video-question-answering visual-language-models naacl2024

Updated Apr 26, 2024
Python

jshilong / GPT4RoI

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

computer-vision gpt roi multimodality llm

Updated Apr 23, 2024
Python

MMStar-Benchmark / MMStar

This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

evaluation multimodality multimodal-learning visual-question-answering multimodal large-language-models llm llms large-vision-language-model large-vision-language-models large-multimodal-models lvlms lvlm

Updated Apr 17, 2024
Python

hymie122 / RAG-Survey

A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

survey multimodality rag diffusion-models aigc llm

Updated Apr 17, 2024

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Apr 15, 2024
Python

ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

search retrieval ranking clip multimodality multimodal-learning multimodal activitynet retrieval-model msvd msrvtt video-text-retrieval lsmdc didemo video-clip-retrieval

Updated Apr 12, 2024
Python

multimodality

Here are 116 public repositories matching this topic...

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

MMMU-Benchmark / MMMU

xability / maidr

songqiang321 / Awesome-AI-Papers

PreferredAI / cornac

OmicsML / dance

BiomedSciAI / fuse-med-ml

MileBench / MileBench

kyegomez / swarms-pytorch

aimclub / FEDOT

kyegomez / PALI3

kyegomez / NaViT

florencejt / fusilli

dicomtools / TriDFusion

declare-lab / Sealing

jshilong / GPT4RoI

MMStar-Benchmark / MMStar

hymie122 / RAG-Survey

BAAI-Agents / Cradle

ArrowLuo / CLIP4Clip
