# multimodal

Here are 657 public repositories matching this topic...

LHees / shared_representations

neural-network bachelor-thesis dbm multimodal shared-representations

Updated Jul 8, 2020
Python

ZacDair / SER_Bi-Modal

This repository contains the source code for the final year project of my undergraduate degree at MTU.

nlp machine-learning neural-network speech-recognition bag-of-words ensemble emodb speech-emotion-recognition multimodal ravdess speech-emotion-classification savee

Updated May 12, 2021
Python

h-peng17 / MMET

Code and data for the paper "Multimodal Entity Tagging with Multimodal Knowledge Base"

retrieval knowledge-graph entity-linking multimodal

Updated Feb 23, 2023
Python

shufangxun / MAC

An end-to-end masked contrastive video-and-language pre-training framework

pytorch clip mae end-to-end-learning multimodal vision-and-language activitynet pretraining msrvtt contrastive-learning vision-transformer video-text-retrieval video-language didemo

Updated Dec 13, 2022

XuyingAI / MetaAgent

🤖 A framework for building AI Agents with LLMs, integrating multimodal generative AI technologies including voice, images, videos, and digital humans 🌈💎✨

agent ai multimodal digital-human stable-diffusion chatgpt

Updated Jul 31, 2023

mv96 / mm_extraction

This repository contains the code and pointer to the trained models to extract proofs and theorems from scientific articles

Updated Nov 29, 2023
Jupyter Notebook

tpt-research / Shuvi

A fuzzy multimodal search service for train, bus, long-distance bus, and more, using Shibi as the data source.

travel ridesharing public-transport multimodal shibi

Updated Dec 12, 2022
TypeScript

aqaqsubin / mmtod-pc

Multimodal TOD for Psychiatric Counseling

dataset-creation task-oriented-dialogue multimodal dialogue-state-tracking

Updated Oct 5, 2023
Jupyter Notebook

Theodlz / bts-ml-ztf-summer-school

A notebook to learn about ML for astronomy through BTSbot.

outreach machine-learning astronomy cnn supernovae multimodal

Updated Feb 7, 2024
Jupyter Notebook

shreydan / scratchformers

Building various transformer model architectures and their modules from scratch.

nlp computer-vision transformers pytorch multimodal

Updated Feb 14, 2024
Jupyter Notebook

jennafradin / textsense

Visuo-haptic integration during texture exploration

touch vision haptics multimodal

Updated Jan 12, 2024
Processing

kyegomez / MMCA

The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"

neural-network opensource-library artificial-intelligence attention attention-mechanism multimodality neuralnetwork opensourceforgood attention-is-all-you-need multimodal gpt4

Updated Mar 11, 2024
Python

D0miH / does-clip-know-my-face

Source Code for the Paper "Does CLIP Know my Face?" (Demo: https://huggingface.co/spaces/AIML-TUDA/does-clip-know-my-face)

ai deep-learning clip multimodal privacy-preserving-machine-learning

Updated Sep 18, 2023
Jupyter Notebook

kyegomez / MGQA

An open source implementation of multi-grouped query attention from the paper "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints"

artificial-intelligence attention attention-mechanism attention-is-all-you-need attention-mechanisms multimodal attention-lstm attentio gpt4 multiqueryattention

Updated Dec 11, 2023
Python

aehrc / imageclefmedical_caption_23

MedICap: Code for the participation of team CSIRO at the ImageCLEFmedical Caption task of 2023.

medical-imaging image-captioning multimodal-learning multimodal report-generation medical-image-captioning

Updated Sep 4, 2023
Jupyter Notebook

xmed-lab / FDDM

MICCAI 2023: Fundus-Enhanced Disease-Aware Distillation Model for Retinal Disease Classification from OCT Images

knowledge-distillation multimodal ophthalmic-diseases-classification miccai2023

Updated Oct 20, 2023
Python

google-research-datasets / maverics

MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering (VQA).

evaluation vqa vqa-dataset multimodal data-creation maverics vq2a

Updated Feb 18, 2023

GURPREETKAURJETHRA / Multimodal-AI-App-using-Llava-7B

Multimodal AI App using Llava 7B and Gradio

ai gradio whisper voice-assistant multimodal large-language-models llm generative-ai llava llavacpp

Updated Apr 8, 2024
Jupyter Notebook

bowen-upenn / Multi-Agent-VQA

Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering

open-world multi-agent scene-graph zero-shot-learning visual-question-answering multimodal scene-understanding foundation-models large-language-models large-vision-language-models

Updated Apr 3, 2024
Python

josesosajs / telegram-data-collection

Multimodal Pipeline for Collection of Misinformation Data from Telegram. arxiv: https://arxiv.org/abs/2204.12690 , LREC22 Proceedings: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.159.pdf

data telegram multimodal

Updated Sep 29, 2022
Python
