Updated Jul 8, 2020 - Python
multimodal
Here are 657 public repositories matching this topic...
This repository contains the source code for my final-year undergraduate project at MTU.
Updated May 12, 2021 - Python
Code and data for the paper "Multimodal Entity Tagging with Multimodal Knowledge Base"
Updated Feb 23, 2023 - Python
An end-to-end masked contrastive video-and-language pre-training framework
Updated Dec 13, 2022
🤖 A framework for building AI Agents with LLMs, integrating multimodal generative AI technologies including voice, images, videos, and digital humans 🌈💎✨
Updated Jul 31, 2023
This repository contains the code, and pointers to the trained models, for extracting proofs and theorems from scientific articles.
Updated Nov 29, 2023 - Jupyter Notebook
A fuzzy multimodal search service for train, bus, long-distance bus, and more. Uses Shibi as its data source.
Updated Dec 12, 2022 - TypeScript
Multimodal TOD for Psychiatric Counseling
Updated Oct 5, 2023 - Jupyter Notebook
A notebook to learn about ML for astronomy through BTSbot.
Updated Feb 7, 2024 - Jupyter Notebook
Building various transformer model architectures and their modules from scratch.
Updated Feb 14, 2024 - Jupyter Notebook
Visuo-haptic integration during texture exploration
Updated Jan 12, 2024 - Processing
The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"
Updated Mar 11, 2024 - Python
Source Code for the Paper "Does CLIP Know my Face?" (Demo: https://huggingface.co/spaces/AIML-TUDA/does-clip-know-my-face)
Updated Sep 18, 2023 - Jupyter Notebook
An open source implementation of multi-grouped query attention from the paper "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints"
Updated Dec 11, 2023 - Python
MedICap: Code for the participation of team CSIRO at the ImageCLEFmedical Caption task of 2023.
Updated Sep 4, 2023 - Jupyter Notebook
MICCAI 2023: Fundus-Enhanced Disease-Aware Distillation Model for Retinal Disease Classification from OCT Images
Updated Oct 20, 2023 - Python
MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering (VQA).
Updated Feb 18, 2023
Multimodal AI app using LLaVA 7B and Gradio
Updated Apr 8, 2024 - Jupyter Notebook
Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
Updated Apr 3, 2024 - Python
Multimodal pipeline for collecting misinformation data from Telegram. arXiv: https://arxiv.org/abs/2204.12690, LREC 2022 proceedings: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.159.pdf
Updated Sep 29, 2022 - Python