BuStop is an ML-based framework that uses multi-modal sensing to automatically detect different types of stay locations during intra-city public bus trips.
Exploring and Visualizing Referring Expression Comprehension (Bachelor's Thesis by David Álvarez Rosa)
Official implementation of MGN
Code for the paper Visual Explanations of Image–Text Representations via Multi-Modal Information Bottleneck Attribution
An encoder-decoder CNN-LSTM model with an attention mechanism for image captioning, trained on the Microsoft COCO dataset.
PyTorch implementation of a multimodal entailment baseline
This repository contains the official PyTorch implementation of the Position-aware Location Regression Network (PLRN) for temporal video grounding, presented in the paper Position-aware Location Regression Network for Temporal Video Grounding.
Socratic models for multimodal reasoning & image captioning
Interactive Multimodal Explanations for Easy Visual Question Answering
Emo-CLIM: Emotion-Aligned Contrastive Learning Between Images and Music [ICASSP 2024]
Public repository of our IGARSS 2023 submission
Showcases ongoing and completed projects within various research themes.
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
Corpus of resources for multimodal machine learning with physiological signals
[EMNLP 2022] PyTorch code for "Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval"
A multimodal deep learning framework for prediction of cancer biomarkers
📝🔍🖼️ A deep learning application for retrieving images by searching with text.
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
A list of numerical multimodal reasoning papers and their implementations
Official implementation of "Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval (CVPR 2024 Highlight)"