Create a large, well-managed and clean data-set for the task of music composition for video soundtracks.
A fully deployable React Native mobile app that classifies incoming messages in messaging apps as important or disturbing, using a multi-modal machine learning architecture for text classification, image classification, and YouTube video link classification.
A list of research papers on knowledge-enhanced multimodal learning
A dataset of egocentric vision, eye-tracking and full body kinematics from human locomotion in out-of-the-lab environments. Also, different use cases of the dataset along with example code.
Analyzing hateful memes (resources: the Hateful Memes Challenge)
Experiments combining Multi-Modal Causal Attention with Multi-Grouped Query Attention
A web service for an AI poet that looks at images and writes poems.
Facebook Marketplace is a platform for buying and selling products on Facebook. This project involves training a multimodal deep neural network model that predicts the category of a product based on its image and text description.
COMPSCI 696DS Industry Mentorship Program with Meta Reality Labs: Ambient AI: Multimodal Wearable Sensor Understanding (Experiments in Distilling Knowledge in Cross-Modal Contrastive Learning.)
The purpose of this project is to build an NLP model that makes medical abstracts easier to read.
Vision-Language Pre-Training for Boosting Scene Text Detectors (CVPR 2022)
Streamlit app demonstrating multi-modal (vision + language) modelling in PyTorch.
Applied Deep Learning (深度學習之應用) by Vivian Chen (陳縕儂) at NTU CSIE
A Discord chatbot built on the Mistral and LLaVA models
📝🔍🖼️ A deep learning application for retrieving images by searching with text.
Code for the paper "Multiomics dynamic learning enables personalized diagnosis and prognosis for pan-cancer and cancer subtypes"
Official implementation of the work "Text-Driven Image Editing via Learnable Regions" (CVPR 2024)
[IEEE Access'22] A Deep Attentive Multimodal Learning Approach for Disaster Identification
My master thesis: Siamese multi-hop attention for cross-modal retrieval.
PyTorch data loaders and abstractions for multi-modal data.