Formalizing Multimedia Recommendation through Multimodal Deep Learning, accepted in ACM Transactions on Recommender Systems.
TasteRank: Personalized Image Search and Recommendation. This research project proposes an AI-based method for scoring photos on relevance to user interests. TasteRank leverages language and vision models, including Mistral LLMs and OpenAI’s CLIP, and applies multimodal machine-learning techniques.
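The core scoring idea described above (rating a photo by how well it matches a user's stated interests in a shared embedding space) can be sketched as follows. This is a minimal illustration, not TasteRank's actual code: the random vectors stand in for real CLIP image and text embeddings, and the scoring rule (best match over interests via cosine similarity) is one plausible choice.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def relevance_score(image_emb: np.ndarray, interest_embs: list) -> float:
    """Score an image by its best match against the user's interest embeddings."""
    return max(cosine_similarity(image_emb, e) for e in interest_embs)

# Stand-in embeddings: a real system would encode the photo with CLIP's image
# encoder and each interest phrase with CLIP's text encoder (both 512-d).
rng = np.random.default_rng(0)
photo = rng.normal(size=512)
interests = [
    photo + 0.1 * rng.normal(size=512),  # interest closely related to the photo
    rng.normal(size=512),                # unrelated interest
]
score = relevance_score(photo, interests)  # high: one interest matches well
```

In a full pipeline, an LLM could expand a user profile into candidate interest phrases before encoding them, which matches the description's pairing of language models with CLIP.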
A codebase dedicated to exploring multimodal learning approaches by integrating images of supernova host galaxies with their corresponding light curves and spectra.
A curated list of awesome Multimodal studies.
In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks.
Code for TGRS 2022 paper "Multilevel Spatial-Channel Feature Fusion Network for Urban Village Classification by Fusing Satellite and Streetview Images"
Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
This repository provides an official implementation for the paper MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild.
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch.
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
A collection of resources on applications of multi-modal learning in medical imaging.
Multimodal computer vision application that combines object detection, gesture recognition, and speech-to-text to help users ask questions about their environment.
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
This repository is cloned from https://github.com/HLR/LatentAlignmentProcedural. This is a potential baseline explored for the textual_cloze task on the RecipeQA Dataset - https://hucvl.github.io/recipeqa/
Corpus of resources for multimodal machine learning with physiological signals
Dynamic Multimodal Inference [CMU Spring 2024 Course Project : 11-785 Introduction to Deep Learning]
Code for Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities
VTC: Improving Video-Text Retrieval with User Comments
The `MKGCN` class, coupled with the Spotify API, implements a multi-modal knowledge graph convolutional network that improves music recommendation by integrating user interaction data with diverse music modalities.