LAVIS - A One-stop Library for Language-Vision Intelligence
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Collect the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos; recommendations welcome!
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
awesome grounding: A curated list of research papers in visual grounding
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
This repository contains the code for a video captioning system inspired by "Sequence to Sequence -- Video to Text." The system takes a video as input and generates an English caption describing it.
Multimodal Sarcasm Detection Dataset
Reference mapping for single-cell genomics
An intelligent multimodal-learning-based system for video, product, and ad analysis, on which many downstream applications can be built, such as product recommendation and video retrieval.
A comprehensive reading list for Emotion Recognition in Conversations
Towards Generalist Biomedical AI
A collection of resources on applications of multi-modal learning in medical imaging.
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]