LAVIS - A One-stop Library for Language-Vision Intelligence
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Collect the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos; recommendations welcome!
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
awesome grounding: A curated list of research papers in visual grounding
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
This repository contains the code for a video captioning system inspired by "Sequence to Sequence -- Video to Text." The system takes a video as input and generates an English caption describing it.
Multimodal Sarcasm Detection Dataset
Reference mapping for single-cell genomics
An intelligent multimodal-learning-based system for video, product, and ad analysis, on which many downstream applications can be built, such as product recommendation and video retrieval.
A comprehensive reading list for Emotion Recognition in Conversations
Towards Generalist Biomedical AI
A collection of resources on applications of multi-modal learning in medical imaging.
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]