A simple command line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun
Updated Feb 6, 2022 - Python
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
A Comparative Framework for Multimodal Recommender Systems
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
Automated modeling and machine learning framework FEDOT
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
A knowledge base construction engine for richly formatted data
Sequence-to-Sequence Framework in PyTorch
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
DANCE: a deep learning library and benchmark platform for single-cell analysis
An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images
Attention-based multimodal fusion for sentiment analysis
A Survey on multimodal learning research.
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"