PetroIvaniuk/llms-tools

LLMs Tools & Research Projects

The repository contains a list of ready-to-use AI tools, open-source projects, and research projects.
Apart from LLMs, you can also find new AI research from other areas, such as computer vision.
Contributions are welcome.

Large Language Models (LLMs) and Chatbots

DeepLearning.AI Short Courses | Andrew Ng - short courses about LLMs
The Inside Story of ChatGPT’s Astonishing Potential | Greg Brockman | Video TED
State of GPT | Andrej Karpathy | Video
Opportunities in AI - 2023 | Andrew Ng | Video
GPT-4 Turbo | OpenAI DevDay, Opening Keynote | Sam Altman | Video
The Rise and Rise of A.I. LLMs & their associated bots like ChatGPT | Visualization
Generative AI exists because of the transformer | Visualization
2023: The Year of AI | Reading
AI Index Report (Since 2017) | Stanford University | Reading
Prompt Engineering Guide | Reading
Prompt engineering | OpenAI | Reading

Chats & Assistants

Chat | Company | Notes
MetaAI | MetaAI |
POE | Quora | talk to ChatGPT, GPT-4, Claude 3 Opus, DALL·E 3, and millions of others
Hume | Hume | empathic AI voice chat
Pi | Inflection AI |
Gemini | Google |
ChatRTX | Nvidia | runs locally on your PC
Copilot | Microsoft |
ChatGPT | OpenAI |

Open Source Models

Model | Company | Date | Notes
Llama Family | MetaAI | |
DBRX | Databricks | 2024-03-27 | a general-purpose LLM
Gemma | Google | 2024-02-21 |
phi-2 | Microsoft | 2023-12-12 |

Models

Company | 2021-22 | 2023 | 2024
Google | LaMDA, GLaM, PaLM, Chinchilla | Bard, PaLM-2, Gemini | Gemini 1.5, Gemma, Gemini 1.5 Flash, Gemma 2
OpenAI | ChatGPT | GPT-4, GPT-4 Turbo | GPT-4o
MetaAI | Galactica | LLaMA, LLaMA2: HF, Purple Llama | LLaMA3
EleutherAI | GPT-J, GPT-NeoX, GPT Neo | Pythia |
Stability AI | | Stable Vicuna, StableLM, Stable LM 3B, Stable Beluga, Stable Chat, Stable LM Zephyr 3B | Stable LM 2 1.6B, Stable LM 2 12B
Anthropic | RL-CAI | Claude, Claude2, Claude2.1 | Claude 3: Haiku, Sonnet, and Opus
BigScience | Bloom | |
Microsoft | | phi-1, phi-1.5, phi-2 |
Mistral AI | | Mistral, Mixtral of experts | Mistral Large
Inflection AI | | Inflection-2 | Inflection-2.5
Stanford | | Alpaca |
Berkeley-BAIR | | Koala |
Vicuna Team | | Vicuna |
TII | | Falcon |
Cohere | | | Command R+, Rerank 3
xAI | | Grok-1 | Grok-1.5
  • Snowflake Arctic - an enterprise-focused large language model (LLM) designed to provide cost-effective training and openness
  • Reka Core - Multimodal LLM
  • Jamba - the world’s first production-grade Mamba-based model, by AI21 Labs
  • ChatFlow - a no-code platform that lets you set up an OpenAI-powered chatbot for your website
  • Perplexity - the AI-chatbot-powered search engine
  • Smaug-72B-v0.1 - the first open-source model to surpass an average score of 80% on the Open LLM Leaderboard, by abacus.ai
  • Ferret - an end-to-end MLLM that accepts any-form referring and grounds anything in its response, by Apple
  • NotebookLM - a powerful new interface that lets you shift effortlessly from reading to asking questions to writing, with an AI thought partner helping you at every turn
  • Amazon Titan - a range of high-performing image, multimodal, and text models, available via a fully managed API, by AWS
  • Qwen - chat & pretrained LLM, by Alibaba Cloud
  • Phind, Phind-70B - a model claimed to match and exceed GPT-4's coding abilities while running 5x faster
  • FacTool - a tool-augmented framework for detecting factual errors in texts generated by LLMs. FacTool now supports 4 tasks: knowledge-based QA, code generation, mathematical reasoning, and scientific literature review
  • Nougat - Neural Optical Understanding for Academic Documents, a Visual Transformer model that performs Optical Character Recognition (OCR) to convert scientific documents into a markup language; its effectiveness is demonstrated on a new dataset of scientific documents, by MetaAI
  • TextFX - AI-powered tools for rappers, writers and wordsmiths
  • Prompt2Model - a system that takes a natural language task description (like the prompts used for LLMs such as ChatGPT) and trains a small special-purpose model that is conducive to deployment
  • Giraffe - a new family of models that are finetuned from base LLaMA and LLaMA2
  • ToolBench - open-source, large-scale, high-quality instruction tuning SFT data to facilitate the construction of powerful LLMs with general tool-use capability
  • Platypus - a family of fine-tuned and merged LLMs that ranked first on HuggingFace's Open LLM Leaderboard as of its release date
  • OpenFlamingo V2 - an open-source effort to replicate DeepMind's Flamingo models
  • MetaGPT - a framework involving LLM-based multi-agents that encodes human standardized operating procedures (SOPs) to extend complex problem-solving capabilities that mimic efficient human workflows
  • Universal and Transferable Adversarial Attacks on Aligned Language Models
  • FlashAttention - an algorithm that speeds up attention and reduces its memory footprint, without any approximation
  • Quivr - uses generative AI to store and retrieve unstructured information
  • LongLLaMA - an LLM capable of handling long contexts of 256k tokens or even more
  • OpenLLaMA - open source reproduction of MetaAI’s LLaMA
  • BuboGPT - an advanced LLM that incorporates multi-modal inputs including text, image and audio, with a unique ability to ground its responses to visual objects
  • LAION - Large-scale Artificial Intelligence Open Network
  • Dalai, Code - run LLaMA and Alpaca on your computer
  • LLaMAChat - allows you to chat with LLaMA, Alpaca, and GPT4All models, all running locally on your CPU
  • GPT4All, Code - an open-source assistant-style LLM that runs locally on your CPU
  • SdkVercelAI - enter a prompt, pick different LLMs, and compare two of them side by side
  • ChatwithData.ai - AI tool that lets you extract valuable insights and information from data files effortlessly
  • Open Assistant - a completely open-source ChatGPT alternative
  • HuggingChat - the first open-source alternative to ChatGPT, powered by Open Assistant's latest model
  • ChatPDF - chat with any PDF
  • PdfGPT - a tool where you can upload a PDF and get summaries and answers to your questions, powered by OpenAI models
  • Baize - an open-source chat model trained with LoRA. It uses 100k dialogs generated by letting ChatGPT chat with itself
  • Chameleon - a compositional reasoning framework designed to enhance LLMs and overcome their inherent limitations, such as outdated information and lack of precise reasoning
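FlashAttention's claim of exact attention with a smaller memory footprint rests on computing the softmax incrementally over blocks of keys, so the full row of attention scores never has to be stored at once. The sketch below illustrates only that online-softmax idea for a single query vector in pure Python; it is not the FlashAttention implementation, which additionally tiles the queries and fuses everything into GPU kernels. The function names `naive_attention` and `tiled_attention` and the `block_size` parameter are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _scores(q: Vector, keys: List[Vector]) -> Vector:
    # Scaled dot-product scores q·k / sqrt(d) for each key.
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: materialize all scores, softmax them, then average the values."""
    scores = _scores(q, keys)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def tiled_attention(q: Vector, keys: List[Vector], values: List[Vector],
                    block_size: int = 2) -> Vector:
    """Same result, but only one block of scores exists at a time."""
    running_max = -math.inf       # largest score seen so far
    running_sum = 0.0             # sum of exp(score - running_max) so far
    acc = [0.0] * len(values[0])  # unnormalized weighted sum of values
    for start in range(0, len(keys), block_size):
        block_scores = _scores(q, keys[start:start + block_size])
        block_values = values[start:start + block_size]
        new_max = max(running_max, max(block_scores))
        # Rescale earlier partial results to the new maximum (exp(-inf) == 0.0).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(block_scores, block_values):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(naive_attention(q, keys, values))
print(tiled_attention(q, keys, values, block_size=2))
```

Because each new block rescales the earlier partial sums by exp(old_max - new_max), the tiled result matches the naive one up to floating-point rounding for any block size; nothing is approximated, only reordered.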

Offline-Mode

  • OpenLLM - an open-source platform designed to facilitate the deployment and operation of LLMs in real-world applications
  • LM Studio - an easy way to run open-source LLMs locally
  • Jan - open-source ChatGPT alternative that runs 100% offline on your computer
  • Pinokio - a browser that lets you install, run, and programmatically control ANY application, automatically

Large Visual Language Models (LVLMs)

  • PaliGemma - a powerful open VLM inspired by PaLI-3, optimized for image captioning, visual Q&A and other image labeling tasks, by Google
  • Idefics2 - it can answer questions about images, describe visual content, create stories grounded in multiple images, extract information from documents, and perform basic arithmetic operations
  • Grok-1.5 Vision - can process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs, by xAI
  • AnyText - Multilingual Visual Text Generation And Editing
  • Qwen-VL - the multimodal version of the Qwen large model series. Accepts image, text, and bounding box as inputs; outputs text and bounding box
  • AnomalyGPT - an LVLM-based Industrial Anomaly Detection (IAD) method that can detect anomalies in industrial images without the need for manually specified thresholds
  • IDEFICS - an open-access VLM based on Flamingo. The model accepts arbitrary sequences of image and text inputs and produces text outputs, aiming to bring transparency to AI systems and serve as a foundation for open research in multimodal AI systems
  • Prismer - a data- and parameter-efficient VLM that leverages an ensemble of diverse, pre-trained domain experts
  • MiniGPT-4 - upload an image, and then use chat to identify what's in the picture and learn more about it
  • MultiModal-GPT - a vision and language model for multi-round dialogue with humans; the model is fine-tuned from OpenFlamingo, with LoRA added in the cross-attention and self-attention parts of the language model
  • LLaVA - a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding
  • TaskMatrix - connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting

Evaluation

  • Vibe-Eval - evaluation suite for measuring progress of multimodal language models, by Reka
  • FACET (FAirness in Computer Vision EvaluaTion) - a new comprehensive benchmark for evaluating the fairness of computer vision models across classification, detection, instance segmentation, and visual grounding tasks
  • Arthur Bench - an open-source evaluation tool for comparing LLMs, prompts, and hyperparameters for generative text models
  • AgentBench - the first benchmark designed to evaluate LLM-as-Agent across a diverse spectrum of different environments
  • L-Eval - a comprehensive long-context language models evaluation suite with 18 long document tasks across multiple domains that require reasoning over long texts, including summarization, question answering, in-context learning with long CoT examples, topic retrieval, and paper writing assistance
  • OpenICL - an open-source toolkit for in-context learning and LLM evaluation; supports various state-of-the-art retrieval and inference methods, tasks, and zero-/few-shot evaluation of LLMs
  • OpenAGI - an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models

Leaderboards:

  • Chatbot Arena - an open platform to evaluate LLMs by human preference in the real-world
  • Open LLM Leaderboard - evaluates models on 6 key benchmarks using the Eleuther AI Language Model Evaluation Harness, a unified framework to test generative language models on a large number of different evaluation tasks
  • LLM-Perf Leaderboard - benchmarks the performance (latency, throughput, memory & energy) of LLMs across different hardware, backends, and optimizations using Optimum-Benchmark
  • Hallucinations Leaderboard - evaluates the propensity for hallucination in LLMs across a diverse array of tasks, including Closed-book Open-domain QA, Summarization, Reading Comprehension, Instruction Following, Fact-Checking, and Hallucination Detection
  • NPHardEval leaderboard - a benchmark for assessing the reasoning abilities of LLMs through the lens of computational complexity classes
  • LLM Safety Leaderboard - evaluates LLM safety to help researchers and practitioners better understand the capabilities, limitations, and potential risks of LLMs
  • The Open Medical-LLM Leaderboard - aims to track, rank and evaluate the performance of LLMs on medical question answering tasks
  • TheFastest.AI - site that provides reliable measurements for the performance of popular models
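Chatbot Arena ranks models from pairwise human votes between two anonymous responses. A minimal sketch of how such votes can become a ranking, assuming a simple online Elo update (the Arena reports Elo-style ratings, but its actual fitting procedure may differ). The helper `update_elo`, the K-factor of 32, the starting rating of 1000, and the model names are all illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple

# (model_a, model_b, winner), where winner is model_a, model_b, or "tie"
Battle = Tuple[str, str, str]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(battles: Iterable[Battle], k: float = 32.0,
               initial: float = 1000.0) -> Dict[str, float]:
    """Replay votes in order, nudging both ratings after every battle."""
    ratings: Dict[str, float] = {}
    for model_a, model_b, winner in battles:
        ra = ratings.setdefault(model_a, initial)
        rb = ratings.setdefault(model_b, initial)
        ea = expected_score(ra, rb)
        # Actual score for A: 1 for a win, 0 for a loss, 0.5 for a tie.
        if winner == model_a:
            sa = 1.0
        elif winner == model_b:
            sa = 0.0
        else:
            sa = 0.5
        ratings[model_a] = ra + k * (sa - ea)
        ratings[model_b] = rb + k * ((1.0 - sa) - (1.0 - ea))
    return ratings


votes = [
    ("model-x", "model-y", "model-x"),
    ("model-y", "model-z", "tie"),
    ("model-x", "model-z", "model-x"),
]
print(sorted(update_elo(votes).items(), key=lambda kv: -kv[1]))
```

The update is zero-sum: whatever one model gains in a battle, the other loses, and upsets against higher-rated opponents move ratings more than expected wins do.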

Libraries

  • LangChain, docs - a framework for developing applications powered by language models
  • LlamaIndex, docs - a “data framework” to help you build LLM apps
  • LLaMA2-Accessory - an open-source toolkit for pre-training, fine-tuning and deployment of LLMs and multimodal LLMs
  • LLaMA-Adapter - a lightweight adaptation method for fine-tuning instruction-following and multimodal LLaMA models
  • streaming-llm - Efficient Streaming Language Models with Attention Sinks
  • llamafile - run LLMs with a single file
  • outlines, docs - a library to write reliable programs for interactions with generative models: language models, diffusers, multimodal models, classifiers, etc
  • OneLLM - One Framework to Align All Modalities with Language
  • guidance - interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text
  • agents - an open-source library/framework for building autonomous language agents
  • nanoGPT - the simplest, fastest repository for training/finetuning medium-sized GPTs
  • TorchScale - a PyTorch library that allows researchers and developers to scale up Transformers efficiently and effectively
  • InvokeAI - an implementation of Stable Diffusion, the open source text-to-image and image-to-image generator
  • ComfyUI - a powerful and modular Stable Diffusion GUI and backend. This UI will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart based interface
  • StableSwarmUI - a modular Stable Diffusion web user interface, emphasizing easily accessible power tools, high performance, and extensibility
  • Wanda - Pruning LLMs by Weights and activations: removes weights on a per-output basis, ranked by the product of weight magnitudes and input activation norms
  • LOMO: LOw-Memory Optimization - a new optimizer, which fuses the gradient computation and the parameter update in one step to reduce memory usage
  • LMFlow - an extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community
  • Heron - a library that seamlessly integrates multiple vision and language models, as well as video and language models. It also provides pretrained weights trained on various datasets
  • Curated Transformers - a transformer library for PyTorch. It provides state-of-the-art models that are composed from a set of reusable components, by Explosion
  • spacy-llm - integrates LLMs into spaCy, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP tasks, no training data required, by Explosion
  • Medusa - a simple framework that democratizes the acceleration techniques for LLM generation with multiple decoding heads
  • Self-RAG - a new framework to train an arbitrary LM to learn to retrieve, generate, and critique to enhance the factuality and quality of generations, without hurting the versatility of LLMs
  • OpenAgents - an open platform for using and hosting language agents in the wild of everyday life
  • Mirascope, docs - a toolkit for developing production-ready LLM-powered tools using Python and Pydantic
  • gateway - route to 100+ open and closed source models with a unified API. It is also production-ready, with support for caching, fallbacks, retries, timeouts, and load balancing, and can be edge-deployed for minimum latency
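The Wanda entry states its pruning criterion concretely: each weight is scored by its magnitude times the norm of the corresponding input activation, and weights are compared per output. A minimal pure-Python sketch of that scoring rule for one linear layer, assuming the weight matrix is stored as out_features x in_features, that a small batch of calibration activations is available, that the activation norm is the L2 norm over calibration tokens, and that the same sparsity ratio is applied to every output row. `wanda_prune` is a hypothetical helper, not the project's API.

```python
import math
from typing import List

Matrix = List[List[float]]


def wanda_prune(weight: Matrix, activations: Matrix, sparsity: float) -> Matrix:
    """Zero out the lowest-scoring weights in each output row.

    weight:      out_features x in_features
    activations: num_tokens x in_features (calibration inputs to this layer)
    sparsity:    fraction of weights to remove per output row, e.g. 0.5
    """
    in_features = len(weight[0])
    # L2 norm of each input feature over all calibration tokens.
    feature_norms = [
        math.sqrt(sum(token[j] ** 2 for token in activations))
        for j in range(in_features)
    ]
    num_prune = int(in_features * sparsity)
    pruned: Matrix = []
    for row in weight:
        # Score: |W_ij| * ||X_j||_2
        scores = [abs(w) * feature_norms[j] for j, w in enumerate(row)]
        # Compare only within this output row (per-output pruning).
        drop = set(sorted(range(in_features), key=lambda j: scores[j])[:num_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


W = [[0.5, -2.0, 1.0, 0.1],
     [1.0, 1.0, -1.0, 3.0]]
X = [[10.0, 0.1, 1.0, 1.0],
     [0.0, 0.1, 1.0, 1.0]]
print(wanda_prune(W, X, sparsity=0.5))
# [[0.5, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 3.0]]
```

The first row shows why activations matter: plain magnitude pruning would drop 0.1 and 0.5, but this rule keeps 0.5 (its input feature has a large norm) and instead drops -2.0, whose input feature is almost always near zero.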

Devices

  • Frame AI glasses, by Brilliant Labs
  • Ray-Ban Meta Smart Glasses - a 12 MP camera and five-mic system, updates, by Ray-Ban & MetaAI
  • LPU Inference Engine - Language Processing Units, by Groq
  • FigureAI - AI robotics company bringing a general purpose humanoid to life
  • SanctuaryAI - company on a mission to create the world’s first human-like intelligence in general-purpose robots
  • Limitless - personalized AI powered by what you’ve seen, said, and heard
  • rabbit r1 - a personalized operating system through a natural language interface
  • Open Interpreter - a new computer (the 01) with Open Interpreter at the center

Income

  • Poe - price-per-message revenue model for AI bot creators
  • GPTs Store - create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills
  • Voice Library - share your voice in the Voice Library today and earn cash rewards when it's used
  • HuggingChat - making the community's best AI chat models available to everyone

Tools

Text-to-Image | Text-to-Music | Text-to-Video | Games | Brand | Prompt Generator
Midjourney | Mubert | GENMO | Leonardo.Ai - Assets | Flair | G-prompter
Adobe Firefly | Waveformer | PIKA LABS | Dreamlab - Animated Sprites | Logolivery | Prompt Builder
Catbird | | Kaiber | Didimo | | Midjourney PromptHelper1
BlueWillow | | Invidio | Scenario - Assets | | Midjourney PromptHelper2
Lexica | | Moonvalley | Skybox - World-building | | FlowGPT
Playground | | Morph Studio | ilumine AI | | Anthropic
Imgcreator | | Haiper | Bezi - 3D Assets | |
Craiyon | | LTX Studio | Charmed - 3D Assets | |

Text-to-image

Company | Models
Google | Muse, Imagen, Parti, HyperDreamBooth, DreamBooth, StyleDrop, Imagen 2, ImageFX, Imagen 3
OpenAI | CLIP, DALL·E, DALL·E 2, DALL·E 3
MetaAI | CM3leon, Emu Video, Emu Edit, Imagine
stability.ai | Stable Diffusion XL, DreamStudio, Clipdrop, DeepFloyd IF: (Code, Demo: HF), SDXL Turbo, Stable Cascade, Stable Diffusion 3
  • Distribution Matching Distillation - a one-step generator that achieves image quality comparable to Stable Diffusion v1.5 while being 30x faster
  • Generative Powers of Ten - a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches
  • Playground v2 - open weights; an early preview of Playground's efforts to make increasingly powerful graphics models
  • Delta Denoising Score - a novel scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt
  • Prompt-to-Prompt - editing framework, where the edits are controlled by text only
  • OpenCLIP - an open source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training)
  • LEDITS - combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real image editing, while harnessing the editing capabilities of DDPM inversion
  • Würstchen - Fast Diffusion for Image Generation
  • ExactlyAI - create images in seconds with an AI that understands your style
  • ConceptLab - creative concept generation: uses a VLM-guided diffusion prior to generate new, never-before-seen members of a broad category (e.g., a new kind of pet)
  • IP-Adapter - Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
  • MATCHAI - a powerful web app that can copy the color grading from images so you can apply it to your own
  • Ideogram - AI tools that will make creative expression more accessible, fun, and efficient
  • Picogen - an unofficial API to Midjourney AI, Stability AI, and DALL·E 2
  • FABRIC - Feedback via Attention-Based Reference Image Conditioning - a technique to incorporate iterative feedback into the generative process of diffusion models based on StableDiffusion
  • Controlling Text-to-Image Diffusion by Orthogonal Finetuning (OFT) - for adapting text-to-image diffusion models to downstream tasks
  • InstructPix2Pix: Learning to Follow Image Editing Instructions - a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, the model follows these instructions to edit the image
  • Composer - a large (5 billion parameters) controllable diffusion model trained on billions of (text, image) pairs. It can exponentially expand the control space through composition, leading to an enormous number of ways to generate and manipulate images, i.e., making the infinite use of finite means
  • GigaGAN: Large-scale GAN for Text-to-Image Synthesis - changing texture with prompting, changing style with prompting, by Adobe Research

Multi-modal

  • ImageBind, Demo, Code - Image->Audio, Audio->Image, Text->Image&Audio, Audio&Image->Image, Audio->Generated Image, by MetaAI
  • GEN-1, Research - use words and images to generate new videos out of existing ones, by Runway: AI-Magic-Tools
  • GEN-2, Research - create videos in any style you can imagine with Text to Video generation, by Runway: AI-Magic-Tools
    • Mode 01: Text to Video: Synthesize videos in any style you can imagine using nothing but a text prompt. If you can say it, now you can see it
    • Mode 02: Text + Image to Video: Generate a video using a driving image and a text prompt
    • Mode 03: Image to Video: Generate video using just a driving image (Variations Mode)
    • Mode 04: Stylization: Transfer the style of any image or prompt to every frame of your video
    • Mode 05: Storyboard: Turn mockups into fully stylized and animated renders
    • Mode 06: Mask: Isolate subjects in your video and modify them with simple text prompts
    • Mode 07: Render: Turn untextured renders into realistic outputs by applying an input image or prompt
    • Mode 08: Customization: Unleash the full power of Gen-2 by customizing the model for even higher fidelity results
  • MONSTER API
    • text-to-image: a latent text-to-image diffusion model capable of generating photo-realistic images conditioned on text descriptions
    • image-to-image: a latent diffusion model capable of generating photo-realistic image-to-image translations guided by a text prompt
    • instruct-pix2pix: a model that enables fast and effective image editing based on simple instructions

Images

  • PhotoMaker - Customizing Realistic Human Photos via Stacked ID Embedding
  • DeWatermark - Remove Watermark from photos online free with AI; Upscales - Upscale Images with AI up to 4K
  • NSF - Neural Spline Fields for Burst Image Fusion and Layer Separation
  • Material Palette - a method to extract Physically-Based-Rendering (PBR) materials from a single real-world image
  • DiffusionLight - a simple yet effective technique to estimate lighting in a single input image
  • KREA - generate images and videos with a delightful AI-powered design tool
  • Magnific - the image Upscaler & Enhancer
  • Stable Signature - a new method for watermarking images, by MetaAI
  • wasitai - check if an image was generated by a machine
  • Textify - a tool for replacing the gibberish in AI-generated images with your desired text
  • Interpolating between Images with Diffusion Models - a method for zero-shot controllable interpolation using latent diffusion models
  • AnyDoor: Zero-shot Object-level Image Customization - a diffusion-based image generator with the power to move target objects to new scenes at user-specified locations in a harmonious way
  • Matting Anything, Code, Demo: HF - an efficient and versatile framework for estimating the alpha matte of any instance in an image with user-prompt guidance
  • Plug-and-Play, Code - a framework for text-driven image-to-image translation that harnesses large-scale pretrained text-to-image diffusion models, which can synthesize diverse images that convey highly complex visual concepts
  • Real-Time Neural Appearance Models - a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use, by NVIDIA
  • Designer - Microsoft Designer expands its preview with new AI design features, by Microsoft. Designer has all the tools you’d expect, plus a few AI superpowers: generate stunning designs and original images just by typing what you want, get writing assistance and automatic layout suggestions for anything you add, and have it propose captions and hashtags to make social media sharing effortless
  • Scribble Diffusion - turn your sketch into a refined image using AI
  • StudioGPT - a tool for reimagining an existing image

Computer Vision

  • TAO-Amodal - a benchmark dataset that includes amodal and modal bounding boxes for visible and occluded objects
  • OMG-Seg - One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open vocabulary settings, prompt-driven, interactive segmentation like SAM, and video object segmentation
  • PUG (Photorealistic Unreal Graphics) - 3 datasets for representation learning research
  • Tracking Anything in High Quality - a framework for high performance video object tracking and segmentation
  • DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data - a new benchmark of synthetic image triplets that span a wide range of mid-level variations, labeled with human similarity judgments
  • CoTracker - an architecture that jointly tracks multiple points throughout an entire video, by MetaAI
  • TAPIR - a model for Tracking Any Point (TAP) that effectively tracks a query point in a video sequence, by Google DeepMind
  • DreamTeacher - a self-supervised feature representation learning framework that utilizes generative networks for pre-training downstream image backbones, by NVIDIA
  • V-JEPA - Video Joint Embedding Predictive Architecture, an early example of a physical world model that excels at detecting and understanding highly detailed interactions between objects
  • I-JEPA, Code - Image Joint Embedding Predictive Architecture is a method for self-supervised learning. At a high level, I-JEPA predicts the representations of part of an image from the representations of other parts of the same image
  • Visual Prompting - an innovative approach that takes text prompting, used in applications such as ChatGPT, to computer vision
  • Tracking Everything Everywhere All at Once - a new test-time optimization method for estimating dense and long-range motion from a video sequence
  • Track-Anything - a flexible and interactive tool for video object tracking and segmentation. It is built upon Segment Anything and can track and segment anything you specify via user clicks only
  • EdgeSAM - an accelerated variant of the SAM, optimized for efficient execution on edge devices with minimal compromise in performance
  • Segment Anything Model (SAM) - a new AI model that can "cut out" any object, in any image, with a single click. SAM is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training. Blog: Introducing Segment Anything, Code
  • DINOv2 - a method for training high-performance, state-of-the-art computer vision models with self-supervised learning
  • Behind the Scenes: Density Fields for Single View Reconstruction - a neural network that predicts an implicit density field from a single image

Video & Animation

  • VideoFX - a new experimental tool powered by Veo. It’s designed to help support creatives through the storytelling journey, by Google
  • Veo - generates high-quality 1080p resolution videos in a wide range of cinematic and visual styles that can go beyond a minute, by Google
  • VideoGigaGAN: Towards Detail-rich Video Super-Resolution - a generative VSR model that can produce videos with high-frequency details and temporal consistency, by Adobe Research
  • VASA-1 - Lifelike Audio-Driven Talking Faces Generated in Real Time, by Microsoft
  • MagicTime - Time-lapse Video Generation Models as Metamorphic Simulators
  • Stable Video Diffusion - a foundation model for generative video based on the image model Stable Diffusion
  • EMO - Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  • SORA - a model (a latent diffusion model that learned to transform noise into videos using an encoder-decoder and transformer) that can create realistic and imaginative scenes from text instructions, by OpenAI
  • LUMIERE - A Space-Time Diffusion Model for Video Generation: Text-to-Video, Image-to-Video, Stylized Generation, Video Stylization, Cinemagraphs, Video Inpainting
  • ActAnywhere - Subject-Aware Video Background Generation
  • MagicVideo-V2 - integrates the text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline
  • I2VGen-XL - High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
  • StreamDiffusion - an innovative diffusion pipeline designed for real-time interactive generation
  • WALT - Window Attention Latent Transformer - a transformer-based method for latent video diffusion models (LVDMs)
  • Hotshot - GIF generator
  • Unscreen - remove video background
  • Motrica - technologies and tools for advanced character animation
  • CoDeF - Content Deformation Fields for Temporally Consistent Video Processing
  • MagicEdit - supports various editing applications, including video stylization, local editing, video-MagicMix and video outpainting
  • To Infinity and Beyond - an approach to generating high-quality episodic content for IPs (intellectual properties) using LLMs, custom state-of-the-art diffusion models, and a multi-agent simulation for contextualization, story progression, and behavioral control
  • PlazmaPunk - create your own music video with the power of AI
  • Video-LLaMA, Code, Demo: HF - a multimodal LLM that achieves video-grounded conversations between humans and computers by connecting a language decoder with off-the-shelf unimodal pretrained models
  • AnimateDiff prompt travel - AnimateDiff with prompt travel + ControlNet + IP-Adapter
  • AnimateDiff, Code - Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
  • Animate-A-Story - a video storytelling approach which can synthesize high-quality, structure-controlled, and character-controlled videos
  • Zeroscope - a watermark-free Modelscope-based video model optimized for producing high-quality 16:9 compositions and a smooth video output
  • Klap - a tool that analyzes the video and finds short clips
  • Lalamu - low-quality video lip sync with preselected videos/video templates (take clips from videos, give the video new audio, and then the lips will sync up to that new audio within the video)
  • D-ID - uses generative AI to create customized videos featuring talking avatars at a touch of a button for businesses and creators.
  • Rooms.xyz - create & remix interactive rooms from your browser
  • Wonder Dynamics - an AI tool that automatically animates, lights, and composes CG characters into a live-action scene
  • REVELxyz - a tool for creating Animated Avatars from a single photo
  • ANIMATED DRAWINGS - a tool that brings children's drawings to life, by animating characters to move around, by MetaAI
  • RERENDER A VIDEO, Demo: HF - a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos
  • Roop, Code - take a video and replace the face in it with a face of your choice. You only need one image of the desired face. No dataset, no training
  • Text2Performer - Text-Driven Human Video Generation, where a video sequence is synthesized from texts describing the appearance and motions of a target performer
  • DragGAN, Code, Demo: HF - way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc
  • DragDiffusion - Harnessing Diffusion Models for Interactive Point-based Image Editing
  • In-N-Out: Face Video Inversion and Editing with Volumetric Decomposition - the core idea is to represent the face in a video using two neural radiance fields, one for in-distribution and the other for out-of-distribution data, and compose them together for reconstruction
  • High-Resolution Video Synthesis with Latent Diffusion Models - Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space, by NVIDIA

3D

  • InstantMesh - Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
  • Spline - Generate 3D objects from text prompts and images
  • SIMA - a Scalable Instructable Multiworld Agent (SIMA) that can follow natural-language instructions to carry out tasks in a variety of video game settings
  • Stable Video 3D - Quality Novel View Synthesis and 3D Generation from Single Images, by Stability AI
  • TripoSR - Fast 3D Object Generation from Single Images, by Stability AI
  • BlendNeRF - 3D-aware Blending with Generative NeRFs
  • 4DGen - Grounded 4D Content Generation with Spatial-temporal Consistency
  • MobileBrick - Building LEGO for 3D Reconstruction on Mobile Devices. A novel data capturing and 3D annotation pipeline to obtain precise 3D ground-truth shapes without relying on expensive 3D scanners
  • PoseGPT - Chatting about 3D Human Pose
  • ProlificDreamer - High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation
  • Stable Zero123 - 3D Object Generation from Single Images
  • SMERF - Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration
  • DreamCraft3D - a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects
  • Genie - a 3D foundation model, by Lumalabs
  • Masterpiece X - the generative text-to-3D app that allows users to create 3D objects and characters complete with mesh, texture, and animations
  • GAUSSIAN SPLAT - a rasterization technique for 3D reconstruction and rendering
  • SyncDreamer - generating multiview-consistent images from a single-view image
  • MAV3D (Make-A-Video3D) - a method for generating three-dimensional dynamic scenes from text descriptions. It uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model
  • HiFA - High-fidelity Text-to-3D with Advanced Diffusion Guidance
  • AutoRecon - a framework for the automated discovery and reconstruction of an object from multi-view images
  • BITE - enables 3D shape and pose estimation of dogs from a single input image. The model handles a wide range of shapes and breeds, as well as challenging postures far from the available training poses, like sitting or lying on the ground
  • CSM (Common Sense Machines) - generate your own textured 3D assets
  • MotionGPT: Human Motion as Foreign Language - a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks
  • PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° - the first 3D-aware generative model that enables high-quality view-consistent image synthesis of full heads in 360° with diverse appearance and detailed geometry using only in-the-wild unstructured images for training
  • AvatarBooth - a text-to-3D model that creates an animatable 3D model from a text description. It can also generate a customized model from 4–6 photos taken with your phone or from a character design generated by a diffusion model
  • Infinigen, Code - a procedural generator of 3D scenes, creating depth maps and labeling every aspect of the world it generates, by Princeton Vision & Learning Lab
  • USD - Universal Scene Description - an open and extensible framework and ecosystem for describing, composing, simulating and collaborating within 3D worlds, originally developed by Pixar Animation Studios
  • Shap-E: Demo, Code - a conditional generative model for 3D assets, by OpenAI
  • Neural Kernel Surface Reconstruction, Code - a novel method for reconstructing a 3D implicit surface from large-scale, sparse, and noisy point clouds, by NVIDIA
  • Neuralangelo - a framework for high-fidelity 3D surface reconstruction from RGB video captures. Using ubiquitous mobile devices, it enables users to create digital twins of both object-centric and large-scale real-world scenes with highly detailed 3D geometry, by NVIDIA
  • Rodin Diffusion - a Generative Model for Sculpting 3D Digital Avatars, by Microsoft
  • 3D Gaussian Splatting for Real-Time Radiance Field Rendering - introduces three key elements that achieve state-of-the-art visual quality while maintaining competitive training times and, importantly, allow high-quality real-time (≥ 100 fps) novel-view synthesis at 1080p resolution
  • ConsistentNeRF - a method that leverages depth information to regularize both multi-view and single-view 3D consistency among pixels
  • Text2NeRF - a text-driven 3D scene generation framework that combines a neural radiance field (NeRF) and a pre-trained text-to-image diffusion model to generate diverse, view-consistent indoor and outdoor 3D scenes from natural language descriptions
  • Zip-NeRF - a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP
  • S-NeRF - a new street-view NeRF (S-NeRF) that considers novel view synthesis of both the large-scale background scenes and the foreground moving vehicles jointly
  • Mip-NeRF 360 - Unbounded Anti-Aliased Neural Radiance Fields, an extension of mip-NeRF that uses a non-linear scene parameterization, online distillation, and a novel distortion-based regularizer to overcome the challenges presented by unbounded scenes
  • 3D-aware Conditional Image Synthesis - a 3D-aware conditional generative model for controllable photorealistic image synthesis. Given a 2D label map, such as a segmentation or edge map, the model synthesizes a photo from different viewpoints
  • Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior - can create high-fidelity 3D content from only a single image
  • Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models - generates textured 3D meshes from a given text prompt using 2D text-to-image models
  • Objaverse-XL - an open dataset of over 10 million 3D objects
  • OmniObject3D - a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects to facilitate the development of 3D perception, reconstruction, and generation in the real world

Audio & Speech & Music

MetaAI

  • Audiobox - generate voices and sound effects using a combination of voice inputs and natural language text prompts, making it easy to create custom audio for a wide range of use cases
  • Seamless - a system that enables expressive cross-lingual communication in real time
  • SeamlessM4T - a foundational multilingual and multitask model that seamlessly translates and transcribes across speech and text: automatic speech recognition, speech-to-text and speech-to-speech translation, text-to-text and text-to-speech translation
  • AudioCraft - a simple framework that generates high-quality, realistic audio and music from text-based user inputs, trained on raw audio signals as opposed to MIDI or piano rolls
    • MusicGen, Demo: HF, Code - a simple and controllable model for music generation
    • AudioGen - an auto-regressive generative model that generates audio samples conditioned on text inputs
    • EnCodec - a neural audio codec trained end-to-end to reconstruct the input signal
  • MuAViC - a Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
  • Voicebox - Text-Guided Multilingual Universal Speech Generation at Scale

Google

  • MusicFX - a new experimental tool that enables users to generate their own music using AI
  • SingSong - a system which generates instrumental music to accompany input vocals
  • SynthID - a tool that embeds a digital watermark directly into AI-generated images or audio
  • AudioPaLM - an LLM for speech understanding and generation
  • MusicLM, Demo - a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff"
  • Universal Speech Model (USM) - a state-of-the-art speech AI for 100+ languages

ElevenLabs

  • Dubbing Studio - a tool for automatic, end-to-end video translation across 29 languages, with hands-on control over the transcript, translation, timing, and more
  • Speech to Speech - a tool that lets you turn the recording of one voice to sound as if spoken by another
  • Eleven Multilingual v2 - a Foundational AI Speech Model for Nearly 30 Languages
  • Eleven Multilingual v1, Demo - generate high-quality spoken audio in any voice and style
  • AI Speech Classifier, Demo - detect whether an audio clip was created using ElevenLabs

Other

  • Chatter - an interactive podcast, by Hume
  • OpenVoice, OpenVoice2 - a versatile instant voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages
  • Voice Engine - a model for creating custom voices, by OpenAI
  • Udio - discover, create, and share music with the world
  • Image to SFX - compare sound-effect generation models driven by image captions
  • DubbingAI - an AI tool that converts your voice into high-quality cloned voices, from celebrities to your favorite gaming characters, in real time
  • Lyria - AI music generation model
  • StockMusic - a platform for AI-generated tunes that allows you to generate up to 10 minutes of copyright-free music
  • Stable Audio, Stable Audio 2.0 - a system that generates music and sound effects from text
  • RIFFUSION - a model that generates images of spectrograms, which can then be converted into audio clips
  • CLAP - extract latent representations of audio and text for use in your own model or in different downstream tasks
  • Vscoped - effortlessly transcribe your video content to boost click-through rates and watch time
  • MERT, Code, Demo: HF - an Acoustic Music Understanding Model with Large-Scale Self-supervised Training
  • Ecoute - a live transcription tool that provides real-time transcripts for both the user's microphone input (You) and the user's speakers output (Speaker) in a textbox. It also uses OpenAI's GPT-3.5 to suggest a response for the user based on the live transcript of the conversation
  • SadTalker: Demo - Stylized Audio-Driven Single Image Talking Face Animation
  • Recast - turn your want-to-read articles into rich audio summaries
  • AudioGPT, Demo: HuggingFace, Code - Understanding and Generating Speech, Music, Sound, and Talking Head
  • Chirp - a music model that generates realistic audio, including speech, music, and sound effects
  • Bark - a transformer-based text-to-audio model, by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model can also produce nonverbal communication like laughing, sighing, and crying
  • Whisper - an automatic speech recognition (ASR) system that approaches human-level robustness and accuracy on English speech recognition
  • Musicfy - music like you've never heard. Create and discover AI covers of your favorite songs
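  • Jukebox - a neural network that generates music, including rudimentary singing, as raw audio; it learns to compress audio into a discrete latent space and generates new audio from this compressed space, by OpenAI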
  • Koe Recast - transform your voice using AI

Code & Math

  • Llemma - an open language model for mathematics (repository also contains submodules related to the overlap, fine-tuning, and theorem proving experiments described in the paper)
  • Stable Code Instruct 3B - an instruction-tuned code LM based on Stable Code 3B that handles a variety of tasks such as code generation, math, and other software development queries, by Stability AI
  • Devin - billed by Cognition as the first fully autonomous AI software engineer
  • AlphaCodium - a test-based, multi-stage, code-oriented iterative flow that improves the performance of LLMs on code problems
  • alphageometry - Solving Olympiad Geometry without Human Demonstrations, by Google DeepMind
  • sketch-2-app - generate code from a sketch
  • GPT Pilot - an AI developer that writes code, debugs it, and asks you for help when it needs it
  • FunSearch - a method to search for new solutions in mathematics and computer science
  • MAmmoTH - a series of open-source LLMs specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, a curated instruction-tuning dataset
  • Defog - a state-of-the-art LLM for converting natural language questions to SQL queries, which outperforms major open-source models and slightly outperforms gpt-3
  • v0 - a generative user interface system. It generates copy-and-paste friendly React code based on Shadcn UI and Tailwind CSS that people can use in their projects, by Vercel Labs
  • Open Interpreter - an open-source, locally running implementation of OpenAI's Code Interpreter
  • SafeCoder - a code assistant solution built for the enterprise. In marketing speak: “your own on-prem GitHub copilot”, by Hugging Face
  • Code Llama - a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts, by MetaAI
  • StableCode - a generative AI LLM for coding, designed to assist programmers with their daily work, by Stability AI
  • Teaching Arithmetic to Small Transformers - small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective
  • InterCode - a framework that treats interactive coding as a standard reinforcement learning (RL) environment, with code as actions and execution feedback as observations
  • CodeGen2.5 - LLMs for program synthesis, by Salesforce
  • LeanDojo - a set of open-source LLM-based theorem provers built without any proprietary datasets and released under a permissive MIT license to facilitate further research
  • GPT Engineer - a tool made to be easy to adapt and extend, and to learn how you want your code to look. It generates an entire codebase from a prompt
  • CodeTF - a one-stop Python transformer-based library for code large language models (Code LLMs) and code intelligence. It provides an interface for training and inference on code intelligence tasks such as code summarization, translation, and code generation, and aims to make it easy to integrate SOTA Code LLMs into real-world applications
  • Let’s Verify Step by Step - achieves a new state of the art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of only rewarding the correct final answer (“outcome supervision”), by OpenAI
  • 🦍 Gorilla: LLM Connected with Massive APIs - a finetuned LLaMA-based model that surpasses GPT-4 on writing API calls
  • CodeT5 and CodeT5+ - models can be deployed as an AI-powered coding assistant to boost the productivity of software developers, by Salesforce
  • Framer - a tool that builds a website for you from a text prompt
  • Pico - a tool that uses GPT-4 to instantly build simple, shareable web apps

Games

  • Genie - a foundation world model trained from Internet videos that can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches, by Google DeepMind
  • PokemonRedExperiments - train RL agents to play Pokemon Red
  • BitMagic - an AI tool for game creation
  • AI Town - a deployable starter kit for building and customizing your own version of AI town - a virtual town where AI characters live, chat and socialize
  • Generative Agents: Interactive Simulacra of Human Behavior - the core simulation module for generative agents (computational agents that simulate believable human behaviors) and their game environment
  • STEVE-1 - a Generative Model for Text-to-Behavior in Minecraft
  • Mastering Stratego - DeepNash, an AI agent that learned the game from scratch to a human expert level by playing against itself
  • Voyager: An Open-Ended Embodied Agent with LLMs - the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention

Robotics

Typography

  • ControlNet, Demo: HF, How to make a QR code with Stable Diffusion - QR Code Conditioned ControlNet Models for Stable Diffusion. They provide a solid foundation for generating QR code-based artwork that is aesthetically pleasing, while still maintaining the integral QR code shape
  • Word-As-Image for Semantic Typography - automatically creates semantically adjusted letters that illustrate a word's meaning, in various fonts and for different textual concepts; the resulting illustrations can then be used for further creative design
  • DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion - a method that automatically generates artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable

Bio & Med

  • AlphaFold 3 - a new AI model that predicts the structure of proteins, DNA, RNA, ligands, and more, and how they interact, by Google DeepMind and Isomorphic Labs
  • AMIE - a research AI system for diagnostic medical reasoning and conversations, by Google
  • MentalLLaMA - mental health analysis with LLMs
  • AlphaMissense - an AI model classifying missense variants to help pinpoint the cause of diseases
  • evodiff - combines evolutionary-scale data with diffusion models for controllable protein sequence generation
  • SAM-Med2D - applying the Segment Anything Model (SAM) to medical 2D images
  • Med-Flamingo - a medical vision-language model with multimodal in-context learning abilities
  • Brain2Music - Reconstructing Music from Human Brain Activity
  • Seeing the World through Your Eyes - reconstruct a 3D scene beyond the camera's line-of-sight using portrait images containing eye reflections
  • Mind-Video - High-quality Video Reconstruction from Brain Activity
  • Med-PaLM - a large language model (LLM) designed to provide high-quality answers to medical questions
  • PMC-LLaMA - the official code for "PMC-LLaMA: Continue Training LLaMA on Medical Papers"

Military

  • AIP Pillars - activate LLMs and other AI on your private network, subject to full control

Climate

Other: Fin, Presentation

  • GNoME - a deep learning tool that increases the speed and efficiency of materials discovery by predicting the stability of new materials
  • FinGPT
  • guidde - create documentation/presentation/FAQ from captured video
  • Gamma - create visually appealing presentations
  • Tome - create a compelling starting point for your presentation in minutes