LLMs Tools & Research Projects

This repository contains a list of ready-to-use AI tools, open-source projects, and research projects.
Apart from LLMs, you can also find new AI research from other areas, such as computer vision.
Contributions are welcome.

Large Language Models (LLMs) and Chatbots

Chats & Assistants

Chat	Company	Notes
MetaAI	MetaAI
POE	Quora	talk to ChatGPT, GPT-4, Claude 3 Opus, DALL·E 3, and millions of others
Hume	Hume	empathic AI voice chat
Pi	Inflection AI
Gemini	Google
ChatRTX	Nvidia	runs locally on your PC
Copilot	Microsoft
ChatGPT	OpenAI

Open Source Models

Model	Company	Date	Notes
Llama Family	MetaAI
DBRX	Databricks	2024-03-27	a general purpose LLM
Gemma	Google	2024-02-21
phi-2	Microsoft	2023-12-12

Models

Company	2021-22	2023	2024
Google	LaMDA, GLaM, PaLM, Chinchilla	Bard, PaLM-2, Gemini	Gemini 1.5, Gemma, Gemini 1.5 Flash, Gemma 2
OpenAI	ChatGPT	GPT-4, GPT-4 Turbo	GPT-4o
MetaAI	Galactica	LLaMA, LLaMA2 (HF), Purple Llama	LLaMA3
EleutherAI	GPT-J, GPT-NeoX, GPT Neo	Pythia
Stability AI		Stable Vicuna, StableLM, Stable LM 3B, Stable Beluga, Stable Chat, Stable LM Zephyr 3B	Stable LM 2 1.6B, Stable LM 2 12B
Anthropic	RL-CAI	Claude, Claude2, Claude2.1	Claude 3: Haiku, Sonnet, and Opus
BigScience	Bloom
Microsoft		phi-1, phi-1.5, phi-2
Mistral AI		Mistral, Mixtral of experts	Mistral Large
Inflection AI		Inflection-2	Inflection-2.5
Stanford		Alpaca
Berkeley-BAIR		Koala
Vicuna Team		Vicuna
TII		Falcon
Cohere			Command R+, Rerank 3
xAI			Grok-1, Grok-1.5

Snowflake Arctic - an enterprise-focused Large Language Model (LLM) designed to provide cost-effective training and openness
Reka Core - Multimodal LLM
Jamba - the world’s first production-grade Mamba-based model, by AI21 Labs
ChatFlow - a no-code platform that lets you set up an OpenAI-powered chatbot for your website
Perplexity - the AI-chatbot-powered search engine
Smaug-72B-v0.1 - an open-source model that surpassed an average score of 80% on the Hugging Face Open LLM Leaderboard, by Abacus.AI
Ferret - an end-to-end MLLM that accepts any-form referring and grounds anything in response, by Apple
NotebookLM - a powerful new interface that lets you shift effortlessly from reading to asking questions to writing, with an AI thought partner helping you at every turn
Amazon Titan - a breadth of high-performing image, multimodal, and text model choices, via a fully managed API, by AWS
Qwen - chat & pretrained LLM, by Alibaba Cloud
Phind, Phind-70B - a model that Phind reports matches or exceeds GPT-4's coding abilities while running 5x faster
FacTool - a tool-augmented framework for detecting factual errors in texts generated by LLMs. FacTool supports four tasks: knowledge-based QA, code generation, mathematical reasoning, and scientific literature review
Nougat - Neural Optical Understanding for Academic Documents, a Visual Transformer model that performs Optical Character Recognition (OCR) to convert scientific documents into a markup language, by MetaAI
TextFX - AI-powered tools for rappers, writers and wordsmiths
Prompt2Model - a system that takes a natural language task description (like the prompts used for LLMs such as ChatGPT) to train a small special-purpose model that is conducive for deployment
Giraffe - a new family of models that are finetuned from base LLaMA and LLaMA2
ToolBench - open-source, large-scale, high-quality instruction tuning SFT data to facilitate the construction of powerful LLMs with general tool-use capability
Platypus - a family of fine-tuned and merged LLMs that ranked first on HuggingFace's Open LLM Leaderboard as of the release date of the work
OpenFlamingo V2 - an open-source effort to replicate DeepMind's Flamingo models
MetaGPT - a framework involving LLM-based multi-agents that encodes human standardized operating procedures (SOPs) to extend complex problem-solving capabilities that mimic efficient human workflows
Universal and Transferable Adversarial Attacks on Aligned Language Models
FlashAttention - an algorithm to speed up attention and reduce its memory footprint—without any approximation
Quivr - utilizes the power of Generative AI to store and retrieve unstructured information
LongLLaMA - an LLM capable of handling long contexts of 256k tokens or even more
OpenLLaMA - open source reproduction of MetaAI’s LLaMA
BuboGPT - an advanced LLM that incorporates multi-modal inputs including text, image and audio, with a unique ability to ground its responses to visual objects
LAION - Large-scale Artificial Intelligence Open Network
Dalai, Code - run LLaMA and Alpaca on your computer
LLaMAChat - allows you to chat with LLaMa, Alpaca and GPT4All models all running locally on your CPU
GPT4All, Code - an open-source assistant-style LLM that runs locally on your CPU
SdkVercelAI - you can input a prompt, pick different LLMs, and compare two side by side
ChatwithData.ai - AI tool that lets you extract valuable insights and information from data files effortlessly
Open Assistant - a completely open-source ChatGPT alternative
HuggingChat - the first open-source alternative to ChatGPT, powered by Open Assistant's latest model
ChatPDF - chat with any PDF
PdfGPT - a tool where you can upload a PDF and get summaries and answers to your questions, powered by OpenAI
Baize - an open-source chat model trained with LoRA. It uses 100k dialogs generated by letting ChatGPT chat with itself
Chameleon - a compositional reasoning framework designed to enhance LLMs and overcome their inherent limitations, such as outdated information and lack of precise reasoning

Offline-Mode

OpenLLM - an open-source platform designed to facilitate the deployment and operation of LLMs in real-world applications
LM Studio - an easy way to run open-source LLMs locally
Jan - open-source ChatGPT alternative that runs 100% offline on your computer
Pinokio - a browser that lets you install, run, and programmatically control ANY application, automatically

Large Visual Language Models (LVLMs)

PaliGemma - a powerful open VLM inspired by PaLI-3, optimized for image captioning, visual Q&A and other image labeling tasks, by Google
Idefics2 - a model that can answer questions about images, describe visual content, create stories grounded in multiple images, extract information from documents, and perform basic arithmetic operations
Grok-1.5 Vision - can process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs, by xAI
AnyText - Multilingual Visual Text Generation And Editing
Qwen-VL - the multimodal version of the Qwen large model series. Accepts image, text, and bounding box as inputs, outputs text and bounding box
AnomalyGPT - an LVLM-based Industrial Anomaly Detection (IAD) method that can detect anomalies in industrial images without the need for manually specified thresholds
IDEFICS - an open-access VLM based on Flamingo. The model accepts arbitrary sequences of image and text inputs and produces text outputs, aiming to bring transparency to AI systems and serve as a foundation for open research in multimodal AI systems
Prismer - a data- and parameter-efficient VLM that leverages an ensemble of diverse, pre-trained domain experts
MiniGPT-4 - upload an image, and then use chat to identify what's in the picture and learn more about it
MultiModal-GPT - a vision and language model for multi-round dialogue with humans; the model is fine-tuned from OpenFlamingo, with LoRA added in the cross-attention and self-attention parts of the language model
LLaVA - a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding
TaskMatrix - connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting

Evaluation

Vibe-Eval - evaluation suite for measuring progress of multimodal language models, by Reka
FACET (FAirness in Computer Vision EvaluaTion) - a new comprehensive benchmark for evaluating the fairness of computer vision models across classification, detection, instance segmentation, and visual grounding tasks
Arthur Bench - an open-source evaluation tool for comparing LLMs, prompts, and hyperparameters for generative text models
AgentBench - the first benchmark designed to evaluate LLM-as-Agent across a diverse spectrum of different environments
L-Eval - a comprehensive long-context language models evaluation suite with 18 long document tasks across multiple domains that require reasoning over long texts, including summarization, question answering, in-context learning with long CoT examples, topic retrieval, and paper writing assistance
OpenICL - an open-source toolkit for in-context learning and LLM evaluation; supports various state-of-the-art retrieval and inference methods, tasks, and zero-/few-shot evaluation of LLMs
OpenAGI - an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models

Leaderboards:

Chatbot Arena - an open platform to evaluate LLMs by human preference in the real world
Open LLM Leaderboard - evaluate models on 6 key benchmarks using the Eleuther AI Language Model Evaluation Harness, a unified framework to test generative language models on a large number of different evaluation tasks
LLM-Perf Leaderboard - benchmarks the performance (latency, throughput, memory & energy) of LLMs across different hardware, backends and optimizations using Optimum-Benchmark
Hallucinations Leaderboard - evaluates the propensity for hallucination in LLMs across a diverse array of tasks, including Closed-book Open-domain QA, Summarization, Reading Comprehension, Instruction Following, Fact-Checking, and Hallucination Detection
NPHardEval leaderboard - a benchmark for assessing the reasoning abilities of LLMs through the lens of computational complexity classes
LLM Safety Leaderboard - an evaluation of LLM safety that helps researchers and practitioners better understand the capabilities, limitations, and potential risks of LLMs
The Open Medical-LLM Leaderboard - aims to track, rank and evaluate the performance of LLMs on medical question answering tasks
TheFastest.AI - site that provides reliable measurements for the performance of popular models

Libraries

LangChain, docs - a framework for developing applications powered by language models
LlamaIndex, docs - a “data framework” to help you build LLM apps
LLaMA2-Accessory - an open-source toolkit for pre-training, fine-tuning and deployment of LLMs and multimodal LLMs
LLaMA-Adapter - a lightweight adaptation method for fine-tuning instruction-following and multi-modal LLaMA models
streaming-llm - Efficient Streaming Language Models with Attention Sinks
llamafile - run LLMs with a single file
outlines, docs - a library to write reliable programs for interactions with generative models: language models, diffusers, multimodal models, classifiers, etc
OneLLM - One Framework to Align All Modalities with Language
guidance - interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text
agents - an open-source library/framework for building autonomous language agents
nanoGPT - the simplest, fastest repository for training/finetuning medium-sized GPTs
TorchScale - a PyTorch library that allows researchers and developers to scale up Transformers efficiently and effectively
InvokeAI - an implementation of Stable Diffusion, the open source text-to-image and image-to-image generator
ComfyUI - a powerful and modular Stable Diffusion GUI and backend. This UI will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart based interface
StableSwarmUI - Modular Stable Diffusion Web-User-Interface, with an emphasis on making powertools easily accessible, high performance, and extensibility
Wanda - Pruning LLMs by Weights and Activations: removes weights on a per-output basis, ranked by the product of weight magnitudes and input activation norms
LOMO: LOw-Memory Optimization - a new optimizer, which fuses the gradient computation and the parameter update in one step to reduce memory usage
LMFlow - an extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community
Heron - a library that seamlessly integrates multiple Vision and Language models, as well as Video and Language models. It also provides pretrained weights trained on various datasets
Curated Transformers - a transformer library for PyTorch. It provides state-of-the-art models that are composed from a set of reusable components, by Explosion
spacy-llm - integrates LLMs into spaCy, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP tasks, no training data required, by Explosion
Medusa - a simple framework that democratizes the acceleration techniques for LLM generation with multiple decoding heads
Self-RAG - a new framework to train an arbitrary LM to learn to retrieve, generate, and critique to enhance the factuality and quality of generations, without hurting the versatility of LLMs
OpenAgents - an open platform for using and hosting language agents in the wild of everyday life
Mirascope, docs - a toolkit for developing production-ready LLM-powered tools using Python and Pydantic
gateway - route to 100+ open & closed source models with a unified API. It is also production-ready with support for caching, fallbacks, retries, timeouts, load balancing, and can be edge-deployed for minimum latency
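
As a rough illustration of the Wanda entry above: the scoring rule it describes (weight magnitude times input activation norm, compared within each output row) can be sketched in a few lines of plain Python. This is a minimal sketch, not the authors' implementation; the function name `wanda_prune`, the list-of-lists layout, and the `sparsity` parameter are assumptions made for the example.

```python
import math

def wanda_prune(weight, activations, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row (hypothetical helper).

    weight:      list of rows, shape (out_features, in_features)
    activations: calibration inputs, shape (n_samples, in_features)
    sparsity:    fraction of weights to remove from every row
    """
    n_in = len(weight[0])
    # L2 norm of each input feature over the calibration samples
    act_norm = [math.sqrt(sum(row[j] ** 2 for row in activations)) for j in range(n_in)]
    n_prune = int(n_in * sparsity)
    pruned = []
    for w_row in weight:
        # score = |weight| * input activation norm
        scores = [abs(w) * act_norm[j] for j, w in enumerate(w_row)]
        # indices of the n_prune smallest scores within this output row
        drop = set(sorted(range(n_in), key=scores.__getitem__)[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned

weight = [[1.0, -2.0, 3.0, -4.0]]
calib = [[10.0, 1.0, 1.0, 0.1]]
print(wanda_prune(weight, calib))  # [[1.0, 0.0, 3.0, 0.0]]
```

In this toy case plain magnitude pruning would remove 1.0 and -2.0, while the activation-aware score keeps 1.0 (its input is large) and removes -4.0 (its input is nearly silent).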

Devices

Frame - AI glasses, by Brilliant Labs
Ray-Ban Meta Smart Glasses - smart glasses with a 12 MP camera and a five-mic system (updates), by Ray-Ban & MetaAI
LPU Inference Engine - Language Processing Units, by Groq
FigureAI - AI robotics company bringing a general purpose humanoid to life
SanctuaryAI - company on a mission to create the world’s first human-like intelligence in general-purpose robots
Limitless - personalized AI powered by what you’ve seen, said, and heard
rabbit r1 - a personalized operating system through a natural language interface
Open Interpreter - a new computer (the 01) with Open Interpreter at the center

Income

Poe - price-per-message revenue model for AI bot creators
GPTs Store - create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills
Voice Library - share your voice in the Voice Library today and earn cash rewards when it's used
HuggingChat - making the community's best AI chat models available to everyone

Tools

Text-to-Image	Text-to-Music	Text-to-Video	Games	Brand	Prompt Generator
Midjourney	Mubert	GENMO	Leonardo.Ai - Assets	Flair	G-prompter
Adobe Firefly	Waveformer	PIKA LABS	Dreamlab - Animated Sprites	Logolivery	Prompt Builder
Catbird		Kaiber	Didimo		Midjourney PromptHelper1
BlueWillow		Invidio	Scenario - Assets		Midjourney PromptHelper2
Lexica		Moonvalley	Skybox - World-building		FlowGPT
Playground		Morph Studio	ilumine AI		Anthropic
Imgcreator		Haiper	Bezi - 3D Assets
Craiyon		LTX Studio	Charmed - 3D Assets

Text-to-image

Company	Models
Google	Muse, Imagen, Parti, HyperDreamBooth, DreamBooth, StyleDrop, Imagen 2, ImageFX, Imagen 3
OpenAI	CLIP, DALL·E, DALL·E 2, DALL·E 3
MetaAI	CM3leon, Emu Video, Emu Edit, Imagine
stability.ai	Stable Diffusion XL, DreamStudio, Clipdrop, DeepFloyd IF (Code, Demo: HF), SDXL Turbo, Stable Cascade, Stable Diffusion 3

Distribution Matching Distillation - a one-step generator that achieves image quality comparable to Stable Diffusion v1.5 while being 30x faster
Generative Powers of Ten - a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches
Playground v2 (open weights) - an early preview of efforts to make increasingly powerful graphics models
Delta Denoising Score - a novel scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt
Prompt-to-Prompt - editing framework, where the edits are controlled by text only
OpenCLIP - an open source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training)
LEDITS - combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real image editing, while harnessing the editing capabilities of DDPM inversion
Würstchen - Fast Diffusion for Image Generation
ExactlyAI - create images in seconds with an AI that understands your style
ConceptLab - generative models have enabled us to transform our words into vibrant, captivating imagery
IP-Adapter - Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
MATCHAI - a powerful web app that can copy the color grading from images so you can apply it to your own
Ideogram - AI tools that will make creative expression more accessible, fun, and efficient
Picogen - an unofficial API for Midjourney, Stability AI and DALL·E 2
FABRIC - Feedback via Attention-Based Reference Image Conditioning - a technique to incorporate iterative feedback into the generative process of diffusion models based on StableDiffusion
Controlling Text-to-Image Diffusion by Orthogonal Finetuning (OFT) - for adapting text-to-image diffusion models to downstream tasks
InstructPix2Pix: Learning to Follow Image Editing Instructions - a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, the model edits the image accordingly
Composer - a large (5 billion parameters) controllable diffusion model trained on billions of (text, image) pairs. It can exponentially expand the control space through composition, leading to an enormous number of ways to generate and manipulate images, i.e., making the infinite use of finite means
GigaGAN: Large-scale GAN for Text-to-Image Synthesis - changing texture with prompting, changing style with prompting, by Adobe Research

Multi-modal

ImageBind, Demo, Code - Image->Audio, Audio->Image, Text->Image&Audio, Audio&Image->Image, Audio->Generated Image, by MetaAI
GEN-1, Research - use words and images to generate new videos out of existing ones by Runway: AI-Magic-Tools
GEN-2, Research - create videos in any style you can imagine with Text to Video generation by Runway: AI-Magic-Tools
- Mode 01: Text to Video: Synthesize videos in any style you can imagine using nothing but a text prompt. If you can say it, now you can see it
- Mode 02: Text + Image to Video: Generate a video using a driving image and a text prompt
- Mode 03: Image to Video: Generate video using just a driving image (Variations Mode)
- Mode 04: Stylization: Transfer the style of any image or prompt to every frame of your video
- Mode 05: Storyboard: Turn mockups into fully stylized and animated renders
- Mode 06: Mask: Isolate subjects in your video and modify them with simple text prompts
- Mode 07: Render: Turn untextured renders into realistic outputs by applying an input image or prompt
- Mode 08: Customization: Unleash the full power of Gen-2 by customizing the model for even higher fidelity results
MONSTER API
- text-to-image: a latent text-to-image diffusion model capable of generating photo-realistic images conditioned on text descriptions
- image-to-image: a latent diffusion model capable of generating photo-realistic image-to-image translations guided by a text prompt
- instruct-pix2pix: a model that enables fast and effective image editing based on simple instructions

Images

PhotoMaker - Customizing Realistic Human Photos via Stacked ID Embedding
DeWatermark - remove watermarks from photos online for free with AI; Upscales - upscale images with AI up to 4K
NSF - Neural Spline Fields for Burst Image Fusion and Layer Separation
Material Palette - a method to extract Physically-Based-Rendering (PBR) materials from a single real-world image
DiffusionLight - a simple yet effective technique to estimate lighting in a single input image
KREA - generate images and videos with a delightful AI-powered design tool
Magnific - the image Upscaler & Enhancer
Stable Signature - a new method for watermarking images, by MetaAI
wasitai - check if an image was generated by a machine
Textify - a tool for replacing the gibberish in AI-generated images with your desired text
Interpolating between Images with Diffusion Models - a method for zero-shot controllable interpolation using latent diffusion models
AnyDoor: Zero-shot Object-level Image Customization - a diffusion-based image generator with the power to move target objects to new scenes at user-specified locations in a harmonious way
Matting Anything, Code, Demo: HF - an efficient and versatile framework for estimating the alpha matte of any instance in an image with user-prompt guidance
Plug-and-Play, Code - Diffusion Features for Text-Driven Image-to-Image Translation
Real-Time Neural Appearance Models - a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use, by NVIDIA
Designer (blog: Microsoft Designer expands preview with new AI design features) - a design tool by Microsoft with AI features: generate designs and original images by typing what you want, get writing assistance and automatic layout suggestions, and have captions and hashtags proposed for social media sharing
Scribble Diffusion - turn your sketch into a refined image using AI
StudioGPT - a tool for reimagining an existing image

Computer Vision

TAO-Amodal - a benchmark dataset that includes amodal and modal bounding boxes for visible and occluded objects
OMG-Seg - One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open vocabulary settings, prompt-driven, interactive segmentation like SAM, and video object segmentation
PUG (Photorealistic Unreal Graphics) - 3 datasets for representation learning research
Tracking Anything in High Quality - a framework for high performance video object tracking and segmentation
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data - a new benchmark of synthetic image triplets that span a wide range of mid-level variations, labeled with human similarity judgments
CoTracker - an architecture that jointly tracks multiple points throughout an entire video, by MetaAI
TAPIR - a model for Tracking Any Point (TAP) that effectively tracks a query point in a video sequence, by Google DeepMind
DreamTeacher - a self-supervised feature representation learning framework that utilizes generative networks for pre-training downstream image backbones, by NVIDIA
V-JEPA - Video Joint Embedding Predictive Architecture, an early example of a physical world model that excels at detecting and understanding highly detailed interactions between objects
I-JEPA, Code - Image Joint Embedding Predictive Architecture is a method for self-supervised learning. At a high level, I-JEPA predicts the representations of part of an image from the representations of other parts of the same image
Visual Prompting - an innovative approach that takes text prompting, used in applications such as ChatGPT, to computer vision
Tracking Everything Everywhere All at Once - a new test-time optimization method for estimating dense and long-range motion from a video sequence
Track-Anything - a flexible and interactive tool for video object tracking and segmentation. It is developed upon Segment Anything and lets you specify anything to track and segment via user clicks only
EdgeSAM - an accelerated variant of the SAM, optimized for efficient execution on edge devices with minimal compromise in performance
Segment Anything Model (SAM) - a new AI model that can "cut out" any object, in any image, with a single click. SAM is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training. Blog: Introducing Segment Anything, Code
DINOv2 - a method for training high-performance computer vision models with self-supervised learning
Behind the Scenes: Density Fields for Single View Reconstruction - a neural network that predicts an implicit density field from a single image

Video & Animation

VideoFX - a new experimental tool powered by Veo. It’s designed to help support creatives through the storytelling journey, by Google
Veo - generates high-quality 1080p resolution videos in a wide range of cinematic and visual styles that can go beyond a minute, by Google
VideoGigaGAN: Towards Detail-rich Video Super-Resolution - a generative VSR model that can produce videos with high-frequency details and temporal consistency, by Adobe Research
VASA-1 - Lifelike Audio-Driven Talking Faces Generated in Real Time, by Microsoft
MagicTime - Time-lapse Video Generation Models as Metamorphic Simulators
Stable Video Diffusion - a foundation model for generative video based on the image model Stable Diffusion
EMO - Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
SORA - a latent diffusion model, built on an encoder-decoder and a transformer, that learns to transform noise into videos and can create realistic and imaginative scenes from text instructions, by OpenAI
LUMIERE - A Space-Time Diffusion Model for Video Generation: Text-to-Video, Image-to-Video, Stylized Generation, Video Stylization, Cinemagraphs, Video Inpainting
ActAnywhere - Subject-Aware Video Background Generation
MagicVideo-V2 - integrates the text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline
I2VGen-XL - High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
StreamDiffusion - an innovative diffusion pipeline designed for real-time interactive generation
WALT - Window Attention Latent Transformer - a transformer-based method for latent video diffusion models (LVDMs)
Hotshot - GIF generator
Unscreen - remove video background
Motrica - technologies and tools for advanced character animation
CoDeF - Content Deformation Fields for Temporally Consistent Video Processing
MagicEdit - supports various editing applications, including video stylization, local editing, video-MagicMix and video outpainting
To Infinity and Beyond - an approach to generating high-quality episodic content for IPs (intellectual properties) using LLMs, custom state-of-the-art diffusion models and a multi-agent simulation for contextualization, story progression and behavioral control
PlazmaPunk - create your own music video with the power of AI
Video-LLaMA, Code, Demo: HF - a multi-modal LLM that achieves video-grounded conversations between humans and computers by connecting a language decoder with off-the-shelf unimodal pre-trained models
AnimateDiff prompt travel - AnimateDiff with prompt travel + ControlNet + IP-Adapter
AnimateDiff, Code - Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
Animate-A-Story - a video storytelling approach which can synthesize high-quality, structure-controlled, and character-controlled videos
Zeroscope - a watermark-free Modelscope-based video model optimized for producing high-quality 16:9 compositions and a smooth video output
Klap - a tool that analyzes the video and finds short clips
Lalamu - low-quality video lip sync with preselected videos/video templates (take clips from videos, give the video new audio, and then the lips will sync up to that new audio within the video)
D-ID - uses generative AI to create customized videos featuring talking avatars at the touch of a button for businesses and creators
Rooms.xyz - create & remix interactive rooms from your browser
Wonder Dynamics - an AI tool that automatically animates, lights, and composes CG characters into a live-action scene
REVELxyz - a tool for creating Animated Avatars from a single photo
ANIMATED DRAWINGS - a tool that brings children's drawings to life, by animating characters to move around, by MetaAI
RERENDER A VIDEO, Demo: HF - a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos
Roop, Code - take a video and replace the face in it with a face of your choice. You only need one image of the desired face. No dataset, no training
Text2Performer - Text-Driven Human Video Generation, where a video sequence is synthesized from texts describing the appearance and motions of a target performer
DragGAN, Code, Demo: HF - a way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc
DragDiffusion - Harnessing Diffusion Models for Interactive Point-based Image Editing
In-N-Out: Face Video Inversion and Editing with Volumetric Decomposition - represents the face in a video using two neural radiance fields, one for in-distribution and the other for out-of-distribution data, and composes them together for reconstruction
High-Resolution Video Synthesis with Latent Diffusion Models - Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space, by NVIDIA

3D

InstantMesh - Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Spline - Generate 3D objects from text prompts and images
SIMA - a Scalable Instructable Multiworld Agent (SIMA) that can follow natural-language instructions to carry out tasks in a variety of video game settings
Stable Video 3D - Quality Novel View Synthesis and 3D Generation from Single Images, by Stability AI
TripoSR - Fast 3D Object Generation from Single Images, by Stability AI
BlendNeRF - 3D-aware Blending with Generative NeRFs
4DGen - Grounded 4D Content Generation with Spatial-temporal Consistency
MobileBrick - Building LEGO for 3D Reconstruction on Mobile Devices. A novel data capturing and 3D annotation pipeline to obtain precise 3D ground-truth shapes without relying on expensive 3D scanners
PoseGPT - Chatting about 3D Human Pose
ProlificDreamer - High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation
Stable Zero123 - 3D Object Generation from Single Images
SMERF - Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration
DreamCraft3D - a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects
Genie - a 3D foundation model, by Luma Labs
Masterpiece X - the generative text-to-3D app that allows users to create 3D objects and characters complete with mesh, texture, and animations
GAUSSIAN SPLAT - a rasterization technique for 3D reconstruction and rendering
SyncDreamer - generating multiview-consistent images from a single-view image
MAV3D (Make-A-Video3D) - a method for generating three-dimensional dynamic scenes from text descriptions. The approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model
HiFA - High-fidelity Text-to-3D with Advanced Diffusion Guidance
AutoRecon - a framework for the automated discovery and reconstruction of an object from multi-view images
BITE - enables 3D shape and pose estimation of dogs from a single input image. The model handles a wide range of shapes and breeds, as well as challenging postures far from the available training poses, like sitting or lying on the ground
CSM (Common Sense Machines) - generate your own textured 3D assets
MotionGPT: Human Motion as Foreign Language - a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° - the first 3D-aware generative model that enables high-quality view-consistent image synthesis of full heads in 360° with diverse appearance and detailed geometry using only in-the-wild unstructured images for training
AvatarBooth - a text-to-3D model that creates an animatable 3D avatar from a text description. It can also generate a customized model from 4-6 phone photos or from a character design generated by a diffusion model
Infinigen, Code - a procedural generator of 3D scenes, creating depth maps and labeling every aspect of the world it generates, by Princeton Vision & Learning Lab
USD - Universal Scene Description - an open and extensible framework and ecosystem for describing, composing, simulating and collaborating within 3D worlds, originally developed by Pixar Animation Studios
Shap-E: Demo, Code - a conditional generative model for 3D assets, by OpenAI
Neural Kernel Surface Reconstruction, Code - a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud, by NVIDIA
Neuralangelo - a framework for high-fidelity 3D surface reconstruction from RGB video captures. Using ubiquitous mobile devices, we enable users to create digital twins of both object-centric and large-scale real-world scenes with highly detailed 3D geometry, by NVIDIA
Rodin Diffusion - a Generative Model for Sculpting 3D Digital Avatars, by Microsoft
3D Gaussian Splatting for Real-Time Radiance Field Rendering - a method built on three key elements that achieve state-of-the-art visual quality while maintaining competitive training times and, importantly, allow high-quality real-time (≥ 100 fps) novel-view synthesis at 1080p resolution
ConsistentNeRF - a method that leverages depth information to regularize both multi-view and single-view 3D consistency among pixels
Text2NeRF - a text-driven 3D scene generation framework that combines the neural radiance field (NeRF) and a pre-trained text-to-image diffusion model to generate diverse view-consistent indoor and outdoor 3D scenes from natural language descriptions
Zip-NeRF - a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP
S-NeRF - a new street-view NeRF (S-NeRF) that considers novel view synthesis of both the large-scale background scenes and the foreground moving vehicles jointly
Mip-NeRF 360 - Unbounded Anti-Aliased Neural Radiance Fields, an extension of mip-NeRF that uses a non-linear scene parameterization, online distillation, and a novel distortion-based regularizer to overcome the challenges presented by unbounded scenes
3D-aware Conditional Image Synthesis - a 3D-aware conditional generative model for controllable photorealistic image synthesis. Given a 2D label map, such as a segmentation or edge map, our model synthesizes a photo from different viewpoints
Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior - can create high-fidelity 3D content from only a single image
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models - generates textured 3D meshes from a given text prompt using 2D text-to-image models
Objaverse-XL - an open dataset of over 10 million 3D objects
OmniObject3D - a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects to facilitate the development of 3D perception, reconstruction, and generation in the real world

Audio & Speech & Music

MetaAI

Audiobox - generate voices and sound effects using a combination of voice inputs and natural language text prompts — making it easy to create custom audio for a wide range of use cases
Seamless - a system that unlocks expressive cross-lingual communication in real time
SeamlessM4T - a foundational multilingual and multitask model that seamlessly translates and transcribes across speech and text: automatic speech recognition, speech-to-text and speech-to-speech translation, text-to-text and text-to-speech translation
AudioCraft - a simple framework that generates high-quality, realistic audio and music from text-based user inputs after training on raw audio signals as opposed to MIDI or piano rolls
- MusicGen, Demo: HF, Code - a simple and controllable model for music generation
- AudioGen - an auto-regressive generative model that generates audio samples conditioned on text inputs
- EnCodec - a neural network that is trained end to end to reconstruct the input signal
MuAViC - a Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Voicebox - Text-Guided Multilingual Universal Speech Generation at Scale

Google

MusicFX - a new experimental tool that enables users to generate their own music using AI
SingSong - a system which generates instrumental music to accompany input vocals
SynthID - users can embed a digital watermark directly into AI-generated images or audio they create
AudioPaLM - an LLM for speech understanding and generation
MusicLM, Demo - a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff"
Universal Speech Model (USM) - a state-of-the-art speech AI for 100+ languages

Eleven Labs

Dubbing Studio - a tool enabling automatic, end-to-end video translation across 29 languages, with hands-on control over transcript, translation, timing, and more
Speech to Speech - a tool that lets you turn the recording of one voice to sound as if spoken by another
Eleven Multilingual v2 - a Foundational AI Speech Model for Nearly 30 Languages
Eleven Multilingual v1, Demo - generate top-quality spoken audio in any voice and style with the most advanced and multipurpose AI speech tool out there
AI Speech Classifier, Demo - detect whether an audio clip was created using ElevenLabs

Other

Chatter - an interactive podcast, by Hume
OpenVoice, OpenVoice2 - a versatile instant voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages
Voice Engine - a model for creating custom voices, by OpenAI
Udio - discover, create, and share music with the world
Image to SFX - compare sound effects generation models from image caption
DubbingAI - an AI tool that converts your voice into high-quality cloned voices, from celebrities to your favorite gaming characters, in real time
Lyria - AI music generation model
StockMusic - a platform for AI-generated tunes that allows you to generate up to 10 minutes of copyright-free music
Stable Audio, Stable Audio 2.0 - a system that generates music and sound effects from text
RIFFUSION - a model that generates images of spectrograms, which can then be converted to audio clips
CLAP - extract a latent representation of any given audio and text for your own model or for different downstream tasks
Vscoped - effortlessly transcribe your video content to boost click-through rates and watch time
MERT, Code, Demo: HF - an Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Ecoute - a live transcription tool that provides real-time transcripts for both the user's microphone input (You) and the user's speakers output (Speaker) in a textbox. It also generates a suggested response using OpenAI's GPT-3.5 for the user to say based on the live transcription of the conversation
SadTalker: Demo - Stylized Audio-Driven Single Image Talking Face Animation
Recast - turn your want-to-read articles into rich audio summaries
AudioGPT, Demo: HuggingFace, Code - Understanding and Generating Speech, Music, Sound, and Talking Head
Chirp - music model, generates realistic audio - including speech, music and sound effects
Bark - a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communication like laughing, sighing and crying
Whisper - an automatic speech recognition (ASR) system that approaches human level robustness and accuracy on English speech recognition
Musicfy - music like you've never heard. Create and discover AI covers of your favorite songs
Jukebox - a model that learns to compress raw audio from its training set and generates music from this compressed space
Koe Recast - transform your voice using AI

Code & Math

Llemma - an open language model for mathematics (repository also contains submodules related to the overlap, fine-tuning, and theorem proving experiments described in the paper)
Stable Code Instruct 3B - an instruction-tuned code LM based on Stable Code 3B that handles a variety of tasks such as code generation, math and other software development related queries, by Stability AI
Devin - billed as the first fully autonomous AI software engineer
AlphaCodium - a test-based, multi-stage, code-oriented iterative flow that improves the performance of LLMs on code problems
alphageometry - Solving Olympiad Geometry without Human Demonstrations, by Google DeepMind
sketch-2-app - generate code based on sketch
GPT Pilot - a true AI developer that writes code, debugs it, talks to you when it needs help, etc
FunSearch - a method to search for new solutions in mathematics and computer science
MAmmoTH - a series of open-source LLMs specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset
Defog - a state-of-the-art LLM for converting natural language questions to SQL queries, which outperforms major open-source models and slightly outperforms gpt-3
v0 - a generative user interface system. It generates copy-and-paste friendly React code based on Shadcn UI and Tailwind CSS that people can use in their projects, by Vercel Labs
Open Interpreter - an open-source, locally running implementation of OpenAI's Code Interpreter
SafeCoder - a code assistant solution built for the enterprise. In marketing speak: “your own on-prem GitHub copilot”, by Hugging Face
Code Llama - a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts, by MetaAI
StableCode - LLM generative AI product for coding designed to assist programmers with their daily work, by Stability AI
Teaching Arithmetic to Small Transformers - small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective
InterCode - a framework for interactive coding as a standard reinforcement learning (RL) environment, with code as actions and execution feedback as observations
CodeGen2.5 - LLMs for program synthesis, by Salesforce
LeanDojo - a set of open-source LLM-based theorem provers, built without any proprietary datasets and released under a permissive MIT license to facilitate further research
GPT Engineer - a tool made to be easy to adapt and extend, and to teach your agent how you want your code to look. It generates an entire codebase based on a prompt
CodeTF - a one-stop Python transformer-based library for code large language models (Code LLMs) and code intelligence, provides a seamless interface for training and inferencing on code intelligence tasks like code summarization, translation, code generation and so on. It aims to facilitate easy integration of SOTA CodeLLMs into real-world applications
Let’s Verify Step by Step - a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply rewarding the correct final answer (“outcome supervision”), by OpenAI
🦍 Gorilla: LLM Connected with Massive APIs - a finetuned LLaMA-based model that surpasses GPT-4 on writing API calls
CodeT5 and CodeT5+ - models that can be deployed as an AI-powered coding assistant to boost the productivity of software developers, by Salesforce
Framer - a tool that constructs a completely unique website for you based on a text prompt
Pico - a tool that uses GPT-4 to instantly build simple, shareable web apps

Games

Genie - a foundation world model trained from Internet videos that can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches, by Google DeepMind
PokemonRedExperiments - train RL agents to play Pokemon Red
BitMagic - game creation
AI Town - a deployable starter kit for building and customizing your own version of AI town - a virtual town where AI characters live, chat and socialize
Generative Agents: Interactive Simulacra of Human Behavior - contains our core simulation module for generative agents—computational agents that simulate believable human behaviors—and their game environment
STEVE-1 - a Generative Model for Text-to-Behavior in Minecraft
Mastering Stratego - DeepNash, an AI agent that learned the game from scratch to a human expert level by playing against itself
Voyager: An Open-Ended Embodied Agent with LLMs - the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention

Robotics

LeRobot - aims to provide models, datasets, and tools for real-world robotics in PyTorch
DrEureka - Language Model Guided Sim-To-Real Transfer
UniSim - a real-world simulator with applications ranging from controllable content creation in games and movies to training embodied agents purely in simulation that can be directly deployed in the real world
JAT (Jack of All Trades) - a transformer-based agent capable of playing video games, controlling a robot to perform a wide variety of tasks, and understanding and executing commands in a simple navigation environment
OpenEQA - from word models to world models, by MetaAI
Mobile ALOHA - Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, by Stanford
AutoRT, SARA-RT and RT-Trajectory - by Google DeepMind
Robot Parkour Learning - a system for learning a single end-to-end vision-based parkour policy of diverse parkour skills using a simple reward without any reference motion data
Open X-Embodiment - Robotic Learning Datasets and RT-X Models
Eureka - a human-level reward design algorithm powered by LLMs, by NVIDIA
Language to rewards for robotic skill synthesis - an approach to teaching robots novel actions through natural language input, using reward functions as an interface to bridge the gap between language and low-level robot actions
VIMA - General Robot Manipulation with Multimodal Prompts
RT-2 - a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalised instructions for robotic control, by Google DeepMind
Robots That Ask For Help - a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed
ViNT: A Foundation Model for Visual Navigation - a goal-conditioned navigation policy trained on diverse, cross-embodiment training data that can control many different robots zero-shot
Navigating to Objects in the Real World
RVT: Robotic View Transformer - a multi-view transformer for 3D manipulation that is both scalable and accurate. RVT takes camera images and task language description as inputs and predicts the gripper pose action, by NVIDIA
TidyBot - personalized Robot Assistance with Large Language Models
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning - by the OP3 Soccer Team, Google DeepMind
PaLM-E: An Embodied Multimodal Language Model - embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts
Scaling Robot Learning with Semantically Imagined Experience
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware - a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface

Typography

ControlNet, Demo: HF, How to make a QR code with Stable Diffusion - QR Code Conditioned ControlNet Models for Stable Diffusion. They provide a solid foundation for generating QR code-based artwork that is aesthetically pleasing, while still maintaining the integral QR code shape
Word-As-Image for Semantic Typography - Word-As-Image illustrations in various fonts and for different textual concepts. The semantically adjusted letters are created completely automatically and can then be used for further creative design
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion - a novel method to automatically generate artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable

Bio & Med

AlphaFold 3 - a new AI model that predicts the structure of proteins, DNA, RNA, ligands and more, and how they interact, by Google DeepMind and Isomorphic Labs
AMIE - a research AI system for diagnostic medical reasoning and conversations, by Google
MentalLLaMA - mental health analysis with LLMs
AlphaMissense - an AI model classifying missense variants to help pinpoint the cause of diseases
evodiff - combines evolutionary-scale data with diffusion models for controllable protein sequence generation
SAM-Med2D - applying the Segment Anything Model (SAM) to medical 2D images
Med-Flamingo - a medical vision-language model with multimodal in-context learning abilities
Brain2Music - Reconstructing Music from Human Brain Activity
Seeing the World through Your Eyes - reconstruct a 3D scene beyond the camera's line-of-sight using portrait images containing eye reflections
Mind-Video - High-quality Video Reconstruction from Brain Activity
Med-PaLM - a large language model (LLM) designed to provide high-quality answers to medical questions
PMC-LLaMA - the official code for "PMC-LLaMA: Continue Training LLaMA on Medical Papers"

Military

AIP Pillars - activate LLMs and other AI on your private network, subject to full control

Climate

GraphCast - AI model for faster and more accurate global weather forecasting
ClimaX: A foundation model for weather and climate - a flexible and generalizable deep learning model for weather and climate science

Other: Fin, Presentation

GNoME - a deep learning tool that dramatically increases the speed and efficiency of discovery by predicting the stability of new materials
FinGPT
guidde - create documentation/presentation/FAQ from captured video
Gamma - create visually appealing presentations
Tome - create a compelling starting point for your presentation in minutes

PetroIvaniuk/llms-tools

Folders and files

Latest commit

History

Repository files navigation