dylanhogg/awesome-python

Awesome Python

License: MIT

Hand-picked awesome Python libraries and frameworks, organised by category 🐍

Interactive version: www.awesomepython.org

Updated 13 Jul 2025

Categories

  • Newly Created Repositories - Awesome Python is regularly updated, and this category lists the most recently created GitHub repositories drawn from all the other categories here (10 repos)
  • Agentic AI - Agentic AI libraries, frameworks and tools: AI agents, workflows, autonomous decision-making, goal-oriented tasks, and API integrations (91 repos)
  • Code Quality - Code quality tooling: linters, formatters, pre-commit hooks, unused code removal (17 repos)
  • Crypto and Blockchain - Cryptocurrency and blockchain libraries: trading bots, API integration, Ethereum virtual machine, solidity (14 repos)
  • Data - General data libraries: data processing, serialisation, formats, databases, SQL, connectors, web crawlers, data generation/augmentation/checks (120 repos)
  • Debugging - Debugging and tracing tools (10 repos)
  • Diffusion Text to Image - Text-to-image diffusion model libraries, tools and apps for generating images from natural language (42 repos)
  • Finance - Financial and quantitative libraries: investment research tools, market data, algorithmic trading, backtesting, financial derivatives (34 repos)
  • Game Development - Game development tools, engines and libraries (8 repos)
  • GIS - Geospatial libraries: raster and vector data formats, interactive mapping and visualisation, computing frameworks for processing images, projections (29 repos)
  • Graph - Graphs and network libraries: network analysis, graph machine learning, visualisation (6 repos)
  • GUI - Graphical user interface libraries and toolkits (8 repos)
  • Jupyter - Jupyter, JupyterLab and Notebook tools, libraries and plugins (28 repos)
  • LLMs and ChatGPT - Large language model and GPT libraries and frameworks: auto-gpt, agents, QnA, chain-of-thought workflows, API integrations. Also see the Natural Language Processing category for crossover (340 repos)
  • Math and Science - Mathematical, numerical and scientific libraries (29 repos)
  • Machine Learning - General - General and classical machine learning libraries. See below for other sections covering specialised ML areas (166 repos)
  • Machine Learning - Deep Learning - Machine learning libraries that cross over with deep learning in some way (79 repos)
  • Machine Learning - Interpretability - Machine learning interpretability libraries. Covers explainability, prediction explanations, dashboards, understanding knowledge development in training (27 repos)
  • Machine Learning - Ops - MLOps tools, frameworks and libraries: intersection of machine learning, data engineering and DevOps; deployment, health, diagnostics and governance of ML models (47 repos)
  • Machine Learning - Reinforcement - Machine learning libraries and toolkits that cross over with reinforcement learning in some way: agent reinforcement learning, agent environments, RLHF (23 repos)
  • Machine Learning - Time Series - Machine learning and classical timeseries libraries: forecasting, seasonality, anomaly detection, econometrics (20 repos)
  • Natural Language Processing - Natural language processing libraries and toolkits: text processing, topic modelling, tokenisers, chatbots. Also see the LLMs and ChatGPT category for crossover (88 repos)
  • Packaging - Python packaging, dependency management and bundling (28 repos)
  • Pandas - Pandas and dataframe libraries: data analysis, statistical reporting, pandas GUIs, pandas performance optimisations (25 repos)
  • Performance - Performance, parallelisation and low level libraries (28 repos)
  • Profiling - Memory and CPU/GPU profiling tools and libraries (11 repos)
  • Security - Security related libraries: vulnerability discovery, SQL injection, environment auditing (16 repos)
  • Simulation - Simulation libraries: robotics, economic, agent-based, traffic, physics, astronomy, chemistry, quantum simulation. Also see the Math and Science category for crossover (38 repos)
  • Study - Miscellaneous study resources: algorithms, general resources, system design, code repos for textbooks, best practices, tutorials (66 repos)
  • Template - Template tools and libraries: cookiecutter repos, generators, quick-starts (11 repos)
  • Terminal - Terminal and console tools and libraries: CLI tools, terminal based formatters, progress bars (21 repos)
  • Testing - Testing libraries: unit testing, load testing, acceptance testing, code coverage, browser automation, plugins (24 repos)
  • Typing - Typing libraries: static and run-time type checking, annotations (15 repos)
  • Utility - General utility libraries: miscellaneous tools, linters, code formatters, version management, package tools, documentation tools (217 repos)
  • Visualisation - Visualisation tools and libraries: application frameworks, 2D/3D plotting, dashboards, WebGL (37 repos)
  • Web - Web related frameworks and libraries: webapp servers, WSGI, ASGI, asyncio, HTTP, REST, user management (61 repos)

Newly Created Repositories

Awesome Python is regularly updated, and this category lists the most recently created GitHub repositories drawn from all the other categories here.

  1. google-gemini/gemini-fullstack-langgraph-quickstart ⭐ 15,479
    Demonstrates a fullstack application using a React and LangGraph-powered backend agent. The agent is designed to perform comprehensive research on a user's query.
    🔗 ai.google.dev/gemini-api/docs/google-search

  2. bytedance/deer-flow ⭐ 15,067
    DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
    🔗 deerflow.tech

  3. openai/openai-cs-agents-demo ⭐ 5,540
    Demo of a Customer Service Agent interface built on top of the OpenAI Agents SDK

  4. geeeekexplorer/nano-vllm ⭐ 5,124
    A lightweight vLLM implementation built from scratch.

  5. ag-ui-protocol/ag-ui ⭐ 4,972
    AG-UI: the Agent-User Interaction Protocol. Bring Agents into Frontend Applications.
    🔗 ag-ui.com

  6. yuliang-liu/MonkeyOCR ⭐ 3,838
    A lightweight LMM-based Document Parsing Model with a Structure-Recognition-Relation Triplet Paradigm

  7. codelion/openevolve ⭐ 3,170
    Evolutionary coding agent (like AlphaEvolve) enabling automated scientific and algorithmic discovery

  8. strands-agents/sdk-python ⭐ 1,997
    A model-driven approach to building AI agents in just a few lines of code.
    🔗 strandsagents.com

  9. jennyzzt/dgm ⭐ 1,501
    Self-improving system that iteratively modifies its own code and empirically validates each change

  10. mannaandpoem/OpenManus ⭐ 30
    Open source version of Manus, the general AI agent

Agentic AI

Agentic AI libraries, frameworks and tools: AI agents, workflows, autonomous decision-making, goal-oriented tasks, and API integrations.

  1. langchain-ai/langchain ⭐ 111,211
    🦜🔗 Build context-aware reasoning applications
    🔗 python.langchain.com

  2. langgenius/dify ⭐ 106,582
    Production-ready platform for agentic workflow development.
    🔗 dify.ai

  3. logspace-ai/langflow ⭐ 84,456
    Langflow is a powerful tool for building and deploying AI-powered agents and workflows.
    🔗 www.langflow.org

  4. browser-use/browser-use ⭐ 65,222
    Browser use is the easiest way to connect your AI agents with the browser.
    🔗 browser-use.com

  5. geekan/MetaGPT ⭐ 57,123
    🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
    🔗 mgx.dev

  6. microsoft/autogen ⭐ 47,125
    AutoGen is a framework for creating multi-agent AI applications that can act autonomously or work alongside humans.
    🔗 microsoft.github.io/autogen

  7. run-llama/llama_index ⭐ 43,023
    LlamaIndex is the leading framework for building LLM-powered agents over your data.
    🔗 docs.llamaindex.ai

  8. mem0ai/mem0 ⭐ 36,622
    Enhances AI assistants and agents with an intelligent memory layer, enabling personalized AI interactions
    🔗 mem0.ai

  9. crewaiinc/crewAI ⭐ 34,041
    Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
    🔗 crewai.com

  10. agno-agi/agno ⭐ 29,627
    Full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning.
    🔗 docs.agno.com

  11. openbmb/ChatDev ⭐ 27,134
    ChatDev stands as a virtual software company that operates through various intelligent agents holding different roles, including Chief Executive Officer, Chief Product Officer, etc.
    🔗 arxiv.org/abs/2307.07924

  12. stanford-oval/storm ⭐ 26,507
    An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
    🔗 storm.genie.stanford.edu

  13. composiohq/composio ⭐ 25,556
    Composio equips your AI agents & LLMs with 100+ high-quality integrations via function calling
    🔗 docs.composio.dev

  14. microsoft/OmniParser ⭐ 22,620
    OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements

  15. assafelovic/gpt-researcher ⭐ 22,287
    LLM based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations.
    🔗 gptr.dev

  16. yoheinakajima/babyagi ⭐ 21,635
    GPT-4 powered task-driven autonomous agent
    🔗 babyagi.org

  17. huggingface/smolagents ⭐ 21,121
    🤗 smolagents: a barebones library for agents that think in code.
    🔗 huggingface.co/docs/smolagents

  18. openai/swarm ⭐ 20,078
    A framework exploring ergonomic, lightweight multi-agent orchestration.

  19. unity-technologies/ml-agents ⭐ 18,381
    The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
    🔗 unity.com/products/machine-learning-agents

  20. a2aproject/A2A ⭐ 18,227
    An open protocol enabling communication and interoperability between opaque agentic applications.
    🔗 a2aproject.github.io/a2a

  21. camel-ai/owl ⭐ 17,424
    🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

  22. letta-ai/letta ⭐ 17,246
    Letta (formerly MemGPT) is a framework for creating LLM services with memory.
    🔗 docs.letta.com

  23. dzhng/deep-research ⭐ 16,977
    An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models.

  24. langchain-ai/langgraph ⭐ 15,529
    LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain.
    🔗 langchain-ai.github.io/langgraph

  25. google-gemini/gemini-fullstack-langgraph-quickstart ⭐ 15,479
    Demonstrates a fullstack application using a React and LangGraph-powered backend agent. The agent is designed to perform comprehensive research on a user's query.
    🔗 ai.google.dev/gemini-api/docs/google-search

  26. bytedance/deer-flow ⭐ 15,067
    DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
    🔗 deerflow.tech

  27. nirdiamant/GenAI_Agents ⭐ 14,307
    Tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. It serves as a comprehensive guide for building intelligent, interactive AI systems.

  28. camel-ai/camel ⭐ 13,300
    🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
    🔗 docs.camel-ai.org

  29. openai/openai-agents-python ⭐ 12,497
    A lightweight yet powerful framework for building multi-agent workflows. It is provider-agnostic, supporting the OpenAI Responses and Chat Completions APIs, as well as 100+ other LLMs.
    🔗 openai.github.io/openai-agents-python

  30. smol-ai/developer ⭐ 12,072
    the first library to let you embed a developer agent in your own app!
    🔗 twitter.com/smolmodels

  31. sakanaai/AI-Scientist ⭐ 11,256
    The AI Scientist, the first comprehensive system for fully automatic scientific discovery, enabling Foundation Models such as Large Language Models (LLMs) to perform research independently.

  32. google/adk-python ⭐ 10,817
    An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
    🔗 google.github.io/adk-docs

  33. pydantic/pydantic-ai ⭐ 10,726
    PydanticAI is a Python Agent Framework designed to make it less painful to build production grade applications with Generative AI.
    🔗 ai.pydantic.dev

  34. asyncfuncai/deepwiki-open ⭐ 8,026
    Custom implementation of DeepWiki, automatically creates beautiful, interactive wikis for any GitHub, GitLab, or BitBucket repository

  35. meta-llama/llama-stack ⭐ 7,901
    Llama Stack standardizes the building blocks needed to bring genai applications to market. These blocks cover model training and fine-tuning, evaluation, and running AI agents in production
    🔗 llama-stack.readthedocs.io

  36. upsonic/Upsonic ⭐ 7,572
    Upsonic is a reliability-focused framework designed for real-world applications. It enables trusted agent workflows in your organization through advanced reliability features, including verification layers, triangular architecture, validator agents, and output evaluation systems.
    🔗 docs.upsonic.ai

  37. zilliztech/deep-searcher ⭐ 6,492
    DeepSearcher combines reasoning LLMs and VectorDBs to perform search, evaluation, and reasoning based on private data, providing highly accurate answers and comprehensive reports
    🔗 zilliztech.github.io/deep-searcher

  38. awslabs/agent-squad ⭐ 6,230
    Flexible, lightweight open-source framework for orchestrating multiple AI agents to handle complex conversations
    🔗 awslabs.github.io/agent-squad

  39. mnotgod96/AppAgent ⭐ 5,997
    AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
    🔗 appagent-official.github.io

  40. prefecthq/marvin ⭐ 5,808
    an ambient intelligence library
    🔗 marvin.mintlify.app

  41. openai/openai-cs-agents-demo ⭐ 5,540
    Demo of a Customer Service Agent interface built on top of the OpenAI Agents SDK

  42. pyspur-dev/pyspur ⭐ 5,280
    A visual playground for agentic workflows: Iterate over your agents 10x faster
    🔗 pyspur.dev

  43. kyegomez/swarms ⭐ 5,003
    The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
    🔗 docs.swarms.world

  44. ag-ui-protocol/ag-ui ⭐ 4,972
    AG-UI: the Agent-User Interaction Protocol. Bring Agents into Frontend Applications.
    🔗 ag-ui.com

  45. landing-ai/vision-agent ⭐ 4,927
    VisionAgent is a library that helps you utilize agent frameworks to generate code to solve your vision task

  46. crewaiinc/crewAI-examples ⭐ 4,445
    A collection of examples that show how to use CrewAI framework to automate workflows.

  47. x-plug/MobileAgent ⭐ 4,445
    Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
    🔗 arxiv.org/abs/2501.11733

  48. langchain-ai/open_deep_research ⭐ 4,304
    Open Deep Research is an open source assistant that automates research and produces customizable reports on any topic

  49. meta-llama/llama-stack-apps ⭐ 4,265
    Agentic components of the Llama Stack APIs

  50. brainblend-ai/atomic-agents ⭐ 4,143
    Atomic Agents provides a set of tools and agents that can be combined to create powerful applications. It is built on top of Instructor and leverages the power of Pydantic for data and schema validation and serialization.

  51. langroid/langroid ⭐ 3,468
    Harness LLMs with Multi-Agent Programming
    🔗 langroid.github.io/langroid

  52. rowboatlabs/rowboat ⭐ 3,307
    AI-powered multi-agent builder
    🔗 www.rowboatlabs.com

  53. joshuac215/agent-service-toolkit ⭐ 3,286
    A full toolkit for running an AI agent service built with LangGraph, FastAPI and Streamlit.
    🔗 agent-service-toolkit.streamlit.app

  54. emcie-co/parlant ⭐ 3,281
    LLM agents built for control. Designed for real-world use. Deployed in minutes.
    🔗 www.parlant.io

  55. codelion/openevolve ⭐ 3,170
    Evolutionary coding agent (like AlphaEvolve) enabling automated scientific and algorithmic discovery

  56. openmanus/OpenManus-RL ⭐ 3,157
    OpenManus-RL is an open-source initiative collaboratively led by Ulab-UIUC and MetaGPT. This project is an extended version of the original OpenManus initiative.

  57. ag2ai/ag2 ⭐ 2,983
    AG2 (formerly AutoGen) is an open-source programming framework for building AI agents and facilitating cooperation among multiple agents to solve tasks.
    🔗 ag2.ai

  58. going-doer/Paper2Code ⭐ 2,886
    A multi-agent LLM system that transforms paper into a code repository. It follows a three-stage pipeline: planning, analysis, and code generation, each handled by specialized agents.

  59. facebookresearch/Pearl ⭐ 2,880
    A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.

  60. cheshire-cat-ai/core ⭐ 2,818
    AI agent microservice
    🔗 cheshirecat.ai

  61. i-am-bee/beeai-framework ⭐ 2,620
    Build production-ready AI agents in both Python and Typescript.
    🔗 framework.beeai.dev

  62. om-ai-lab/OmAgent ⭐ 2,524
    OmAgent is a Python library for building multimodal language agents with ease. We try to keep the library simple without too much overhead like other agent frameworks.
    🔗 om-agent.com

  63. griptape-ai/griptape ⭐ 2,339
    Modular Python framework for AI agents and workflows with chain-of-thought reasoning, tools, and memory.
    🔗 www.griptape.ai

  64. run-llama/llama_deploy ⭐ 2,035
    Async-first framework for deploying, scaling, and productionizing agentic multi-service systems based on workflows from llama_index.
    🔗 docs.llamaindex.ai/en/stable/module_guides/llama_deploy

  65. strands-agents/sdk-python ⭐ 1,997
    A model-driven approach to building AI agents in just a few lines of code.
    🔗 strandsagents.com

  66. btahir/open-deep-research ⭐ 1,993
    Open source alternative to Gemini Deep Research. Generate reports with AI based on search results.
    🔗 opendeepresearch.vercel.app

  67. langchain-ai/executive-ai-assistant ⭐ 1,909
    Executive AI Assistant (EAIA) is an AI agent that attempts to do the job of an Executive Assistant (EA).

  68. agentops-ai/AgentStack ⭐ 1,884
    AgentStack scaffolds your agent stack - The tech stack that collectively is your agent

  69. openautocoder/Agentless ⭐ 1,782
    Agentless🐱: an agentless approach to automatically solve software development problems

  70. msoedov/agentic_security ⭐ 1,530
    An open-source vulnerability scanner for Agent Workflows and LLMs. Protecting AI systems from jailbreaks, fuzzing, and multimodal attacks.
    🔗 agentic-security.vercel.app

  71. sakanaai/AI-Scientist-v2 ⭐ 1,427
    The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

  72. link-agi/AutoAgents ⭐ 1,379
    [IJCAI 2024] Generate different roles for GPTs to form a collaborative entity for complex tasks.
    🔗 huggingface.co/spaces/linksoul/autoagents

  73. agentera/Agently ⭐ 1,377
    Agently is a development framework that helps developers build AI agent native applications really fast.
    🔗 agently.tech

  74. shengranhu/ADAS ⭐ 1,369
    Automated Design of Agentic Systems using Meta Agent Search to show agents can invent novel and powerful agent designs
    🔗 www.shengranhu.com/adas

  75. prefecthq/ControlFlow ⭐ 1,339
    ControlFlow provides a structured, developer-focused framework for defining workflows and delegating work to LLMs, without sacrificing control or transparency
    🔗 controlflow.ai

  76. szczyglis-dev/py-gpt ⭐ 1,104
    Desktop AI Assistant powered by o1, o3, GPT-4, Gemini, Claude, Ollama, DeepSeek, Grok, Bielik, chat, vision, voice control, image generation and analysis, agents, command execution, file upload/download, speech synthesis and recognition, access to Web, memory, presets, assistants, plugins, and more. Linux, Windows, Mac
    🔗 pygpt.net

  77. plurai-ai/intellagent ⭐ 1,084
    Simulate interactions, analyze performance, and gain actionable insights for conversational agents. Test, evaluate, and optimize your agent to ensure reliable real-world deployment.
    🔗 intellagent-doc.plurai.ai

  78. langchain-ai/langgraph-swarm-py ⭐ 1,024
    A library for creating swarm-style multi-agent systems using LangGraph. A swarm is a type of multi-agent architecture where agents dynamically hand off control to one another based on their specializations
    🔗 langchain-ai.github.io/langgraph/concepts/multi_agent

  79. thudm/CogAgent ⭐ 994
    An open-sourced end-to-end VLM-based GUI Agent

  80. victordibia/autogen-ui ⭐ 950
    Web UI for AutoGen (A Framework for Multi-Agent LLM Applications)

  81. humanlayer/humanlayer ⭐ 949
    HumanLayer is an API and SDK that enables AI Agents to contact humans for help, feedback, and approvals.
    🔗 humanlayer.dev

  82. google-deepmind/concordia ⭐ 930
    Concordia is a library to facilitate construction and use of generative agent-based models to simulate interactions of agents in grounded physical, social, or digital space.

  83. thytu/Agentarium ⭐ 923
    Framework for managing and orchestrating AI agents with ease. Agentarium provides a flexible and intuitive way to create, manage, and coordinate interactions between multiple AI agents in various environments.

  84. strnad/CrewAI-Studio ⭐ 910
    agentic,gui,automation

  85. deedy/mac_computer_use ⭐ 808
    A fork of Anthropic Computer Use that you can run on Mac computers to give Claude and other AI models autonomous access to your computer.
    🔗 x.com/deedydas/status/1849481225041559910

  86. salesforceairesearch/AgentLite ⭐ 611
    AgentLite is a research-oriented library designed for building and advancing LLM-based task-oriented agent systems. It simplifies the implementation of new agent/multi-agent architectures, enabling easy orchestration of multiple agents through a manager agent.

  87. codingmoh/open-codex ⭐ 577
    Open Codex is a fully open-source command-line AI assistant inspired by OpenAI Codex, supporting local language models like phi-4-mini and full integration with Ollama.

  88. quantalogic/quantalogic ⭐ 433
    QuantaLogic is a ReAct (Reasoning & Action) framework for building advanced AI agents. The CLI version includes coding capabilities comparable to Aider.

  89. sakanaai/AI-Scientist-ICLR2025-Workshop-Experiment ⭐ 262
    A paper produced by The AI Scientist passed a peer-review process at a workshop in a top machine learning conference

  90. prithivirajdamodaran/Route0x ⭐ 105
    A production-grade query routing solution, leveraging LLMs while optimizing for cost per query

  91. mannaandpoem/OpenManus ⭐ 30
    Open source version of Manus, the general AI agent

Code Quality

Code quality tooling: linters, formatters, pre-commit hooks, unused code removal.

  1. astral-sh/ruff ⭐ 40,712
    An extremely fast Python linter and code formatter, written in Rust.
    🔗 docs.astral.sh/ruff

  2. psf/black ⭐ 40,483
    The uncompromising Python code formatter
    🔗 black.readthedocs.io/en/stable

  3. pre-commit/pre-commit ⭐ 13,996
    A framework for managing and maintaining multi-language pre-commit hooks.
    🔗 pre-commit.com

  4. google/yapf ⭐ 13,913
    A formatter for Python files

  5. sqlfluff/sqlfluff ⭐ 9,012
    A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
    🔗 www.sqlfluff.com

  6. pycqa/isort ⭐ 6,779
    A Python utility / library to sort imports.
    🔗 pycqa.github.io/isort

  7. davidhalter/jedi ⭐ 5,971
    Awesome autocompletion, static analysis and refactoring library for python
    🔗 jedi.readthedocs.io

  8. pycqa/pylint ⭐ 5,506
    It's not just a linter that annoys you!
    🔗 pylint.readthedocs.io/en/latest

  9. jendrikseipp/vulture ⭐ 3,912
    Find dead Python code

  10. asottile/pyupgrade ⭐ 3,826
    A tool (and pre-commit hook) to automatically upgrade syntax for newer versions of the language.

  11. pycqa/flake8 ⭐ 3,645
    flake8 is a python tool that glues together pycodestyle, pyflakes, mccabe, and third-party plugins to check the style and quality of some python code.
    🔗 flake8.pycqa.org

  12. wemake-services/wemake-python-styleguide ⭐ 2,741
    The strictest and most opinionated python linter ever!
    🔗 wemake-python-styleguide.rtfd.io

  13. python-lsp/python-lsp-server ⭐ 2,270
    Fork of the python-language-server project, maintained by the Spyder IDE team and the community

  14. codespell-project/codespell ⭐ 2,167
    check code for common misspellings

  15. sourcery-ai/sourcery ⭐ 1,685
    Instant AI code reviews
    🔗 sourcery.ai

  16. callowayproject/bump-my-version ⭐ 501
    A small command line tool to simplify releasing software by updating all version strings in your source code by the correct increment and optionally commit and tag the changes.
    🔗 callowayproject.github.io/bump-my-version

  17. tconbeer/sqlfmt ⭐ 464
    sqlfmt formats your dbt SQL files so you don't have to
    🔗 sqlfmt.com

Crypto and Blockchain

Cryptocurrency and blockchain libraries: trading bots, API integration, Ethereum virtual machine, solidity.

  1. freqtrade/freqtrade ⭐ 40,352
    Free, open source crypto trading bot
    🔗 www.freqtrade.io

  2. ccxt/ccxt ⭐ 37,197
    A JavaScript / TypeScript / Python / C# / PHP / Go cryptocurrency trading API with support for more than 100 bitcoin/altcoin exchanges
    🔗 docs.ccxt.com

  3. crytic/slither ⭐ 5,773
    Static Analyzer for Solidity and Vyper
    🔗 blog.trailofbits.com/2018/10/19/slither-a-solidity-static-analysis-framework

  4. ethereum/web3.py ⭐ 5,310
    A python interface for interacting with the Ethereum blockchain and ecosystem.
    🔗 web3py.readthedocs.io

  5. ethereum/consensus-specs ⭐ 3,749
    Ethereum Proof-of-Stake Consensus Specifications

  6. cyberpunkmetalhead/Binance-volatility-trading-bot ⭐ 3,463
    This is a fully functioning Binance trading bot that measures the volatility of every coin on Binance and places trades with the highest gaining coins. If you like this project, consider donating through the Brave browser to allow me to continuously improve the script.

  7. bmoscon/cryptofeed ⭐ 2,485
    Cryptocurrency Exchange Websocket Data Feed Handler

  8. ethereum/py-evm ⭐ 2,346
    A Python implementation of the Ethereum Virtual Machine
    🔗 py-evm.readthedocs.io/en/latest

  9. binance/binance-public-data ⭐ 1,924
    Details on how to get Binance public data

  10. ofek/bit ⭐ 1,300
    Bitcoin made easy.
    🔗 ofek.dev/bit

  11. man-c/pycoingecko ⭐ 1,080
    Python wrapper for the CoinGecko API

  12. palkeo/panoramix ⭐ 863
    Ethereum decompiler

  13. coinbase/agentkit ⭐ 764
    AgentKit is Coinbase Developer Platform's framework for easily enabling AI agents to take actions onchain. It is designed to be framework-agnostic, so you can use it with any AI framework, and wallet-agnostic
    🔗 docs.cdp.coinbase.com/agentkit/docs/welcome

  14. dylanhogg/awesome-crypto ⭐ 77
    A list of awesome crypto and blockchain projects
    🔗 www.awesomecrypto.xyz

Data

General data libraries: data processing, serialisation, formats, databases, SQL, connectors, web crawlers, data generation/augmentation/checks.

  1. microsoft/markitdown ⭐ 60,175
    A utility for converting files to Markdown, supports: PDF, PPT, Word, Excel, Images etc

  2. scrapy/scrapy ⭐ 57,531
    Scrapy, a fast high-level web crawling & scraping framework for Python.
    🔗 scrapy.org

  3. apache/spark ⭐ 41,437
    Apache Spark - A unified analytics engine for large-scale data processing
    🔗 spark.apache.org

  4. ds4sd/docling ⭐ 34,090
    Docling parses documents and exports them to the desired format with ease and speed.
    🔗 docling-project.github.io/docling

  5. mindsdb/mindsdb ⭐ 33,455
    AI's query engine - Platform for building AI that can answer questions over large scale federated data. - The only MCP Server you'll ever need
    🔗 mindsdb.com

  6. pathwaycom/pathway ⭐ 28,641
    Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
    🔗 pathway.com

  7. getredash/redash ⭐ 27,519
    Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
    🔗 redash.io

  8. jaidedai/EasyOCR ⭐ 27,245
    Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
    🔗 www.jaided.ai

  9. qdrant/qdrant ⭐ 24,599
    Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
    🔗 qdrant.tech

  10. humansignal/label-studio ⭐ 23,194
    Label Studio is an open source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats.
    🔗 labelstud.io

  11. chroma-core/chroma ⭐ 20,981
    the AI-native open-source embedding database
    🔗 www.trychroma.com

  12. airbytehq/airbyte ⭐ 18,675
    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
    🔗 airbyte.com

  13. joke2k/faker ⭐ 18,535
    Faker is a Python package that generates fake data for you.
    🔗 faker.readthedocs.io

  14. avaiga/taipy ⭐ 18,236
    Turns Data and AI algorithms into production-ready web applications in no time.
    🔗 www.taipy.io

  15. binux/pyspider ⭐ 16,707
    A Powerful Spider(Web Crawler) System in Python.
    🔗 docs.pyspider.org

  16. tiangolo/sqlmodel ⭐ 16,317
    SQL databases in Python, designed for simplicity, compatibility, and robustness.
    🔗 sqlmodel.tiangolo.com

  17. twintproject/twint ⭐ 16,142
    An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

  18. apache/arrow ⭐ 15,685
    Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
    🔗 arrow.apache.org

  19. weaviate/weaviate ⭐ 13,878
    Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
    🔗 weaviate.io/developers/weaviate

  20. redis/redis-py ⭐ 13,118
    Redis Python client

  21. s0md3v/Photon ⭐ 11,700
    Incredibly fast crawler designed for OSINT.

  22. coleifer/peewee ⭐ 11,614
    a small, expressive orm -- supports postgresql, mysql, sqlite and cockroachdb
    🔗 docs.peewee-orm.com

  23. cyclotruc/gitingest ⭐ 10,836
    Turn any Git repository into a prompt-friendly text ingest for LLMs.
    🔗 gitingest.com

  24. sqlalchemy/sqlalchemy ⭐ 10,628
    The Database Toolkit for Python
    🔗 www.sqlalchemy.org

  25. simonw/datasette ⭐ 10,180
    An open source multi-tool for exploring and publishing data
    🔗 datasette.io

  26. bigscience-workshop/petals ⭐ 9,709
    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
    🔗 petals.dev

  27. voxel51/fiftyone ⭐ 9,700
    Refine high-quality datasets and visual AI models
    🔗 fiftyone.ai

  28. yzhao062/pyod ⭐ 9,325
    A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
    🔗 pyod.readthedocs.io

  29. gristlabs/grist-core ⭐ 8,631
    Grist is the evolution of spreadsheets.
    🔗 www.getgrist.com

  30. tobymao/sqlglot ⭐ 7,977
    Python SQL Parser and Transpiler
    🔗 sqlglot.com

  31. lancedb/lancedb ⭐ 7,001
    Developer-friendly, embedded retrieval engine for multimodal AI. Search More; Manage Less.
    🔗 lancedb.github.io/lancedb

  32. alirezamika/autoscraper ⭐ 6,843
    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  33. kaggle/kaggle-api ⭐ 6,718
    Official Kaggle API

  34. madmaze/pytesseract ⭐ 6,159
    A Python wrapper for Google Tesseract

  35. ibis-project/ibis ⭐ 5,914
    Ibis is a Python library that provides a lightweight, universal interface for data wrangling. It helps Python users explore and transform data of any size, stored anywhere.
    🔗 ibis-project.org

  36. vi3k6i5/flashtext ⭐ 5,659
    Extract Keywords from sentence or Replace keywords in sentences.

  37. airbnb/knowledge-repo ⭐ 5,526
    A next-generation curated knowledge sharing platform for data scientists and other technical professions.

  38. googleapis/genai-toolbox ⭐ 5,273
    MCP Toolbox for Databases is an open source MCP server for databases. Develop tools easier, faster, and more securely by handling connection pooling, authentication.
    🔗 googleapis.github.io/genai-toolbox/getting-started/introduction

  39. superduperdb/superduper ⭐ 5,096
    Superduper: End-to-end framework for building custom AI applications and agents.
    🔗 superduper.io

  40. facebookresearch/AugLy ⭐ 5,021
    A data augmentations library for audio, image, text, and video.
    🔗 ai.facebook.com/blog/augly-a-new-data-augmentation-library-to-help-build-more-robust-ai-models

  41. jazzband/tablib ⭐ 4,704
    Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
    🔗 tablib.readthedocs.io

  42. giskard-ai/giskard ⭐ 4,693
    🐢 Open-Source Evaluation & Testing for AI & LLM systems
    🔗 docs.giskard.ai

  43. amundsen-io/amundsen ⭐ 4,613
    Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
    🔗 www.amundsen.io/amundsen

  44. lk-geimfari/mimesis ⭐ 4,595
    Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
    🔗 mimesis.name

  45. rapidai/RapidOCR ⭐ 4,515
    📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.
    🔗 rapidai.github.io/rapidocrdocs

  46. adbar/trafilatura ⭐ 4,473
    Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
    🔗 trafilatura.readthedocs.io

  47. mongodb/mongo-python-driver ⭐ 4,242
    PyMongo - the Official MongoDB Python driver
    🔗 www.mongodb.com/docs/languages/python/pymongo-driver/current

  48. rom1504/img2dataset ⭐ 4,084
    Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

  49. andialbrecht/sqlparse ⭐ 3,894
    A non-validating SQL parser module for Python

  50. dlt-hub/dlt ⭐ 3,868
    data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
    🔗 dlthub.com/docs

  51. deepchecks/deepchecks ⭐ 3,838
    Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.
    🔗 docs.deepchecks.com/stable

  52. jmcnamara/XlsxWriter ⭐ 3,807
    A Python module for creating Excel XLSX files.
    🔗 xlsxwriter.readthedocs.io

  53. praw-dev/praw ⭐ 3,753
    PRAW, an acronym for "Python Reddit API Wrapper", is a python package that allows for simple access to Reddit's API.
    🔗 praw.readthedocs.io

  54. run-llama/llama-hub ⭐ 3,475
    A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
    🔗 llamahub.ai

  55. sqlalchemy/alembic ⭐ 3,454
    A database migrations tool for SQLAlchemy.

  56. mlabonne/llm-datasets ⭐ 3,251
    Curated list of datasets and tools for post-training.
    🔗 mlabonne.github.io/blog

  57. pyeve/cerberus ⭐ 3,223
    Lightweight, extensible data validation library for Python
    🔗 python-cerberus.org

  58. zoomeranalytics/xlwings ⭐ 3,176
    xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.
    🔗 www.xlwings.org

  59. docarray/docarray ⭐ 3,080
    Represent, send, store and search multimodal data
    🔗 docs.docarray.org

  60. sdv-dev/SDV ⭐ 3,058
    Synthetic data generation for tabular data
    🔗 docs.sdv.dev/sdv

  61. pallets/itsdangerous ⭐ 3,031
    Safely pass trusted data to untrusted environments and back.
    🔗 itsdangerous.palletsprojects.com

  62. datafold/data-diff ⭐ 2,975
    Compare tables within or across databases
    🔗 docs.datafold.com

  63. goldsmith/Wikipedia ⭐ 2,957
    A Pythonic wrapper for the Wikipedia API
    🔗 wikipedia.readthedocs.org

  64. awslabs/amazon-redshift-utils ⭐ 2,808
    Amazon Redshift Utils contains utilities, scripts and views which are useful in a Redshift environment

  65. kayak/pypika ⭐ 2,695
    PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
    🔗 pypika.readthedocs.io/en/latest

  66. samuelcolvin/arq ⭐ 2,567
    Fast job queuing and RPC in python with asyncio and redis.
    🔗 arq-docs.helpmanual.io

  67. pynamodb/PynamoDB ⭐ 2,518
    A pythonic interface to Amazon's DynamoDB
    🔗 pynamodb.readthedocs.io

  68. huggingface/datatrove ⭐ 2,463
    DataTrove is a library to process, filter and deduplicate text data at a very large scale. It provides a set of prebuilt commonly used processing blocks with a framework to easily add custom functionality

  69. mangiucugna/json_repair ⭐ 2,440
    A python module to repair invalid JSON from LLMs
    🔗 pypi.org/project/json-repair

  70. pikepdf/pikepdf ⭐ 2,404
    A Python library for reading and writing PDF, powered by QPDF
    🔗 pikepdf.readthedocs.io

  71. uqfoundation/dill ⭐ 2,365
    serialize all of Python
    🔗 dill.rtfd.io

  72. sfu-db/connector-x ⭐ 2,353
    Fastest library to load data from DB to DataFrames in Rust and Python
    🔗 sfu-db.github.io/connector-x

  73. emirozer/fake2db ⭐ 2,330
    Generate fake but valid data filled databases for test purposes using most popular patterns (AFAIK). Current support is sqlite, mysql, postgresql, mongodb, redis, couchdb.

  74. graphistry/pygraphistry ⭐ 2,289
    PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

  75. aminalaee/sqladmin ⭐ 2,249
    SQLAlchemy Admin for FastAPI and Starlette
    🔗 aminalaee.github.io/sqladmin

  76. accenture/AmpliGraph ⭐ 2,216
    Python library for Representation Learning on Knowledge Graphs
    🔗 docs.ampligraph.org

  77. milvus-io/bootcamp ⭐ 2,169
    Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.
    🔗 milvus.io

  78. agronholm/sqlacodegen ⭐ 2,125
    Automatic model code generator for SQLAlchemy

  79. simonw/sqlite-utils ⭐ 1,862
    Python CLI utility and library for manipulating SQLite databases
    🔗 sqlite-utils.datasette.io

  80. uber/petastorm ⭐ 1,847
    Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

  81. aio-libs/aiomysql ⭐ 1,822
    aiomysql is a library for accessing a MySQL database from asyncio
    🔗 aiomysql.rtfd.io

  82. simple-salesforce/simple-salesforce ⭐ 1,787
    A very simple Salesforce.com REST API client for Python

  83. collerek/ormar ⭐ 1,757
    python async orm with fastapi in mind and pydantic validation
    🔗 collerek.github.io/ormar

  84. zarr-developers/zarr-python ⭐ 1,737
    An implementation of chunked, compressed, N-dimensional arrays for Python.
    🔗 zarr.readthedocs.io

  85. matthewwithanm/python-markdownify ⭐ 1,707
    Convert HTML to Markdown

  86. scholarly-python-package/scholarly ⭐ 1,666
    Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
    🔗 scholarly.readthedocs.io

  87. eleutherai/the-pile ⭐ 1,581
    The Pile is a large, diverse, open source language modelling data set that consists of many smaller datasets combined together.

  88. ydataai/ydata-synthetic ⭐ 1,558
    Synthetic data generators for tabular and time-series data
    🔗 docs.sdk.ydata.ai

  89. d-star-ai/dsRAG ⭐ 1,442
    A retrieval engine for unstructured data. It is especially good at handling challenging queries over dense text, like financial reports, legal documents, and academic papers.

  90. mchong6/JoJoGAN ⭐ 1,430
    Official PyTorch repo for JoJoGAN: One Shot Face Stylization

  91. google/tensorstore ⭐ 1,422
    Library for reading and writing large multi-dimensional arrays.
    🔗 google.github.io/tensorstore

  92. sdispater/orator ⭐ 1,417
    The Orator ORM provides a simple yet beautiful ActiveRecord implementation.
    🔗 orator-orm.com

  93. quixio/quix-streams ⭐ 1,402
    Python Streaming DataFrames for Kafka
    🔗 docs.quix.io

  94. aio-libs/aiocache ⭐ 1,308
    Asyncio cache manager for redis, memcached and memory
    🔗 aiocache.readthedocs.io

  95. eliasdabbas/advertools ⭐ 1,248
    advertools - online marketing productivity and analysis tools
    🔗 advertools.readthedocs.io

  96. pytorch/data ⭐ 1,211
    A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

  97. igorbenav/fastcrud ⭐ 1,190
    FastCRUD is a Python package for FastAPI, offering robust async CRUD operations and flexible endpoint creation utilities.
    🔗 benavlabs.github.io/fastcrud

  98. duckdb/dbt-duckdb ⭐ 1,110
    dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)

  99. brettkromkamp/contextualise ⭐ 1,078
    Contextualise is an effective tool particularly suited for organising information-heavy projects and activities consisting of unstructured and widely diverse data and information resources
    🔗 contextualise.dev

  100. intake/intake ⭐ 1,048
    Intake is a lightweight package for finding, investigating, loading and disseminating data.
    🔗 intake.readthedocs.io

  101. uber/fiber ⭐ 1,044
    Distributed Computing for AI Made Simple
    🔗 uber.github.io/fiber

  102. meta-llama/synthetic-data-kit ⭐ 1,006
    Tool for generating high-quality synthetic datasets to fine-tune LLMs. Generate Reasoning Traces, QA Pairs, save them to a fine-tuning format with a simple CLI.
    🔗 pypi.org/project/synthetic-data-kit

  103. goccy/bigquery-emulator ⭐ 959
    BigQuery emulator provides a way to launch a BigQuery server on your local machine for testing and development.

  104. scikit-hep/awkward ⭐ 896
    Manipulate JSON-like data with NumPy-like idioms.
    🔗 awkward-array.org

  105. macbre/sql-metadata ⭐ 864
    Uses tokenized query returned by python-sqlparse and generates query metadata
    🔗 pypi.python.org/pypi/sql-metadata

  106. koaning/human-learn ⭐ 817
    Natural Intelligence is still a pretty good idea.
    🔗 koaning.github.io/human-learn

  107. weaviate/recipes ⭐ 802
    This repository shares end-to-end notebooks on how to use various Weaviate features and integrations!

  108. apache/iceberg-python ⭐ 801
    PyIceberg is a Python library for programmatic access to Iceberg table metadata as well as to table data in Iceberg format.
    🔗 py.iceberg.apache.org

  109. googleapis/python-bigquery ⭐ 779
    Python Client for Google BigQuery

  110. unstructured-io/unstructured-api ⭐ 767
    API for Open-Source Pre-Processing Tools for Unstructured Data

  111. kagisearch/vectordb ⭐ 735
    A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.
    🔗 vectordb.com

  112. ibm/data-prep-kit ⭐ 728
    Data Prep Kit is a community project to democratize and accelerate unstructured data preparation for LLM app developers
    🔗 data-prep-kit.github.io/data-prep-kit

  113. hyperqueryhq/whale ⭐ 728
    🐳 The stupidly simple CLI workspace for your data warehouse.
    🔗 rsyi.gitbook.io/whale

  114. dgarnitz/vectorflow ⭐ 694
    VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
    🔗 www.getvectorflow.com

  115. jina-ai/vectordb ⭐ 626
    A Python vector database you just need - no more, no less.

  116. koaning/bulk ⭐ 587
    Bulk is a quick UI developer tool to apply some bulk labels.

  117. koaning/doubtlab ⭐ 513
    Doubt your data, find bad labels.
    🔗 koaning.github.io/doubtlab

  118. koaning/embetter ⭐ 501
    just a bunch of useful embeddings for scikit-learn pipelines
    🔗 koaning.github.io/embetter

  119. titan-systems/titan ⭐ 471
    Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API.

  120. stackloklabs/promptwright ⭐ 432
    Promptwright is a Python library designed for generating large synthetic datasets using LLMs

Debugging

Debugging and tracing tools.

  1. cool-rr/PySnooper ⭐ 16,510
    Never use print for debugging again

  2. gruns/icecream ⭐ 9,817
    🍦 Never use print() to debug again.

  3. shobrook/rebound ⭐ 4,126
    Instant Stack Overflow results whenever an exception is thrown

  4. inducer/pudb ⭐ 3,113
    Full-screen console debugger for Python
    🔗 documen.tician.de/pudb

  5. gotcha/ipdb ⭐ 1,925
    Integration of IPython pdb

  6. alexmojaki/heartrate ⭐ 1,818
    Simple real time visualisation of the execution of a Python program.

  7. alexmojaki/birdseye ⭐ 1,682
    Graphical Python debugger which lets you easily view the values of all evaluated expressions
    🔗 birdseye.readthedocs.io

  8. pdbpp/pdbpp ⭐ 1,391
    pdb++, a drop-in replacement for pdb (the Python debugger)

  9. alexmojaki/snoop ⭐ 1,367
    A powerful set of Python debugging tools, based on PySnooper

  10. samuelcolvin/python-devtools ⭐ 1,038
    Dev tools for python
    🔗 python-devtools.helpmanual.io

Diffusion Text to Image

Text-to-image diffusion model libraries, tools and apps for generating images from natural language.

  1. automatic1111/stable-diffusion-webui ⭐ 154,377
    Stable Diffusion web UI

  2. comfyanonymous/ComfyUI ⭐ 82,156
    The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
    🔗 www.comfy.org

  3. compvis/stable-diffusion ⭐ 71,105
    A latent text-to-image diffusion model
    🔗 ommer-lab.com/research/latent-diffusion-models

  4. stability-ai/stablediffusion ⭐ 41,327
    High-Resolution Image Synthesis with Latent Diffusion Models

  5. lllyasviel/ControlNet ⭐ 32,709
    Let us control diffusion models!

  6. huggingface/diffusers ⭐ 29,738
    🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
    🔗 huggingface.co/docs/diffusers

  7. invoke-ai/InvokeAI ⭐ 25,476
    Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, and serves as the foundation for multiple commercial products.
    🔗 invoke-ai.github.io/invokeai

  8. openbmb/MiniCPM-o ⭐ 19,800
    MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

  9. apple/ml-stable-diffusion ⭐ 17,474
    Stable Diffusion with Core ML on Apple Silicon

  10. borisdayma/dalle-mini ⭐ 14,816
    DALL·E Mini - Generate images from a text prompt
    🔗 www.craiyon.com

  11. divamgupta/diffusionbee-stable-diffusion-ui ⭐ 13,305
    Diffusion Bee is the easiest way to run Stable Diffusion locally on your M1 Mac. Comes with a one-click installer. No dependencies or technical knowledge needed.
    🔗 diffusionbee.com

  12. compvis/latent-diffusion ⭐ 13,067
    High-Resolution Image Synthesis with Latent Diffusion Models

  13. instantid/InstantID ⭐ 11,711
    InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
    🔗 instantid.github.io

  14. lucidrains/DALLE2-pytorch ⭐ 11,299
    Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

  15. facebookresearch/dinov2 ⭐ 11,069
    PyTorch code and models for the DINOv2 self-supervised learning method.

  16. ashawkey/stable-dreamfusion ⭐ 8,669
    Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.

  17. opengvlab/InternVL ⭐ 8,538
    [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
    🔗 internvl.readthedocs.io/en/latest

  18. idea-research/GroundingDINO ⭐ 8,432
    [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
    🔗 arxiv.org/abs/2303.05499

  19. carson-katri/dream-textures ⭐ 8,038
    Stable Diffusion built-in to Blender

  20. xavierxiao/Dreambooth-Stable-Diffusion ⭐ 7,743
    Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion

  21. timothybrooks/instruct-pix2pix ⭐ 6,721
    PyTorch implementation of InstructPix2Pix, an instruction-based image editing model, based on the original CompVis/stable_diffusion repo.

  22. openai/consistency_models ⭐ 6,376
    Official repo for consistency models.

  23. salesforce/BLIP ⭐ 5,375
    PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

  24. nateraw/stable-diffusion-videos ⭐ 4,608
    Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts

  25. lkwq007/stablediffusion-infinity ⭐ 3,875
    Outpainting with Stable Diffusion on an infinite canvas

  26. jina-ai/discoart ⭐ 3,843
    🪩 Create Disco Diffusion artworks in one line

  27. mlc-ai/web-stable-diffusion ⭐ 3,671
    Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
    🔗 mlc.ai/web-stable-diffusion

  28. openai/glide-text2im ⭐ 3,645
    GLIDE: a diffusion-based text-conditional image synthesis model

  29. openai/improved-diffusion ⭐ 3,618
    Release for Improved Denoising Diffusion Probabilistic Models

  30. google-research/big_vision ⭐ 2,999
    Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

  31. saharmor/dalle-playground ⭐ 2,759
    A playground to generate images from any text prompt using Stable Diffusion (past: using DALL-E Mini)

  32. open-compass/VLMEvalKit ⭐ 2,692
    Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
    🔗 huggingface.co/spaces/opencompass/open_vlm_leaderboard

  33. stability-ai/stability-sdk ⭐ 2,440
    SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)
    🔗 platform.stability.ai

  34. thudm/CogVLM2 ⭐ 2,377
    GPT4V-level open-source multi-modal model based on Llama3-8B

  35. coyote-a/ultimate-upscale-for-automatic1111 ⭐ 1,744
    Ultimate SD Upscale extension for AUTOMATIC1111 Stable Diffusion web UI

  36. divamgupta/stable-diffusion-tensorflow ⭐ 1,606
    Stable Diffusion in TensorFlow / Keras

  37. nvlabs/prismer ⭐ 1,308
    The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
    🔗 shikun.io/projects/prismer

  38. chenyangqiqi/FateZero ⭐ 1,148
    [ICCV 2023 Oral] "FateZero: Fusing Attentions for Zero-shot Text-based Video Editing"
    🔗 fate-zero-edit.github.io

  39. tanelp/tiny-diffusion ⭐ 917
    A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.

  40. thereforegames/unprompted ⭐ 803
    Templating language written for Stable Diffusion workflows. Available as an extension for the Automatic1111 WebUI.

  41. sharonzhou/long_stable_diffusion ⭐ 689
    Long-form text-to-images generation, using a pipeline of deep generative models (GPT-3 and Stable Diffusion)

  42. laion-ai/dalle2-laion ⭐ 502
    Pretrained Dalle2 from laion

Finance

Financial and quantitative libraries: investment research tools, market data, algorithmic trading, backtesting, financial derivatives.

  1. openbb-finance/OpenBB ⭐ 42,269
    Investment Research for Everyone, Everywhere.
    🔗 openbb.co

  2. virattt/ai-hedge-fund ⭐ 38,006
    AI-powered hedge fund. The goal of this project is to explore the use of AI to make trading decisions.

  3. microsoft/qlib ⭐ 26,495
    Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD...
    🔗 qlib.readthedocs.io/en/latest

  4. quantopian/zipline ⭐ 18,662
    Zipline, a Pythonic Algorithmic Trading Library
    🔗 www.zipline.io

  5. ranaroussi/yfinance ⭐ 18,196
    Download market data from Yahoo! Finance's API
    🔗 ranaroussi.github.io/yfinance

  6. mementum/backtrader ⭐ 17,877
    Python Backtesting library for trading strategies
    🔗 www.backtrader.com

  7. ai4finance-foundation/FinGPT ⭐ 16,632
    FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
    🔗 ai4finance.org

  8. ai4finance-foundation/FinRL ⭐ 12,099
    FinRL®: Financial Reinforcement Learning. 🔥
    🔗 ai4finance.org

  9. quantconnect/Lean ⭐ 11,758
    Lean Algorithmic Trading Engine by QuantConnect (Python, C#)
    🔗 lean.io

  10. ta-lib/ta-lib-python ⭐ 10,864
    Python wrapper for TA-Lib (http://ta-lib.org/).
    🔗 ta-lib.github.io/ta-lib-python

  11. goldmansachs/gs-quant ⭐ 9,107
    Python toolkit for quantitative finance
    🔗 developer.gs.com/discover/products/gs-quant

  12. kernc/backtesting.py ⭐ 6,756
    🔎 📈 🐍 💰 Backtest trading strategies in Python.
    🔗 kernc.github.io/backtesting.py

  13. quantopian/pyfolio ⭐ 5,993
    Portfolio and risk analytics in Python
    🔗 quantopian.github.io/pyfolio

  14. ranaroussi/quantstats ⭐ 5,898
    Portfolio analytics for quants, written in Python

  15. polakowo/vectorbt ⭐ 5,476
    Find your trading edge, using the fastest engine for backtesting, algorithmic trading, and research.
    🔗 vectorbt.dev

  16. google/tf-quant-finance ⭐ 4,919
    High-performance TensorFlow library for quantitative finance.

  17. borisbanushev/stockpredictionai ⭐ 4,833
    In this notebook I will create a complete process for predicting stock price movements. Follow along and we will achieve some pretty good results. For that purpose we will use a Generative Adversarial Network (GAN) with LSTM, a type of Recurrent Neural Network, as generator, and a Convolutional Neural Networ...

  18. gbeced/pyalgotrade ⭐ 4,566
    Python Algorithmic Trading Library
    🔗 gbeced.github.io/pyalgotrade

  19. matplotlib/mplfinance ⭐ 4,066
    Financial Markets Data Visualization using Matplotlib
    🔗 pypi.org/project/mplfinance

  20. quantopian/alphalens ⭐ 3,786
    Performance analysis of predictive (alpha) stock factors
    🔗 quantopian.github.io/alphalens

  21. zvtvz/zvt ⭐ 3,672
    modular quant framework.
    🔗 zvt.readthedocs.io/en/latest

  22. cuemacro/finmarketpy ⭐ 3,610
    Python library for backtesting trading strategies & analyzing financial markets (formerly pythalesians)
    🔗 www.cuemacro.com

  23. robcarver17/pysystemtrade ⭐ 2,943
    Systematic Trading in python

  24. quantopian/research_public ⭐ 2,621
    Quantitative research and educational materials
    🔗 www.quantopian.com/lectures

  25. pmorissette/bt ⭐ 2,589
    bt - flexible backtesting for Python
    🔗 pmorissette.github.io/bt

  26. domokane/FinancePy ⭐ 2,463
    A Python Finance Library that focuses on the pricing and risk-management of Financial Derivatives, including fixed-income, equity, FX and credit derivatives.

  27. blankly-finance/blankly ⭐ 2,309
    🚀 💸 Easily build, backtest and deploy your algo in just a few lines of code. Trade stocks, cryptos, and forex across exchanges w/ one package.
    🔗 package.blankly.finance

  28. pmorissette/ffn ⭐ 2,294
    ffn - a financial function library for Python
    🔗 pmorissette.github.io/ffn

  29. cuemacro/findatapy ⭐ 1,869
    Python library to download market data via Bloomberg, Eikon, Quandl, Yahoo etc.

  30. quantopian/empyrical ⭐ 1,392
    Common financial risk and performance metrics. Used by zipline and pyfolio.
    🔗 quantopian.github.io/empyrical

  31. idanya/algo-trader ⭐ 833
    Trading bot with support for realtime trading, backtesting, custom strategies and much more.

  32. gbeced/basana ⭐ 746
    A Python async and event-driven framework for algorithmic trading, with a focus on cryptocurrencies.

  33. chancefocus/PIXIU ⭐ 741
    This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and evaluation benchmarks to holistically assess financial LLMs. Our goal is to continually push forward the open-source development of financial artificial intelligence (AI).

  34. nasdaq/data-link-python ⭐ 548
    A Python library for Nasdaq Data Link's RESTful API

Game Development

Game development tools, engines and libraries.

  1. kitao/pyxel ⭐ 16,470
    A retro game engine for Python

  2. microsoft/TRELLIS ⭐ 10,081
    A large 3D asset generation model. It takes in text or image prompts and generates high-quality 3D assets in various formats, such as Radiance Fields, 3D Gaussians, and meshes.
    🔗 trellis3d.github.io

  3. pygame/pygame ⭐ 8,134
    🐍🎮 pygame (the library) is a Free and Open Source python programming language library for making multimedia applications like games built on top of the excellent SDL library. C, Python, Native, OpenGL.
    🔗 www.pygame.org

  4. panda3d/panda3d ⭐ 4,827
    Powerful, mature open-source cross-platform game engine for Python and C++, developed by Disney and CMU
    🔗 www.panda3d.org

  5. niklasf/python-chess ⭐ 2,616
    python-chess is a chess library for Python, with move generation, move validation, and support for common formats
    🔗 python-chess.readthedocs.io/en/latest

  6. pokepetter/ursina ⭐ 2,395
    A game engine powered by python and panda3d.
    🔗 pokepetter.github.io/ursina

  7. pyglet/pyglet ⭐ 2,054
    pyglet is a cross-platform windowing and multimedia library for Python, for developing games and other visually rich applications.
    🔗 pyglet.org

  8. pythonarcade/arcade ⭐ 1,837
    Easy to use Python library for creating 2D arcade games.
    🔗 arcade.academy

GIS

Geospatial libraries: raster and vector data formats, interactive mapping and visualisation, computing frameworks for processing images, projections.

  1. domlysz/BlenderGIS ⭐ 8,347
    Blender addons to make the bridge between Blender and geographic data

  2. python-visualization/folium ⭐ 7,181
    Python Data. Leaflet.js Maps.
    🔗 python-visualization.github.io/folium

  3. osgeo/gdal ⭐ 5,388
    GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
    🔗 gdal.org

  4. gboeing/osmnx ⭐ 5,200
    Download, model, analyze, and visualize street networks and other geospatial features from OpenStreetMap.
    🔗 osmnx.readthedocs.io

  5. geopandas/geopandas ⭐ 4,817
    Python tools for geographic data
    🔗 geopandas.org

  6. shapely/shapely ⭐ 4,177
    Manipulation and analysis of geometric objects
    🔗 shapely.readthedocs.io/en/stable

  7. giswqs/geemap ⭐ 3,696
    A Python package for interactive geospatial analysis and visualization with Google Earth Engine.
    🔗 geemap.org

  8. microsoft/torchgeo ⭐ 3,511
    TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
    🔗 www.osgeo.org/projects/torchgeo

  9. holoviz/datashader ⭐ 3,425
    Quickly and accurately render even the largest data.
    🔗 datashader.org

  10. opengeos/leafmap ⭐ 3,400
    A Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment
    🔗 leafmap.org

  11. opengeos/segment-geospatial ⭐ 3,337
    A Python package for segmenting geospatial data with the Segment Anything Model (SAM)
    🔗 samgeo.gishub.org

  12. google/earthengine-api ⭐ 2,899
    Python and JavaScript bindings for calling the Earth Engine API.

  13. rasterio/rasterio ⭐ 2,378
    Rasterio reads and writes geospatial raster datasets
    🔗 rasterio.readthedocs.io

  14. mcordts/cityscapesScripts ⭐ 2,279
    README and scripts for the Cityscapes Dataset

  15. azavea/raster-vision ⭐ 2,155
    An open source library and framework for deep learning on satellite and aerial imagery.
    🔗 docs.rastervision.io

  16. apache/sedona ⭐ 2,085
    A cluster computing framework for processing large-scale geospatial data
    🔗 sedona.apache.org

  17. plant99/felicette ⭐ 1,822
    Satellite imagery for dummies.

  18. gboeing/osmnx-examples ⭐ 1,698
    Gallery of OSMnx tutorials, usage examples, and feature demonstrations.
    🔗 osmnx.readthedocs.io

  19. microsoft/GlobalMLBuildingFootprints ⭐ 1,597
    Worldwide building footprints derived from satellite imagery

  20. jupyter-widgets/ipyleaflet ⭐ 1,526
    A Jupyter - Leaflet.js bridge
    🔗 ipyleaflet.readthedocs.io

  21. pysal/pysal ⭐ 1,403
    PySAL: Python Spatial Analysis Library Meta-Package
    🔗 pysal.org/pysal

  22. anitagraser/movingpandas ⭐ 1,317
    Movement trajectory classes and functions built on top of GeoPandas
    🔗 movingpandas.org

  23. sentinel-hub/eo-learn ⭐ 1,184
    Earth observation processing framework for machine learning in Python
    🔗 eo-learn.readthedocs.io/en/latest

  24. residentmario/geoplot ⭐ 1,171
    High-level geospatial data visualization library for Python.
    🔗 residentmario.github.io/geoplot/index.html

  25. osgeo/grass ⭐ 961
    GRASS - free and open-source geospatial processing engine
    🔗 grass.osgeo.org

  26. opengeos/streamlit-geospatial ⭐ 953
    A multi-page streamlit app for geospatial
    🔗 huggingface.co/spaces/giswqs/streamlit

  27. developmentseed/titiler ⭐ 906
    Build your own Raster dynamic map tile services
    🔗 developmentseed.org/titiler

  28. makepath/xarray-spatial ⭐ 888
    Raster-based Spatial Analytics for Python
    🔗 xarray-spatial.readthedocs.io

  29. datasystemslab/GeoTorchAI ⭐ 505
    GeoTorchAI: A Framework for Training and Using Spatiotemporal Deep Learning Models at Scale
    🔗 kanchanchy.github.io/geotorchai

Graph

Graphs and network libraries: network analysis, graph machine learning, visualisation.

  1. networkx/networkx ⭐ 15,942
    Network Analysis in Python
    🔗 networkx.org

  2. stellargraph/stellargraph ⭐ 3,018
    StellarGraph - Machine Learning on Graphs
    🔗 stellargraph.readthedocs.io

  3. westhealth/pyvis ⭐ 1,112
    Python package for creating and visualizing interactive network graphs.
    🔗 pyvis.readthedocs.io/en/latest

  4. microsoft/graspologic ⭐ 918
    graspologic is a package for graph statistical algorithms
    🔗 graspologic-org.github.io/graspologic

  5. rampasek/GraphGPS ⭐ 760
    Recipe for a General, Powerful, Scalable Graph Transformer

  6. dylanhogg/llmgraph ⭐ 447
    Create knowledge graphs with LLMs

GUI

Graphical user interface libraries and toolkits.

  1. hoffstadt/DearPyGui ⭐ 14,432
    Dear PyGui: A fast and powerful Graphical User Interface Toolkit for Python with minimal dependencies
    🔗 dearpygui.readthedocs.io/en/latest

  2. pysimplegui/PySimpleGUI ⭐ 13,638
    Python GUIs for Humans! PySimpleGUI is the top-rated Python application development environment. Launched in 2018 and actively developed, maintained, and supported in 2024. Transforms tkinter, Qt, WxPython, and Remi into a simple, intuitive, and fun experience for both hobbyists and expert users.
    🔗 www.pysimplegui.com

  3. parthjadhav/Tkinter-Designer ⭐ 9,868
    An easy and fast way to create a Python GUI 🐍

  4. samuelcolvin/FastUI ⭐ 8,847
    FastUI is a new way to build web application user interfaces defined by declarative Python code.
    🔗 fastui-demo.onrender.com

  5. r0x0r/pywebview ⭐ 5,319
    Build GUI for your Python program with JavaScript, HTML, and CSS
    🔗 pywebview.flowrl.com

  6. beeware/toga ⭐ 5,053
    A Python native, OS native GUI toolkit.
    🔗 toga.readthedocs.io/en/latest

  7. dddomodossola/remi ⭐ 3,619
    Python REMote Interface library. Platform independent. In about 100 Kbytes, perfect for your diet.

  8. wxwidgets/Phoenix ⭐ 2,482
    wxPython's Project Phoenix. A new implementation of wxPython, better, stronger, faster than he was before.
    🔗 wxpython.org

Jupyter

Jupyter, JupyterLab and Notebook tools, libraries and plugins.

  1. jupyterlab/jupyterlab ⭐ 14,686
    JupyterLab computational environment.
    🔗 jupyterlab.readthedocs.io

  2. marimo-team/marimo ⭐ 14,288
    A reactive Python notebook: run a cell or interact with a UI element, and marimo automatically runs dependent cells, keeping code and outputs consistent. marimo notebooks are stored as pure Python, executable as scripts, and deployable as apps.
    🔗 marimo.io

  3. jupyter/notebook ⭐ 12,408
    Jupyter Interactive Notebook
    🔗 jupyter-notebook.readthedocs.io

  4. garrettj403/SciencePlots ⭐ 7,995
    Matplotlib styles for scientific plotting

  5. mwouts/jupytext ⭐ 6,918
    Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
    🔗 jupytext.readthedocs.io

  6. nteract/papermill ⭐ 6,214
    📚 Parameterize, execute, and analyze notebooks
    🔗 papermill.readthedocs.io/en/latest

  7. connorferster/handcalcs ⭐ 5,762
    Python library for converting Python calculations into rendered latex.

  8. voila-dashboards/voila ⭐ 5,754
    Voilà turns Jupyter notebooks into standalone web applications
    🔗 voila.readthedocs.io

  9. jupyterlite/jupyterlite ⭐ 4,220
    Wasm powered Jupyter running in the browser 💡
    🔗 jupyterlite.rtfd.io/en/stable/try/lab

  10. executablebooks/jupyter-book ⭐ 4,088
    Create beautiful, publication-quality books and documents from computational content.
    🔗 next.jupyterbook.org

  11. jupyterlab/jupyterlab-desktop ⭐ 4,056
    JupyterLab desktop application, based on Electron.

  12. jupyterlab/jupyter-ai ⭐ 3,695
    A generative AI extension for JupyterLab
    🔗 jupyter-ai.readthedocs.io

  13. jupyter-widgets/ipywidgets ⭐ 3,249
    Interactive Widgets for the Jupyter Notebook
    🔗 ipywidgets.readthedocs.io

  14. quantopian/qgrid ⭐ 3,073
    An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks

  15. jupyter/nbdime ⭐ 2,756
    Tools for diffing and merging of Jupyter notebooks.
    🔗 nbdime.readthedocs.io

  16. mito-ds/mito ⭐ 2,472
    Jupyter extensions that help you write code faster: Context aware AI Chat, Autocomplete, and Spreadsheet
    🔗 trymito.io

  17. jupyter/nbviewer ⭐ 2,256
    nbconvert as a web service: Render Jupyter Notebooks as static web pages
    🔗 nbviewer.jupyter.org

  18. maartenbreddels/ipyvolume ⭐ 1,962
    3d plotting for Python in the Jupyter notebook based on IPython widgets using WebGL

  19. jupyter-lsp/jupyterlab-lsp ⭐ 1,912
    Coding assistance for JupyterLab (code navigation + hover suggestions + linters + autocompletion + rename) using Language Server Protocol
    🔗 jupyterlab-lsp.readthedocs.io

  20. jupyter/nbconvert ⭐ 1,855
    Jupyter Notebook Conversion
    🔗 nbconvert.readthedocs.io

  21. koaning/drawdata ⭐ 1,431
    Draw datasets from within Python notebooks.
    🔗 koaning.github.io/drawdata

  22. nbqa-dev/nbQA ⭐ 1,140
    Run ruff, isort, pyupgrade, mypy, pylint, flake8, and more on Jupyter Notebooks
    🔗 nbqa.readthedocs.io/en/latest/index.html

  23. 8080labs/pyforest ⭐ 1,111
    With pyforest you can use all your favorite Python libraries without importing them before. If you use a package that is not imported yet, pyforest imports the package for you and adds the code to the first Jupyter cell.
    🔗 8080labs.com

  24. vizzuhq/ipyvizzu ⭐ 966
    Build animated charts in Jupyter Notebook and similar environments with a simple Python syntax.
    🔗 ipyvizzu.vizzuhq.com

  25. aws/graph-notebook ⭐ 784
    Library extending Jupyter notebooks to integrate with Apache TinkerPop, openCypher, and RDF SPARQL.
    🔗 github.com/aws/graph-notebook

  26. linealabs/lineapy ⭐ 667
    Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
    🔗 lineapy.org

  27. xiaohk/stickyland ⭐ 569
    Break the linear presentation of Jupyter Notebooks with sticky cells!
    🔗 xiaohk.github.io/stickyland

  28. infuseai/colab-xterm ⭐ 466
    Open a terminal in colab, including the free tier.

LLMs and ChatGPT

Large language model and GPT libraries and frameworks: auto-gpt, agents, QnA, chain-of-thought workflows, API integrations. Also see the Natural Language Processing category for crossover.

  1. significant-gravitas/AutoGPT ⭐ 176,814
    AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
    🔗 agpt.co

  2. open-webui/open-webui ⭐ 102,162
    Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with a built-in inference engine for RAG
    🔗 openwebui.com

  3. deepseek-ai/DeepSeek-V3 ⭐ 98,173
    A strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.

  4. ggerganov/llama.cpp ⭐ 82,861
    LLM inference in C/C++

  5. nomic-ai/gpt4all ⭐ 73,788
    GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
    🔗 nomic.ai/gpt4all

  6. xtekky/gpt4free ⭐ 64,619
    The official gpt4free repository | various collection of powerful language models | o4, o3 and deepseek r1, gpt-4.1, gemini 2.5
    🔗 t.me/g4f_channel

  7. killianlucas/open-interpreter ⭐ 59,906
    A natural language interface for computers
    🔗 openinterpreter.com

  8. infiniflow/ragflow ⭐ 59,593
    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
    🔗 ragflow.io

  9. modelcontextprotocol/servers ⭐ 58,953
    A collection of reference implementations for the Model Context Protocol (MCP), as well as references to community built servers
    🔗 modelcontextprotocol.io

  10. facebookresearch/llama ⭐ 58,501
    Inference code for Llama models

  11. imartinez/private-gpt ⭐ 56,237
    Interact with your documents using the power of GPT, 100% privately, no data leaks
    🔗 privategpt.dev

  12. gpt-engineer-org/gpt-engineer ⭐ 54,449
    CLI platform to experiment with codegen. Precursor to: https://lovable.dev

  13. hiyouga/LLaMA-Factory ⭐ 54,052
    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
    🔗 llamafactory.readthedocs.io

  14. vllm-project/vllm ⭐ 51,985
    A high-throughput and memory-efficient inference and serving engine for LLMs
    🔗 docs.vllm.ai

  15. xai-org/grok-1 ⭐ 50,340
    This repository contains JAX example code for loading and running the Grok-1 open-weights model.

  16. unclecode/crawl4ai ⭐ 47,664
    AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. Open source, flexible, and built for real-time performance, Crawl4AI empowers developers with unmatched speed, precision, and deployment ease.
    🔗 crawl4ai.com

  17. oobabooga/text-generation-webui ⭐ 44,283
    LLM UI with advanced features, easy setup, and multiple backend support.
    🔗 oobabooga.gumroad.com/l/deep_reason

  18. karpathy/nanoGPT ⭐ 42,732
    The simplest, fastest repository for training/finetuning medium-sized GPTs.

  19. unslothai/unsloth ⭐ 41,849
    Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
    🔗 docs.unsloth.ai

  20. thudm/ChatGLM-6B ⭐ 41,086
    ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

  21. hpcaitech/ColossalAI ⭐ 41,024
    Making large AI models cheaper, faster and more accessible
    🔗 www.colossalai.org

  22. lm-sys/FastChat ⭐ 38,836
    An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

  23. quivrhq/quivr ⭐ 38,112
    Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.
    🔗 core.quivr.com

  24. laion-ai/Open-Assistant ⭐ 37,413
    OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
    🔗 open-assistant.io

  25. moymix/TaskMatrix ⭐ 34,423
    Connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.

  26. pythagora-io/gpt-pilot ⭐ 33,171
    The first real AI developer

  27. danielmiessler/Fabric ⭐ 32,515
    Fabric is an open-source framework for augmenting humans using AI. It provides a modular system for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
    🔗 danielmiessler.com/p/fabric-origin-story

  28. khoj-ai/khoj ⭐ 30,526
    Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI
    🔗 khoj.dev

  29. tatsu-lab/stanford_alpaca ⭐ 30,070
    Code and documentation to train Stanford's Alpaca models, and generate the data.
    🔗 crfm.stanford.edu/2023/03/13/alpaca.html

  30. exo-explore/exo ⭐ 28,938
    Run your own AI cluster at home. Unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, NVIDIA, Raspberry Pi etc

  31. meta-llama/llama3 ⭐ 28,831
    The official Meta Llama 3 GitHub site

  32. pathwaycom/llm-app ⭐ 27,266
    Ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data. 🐳Docker-friendly.⚡Always in sync with Sharepoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more.
    🔗 pathway.com/developers/templates

  33. karpathy/llm.c ⭐ 27,118
    LLM training in simple, pure C/CUDA. There is no need for 245MB of PyTorch or 107MB of CPython

  34. microsoft/graphrag ⭐ 26,435
    A modular graph-based Retrieval-Augmented Generation (RAG) system
    🔗 microsoft.github.io/graphrag

  35. stanfordnlp/dspy ⭐ 26,271
    DSPy: The framework for programming—not prompting—language models
    🔗 dspy.ai

  36. vision-cair/MiniGPT-4 ⭐ 25,699
    Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
    🔗 minigpt-4.github.io

  37. microsoft/semantic-kernel ⭐ 25,368
    An SDK that integrates LLMs like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C#, Python, and Java
    🔗 aka.ms/semantic-kernel

  38. berriai/litellm ⭐ 25,239
    Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
    🔗 docs.litellm.ai/docs

  39. huggingface/open-r1 ⭐ 25,013
    The goal of this repo is to build the missing pieces of the R1 pipeline such that everybody can reproduce and build on top of it

  40. microsoft/JARVIS ⭐ 24,224
    JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf

  41. openai/gpt-2 ⭐ 23,767
    Code for the paper "Language Models are Unsupervised Multitask Learners"
    🔗 openai.com/blog/better-language-models

  42. haotian-liu/LLaVA ⭐ 23,017
    [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
    🔗 llava.hliu.cc

  43. cinnamon/kotaemon ⭐ 22,775
    An open-source RAG UI for chatting with your documents. Built with both end users and developers in mind
    🔗 cinnamon.github.io/kotaemon

  44. karpathy/minGPT ⭐ 22,238
    A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

  45. deepset-ai/haystack ⭐ 21,485
    AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversatio...
    🔗 haystack.deepset.ai

  46. openai/chatgpt-retrieval-plugin ⭐ 21,195
    The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.

  47. mlc-ai/mlc-llm ⭐ 20,950
    Universal LLM Deployment Engine with ML Compilation
    🔗 llm.mlc.ai

  48. microsoft/BitNet ⭐ 20,487
    Official inference framework for 1-bit LLMs (e.g., BitNet b1.58). It offers a suite of optimized kernels, that support fast and lossless inference of 1.58-bit models

  49. guidance-ai/guidance ⭐ 20,455
    A guidance language for controlling large language models.

  50. rasahq/rasa ⭐ 20,382
    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
    🔗 rasa.com/docs/rasa

  51. stitionai/devika ⭐ 19,377
    Devika is an advanced AI software engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective.

  52. huggingface/peft ⭐ 19,007
    🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
    🔗 huggingface.co/docs/peft

  53. tloen/alpaca-lora ⭐ 18,922
    Instruct-tune LLaMA on consumer hardware

  54. nirdiamant/RAG_Techniques ⭐ 18,697
    The most comprehensive and dynamic collections of Retrieval-Augmented Generation (RAG) tutorials available today. This repository serves as a hub for cutting-edge techniques aimed at enhancing the accuracy, efficiency, and contextual richness of RAG systems.

  55. qwenlm/Qwen ⭐ 18,681
    The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

  56. vanna-ai/vanna ⭐ 18,547
    RAG (Retrieval-Augmented Generation) framework for SQL generation and related functionality.
    🔗 vanna.ai/docs

  57. karpathy/llama2.c ⭐ 18,543
    Inference Llama 2 in one file of pure C

  58. dao-ailab/flash-attention ⭐ 18,293
    Fast and memory-efficient exact attention

  59. anthropics/anthropic-cookbook ⭐ 17,965
    Provides code and guides designed to help developers build with Claude, offering copy-able code snippets that you can easily integrate into your own projects.

  60. facebookresearch/llama-cookbook ⭐ 17,603
    Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end to end problems using Llama model family and using them on various provider services
    🔗 www.llama.com

  61. idea-research/Grounded-Segment-Anything ⭐ 16,597
    Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
    🔗 arxiv.org/abs/2401.14159

  62. openai/evals ⭐ 16,541
    Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

  63. transformeroptimus/SuperAGI ⭐ 16,514
    <⚡️> SuperAGI - A dev-first open source autonomous AI agent framework. Enabling developers to build, manage & run useful autonomous agents quickly and reliably.
    🔗 superagi.com

  64. facebookresearch/codellama ⭐ 16,349
    Inference code for CodeLlama models

  65. modelcontextprotocol/python-sdk ⭐ 15,977
    The Model Context Protocol allows applications to provide context for LLMs in a standardized way, separating the concerns of providing context from the actual LLM interaction.
    🔗 modelcontextprotocol.io

  66. sgl-project/sglang ⭐ 15,914
    SGLang is a fast serving framework for large language models and vision language models.
    🔗 docs.sglang.ai

  67. mlc-ai/web-llm ⭐ 15,901
    High-performance In-browser LLM Inference Engine
    🔗 webllm.mlc.ai

  68. thudm/ChatGLM2-6B ⭐ 15,720
    ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

  69. mayooear/ai-pdf-chatbot-langchain ⭐ 15,669
    AI PDF chatbot agent built with LangChain & LangGraph
    🔗 www.youtube.com/watch?v=of6soldiewu

  70. fauxpilot/fauxpilot ⭐ 14,727
    FauxPilot - an open-source alternative to GitHub Copilot server

  71. lvwerra/trl ⭐ 14,559
    Train transformer language models with reinforcement learning.
    🔗 hf.co/docs/trl

  72. llmware-ai/llmware ⭐ 14,249
    Unified framework for building enterprise RAG pipelines with small, specialized models
    🔗 llmware-ai.github.io/llmware

  73. skyvern-ai/skyvern ⭐ 13,799
    Skyvern automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows, replacing brittle or unreliable automation solutions.
    🔗 www.skyvern.com

  74. blinkdl/RWKV-LM ⭐ 13,777
    RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RNN and transformer - great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and f...

  75. nvidia/Megatron-LM ⭐ 12,835
    Ongoing research training transformer models at scale
    🔗 docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

  76. paddlepaddle/PaddleNLP ⭐ 12,678
    Easy-to-use and powerful LLM and SLM library with awesome model zoo.
    🔗 paddlenlp.readthedocs.io

  77. swivid/F5-TTS ⭐ 12,609
    Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
    🔗 arxiv.org/abs/2410.06885

  78. lightning-ai/litgpt ⭐ 12,462
    20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
    🔗 lightning.ai

  80. shishirpatil/gorilla ⭐ 12,230
    Enables LLMs to use tools by invoking APIs. Given a query, Gorilla comes up with the semantically and syntactically correct API.
    🔗 gorilla.cs.berkeley.edu

  81. microsoft/LoRA ⭐ 12,227
    Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
    🔗 arxiv.org/abs/2106.09685

  82. andrewyng/aisuite ⭐ 12,226
    Simple, unified interface to multiple Generative AI providers. aisuite makes it easy for developers to use multiple LLMs through a standardized interface.

  83. dottxt-ai/outlines ⭐ 12,062
    Structured Text Generation from LLMs
    🔗 dottxt-ai.github.io/outlines

  84. openlmlab/MOSS ⭐ 12,058
    An open-source tool-augmented conversational language model from Fudan University
    🔗 txsun1997.github.io/blogs/moss.html

  85. jiayi-pan/TinyZero ⭐ 11,998
    TinyZero is a reproduction of DeepSeek R1 Zero in countdown and multiplication tasks.

  86. h2oai/h2ogpt ⭐ 11,862
    Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
    🔗 h2o.ai

  87. google-research/vision_transformer ⭐ 11,559
    Vision Transformer and MLP-Mixer Architectures

  88. instructor-ai/instructor ⭐ 10,931
    Instructor is a Python library that makes it a breeze to work with structured outputs from large language models (LLMs). Built on top of Pydantic, it provides a simple, transparent, and user-friendly API to manage validation, retries, and streaming responses.
    🔗 python.useinstructor.com

  89. volcengine/verl ⭐ 10,836
    veRL is a flexible, efficient and production-ready RL training library for large language models (LLMs).
    🔗 verl.readthedocs.io/en/latest/index.html

  90. databrickslabs/dolly ⭐ 10,806
    Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
    🔗 www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html

  91. microsoft/promptflow ⭐ 10,557
    Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
    🔗 microsoft.github.io/promptflow

  92. artidoro/qlora ⭐ 10,550
    QLoRA: Efficient Finetuning of Quantized LLMs
    🔗 arxiv.org/abs/2305.14314

  93. mistralai/mistral-inference ⭐ 10,354
    Official inference library for Mistral models
    🔗 mistral.ai

  94. chainlit/chainlit ⭐ 10,127
    Build Conversational AI in minutes ⚡️
    🔗 docs.chainlit.io

  95. explodinggradients/ragas ⭐ 9,884
    Supercharge Your LLM Application Evaluations 🚀
    🔗 docs.ragas.io

  96. axolotl-ai-cloud/axolotl ⭐ 9,870
    Go ahead and axolotl questions
    🔗 docs.axolotl.ai

  97. karpathy/minbpe ⭐ 9,746
    Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

  98. mshumer/gpt-prompt-engineer ⭐ 9,561
    Simply input a description of your task and some test cases, and the system will generate, test, and rank a multitude of prompts to find the ones that perform the best.

  99. eleutherai/lm-evaluation-harness ⭐ 9,513
    A framework for few-shot evaluation of language models.
    🔗 www.eleuther.ai

  100. blinkdl/ChatRWKV ⭐ 9,500
    ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.

  101. anthropics/anthropic-quickstarts ⭐ 9,373
    A collection of projects designed to help developers quickly get started with building applications using the Anthropic API. Each quickstart provides a foundation that you can easily build upon and customize for your specific needs.

  102. abetlen/llama-cpp-python ⭐ 9,325
    Simple Python bindings for @ggerganov's llama.cpp library.
    🔗 llama-cpp-python.readthedocs.io

  103. e2b-dev/E2B ⭐ 8,965
    E2B is an open-source infrastructure that allows you to run AI-generated code in secure isolated sandboxes in the cloud
    🔗 e2b.dev/docs

  104. apple/ml-ferret ⭐ 8,640
    Ferret: Refer and Ground Anything Anywhere at Any Granularity

  105. jzhang38/TinyLlama ⭐ 8,631
    The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

  106. modelscope/ms-swift ⭐ 8,607
    Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4v, Phi4, ...) (AAAI 2025).
    🔗 swift.readthedocs.io/zh-cn/latest

  107. canner/WrenAI ⭐ 8,581
    Open-source GenBI AI Agent that empowers data-driven teams to chat with their data to generate Text-to-SQL, charts, spreadsheets, reports, and BI.
    🔗 getwren.ai/oss

  108. thudm/CodeGeeX ⭐ 8,532
    CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
    🔗 codegeex.cn

  109. vaibhavs10/insanely-fast-whisper ⭐ 8,516
    An opinionated CLI to transcribe Audio files w/ Whisper on-device! Powered by 🤗 Transformers, Optimum & flash-attn

  110. optimalscale/LMFlow ⭐ 8,446
    An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
    🔗 optimalscale.github.io/lmflow

  111. skypilot-org/skypilot ⭐ 8,342
    SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
    🔗 docs.skypilot.co

  112. eleutherai/gpt-neo ⭐ 8,296
    An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
    🔗 www.eleuther.ai

  113. sjtu-ipads/PowerInfer ⭐ 8,233
    High-speed Large Language Model Serving for Local Deployment

  114. vikhyat/moondream ⭐ 8,187
    A tiny open-source computer-vision language model designed to run efficiently on edge devices
    🔗 moondream.ai

  115. lianjiatech/BELLE ⭐ 8,180
    BELLE: Be Everyone's Large Language model Engine(开源中文对话大模型)

  116. plachtaa/VALL-E-X ⭐ 7,892
    An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/

  117. 01-ai/Yi ⭐ 7,831
    The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI.
    🔗 01.ai

  118. thudm/GLM-130B ⭐ 7,682
    GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

  119. zilliztech/GPTCache ⭐ 7,624
    Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
    🔗 gptcache.readthedocs.io

  120. sweepai/sweep ⭐ 7,569
    Sweep: AI coding assistant for JetBrains
    🔗 sweep.dev

  121. future-house/paper-qa ⭐ 7,548
    High-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature
    🔗 futurehouse.gitbook.io/futurehouse-cookbook

  122. promptfoo/promptfoo ⭐ 7,511
    Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
    🔗 promptfoo.dev

  123. openlm-research/open_llama ⭐ 7,505
    OpenLLaMA: An Open Reproduction of LLaMA

  124. bigcode-project/starcoder ⭐ 7,428
    Home of StarCoder: fine-tuning & inference!

  125. eleutherai/gpt-neox ⭐ 7,256
    An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
    🔗 www.eleuther.ai

  126. weaviate/Verba ⭐ 7,199
    Retrieval Augmented Generation (RAG) chatbot powered by Weaviate

  127. bhaskatripathi/pdfGPT ⭐ 7,134
    PDF GPT allows you to chat with the contents of your PDF file by using GPT capabilities. The most effective open source solution to turn your pdf files in a chatbot!
    🔗 huggingface.co/spaces/bhaskartripathi/pdfchatter

  128. apple/corenet ⭐ 7,013
    CoreNet is a deep neural network toolkit that allows researchers and engineers to train standard and novel small and large-scale models for a variety of tasks, including foundation models (e.g., CLIP and LLM), object classification, object detection, and semantic segmentation.

  129. internlm/InternLM ⭐ 6,975
    Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
    🔗 internlm.readthedocs.io

  130. mit-han-lab/streaming-llm ⭐ 6,933
    [ICLR 2024] Efficient Streaming Language Models with Attention Sinks
    🔗 arxiv.org/abs/2309.17453

  131. pipecat-ai/pipecat ⭐ 6,790
    Open Source framework for voice and multimodal conversational AI

  132. langchain-ai/opengpts ⭐ 6,666
    An open source effort to create a similar experience to OpenAI's GPTs and Assistants API.

  133. run-llama/rags ⭐ 6,482
    RAGs is a Streamlit app that lets you create a RAG pipeline from a data source using natural language.

  134. nat/openplayground ⭐ 6,355
    An LLM playground you can run on your laptop

  135. topoteretes/cognee ⭐ 6,309
    Memory for AI Agents in 5 lines of code
    🔗 www.cognee.ai

  136. minedojo/Voyager ⭐ 6,227
    An Open-Ended Embodied Agent with Large Language Models
    🔗 voyager.minedojo.org

  137. lightning-ai/lit-llama ⭐ 6,071
    Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

  138. qwenlm/Qwen-VL ⭐ 6,066
    The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

  139. nirdiamant/Prompt_Engineering ⭐ 6,019
    A comprehensive collection of tutorials and implementations for Prompt Engineering techniques, ranging from fundamental concepts to advanced strategies.

  140. pytorch-labs/gpt-fast ⭐ 6,011
    Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

  141. arcee-ai/mergekit ⭐ 6,006
    Tools for merging pretrained large language models.

  142. langchain-ai/chat-langchain ⭐ 5,972
    Locally hosted chatbot specifically focused on question answering over the LangChain documentation
    🔗 chat.langchain.com

  143. lyogavin/airllm ⭐ 5,838
    AirLLM optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning. And you can run 405B Llama3.1 on 8GB vram now.

  144. allenai/OLMo ⭐ 5,761
    OLMo is a repository for training and using AI2's state-of-the-art open language models. It is designed by scientists, for scientists.
    🔗 allenai.org/olmo

  145. open-compass/opencompass ⭐ 5,658
    OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc) over 100+ datasets.
    🔗 opencompass.org.cn

  146. microsoft/promptbase ⭐ 5,642
    promptbase is an evolving collection of resources, best practices, and example scripts for eliciting the best performance from foundation models.

  147. linkedin/Liger-Kernel ⭐ 5,346
    Efficient Triton Kernels for LLM Training
    🔗 arxiv.org/pdf/2410.10989

  148. microsoft/LLMLingua ⭐ 5,262
    [EMNLP'23, ACL'24] Compresses the prompt and KV-Cache to speed up LLM inference and sharpen the model's perception of key information, achieving up to 20x compression with minimal performance loss.
    🔗 llmlingua.com

  149. dsdanielpark/Bard-API ⭐ 5,257
    The unofficial python package that returns response of Google Bard through cookie value.
    🔗 pypi.org/project/bardapi

  150. guardrails-ai/guardrails ⭐ 5,241
    Open-source Python package for specifying structure and type, validating and correcting the outputs of large language models (LLMs)
    🔗 www.guardrailsai.com/docs

  151. openbmb/ToolBench ⭐ 5,153
    [ICLR'24 spotlight] An open platform for training, serving, and evaluating large language model for tool learning.
    🔗 openbmb.github.io/toolbench

  152. geeeekexplorer/nano-vllm ⭐ 5,124
    A lightweight vLLM implementation built from scratch.

  153. nvidia/NeMo-Guardrails ⭐ 4,880
    NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

  154. togethercomputer/RedPajama-Data ⭐ 4,766
    The RedPajama-Data repository contains code for preparing large datasets for training large language models.

  155. 1rgs/jsonformer ⭐ 4,764
    A Bulletproof Way to Generate Structured JSON from Language Models

  156. katanaml/sparrow ⭐ 4,615
    Sparrow is a solution for efficient data extraction and processing from various documents and images like invoices and receipts
    🔗 sparrow.katanaml.io

  157. boundaryml/baml ⭐ 4,590
    The AI framework that adds the engineering to prompt engineering (Python/TS/Ruby/Java/C#/Rust/Go compatible)
    🔗 docs.boundaryml.com

  158. kyegomez/tree-of-thoughts ⭐ 4,514
    Plug in and Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by at least 70%
    🔗 discord.gg/qutxnk2nmf

  159. microsoft/BioGPT ⭐ 4,438
    Implementation of BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining

  160. yizhongw/self-instruct ⭐ 4,414
    Aligning pretrained language models with instruction data generated by themselves.

  161. agiresearch/AIOS ⭐ 4,354
    AIOS, a Large Language Model (LLM) Agent operating system, embeds a large language model into the Operating System (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI.
    🔗 aios.foundation

  162. h2oai/h2o-llmstudio ⭐ 4,343
    H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs. Documentation: https://docs.h2o.ai/h2o-llmstudio/
    🔗 h2o.ai

  163. instruction-tuning-with-gpt-4/GPT-4-LLM ⭐ 4,312
    Instruction Tuning with GPT-4
    🔗 instruction-tuning-with-gpt-4.github.io

  164. ragapp/ragapp ⭐ 4,282
    The easiest way to use Agentic RAG in any enterprise

  165. turboderp/exllamav2 ⭐ 4,228
    A fast inference library for running LLMs locally on modern consumer-class GPUs

  166. truefoundry/cognita ⭐ 4,142
    RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
    🔗 cognita.truefoundry.com

  167. mshumer/gpt-llm-trainer ⭐ 4,139
    Input a description of your task, and the system will generate a dataset, parse it, and fine-tune a LLaMA 2 model for you

  168. lm-sys/RouteLLM ⭐ 4,087
    A framework for serving and evaluating LLM routers - save LLM costs without compromising quality

  169. marker-inc-korea/AutoRAG ⭐ 4,085
    AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
    🔗 marker-inc-korea.github.io/autorag

  170. microsoft/LMOps ⭐ 4,049
    General technology for enabling AI capabilities w/ LLMs and MLLMs
    🔗 aka.ms/generalai

  171. llm-attacks/llm-attacks ⭐ 4,042
    This is the official repository for "Universal and Transferable Adversarial Attacks on Aligned Language Models"
    🔗 llm-attacks.org

  172. eth-sri/lmql ⭐ 3,989
    A language for constraint-guided and efficient LLM programming.
    🔗 lmql.ai

  173. kiln-ai/Kiln ⭐ 3,909
    The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
    🔗 getkiln.ai

  174. vllm-project/aibrix ⭐ 3,907
    AIBrix delivers a cloud-native solution optimized for deploying, managing, and scaling large language model (LLM) inference, tailored specifically to enterprise needs.

  175. yuliang-liu/MonkeyOCR ⭐ 3,838
    A lightweight LMM-based Document Parsing Model with a Structure-Recognition-Relation Triplet Paradigm

  176. deep-agent/R1-V ⭐ 3,835
    We are building a general framework for Reinforcement Learning with Verifiable Rewards (RLVR) in VLM. RLVR outperforms chain-of-thought supervised fine-tuning (CoT-SFT) in both effectiveness and out-of-distribution (OOD) robustness for vision language models.

  177. defog-ai/sqlcoder ⭐ 3,822
    SoTA LLM for converting natural language questions to SQL queries

  178. ravenscroftj/turbopilot ⭐ 3,820
    Turbopilot is an open source large-language-model based code completion engine that runs locally on CPU

  179. openai/simple-evals ⭐ 3,817
    Lightweight library for evaluating language models

  180. huggingface/text-embeddings-inference ⭐ 3,789
    A blazing fast inference solution for text embeddings models
    🔗 huggingface.co/docs/text-embeddings-inference/quick_tour

  181. mmabrouk/llm-workflow-engine ⭐ 3,709
    Power CLI and Workflow manager for LLMs (core package)

  182. meta-llama/PurpleLlama ⭐ 3,580
    Set of tools to assess and improve LLM security. An umbrella project to bring together tools and evals to help the community build responsibly with open genai models.

  183. bclavie/RAGatouille ⭐ 3,566
    Bridging the gap between state-of-the-art research and alchemical RAG pipeline practices.

  184. next-gpt/NExT-GPT ⭐ 3,532
    Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
    🔗 next-gpt.github.io

  185. minimaxir/simpleaichat ⭐ 3,519
    Python package for easily interfacing with chat apps, with robust features and minimal code complexity.

  186. iryna-kondr/scikit-llm ⭐ 3,464
    Seamlessly integrate LLMs into scikit-learn.
    🔗 beastbyte.ai

  187. minimaxir/gpt-2-simple ⭐ 3,407
    Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

  188. sylphai-inc/AdalFlow ⭐ 3,397
    Unified auto-differentiative framework for both zero-shot prompt optimization and few-shot optimization. It advances existing auto-optimization research, including TextGrad and DSPy
    🔗 adalflow.sylph.ai

  189. jaymody/picoGPT ⭐ 3,381
    An unnecessarily tiny implementation of GPT-2 in NumPy.

  190. lightning-ai/LitServe ⭐ 3,379
    The easiest way to deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML.
    🔗 lightning.ai/litserve

  191. flashinfer-ai/flashinfer ⭐ 3,345
    FlashInfer is a library and kernel generator for Large Language Models that provides high-performance implementations of LLM GPU kernels such as FlashAttention, SparseAttention, PagedAttention, and Sampling
    🔗 flashinfer.ai

  192. deep-diver/LLM-As-Chatbot ⭐ 3,325
    LLM as a Chatbot Service

  193. novasky-ai/SkyThought ⭐ 3,301
    Sky-T1: Train your own O1 preview model within $450
    🔗 novasky-ai.github.io

  194. predibase/lorax ⭐ 3,273
    Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
    🔗 loraexchange.ai

  195. luodian/Otter ⭐ 3,260
    🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
    🔗 otter-ntu.github.io

  196. verazuo/jailbreak_llms ⭐ 3,206
    Official repo for the ACM CCS 2024 paper "'Do Anything Now': Characterizing and Evaluating In-The-Wild Jailbreak Prompts"
    🔗 jailbreak-llms.xinyueshen.me

  197. mit-han-lab/llm-awq ⭐ 3,140
    AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

  198. microsoft/torchscale ⭐ 3,091
    Foundation Architecture for (M)LLMs
    🔗 aka.ms/generalai

  199. cohere-ai/cohere-toolkit ⭐ 3,068
    Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.

  200. pytorch/executorch ⭐ 3,029
    An end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices.
    🔗 pytorch.org/executorch

  201. mistralai/mistral-finetune ⭐ 2,984
    A light-weight codebase that enables memory-efficient and performant finetuning of Mistral's models. It is based on LoRA.

  202. li-plus/chatglm.cpp ⭐ 2,980
    C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

  203. hiyouga/EasyR1 ⭐ 2,970
    EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
    🔗 verl.readthedocs.io/en/latest/index.html

  204. baichuan-inc/Baichuan-13B ⭐ 2,970
    A 13B large language model developed by Baichuan Intelligent Technology
    🔗 huggingface.co/baichuan-inc/baichuan-13b-chat

  205. freedomintelligence/LLMZoo ⭐ 2,944
    ⚡LLM Zoo is a project that provides data, models, and evaluation benchmark for large language models.⚡

  206. agenta-ai/agenta ⭐ 2,922
    The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
    🔗 www.agenta.ai

  207. hegelai/prompttools ⭐ 2,897
    Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
    🔗 prompttools.readthedocs.io

  208. deepseek-ai/DualPipe ⭐ 2,827
    DualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in the DeepSeek-V3 Technical Report.

  209. juncongmoo/pyllama ⭐ 2,802
    LLaMA: Open and Efficient Foundation Language Models

  210. argilla-io/distilabel ⭐ 2,800
    Distilabel is the framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
    🔗 distilabel.argilla.io

  211. huggingface/smollm ⭐ 2,795
    Everything about the SmolLM and SmolVLM family of models
    🔗 huggingface.co/huggingfacetb

  212. alpha-vllm/LLaMA2-Accessory ⭐ 2,787
    An Open-source Toolkit for LLM Development
    🔗 llama2-accessory.readthedocs.io

  213. noahshinn/reflexion ⭐ 2,785
    [NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning

  214. janhq/cortex.cpp ⭐ 2,764
    Cortex is a Local AI API Platform that is used to run and customize LLMs.
    🔗 cortex.so

  215. evolvinglmms-lab/lmms-eval ⭐ 2,732
    A One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
    🔗 www.lmms-lab.com

  216. paperswithcode/galai ⭐ 2,730
    Model API for GALACTICA

  217. truera/trulens ⭐ 2,625
    Evaluation and Tracking for LLM Experiments and AI Agents
    🔗 www.trulens.org

  218. roboflow/maestro ⭐ 2,588
    Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
    🔗 maestro.roboflow.com

  219. databricks/dbrx ⭐ 2,568
    Code examples and resources for DBRX, a large language model developed by Databricks
    🔗 www.databricks.com

  220. ofa-sys/OFA ⭐ 2,504
    Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

  221. ruc-nlpir/FlashRAG ⭐ 2,498
    FlashRAG is a Python toolkit for the reproduction and development of RAG research. Our toolkit includes 36 pre-processed benchmark RAG datasets and 15 state-of-the-art RAG algorithms.
    🔗 arxiv.org/abs/2405.13576

  222. young-geng/EasyLM ⭐ 2,480
    Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.

  223. intel/neural-compressor ⭐ 2,449
    SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
    🔗 intel.github.io/neural-compressor

  224. spcl/graph-of-thoughts ⭐ 2,418
    Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models"
    🔗 arxiv.org/pdf/2308.09687.pdf

  225. civitai/sd_civitai_extension ⭐ 2,377
    All of the Civitai models inside Automatic 1111 Stable Diffusion Web UI

  226. azure-samples/graphrag-accelerator ⭐ 2,374
    One-click deploy of a Knowledge Graph powered RAG (GraphRAG) in Azure
    🔗 github.com/microsoft/graphrag

  227. uptrain-ai/uptrain ⭐ 2,292
    An open-source unified platform to evaluate and improve Generative AI applications. Provides grades for 20+ preconfigured evaluations (covering language, code, embedding use cases)
    🔗 uptrain.ai

  228. facebookresearch/large_concept_model ⭐ 2,244
    Large Concept Models: Language modeling in a sentence representation space

  229. openai/finetune-transformer-lm ⭐ 2,222
    Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
    🔗 s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf

  230. casper-hansen/AutoAWQ ⭐ 2,205
    AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
    🔗 casper-hansen.github.io/autoawq

  231. langwatch/langwatch ⭐ 2,187
    LangWatch is an open platform for Observing, Evaluating and Optimizing your LLM and Agentic applications.
    🔗 langwatch.ai

  232. ist-daslab/gptq ⭐ 2,140
    Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
    🔗 arxiv.org/abs/2210.17323

  233. akariasai/self-rag ⭐ 2,127
    This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
    🔗 selfrag.github.io

  234. tairov/llama2.mojo ⭐ 2,115
    Inference Llama 2 in one file of pure 🔥
    🔗 www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov

  235. microsoft/Megatron-DeepSpeed ⭐ 2,104
    Ongoing research training transformer language models at scale, including: BERT & GPT-2

  236. openai/image-gpt ⭐ 2,068
    Archived. Code and models from the paper "Generative Pretraining from Pixels"

  237. epfllm/meditron ⭐ 2,046
    Meditron is a suite of open-source medical Large Language Models (LLMs).
    🔗 huggingface.co/epfl-llm

  238. lucidrains/toolformer-pytorch ⭐ 2,041
    Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI

  239. facebookresearch/chameleon ⭐ 2,032
    Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
    🔗 arxiv.org/abs/2405.09818

  240. googleapis/python-genai ⭐ 2,027
    Google Gen AI Python SDK provides an interface for developers to integrate Google's generative models into their Python applications.
    🔗 googleapis.github.io/python-genai

  241. huggingface/nanotron ⭐ 2,012
    Minimalistic large language model 3D-parallelism training

  242. illuin-tech/colpali ⭐ 2,009
    Code used for training the vision retrievers in the ColPali: Efficient Document Retrieval with Vision Language Models paper
    🔗 huggingface.co/vidore

  243. neulab/prompt2model ⭐ 2,004
    A system that takes a natural language task description and trains a small special-purpose model that is conducive to deployment.

  244. openai/gpt-2-output-dataset ⭐ 1,986
    Dataset of GPT-2 outputs for research in detection, biases, and more

  245. minimaxir/aitextgen ⭐ 1,843
    A robust Python tool for text-based AI training and generation using GPT-2.
    🔗 docs.aitextgen.io

  246. noamgat/lm-format-enforcer ⭐ 1,835
    Enforce the output format (JSON Schema, Regex etc) of a language model

  247. ai-hypercomputer/maxtext ⭐ 1,829
    MaxText is a high-performance, highly scalable, open-source LLM written in pure Python/JAX and targeting Google Cloud TPUs and GPUs for training and inference.

  248. protectai/llm-guard ⭐ 1,827
    Sanitization, detection of harmful language, prevention of data leakage, and resistance against prompt injection attacks for LLMs
    🔗 protectai.github.io/llm-guard

  249. openai/gpt-discord-bot ⭐ 1,827
    Example Discord bot written in Python that uses the completions API to have conversations with the text-davinci-003 model, and the moderations API to filter the messages.

  250. ray-project/llm-applications ⭐ 1,803
    A comprehensive guide to building RAG-based LLM applications for production.

  251. minishlab/model2vec ⭐ 1,753
    Model2Vec is a technique to turn any sentence transformer into a really small static model, reducing model size by 15x and making the models up to 500x faster, with a small drop in performance
    🔗 minish.ai/packages/model2vec

  252. agentops-ai/tokencost ⭐ 1,735
    Easy token price estimates for 400+ LLMs. TokenOps.
    🔗 agentops.ai

  253. qwenlm/Qwen-Audio ⭐ 1,733
    The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

  254. huggingface/lighteval ⭐ 1,716
    LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
    🔗 huggingface.co/docs/lighteval/en/index

  255. vllm-project/llm-compressor ⭐ 1,618
    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
    🔗 blog.vllm.ai/llm-compressor

  256. jina-ai/thinkgpt ⭐ 1,578
    Agent techniques to augment your LLM and push it beyond its limits

  257. huggingface/picotron ⭐ 1,577
    Minimalist & most-hackable repository for pre-training Llama-like models with 4D Parallelism (Data, Tensor, Pipeline, Context parallel)

  258. meetkai/functionary ⭐ 1,569
    Chat language model that can use tools and interpret the results

  259. jennyzzt/dgm ⭐ 1,501
    Self-improving system that iteratively modifies its own code and empirically validates each change

  260. answerdotai/rerankers ⭐ 1,493
    Welcome to rerankers! Our goal is to provide users with a simple API to use any reranking models.

  261. run-llama/llama-lab ⭐ 1,488
    Llama Lab is a repo dedicated to building cutting-edge projects using LlamaIndex

  262. chatarena/chatarena ⭐ 1,483
    ChatArena (or Chat Arena) is a library of multi-agent language game environments for LLMs. The goal is to develop the communication and collaboration capabilities of AIs.

  263. cstankonrad/long_llama ⭐ 1,460
    LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.

  264. farizrahman4u/loopgpt ⭐ 1,455
    Re-implementation of Auto-GPT as a python package, written with modularity and extensibility in mind.

  265. bigscience-workshop/Megatron-DeepSpeed ⭐ 1,401
    Ongoing research training transformer language models at scale, including: BERT & GPT-2

  266. karpathy/nano-llama31 ⭐ 1,394
    This repo is to Llama 3.1 what nanoGPT is to GPT-2, i.e. it is a minimal, dependency-free implementation of the Llama 3.1 architecture

  267. explosion/spacy-transformers ⭐ 1,389
    🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
    🔗 spacy.io/usage/embeddings-transformers

  268. nirdiamant/Controllable-RAG-Agent ⭐ 1,329
    An advanced Retrieval-Augmented Generation (RAG) solution designed to tackle complex questions that simple semantic similarity-based retrieval cannot solve

  269. mlfoundations/dclm ⭐ 1,323
    DataComp for Language Models

  270. protectai/rebuff ⭐ 1,314
    Rebuff is designed to protect AI applications from prompt injection (PI) attacks through a multi-layered defense
    🔗 playground.rebuff.ai

  271. facebookresearch/MobileLLM ⭐ 1,309
    Training code of MobileLLM introduced in our work: "MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases"

  272. keirp/automatic_prompt_engineer ⭐ 1,283
    Large Language Models Are Human-Level Prompt Engineers

  273. explosion/spacy-llm ⭐ 1,276
    🦙 Integrating LLMs into structured NLP pipelines
    🔗 spacy.io/usage/large-language-models

  274. hao-ai-lab/LookaheadDecoding ⭐ 1,259
    Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
    🔗 arxiv.org/abs/2402.02057

  275. ray-project/ray-llm ⭐ 1,258
    RayLLM - LLMs on Ray (Archived). Read README for more info.
    🔗 docs.ray.io/en/latest

  276. srush/MiniChain ⭐ 1,233
    A tiny library for coding with large language models.
    🔗 srush-minichain.hf.space

  277. deepseek-ai/EPLB ⭐ 1,229
    Expert Parallelism Load Balancer across GPUs

  278. ibm/Dromedary ⭐ 1,148
    Dromedary: towards helpful, ethical and reliable LLMs.

  279. lupantech/chameleon-llm ⭐ 1,133
    Code for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".
    🔗 chameleon-llm.github.io

  280. vectifyai/PageIndex ⭐ 1,094
    A document indexing system that builds search tree structures from long documents, making them ready for reasoning-based RAG
    🔗 pageindex.ai

  281. rlancemartin/auto-evaluator ⭐ 1,078
    Evaluation tool for LLM QA chains
    🔗 autoevaluator.langchain.com

  282. mlc-ai/xgrammar ⭐ 1,062
    XGrammar is an open-source library for efficient, flexible, and portable structured generation. It supports general context-free grammar to enable a broad range of structures while bringing careful system optimizations to enable fast executions.
    🔗 xgrammar.mlc.ai/docs

  283. cerebras/modelzoo ⭐ 1,050
    Examples of common deep learning models that can be trained on Cerebras hardware

  284. ctlllll/LLM-ToolMaker ⭐ 1,031
    Large Language Models as Tool Makers

  285. datadreamer-dev/DataDreamer ⭐ 1,030
    DataDreamer is a powerful open-source Python library for prompting, synthetic data generation, and training workflows. It is designed to be simple, extremely efficient, and research-grade.
    🔗 datadreamer.dev

  286. microsoft/Llama-2-Onnx ⭐ 1,029
    A Microsoft optimized version of the Llama 2 model, available from Meta

  287. nomic-ai/pygpt4all ⭐ 1,018
    Officially supported Python bindings for llama.cpp + gpt4all
    🔗 nomic-ai.github.io/pygpt4all

  288. pinecone-io/canopy ⭐ 1,017
    Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone
    🔗 www.pinecone.io

  289. ajndkr/lanarky ⭐ 994
    The web framework for building LLM microservices
    🔗 lanarky.ajndkr.com

  290. likejazz/llama3.np ⭐ 987
    llama3.np is a pure NumPy implementation of the Llama 3 model.

  291. huggingface/optimum-nvidia ⭐ 986
    Optimum-NVIDIA delivers the best inference performance on the NVIDIA platform through Hugging Face. Run LLaMA 2 at 1,200 tokens/second (up to 28x faster than the framework)

  292. prometheus-eval/prometheus-eval ⭐ 960
    Evaluate your LLM's response with Prometheus and GPT4 💯

  293. sumandora/remove-refusals-with-transformers ⭐ 940
    A proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens

  294. soulter/hugging-chat-api ⭐ 931
    HuggingChat Python API🤗

  295. wandb/weave ⭐ 927
    Weave is a toolkit for developing AI-powered applications, built by Weights & Biases.
    🔗 wandb.me/weave

  296. langchain-ai/langsmith-cookbook ⭐ 927
    LangSmith is a platform for building production-grade LLM applications.
    🔗 langsmith-cookbook.vercel.app

  297. nousresearch/Hermes-Function-Calling ⭐ 908
    Code for the Hermes Pro Large Language Model to perform function calling based on the provided schema. It allows users to query the model and retrieve information related to stock prices, company fundamentals, financial statements

  298. centerforaisafety/hle ⭐ 893
    Humanity's Last Exam (HLE) is a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage
    🔗 lastexam.ai

  299. muennighoff/sgpt ⭐ 868
    SGPT: GPT Sentence Embeddings for Semantic Search
    🔗 arxiv.org/abs/2202.08904

  300. utkusen/promptmap ⭐ 835
    Vulnerability scanning tool that automatically tests prompt injection attacks on your LLM applications. It analyzes your LLM system prompts, runs them, and sends attack prompts to them.

  301. opengvlab/OmniQuant ⭐ 827
    [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

  302. junruxiong/IncarnaMind ⭐ 798
    Connect and chat with your multiple documents (pdf and txt) through GPT 3.5, GPT-4 Turbo, Claude and Local Open-Source LLMs
    🔗 www.incarnamind.com

  303. oliveirabruno01/babyagi-asi ⭐ 797
    BabyAGI: an Autonomous and Self-Improving agent, or BASI

  304. cagostino/npcpy ⭐ 795
    This repo leverages the power of LLMs to understand your natural language commands and questions, executing tasks, answering queries, and providing relevant information from local files and the web.

  305. opengenerativeai/GenossGPT ⭐ 752
    One API for all LLMs either Private or Public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace ...) 🌈🐂 Replace OpenAI GPT with any LLMs in your app with one line.
    🔗 genoss.ai

  306. tag-research/TAG-Bench ⭐ 746
    Table-Augmented Generation (TAG) is a unified and general-purpose paradigm for answering natural language questions over databases
    🔗 arxiv.org/pdf/2408.14717

  307. developersdigest/llm-api-engine ⭐ 726
    Build and deploy AI-powered APIs in seconds. This project allows you to create custom APIs that extract structured data from websites using natural language descriptions, powered by LLMs and web scraping technology.
    🔗 www.youtube.com/watch?v=8kuek1bo4mm

  308. salesforce/xgen ⭐ 719
    Salesforce open-source LLMs with 8k sequence length.

  309. squeezeailab/SqueezeLLM ⭐ 695
    [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
    🔗 arxiv.org/abs/2306.07629

  310. lupantech/ScienceQA ⭐ 677
    Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".

  311. tsinghuadatabasegroup/DB-GPT ⭐ 651
    LLM As Database Administrator
    🔗 dbgpt.dbmind.cn

  312. microsoft/VPTQ ⭐ 647
    Extreme Low-bit Vector Post-Training Quantization for Large Language Models

  313. magnivorg/prompt-layer-library ⭐ 636
    🍰 PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions.
    🔗 www.promptlayer.com

  314. cyberark/FuzzyAI ⭐ 635
    A powerful tool for automated LLM fuzzing. It is designed to help developers and security researchers identify and mitigate potential jailbreaks in their LLM APIs.

  315. modal-labs/llm-finetuning ⭐ 605
    Guide for fine-tuning Llama/Mistral/CodeLlama models and more

  316. google-gemini/genai-processors ⭐ 602
    GenAI Processors is a lightweight Python library that enables efficient, parallel content processing.

  317. langchain-ai/langsmith-sdk ⭐ 588
    LangSmith Client SDK Implementations
    🔗 docs.smith.langchain.com

  318. judahpaul16/gpt-home ⭐ 586
    ChatGPT at home! Basically a better Google Nest Hub or Amazon Alexa home assistant. Built on the Raspberry Pi using the OpenAI API.
    🔗 hub.docker.com/r/judahpaul/gpt-home

  319. zhudotexe/kani ⭐ 583
    kani (カニ) is a highly hackable microframework for chat-based language models with tool use/function calling. (NLP-OSS @ EMNLP 2023)
    🔗 kani.readthedocs.io

  320. metauto-ai/agent-as-a-judge ⭐ 578
    ⚖️ The First Coding Agent-as-a-Judge
    🔗 arxiv.org/pdf/2410.10934

  321. qixucen/atom ⭐ 577
    Atom of Thoughts (AoT) is a new reasoning framework that represents the solution as a composition of atomic questions. This approach transforms the reasoning process into a Markov process with atomic states

  322. predibase/llm_distillation_playbook ⭐ 560
    Best practices for distilling large language models.

  323. huggingface/text-clustering ⭐ 554
    Easily embed, cluster and semantically label text datasets

  324. hazyresearch/ama_prompting ⭐ 547
    Ask Me Anything language model prompting

  325. declare-lab/instruct-eval ⭐ 546
    This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
    🔗 declare-lab.github.io/instruct-eval

  326. vahe1994/SpQR ⭐ 543
    Quantization algorithm and model evaluation code for the SpQR method for LLM compression

  327. eugeneyan/obsidian-copilot ⭐ 540
    🤖 A prototype assistant for writing and thinking
    🔗 eugeneyan.com/writing/obsidian-copilot

  328. likenneth/honest_llama ⭐ 536
    Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

  329. deepseek-ai/DeepSeek-Prover-V1.5 ⭐ 527
    DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

  330. kbressem/medAlpaca ⭐ 527
    LLM finetuned for medical question answering

  331. continuum-llms/chatgpt-memory ⭐ 524
    Scales the ChatGPT API to multiple simultaneous sessions with infinite contextual and adaptive memory, powered by GPT and a Redis datastore.

  332. hazyresearch/H3 ⭐ 519
    Language Modeling with the H3 State Space Model

  333. cohere-ai/notebooks ⭐ 503
    Code examples and Jupyter notebooks for the Cohere Platform

  334. reasoning-machines/pal ⭐ 502
    PaL: Program-Aided Language Models (ICML 2023)
    🔗 reasonwithpal.com

  335. codelion/adaptive-classifier ⭐ 335
    A flexible, adaptive classification system that allows for dynamic addition of new classes and continuous learning from examples. Built on top of transformers from HuggingFace, this library provides an easy-to-use interface for creating and updating text classifiers.

  336. stanford-oval/suql ⭐ 271
    SUQL: Conversational Search over Structured and Unstructured Data with LLMs
    🔗 arxiv.org/abs/2311.09818

  337. emissary-tech/legit-rag ⭐ 262
    A modular Retrieval-Augmented Generation (RAG) system built with FastAPI, Qdrant, and OpenAI.

  338. dottxt-ai/outlines-core ⭐ 230
    Core functionality for structured generation, formerly implemented in Outlines, with a focus on performance and portability.
    🔗 docs.rs/outlines-core

  339. quotient-ai/judges ⭐ 228
    judges is a small library to use and create LLM-as-a-Judge evaluators. The purpose of judges is to have a curated set of LLM evaluators in a low-friction format across a variety of use cases

  340. jina-ai/llm-query-expansion ⭐ 52
    Query Expansion for Better Query Embedding using LLMs

Math and Science

Mathematical, numerical and scientific libraries.

  1. numpy/numpy ⭐ 29,889
    The fundamental package for scientific computing with Python.
    🔗 numpy.org

  2. camdavidsonpilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers ⭐ 27,587
    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
    🔗 camdavidsonpilon.github.io/probabilistic-programming-and-bayesian-methods-for-hackers

  3. taichi-dev/taichi ⭐ 27,264
    Productive, portable, and performant GPU programming in Python: Taichi Lang is an open-source, imperative, parallel programming language for high-performance numerical computation.
    🔗 taichi-lang.org

  4. experience-monks/math-as-code ⭐ 15,334
    This is a reference to ease developers into mathematical notation by showing comparisons with Python code

  5. scipy/scipy ⭐ 13,803
    SciPy library main repository
    🔗 scipy.org

  6. sympy/sympy ⭐ 13,734
    A computer algebra system written in pure Python
    🔗 sympy.org

  7. google/or-tools ⭐ 12,203
    Google Optimization Tools (a.k.a., OR-Tools) is an open-source, fast and portable software suite for solving combinatorial optimization problems.
    🔗 developers.google.com/optimization

  8. z3prover/z3 ⭐ 11,181
    Z3 is a theorem prover from Microsoft Research with a Python language binding.

  9. cupy/cupy ⭐ 10,334
    NumPy & SciPy for GPU
    🔗 cupy.dev

  10. google-deepmind/alphageometry ⭐ 4,544
    Solving Olympiad Geometry without Human Demonstrations

  11. pim-book/programmers-introduction-to-mathematics ⭐ 3,610
    Code for A Programmer's Introduction to Mathematics
    🔗 pimbook.org

  12. mikedh/trimesh ⭐ 3,284
    Python library for loading and using triangular meshes.
    🔗 trimesh.org

  13. talalalrawajfeh/mathematics-roadmap ⭐ 3,069
    A Comprehensive Roadmap to Mathematics

  14. pyro-ppl/numpyro ⭐ 2,485
    Probabilistic programming with NumPy powered by JAX for autograd and JIT compilation to GPU/TPU/CPU.
    🔗 num.pyro.ai

  15. mckinsey/causalnex ⭐ 2,345
    A Python library that helps data scientists to infer causation rather than observing correlation.
    🔗 causalnex.readthedocs.io

  16. pyomo/pyomo ⭐ 2,228
    An object-oriented algebraic modeling language in Python for structured optimization problems.
    🔗 www.pyomo.org

  17. facebookresearch/theseus ⭐ 1,917
    A library for differentiable nonlinear optimization

  18. arviz-devs/arviz ⭐ 1,706
    Exploratory analysis of Bayesian models with Python
    🔗 python.arviz.org

  19. google-research/torchsde ⭐ 1,660
    Differentiable SDE solvers with GPU support and efficient sensitivity analysis.

  20. dynamicslab/pysindy ⭐ 1,607
    A package for the sparse identification of nonlinear dynamical systems from data
    🔗 pysindy.readthedocs.io/en/latest

  21. geomstats/geomstats ⭐ 1,377
    Computations and statistics on manifolds with geometric structures.
    🔗 geomstats.ai

  22. cma-es/pycma ⭐ 1,197
    pycma is a Python implementation of CMA-ES and a few related numerical optimization tools.

  23. pymc-labs/CausalPy ⭐ 1,016
    A Python package for causal inference in quasi-experimental settings
    🔗 causalpy.readthedocs.io

  24. lean-dojo/LeanDojo ⭐ 675
    Tool for data extraction and interacting with Lean programmatically.
    🔗 leandojo.org

  25. willianfuks/tfcausalimpact ⭐ 641
    Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.

  26. brandondube/prysm ⭐ 302
    Prysm is an open-source library for physical and first-order modeling of optical systems and analysis of related data: numerical and physical optics, integrated modeling, phase retrieval, segmented systems, polynomials and fitting, sequential raytracing.
    🔗 prysm.readthedocs.io/en/stable

  27. lean-dojo/ReProver ⭐ 279
    Retrieval-Augmented Theorem Provers for Lean
    🔗 leandojo.org

  28. albahnsen/pycircular ⭐ 104
    pycircular is a Python module for circular data analysis

  29. gbillotey/Fractalshades ⭐ 32
    Arbitrary-precision fractal explorer - Python package

Machine Learning - General

General and classical machine learning libraries. See below for other sections covering specialised ML areas.

  1. openai/openai-cookbook ⭐ 65,268
    Examples and guides for using the OpenAI API
    🔗 cookbook.openai.com

  2. scikit-learn/scikit-learn ⭐ 62,617
    scikit-learn: machine learning in Python
    🔗 scikit-learn.org

  3. suno-ai/bark ⭐ 38,164
    🔊 Text-Prompted Generative Audio Model

  4. tencentarc/GFPGAN ⭐ 36,907
    GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

  5. facebookresearch/faiss ⭐ 36,033
    A library for efficient similarity search and clustering of dense vectors.
    🔗 faiss.ai

  6. google-research/google-research ⭐ 35,979
    This repository contains code released by Google Research
    🔗 research.google

  7. google/jax ⭐ 32,749
    Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
    🔗 docs.jax.dev

  8. open-mmlab/mmdetection ⭐ 31,325
    OpenMMLab Detection Toolbox and Benchmark
    🔗 mmdetection.readthedocs.io

  9. lutzroeder/netron ⭐ 30,853
    Visualizer for neural network, deep learning and machine learning models
    🔗 netron.app

  10. google/mediapipe ⭐ 30,579
    Cross-platform, customizable ML solutions for live and streaming media.
    🔗 ai.google.dev/edge/mediapipe

  11. ageron/handson-ml2 ⭐ 29,018
    A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

  12. dmlc/xgboost ⭐ 27,091
    Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
    🔗 xgboost.readthedocs.io

  13. roboflow/supervision ⭐ 26,883
    We write your reusable computer vision tools. 💜
    🔗 supervision.roboflow.com

  14. facebookresearch/fastText ⭐ 26,276
    A library for efficient learning of word representations and sentence classification.
    🔗 fasttext.cc

  15. modular/modular ⭐ 24,478
    The Modular Accelerated Xecution (MAX) platform is an integrated suite of AI libraries, tools, and technologies that unifies commonly fragmented AI deployment workflows
    🔗 docs.modular.com

  16. harisiqbal88/PlotNeuralNet ⭐ 23,649
    LaTeX code for making neural network diagrams

  17. jina-ai/serve ⭐ 21,642
    ☁️ Build multimodal AI applications with cloud-native stack
    🔗 jina.ai/serve

  18. ml-explore/mlx ⭐ 21,381
    MLX is an array framework for machine learning on Apple silicon, brought to you by Apple machine learning research.
    🔗 ml-explore.github.io/mlx

  19. onnx/onnx ⭐ 19,225
    Open standard for machine learning interoperability
    🔗 onnx.ai

  20. microsoft/LightGBM ⭐ 17,385
    A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
    🔗 lightgbm.readthedocs.io/en/latest

  21. microsoft/onnxruntime ⭐ 17,160
    ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
    🔗 onnxruntime.ai

  22. tensorflow/tensor2tensor ⭐ 16,289
    Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

  23. ddbourgin/numpy-ml ⭐ 16,122
    Machine learning, in numpy
    🔗 numpy-ml.readthedocs.io

  24. aleju/imgaug ⭐ 14,619
    Image augmentation for machine learning experiments.
    🔗 imgaug.readthedocs.io

  25. neonbjb/tortoise-tts ⭐ 14,391
    A multi-voice TTS system trained with an emphasis on quality

  26. microsoft/nni ⭐ 14,228
    An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
    🔗 nni.readthedocs.io

  27. deepmind/deepmind-research ⭐ 14,109
    This repository contains implementations and illustrative code to accompany DeepMind publications

  28. jindongwang/transferlearning ⭐ 13,991
    Transfer learning / domain adaptation / domain generalization / multi-task learning etc. Papers, code, datasets, applications, tutorials.
    🔗 transferlearning.xyz

  29. google-gemini/cookbook ⭐ 13,968
    A collection of guides and examples for the Gemini API, including quickstart tutorials for writing prompts.
    🔗 ai.google.dev/gemini-api/docs

  30. spotify/annoy ⭐ 13,844
    Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

  31. deepmind/alphafold ⭐ 13,663
    Implementation of the inference pipeline of AlphaFold v2

  32. ggerganov/ggml ⭐ 12,809
    Tensor library for machine learning

  33. facebookresearch/AnimatedDrawings ⭐ 12,548
    Code to accompany "A Method for Animating Children's Drawings of the Human Figure"

  34. optuna/optuna ⭐ 12,277
    A hyperparameter optimization framework
    🔗 optuna.org

  35. thudm/CogVideo ⭐ 11,674
    text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

  36. statsmodels/statsmodels ⭐ 10,799
    Statsmodels: statistical modeling and econometrics in Python
    🔗 www.statsmodels.org/devel

  37. cleanlab/cleanlab ⭐ 10,685
    Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
    🔗 cleanlab.ai

  38. twitter/the-algorithm-ml ⭐ 10,290
    Source code for Twitter's Recommendation Algorithm
    🔗 blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm

  39. wandb/wandb ⭐ 10,067
    The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
    🔗 wandb.ai

  40. epistasislab/tpot ⭐ 9,936
    A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
    🔗 epistasislab.github.io/tpot

  41. megvii-basedetection/YOLOX ⭐ 9,928
    YOLOX is a high-performance anchor-free YOLO, exceeding YOLOv3 to YOLOv5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO support. Documentation: https://yolox.readthedocs.io/

  42. facebookresearch/xformers ⭐ 9,714
    Hackable and optimized Transformers building blocks, supporting a composable construction.
    🔗 facebookresearch.github.io/xformers

  43. pycaret/pycaret ⭐ 9,412
    An open-source, low-code machine learning library in Python
    🔗 www.pycaret.org

  44. awslabs/autogluon ⭐ 9,119
    Fast and Accurate ML in 3 Lines of Code
    🔗 auto.gluon.ai

  45. pymc-devs/pymc ⭐ 9,114
    Bayesian Modeling and Probabilistic Programming in Python
    🔗 www.pymc.io

  46. open-mmlab/mmsegmentation ⭐ 9,037
    OpenMMLab Semantic Segmentation Toolbox and Benchmark.
    🔗 mmsegmentation.readthedocs.io/en/main

  47. huggingface/accelerate ⭐ 8,924
    🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
    🔗 huggingface.co/docs/accelerate

  48. uberi/speech_recognition ⭐ 8,788
    Speech recognition module for Python, supporting several engines and APIs, online and offline.
    🔗 pypi.python.org/pypi/speechrecognition

  49. catboost/catboost ⭐ 8,465
    A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
    🔗 catboost.ai

  50. automl/auto-sklearn ⭐ 7,880
    Automated Machine Learning with scikit-learn
    🔗 automl.github.io/auto-sklearn

  51. lmcinnes/umap ⭐ 7,853
    Uniform Manifold Approximation and Projection

  52. ml-explore/mlx-examples ⭐ 7,636
    Examples in the MLX framework

  53. py-why/dowhy ⭐ 7,583
    DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
    🔗 www.pywhy.org/dowhy

  54. featurelabs/featuretools ⭐ 7,489
    An open source python library for automated feature engineering
    🔗 www.featuretools.com

  55. hyperopt/hyperopt ⭐ 7,437
    Distributed Asynchronous Hyperparameter Optimization in Python
    🔗 hyperopt.github.io/hyperopt

  56. hips/autograd ⭐ 7,311
    Efficiently computes derivatives of NumPy code.

  57. open-mmlab/mmagic ⭐ 7,199
    OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative-AI (AIGC), easy-to-use APIs, awesome model zoo, diffusion models, for text-to-image generation, image/video restoration/enhancement, etc.
    🔗 mmagic.readthedocs.io/en/latest

  58. scikit-learn-contrib/imbalanced-learn ⭐ 7,013
    A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
    🔗 imbalanced-learn.org

  59. yangchris11/samurai ⭐ 6,874
    Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
    🔗 yangchris11.github.io/samurai

  60. probml/pyprobml ⭐ 6,827
    Python code for the "Probabilistic Machine Learning" book by Kevin Murphy

  61. nicolashug/Surprise ⭐ 6,632
    A Python scikit for building and analyzing recommender systems
    🔗 surpriselib.com

  62. project-monai/MONAI ⭐ 6,592
    AI Toolkit for Healthcare Imaging
    🔗 monai.io

  63. google/automl ⭐ 6,384
    Google Brain AutoML

  64. cleverhans-lab/cleverhans ⭐ 6,331
    An adversarial example library for constructing attacks, building defenses, and benchmarking both

  65. open-mmlab/mmcv ⭐ 6,186
    OpenMMLab Computer Vision Foundation
    🔗 mmcv.readthedocs.io/en/latest

  66. kevinmusgrave/pytorch-metric-learning ⭐ 6,179
    The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
    🔗 kevinmusgrave.github.io/pytorch-metric-learning

  67. google-deepmind/graphcast ⭐ 6,171
    GraphCast: Learning skillful medium-range global weather forecasting

  68. uber/causalml ⭐ 5,481
    Uplift modeling and causal inference with machine learning algorithms

  69. online-ml/river ⭐ 5,434
    🌊 Online machine learning in Python
    🔗 riverml.xyz

  70. mdbloice/Augmentor ⭐ 5,117
    Image augmentation library in Python for machine learning.
    🔗 augmentor.readthedocs.io/en/stable

  71. rasbt/mlxtend ⭐ 5,039
    A library of extension and helper modules for Python's data analysis and machine learning libraries.
    🔗 rasbt.github.io/mlxtend

  72. marqo-ai/marqo ⭐ 4,899
    Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
    🔗 www.marqo.ai

  73. skvark/opencv-python ⭐ 4,895
    Automated CI toolchain to produce precompiled opencv-python, opencv-python-headless, opencv-contrib-python and opencv-contrib-python-headless packages.
    🔗 pypi.org/project/opencv-python

  74. apple/coremltools ⭐ 4,833
    Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
    🔗 coremltools.readme.io

  75. nmslib/hnswlib ⭐ 4,791
    Header-only C++/python library for fast approximate nearest neighbors
    🔗 github.com/nmslib/hnswlib

  76. sanchit-gandhi/whisper-jax ⭐ 4,613
    JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

  77. huggingface/autotrain-advanced ⭐ 4,436
    AutoTrain Advanced: faster and easier training and deployments of state-of-the-art machine learning models
    🔗 huggingface.co/autotrain

  78. nv-tlabs/GET3D ⭐ 4,372
    Generative Model of High Quality 3D Textured Shapes Learned from Images

  79. districtdatalabs/yellowbrick ⭐ 4,359
    Visual analysis and diagnostic tools to facilitate machine learning model selection.
    🔗 www.scikit-yb.org

  80. lucidrains/deep-daze ⭐ 4,356
    Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

  81. huggingface/notebooks ⭐ 4,202
    Notebooks using the Hugging Face libraries 🤗

  82. py-why/EconML ⭐ 4,189
    ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to brin...
    🔗 www.microsoft.com/en-us/research/project/alice

  83. microsoft/FLAML ⭐ 4,163
    A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
    🔗 microsoft.github.io/flaml

  84. cmusphinx/pocketsphinx ⭐ 4,147
    A small speech recognizer

  85. ourownstory/neural_prophet ⭐ 4,121
    NeuralProphet: A simple forecasting package
    🔗 neuralprophet.com

  86. huggingface/speech-to-speech ⭐ 4,100
    Speech To Speech: an effort for an open-sourced and modular GPT-4o

  87. priorlabs/TabPFN ⭐ 4,033
    TabPFN is a neural network that learned to do tabular data prediction. This is the original CUDA-supporting PyTorch implementation.
    🔗 priorlabs.ai

  88. zjunlp/DeepKE ⭐ 4,018
    [EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
    🔗 deepke.zjukg.cn

  89. rucaibox/RecBole ⭐ 3,843
    A unified, comprehensive and efficient recommendation library
    🔗 recbole.io

  90. yoheinakajima/instagraph ⭐ 3,531
    Converts text input or a URL into a knowledge graph and displays it

  91. lightly-ai/lightly ⭐ 3,464
    A python library for self-supervised learning on images.
    🔗 docs.lightly.ai/self-supervised-learning

  92. huggingface/safetensors ⭐ 3,342
    Implements a new simple format for storing tensors safely (as opposed to pickle) and that is still fast (zero-copy).
    🔗 huggingface.co/docs/safetensors

  93. pytorch/glow ⭐ 3,311
    Compiler for Neural Network hardware accelerators

  94. facebookresearch/vissl ⭐ 3,284
    VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
    🔗 vissl.ai

  95. lucidrains/musiclm-pytorch ⭐ 3,267
    Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch

  96. hrnet/HRNet-Semantic-Segmentation ⭐ 3,248
    The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919

  97. mljar/mljar-supervised ⭐ 3,176
    Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
    🔗 mljar.com

  98. shankarpandala/lazypredict ⭐ 3,170
    Lazy Predict helps build a lot of basic models without much code and helps understand which models work better without any parameter tuning

  99. facebookresearch/flow_matching ⭐ 2,975
    Flow Matching (FM) is a recent framework for generative modeling that has achieved state-of-the-art performance across various domains, including image, video, audio, speech, and biological structures
    🔗 facebookresearch.github.io/flow_matching

  100. huggingface/optimum ⭐ 2,975
    🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
    🔗 huggingface.co/docs/optimum/main

  101. scikit-learn-contrib/hdbscan ⭐ 2,957
    A high performance implementation of HDBSCAN clustering.
    🔗 hdbscan.readthedocs.io/en/latest

  102. google-research/t5x ⭐ 2,842
    T5X is a modular, composable, research-friendly framework for high-performance, configurable, self-service training, evaluation, and inference of sequence models (starting with language) at many scales.

  103. nvidia/cuda-python ⭐ 2,825
    CUDA Python: Performance meets Productivity
    🔗 nvidia.github.io/cuda-python

  104. scikit-optimize/scikit-optimize ⭐ 2,782
    Sequential model-based optimization with a scipy.optimize interface
    🔗 scikit-optimize.github.io

  105. neuraloperator/neuraloperator ⭐ 2,774
    Comprehensive library for learning neural operators in PyTorch. It is the official implementation for Fourier Neural Operators and Tensorized Neural Operators.
    🔗 neuraloperator.github.io/dev/index.html

  106. huggingface/huggingface_hub ⭐ 2,758
    The official Python client for the Hugging Face Hub.
    🔗 huggingface.co/docs/huggingface_hub

  107. apple/ml-ane-transformers ⭐ 2,643
    Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)

  108. eric-mitchell/direct-preference-optimization ⭐ 2,638
    Reference implementation for DPO (Direct Preference Optimization)

  109. freedmand/semantra ⭐ 2,627
    Semantra is a multipurpose tool for semantically searching documents. Query by meaning rather than just by matching text.

  110. rom1504/clip-retrieval ⭐ 2,593
    Easily compute clip embeddings and build a clip retrieval system with them
    🔗 rom1504.github.io/clip-retrieval

  111. scikit-learn-contrib/category_encoders ⭐ 2,450
    A library of sklearn compatible categorical variable encoders
    🔗 contrib.scikit-learn.org/category_encoders

  112. huggingface/evaluate ⭐ 2,257
    🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
    🔗 huggingface.co/docs/evaluate

  113. qdrant/fastembed ⭐ 2,204
    Fast, accurate, lightweight Python library for generating state-of-the-art embeddings
    🔗 qdrant.github.io/fastembed

  114. aws/sagemaker-python-sdk ⭐ 2,169
    A library for training and deploying machine learning models on Amazon SageMaker
    🔗 sagemaker.readthedocs.io

  115. microsoft/Olive ⭐ 1,996
    Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
    🔗 microsoft.github.io/olive

  116. castorini/pyserini ⭐ 1,889
    Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
    🔗 pyserini.io

  117. contextlab/hypertools ⭐ 1,847
    A Python toolbox for gaining geometric insights into high-dimensional data
    🔗 hypertools.readthedocs.io

  118. linkedin/greykite ⭐ 1,846
    A flexible, intuitive and fast forecasting library

  119. bmabey/pyLDAvis ⭐ 1,834
    Python library for interactive topic model visualization. Port of the R LDAvis package.

  120. rentruewang/koila ⭐ 1,824
    Prevent PyTorch's CUDA error: out of memory in just 1 line of code.
    🔗 koila.rentruewang.com

  121. laekov/fastmoe ⭐ 1,757
    A fast Mixture-of-Experts (MoE) implementation for PyTorch
    🔗 fastmoe.ai

  122. stanfordmlgroup/ngboost ⭐ 1,757
    Natural Gradient Boosting for Probabilistic Prediction

  123. visual-layer/fastdup ⭐ 1,704
    fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.

  124. microsoft/i-Code ⭐ 1,703
    The ambition of the i-Code project is to build integrative and composable multimodal AI. The "i" stands for integrative multimodal learning.

  125. tensorflow/addons ⭐ 1,701
    Useful extra functionality for TensorFlow 2.x maintained by SIG-addons

  126. kubeflow/katib ⭐ 1,606
    Automated Machine Learning on Kubernetes
    🔗 www.kubeflow.org/docs/components/katib

  127. google/vizier ⭐ 1,576
    Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.
    🔗 oss-vizier.readthedocs.io

  128. jina-ai/finetuner ⭐ 1,504
    🎯 Task-oriented embedding tuning for BERT, CLIP, etc.
    🔗 finetuner.jina.ai

  129. microsoft/Semi-supervised-learning ⭐ 1,496
    A Unified Semi-Supervised Learning Codebase (NeurIPS'22)
    🔗 usb.readthedocs.io

  130. csinva/imodels ⭐ 1,481
    Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
    🔗 csinva.io/imodels

  131. spotify/voyager ⭐ 1,469
    🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.
    🔗 spotify.github.io/voyager

  132. patchy631/machine-learning ⭐ 1,442
    Machine Learning Tutorials Repository

  133. pytorch/FBGEMM ⭐ 1,400
    FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

  134. lightning-ai/lightning-thunder ⭐ 1,375
    Thunder is a source-to-source compiler for PyTorch. It makes PyTorch programs faster by combining and using different hardware executors at once

  135. koaning/scikit-lego ⭐ 1,349
    Extra blocks for scikit-learn pipelines.
    🔗 koaning.github.io/scikit-lego

  136. borealisai/advertorch ⭐ 1,348
    A Toolbox for Adversarial Robustness Research

  137. awslabs/dgl-ke ⭐ 1,308
    High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
    🔗 dglke.dgl.ai/doc

  138. opentensor/bittensor ⭐ 1,183
    Internet-scale Neural Networks
    🔗 www.bittensor.com

  139. davidmrau/mixture-of-experts ⭐ 1,134
    PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538

  140. google-research/deeplab2 ⭐ 1,026
    DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks.

  141. oml-team/open-metric-learning ⭐ 966
    OML is a PyTorch-based framework to train and validate models that produce high-quality embeddings.
    🔗 open-metric-learning.readthedocs.io/en/latest/index.html

  142. huggingface/optimum-quanto ⭐ 961
    A pytorch quantization backend for optimum

  143. pymc-labs/pymc-marketing ⭐ 891
    Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
    🔗 www.pymc-marketing.io

  144. hazyresearch/safari ⭐ 891
    Convolutions for Sequence Modeling

  145. criteo/autofaiss ⭐ 863
    Automatically create Faiss knn indices with the most optimal similarity search parameters.
    🔗 criteo.github.io/autofaiss

  146. replicate/replicate-python ⭐ 840
    Python client for Replicate
    🔗 replicate.com

  147. awslabs/python-deequ ⭐ 784
    Python API for Deequ, a library built on Spark for defining "unit tests for data", which measure data quality in large datasets

  148. googleapis/python-aiplatform ⭐ 772
    A Python SDK for Vertex AI, a fully managed, end-to-end platform for data science and machine learning.

  149. minishlab/semhash ⭐ 761
    SemHash is a lightweight and flexible tool for deduplicating datasets using semantic similarity. It combines fast embedding generation from Model2Vec with efficient ANN-based similarity search through Vicinity
    🔗 minish.ai/packages/semhash

  150. nomic-ai/contrastors ⭐ 728
    Contrastive learning toolkit that enables researchers and engineers to train and evaluate contrastive models efficiently.

  151. facebookresearch/balance ⭐ 701
    The balance python package offers a simple workflow and methods for dealing with biased data samples when looking to infer from them to some target population of interest.
    🔗 import-balance.org

  152. nicolas-hbt/pygraft ⭐ 691
    Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
    🔗 pygraft.readthedocs.io/en/latest

  153. intel/intel-npu-acceleration-library ⭐ 680
    The Intel NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.

  154. huggingface/exporters ⭐ 661
    Export Hugging Face models to Core ML and TensorFlow Lite

  155. qdrant/quaterion ⭐ 655
    Blazing fast framework for fine-tuning similarity learning models
    🔗 quaterion.qdrant.tech

  156. hpcaitech/EnergonAI ⭐ 630
    Large-scale model inference.

  157. intellabs/bayesian-torch ⭐ 606
    A library for Bayesian neural network layers and uncertainty estimation in Deep Learning extending the core of PyTorch

  158. eleutherai/sparsify ⭐ 584
    This library trains k-sparse autoencoders (SAEs) on the residual stream activations of HuggingFace language models, roughly following the recipe detailed in Scaling and evaluating sparse autoencoders (Gao et al. 2024)

  159. microsoft/Focal-Transformer ⭐ 556
    [NeurIPS 2021 Spotlight] Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"

  160. deepgraphlearning/ULTRA ⭐ 545
    A foundation model for knowledge graph reasoning

  161. linkedin/FastTreeSHAP ⭐ 538
    Fast SHAP value computation for interpreting tree-based models

  162. mrdbourke/m1-machine-learning-test ⭐ 533
    Code for testing various M1 Chip benchmarks with TensorFlow.

  163. raivnlab/MRL ⭐ 522
    Code repository for the paper - "Matryoshka Representation Learning"

  164. nevronai/MetisFL ⭐ 522
    The first open Federated Learning framework implemented in C++ and Python.
    🔗 metisfl.org

  165. lightning-ai/litData ⭐ 506
    Transform datasets at scale. Optimize datasets for fast AI model training.

  166. dylanhogg/gptauthor ⭐ 79
    GPTAuthor is an AI tool for writing long form, multi-chapter stories given a story prompt.

Machine Learning - Deep Learning

Machine learning libraries that cross over with deep learning in some way.

  1. tensorflow/tensorflow ⭐ 190,689
    An Open Source Machine Learning Framework for Everyone
    🔗 tensorflow.org

  2. pytorch/pytorch ⭐ 91,456
    Tensors and Dynamic neural networks in Python with strong GPU acceleration
    🔗 pytorch.org

  3. openai/whisper ⭐ 84,724
    Robust Speech Recognition via Large-Scale Weak Supervision

  4. keras-team/keras ⭐ 63,201
    Deep Learning for humans
    🔗 keras.io

  5. deepfakes/faceswap ⭐ 54,229
    Deepfakes Software For All
    🔗 www.faceswap.dev

  6. facebookresearch/segment-anything ⭐ 50,797
    The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

  7. microsoft/DeepSpeed ⭐ 39,296
    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
    🔗 www.deepspeed.ai

  8. rwightman/pytorch-image-models ⭐ 34,705
    The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
    🔗 huggingface.co/docs/timm

  9. facebookresearch/detectron2 ⭐ 32,343
    Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
    🔗 detectron2.readthedocs.io/en/latest

  10. xinntao/Real-ESRGAN ⭐ 31,651
    Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

  11. openai/CLIP ⭐ 29,773
    CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

  12. lightning-ai/pytorch-lightning ⭐ 29,768
    The deep learning framework to pretrain, finetune and deploy AI models. PyTorch Lightning is just organized PyTorch - Lightning disentangles PyTorch code to decouple the science from the engineering.
    🔗 lightning.ai/pytorch-lightning

  13. google-research/tuning_playbook ⭐ 28,911
    A playbook for systematically maximizing the performance of deep learning models.

  14. facebookresearch/Detectron ⭐ 26,361
    FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

  15. matterport/Mask_RCNN ⭐ 25,227
    Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

  16. lucidrains/vit-pytorch ⭐ 23,344
    Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

  17. paddlepaddle/Paddle ⭐ 23,024
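    PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)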
    🔗 www.paddlepaddle.org

  18. pyg-team/pytorch_geometric ⭐ 22,585
    Graph Neural Network Library for PyTorch
    🔗 pyg.org

  19. sanster/IOPaint ⭐ 21,807
    Image inpainting tool powered by SOTA AI models. Remove any unwanted object, defect or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.
    🔗 www.iopaint.com

  20. apache/mxnet ⭐ 20,805
    Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
    🔗 mxnet.apache.org

  21. danielgatis/rembg ⭐ 19,767
    Rembg is a tool to remove image backgrounds

  22. rasbt/deeplearning-models ⭐ 17,115
    A collection of various deep learning architectures, models, and tips

  23. albumentations-team/albumentations ⭐ 15,044
    Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
    🔗 albumentations.ai

  24. microsoft/Swin-Transformer ⭐ 14,992
    This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
    🔗 arxiv.org/abs/2103.14030

  25. facebookresearch/detr ⭐ 14,501
    End-to-End Object Detection with Transformers

  26. nvidia/DeepLearningExamples ⭐ 14,386
    State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

  27. dmlc/dgl ⭐ 13,971
    Python package built to ease deep learning on graphs, on top of existing DL frameworks.
    🔗 dgl.ai

  28. mlfoundations/open_clip ⭐ 12,152
    Open source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training).

  29. tencent-hunyuan/HunyuanVideo ⭐ 10,631
    HunyuanVideo: A Systematic Framework For Large Video Generation Model
    🔗 aivideo.hunyuan.tencent.com

  30. kornia/kornia ⭐ 10,593
    🐍 Geometric Computer Vision Library for Spatial AI
    🔗 kornia.readthedocs.io

  31. modelscope/facechain ⭐ 9,460
    FaceChain is a deep-learning toolchain for generating your Digital-Twin.

  32. facebookresearch/pytorch3d ⭐ 9,378
    PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
    🔗 pytorch3d.org

  33. keras-team/autokeras ⭐ 9,245
    AutoML library for deep learning
    🔗 autokeras.com

  34. arogozhnikov/einops ⭐ 9,026
    Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
    🔗 einops.rocks

  35. bytedance/monolith ⭐ 8,906
    A deep learning framework for large scale recommendation modeling with collisionless embedding and real time training captures.

  36. pyro-ppl/pyro ⭐ 8,818
    Deep universal probabilistic programming with Python and PyTorch
    🔗 pyro.ai

  37. facebookresearch/ImageBind ⭐ 8,721
    ImageBind: One Embedding Space to Bind Them All

  38. nvidia/apex ⭐ 8,716
    A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

  39. lucidrains/imagen-pytorch ⭐ 8,316
    Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

  40. google/trax ⭐ 8,226
    Trax — Deep Learning with Clear Code and Speed

  41. xpixelgroup/BasicSR ⭐ 7,596
    Open Source Image and Video Restoration Toolbox for Super-resolution, Denoise, Deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. Also supports StyleGAN2 and DFDNet.
    🔗 basicsr.readthedocs.io/en/latest

  42. google/flax ⭐ 6,671
    Flax is a neural network library for JAX that is designed for flexibility.
    🔗 flax.readthedocs.io

  43. skorch-dev/skorch ⭐ 6,075
    A scikit-learn compatible neural network library that wraps PyTorch

  44. facebookresearch/mmf ⭐ 5,578
    A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
    🔗 mmf.sh

  45. mosaicml/composer ⭐ 5,386
    Supercharge Your Model Training
    🔗 docs.mosaicml.com

  46. nvidiagameworks/kaolin ⭐ 4,826
    A PyTorch Library for Accelerating 3D Deep Learning Research

  47. deci-ai/super-gradients ⭐ 4,818
    Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
    🔗 www.supergradients.com

  48. pytorch/ignite ⭐ 4,676
    High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
    🔗 pytorch-ignite.ai

  49. facebookincubator/AITemplate ⭐ 4,655
    AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

  50. cvg/LightGlue ⭐ 3,956
    LightGlue: Local Feature Matching at Light Speed (ICCV 2023)

  51. google-research/scenic ⭐ 3,592
    Scenic: A Jax Library for Computer Vision Research and Beyond

  52. williamyang1991/VToonify ⭐ 3,585
    [SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer

  53. facebookresearch/PyTorch-BigGraph ⭐ 3,411
    Generate embeddings from large-scale graph-structured data.
    🔗 torchbiggraph.readthedocs.io

  54. pytorch/botorch ⭐ 3,294
    Bayesian optimization in PyTorch
    🔗 botorch.org

  55. alpa-projects/alpa ⭐ 3,137
    Training and serving large-scale neural networks with auto parallelization.
    🔗 alpa.ai

  56. deepmind/dm-haiku ⭐ 3,058
    JAX-based neural network library
    🔗 dm-haiku.readthedocs.io

  57. modelscope/ClearerVoice-Studio ⭐ 3,049
    An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

  58. explosion/thinc ⭐ 2,859
    🔮 A refreshing functional take on deep learning, compatible with your favorite libraries
    🔗 thinc.ai

  59. nerdyrodent/VQGAN-CLIP ⭐ 2,656
    Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

  60. danielegrattarola/spektral ⭐ 2,381
    Graph Neural Networks with Keras and Tensorflow 2.
    🔗 graphneural.network

  61. google-research/electra ⭐ 2,356
    ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

  62. pytorch/torchrec ⭐ 2,266
    Pytorch domain library for recommendation systems
    🔗 pytorch.org/torchrec

  63. fepegar/torchio ⭐ 2,232
    Medical imaging processing for AI applications.
    🔗 torchio.org

  64. neuralmagic/sparseml ⭐ 2,145
    Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

  65. jeshraghian/snntorch ⭐ 1,634
    Deep and online learning with spiking neural networks in Python
    🔗 snntorch.readthedocs.io/en/latest

  66. calculatedcontent/WeightWatcher ⭐ 1,622
    The WeightWatcher tool for predicting the accuracy of Deep Neural Networks

  67. tensorly/tensorly ⭐ 1,614
    TensorLy: Tensor Learning in Python.
    🔗 tensorly.org

  68. tensorflow/mesh ⭐ 1,612
    Mesh TensorFlow: Model Parallelism Made Easier

  69. vt-vl-lab/FGVC ⭐ 1,555
    [ECCV 2020] Flow-edge Guided Video Completion

  70. hysts/pytorch_image_classification ⭐ 1,409
    PyTorch implementation of image classification models for CIFAR-10/CIFAR-100/MNIST/FashionMNIST/Kuzushiji-MNIST/ImageNet

  71. xl0/lovely-tensors ⭐ 1,265
    Tensors, for human consumption
    🔗 xl0.github.io/lovely-tensors

  72. deepmind/android_env ⭐ 1,109
    RL research on Android devices.

  73. keras-team/keras-cv ⭐ 1,042
    Industry-strength Computer Vision workflows with Keras

  74. tensorflow/similarity ⭐ 1,022
    TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

  75. kakaobrain/rq-vae-transformer ⭐ 912
    The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)

  76. deepmind/chex ⭐ 865
    Chex is a library of utilities for helping to write reliable JAX code
    🔗 chex.readthedocs.io

  77. mlfoundations/datacomp ⭐ 722
    DataComp: In search of the next generation of multimodal datasets
    🔗 datacomp.ai

  78. whitead/dmol-book ⭐ 660
    Deep learning for molecules and materials book
    🔗 dmol.pub

  79. allenai/reward-bench ⭐ 610
    RewardBench is a benchmark designed to evaluate the capabilities and safety of reward models (including those trained with Direct Preference Optimization, DPO)
    🔗 huggingface.co/spaces/allenai/reward-bench

Machine Learning - Interpretability

Machine learning interpretability libraries. Covers explainability, prediction explanations, dashboards, understanding knowledge development in training.

  1. slundberg/shap ⭐ 24,118
    A game theoretic approach to explain the output of any machine learning model.
    🔗 shap.readthedocs.io

  2. marcotcr/lime ⭐ 11,934
    Lime: Explaining the predictions of any machine learning classifier

  3. interpretml/interpret ⭐ 6,614
    Fit interpretable models. Explain blackbox machine learning.
    🔗 interpret.ml/docs

  4. arize-ai/phoenix ⭐ 6,277
    AI Observability & Evaluation
    🔗 arize.com/docs/phoenix

  5. pytorch/captum ⭐ 5,305
    Model interpretability and understanding for PyTorch
    🔗 captum.ai

  6. tensorflow/lucid ⭐ 4,699
    A collection of infrastructure and tools for research in neural network interpretability.

  7. pair-code/lit ⭐ 3,572
    The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
    🔗 pair-code.github.io/lit

  8. maif/shapash ⭐ 2,909
    🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
    🔗 maif.github.io/shapash

  9. teamhg-memex/eli5 ⭐ 2,772
    A library for debugging/inspecting machine learning classifiers and explaining their predictions
    🔗 eli5.readthedocs.io

  10. eleutherai/pythia ⭐ 2,561
    Interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers

  11. seldonio/alibi ⭐ 2,532
    Algorithms for explaining machine learning models
    🔗 docs.seldon.io/projects/alibi/en/stable

  12. oegedijk/explainerdashboard ⭐ 2,409
    Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.
    🔗 explainerdashboard.readthedocs.io

  13. transformerlensorg/TransformerLens ⭐ 2,335
    A library for mechanistic interpretability of GPT-style language models
    🔗 transformerlensorg.github.io/transformerlens

  14. jalammar/ecco ⭐ 2,041
    Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTA, T5, and T0).
    🔗 ecco.readthedocs.io

  15. google-deepmind/penzai ⭐ 1,801
    A JAX library for writing models as legible, functional pytree data structures, along with tools for visualizing, modifying, and analyzing them. Penzai focuses on making it easy to do stuff with models after they have been trained
    🔗 penzai.readthedocs.io

  16. trusted-ai/AIX360 ⭐ 1,710
    Interpretability and explainability of data and machine learning models
    🔗 aix360.res.ibm.com

  17. stanfordnlp/pyreft ⭐ 1,494
    Stanford NLP Python library for Representation Finetuning (ReFT)
    🔗 arxiv.org/abs/2404.03592

  18. cdpierse/transformers-interpret ⭐ 1,358
    Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.

  19. selfexplainml/PiML-Toolbox ⭐ 1,256
    PiML (Python Interpretable Machine Learning) toolbox for model development & diagnostics
    🔗 selfexplainml.github.io/piml-toolbox

  20. ethicalml/xai ⭐ 1,188
    XAI is a Machine Learning library that is designed with AI explainability at its core. XAI contains various tools that enable analysis and evaluation of data and models
    🔗 ethical.institute/principles.html#commitment-3

  21. salesforce/OmniXAI ⭐ 934
    OmniXAI: A Library for eXplainable AI

  22. jbloomaus/SAELens ⭐ 871
    Training Sparse Autoencoders on LLMs. Analyse sparse autoencoders and neural network internals.
    🔗 jbloomaus.github.io/saelens

  23. andyzoujm/representation-engineering ⭐ 848
    Representation Engineering: A Top-Down Approach to AI Transparency
    🔗 www.ai-transparency.org

  24. stanfordnlp/pyvene ⭐ 764
    Library for intervening on the internal states of PyTorch models. Interventions are an important operation in many areas of AI, including model editing, steering, robustness, and interpretability.
    🔗 pyvene.ai

  25. labmlai/inspectus ⭐ 672
    Inspectus provides visualization tools for attention mechanisms in deep learning models. It provides a set of comprehensive views, making it easier to understand how these models work.

  26. ndif-team/nnsight ⭐ 606
    The nnsight package enables interpreting and manipulating the internals of deep learned models.
    🔗 nnsight.net

  27. alignmentresearch/tuned-lens ⭐ 505
    Tools for understanding how transformer predictions are built layer-by-layer
    🔗 tuned-lens.readthedocs.io/en/latest

Machine Learning - Ops

MLOps tools, frameworks and libraries: intersection of machine learning, data engineering and DevOps; deployment, health, diagnostics and governance of ML models.

  1. apache/airflow ⭐ 40,962
    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
    🔗 airflow.apache.org

  2. ray-project/ray ⭐ 37,951
    Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
    🔗 ray.io

  3. mlflow/mlflow ⭐ 21,198
    Open source platform for the machine learning lifecycle
    🔗 mlflow.org

  4. prefecthq/prefect ⭐ 19,754
    Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
    🔗 prefect.io

  5. kestra-io/kestra ⭐ 19,741
    ⚡ Workflow Automation Platform. Orchestrate & Schedule code in any language, run anywhere, 600+ plugins. Alternative to Airflow, n8n, Rundeck, VMware vRA, Zapier ...
    🔗 kestra.io

  6. spotify/luigi ⭐ 18,371
    Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

  7. iterative/dvc ⭐ 14,644
    🦉 Data Versioning and ML Experiments
    🔗 dvc.org

  8. horovod/horovod ⭐ 14,541
    Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
    🔗 horovod.ai

  9. jlowin/fastmcp ⭐ 14,335
    FastMCP is the standard framework for building MCP servers and clients. FastMCP 1.0 was incorporated into the official MCP Python SDK.
    🔗 gofastmcp.com

  10. langfuse/langfuse ⭐ 13,602
    🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
    🔗 langfuse.com/docs

  11. dagster-io/dagster ⭐ 13,559
    An orchestration platform for the development, production, and observation of data assets.
    🔗 dagster.io

  12. bentoml/OpenLLM ⭐ 11,544
    Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.
    🔗 bentoml.com

  13. ludwig-ai/ludwig ⭐ 11,524
    Low-code framework for building custom LLMs, neural networks, and other AI models
    🔗 ludwig.ai

  14. dbt-labs/dbt-core ⭐ 11,077
    dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
    🔗 getdbt.com

  15. great-expectations/great_expectations ⭐ 10,552
    Always know what to expect from your data.
    🔗 docs.greatexpectations.io

  16. kedro-org/kedro ⭐ 10,423
    Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
    🔗 kedro.org

  17. huggingface/text-generation-inference ⭐ 10,311
    A Rust, Python and gRPC server for text generation inference. Used in production at HuggingFace to power Hugging Chat, the Inference API and Inference Endpoint.
    🔗 hf.co/docs/text-generation-inference

  18. netflix/metaflow ⭐ 8,955
    Build, Manage and Deploy AI/ML Systems
    🔗 metaflow.org

  19. activeloopai/deeplake ⭐ 8,712
    Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow.
    🔗 activeloop.ai

  20. mage-ai/mage-ai ⭐ 8,415
    🧙 Build, run, and manage data pipelines for integrating and transforming data.
    🔗 www.mage.ai

  21. bentoml/BentoML ⭐ 7,885
    The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
    🔗 bentoml.com

  22. internlm/lmdeploy ⭐ 6,692
    LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
    🔗 lmdeploy.readthedocs.io/en/latest

  23. evidentlyai/evidently ⭐ 6,385
    Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
    🔗 discord.gg/xzjkranp8b

  24. flyteorg/flyte ⭐ 6,351
    Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
    🔗 flyte.org

  25. feast-dev/feast ⭐ 6,209
    The Open Source Feature Store for AI/ML
    🔗 feast.dev

  26. allegroai/clearml ⭐ 6,090
    ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
    🔗 clear.ml/docs

  27. adap/flower ⭐ 6,064
    Flower: A Friendly Federated AI Framework
    🔗 flower.ai

  28. aimhubio/aim ⭐ 5,698
    Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
    🔗 aimstack.io

  29. zenml-io/zenml ⭐ 4,696
    ZenML 🙏: The bridge between ML and Ops.
    🔗 zenml.io

  30. internlm/xtuner ⭐ 4,641
    An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
    🔗 xtuner.readthedocs.io/zh-cn/latest

  31. orchest/orchest ⭐ 4,130
    Build data pipelines, the easy way 🛠️
    🔗 orchest.readthedocs.io/en/stable

  32. kubeflow/pipelines ⭐ 3,876
    Machine Learning Pipelines for Kubeflow
    🔗 www.kubeflow.org/docs/components/pipelines

  33. polyaxon/polyaxon ⭐ 3,652
    MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle
    🔗 polyaxon.com

  34. ploomber/ploomber ⭐ 3,594
    The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
    🔗 docs.ploomber.io

  35. towhee-io/towhee ⭐ 3,385
    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
    🔗 towhee.io

  36. determined-ai/determined ⭐ 3,156
    Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
    🔗 determined.ai

  37. leptonai/leptonai ⭐ 2,768
    A Pythonic framework to simplify AI service building
    🔗 lepton.ai

  38. azure/PyRIT ⭐ 2,654
    The Python Risk Identification Tool for generative AI (PyRIT) is an open access automation framework to empower security professionals and ML engineers to red team foundation models and their applications.
    🔗 azure.github.io/pyrit

  39. labmlai/labml ⭐ 2,188
    🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱
    🔗 labml.ai

  40. apache/hamilton ⭐ 2,182
    Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does.
    🔗 hamilton.apache.org

  41. meltano/meltano ⭐ 2,131
    Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
    🔗 meltano.com

  42. dstackai/dstack ⭐ 1,830
    dstack is an open-source container orchestrator that simplifies workload orchestration and drives GPU utilization for ML teams. It works with any GPU cloud, on-prem cluster, or accelerated hardware.
    🔗 dstack.ai/docs

  43. dagworks-inc/burr ⭐ 1,720
    Build applications that make decisions (chatbots, agents, simulations, etc...). Monitor, trace, persist, and execute on your own infrastructure.
    🔗 burr.apache.org

  44. hi-primus/optimus ⭐ 1,514
    🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
    🔗 hi-optimus.com

  45. vllm-project/production-stack ⭐ 1,477
    vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
    🔗 docs.vllm.ai/projects/production-stack

  46. kubeflow/examples ⭐ 1,439
    A repository to host extended examples and tutorials

  47. substratusai/kubeai ⭐ 1,016
    AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
    🔗 www.kubeai.org

Machine Learning - Reinforcement

Machine learning libraries and toolkits that cross over with reinforcement learning in some way: agent reinforcement learning, agent environments, RLHF

  1. openai/gym ⭐ 36,223
    A toolkit for developing and comparing reinforcement learning algorithms.
    🔗 www.gymlibrary.dev

  2. openai/baselines ⭐ 16,350
    OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

  3. google/dopamine ⭐ 10,762
    Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
    🔗 github.com/google/dopamine

  4. farama-foundation/Gymnasium ⭐ 9,611
    An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
    🔗 gymnasium.farama.org

  5. thu-ml/tianshou ⭐ 8,623
    An elegant PyTorch deep reinforcement learning library.
    🔗 tianshou.org

  6. deepmind/pysc2 ⭐ 8,143
    StarCraft II Learning Environment

  7. lucidrains/PaLM-rlhf-pytorch ⭐ 7,855
    Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

  8. tensorlayer/TensorLayer ⭐ 7,365
    Deep Learning and Reinforcement Learning Library for Scientists and Engineers
    🔗 tensorlayerx.com

  9. keras-rl/keras-rl ⭐ 5,555
    Deep Reinforcement Learning for Keras.
    🔗 keras-rl.readthedocs.io

  10. deepmind/dm_control ⭐ 4,149
    Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.

  11. ai4finance-foundation/ElegantRL ⭐ 4,087
    Massively Parallel Deep Reinforcement Learning. 🔥
    🔗 ai4finance.org

  12. deepmind/acme ⭐ 3,734
    A library of reinforcement learning components and agents

  13. facebookresearch/ReAgent ⭐ 3,638
    A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)
    🔗 reagent.ai

  14. opendilab/DI-engine ⭐ 3,479
    DI-engine is a generalized decision intelligence engine for PyTorch and JAX. It provides python-first and asynchronous-native task and middleware abstractions
    🔗 di-engine-docs.readthedocs.io

  15. pettingzoo-team/PettingZoo ⭐ 3,024
    An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
    🔗 pettingzoo.farama.org

  16. eureka-research/Eureka ⭐ 3,010
    Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024)
    🔗 eureka-research.github.io

  17. pytorch/rl ⭐ 2,906
    A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
    🔗 pytorch.org/rl

  18. kzl/decision-transformer ⭐ 2,604
    Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

  19. arise-initiative/robosuite ⭐ 1,799
    robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
    🔗 robosuite.ai

  20. anthropics/hh-rlhf ⭐ 1,762
    Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
    🔗 arxiv.org/abs/2204.05862

  21. humancompatibleai/imitation ⭐ 1,530
    Clean PyTorch implementations of imitation and reward learning algorithms
    🔗 imitation.readthedocs.io

  22. denys88/rl_games ⭐ 1,145
    RL Games: High performance RL library

  23. google-deepmind/meltingpot ⭐ 719
    A suite of test scenarios for multi-agent reinforcement learning.

Natural Language Processing

Natural language processing libraries and toolkits: text processing, topic modelling, tokenisers, chatbots. Also see the LLMs and ChatGPT category for crossover.

  1. huggingface/transformers ⭐ 146,785
    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
    🔗 huggingface.co/transformers

  2. myshell-ai/OpenVoice ⭐ 32,931
    Instant voice cloning by MIT and MyShell. Audio foundation model.
    🔗 research.myshell.ai/open-voice

  3. explosion/spaCy ⭐ 31,939
    💫 Industrial-strength Natural Language Processing (NLP) in Python
    🔗 spacy.io

  4. pytorch/fairseq ⭐ 31,613
    Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

  5. vikparuchuri/marker ⭐ 26,453
    Marker converts PDF, EPUB, and MOBI to markdown. It's 10x faster than nougat, more accurate on most documents, and has low hallucination risk.
    🔗 www.datalab.to

  6. microsoft/unilm ⭐ 21,514
    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
    🔗 aka.ms/generalai

  7. huggingface/datasets ⭐ 20,360
    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
    🔗 huggingface.co/docs/datasets

  8. vikparuchuri/surya ⭐ 17,776
    OCR, layout analysis, reading order, table recognition in 90+ languages
    🔗 www.datalab.to

  9. ukplab/sentence-transformers ⭐ 17,118
    State-of-the-Art Text Embeddings
    🔗 www.sbert.net

  10. m-bain/whisperX ⭐ 16,694
    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

  11. rare-technologies/gensim ⭐ 16,094
    Topic Modelling for Humans
    🔗 radimrehurek.com/gensim

  12. openai/tiktoken ⭐ 15,079
    tiktoken is a fast BPE tokeniser for use with OpenAI's models.

  13. nvidia/NeMo ⭐ 15,061
    A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
    🔗 docs.nvidia.com/nemo-framework/user-guide/latest/overview.html

  14. gunthercox/ChatterBot ⭐ 14,364
    ChatterBot is a machine learning, conversational dialog engine for creating chat bots
    🔗 docs.chatterbot.us

  15. flairnlp/flair ⭐ 14,219
    A very simple framework for state-of-the-art Natural Language Processing (NLP)
    🔗 flairnlp.github.io/flair

  16. nltk/nltk ⭐ 14,167
    NLTK Source
    🔗 www.nltk.org

  17. jina-ai/clip-as-service ⭐ 12,698
    🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
    🔗 clip-as-service.jina.ai

  18. allenai/allennlp ⭐ 11,860
    An open-source NLP research library, built on PyTorch.
    🔗 www.allennlp.org

  19. facebookresearch/seamless_communication ⭐ 11,589
    Foundational Models for State-of-the-Art Speech and Text Translation

  20. neuml/txtai ⭐ 11,197
    💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
    🔗 neuml.github.io/txtai

  21. google/sentencepiece ⭐ 11,070
    Unsupervised text tokenizer for Neural Network-based text generation.

  22. facebookresearch/ParlAI ⭐ 10,598
    A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
    🔗 parl.ai

  23. doccano/doccano ⭐ 10,134
    Open source annotation tool for machine learning practitioners.

  24. speechbrain/speechbrain ⭐ 10,119
    A PyTorch-based Speech Toolkit
    🔗 speechbrain.github.io

  25. facebookresearch/nougat ⭐ 9,525
    Implementation of Nougat Neural Optical Understanding for Academic Documents
    🔗 facebookresearch.github.io/nougat

  26. sloria/TextBlob ⭐ 9,387
    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
    🔗 textblob.readthedocs.io

  27. espnet/espnet ⭐ 9,279
    End-to-End Speech Processing Toolkit
    🔗 espnet.github.io/espnet

  28. togethercomputer/OpenChatKit ⭐ 9,010
    OpenChatKit provides a powerful, open-source base to create both specialized and general purpose chatbots

  29. clips/pattern ⭐ 8,822
    Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
    🔗 github.com/clips/pattern/wiki

  30. deeppavlov/DeepPavlov ⭐ 6,906
    An open source library for deep learning end-to-end dialog systems and chatbots.
    🔗 deeppavlov.ai

  31. maartengr/BERTopic ⭐ 6,888
    Leveraging BERT and c-TF-IDF to create easily interpretable topics.
    🔗 maartengr.github.io/bertopic

  32. quivrhq/MegaParse ⭐ 6,560
    File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
    🔗 megaparse.com

  33. facebookresearch/metaseq ⭐ 6,527
    A codebase for working with Open Pre-trained Transformers, originally forked from fairseq.

  34. kingoflolz/mesh-transformer-jax ⭐ 6,342
    Model parallel transformers in JAX and Haiku

  35. aiwaves-cn/agents ⭐ 5,648
    An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents

  36. layout-parser/layout-parser ⭐ 5,362
    A Unified Toolkit for Deep Learning Based Document Image Analysis
    🔗 layout-parser.github.io

  37. salesforce/CodeGen ⭐ 5,106
    CodeGen is a family of open-source models for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.

  38. minimaxir/textgenrnn ⭐ 4,937
    Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

  39. makcedward/nlpaug ⭐ 4,586
    Data augmentation for NLP
    🔗 makcedward.github.io

  40. argilla-io/argilla ⭐ 4,571
    Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
    🔗 docs.argilla.io

  41. facebookresearch/DrQA ⭐ 4,490
    Reading Wikipedia to Answer Open-Domain Questions

  42. thilinarajapakse/simpletransformers ⭐ 4,196
    Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
    🔗 simpletransformers.ai

  43. promptslab/Promptify ⭐ 3,962
    Prompt Engineering | Prompt Versioning | Use GPT or other prompt based models to get structured output. Join our discord for Prompt-Engineering, LLMs and other latest research
    🔗 discord.gg/m88xfymbk6

  44. maartengr/KeyBERT ⭐ 3,933
    A minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.
    🔗 maartengr.github.io/keybert

  45. life4/textdistance ⭐ 3,476
    📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

  46. jsvine/markovify ⭐ 3,351
    A simple, extensible Markov chain generator.

  47. bytedance/lightseq ⭐ 3,282
    LightSeq: A High Performance Library for Sequence Processing and Generation

  48. errbotio/errbot ⭐ 3,200
    Errbot is a chatbot, a daemon that connects to your favorite chat service and brings your tools and some fun into the conversation.
    🔗 errbot.io

  49. neuralmagic/deepsparse ⭐ 3,158
    Sparsity-aware deep learning inference runtime for CPUs
    🔗 neuralmagic.com/deepsparse

  50. huawei-noah/Pretrained-Language-Model ⭐ 3,115
    Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

  51. ddangelov/Top2Vec ⭐ 3,062
    Top2Vec learns jointly embedded topic, document and word vectors.

  52. salesforce/CodeT5 ⭐ 3,023
    Home of CodeT5: Open Code LLMs for Code Understanding and Generation
    🔗 arxiv.org/abs/2305.07922

  53. jbesomi/texthero ⭐ 2,904
    Text preprocessing, representation and visualization from zero to hero.
    🔗 texthero.org

  54. bigscience-workshop/promptsource ⭐ 2,896
    Toolkit for creating, sharing and using natural language prompts.

  55. huggingface/neuralcoref ⭐ 2,882
    ✨Fast Coreference Resolution in spaCy with Neural Networks
    🔗 huggingface.co/coref

  56. nvidia/nv-ingest ⭐ 2,704
    NVIDIA-Ingest is a scalable, performance-oriented document content and metadata extraction microservice.

  57. huggingface/setfit ⭐ 2,520
    SetFit is an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers.
    🔗 hf.co/docs/setfit

  58. alibaba/EasyNLP ⭐ 2,154
    EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit

  59. jamesturk/jellyfish ⭐ 2,144
    🪼 a python library for doing approximate and phonetic matching of strings.
    🔗 jamesturk.github.io/jellyfish

  60. urchade/GLiNER ⭐ 2,141
    Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
    🔗 arxiv.org/abs/2311.08526

  61. thudm/P-tuning-v2 ⭐ 2,046
    An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks

  62. featureform/featureform ⭐ 1,924
    The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
    🔗 www.featureform.com

  63. marella/ctransformers ⭐ 1,866
    Python bindings for the Transformer models implemented in C/C++ using GGML library.

  64. explosion/spacy-models ⭐ 1,763
    💫 Models for the spaCy Natural Language Processing (NLP) library
    🔗 spacy.io

  65. deepset-ai/FARM ⭐ 1,755
    🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
    🔗 farm.deepset.ai

  66. nomic-ai/nomic ⭐ 1,745
    Interact, analyze and structure massive text, image, embedding, audio and video datasets
    🔗 atlas.nomic.ai

  67. chonkie-inc/chonkie ⭐ 1,743
    🦛 CHONK your texts with Chonkie ✨ — The no-nonsense RAG chunking library
    🔗 docs.chonkie.ai

  68. franck-dernoncourt/NeuroNER ⭐ 1,712
    Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.
    🔗 neuroner.com

  69. google-research/language ⭐ 1,685
    Shared repository for open-sourced projects from the Google AI Language team.
    🔗 ai.google/research/teams/language

  70. plasticityai/magnitude ⭐ 1,649
    A fast, efficient universal vector embedding utility package.

  71. arxiv-vanity/arxiv-vanity ⭐ 1,625
    Renders papers from arXiv as responsive web pages so you don't have to squint at a PDF.
    🔗 www.arxiv-vanity.com

  72. chrismattmann/tika-python ⭐ 1,596
    Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

  73. intellabs/fastRAG ⭐ 1,588
    Efficient Retrieval Augmentation and Generation Framework

  74. answerdotai/ModernBERT ⭐ 1,435
    Bringing BERT into modernity via both architecture changes and scaling
    🔗 arxiv.org/abs/2412.13663

  75. dmmiller612/bert-extractive-summarizer ⭐ 1,434
    Easy to use extractive text summarization with BERT

  76. pemistahl/lingua-py ⭐ 1,419
    The most accurate natural language detection library for Python, suitable for short text and mixed-language text

  77. gunthercox/chatterbot-corpus ⭐ 1,400
    A multilingual dialog corpus
    🔗 corpus.chatterbot.us

  78. jonasgeiping/cramming ⭐ 1,338
    Cramming the training of a (BERT-type) language model into limited compute.

  79. openai/grade-school-math ⭐ 1,292
    GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems

  80. unitaryai/detoxify ⭐ 1,077
    Toxic Comment Classification with Pytorch Lightning and Transformers
    🔗 www.unitary.ai

  81. abertsch72/unlimiformer ⭐ 1,062
    Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"

  82. norskregnesentral/skweak ⭐ 926
    skweak: A software toolkit for weak supervision applied to NLP tasks

  83. keras-team/keras-hub ⭐ 913
    Pretrained model hub for Keras 3.
    🔗 keras.io/keras_hub

  84. explosion/spacy-streamlit ⭐ 840
    👑 spaCy building blocks and visualizers for Streamlit apps
    🔗 share.streamlit.io/ines/spacy-streamlit-demo/master/app.py

  85. paddlepaddle/RocketQA ⭐ 778
    🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

  86. maartengr/PolyFuzz ⭐ 769
    Performs fuzzy string matching, string grouping, and contains extensive evaluation functions. PolyFuzz is meant to bring fuzzy string matching techniques together within a single framework.
    🔗 maartengr.github.io/polyfuzz

  87. webis-de/small-text ⭐ 618
    Small-Text provides state-of-the-art Active Learning for Text Classification. Several pre-implemented Query Strategies, Initialization Strategies, and Stopping Criteria are provided, which can be easily mixed and matched to build active learning experiments or applications.
    🔗 small-text.readthedocs.io

  88. babelscape/rebel ⭐ 533
    REBEL is a seq2seq model that simplifies Relation Extraction (EMNLP 2021).

Packaging

Python packaging, dependency management and bundling.

  1. astral-sh/uv ⭐ 61,043
    An extremely fast Python package installer and resolver, written in Rust. Designed as a drop-in replacement for pip and pip-compile.
    🔗 docs.astral.sh/uv

  2. pyenv/pyenv ⭐ 42,573
    pyenv lets you easily switch between multiple versions of Python.

  3. python-poetry/poetry ⭐ 33,413
    Python packaging and dependency management made easy
    🔗 python-poetry.org

  4. pypa/pipenv ⭐ 25,076
    A virtualenv management tool that supports a multitude of systems and nicely bridges the gaps between pip, python and virtualenv.
    🔗 pipenv.pypa.io

  5. mitsuhiko/rye ⭐ 14,266
    a Hassle-Free Python Experience
    🔗 rye.astral.sh

  6. pyinstaller/pyinstaller ⭐ 12,492
    Freeze (package) Python programs into stand-alone executables
    🔗 www.pyinstaller.org

  7. pypa/pipx ⭐ 11,790
    Install and Run Python Applications in Isolated Environments
    🔗 pipx.pypa.io

  8. pdm-project/pdm ⭐ 8,418
    A modern Python package and dependency manager supporting the latest PEP standards
    🔗 pdm-project.org

  9. conda-forge/miniforge ⭐ 7,957
    A conda-forge distribution.
    🔗 conda-forge.org/download

  10. jazzband/pip-tools ⭐ 7,923
    A set of tools to keep your pinned Python dependencies fresh (pip-compile + pip-sync)
    🔗 pip-tools.rtfd.io

  11. mamba-org/mamba ⭐ 7,481
    The Fast Cross-Platform Package Manager: mamba is a reimplementation of the conda package manager in C++
    🔗 mamba.readthedocs.io

  12. conda/conda ⭐ 6,985
    A system-level, binary package and environment manager running on all major operating systems and platforms.
    🔗 docs.conda.io/projects/conda

  13. pypa/hatch ⭐ 6,693
    Modern, extensible Python project management
    🔗 hatch.pypa.io/latest

  14. indygreg/PyOxidizer ⭐ 5,883
    A modern Python application packaging and distribution tool

  15. pypa/virtualenv ⭐ 4,935
    A tool to create isolated Python environments. Since Python 3.3, a subset of it has been integrated into the standard lib venv module.
    🔗 virtualenv.pypa.io

  16. prefix-dev/pixi ⭐ 4,782
    pixi is a cross-platform, multi-language package manager and workflow tool built on the foundation of the conda ecosystem.
    🔗 pixi.sh

  17. spack/spack ⭐ 4,721
    A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
    🔗 spack.io

  18. pantsbuild/pex ⭐ 4,003
    A tool for generating .pex (Python EXecutable) files, lock files and venvs.
    🔗 docs.pex-tool.org

  19. beeware/briefcase ⭐ 2,978
    Tools to support converting a Python project into a standalone native application.
    🔗 briefcase.readthedocs.io

  20. pypa/flit ⭐ 2,209
    Simplified packaging of Python modules
    🔗 flit.pypa.io

  21. linkedin/shiv ⭐ 1,853
    shiv is a command line utility for building fully self-contained Python zipapps as outlined in PEP 441, but with all their dependencies included.

  22. marcelotduarte/cx_Freeze ⭐ 1,473
    Creates standalone executables from Python scripts with the same performance as the original script. It is cross-platform and should work on any platform that Python runs on.
    🔗 marcelotduarte.github.io/cx_freeze

  23. ofek/pyapp ⭐ 1,437
    Runtime installer for Python applications
    🔗 ofek.dev/pyapp

  24. pypa/gh-action-pypi-publish ⭐ 1,051
    The blessed GitHub Action for publishing your 📦 distribution files to PyPI, the tokenless way: https://github.com/marketplace/actions/pypi-publish
    🔗 packaging.python.org/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows

  25. py2exe/py2exe ⭐ 948
    Create standalone Windows programs from Python code
    🔗 www.py2exe.org

  26. prefix-dev/rip ⭐ 666
    RIP is a library that allows the resolving and installing of Python PyPI packages from Rust into a virtual environment. It's based on our experience with building Rattler and aims to provide the same experience but for PyPI instead of Conda.
    🔗 prefix.dev

  27. snok/install-poetry ⭐ 622
    GitHub Action for installing and configuring Poetry

  28. python-poetry/install.python-poetry.org ⭐ 230
    The official Poetry installation script
    🔗 install.python-poetry.org

Pandas

Pandas and dataframe libraries: data analysis, statistical reporting, pandas GUIs, pandas performance optimisations.

  1. pandas-dev/pandas ⭐ 45,938
    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
    🔗 pandas.pydata.org

  2. pola-rs/polars ⭐ 34,368
    Dataframes powered by a multithreaded, vectorized query engine, written in Rust
    🔗 docs.pola.rs

  3. duckdb/duckdb ⭐ 30,886
    DuckDB is an analytical in-process SQL database management system
    🔗 www.duckdb.org

  4. gventuri/pandas-ai ⭐ 20,885
    Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
    🔗 pandas-ai.com

  5. kanaries/pygwalker ⭐ 15,009
    PyGWalker: Turn your dataframe into an interactive UI for visual analysis
    🔗 kanaries.net/pygwalker

  6. ydataai/ydata-profiling ⭐ 13,014
    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
    🔗 docs.sdk.ydata.ai

  7. rapidsai/cudf ⭐ 9,037
    cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data
    🔗 docs.rapids.ai/api/cudf/stable

  8. deepseek-ai/smallpond ⭐ 4,732
    A lightweight data processing framework built on DuckDB and 3FS.

  9. aws/aws-sdk-pandas ⭐ 4,038
    pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
    🔗 aws-sdk-pandas.readthedocs.io

  10. unionai-oss/pandera ⭐ 3,906
    A light-weight, flexible, and expressive statistical data testing library
    🔗 www.union.ai/pandera

  11. nalepae/pandarallel ⭐ 3,775
    A simple and efficient tool to parallelize Pandas operations on all available CPUs
    🔗 nalepae.github.io/pandarallel

  12. adamerose/PandasGUI ⭐ 3,236
    A GUI for Pandas DataFrames

  13. blaze/blaze ⭐ 3,201
    NumPy and Pandas interface to Big Data
    🔗 blaze.pydata.org

  14. eventual-inc/Daft ⭐ 3,090
    Distributed query engine providing simple and reliable data processing for any modality and scale
    🔗 daft.ai

  15. pydata/pandas-datareader ⭐ 3,065
    Extract data from a wide range of Internet sources into a pandas DataFrame.
    🔗 pydata.github.io/pandas-datareader/stable/index.html

  16. delta-io/delta-rs ⭐ 2,856
    A native Rust library for Delta Lake, with bindings into Python
    🔗 delta-io.github.io/delta-rs

  17. scikit-learn-contrib/sklearn-pandas ⭐ 2,837
    Pandas integration with sklearn

  18. jmcarpenter2/swifter ⭐ 2,619
    A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner

  19. fugue-project/fugue ⭐ 2,093
    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
    🔗 fugue-tutorials.readthedocs.io

  20. pyjanitor-devs/pyjanitor ⭐ 1,431
    Clean APIs for data cleaning. Python implementation of R package Janitor
    🔗 pyjanitor-devs.github.io/pyjanitor

  21. holoviz/hvplot ⭐ 1,210
    A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
    🔗 hvplot.holoviz.org

  22. renumics/spotlight ⭐ 1,185
    Interactively explore unstructured datasets from your dataframe.
    🔗 renumics.com

  23. machow/siuba ⭐ 1,172
    Python library for using dplyr like syntax with pandas and SQL
    🔗 siuba.org

  24. tkrabel/bamboolib ⭐ 949
    bamboolib - a GUI for pandas DataFrames
    🔗 bamboolib.com

  25. mwouts/itables ⭐ 882
    This package changes how Pandas and Polars DataFrames are rendered in Jupyter Notebooks. With itables you can display your tables as interactive DataTables that you can sort, paginate, scroll or filter.
    🔗 mwouts.github.io/itables

Performance

Performance, parallelisation and low level libraries.

  1. celery/celery ⭐ 26,785
    Distributed Task Queue (development branch)
    🔗 docs.celeryq.dev

  2. google/flatbuffers ⭐ 24,439
    FlatBuffers: Memory Efficient Serialization Library
    🔗 flatbuffers.dev

  3. pybind/pybind11 ⭐ 16,920
    Seamless operability between C++11 and Python
    🔗 pybind11.readthedocs.io

  4. exaloop/codon ⭐ 15,778
    A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support
    🔗 docs.exaloop.io/codon

  5. dask/dask ⭐ 13,330
    Parallel computing with task scheduling
    🔗 dask.org

  6. numba/numba ⭐ 10,512
    NumPy aware dynamic Python compiler using LLVM
    🔗 numba.pydata.org

  7. modin-project/modin ⭐ 10,222
    Modin: Scale your Pandas workflows by changing a single line of code
    🔗 modin.readthedocs.io

  8. vaexio/vaex ⭐ 8,406
    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
    🔗 vaex.io

  9. nebuly-ai/optimate ⭐ 8,372
    A collection of libraries to optimise AI model performances
    🔗 www.nebuly.com

  10. mher/flower ⭐ 6,807
    Real-time monitor and web admin for Celery distributed task queue
    🔗 flower.readthedocs.io

  11. python-trio/trio ⭐ 6,600
    Trio – a friendly Python library for async concurrency and I/O
    🔗 trio.readthedocs.io

  12. ultrajson/ultrajson ⭐ 4,429
    Ultra fast JSON decoder and encoder written in C with Python bindings
    🔗 pypi.org/project/ujson

  13. airtai/faststream ⭐ 4,158
    FastStream is a powerful and easy-to-use Python framework for building asynchronous services interacting with event streams such as Apache Kafka, RabbitMQ, NATS and Redis.
    🔗 faststream.ag2.ai/latest

  14. tlkh/asitop ⭐ 4,130
    Perf monitoring CLI tool for Apple Silicon
    🔗 tlkh.github.io/asitop

  15. facebookincubator/cinder ⭐ 3,650
    Cinder is Meta's internal performance-oriented production version of CPython.
    🔗 trycinder.com

  16. ipython/ipyparallel ⭐ 2,625
    IPython Parallel: Interactive Parallel Computing in Python
    🔗 ipyparallel.readthedocs.io

  17. intel/intel-extension-for-transformers ⭐ 2,169
    ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

  18. h5py/h5py ⭐ 2,149
    HDF5 for Python -- The h5py package is a Pythonic interface to the HDF5 binary data format.
    🔗 www.h5py.org

  19. agronholm/anyio ⭐ 2,109
    High level asynchronous concurrency and networking framework that works on top of either trio or asyncio

  20. tiangolo/asyncer ⭐ 2,000
    Asyncer, async and await, focused on developer experience.
    🔗 asyncer.tiangolo.com

  21. intel/intel-extension-for-pytorch ⭐ 1,902
    A Python package that extends the official PyTorch to improve performance on Intel platforms

  22. faster-cpython/ideas ⭐ 1,720
    Discussion and work tracker for Faster CPython project.

  23. dask/distributed ⭐ 1,637
    A distributed task scheduler for Dask
    🔗 distributed.dask.org

  24. nschloe/perfplot ⭐ 1,376
    📈 Performance analysis for Python snippets

  25. intel/scikit-learn-intelex ⭐ 1,298
    Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
    🔗 uxlfoundation.github.io/scikit-learn-intelex

  26. markshannon/faster-cpython ⭐ 955
    How to make CPython faster.

  27. zerointensity/pointers.py ⭐ 931
    Bringing the hell of pointers to Python.
    🔗 pointers.zintensity.dev

  28. brandtbucher/specialist ⭐ 660
    Visualize CPython's specializing, adaptive interpreter. 🔥

Profiling

Memory and CPU/GPU profiling tools and libraries.

  1. bloomberg/memray ⭐ 14,127
    Memray is a memory profiler for Python
    🔗 bloomberg.github.io/memray

  2. benfred/py-spy ⭐ 13,923
    Sampling profiler for Python programs

  3. plasma-umass/scalene ⭐ 12,779
    Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

  4. joerick/pyinstrument ⭐ 7,199
    🚴 Call stack profiler for Python. Shows you why your code is slow!
    🔗 pyinstrument.readthedocs.io

  5. gaogaotiantian/viztracer ⭐ 6,797
    A debugging and profiling tool that can trace and visualize Python code execution
    🔗 viztracer.readthedocs.io

  6. pythonprofilers/memory_profiler ⭐ 4,497
    Monitor Memory usage of Python code
    🔗 pypi.python.org/pypi/memory_profiler

  7. pyutils/line_profiler ⭐ 3,030
    Line-by-line profiling for Python

  8. reloadware/reloadium ⭐ 2,974
    Hot Reloading and Profiling for Python

  9. jiffyclub/snakeviz ⭐ 2,457
    An in-browser Python profile viewer
    🔗 jiffyclub.github.io/snakeviz

  10. p403n1x87/austin ⭐ 2,081
    Python frame stack sampler for CPython
    🔗 pypi.org/project/austin-dist

  11. pythonspeed/filprofiler ⭐ 877
    A Python memory profiler for data processing and scientific computing applications
    🔗 pythonspeed.com/products/filmemoryprofiler

Security

Security related libraries: vulnerability discovery, SQL injection, environment auditing.

  1. swisskyrepo/PayloadsAllTheThings ⭐ 68,239
    A list of useful payloads and bypasses for Web Application Security and Pentest/CTF
    🔗 swisskyrepo.github.io/payloadsallthethings

  2. sqlmapproject/sqlmap ⭐ 34,721
    Automatic SQL injection and database takeover tool
    🔗 sqlmap.org

  3. certbot/certbot ⭐ 32,308
    Certbot is EFF's tool to obtain certs from Let's Encrypt and (optionally) auto-enable HTTPS on your server. It can also act as a client for any other CA that uses the ACME protocol.

  4. aquasecurity/trivy ⭐ 27,459
    Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more
    🔗 trivy.dev

  5. bridgecrewio/checkov ⭐ 7,698
    Checkov is a static code analysis tool for infrastructure as code (IaC) and also a software composition analysis (SCA) tool for images and open source packages.
    🔗 www.checkov.io

  6. nccgroup/ScoutSuite ⭐ 7,228
    Multi-Cloud Security Auditing Tool

  7. pycqa/bandit ⭐ 7,154
    Bandit is a tool designed to find common security issues in Python code.
    🔗 bandit.readthedocs.io

  8. stamparm/maltrail ⭐ 7,072
    Malicious traffic detection system

  9. microsoft/presidio ⭐ 4,994
    Context aware, pluggable and customizable PII de-identification service for text and images
    🔗 microsoft.github.io/presidio

  10. rhinosecuritylabs/pacu ⭐ 4,779
    The AWS exploitation framework, designed for testing the security of Amazon Web Services environments.
    🔗 rhinosecuritylabs.com/aws/pacu-open-source-aws-exploitation-framework

  11. dashingsoft/pyarmor ⭐ 4,423
    A tool used to obfuscate Python scripts, bind obfuscated scripts to a fixed machine, or expire obfuscated scripts.
    🔗 pyarmor.dashingsoft.com

  12. mozilla/bleach ⭐ 2,707
    Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
    🔗 bleach.readthedocs.io/en/latest

  13. pyupio/safety ⭐ 1,859
    Safety checks Python dependencies for known security vulnerabilities and suggests the proper remediations for vulnerabilities detected.
    🔗 safetycli.com/product/safety-cli

  14. trailofbits/pip-audit ⭐ 1,069
    Audits Python environments, requirements files and dependency trees for known security vulnerabilities, and can automatically fix them
    🔗 pypi.org/project/pip-audit

  15. fadi002/de4py ⭐ 895
    Toolkit for Python reverse engineering
    🔗 de4py.rf.gd

  16. thecyb3ralpha/BobTheSmuggler ⭐ 547
    A tool that leverages HTML Smuggling Attack and allows you to create HTML files with embedded 7z/zip archives.

Simulation

Simulation libraries: robotics, economic, agent-based, traffic, physics, astronomy, chemistry, quantum simulation. Also see the Maths and Science category for crossover.

  1. genesis-embodied-ai/Genesis ⭐ 25,783
    Genesis is a physics platform, and generative data engine, designed for general purpose Robotics/Embodied AI/Physical AI applications
    🔗 genesis-world.readthedocs.io

  2. atsushisakai/PythonRobotics ⭐ 25,423
    Python sample codes and textbook for robotics algorithms.
    🔗 atsushisakai.github.io/pythonrobotics

  3. bulletphysics/bullet3 ⭐ 13,533
    Bullet Physics SDK: real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning etc.
    🔗 bulletphysics.org

  4. isl-org/Open3D ⭐ 12,529
    Open3D: A Modern Library for 3D Data Processing
    🔗 www.open3d.org

  5. dlr-rm/stable-baselines3 ⭐ 11,072
    Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch
    🔗 stable-baselines3.readthedocs.io

  6. nvidia/Cosmos ⭐ 8,041
    NVIDIA Cosmos is a developer-first world foundation model platform designed to help Physical AI developers build their Physical AI systems better and faster.
    🔗 github.com/nvidia-cosmos

  7. qiskit/qiskit ⭐ 6,255
    Qiskit is an open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives.
    🔗 www.ibm.com/quantum/qiskit

  8. nvidia/warp ⭐ 5,285
    A Python framework for accelerated simulation, data generation and spatial computing.
    🔗 nvidia.github.io/warp

  9. astropy/astropy ⭐ 4,744
    Astronomy and astrophysics core library
    🔗 www.astropy.org

  10. quantumlib/Cirq ⭐ 4,645
    An open-source Python framework for creating, editing, and invoking Noisy Intermediate-Scale Quantum (NISQ) circuits.
    🔗 quantumai.google/cirq

  11. nvidia-omniverse/IsaacLab ⭐ 4,331
    Unified framework for robot learning built on NVIDIA Isaac Sim
    🔗 isaac-sim.github.io/isaaclab

  12. chakazul/Lenia ⭐ 3,660
    Lenia is a 2D cellular automaton with continuous space, time and states. It produces a huge variety of interesting mathematical life forms
    🔗 chakazul.github.io/lenia/javascript/lenia.html

  13. projectmesa/mesa ⭐ 3,028
    Mesa is an open-source Python library for agent-based modeling, ideal for simulating complex systems and exploring emergent behaviors.
    🔗 mesa.readthedocs.io

  14. openai/mujoco-py ⭐ 3,024
    MuJoCo is a physics engine for detailed, efficient rigid body simulations with contacts. mujoco-py allows using MuJoCo from Python 3.

  15. rdkit/rdkit ⭐ 2,996
    The official sources for the RDKit library

  16. google/brax ⭐ 2,758
    Massively parallel rigidbody physics simulation on accelerator hardware.

  17. pennylaneai/pennylane ⭐ 2,726
    PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Built by researchers, for research.
    🔗 pennylane.ai

  18. taichi-dev/difftaichi ⭐ 2,627
    10 differentiable physical simulators built with Taichi differentiable programming (DiffTaichi, ICLR 2020)

  19. nvidia-omniverse/IsaacGymEnvs ⭐ 2,529
    Example RL environments for the NVIDIA Isaac Gym high performance environments

  20. dlr-rm/rl-baselines3-zoo ⭐ 2,480
    A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
    🔗 rl-baselines3-zoo.readthedocs.io

  21. facebookresearch/habitat-lab ⭐ 2,452
    A modular high-level library to train embodied AI agents across a variety of tasks and environments.
    🔗 aihabitat.org

  22. quantecon/QuantEcon.py ⭐ 2,152
    A community based Python library for quantitative economics
    🔗 quantecon.org/quantecon-py

  23. microsoft/PromptCraft-Robotics ⭐ 2,029
    Community for applying LLMs to robotics and a robot simulator with ChatGPT integration
    🔗 aka.ms/chatgpt-robotics

  24. eloialonso/diamond ⭐ 1,833
    DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model
    🔗 diamond-wm.github.io

  25. deepmodeling/deepmd-kit ⭐ 1,703
    A deep learning package for many-body potential energy representation and molecular dynamics
    🔗 docs.deepmodeling.com/projects/deepmd

  26. bowang-lab/scGPT ⭐ 1,267
    scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI
    🔗 scgpt.readthedocs.io/en/latest

  27. sail-sg/envpool ⭐ 1,168
    C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
    🔗 envpool.readthedocs.io

  28. a-r-j/graphein ⭐ 1,111
    Protein Graph Library
    🔗 graphein.ai

  29. altera-al/project-sid ⭐ 1,066
    This repository contains our technical report: "Project Sid: Many-agent simulations toward AI civilization"

  30. google-deepmind/materials_discovery ⭐ 1,015
    Graph Networks for Materials Science (GNoME) is a project centered around scaling machine learning methods to tackle materials science.

  31. viblo/pymunk ⭐ 994
    Pymunk is an easy-to-use pythonic 2d physics library that can be used whenever you need 2d rigid body physics from Python
    🔗 www.pymunk.org

  32. nvidia-omniverse/OmniIsaacGymEnvs ⭐ 979
    Reinforcement Learning Environments for Omniverse Isaac Gym

  33. polymathicai/the_well ⭐ 926
    15TB of Physics Simulations: collection of machine learning datasets containing numerical simulations of a wide variety of spatiotemporal physical systems.
    🔗 polymathic-ai.org/the_well

  34. google/evojax ⭐ 901
    EvoJAX is a scalable, general purpose, hardware-accelerated neuroevolution toolkit built on the JAX library

  35. eureka-research/DrEureka ⭐ 899
    Official Repository for "DrEureka: Language Model Guided Sim-To-Real Transfer" (RSS 2024)
    🔗 eureka-research.github.io/dr-eureka

  36. facebookresearch/fairo ⭐ 887
    A modular embodied agent architecture and platform for building embodied agents

  37. ur-whitelab/chemcrow-public ⭐ 781
    Chemcrow

  38. ur-whitelab/chemcrow-runs ⭐ 88
    ur-whitelab/chemcrow-runs

Study

Miscellaneous study resources: algorithms, general resources, system design, code repos for textbooks, best practices, tutorials.

  1. thealgorithms/Python ⭐ 202,741
    All Algorithms implemented in Python
    🔗 thealgorithms.github.io/python

  2. microsoft/generative-ai-for-beginners ⭐ 91,678
    Learn the fundamentals of building Generative AI applications with our 21-lesson comprehensive course by Microsoft Cloud Advocates.
    🔗 microsoft.github.io/generative-ai-for-beginners

  3. labmlai/annotated_deep_learning_paper_implementations ⭐ 61,780
    🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
    🔗 nn.labml.ai

  4. rasbt/LLMs-from-scratch ⭐ 58,705
    Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
    🔗 amzn.to/4fqvn0d

  5. mlabonne/llm-course ⭐ 57,201
    Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
    🔗 mlabonne.github.io/blog

  6. jakevdp/PythonDataScienceHandbook ⭐ 44,946
    Python Data Science Handbook: full text in Jupyter Notebooks
    🔗 jakevdp.github.io/pythondatasciencehandbook

  7. realpython/python-guide ⭐ 29,035
    Python best practices guidebook, written for humans.
    🔗 docs.python-guide.org

  8. d2l-ai/d2l-en ⭐ 26,260
    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
    🔗 d2l.ai

  9. christoschristofidis/awesome-deep-learning ⭐ 25,643
    A curated list of awesome Deep Learning tutorials, projects and communities.

  10. hannibal046/Awesome-LLM ⭐ 24,199
    Awesome-LLM: a curated list of Large Language Models

  11. wesm/pydata-book ⭐ 23,380
    Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media

  12. huggingface/agents-course ⭐ 20,683
    This repository contains the Hugging Face Agents Course.

  13. microsoft/recommenders ⭐ 20,466
    Best Practices on Recommendation Systems
    🔗 recommenders-team.github.io/recommenders/intro.html

  14. fchollet/deep-learning-with-python-notebooks ⭐ 19,315
    Jupyter notebooks for the code samples of the book "Deep Learning with Python"

  15. naklecha/llama3-from-scratch ⭐ 15,042
    llama3 implementation one matrix multiplication at a time

  16. graykode/nlp-tutorial ⭐ 14,663
    Natural Language Processing Tutorial for Deep Learning Researchers
    🔗 www.reddit.com/r/machinelearning/comments/amfinl/project_nlptutoral_repository_who_is_studying

  17. karpathy/nn-zero-to-hero ⭐ 14,455
    Neural Networks: Zero to Hero

  18. mrdbourke/pytorch-deep-learning ⭐ 14,330
    Materials for the Learn PyTorch for Deep Learning: Zero to Mastery course.
    🔗 learnpytorch.io

  19. shangtongzhang/reinforcement-learning-an-introduction ⭐ 14,183
    Python Implementation of Reinforcement Learning: An Introduction

  20. zhanymkanov/fastapi-best-practices ⭐ 12,458
    FastAPI Best Practices and Conventions we used at our startup

  21. karpathy/micrograd ⭐ 12,289
    A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

  22. eugeneyan/open-llms ⭐ 12,173
    📋 A list of open LLMs available for commercial use.

  23. handsonllm/Hands-On-Large-Language-Models ⭐ 12,017
    Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
    🔗 www.llm-book.com

  24. rucaibox/LLMSurvey ⭐ 11,660
    The official GitHub page for the survey paper "A Survey of Large Language Models".
    🔗 arxiv.org/abs/2303.18223

  25. srush/GPU-Puzzles ⭐ 11,252
    Teaching beginner GPU programming in a completely interactive fashion

  26. nielsrogge/Transformers-Tutorials ⭐ 11,055
    This repository contains demos I made with the Transformers library by HuggingFace.

  27. openai/spinningup ⭐ 11,037
    An educational resource to help anyone learn deep reinforcement learning.
    🔗 spinningup.openai.com

  28. mooler0410/LLMsPracticalGuide ⭐ 9,975
    A curated list of practical guide resources for LLMs (LLMs Tree, Examples, Papers)
    🔗 arxiv.org/abs/2304.13712v2

  29. roboflow/notebooks ⭐ 7,933
    A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM 2, Florence-2, PaliGemma 2, and Qwen2.5VL.
    🔗 roboflow.com/models

  30. udlbook/udlbook ⭐ 7,633
    Understanding Deep Learning - Simon J.D. Prince

  31. firmai/industry-machine-learning ⭐ 7,368
    A curated list of applied machine learning and data science notebooks and libraries across different industries (by @firmai)
    🔗 www.sov.ai

  32. gkamradt/langchain-tutorials ⭐ 7,149
    Overview and tutorial of the LangChain Library

  33. alirezadir/Machine-Learning-Interviews ⭐ 6,541
    This repo is meant to serve as a guide for Machine Learning/AI technical interviews.

  34. neetcode-gh/leetcode ⭐ 6,057
    Leetcode solutions for NeetCode.io

  35. huggingface/smol-course ⭐ 6,009
    A practical course on aligning language models for your specific use case. It's a handy way to get started with aligning language models, because everything runs on most local machines.

  36. mrdbourke/tensorflow-deep-learning ⭐ 5,643
    All course materials for the Zero to Mastery Deep Learning with TensorFlow course.
    🔗 dbourke.link/ztmtfcourse

  37. udacity/deep-learning-v2-pytorch ⭐ 5,404
    Projects and exercises for the latest Deep Learning ND program https://www.udacity.com/course/deep-learning-nanodegree--nd101

  38. chiphuyen/aie-book ⭐ 4,970
    Code for AI Engineering: Building Applications with Foundation Models (Chip Huyen 2025)

  39. timofurrer/awesome-asyncio ⭐ 4,838
    A curated list of awesome Python asyncio frameworks, libraries, software and resources

  40. promptslab/Awesome-Prompt-Engineering ⭐ 4,681
    This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM etc
    🔗 discord.gg/m88xfymbk6

  41. huggingface/deep-rl-class ⭐ 4,410
    This repo contains the Hugging Face Deep Reinforcement Learning Course.

  42. rasbt/machine-learning-book ⭐ 4,401
    Code Repository for Machine Learning with PyTorch and Scikit-Learn
    🔗 sebastianraschka.com/books/#machine-learning-with-pytorch-and-scikit-learn

  43. zotroneneis/machine_learning_basics ⭐ 4,386
    Plain Python implementations of basic machine learning algorithms

  44. huggingface/diffusion-models-class ⭐ 4,079
    Materials for the Hugging Face Diffusion Models Course

  45. amanchadha/coursera-deep-learning-specialization ⭐ 3,813
    Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai: (i) Neural Networks and Deep Learning; (ii) Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization; (iii) Structuring Machine Learning Projects; (iv...

  46. fluentpython/example-code-2e ⭐ 3,659
    Example code for Fluent Python, 2nd edition (O'Reilly 2022)
    🔗 amzn.to/3j48u2j

  47. cosmicpython/book ⭐ 3,585
    A Book about Pythonic Application Architecture Patterns for Managing Complexity. Cosmos is the Opposite of Chaos you see. O'R. wouldn't actually let us call it "Cosmic Python" tho.
    🔗 www.cosmicpython.com

  48. mrdbourke/zero-to-mastery-ml ⭐ 3,337
    All course materials for the Zero to Mastery Machine Learning and Data Science course.
    🔗 dbourke.link/ztmmlcourse

  49. krzjoa/awesome-python-data-science ⭐ 2,950
    Probably the best curated list of data science software in Python.
    🔗 krzjoa.github.io/awesome-python-data-science

  50. gerdm/prml ⭐ 2,366
    Repository of notes, code and notebooks in Python for the book Pattern Recognition and Machine Learning by Christopher Bishop

  51. cgpotts/cs224u ⭐ 2,151
    Code for CS224u: Natural Language Understanding

  52. huggingface/cookbook ⭐ 2,144
    Community-driven practical examples of building AI applications and solving various tasks with AI using open-source tools and models.
    🔗 huggingface.co/learn/cookbook

  53. cerlymarco/MEDIUM_NoteBook ⭐ 2,117
    Repository containing notebooks of my posts on Medium

  54. trananhkma/fucking-awesome-python ⭐ 2,006
    awesome-python with :octocat: ⭐ and 🍴

  55. aburkov/theLMbook ⭐ 1,835
    Code for Hundred-Page Language Models Book by Andriy Burkov
    🔗 www.thelmbook.com

  56. chandlerbang/awesome-self-supervised-gnn ⭐ 1,681
    Papers about pretraining and self-supervised learning on Graph Neural Networks (GNN).

  57. atcold/NYU-DLSP21 ⭐ 1,625
    NYU Deep Learning Spring 2021
    🔗 atcold.github.io/nyu-dlsp21

  58. patrickloeber/MLfromscratch ⭐ 1,492
    Machine Learning algorithm implementations from scratch.

  59. huggingface/evaluation-guidebook ⭐ 1,461
    Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

  60. davidadsp/Generative_Deep_Learning_2nd_Edition ⭐ 1,308
    The official code repository for the second edition of the O'Reilly book Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play.
    🔗 www.oreilly.com/library/view/generative-deep-learning/9781098134174

  61. rasbt/LLM-workshop-2024 ⭐ 976
    A 4-hour coding workshop to understand how LLMs are implemented and used

  62. jackhidary/quantumcomputingbook ⭐ 864
    Companion site for the textbook Quantum Computing: An Applied Approach

  63. rasbt/MachineLearning-QandAI-book ⭐ 579
    Machine Learning Q and AI book
    🔗 www.amazon.com/machine-learning-ai-essential-questions/dp/1718503768

  64. bayesianmodelingandcomputationinpython/BookCode_Edition1 ⭐ 534
    Bayesian Modeling and Computation in Python: open-access version of the text and the code examples in the book
    🔗 www.bayesiancomputationbook.com

  65. rwitten/HighPerfLLMs2024 ⭐ 512
    Build a full scale, high-performance LLM from scratch in Jax! We cover training and inference, roofline analysis, compilation, sharding, profiling and more.

  66. dylanhogg/awesome-python ⭐ 392
    🐍 Hand-picked awesome Python libraries and frameworks, organised by category
    🔗 www.awesomepython.org

Template

Template tools and libraries: cookiecutter repos, generators, quick-starts.

  1. tiangolo/full-stack-fastapi-template ⭐ 34,474
    Full stack, modern web application template. Using FastAPI, React, SQLModel, PostgreSQL, Docker, GitHub Actions, automatic HTTPS and more.

  2. cookiecutter/cookiecutter ⭐ 23,785
    A cross-platform command-line utility that creates projects from cookiecutters (project templates), e.g. Python package projects, C projects.
    🔗 pypi.org/project/cookiecutter

  3. drivendata/cookiecutter-data-science ⭐ 9,087
    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
    🔗 cookiecutter-data-science.drivendata.org

  4. buuntu/fastapi-react ⭐ 2,405
    🚀 Cookiecutter Template for FastAPI + React Projects. Using PostgreSQL, SQLAlchemy, and Docker

  5. pyscaffold/pyscaffold ⭐ 2,213
    🛠 Python project template generator with batteries included
    🔗 pyscaffold.org

  6. cjolowicz/cookiecutter-hypermodern-python ⭐ 1,867
    Cookiecutter template for a Python package based on the Hypermodern Python article series.
    🔗 cookiecutter-hypermodern-python.readthedocs.io

  7. fmind/mlops-python-package ⭐ 1,312
    Best practices designed to support your MLOps initiatives. You can use this package as part of your MLOps toolkit or platform, e.g. Model Registry, Experiment Tracking, Realtime Inference
    🔗 fmind.github.io/mlops-python-package

  8. tezromach/python-package-template ⭐ 1,096
    🚀 Your next Python package needs a bleeding-edge project structure.

  9. martinheinz/python-project-blueprint ⭐ 969
    Blueprint/Boilerplate For Python Projects

  10. fpgmaas/cookiecutter-uv ⭐ 925
    A modern cookiecutter template for Python projects that use uv for dependency management
    🔗 fpgmaas.github.io/cookiecutter-uv

  11. callmesora/llmops-python-package ⭐ 876
    Best practices designed to support your LLMOps initiatives. You can use this package as part of your LLMOps toolkit or platform e.g. Model Registry, Experiment Tracking, Realtime Inference

Terminal

Terminal and console tools and libraries: CLI tools, terminal based formatters, progress bars.

  1. willmcgugan/rich ⭐ 52,790
    Rich is a Python library for rich text and beautiful formatting in the terminal.
    🔗 rich.readthedocs.io/en/latest

  2. aider-ai/aider ⭐ 35,358
    Aider lets you pair program with LLMs, to edit code in your local git repository
    🔗 aider.chat

  3. tqdm/tqdm ⭐ 30,078
    ⚡ A Fast, Extensible Progress Bar for Python and CLI
    🔗 tqdm.github.io

  4. willmcgugan/textual ⭐ 29,547
    The lean application framework for Python. Build sophisticated user interfaces with a simple Python API. Run your apps in the terminal and a web browser.
    🔗 textual.textualize.io

  5. google/python-fire ⭐ 27,747
    Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.

  6. anthropics/claude-code ⭐ 18,779
    Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows
    🔗 docs.anthropic.com/s/claude-code

  7. tiangolo/typer ⭐ 17,436
    Typer, build great CLIs. Easy to code. Based on Python type hints.
    🔗 typer.tiangolo.com

  8. pallets/click ⭐ 16,612
    Python composable command line interface toolkit
    🔗 click.palletsprojects.com

  9. prompt-toolkit/python-prompt-toolkit ⭐ 9,814
    Library for building powerful interactive command line applications in Python
    🔗 python-prompt-toolkit.readthedocs.io

  10. simonw/llm ⭐ 8,890
    A CLI utility and Python library for interacting with Large Language Models, both via remote APIs and models that can be installed and run on your own machine.
    🔗 llm.datasette.io

  11. saulpw/visidata ⭐ 8,346
    A terminal spreadsheet multitool for discovering and arranging data
    🔗 visidata.org

  12. xxh/xxh ⭐ 5,667
    🚀 Bring your favorite shell wherever you go through SSH. Xonsh shell, fish, zsh, osquery and so on.

  13. tconbeer/harlequin ⭐ 4,736
    The SQL IDE for Your Terminal.
    🔗 harlequin.sh

  14. manrajgrover/halo ⭐ 2,958
    💫 Beautiful spinners for terminal, IPython and Jupyter

  15. urwid/urwid ⭐ 2,922
    Console user interface library for Python (official repo)
    🔗 urwid.org

  16. textualize/trogon ⭐ 2,667
    Easily turn your Click CLI into a powerful terminal application

  17. darrenburns/elia ⭐ 2,215
    A snappy, keyboard-centric terminal user interface for interacting with large language models. Chat with ChatGPT, Claude, Llama 3, Phi 3, Mistral, Gemma and more.

  18. tmbo/questionary ⭐ 1,804
    Python library to build pretty command line user prompts ✨Easy to use multi-select lists, confirmations, free text prompts ...

  19. jazzband/prettytable ⭐ 1,517
    Display tabular data in a visually appealing ASCII table format
    🔗 pypi.org/project/prettytable

  20. shobrook/wut ⭐ 1,362
    Just type wut and an LLM will help you understand whatever's in your terminal. You'll be surprised how useful this can be.

  21. 1j01/textual-paint ⭐ 1,026
    🎨 MS Paint in your terminal.
    🔗 pypi.org/project/textual-paint

Testing

Testing libraries: unit testing, load testing, acceptance testing, code coverage, browser automation, plugins.

  1. mitmproxy/mitmproxy ⭐ 39,785
    An interactive TLS-capable intercepting HTTP proxy for penetration testers and software developers.
    🔗 mitmproxy.org

  2. locustio/locust ⭐ 26,445
    Write scalable load tests in plain Python 🚗💨
    🔗 locust.cloud

  3. microsoft/playwright-python ⭐ 13,360
    Python version of the Playwright testing and automation library.
    🔗 playwright.dev/python

  4. pytest-dev/pytest ⭐ 12,864
    The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
    🔗 pytest.org

  5. seleniumbase/SeleniumBase ⭐ 11,337
    Python APIs for web automation, testing, and bypassing bot-detection.
    🔗 seleniumbase.io

  6. robotframework/robotframework ⭐ 10,849
    Generic automation framework for acceptance testing and RPA
    🔗 robotframework.org

  7. confident-ai/deepeval ⭐ 9,115
    LLM evaluation framework similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs based on metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc
    🔗 deepeval.com

  8. getmoto/moto ⭐ 7,956
    A library that allows you to easily mock out tests based on AWS infrastructure.
    🔗 docs.getmoto.org/en/latest

  9. hypothesisworks/hypothesis ⭐ 7,914
    The property-based testing library for Python
    🔗 hypothesis.works

  10. newsapps/beeswithmachineguns ⭐ 6,592
    A utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).
    🔗 apps.chicagotribune.com

  11. codium-ai/qodo-cover ⭐ 5,103
    Qodo-Cover: An AI-Powered Tool for Automated Test Generation and Code Coverage Enhancement! 💻🤖🧪🐞
    🔗 qodo.ai

  12. spulec/freezegun ⭐ 4,369
    Let your Python tests travel through time

  13. getsentry/responses ⭐ 4,276
    A utility for mocking out the Python Requests library.

  14. tox-dev/tox ⭐ 3,816
    Command line driven CI frontend and development task automation tool.
    🔗 tox.wiki

  15. behave/behave ⭐ 3,324
    BDD, Python style.
    🔗 behave.readthedocs.io/en/latest

  16. nedbat/coveragepy ⭐ 3,191
    The code coverage tool for Python
    🔗 coverage.readthedocs.io

  17. kevin1024/vcrpy ⭐ 2,814
    Automatically mock your HTTP interactions to simplify and speed up testing

  18. cobrateam/splinter ⭐ 2,754
    splinter - Python test framework for web applications
    🔗 splinter.readthedocs.org/en/stable/index.html

  19. pytest-dev/pytest-testinfra ⭐ 2,428
    With Testinfra you can write unit tests in Python to test actual state of your servers configured by management tools like Salt, Ansible, Puppet, Chef and so on.
    🔗 testinfra.readthedocs.io

  20. pytest-dev/pytest-mock ⭐ 1,965
    Thin wrapper around the mock package for easier use with pytest
    🔗 pytest-mock.readthedocs.io/en/latest

  21. pytest-dev/pytest-cov ⭐ 1,910
    Coverage plugin for pytest.

  22. pytest-dev/pytest-xdist ⭐ 1,649
    pytest plugin for distributed testing and loop-on-failures testing modes.
    🔗 pytest-xdist.readthedocs.io

  23. pytest-dev/pytest-asyncio ⭐ 1,534
    Asyncio support for pytest
    🔗 pytest-asyncio.readthedocs.io

  24. taverntesting/tavern ⭐ 1,094
    A command-line tool, Python library, and Pytest plugin for automated testing of RESTful APIs, with a simple, concise and flexible YAML-based syntax
    🔗 taverntesting.github.io

Machine Learning - Time Series

Machine learning and classical time series libraries: forecasting, seasonality, anomaly detection, econometrics.

  1. facebook/prophet ⭐ 19,390
    Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
    🔗 facebook.github.io/prophet

  2. sktime/sktime ⭐ 9,146
    A unified framework for machine learning with time series
    🔗 www.sktime.net

  3. blue-yonder/tsfresh ⭐ 8,841
    Automatic extraction of relevant features from time series
    🔗 tsfresh.readthedocs.io

  4. unit8co/darts ⭐ 8,742
    A Python library for user-friendly forecasting and anomaly detection on time series.
    🔗 unit8co.github.io/darts

  5. facebookresearch/Kats ⭐ 6,050
    Kats is a kit to analyze time series data: a lightweight, easy-to-use, generalizable, and extendable framework for time series analysis, from understanding key statistics and characteristics, to detecting change points and anomalies, to forecasting future trends.

  6. awslabs/gluonts ⭐ 4,942
    Probabilistic time series modeling in Python
    🔗 ts.gluon.ai

  7. google-research/timesfm ⭐ 4,879
    TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
    🔗 research.google/blog/a-decoder-only-foundation-model-for-time-series-forecasting

  8. nixtla/statsforecast ⭐ 4,440
    Lightning ⚡️ fast forecasting with statistical and econometric models.
    🔗 nixtlaverse.nixtla.io/statsforecast

  9. salesforce/Merlion ⭐ 4,334
    Merlion: A Machine Learning Framework for Time Series Intelligence

  10. tdameritrade/stumpy ⭐ 3,948
    STUMPY is a powerful and scalable Python library for modern time series analysis
    🔗 stumpy.readthedocs.io/en/latest

  11. amazon-science/chronos-forecasting ⭐ 3,424
    Chronos: Pretrained Models for Probabilistic Time Series Forecasting
    🔗 arxiv.org/abs/2403.07815

  12. aistream-peelout/flow-forecast ⭐ 2,206
    Deep learning PyTorch library for time series forecasting, classification, and anomaly detection (originally for flood forecasting).
    🔗 flow-forecast.atlassian.net/wiki/spaces/ff/overview

  13. rjt1990/pyflux ⭐ 2,129
    Open source time series library for Python

  14. yuqinie98/PatchTST ⭐ 2,054
    An official implementation of PatchTST: A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

  15. uber/orbit ⭐ 1,983
    A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.
    🔗 orbit-ml.readthedocs.io/en/stable

  16. alkaline-ml/pmdarima ⭐ 1,656
    A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
    🔗 www.alkaline-ml.com/pmdarima

  17. time-series-foundation-models/lag-llama ⭐ 1,462
    Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

  18. winedarksea/AutoTS ⭐ 1,300
    Automated Time Series Forecasting

  19. autoviml/Auto_TS ⭐ 759
    Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Created by Ram Seshadri. Collaborators welcome.

  20. google/temporian ⭐ 695
    Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖
    🔗 temporian.readthedocs.io

Typing

Typing libraries: static and run-time type checking, annotations.

  1. python/mypy ⭐ 19,495
    Optional static typing for Python
    🔗 www.mypy-lang.org

  2. microsoft/pyright ⭐ 14,527
    Static Type Checker for Python

  3. facebook/pyre-check ⭐ 7,056
    Performant type-checking for Python.
    🔗 pyre-check.org

  4. python-attrs/attrs ⭐ 5,537
    Python Classes Without Boilerplate
    🔗 www.attrs.org

  5. instagram/MonkeyType ⭐ 4,909
    A Python library that generates static type annotations by collecting runtime types

  6. google/pytype ⭐ 4,903
    A static type analyzer for Python code
    🔗 google.github.io/pytype

  7. python/typeshed ⭐ 4,731
    Collection of library stubs for Python, with static types

  8. koxudaxi/datamodel-code-generator ⭐ 3,313
    Pydantic model and dataclasses.dataclass generator for easy conversion of JSON, OpenAPI, JSON Schema, and YAML data sources.
    🔗 koxudaxi.github.io/datamodel-code-generator

  9. facebook/pyrefly ⭐ 3,219
    A fast type checker and IDE for Python. (A new version of Pyre)
    🔗 pyrefly.org

  10. mtshiba/pylyzer ⭐ 2,852
    A fast, feature-rich static code analyzer & language server for Python
    🔗 mtshiba.github.io/pylyzer

  11. microsoft/pylance-release ⭐ 1,848
    Fast, feature-rich language support for Python. Documentation and issues for Pylance.

  12. agronholm/typeguard ⭐ 1,677
    Run-time type checker for Python

  13. patrick-kidger/torchtyping ⭐ 1,433
    Type annotations and dynamic checking for a tensor's shape, dtype, names, etc.

  14. python/typing_extensions ⭐ 507
    Backported and experimental type hints for Python

  15. robertcraigie/pyright-python ⭐ 227
    Python command line wrapper for pyright, a static type checker
    🔗 pypi.org/project/pyright

Utility

General utility libraries: miscellaneous tools, linters, code formatters, version management, package tools, documentation tools.

  1. yt-dlp/yt-dlp ⭐ 118,448
    A feature-rich command-line audio/video downloader
    🔗 discord.gg/h5mncfw63r

  2. home-assistant/core ⭐ 80,092
    🏡 Open source home automation that puts local control and privacy first.
    🔗 www.home-assistant.io

  3. abi/screenshot-to-code ⭐ 70,353
    Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
    🔗 screenshottocode.com

  4. python/cpython ⭐ 67,805
    The Python programming language
    🔗 www.python.org

  5. localstack/localstack ⭐ 59,554
    💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline
    🔗 localstack.cloud

  6. faif/python-patterns ⭐ 41,641
    A collection of design patterns/idioms in Python

  7. ggerganov/whisper.cpp ⭐ 41,445
    Port of OpenAI's Whisper model in C/C++

  8. mingrammer/diagrams ⭐ 41,149
    🎨 Diagram as Code for prototyping cloud system architectures
    🔗 diagrams.mingrammer.com

  9. openai/openai-python ⭐ 27,329
    The official Python library for the OpenAI API
    🔗 pypi.org/project/openai

  10. keon/algorithms ⭐ 24,623
    Minimal examples of data structures and algorithms in Python

  11. pydantic/pydantic ⭐ 24,444
    Data validation using Python type hints
    🔗 docs.pydantic.dev

  12. norvig/pytudes ⭐ 23,920
    Python programs, usually short, of considerable difficulty, to perfect particular skills.

  13. squidfunk/mkdocs-material ⭐ 23,860
    Documentation that simply works
    🔗 squidfunk.github.io/mkdocs-material

  14. blakeblackshear/frigate ⭐ 23,778
    NVR with realtime local object detection for IP cameras
    🔗 frigate.video

  15. facebookresearch/audiocraft ⭐ 22,274
    Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

  16. delgan/loguru ⭐ 22,150
    Python logging made (stupidly) simple

  17. chriskiehl/Gooey ⭐ 21,338
    Turn (almost) any Python command line program into a full GUI application with one line

  18. mkdocs/mkdocs ⭐ 20,730
    Project documentation with Markdown.
    🔗 www.mkdocs.org

  19. micropython/micropython ⭐ 20,585
    MicroPython - a lean and efficient Python implementation for microcontrollers and constrained systems
    🔗 micropython.org

  20. rustpython/RustPython ⭐ 20,278
    A Python Interpreter written in Rust
    🔗 rustpython.github.io

  21. higherorderco/Bend ⭐ 18,862
    A massively parallel, high-level programming language
    🔗 higherorderco.com

  22. kivy/kivy ⭐ 18,464
    Open source UI framework written in Python, running on Windows, Linux, macOS, Android and iOS
    🔗 kivy.org

  23. ipython/ipython ⭐ 16,510
    Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.
    🔗 ipython.readthedocs.org

  24. alievk/avatarify-python ⭐ 16,478
    Avatars for Zoom, Skype and other video-conferencing apps.

  25. openai/triton ⭐ 16,114
    Development repository for the Triton language and compiler
    🔗 triton-lang.org

  26. google/brotli ⭐ 14,174
    Brotli is a generic-purpose lossless compression algorithm that compresses data using a combination of a modern variant of the LZ77 algorithm, Huffman coding and 2nd order context modeling

  27. pyo3/pyo3 ⭐ 13,972
    Rust bindings for the Python interpreter
    🔗 pyo3.rs

  28. caronc/apprise ⭐ 13,956
    Apprise - Push Notifications that work with just about every platform!
    🔗 hub.docker.com/r/caronc/apprise

  29. zulko/moviepy ⭐ 13,673
    Video editing with Python
    🔗 zulko.github.io/moviepy

  30. nuitka/Nuitka ⭐ 13,462
    Nuitka is a Python compiler written in Python. It's fully compatible with Python 2.6, 2.7, 3.4-3.13. You feed it your Python app, it does a lot of clever things, and spits out an executable or extension module.
    🔗 nuitka.net

  31. pyodide/pyodide ⭐ 13,434
    Pyodide is a Python distribution for the browser and Node.js based on WebAssembly
    🔗 pyodide.org/en/stable

  32. python-pillow/Pillow ⭐ 12,902
    The Python Imaging Library adds image processing capabilities to Python (Pillow is the friendly PIL fork)
    🔗 python-pillow.github.io

  33. pytube/pytube ⭐ 12,840
    A lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos.
    🔗 pytube.io

  34. dbader/schedule ⭐ 12,114
    Python job scheduling for humans.
    🔗 schedule.readthedocs.io

  35. ninja-build/ninja ⭐ 12,089
    Ninja is a small build system with a focus on speed.
    🔗 ninja-build.org

  36. asweigart/pyautogui ⭐ 11,614
    A cross-platform GUI automation Python module for human beings. Used to programmatically control the mouse & keyboard.

  37. secdev/scapy ⭐ 11,548
    Scapy: the Python-based interactive packet manipulation program & library.
    🔗 scapy.net

  38. magicstack/uvloop ⭐ 11,030
    Ultra fast asyncio event loop.

  39. comet-ml/opik ⭐ 11,010
    Opik is an open-source platform for evaluating, testing and monitoring LLM applications.
    🔗 www.comet.com/docs/opik

  40. pallets/jinja ⭐ 10,981
    A very fast and expressive template engine.
    🔗 jinja.palletsprojects.com

  41. aristocratos/bpytop ⭐ 10,685
    Linux/OSX/FreeBSD resource monitor

  42. cython/cython ⭐ 10,126
    The most widely used Python to C compiler
    🔗 cython.org

  43. facebookresearch/hydra ⭐ 9,483
    Hydra is a framework for elegantly configuring complex applications
    🔗 hydra.cc

  44. aws/serverless-application-model ⭐ 9,461
    The AWS Serverless Application Model (AWS SAM) transform is an AWS CloudFormation macro that transforms SAM templates into CloudFormation templates.
    🔗 aws.amazon.com/serverless/sam

  45. paramiko/paramiko ⭐ 9,432
    The leading native Python SSHv2 protocol library.
    🔗 paramiko.org

  46. boto/boto3 ⭐ 9,429
    AWS SDK for Python
    🔗 aws.amazon.com/sdk-for-python

  47. py-pdf/pypdf ⭐ 9,213
    A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
    🔗 pypdf.readthedocs.io/en/latest

  48. arrow-py/arrow ⭐ 8,879
    🏹 Better dates & times for Python
    🔗 arrow.readthedocs.io

  49. xonsh/xonsh ⭐ 8,865
    🐚 Python-powered shell. Full-featured and cross-platform.
    🔗 xon.sh

  50. eternnoir/pyTelegramBotAPI ⭐ 8,490
    Python Telegram Bot API.

  51. icloud-photos-downloader/icloud_photos_downloader ⭐ 8,440
    A command-line tool to download photos from iCloud

  52. jasonppy/VoiceCraft ⭐ 8,315
    Zero-Shot Speech Editing and Text-to-Speech in the Wild

  53. googleapis/google-api-python-client ⭐ 8,315
    🐍 The official Python client library for Google's discovery based APIs.
    🔗 googleapis.github.io/google-api-python-client/docs

  54. kellyjonbrazil/jc ⭐ 8,265
    CLI tool and Python library that converts the output of popular command-line tools, file-types, and common strings to JSON, YAML, or Dictionaries. This allows piping of output to tools like jq and simplifying automation scripts.

  55. theskumar/python-dotenv ⭐ 8,239
    Reads key-value pairs from a .env file and can set them as environment variables. It helps in developing applications following the 12-factor principles.
    🔗 saurabh-kumar.com/python-dotenv

  56. googlecloudplatform/python-docs-samples ⭐ 7,760
    Code samples used on cloud.google.com

  57. jd/tenacity ⭐ 7,641
    Retrying library for Python
    🔗 tenacity.readthedocs.io

  58. google/latexify_py ⭐ 7,514
    A library to generate LaTeX expression from Python code.

  59. pygithub/PyGithub ⭐ 7,424
    Typed interactions with the GitHub API v3
    🔗 pygithub.readthedocs.io

  60. bndr/pipreqs ⭐ 7,270
    pipreqs - Generate pip requirements.txt file based on imports of any project. Looking for maintainers to move this project forward.

  61. timdettmers/bitsandbytes ⭐ 7,212
    Accessible large language models via k-bit quantization for PyTorch.
    🔗 huggingface.co/docs/bitsandbytes/main/en/index

  62. sphinx-doc/sphinx ⭐ 7,211
    The Sphinx documentation generator
    🔗 www.sphinx-doc.org

  63. marshmallow-code/marshmallow ⭐ 7,162
    A lightweight library for converting complex objects to and from simple Python datatypes.
    🔗 marshmallow.readthedocs.io

  64. pyca/cryptography ⭐ 7,130
    cryptography is a package designed to expose cryptographic primitives and recipes to Python developers.
    🔗 cryptography.io

  65. ijl/orjson ⭐ 7,130
    Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy

  66. gorakhargosh/watchdog ⭐ 6,983
    Python library and shell utilities to monitor filesystem events.
    🔗 packages.python.org/watchdog

  67. hugapi/hug ⭐ 6,889
    Embrace the APIs of the future. Hug aims to make developing APIs as simple as possible, but no simpler.

  68. agronholm/apscheduler ⭐ 6,834
    Task scheduling library for Python

  69. openai/point-e ⭐ 6,751
    Point cloud diffusion for 3D model synthesis

  70. pdfminer/pdfminer.six ⭐ 6,584
    Community maintained fork of pdfminer - we fathom PDF
    🔗 pdfminersix.readthedocs.io

  71. sdispater/pendulum ⭐ 6,488
    Python datetimes made easy
    🔗 pendulum.eustace.io

  72. scikit-image/scikit-image ⭐ 6,285
    Image processing in Python
    🔗 scikit-image.org

  73. wireservice/csvkit ⭐ 6,216
    A suite of utilities for converting to and working with CSV, the king of tabular file formats.
    🔗 csvkit.readthedocs.io

  74. pytransitions/transitions ⭐ 6,125
    A lightweight, object-oriented finite state machine implementation in Python with many extensions

  75. traceloop/openllmetry ⭐ 6,060
    Open-source observability for your LLM application, based on OpenTelemetry
    🔗 www.traceloop.com/openllmetry

  76. rsalmei/alive-progress ⭐ 5,935
    A new kind of Progress Bar, with real-time throughput, ETA, and very cool animations!

  77. spotify/pedalboard ⭐ 5,602
    🎛 🔊 A Python library for audio.
    🔗 spotify.github.io/pedalboard

  78. pywinauto/pywinauto ⭐ 5,497
    Windows GUI Automation with Python (based on text properties)
    🔗 pywinauto.github.io

  79. buildbot/buildbot ⭐ 5,372
    Python-based continuous integration testing framework; your pull requests are more than welcome!
    🔗 www.buildbot.net

  80. prompt-toolkit/ptpython ⭐ 5,328
    A better Python REPL

  81. tebelorg/RPA-Python ⭐ 5,266
    Python package for doing RPA

  82. pythonnet/pythonnet ⭐ 5,153
    Python for .NET is a package that gives Python programmers nearly seamless integration with the .NET Common Language Runtime (CLR) and provides a powerful application scripting tool for .NET developers.
    🔗 pythonnet.github.io

  83. pycqa/pycodestyle ⭐ 5,106
    Simple Python style checker in one Python file
    🔗 pycodestyle.pycqa.org

  84. jorgebastida/awslogs ⭐ 4,937
    AWS CloudWatch logs for Humans™

  85. pytoolz/toolz ⭐ 4,935
    A functional standard library for Python.
    🔗 toolz.readthedocs.org

  86. ashleve/lightning-hydra-template ⭐ 4,762
    PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡

  87. bogdanp/dramatiq ⭐ 4,723
    A fast and reliable background task processing library for Python 3.
    🔗 dramatiq.io

  88. pyo3/maturin ⭐ 4,693
    Build and publish crates with pyo3, cffi and uniffi bindings as well as rust binaries as python packages
    🔗 maturin.rs

  89. hhatto/autopep8 ⭐ 4,623
    A tool that automatically formats Python code to conform to the PEP 8 style guide.
    🔗 pypi.org/project/autopep8

  90. pyinvoke/invoke ⭐ 4,568
    Pythonic task management & command execution.
    🔗 pyinvoke.org

  91. ets-labs/python-dependency-injector ⭐ 4,458
    Dependency injection framework for Python
    🔗 python-dependency-injector.ets-labs.org

  92. blealtan/efficient-kan ⭐ 4,418
    An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

  93. pyinfra-dev/pyinfra ⭐ 4,312
    🔧 pyinfra turns Python code into shell commands and runs them on your servers. Execute ad-hoc commands and write declarative operations. Target SSH servers, local machine and Docker containers. Fast and scales from one server to thousands.
    🔗 pyinfra.com

  94. adafruit/circuitpython ⭐ 4,305
    CircuitPython - a Python implementation for teaching coding with microcontrollers
    🔗 circuitpython.org

  95. evhub/coconut ⭐ 4,224
    Coconut (coconut-lang.org) is a variant of Python that adds on top of Python syntax new features for simple, elegant, Pythonic functional programming.
    🔗 coconut-lang.org

  96. miguelgrinberg/python-socketio ⭐ 4,201
    Python Socket.IO server and client

  97. joblib/joblib ⭐ 4,107
    Computing with Python functions.
    🔗 joblib.readthedocs.org

  98. hynek/structlog ⭐ 4,083
    Simple, powerful, and fast logging for Python.
    🔗 www.structlog.org

  99. spotify/basic-pitch ⭐ 4,061
    A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
    🔗 basicpitch.io

  100. python-markdown/markdown ⭐ 4,021
    A Python implementation of John Gruber’s Markdown with Extension support.
    🔗 python-markdown.github.io

  101. more-itertools/more-itertools ⭐ 3,928
    More routines for operating on iterables, beyond itertools
    🔗 more-itertools.rtfd.io

  102. zeromq/pyzmq ⭐ 3,926
    PyZMQ: Python bindings for zeromq
    🔗 zguide.zeromq.org/py:all

  103. rspeer/python-ftfy ⭐ 3,925
    Fixes mojibake and other glitches in Unicode text, after the fact.
    🔗 ftfy.readthedocs.org

  104. pydata/xarray ⭐ 3,902
    N-D labeled arrays and datasets in Python
    🔗 xarray.dev

  105. pypi/warehouse ⭐ 3,758
    The Python Package Index
    🔗 pypi.org

  106. tartley/colorama ⭐ 3,699
    Simple cross-platform colored terminal text in Python

  107. jorisschellekens/borb ⭐ 3,491
    borb is a library for reading, creating and manipulating PDF files in Python.
    🔗 borbpdf.com

  108. osohq/oso ⭐ 3,486
    Deprecated: See README

  109. suor/funcy ⭐ 3,436
    Fancy and practical functional tools

  110. pyserial/pyserial ⭐ 3,401
    Python serial port access library

  111. camelot-dev/camelot ⭐ 3,346
    A Python library to extract tabular data from PDFs
    🔗 camelot-py.readthedocs.io

  112. pydantic/logfire ⭐ 3,325
    Uncomplicated Observability for Python and beyond! 🪵🔥
    🔗 logfire.pydantic.dev/docs

  113. libaudioflux/audioFlux ⭐ 3,108
    A library for audio and music analysis, feature extraction.
    🔗 audioflux.top

  114. tinche/aiofiles ⭐ 3,097
    Library for handling local disk files in asyncio applications.

  115. legrandin/pycryptodome ⭐ 3,048
    A self-contained cryptographic library for Python
    🔗 www.pycryptodome.org

  116. jcrist/msgspec ⭐ 2,987
    A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
    🔗 jcristharif.com/msgspec

  117. tox-dev/pipdeptree ⭐ 2,900
    A command line utility to display dependency tree of the installed Python packages
    🔗 pypi.python.org/pypi/pipdeptree

  118. lxml/lxml ⭐ 2,870
    The lxml XML toolkit for Python
    🔗 lxml.de

  119. cdgriffith/Box ⭐ 2,738
    Python dictionaries with advanced dot notation access
    🔗 github.com/cdgriffith/box/wiki

  120. whylabs/whylogs ⭐ 2,732
    An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
    🔗 whylogs.readthedocs.io

  121. yaml/pyyaml ⭐ 2,719
    Canonical source repository for PyYAML

  122. pypa/setuptools ⭐ 2,715
    Official project repository for the Setuptools build system
    🔗 pypi.org/project/setuptools

  123. pexpect/pexpect ⭐ 2,714
    A Python module for controlling interactive programs in a pseudo-terminal
    🔗 pexpect.readthedocs.io

  124. liiight/notifiers ⭐ 2,704
    The easy way to send notifications
    🔗 notifiers.readthedocs.io

  125. scrapinghub/dateparser ⭐ 2,693
    Python parser for human-readable dates

  126. litl/backoff ⭐ 2,685
    Python library providing function decorators for configurable backoff and retry

  127. rhettbull/osxphotos ⭐ 2,666
    Python app to work with pictures and associated metadata from Apple Photos on macOS. Also includes a package to provide programmatic access to the Photos library, pictures, and metadata.

  128. hgrecco/pint ⭐ 2,592
    Operate and manipulate physical quantities in Python
    🔗 pint.readthedocs.org

  129. grantjenks/python-diskcache ⭐ 2,565
    Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
    🔗 www.grantjenks.com/docs/diskcache

  130. tkem/cachetools ⭐ 2,547
    Various memoizing collections and decorators, including variants of the Python Standard Library's @lru_cache function decorator

  131. nschloe/tikzplotlib ⭐ 2,513
    📊 Save matplotlib figures as TikZ/PGFplots for smooth integration into LaTeX.

  132. dosisod/refurb ⭐ 2,511
    A tool for refurbishing and modernizing Python codebases

  133. pyston/pyston ⭐ 2,509
    (No longer maintained) A faster and highly-compatible implementation of the Python programming language.
    🔗 www.pyston.org

  134. dateutil/dateutil ⭐ 2,484
    Useful extensions to the standard Python datetime features

  135. pndurette/gTTS ⭐ 2,482
    Python library and CLI tool to interface with Google Translate's text-to-speech API
    🔗 gtts.readthedocs.org

  136. kiminewt/pyshark ⭐ 2,391
    Python wrapper for tshark, allowing Python packet parsing using Wireshark dissectors

  137. nateshmbhat/pyttsx3 ⭐ 2,370
    Offline text-to-speech synthesis for Python

  138. abseil/abseil-py ⭐ 2,370
    A collection of Python library code for building Python applications. The code is collected from Google's own Python code base, and has been extensively tested and used in production.

  139. astanin/python-tabulate ⭐ 2,367
    Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.
    🔗 pypi.org/project/tabulate

  140. pyparsing/pyparsing ⭐ 2,361
    Python library for creating PEG parsers

  141. seperman/deepdiff ⭐ 2,298
    DeepDiff: Deep Difference and search of any Python object/data. DeepHash: Hash of any object based on its contents. Delta: Use deltas to reconstruct objects by adding deltas together.
    🔗 zepworks.com

  142. omry/omegaconf ⭐ 2,184
    Flexible Python configuration system. The last one you will ever need.

  143. mitmproxy/pdoc ⭐ 2,159
    API Documentation for Python Projects
    🔗 pdoc.dev

  144. grahamdumpleton/wrapt ⭐ 2,152
    A Python module for decorators, wrappers and monkey patching.

  145. ianmiell/shutit ⭐ 2,143
    Automation framework for programmers
    🔗 ianmiell.github.io/shutit

  146. ariebovenberg/whenever ⭐ 2,131
    ⏰ Modern datetime library for Python
    🔗 whenever.rtfd.io

  147. google/gin-config ⭐ 2,111
    Gin provides a lightweight configuration framework for Python

  148. hbldh/bleak ⭐ 2,105
    A cross platform Bluetooth Low Energy Client for Python using asyncio

  149. anthropics/anthropic-sdk-python ⭐ 2,103
    SDK providing access to Anthropic's safety-first language model APIs

  150. numba/llvmlite ⭐ 2,096
    A lightweight LLVM Python binding for writing JIT compilers
    🔗 llvmlite.pydata.org

  151. python-rope/rope ⭐ 2,078
    A Python refactoring library

  152. open-telemetry/opentelemetry-python ⭐ 2,067
    OpenTelemetry Python API and SDK
    🔗 opentelemetry.io

  153. samuelcolvin/watchfiles ⭐ 2,047
    Simple, modern and fast file watching and code reload for Python, written in Rust
    🔗 watchfiles.helpmanual.io

  154. pyfilesystem/pyfilesystem2 ⭐ 2,045
    Python's Filesystem abstraction layer
    🔗 www.pyfilesystem.org

  155. julienpalard/Pipe ⭐ 2,044
    A Python library to use infix notation in Python

  156. p0dalirius/Coercer ⭐ 2,022
    A Python script to automatically coerce a Windows server to authenticate on an arbitrary machine through 12 methods.
    🔗 podalirius.net

  157. landscapeio/prospector ⭐ 2,018
    Inspects Python source files and provides information about the type and location of classes, methods, etc.

  158. pygments/pygments ⭐ 1,994
    Pygments is a generic syntax highlighter written in Python
    🔗 pygments.org

  159. carpedm20/emoji ⭐ 1,980
    Emoji terminal output for Python

  160. home-assistant/supervisor ⭐ 1,971
    🏡 Home Assistant Supervisor
    🔗 home-assistant.io/hassio

  161. pydoit/doit ⭐ 1,952
    CLI task management & automation tool
    🔗 pydoit.org

  162. chaostoolkit/chaostoolkit ⭐ 1,943
    Chaos Engineering Toolkit & Orchestration for Developers
    🔗 chaostoolkit.org

  163. mkdocstrings/mkdocstrings ⭐ 1,930
    📘 Automatic documentation from sources, for MkDocs.
    🔗 mkdocstrings.github.io

  164. konradhalas/dacite ⭐ 1,892
    Simple creation of data classes from dictionaries.

  165. rubik/radon ⭐ 1,859
    Various code metrics for Python code
    🔗 radon.readthedocs.org

  166. joowani/binarytree ⭐ 1,813
    Python Library for Studying Binary Trees
    🔗 binarytree.readthedocs.io

  167. kalliope-project/kalliope ⭐ 1,741
    Kalliope is a framework that will help you to create your own personal assistant.
    🔗 kalliope-project.github.io

  168. quodlibet/mutagen ⭐ 1,731
    Python module for handling audio metadata
    🔗 mutagen.readthedocs.io

  169. instagram/LibCST ⭐ 1,710
    A concrete syntax tree parser and serializer library for Python that preserves many aspects of Python's abstract syntax tree
    🔗 libcst.readthedocs.io

  170. aerkalov/ebooklib ⭐ 1,651
    A library for managing EPUB2/EPUB3. It's capable of reading and writing EPUB files programmatically.
    🔗 ebooklib.readthedocs.io

  171. facebookincubator/Bowler ⭐ 1,612
    Safe code refactoring for modern Python.
    🔗 pybowler.io

  172. imageio/imageio ⭐ 1,608
    Python library for reading and writing image data
    🔗 imageio.readthedocs.io

  173. lcompilers/lpython ⭐ 1,601
    Python compiler
    🔗 lpython.org

  174. fabiocaccamo/python-benedict ⭐ 1,570
    📘 dict subclass with keylist/keypath support, built-in I/O operations (base64, csv, html, ini, json, pickle, plist, query
