A high-throughput and memory-efficient inference and serving engine for LLMs
High-efficiency floating-point neural network inference operators for mobile, server, and Web
AICI: Prompts as (Wasm) Programs
A high-performance inference system for large language models, designed for production environments.
Port of OpenAI's Whisper model in C/C++
Cross-platform, customizable ML solutions for live and streaming media.
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Disparity Proxies and Social Determinants of Health
A universal scalable machine learning model deployment solution
Woodwork is a Python library that provides robust methods for managing and communicating data typing information.
Seamlessly integrate with top LLM APIs for speedy, robust, and scalable querying. Ideal for developers needing quick, reliable AI-powered responses.
Making large AI models cheaper, faster, and more accessible
🔮 SuperDuperDB: Bring AI to your database! Build, deploy, and manage any AI application directly on your existing data infrastructure, without moving your data — including streaming inference, scalable model training, and vector search.
Large Language Model Text Generation Inference
A Rust wrapper for ONNX Runtime
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.