OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Making large AI models cheaper, faster, and more accessible
A high-throughput and memory-efficient inference and serving engine for LLMs
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
📚 Jupyter notebook tutorials for OpenVINO™
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
High-efficiency floating-point neural network inference operators for mobile, server, and Web
A high-performance inference system for large language models, designed for production environments.
LLaMA-2 in native Go
⏱ Benchmarks of machine learning inference for Go
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
An alternative to the Triton Inference Server that boosts DL service throughput 1.5–4x via ensemble pipeline serving with concurrent CUDA streams, supporting PyTorch/LibTorch frontends and TensorRT, CVCUDA, and other backends
Port of OpenAI's Whisper model in C/C++