OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Making large AI models cheaper, faster, and more accessible
A high-throughput and memory-efficient inference and serving engine for LLMs
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
📚 Jupyter notebook tutorials for OpenVINO™
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
High-efficiency floating-point neural network inference operators for mobile, server, and Web
A high-performance inference system for large language models, designed for production environments.
LLaMA-2 in native Go
⏱ Benchmarks of machine learning inference for Go
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
An alternative to the Triton Inference Server that boosts DL service throughput 1.5–4x via ensemble pipeline serving with concurrent CUDA streams, supporting PyTorch/LibTorch frontends and TensorRT, CVCUDA, and other backends
Port of OpenAI's Whisper model in C/C++