
# Inference Benchmark

Maximize the potential of your models with the inference benchmark tool.


## What is it

Inference Benchmark provides a standard way to measure the performance of machine learning model online serving workloads (LLM, embedding, Stable-Diffusion, Whisper). It also serves as a tool for evaluating and optimizing your own inference deployments.

## Results

### BERT

We benchmarked pytriton (triton-inference-server) and mosec serving BERT. Dynamic batching was enabled for both frameworks, with a maximum batch size of 32 and a maximum wait time of 10 ms. Please check out the result for more details.
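For reference, these batching parameters map directly onto each framework's configuration. Below is a minimal sketch of the mosec side under the benchmark's settings; the worker body is a placeholder rather than the benchmark's actual BERT code, and mosec's `max_wait_time` is given in milliseconds.

```python
from mosec import Server, Worker


class BertWorker(Worker):
    """Placeholder worker; a real deployment would run BERT inference here."""

    def forward(self, batch):
        # With dynamic batching enabled, `batch` is a list of requests,
        # and the return value must be a list of the same length.
        return [{"echo": item} for item in batch]


if __name__ == "__main__":
    server = Server()
    # Batch up to 32 requests, waiting at most 10 ms to fill a batch.
    server.append_worker(BertWorker, max_batch_size=32, max_wait_time=10)
    server.run()
```

A corresponding pytriton sketch is below. PyTriton's `DynamicBatcher` takes the queue delay in microseconds, so a 10 ms wait becomes 10_000; the `infer_fn` body is a stand-in for the real forward pass.

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import DynamicBatcher, ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(input_ids):
    # Stand-in for the real BERT forward pass: returns dummy logits
    # with one row per request in the aggregated batch.
    return {"logits": np.zeros((input_ids.shape[0], 2), dtype=np.float32)}


with Triton() as triton:
    triton.bind(
        model_name="bert",
        infer_func=infer_fn,
        inputs=[Tensor(name="input_ids", dtype=np.int64, shape=(-1,))],
        outputs=[Tensor(name="logits", dtype=np.float32, shape=(2,))],
        config=ModelConfig(
            max_batch_size=32,
            batcher=DynamicBatcher(max_queue_delay_microseconds=10_000),
        ),
    )
    triton.serve()
```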

### DistilBERT

More results with different models on different serving frameworks are coming soon.
