
# Inference Benchmark

Maximize the potential of your models with the inference benchmark tool.


## What is it

Inference Benchmark provides a standard way to measure the performance of machine learning model online serving workloads (LLM, embedding, Stable-Diffusion, Whisper). It also serves as a tool for evaluating and optimizing your own inference deployments.

## Results

### BERT

We benchmarked pytriton (triton-inference-server) and mosec serving BERT. Dynamic batching was enabled for both frameworks, with a maximum batch size of 32 and a maximum wait time of 10 ms. Please check out the result for more details.
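For reference, these batching parameters map directly onto each framework's configuration. Below is a minimal sketch of the mosec side under the benchmark's settings; the worker body is a placeholder rather than the benchmark's actual BERT code, and mosec's `max_wait_time` is given in milliseconds.

```python
from mosec import Server, Worker


class BertWorker(Worker):
    """Placeholder worker; a real deployment would run BERT inference here."""

    def forward(self, batch):
        # With dynamic batching enabled, `batch` is a list of requests,
        # and the return value must be a list of the same length.
        return [{"echo": item} for item in batch]


if __name__ == "__main__":
    server = Server()
    # Batch up to 32 requests, waiting at most 10 ms to fill a batch.
    server.append_worker(BertWorker, max_batch_size=32, max_wait_time=10)
    server.run()
```

A corresponding pytriton sketch is below. PyTriton's `DynamicBatcher` takes the queue delay in microseconds, so a 10 ms wait becomes 10_000; the `infer_fn` body is a stand-in for the real forward pass.

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import DynamicBatcher, ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(input_ids):
    # Stand-in for the real BERT forward pass: returns dummy logits
    # with one row per request in the aggregated batch.
    return {"logits": np.zeros((input_ids.shape[0], 2), dtype=np.float32)}


with Triton() as triton:
    triton.bind(
        model_name="bert",
        infer_func=infer_fn,
        inputs=[Tensor(name="input_ids", dtype=np.int64, shape=(-1,))],
        outputs=[Tensor(name="logits", dtype=np.float32, shape=(2,))],
        config=ModelConfig(
            max_batch_size=32,
            batcher=DynamicBatcher(max_queue_delay_microseconds=10_000),
        ),
    )
    triton.serve()
```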

### DistilBERT

More results with different models on different serving frameworks are coming soon.
