
tenstorrent/tt-metal


TT-NN

TT-NN is a Python and C++ neural network op library.


Grayskull (GS) Models

| Model | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
|---|---|---|---|---|
| ResNet-50 (fps) | 20 | 2,070 | 7,200 | 10,000 |
| BERT-Large (sen/s) | 12 | 362 | 406 | 410 |
| Falcon7B-decode (t/s) | 32 | 135 | 135 | 140 |
| ViT (fps) | 8 | 480 | 823 | 2,000 |

T5 small (sen/s): 140
Bloom (sen/s): 70
U-Net: coming soon

[1] - Observed from the host. Includes dispatch overhead and kernel execution time.

[2] - Ignoring host overhead. Kernel execution time only.
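The gap between the two throughput columns can be turned into a rough per-item host cost. The sketch below is only an illustration: it assumes both figures were measured on the same workload and that host overhead and kernel time add serially with no overlap, which the source does not state. The function name `host_overhead_ms` is hypothetical.

```python
def host_overhead_ms(end_to_end_per_s, device_per_s):
    """Per-item time not accounted for by kernel execution, in milliseconds.

    Assumes end-to-end time = host overhead + kernel time (no overlap).
    """
    return (1.0 / end_to_end_per_s - 1.0 / device_per_s) * 1e3


# (end-to-end, device) throughput pairs copied from the Grayskull table.
rows = {
    "ResNet-50": (2070, 7200),
    "BERT-Large": (362, 406),
    "Falcon7B-decode": (135, 135),
    "ViT": (480, 823),
}

for name, (end_to_end, device) in rows.items():
    print(f"{name}: {host_overhead_ms(end_to_end, device):.3f} ms per item")
```

Under these assumptions, a row whose two columns match (Falcon7B-decode) shows no measurable host cost, while a large gap (ResNet-50) suggests most of the end-to-end time is spent outside the kernels.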

Wormhole (WH) Models

| Model | Gen. Token [3] | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
|---|---|---|---|---|---|
| Falcon7B-decode | 129th | 32 | 9.9 t/s/u - 317 t/s | 13.5 t/s/u - 432 t/s | 21 t/s/u |
| Mistral-7B-decode | 33rd | 32 | 7.9 t/s/u - 253 t/s | 10.9 t/s/u - 349 t/s | 21 t/s/u |
| Mamba-2.8B-decode | any | 32 | 1.7 t/s/u - 54 t/s | 2.0 t/s/u - 64 t/s | 17 t/s/u |

Stable Diffusion 1.4 512x512 coming soon 1

[3] - Generating the i'th token in a sequence while the kv_cache is filled with i-1 rows.
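To make footnote [3] concrete, here is a minimal sketch of one decode step against a key/value cache. It is generic single-head attention in plain Python, not TT-NN's or any listed model's implementation, and every name in it (`decode_step`, `kv_cache`, the toy dimensions) is hypothetical. It only shows why the token index matters: the work per step grows with the number of cached rows.

```python
import math


def decode_step(query, new_key, new_value, kv_cache):
    """Compute one token's attention output, growing the cache by one row."""
    kv_cache.append((new_key, new_value))  # i-1 rows before this step, i rows after
    scale = 1.0 / math.sqrt(len(query))
    # One dot product per cached row: cost grows with the token index.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key, _ in kv_cache]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * value[d] for w, (_, value) in zip(weights, kv_cache)) / total
        for d in range(len(new_value))
    ]


# Hypothetical setup: the cache already holds 128 rows, so this step is the 129th token.
dim = 4
kv_cache = [([0.1] * dim, [float(row)] * dim) for row in range(128)]
output = decode_step([0.1] * dim, [0.1] * dim, [128.0] * dim, kv_cache)
print(len(kv_cache), output[0])
```

Because a step at token 1025 attends over far more rows than a step at token 33, throughput figures are only comparable when the token index is reported alongside them.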

T3000 (2x4 mesh of WHs) Models

| Model | Gen. Token [3] | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
|---|---|---|---|---|---|
| Falcon7B-decode | 1025th | 256 | 5.3 t/s/u - 1359 t/s | coming soon | 21 t/s/u |
| LLaMA-2-70B-decode | 129th | 32 | 2.4 t/s/u - 76.0 t/s | 8.4 t/s/u - 268.8 t/s | 20 t/s/u |
| LLaMA-3-70B-decode | 129th | 32 | 2.4 t/s/u - 75.4 t/s | 7.7 t/s/u - 246.4 t/s | 20 t/s/u |

Falcon40B-decode: coming soon
Mixtral7Bx8-decode: coming soon
ResNet50 (data parallel): coming soon
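The paired figures in the Wormhole and T3000 tables look like a per-user rate followed by a batch-wide rate. The sketch below assumes that "t/s/u" means tokens per second per user and that the aggregate is simply the per-user rate times the batch size; the source does not define either, so treat this as a reading of the numbers rather than a stated method. The function name is hypothetical.

```python
def aggregate_tokens_per_s(per_user_tokens_per_s, batch):
    """Batch-wide decode throughput if every user advances one token per step."""
    return per_user_tokens_per_s * batch


# (per-user t/s/u, batch) pairs copied from the device-throughput columns.
examples = {
    "Falcon7B-decode (WH)": (13.5, 32),
    "LLaMA-2-70B-decode (T3000)": (8.4, 32),
    "LLaMA-3-70B-decode (T3000)": (7.7, 32),
}

for name, (per_user, batch) in examples.items():
    print(f"{name}: {aggregate_tokens_per_s(per_user, batch):.1f} t/s")
```

The per-user figures are rounded in the tables, so not every row reproduces exactly: 2.4 t/s/u at batch 32 gives 76.8 t/s, while the table lists 76.0 t/s.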

Using TT-NN ops and tensors

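import ttnn
import torch

# Open device 0; the context manager releases it on exit.
with ttnn.manage_device(device_id=0) as device:
    # Host tensors: b has a leading dimension of 1 and is broadcast across a's rows.
    a = torch.ones((5, 7))
    b = torch.ones((1, 7))

    # Move both tensors to the device as bfloat16 in tile layout.
    a = ttnn.from_torch(a, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    b = ttnn.from_torch(b, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)

    # Add on the device, then copy the result back to a torch tensor.
    output = a + b
    output = ttnn.to_torch(output)

print(output)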
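For checking the result without Tenstorrent hardware, a host-only reference can be computed with PyTorch alone. This is a sketch that assumes TT-NN's `+` follows PyTorch broadcasting for these shapes; the variable name `expected` is introduced here for illustration.

```python
import torch

# Same shapes and dtype as the TT-NN example, computed on the host only.
a = torch.ones((5, 7), dtype=torch.bfloat16)
b = torch.ones((1, 7), dtype=torch.bfloat16)

# b's single row is broadcast across all five rows of a.
expected = a + b
print(expected.shape)
```

If that assumption holds, the device result could be compared against `expected` with something like `torch.allclose(output.float(), expected.float())`.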

TT-Metalium

TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Getting started

Get started with simple kernels.