Releases: tinygrad/tinygrad
tinygrad 0.8.0
Close to the new limit of 5000 lines at 4981.
Release Highlights
- Real dtype support within kernels!
- New `.schedule()` API to separate the concerns of scheduling and running.
- New lazy.py implementation doesn't reorder at build time; `GRAPH=1` is usable to debug issues.
- 95 TFLOPS FP16->FP32 matmuls on the 7900 XTX.
- GPT2 runs (jitted) in 2 ms on an NVIDIA 3090.
- Powerful and fast kernel beam search with `BEAM=2`.
- GPU/CUDA/HIP backends switched to `gpuctypes`.
- New (alpha) multigpu sharding API with `.shard`.
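The beam search idea can be illustrated with a toy sketch: keep the `n` best candidates at each step, which is what `BEAM=n` does over kernel optimization choices. Everything concrete below (the actions and the cost function) is made up for illustration; tinygrad scores real kernels by measured runtime, not a toy cost model.

```python
# Toy beam search over a sequence of "optimization actions", illustrating
# the idea behind BEAM=n: keep only the n best candidates at each step.
# The cost model here is invented; tinygrad times real compiled kernels.

def beam_search(initial, actions, cost, steps, beam_width=2):
    """Return the lowest-cost candidate reachable in `steps` applications."""
    frontier = [initial]
    for _ in range(steps):
        candidates = [act(c) for c in frontier for act in actions]
        # Keep only the beam_width cheapest candidates.
        frontier = sorted(candidates, key=cost)[:beam_width]
    return min(frontier, key=cost)

# Hypothetical "kernel state": just a number; actions tweak it, and cost
# is the distance from an arbitrary optimum at 42.
actions = [lambda x: x + 7, lambda x: x * 2, lambda x: x - 1]
best = beam_search(0, actions, cost=lambda x: abs(x - 42), steps=4, beam_width=2)
```

With a beam width of 1 this degenerates to greedy search; widening the beam trades search time for a better chance of escaping local minima, which is why `BEAM=2` is a reasonable default.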
See the full changelog: v0.7.0...v0.8.0
Join the Discord!
tinygrad 0.7.0
Bigger again at 4311 lines :( But, tons of new features this time!
Just over 500 commits since 0.6.0.
Release Highlights
- Windows support has been dropped to focus on Linux and Mac OS.
  - Some functionality may still work on Windows, but no support will be provided; use WSL instead.
- DiskTensors: a way to store tensors on disk has been added.
  - This is coupled with functionality in `state.py`, which supports saving/loading safetensors and loading torch weights.
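As a rough sketch of the safetensors layout handled here: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/byte offsets, then the raw tensor bytes. This is a minimal illustration of the file format only, not tinygrad's implementation; real code should go through `state.py` or the safetensors library.

```python
# Minimal sketch of the safetensors file layout: u64-LE header length,
# JSON header with dtype/shape/data_offsets per tensor, then raw bytes.
import json
import struct

def save_safetensors(tensors: dict, shapes: dict) -> bytes:
    header, data, offset = {}, b"", 0
    for name, raw in tensors.items():
        header[name] = {"dtype": "F32", "shape": shapes[name],
                        "data_offsets": [offset, offset + len(raw)]}
        data += raw
        offset += len(raw)
    hjson = json.dumps(header).encode()
    return struct.pack("<Q", len(hjson)) + hjson + data

def load_safetensors(blob: bytes) -> dict:
    (hlen,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + hlen])
    body = blob[8 + hlen:]
    return {name: body[meta["data_offsets"][0]:meta["data_offsets"][1]]
            for name, meta in header.items()}
```

A round trip through `save_safetensors` and `load_safetensors` returns the original raw bytes for each tensor name.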
- Tensor Cores are supported on M1/Apple Silicon and on the 7900 XTX (WMMA).
  - Support on the 7900 XTX requires weights and data to be in float16; full float16 compute support will come in a later release.
  - Tensor Core behaviour/usage is controlled by the `TC` envvar.
- Kernel optimization with nevergrad.
  - This optimizes the shapes going into the kernel, gated by the `KOPT` envvar.
- P2P buffer transfers are supported on most AMD GPUs when using a single Python process.
  - This is controlled by the `P2P` envvar.
- LLaMA 2 support.
  - A requirement of this is bfloat16 support for loading the weights; this is semi-supported by casting them to float16. Proper bfloat16 support is tracked at #1290.
  - The LLaMA example now also supports 8-bit quantization via the `--quantize` flag.
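As a hedged sketch of what 8-bit quantization means here: store weights as int8 values plus a scale, and dequantize when the weights are used. The symmetric per-tensor scheme below is illustrative only; the scheme actually used by the LLaMA example may differ in its details.

```python
# Toy symmetric 8-bit quantization: map each weight to an int8 value and a
# shared per-tensor scale, so storage drops ~4x versus float32 at the cost
# of rounding error. Illustration only, not tinygrad's exact scheme.

def quantize(ws):
    """Return (int8-range values, scale) for a list of floats."""
    scale = max(abs(w) for w in ws) / 127 or 1.0  # avoid /0 for all-zero input
    return [round(w / scale) for w in ws], scale

def dequantize(qs, scale):
    """Recover approximate floats from quantized values."""
    return [q * scale for q in qs]

qs, scale = quantize([0.6, -1.0, 0.3])
restored = dequantize(qs, scale)
```

The largest-magnitude weight maps to ±127 and is recovered almost exactly; smaller weights pick up a rounding error of at most half a quantization step.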
- Most MLPerf models have working inference examples. Training these models is currently being worked on.
- Initial multigpu training support.
  - Slow multigpu training by copying through host shared memory.
  - Somewhat follows torch's multiprocessing and DistributedDataParallel high-level design.
  - See the hlb_cifar10.py example.
- SymbolicShapeTracker and Symbolic JIT.
  - These two things combined allow models with changing shapes, like transformers, to be jitted.
  - This means that LLaMA can now be jitted for a massive increase in performance.
  - Be warned that the API for this is very WIP and may change in the future, as may the rest of the tinygrad API.
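A toy sketch of why symbolic shapes matter for transformers: a JIT that caches one kernel per concrete input shape recompiles on every new sequence length, while keying the cache on a symbolic dimension reuses a single kernel for the whole family of shapes. The functions below are pure-Python stand-ins, not tinygrad's JIT.

```python
# Contrast a concrete-shape JIT cache with a symbolic-shape one.
# "Compiling" is faked with Python's sum; only the cache keys differ.
compiles = {"concrete": 0, "symbolic": 0}

def naive_jit(cache, xs):
    key = len(xs)              # concrete shape: recompile when length changes
    if key not in cache:
        compiles["concrete"] += 1
        cache[key] = sum       # stand-in for compiling a kernel
    return cache[key](xs)

def symbolic_jit(cache, xs):
    key = "seq_len"            # symbolic dimension: one kernel for any length
    if key not in cache:
        compiles["symbolic"] += 1
        cache[key] = sum
    return cache[key](xs)

c1, c2 = {}, {}
for n in range(1, 6):          # sequence grows each step, like autoregression
    naive_jit(c1, list(range(n)))
    symbolic_jit(c2, list(range(n)))
```

After five steps the concrete cache has compiled five times and the symbolic cache once, which is the gap that makes jitted LLaMA decoding viable.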
- aarch64 and PTX assembly backends.
- WebGPU backend; see the `compile_efficientnet.py` example.
- Support for torch-like tensor indexing by other tensors.
- Some more `nn` layers were promoted, namely `Embedding` and various `Conv` layers.
- VITS and so-vits-svc examples added.
- Initial documentation work.
  - Quickstart guide: /docs/quickstart.md
  - Environment variable reference: /docs/env_vars.md
And lots of small optimizations all over the codebase.
See the full changelog: v0.6.0...v0.7.0
See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc
Join the Discord!
tinygrad 0.6.0
2516 lines now. Some day I promise a release will make it smaller.
- float16 support (needed for LLaMA)
- Fixed critical bug in training BatchNorm
- Limited support for multiple GPUs
- ConvNeXt + several MLPerf models in models/
- More torch-like methods in tensor.py
- Big refactor of the codegen into the Linearizer and CStyle
- Removed CompiledBuffer; use the LazyBuffer ShapeTracker
tinygrad 0.5.0
An upsetting 2223 lines of code, but so much great stuff!
- 7 backends: CLANG, CPU, CUDA, GPU, LLVM, METAL, and TORCH
- A TinyJit for speed (decorate your GPU function today)
- Support for a lot of onnx, including all the models in the backend tests
- No more MLOP convs, all HLOP (autodiff for convs)
- Improvements to shapetracker and symbolic engine
- 15% faster at running the openpilot model
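The idea behind TinyJit can be sketched as a caching decorator: capture the work on the first call, then replay it on later calls with matching argument shapes. This is a stand-in illustration under made-up names; the real TinyJit captures and replays GPU kernels, not Python function objects.

```python
# Toy shape-keyed "jit": capture on first call, replay on shape match.
# Stand-in only; tinygrad's TinyJit records launched GPU kernels.
import functools

def tiny_jit(fn):
    trace = {}
    @functools.wraps(fn)
    def wrapped(*args):
        key = tuple(len(a) for a in args)   # the "shape" of each argument
        if key not in trace:
            wrapped.captures += 1
            trace[key] = fn                 # "capture" (stand-in: keep fn)
        return trace[key](*args)            # "replay"
    wrapped.captures = 0
    return wrapped

@tiny_jit
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

r = dot([1, 2], [3, 4])   # first call with this shape: captures
dot([5, 6], [7, 8])       # same shapes: replayed, no new capture
```

Keying on shapes is also why a shape-changing model needs the Symbolic JIT from 0.7.0: with concrete keys, every new shape would trigger a fresh capture.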
tinygrad 0.4.0
So many changes since 0.3.0.
Fairly stable and correct, though still not fast. The hlops/mlops are solid, just needs work on the llops.
The first automated release, so hopefully it works?