tinygrad 0.7.0

@wozeparrot released this 27 Aug 16:40 · 2161 commits to master since this release · 8b354b3

Bigger again at 4311 lines :( But tons of new features this time!

Just over 500 commits since 0.6.0.

Release Highlights

  • Windows support has been dropped to focus on Linux and macOS.
    • Some functionality may still work on Windows, but no support will be provided; use WSL instead.
  • DiskTensors: a way to store tensors on disk has been added.
    • This is coupled with functionality in state.py that supports saving/loading safetensors and loading torch weights (sketched below the list).
  • Tensor Cores are supported on M1/Apple Silicon and on the 7900 XTX (WMMA).
    • Support on the 7900 XTX requires weights and data to be in float16; full float16 compute support will come in a later release.
    • Tensor Core behaviour/usage is controlled by the TC envvar.
  • Kernel optimization with nevergrad.
    • This optimizes the shapes going into the kernel, gated by the KOPT envvar.
  • P2P buffer transfers are supported on most AMD GPUs when using a single python process.
    • This is controlled by the P2P envvar.
  • LLaMA 2 support.
    • A requirement of this is bfloat16 support for loading the weights; this is semi-supported by casting them to float16, and proper bfloat16 support is tracked in #1290.
    • The LLaMA example now also supports 8-bit quantization using the flag --quantize.
  • Most MLPerf models have working inference examples. Training these models is currently being worked on.
  • Initial multigpu training support.
    • Training is currently slow, as data is copied through host shared memory.
    • Somewhat follows torch's multiprocessing and DistributedDataParallel high-level design.
    • See the hlb_cifar10.py example.
  • SymbolicShapeTracker and Symbolic JIT.
    • Combined, these allow models with changing shapes, such as transformers, to be jitted.
    • This means that LLaMA can now be jitted for a massive increase in performance (a basic JIT sketch follows the list).
    • Be warned that the API for this is very WIP and may change in the future, similarly with the rest of the tinygrad API.
  • aarch64 and PTX assembly backends.
  • WebGPU backend, see the compile_efficientnet.py example.
  • Support for torch-like indexing of tensors by other tensors (example after the list).
  • Some more nn layers were promoted, namely Embedding and various Conv layers.
  • VITS and so-vits-svc examples added.
  • Initial documentation work.
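
A minimal sketch of the DiskTensor/state.py workflow mentioned above, assuming the 0.7.0 import path tinygrad.state and its safe_save/safe_load/get_state_dict/load_state_dict helpers, with a single Linear layer standing in for a real model:

```python
# Sketch: save a model's weights as safetensors and load them back via state.py.
# The import path tinygrad.state is assumed from 0.7.0; it may move in later releases.
from tinygrad.nn import Linear
from tinygrad.state import get_state_dict, load_state_dict, safe_save, safe_load

model = Linear(128, 10)

# collect every named tensor in the model and write them to a safetensors file
safe_save(get_state_dict(model), "/tmp/model.safetensors")

# safe_load reads the file from disk and returns a name -> Tensor dict
load_state_dict(model, safe_load("/tmp/model.safetensors"))
```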
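
The symbolic shape API itself is still in flux, but basic JIT usage looks roughly like this sketch, assuming the TinyJit decorator from tinygrad.jit and a compiled backend (GPU/METAL/CLANG); the function is captured on its early calls and replayed as cached kernels afterwards:

```python
# Rough sketch of jitting a step function with TinyJit (API is WIP and may change).
from tinygrad.tensor import Tensor
from tinygrad.jit import TinyJit

@TinyJit
def step(x: Tensor) -> Tensor:
  # outputs of a jitted function should be realized before being returned
  return (x @ x.transpose()).relu().realize()

for _ in range(5):
  out = step(Tensor.randn(4, 64))  # after warmup, the captured kernels are replayed
```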
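
Tensor-by-tensor indexing now behaves roughly like torch's fancy indexing; a small sketch (exact dtype and edge-case behavior may differ):

```python
# Sketch: torch-like indexing of one tensor by another.
from tinygrad.tensor import Tensor

t = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
idx = Tensor([2, 0])     # pick row 2, then row 0
print(t[idx].numpy())    # selects rows [7, 8, 9] and [1, 2, 3]
```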

And lots of small optimizations all over the codebase.

See the full changelog: v0.6.0...v0.7.0

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the Discord!