
v0.13.0

@angeloskath angeloskath released this 10 May 01:21
· 35 commits to main since this release
8bd6bfa

Highlights

  • Block-sparse matrix multiplication speeds up mixture-of-experts (MoE) models by more than 2x
  • An improved quantization algorithm that should work well across networks
  • Improved GPU command submission speeds up both training and inference

Core

  • Added bitwise ops:
    • mx.bitwise_[or|and|xor], mx.[left|right]_shift, and the corresponding operator overloads
  • Added groups support to Conv1d
  • Added mx.metal.device_info for better-informed memory limits
  • Added resettable memory stats
  • Added mlx.optimizers.clip_grad_norm and mlx.utils.tree_reduce
  • Added mx.arctan2
  • Unary ops now accept array-like inputs, e.g. mx.sqrt(2)

Bugfixes

  • Fixed the output shape for slice updates
  • Fixed a bug in quantize that used slightly incorrect scales and biases
  • Fixed a memory leak for multi-output primitives encountered with gradient checkpointing
  • Fixed conversion from other frameworks for all data types
  • Fixed an index overflow in matmul with large batch sizes
  • Fixed an initialization ordering issue that occasionally caused segfaults