
v0.13.0

@angeloskath angeloskath released this 10 May 01:21
· 35 commits to main since this release
8bd6bfa

Highlights

  • Block-sparse matrix multiplication speeds up mixture-of-experts (MoE) models by more than 2x
  • An improved quantization algorithm that should work well across networks
  • Improved GPU command submission speeds up both training and inference

Core

  • Added bitwise ops:
    • mx.bitwise_[or|and|xor], mx.[left|right]_shift, and the corresponding operator overloads
  • Added groups support to Conv1d
  • Added mx.metal.device_info for better-informed memory limits
  • Added resettable memory stats
  • Added mlx.optimizers.clip_grad_norm and mlx.utils.tree_reduce
  • Added mx.arctan2
  • Unary ops now accept array-like inputs, e.g. mx.sqrt(2)

Bugfixes

  • Fixed the output shape for slice updates
  • Fixed a bug in quantize that used slightly incorrect scales and biases
  • Fixed a memory leak for multi-output primitives encountered with gradient checkpointing
  • Fixed conversion from other frameworks for all data types
  • Fixed an index overflow in matmul with large batch sizes
  • Fixed an initialization ordering issue that occasionally caused segfaults