Releases · EricLBuehler/mistral.rs
v0.1.10
What's Changed
- Fixes and verbosity improvements for device mapping by @EricLBuehler in #332
- chore: `SimpleModelPaths` should be renamed to `LocalModelPaths` by @polarathene in #331
- Remove candle-layer-norm dep by @EricLBuehler in #333
- Refactor layers.rs by @EricLBuehler in #338
- chore: Simplify `utils/token.rs:get_token()` by @polarathene in #328
- chore: Use `strum` to simplify `GGUFArchitecture` maintenance by @polarathene in #334
- Fix mistral model repeat kv by @EricLBuehler in #340
New Contributors
- @polarathene made their first contribution in #331
Full Changelog: v0.1.9...v0.1.10
v0.1.9
What's Changed
- Improve chat templates docs by @EricLBuehler in #327
- Use cuBLASlt in attention by @EricLBuehler in #325
Full Changelog: v0.1.8...v0.1.9
v0.1.8
Overview
- Documentation improvements
- Better handling of CTRL-C in interactive mode
- Matmul via low-precision kernels to take advantage of faster cuBLAS GEMM kernels (thanks @lucasavila00)
- New loading API (thanks @Jeadie)
- Various small bug fixes
- Reduce dependency complexity (thanks @LLukas22)
What's Changed
- bug fix: llama kv cache part by @keisuke-niimi-insightedge-jp in #300
- Refactor cache manager and kv cache by @EricLBuehler in #304
- Update the docs for ISQ and misc by @EricLBuehler in #310
- Make `pyo3` an optional dependency in `mistralrs-core` by @LLukas22 in #303
- Update kv cache by @EricLBuehler in #312
- Print gguf metadata consistently by @EricLBuehler in #313
- Allow loading LoRA without activating adapters and fix bugs by @EricLBuehler in #306
- Remove spurious tokenizer warnings by @EricLBuehler in #314
- Better handling of ctrlc by @EricLBuehler in #315
- Add analysis bot by @EricLBuehler in #316
- Quantized: Use cublas for prompt by @lucasavila00 in #238
- Support loading model into pipeline from local filesystem by @Jeadie in #308
- Fix the ctrlc handler by @EricLBuehler in #318
- Don't force QLlama to have >2 input dims by @Jeadie in #320
- Matmul via f16 when possible by @EricLBuehler in #317
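Several entries above touch ISQ (in-situ quantization: quantizing weights at load time rather than loading pre-quantized files). The toy round-trip below illustrates the general idea with symmetric per-tensor i8 quantization; it is a sketch of the technique, not mistral.rs's ISQ code or its quantization formats.

```rust
// Toy symmetric per-tensor quantization: map f32 weights onto i8 plus a
// single scale, then dequantize on use. Illustrative only.

fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    // Map [-max_abs, max_abs] onto [-127, 127].
    let max_abs = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights.iter().map(|w| (w / scale).round() as i8).collect();
    (q, scale)
}

fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.0, 0.25, 0.0];
    let (q, scale) = quantize_i8(&w);
    let back = dequantize_i8(&q, scale);
    // Round-trip error per element is bounded by half a quantization step.
    for (a, b) in w.iter().zip(back.iter()) {
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("q = {:?}, scale = {}", q, scale);
}
```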
New Contributors
- @keisuke-niimi-insightedge-jp made their first contribution in #300
- @Jeadie made their first contribution in #308
Full Changelog: v0.1.7...v0.1.8
v0.1.7
What's Changed
- Add terminate on next step handler via ctrlc by @EricLBuehler in #301
- Update containers to cuda 12.4 + Fix missing libraries by @LLukas22 in #302
Full Changelog: v0.1.6...v0.1.7
This release has relatively few changes; its major purpose is to update the containers and synchronize the versions.
v0.1.6
What's Changed
- Causal Masking and model selection from `.toml` files by @EricLBuehler in #278
- Remove sliding window mask from quantized phi3 by @EricLBuehler in #280
- Fix Causal Mask by @EricLBuehler in #282
- Fix mask caching by @EricLBuehler in #283
- More intelligent scheduler by @EricLBuehler in #279
- Use `warn!` macro by @EricLBuehler in #289
- Use a public repo for tests tokenizer.json by @EricLBuehler in #290
- Implement Speculative Decoding by @EricLBuehler in #242
- Add X-LoRA support for GGUF by @EricLBuehler in #293
- Add some "senseful" fallbacks for `isq` by @LLukas22 in #272
- Implement dynamic LoRA swapping by @EricLBuehler in #262
- More verbose logging when loading locally by @EricLBuehler in #298
- Make speculative decoding faster without anything fancy by @EricLBuehler in #297
- fix bug with mistralrs cuda by @joshpopelka20 in #299
New Contributors
- @joshpopelka20 made their first contribution in #299
New Features
- Speculative decoding introduced
- GGUF support for Phi 3
- Dynamic LoRA adapter activation support
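Speculative decoding, introduced above in #242, has a small draft model propose several tokens that the large target model then verifies in one pass, accepting the longest agreeing prefix. The sketch below uses hypothetical toy draft/target functions to show the accept-until-mismatch loop; it is an illustration of the general technique, not the mistral.rs implementation.

```rust
// Toy speculative decoding. The "draft" and "target" models here are
// hypothetical deterministic stand-ins for real language models.

fn draft_model(ctx: &[u32], k: usize) -> Vec<u32> {
    // Hypothetical cheap draft: predicts each next token as (last + 1) % 10.
    let mut out = Vec::with_capacity(k);
    let mut last = *ctx.last().unwrap_or(&0);
    for _ in 0..k {
        last = (last + 1) % 10;
        out.push(last);
    }
    out
}

fn target_model(ctx: &[u32]) -> u32 {
    // Hypothetical expensive target: agrees with the draft except after 5.
    let last = *ctx.last().unwrap_or(&0);
    if last == 5 { 0 } else { (last + 1) % 10 }
}

/// One speculative step: returns the tokens accepted this round.
fn speculative_step(ctx: &mut Vec<u32>, k: usize) -> Vec<u32> {
    let proposal = draft_model(ctx, k);
    let mut accepted = Vec::new();
    for &tok in &proposal {
        let verified = target_model(&[&ctx[..], &accepted[..]].concat());
        if verified == tok {
            accepted.push(tok); // draft agreed with target: keep it
        } else {
            accepted.push(verified); // disagreement: take target's token, stop
            break;
        }
    }
    ctx.extend_from_slice(&accepted);
    accepted
}

fn main() {
    let mut ctx = vec![3];
    // Draft proposes 4, 5, 6, 7; the target rejects after 5 and emits 0.
    let accepted = speculative_step(&mut ctx, 4);
    println!("accepted {:?}, context {:?}", accepted, ctx);
}
```

The payoff is that every step either accepts multiple draft tokens or still makes one token of progress from the target model, so the target's per-token cost amortizes when the draft is usually right.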
Full Changelog: v0.1.5...v0.1.6
v0.1.5
What's Changed
- Warmup pass for mistralrs-bench by @EricLBuehler in #270
- Fix short param conflict for LoRA by @EricLBuehler in #271
- Add build.rs to PyO3 to improve compat when extension_module by @EricLBuehler in #274
- Add the quantized phi3 model by @EricLBuehler in #276
Full Changelog: v0.1.4...v0.1.5
v0.1.4
What's Changed
- Major pipeline refactor by @EricLBuehler in #261
- docs: update README.md by @eltociear in #264
- Support EOF in interactive mode by @EricLBuehler in #267
- Fix concat in PhiRotaryEmbedding by @EricLBuehler in #268
- More organized config printing by @EricLBuehler in #269
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Add automatic pypi upload and docker build on release by @EricLBuehler in #255
- Update PyO3 to take dict by @EricLBuehler in #257
Full Changelog: v0.1.2...v0.1.3
v0.1.2
New features
- Initial `async` integrations (#198, #236), thanks to @lucasavila00
- More flexibility with `bos` and `eos` tokens (#248)
- Intermediate loading for ISQ models on CPU (#229)
- Fixed Phi 3 128k finally; it is fully working now! (#251)
Changelog
- Update README.md by @KPCOFGS in #224
- Fix api_dir_list! and show better error by @EricLBuehler in #225
- Default to `none` when cannot find token by @EricLBuehler in #226
- docs: update ADAPTER_MODELS.md by @eltociear in #227
- Fix debug log timing of first token by @lucasavila00 in #231
- Implement intermediate loading for ISQ on CPU by @EricLBuehler in #229
- Async sampling by @lucasavila00 in #198
- Fix quantized example by @lucasavila00 in #237
- Source bos, eos tokens from generation_config.json by @EricLBuehler in #243
- Sliding window for phi3 by @EricLBuehler in #244
- Fix docker images by @LLukas22 in #249
- Remove forced max seq len for llama models by @EricLBuehler in #250
- Fix Phi3 128k finally: use position ids to switch between short/long scaling by @EricLBuehler in #251
- Update README.md by @criminact in #253
- Async channels by @lucasavila00 in #236
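The sliding-window change for phi3 (#244) revolves around a mask that restricts each token to a fixed window of recent positions on top of the usual causal constraint. A minimal illustration of such a mask, as a sketch of the general mechanism rather than the repository's attention code:

```rust
// Toy sliding-window causal mask: token i may attend to token j only when
// j <= i (causality) and i - j < window (recency). Illustrative only.

fn sliding_window_mask(seq_len: usize, window: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| (0..seq_len).map(|j| j <= i && i - j < window).collect())
        .collect()
}

fn main() {
    // Print the 5x5 mask for a window of 3 (1 = attend, 0 = masked).
    for row in sliding_window_mask(5, 3) {
        let line: String = row.iter().map(|&b| if b { '1' } else { '0' }).collect();
        println!("{}", line);
    }
}
```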
New Contributors
- @KPCOFGS made their first contribution in #224
- @eltociear made their first contribution in #227
- @criminact made their first contribution in #253
Full Changelog: v0.1.0...v0.1.2
v0.1.0
Update version