Releases · EricLBuehler/mistral.rs
v0.1.10
What's Changed
- Fixes and verbosity improvements for device mapping by @EricLBuehler in #332
- chore: `SimpleModelPaths` should be renamed to `LocalModelPaths` by @polarathene in #331
- Remove candle-layer-norm dep by @EricLBuehler in #333
- Refactor layers.rs by @EricLBuehler in #338
- chore: Simplify `utils/token.rs:get_token()` by @polarathene in #328
- chore: Use `strum` to simplify `GGUFArchitecture` maintenance by @polarathene in #334
- Fix mistral model repeat kv by @EricLBuehler in #340
New Contributors
- @polarathene made their first contribution in #331
Full Changelog: v0.1.9...v0.1.10
v0.1.9
What's Changed
- Improve chat templates docs by @EricLBuehler in #327
- Use cuBLASlt in attention by @EricLBuehler in #325
Full Changelog: v0.1.8...v0.1.9
v0.1.8
Overview
- Documentation improvements
- Better handling of CTRL-C in interactive mode
- Matmul via low-precision kernels to take advantage of faster cuBLAS GEMM kernels (thanks @lucasavila00)
- New loading API (thanks @Jeadie)
- Various small bug fixes
- Reduce dependency complexity (thanks @LLukas22)
What's Changed
- bug fix: llama kv cache part by @keisuke-niimi-insightedge-jp in #300
- Refactor cache manager and kv cache by @EricLBuehler in #304
- Update the docs for ISQ and misc by @EricLBuehler in #310
- Make `pyo3` an optional dependency in `mistralrs-core` by @LLukas22 in #303
- Update kv cache by @EricLBuehler in #312
- Print gguf metadata consistently by @EricLBuehler in #313
- Allow loading LoRA without activating adapters and fix bugs by @EricLBuehler in #306
- Remove spurious tokenizer warnings by @EricLBuehler in #314
- Better handling of ctrlc by @EricLBuehler in #315
- Add analysis bot by @EricLBuehler in #316
- Quantized: Use cublas for prompt by @lucasavila00 in #238
- Support loading model into pipeline from local filesystem by @Jeadie in #308
- Fix the ctrlc handler by @EricLBuehler in #318
- Don't force QLlama to have >2 input dims by @Jeadie in #320
- Matmul via f16 when possible by @EricLBuehler in #317
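Several entries above touch ISQ (in-situ quantization: quantizing weights at load time rather than loading pre-quantized files). The toy round-trip below illustrates the general idea with symmetric per-tensor i8 quantization; it is a sketch of the technique, not mistral.rs's ISQ code or its quantization formats.

```rust
// Toy symmetric per-tensor quantization: map f32 weights onto i8 plus a
// single scale, then dequantize on use. Illustrative only.

fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    // Map [-max_abs, max_abs] onto [-127, 127].
    let max_abs = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights.iter().map(|w| (w / scale).round() as i8).collect();
    (q, scale)
}

fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.0, 0.25, 0.0];
    let (q, scale) = quantize_i8(&w);
    let back = dequantize_i8(&q, scale);
    // Round-trip error per element is bounded by half a quantization step.
    for (a, b) in w.iter().zip(back.iter()) {
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("q = {:?}, scale = {}", q, scale);
}
```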
New Contributors
- @keisuke-niimi-insightedge-jp made their first contribution in #300
- @Jeadie made their first contribution in #308
Full Changelog: v0.1.7...v0.1.8
v0.1.7
What's Changed
- Add terminate on next step handler via ctrlc by @EricLBuehler in #301
- Update containers to cuda 12.4 + Fix missing libraries by @LLukas22 in #302
Full Changelog: v0.1.6...v0.1.7
This release has relatively few changes; its major purpose is to update the containers and synchronize the versions.
v0.1.6
What's Changed
- Causal Masking and model selection from `.toml` files by @EricLBuehler in #278
- Remove sliding window mask from quantized phi3 by @EricLBuehler in #280
- Fix Causal Mask by @EricLBuehler in #282
- Fix mask caching by @EricLBuehler in #283
- More intelligent scheduler by @EricLBuehler in #279
- Use `warn!` macro by @EricLBuehler in #289
- Use a public repo for tests tokenizer.json by @EricLBuehler in #290
- Implement Speculative Decoding by @EricLBuehler in #242
- Add X-LoRA support for GGUF by @EricLBuehler in #293
- Add some "senseful" fallbacks for `isq` by @LLukas22 in #272
- Implement dynamic LoRA swapping by @EricLBuehler in #262
- More verbose logging when loading locally by @EricLBuehler in #298
- Make speculative decoding faster without anything fancy by @EricLBuehler in #297
- fix bug with mistralrs cuda by @joshpopelka20 in #299
New Contributors
- @joshpopelka20 made their first contribution in #299
New Features
- Speculative decoding introduced
- GGUF support for Phi 3
- Dynamic LoRA adapter activation support
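Speculative decoding, introduced above in #242, has a small draft model propose several tokens that the large target model then verifies in one pass, accepting the longest agreeing prefix. The sketch below uses hypothetical toy draft/target functions to show the accept-until-mismatch loop; it is an illustration of the general technique, not the mistral.rs implementation.

```rust
// Toy speculative decoding. The "draft" and "target" models here are
// hypothetical deterministic stand-ins for real language models.

fn draft_model(ctx: &[u32], k: usize) -> Vec<u32> {
    // Hypothetical cheap draft: predicts each next token as (last + 1) % 10.
    let mut out = Vec::with_capacity(k);
    let mut last = *ctx.last().unwrap_or(&0);
    for _ in 0..k {
        last = (last + 1) % 10;
        out.push(last);
    }
    out
}

fn target_model(ctx: &[u32]) -> u32 {
    // Hypothetical expensive target: agrees with the draft except after 5.
    let last = *ctx.last().unwrap_or(&0);
    if last == 5 { 0 } else { (last + 1) % 10 }
}

/// One speculative step: returns the tokens accepted this round.
fn speculative_step(ctx: &mut Vec<u32>, k: usize) -> Vec<u32> {
    let proposal = draft_model(ctx, k);
    let mut accepted = Vec::new();
    for &tok in &proposal {
        let verified = target_model(&[&ctx[..], &accepted[..]].concat());
        if verified == tok {
            accepted.push(tok); // draft agreed with target: keep it
        } else {
            accepted.push(verified); // disagreement: take target's token, stop
            break;
        }
    }
    ctx.extend_from_slice(&accepted);
    accepted
}

fn main() {
    let mut ctx = vec![3];
    // Draft proposes 4, 5, 6, 7; the target rejects after 5 and emits 0.
    let accepted = speculative_step(&mut ctx, 4);
    println!("accepted {:?}, context {:?}", accepted, ctx);
}
```

The payoff is that every step either accepts multiple draft tokens or still makes one token of progress from the target model, so the target's per-token cost amortizes when the draft is usually right.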
Full Changelog: v0.1.5...v0.1.6
v0.1.5
What's Changed
- Warmup pass for mistralrs-bench by @EricLBuehler in #270
- Fix short param conflict for LoRA by @EricLBuehler in #271
- Add build.rs to PyO3 to improve compat when extension_module by @EricLBuehler in #274
- Add the quantized phi3 model by @EricLBuehler in #276
Full Changelog: v0.1.4...v0.1.5
v0.1.4
What's Changed
- Major pipeline refactor by @EricLBuehler in #261
- docs: update README.md by @eltociear in #264
- Support EOF in interactive mode by @EricLBuehler in #267
- Fix concat in PhiRotaryEmbedding by @EricLBuehler in #268
- More organized config printing by @EricLBuehler in #269
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Add automatic pypi upload and docker build on release by @EricLBuehler in #255
- Update PyO3 to take dict by @EricLBuehler in #257
Full Changelog: v0.1.2...v0.1.3
v0.1.2
New features
- Initial `async` integrations (#198, #236), thanks to @lucasavila00
- More flexibility with `bos` and `eos` tokens (#248)
- Intermediate loading for ISQ models on CPU (#229)
- Fixed Phi 3 128k finally; it is fully working now! (#251)
Changelog
- Update README.md by @KPCOFGS in #224
- Fix api_dir_list! and show better error by @EricLBuehler in #225
- Default to `none` when cannot find token by @EricLBuehler in #226
- docs: update ADAPTER_MODELS.md by @eltociear in #227
- Fix debug log timing of first token by @lucasavila00 in #231
- Implement intermediate loading for ISQ on CPU by @EricLBuehler in #229
- Async sampling by @lucasavila00 in #198
- Fix quantized example by @lucasavila00 in #237
- Source bos, eos tokens from generation_config.json by @EricLBuehler in #243
- Sliding window for phi3 by @EricLBuehler in #244
- Fix docker images by @LLukas22 in #249
- Remove forced max seq len for llama models by @EricLBuehler in #250
- Fix Phi3 128k finally: use position ids to switch between short/long scaling by @EricLBuehler in #251
- Update README.md by @criminact in #253
- Async channels by @lucasavila00 in #236
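The sliding-window change for phi3 (#244) revolves around a mask that restricts each token to a fixed window of recent positions on top of the usual causal constraint. A minimal illustration of such a mask, as a sketch of the general mechanism rather than the repository's attention code:

```rust
// Toy sliding-window causal mask: token i may attend to token j only when
// j <= i (causality) and i - j < window (recency). Illustrative only.

fn sliding_window_mask(seq_len: usize, window: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| (0..seq_len).map(|j| j <= i && i - j < window).collect())
        .collect()
}

fn main() {
    // Print the 5x5 mask for a window of 3 (1 = attend, 0 = masked).
    for row in sliding_window_mask(5, 3) {
        let line: String = row.iter().map(|&b| if b { '1' } else { '0' }).collect();
        println!("{}", line);
    }
}
```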
New Contributors
- @KPCOFGS made their first contribution in #224
- @eltociear made their first contribution in #227
- @criminact made their first contribution in #253
Full Changelog: v0.1.0...v0.1.2
v0.1.0
Update version