
v1.6.0 - Llama3 and Qwen2 series models supported.

@Duyi-Wang released this 26 Apr 07:48 · 27 commits to main since this release · f9cdcba


Functionality

  • Support Llama3 and Qwen2 series models.
  • Add an INT8 KV cache data type; use the kv_cache_dtype parameter to select it, choosing from int8, fp16 (default), and fp32.
  • Enable the full BF16 pipeline for more models, including ChatGLM2/3 and yarn-llama.
  • Add invokeMLPLLaMA FP16 API.
  • Support logits output via the forward() API.
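The kv_cache_dtype parameter above accepts int8, fp16 (the default), and fp32. As a minimal illustration of those choices, here is a hypothetical validator helper (not part of xFasterTransformer's actual API) that mirrors the documented options:

```python
# Hypothetical helper, not part of xFasterTransformer: it only mirrors
# the kv_cache_dtype choices documented in these release notes.
ALLOWED_KV_CACHE_DTYPES = ("int8", "fp16", "fp32")


def resolve_kv_cache_dtype(kv_cache_dtype: str = "fp16") -> str:
    """Return a validated KV cache dtype string, defaulting to fp16."""
    if kv_cache_dtype not in ALLOWED_KV_CACHE_DTYPES:
        raise ValueError(
            f"kv_cache_dtype must be one of {ALLOWED_KV_CACHE_DTYPES}, "
            f"got {kv_cache_dtype!r}"
        )
    return kv_cache_dtype
```

Choosing int8 trades a small amount of accuracy for roughly half the KV cache memory of fp16; the actual argument name and accepted values are as stated in the release notes above.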

Dependency

  • Bump transformers to 4.40.0 to support Llama3 models.

Performance

  • Update xDNN to release v1.4.6.

Bug fixes

  • Fix numeric overflow when calculating softmax in sampling.
  • Fix an assertion bug when concatenating gate & up weights.
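The softmax overflow fixed above is a classic failure mode: exponentiating large logits directly overflows the floating-point range. The standard remedy, sketched below (this is not the project's actual code), is to subtract the row maximum before calling exp, which leaves the result unchanged mathematically but keeps every exponent non-positive:

```python
import math


def stable_softmax(logits):
    """Numerically stable softmax over a list of floats."""
    # Subtracting the max makes every exponent <= 0, so exp() cannot
    # overflow even for very large logits; the result is identical
    # because the common factor exp(m) cancels in the ratio.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With naive softmax, inputs like [1000.0, 1000.0] overflow (math.exp(1000) raises OverflowError); the stabilized version returns [0.5, 0.5] as expected.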

What's Changed

Generated release notes

New Contributors

Full Changelog: v1.5.0...v1.6.0