Intel® Extension for Transformers v1.4.1 Release

Released by @kevinintel · 21 Apr 08:38 · commit 0fc6e01

  • Highlights
  • Improvements
  • Examples
  • Bug Fixing

Highlights

  • Support Weight-only Quantization on MTL iGPU (see the usage sketch after this list)
  • Upgrade lm-eval to 0.4.2
  • Support Llama3
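
A minimal sketch of how the weight-only quantization and Llama3 highlights combine in the usual loading flow. The model id and prompt are illustrative; `load_in_4bit` applies weight-only quantization at load time, and placement on the MTL iGPU (e.g. a `device_map="xpu"` argument) may need extra setup that is not shown here.

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# Illustrative model id; any causal LM on the Hugging Face Hub should work.
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_4bit applies weight-only quantization while loading the checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```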

Improvements

  • Support TPP for Xeon Tensor Parallel (5f0430f)
  • Refine model from_pretrained behavior when use_neural_speed is set (39ecf38e; see the sketch after this list)
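
A minimal sketch of the from_pretrained path this refinement touches, assuming use_neural_speed is accepted as a keyword argument that toggles the Neural Speed backend (the model id is illustrative):

```python
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# Assumption for illustration: use_neural_speed=False keeps the plain PyTorch
# path, while the default routes 4-bit models through the Neural Speed backend.
model = AutoModelForCausalLM.from_pretrained(
    "Intel/neural-chat-7b-v3-1",  # illustrative model id
    load_in_4bit=True,
    use_neural_speed=False,
)
```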

Examples

  • Add vision front-end demo (1c6550)
  • Add example for table extraction and enable the multi-page table handling pipeline (db9e6fb)
  • Adapt the textual inversion distillation for quantization example to the latest transformers and diffusers packages (0ec83b1)
  • Update NeuralChat notebooks (83bb65a, 629b9d4; see the usage sketch after this list)
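
The NeuralChat notebooks build on the basic chatbot flow sketched below; build_chatbot with no arguments pulls a default model and configuration, so treat this as a minimal illustration rather than a step-by-step account of the notebooks.

```python
from intel_extension_for_transformers.neural_chat import build_chatbot

# Build a chatbot with the default configuration; pass a PipelineConfig to
# customize the model or enable plugins (not shown here).
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
print(response)
```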

Bug Fixing

  • Fix QBits actshuf buffer overflow with large batch sizes (a6f3ab3)
  • Fix TPP support for single socket (a690072)
  • Fix retrieval dependency (281b0a3)
  • Fix loading issue for weight-only quantized (WOQ) models with parameters (37f9db25)

Validated Configurations

  • Python 3.10
  • Ubuntu 22.04
  • PyTorch 2.2.0+cpu
  • Intel® Extension for PyTorch 2.2.0+cpu
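
A quick sketch for checking that a local environment matches the validated stack above; it assumes the standard import names for the packages listed.

```python
import platform
import torch
import intel_extension_for_pytorch as ipex

# Expected values per the validated configuration:
# Python 3.10.x, torch 2.2.0+cpu, intel_extension_for_pytorch 2.2.0+cpu.
print("Python:", platform.python_version())
print("PyTorch:", torch.__version__)
print("IPEX:", ipex.__version__)
```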