Releases: alibaba/rtp-llm
v0.1.13
feat
- support gte-Qwen1.5-7B-instruct
- support Qwen1.5-MoE
fix
- fix V100 performance
- fix MULTI_TASK_PROMPT and MULTI_TASK_PROMPT_STR env
- fix starcoder-7b load failure
- fix llava renderer sep
- fix split_k_factor
v0.1.12
feature:
- support new models: llama3 / code-qwen2 / cohere
bug fix:
- fix bloom weight loading error
- fix temperature not taking effect
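The temperature fix above concerns standard sampling-temperature scaling. As a general illustration of the technique (not rtp-llm's actual implementation): logits are divided by the temperature before softmax, so values above 1 flatten the distribution and values below 1 sharpen it — a temperature that "does not take effect" leaves the distribution unchanged regardless of the setting.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs_sharp = softmax_with_temperature([2.0, 1.0, 0.0], temperature=0.5)
probs_flat = softmax_with_temperature([2.0, 1.0, 0.0], temperature=2.0)
print(max(probs_sharp) > max(probs_flat))  # lower temperature -> sharper peak
```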
v0.1.10
feat
- sp support TP
- support tie_word_embeddings option in hf config.json
- update transformers version to 4.39.3
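For context, `tie_word_embeddings` is a standard Hugging Face `config.json` field: when true, the output projection (lm_head) shares its weights with the input token embedding. A minimal illustrative fragment (the `model_type` value here is just an example):

```json
{
  "model_type": "qwen2",
  "tie_word_embeddings": true
}
```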
refactor
- add logs for weight loading: lora apply success / missing weights
fix
- lora: support the case where one of the q/k/v weights is missing
docs
v0.1.9
feat
- support awq
- mv attention mask when using FMHA
- support sparse & roberta embeddings; support similarity calculation
refactor
- use asyncio.Future to avoid exclusive resource locking
- move the asyncio lock into asyncmodel
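The `asyncio.Future` refactor above follows a common pattern. A minimal sketch of that general pattern (illustrative only, not rtp-llm's actual code): the first caller for a key creates a Future and performs the work; concurrent callers await the same Future instead of serializing behind an exclusive lock.

```python
import asyncio

# Maps a key to the Future that will hold its result; concurrent callers
# for the same key all await this one Future.
_pending: dict = {}

async def load_once(key: str) -> str:
    fut = _pending.get(key)
    if fut is None:
        fut = asyncio.get_running_loop().create_future()
        _pending[key] = fut
        await asyncio.sleep(0)  # simulate the expensive work
        fut.set_result(f"result-for-{key}")
    return await fut

async def main():
    # Three concurrent callers; the work runs only once.
    return await asyncio.gather(*(load_once("model") for _ in range(3)))

results = asyncio.run(main())
print(results)
```

The key difference from a lock is that waiters do not take turns re-checking the resource: they all resume as soon as the single computation resolves the Future.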
fix
- temporarily pin the filelock version
- fix moe model size
- add headers for image downloading
- update whl version
- cutlass interface
docs
v0.1.8
feat
- support qwen2 gptq
- update multi_task_prompt creation
- speculative decoding supports TP
- support roberta
refactor
- refactor multimodal model process
fix
- fix kv cache int8 bug: add dequantization method in reuse block scenario
- fix stream output stop words
- fix lora
v0.1.7
features
- support int4 (experimental) on Qwen GPTQ
- support V100 fmha
- support Bert
- optimize ViT engine with TensorRT
refactor
- refactor scheduling strategy; allocate KV cache when scheduling a new stream
- refactor MOE
docs
v0.1.6
features
- support starcoder2
- support gemma
fixes
- fix lora merge
- fix num_return_sequences 1
- fix query cancellation not releasing resources
- fix TP block num sync
- fix rotary embedding dim 64 for some models
v0.1.5
features
- refactor a large amount of the server code
fixes
- fix inference server concurrency limit not decreasing
- cancel request correctly when client disconnected
- fix ptuning with separate path
v0.1.4
features
- support qwen2
- support qwen 1.8b vl
- add throughput test
fixes
- chatglm3 not producing correct output
- potential error when pydantic>=2.6.0
- concurrency controller not working correctly