30 Apr 06:38

zerozw

366e1b5

v0.1.13

feat

support gte-Qwen1.5-7B-instruct
support Qwen1.5-MoE

fix

fix V100 performance
fix MULTI_TASK_PROMPT and MULTI_TASK_PROMPT_STR env
fix starcoder-7b load failure
fix llava renderer sep
fix split_k_factor

21 Apr 11:08

baowendin

d10c98c

v0.1.12 Latest

feature:

support new models: llama3/code-qwen2/cohere

bug fix:

fix bloom weight loading error
fix temperature not taking effect

12 Apr 09:50

netaddi

01b7919

v0.1.11

fix

int4 tp issue

07 Apr 15:05

jianglan89

3a81bbe

v0.1.10

feat

sp support TP
support tie_word_embeddings option in hf config.json
update transformers version to 4.39.3

refactor

add logs for weight loading: lora applied successfully / missing weight

fix

lora: support the case where one of the q/k/v weights is missing

docs

add Quantization docs

01 Apr 03:42

ySingularity

6def4b3

v0.1.9

feat

support awq
mv attention mask when using FMHA
support sparse & roberta embedding; support similarity calculation

refactor

use asyncio.Future to avoid resource exclusivity
mv asyncio lock to asyncmodel

fix

temporary fix for filelock version
moe model size
add headers for image downloading
update whl version
cutlass interface

docs

update pipeline usage

25 Mar 13:32

xinfeishi

796698a

v0.1.8

feat

support qwen2 gptq
update multi_task_prompt create
speculative support tp
support roberta

refactor

refactor multimodal model process

fix

fix kv cache int8 bug: add dequantization method in reuse block scenario
fix stream output stop words
fix lora

19 Mar 02:53

dongjiyingdjy

5b6b9f2

v0.1.7

features

support int4 (experimental) on Qwen GPTQ
support V100 fmha
support Bert
Optimize VIT Engine by TensorRT

refactor

refactor schedule strategy, malloc kv cache in schedule new stream
refactor MOE

docs

update supported models

09 Mar 07:06

zerozw

de4761a

v0.1.6

features

support starcoder2
support gemma

fixes

fix lora merge
fix num_return_sequences 1
fix query cancel not releasing resources
fix tp block num sync
fix some model rotary embedding dim 64

01 Mar 09:25

baowendin

688b31e

v0.1.5

features

refactor a large amount of server code

fixes

fix inference server concurrency limit not decreasing
cancel request correctly when client disconnects
fix ptuning with separate path

26 Feb 06:15

netaddi

0df9101

v0.1.4

features

support qwen 2
support qwen 1b8 vl
add throughput test

fixes

chatglm3 not outputting correctly
potential error when pydantic>=2.6.0
concurrency controller not working correctly

Releases: alibaba/rtp-llm
