Issues: vllm-project/vllm
- #4789 [Bug]: Async engine hangs with 0.4.* releases (bug) - opened May 13, 2024 by glos-nv
- #4786 [Bug]: RAM OOM error loading 480GB MoE model despite fix in PR #1395 (bug) - opened May 13, 2024 by hxer7963
- #4785 [Bug]: multi-GPU for baichuan2-13B-Chat benchmark_serving (bug) - opened May 13, 2024 by shudct
- #4784 [Bug]: deploying Phi-3-mini-128k-instruct raises AssertionError (bug) - opened May 13, 2024 by hxujal
- #4783 [Usage]: How to change the batch size when testing vLLM throughput with benchmark_throughput (usage) - opened May 13, 2024 by Ourspolaire1
- #4782 [Doc]: Doc for using tensorizer_uri with LLM is incorrect (documentation) - opened May 13, 2024 by GRcharles
- #4777 [Feature]: Support the OpenAI Batch Chat Completions file format (feature request) - opened May 13, 2024 by wuisawesome
- #4772 [Bug]: Unexpected special tokens in prompt_logprobs output for Llama3 prompt (bug) - opened May 12, 2024 by leejamesss
- #4770 [Feature]: CI: Test on NVLink-enabled machine (feature request) - opened May 12, 2024 by youkaichao
- #4766 [Feature]: Could paged_attention_v1 support the parameter 'attn_bias'? (feature request) - opened May 11, 2024 by cillinzhang
- #4763 [Feature]: Support W4A8KV4 quantization (QServe/QoQ) (feature request) - opened May 11, 2024 by bratao
- #4760 [Performance]: Why is the average generation throughput low? (performance) - opened May 11, 2024 by rvsh2
- #4756 [Bug]: CUDA error when running mistral-7b + LoRA with tensor_para=8 (bug) - opened May 11, 2024 by sfc-gh-zhwang
- #4755 Regression in support of customized "role" in OpenAI-compatible API (v0.4.2) (good first issue) - opened May 10, 2024 by simon-mo
- #4744 [Usage]: vLLM AutoAWQ with 4 GPUs doesn't utilize the GPUs (usage) - opened May 10, 2024 by danielstankw
- #4743 [RFC]: Support specifying quant_config details in the LLM or Server entrypoints (feature request, RFC) - opened May 10, 2024 by mgoin
- #4742 [Bug]: ValueError when using LoRA with CohereForCausalLM model (bug) - opened May 10, 2024 by onlyfish79
- #4741 [Bug]: SqueezeLLM with sparse does not work (bug) - opened May 10, 2024 by RyanWMHI
- #4740 [Bug]: Why do the logits differ between 0.4.1 and 0.4.2? (bug) - opened May 10, 2024 by sitabulaixizawaluduo
- #4739 [New Model]: BLIP-2 support required (new model) - opened May 10, 2024 by anisingh1
- #4736 [New Model]: fastspeech2_conformer (needs a new attention mechanism: RelPositionMultiHeadedAttention) (new model) - opened May 10, 2024 by cillinzhang
- #4731 [Bug]: The paged_attention version used is inconsistent between enforce_eager=True and enforce_eager=False (bug) - opened May 10, 2024 by liangxuegang