Issues: vllm-project/vllm
[New Model]: IBM Granite Code Models
new model (Requests for new models) · #5095 opened May 29, 2024 by Semihal

[Bug]: Can't run distributed inference with vLLM + Ray
bug (Something isn't working) · #5094 opened May 29, 2024 by linchen111

[Bug]: The implementation of DynamicNTKScalingRotaryEmbedding may have errors
bug · #5093 opened May 29, 2024 by macheng6

[Bug]: Gemma model fails with GPTQ marlin
bug · #5088 opened May 28, 2024 by arunpatala

[Installation]: Error when importing LLM from vllm
installation (Installation problems) · #5086 opened May 28, 2024 by manishkumar0709

[Feature]: Adopt Colossal Inference Features (55% speedup over vLLM)
feature request · #5085 opened May 28, 2024 by casper-hansen

[Bug]: vLLM disconnects after running for some time
bug · #5084 opened May 28, 2024 by zxcdsa45687

[Usage]: curl http://localhost:8000/generate returns {"detail":"Not Found"}; the generate endpoint cannot be used
usage (How to use vllm) · #5082 opened May 28, 2024 by fishingcatgo

[Usage]: Inference on Llama3-8b-Instruct using a LoRA adapter
usage · #5078 opened May 28, 2024 by jetlime

[Bug]: The inference speed of vLLM running command-r-plus-gptq is very slow
bug · #5076 opened May 28, 2024 by leoterry-ulrica

[Performance]: A few performance-related questions
performance (Performance-related issues) · #5072 opened May 27, 2024 by maxin9966

[Bug]: Build/install issues with pip install -e .
bug · #5071 opened May 27, 2024 by Msiavashi

[Bug]: The VRAM usage of calculating log_probs is not considered in profile run
bug · #5067 opened May 27, 2024 by Conless

[Bug]: Loading hangs indefinitely when loading model weights
bug · #5062 opened May 27, 2024 by tjrlwjd1

[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
bug · #5060 opened May 26, 2024 by heungson

[Bug]: Running vLLM on a Ray cluster, logging stuck at loading
bug · #5052 opened May 25, 2024 by maherr13

[Feature]: Add num_requests_preempted metric
feature request · #5051 opened May 25, 2024 by sathyanarays

[Bug]: vLLM errors at runtime with the latest NVIDIA driver 555.85
bug · #5035 opened May 24, 2024 by gaye746560359

[Bug]: Command-R incorrect output contains <EOS_TOKEN> and seems to do text prediction rather than conversation
bug · #5030 opened May 24, 2024 by epignatelli

[Usage]: With llama3, tokenizer.get_vocab() contains the token 'Ġor', but the vLLM server returns ' or' in the response
usage · #5028 opened May 24, 2024 by fengshansi
