[Bug]: I have created a docker image of 0.2.0 and ran same model - neuralmagic/OpenHermes-2.5-Mistral-7B-marlin, it returns NULL #185
Comments
Hey @sungkim11, thanks for reporting the issue. Can you share:
I just ran the following to install:

```shell
python3 -m venv env
source env/bin/activate
pip install nm-vllm
```

And then the following for inference, and it seemed to be okay:

```python
from vllm import LLM

model = LLM("neuralmagic/OpenHermes-2.5-Mistral-7B-marlin", max_model_len=4096)
output = model.generate("Hello my name is")
print(output[0].outputs[0].text)
# >> Marissa Cariaga, I am 18 years old. I
```
Note: we also have a pre-made Docker image which should have everything you need.

```shell
docker run \
    --gpus all \
    --shm-size \
    ghcr.io/neuralmagic/nm-vllm-openai:v0.2.0 --model neuralmagic/OpenHermes-2.5-Mistral-7B-marlin --max-model-len 4096
```
I did the following today:
Inference:
It works fine with the 0.1.0 image, but returns a bunch of blank lines in 0.2.0.
You can pull the docker image from Docker Hub -> sungkimmw/nm-vllm-openai:0.2.0 and sungkimmw/nm-vllm:latest (for 0.1.0). I need to delete the 0.1.0
I tried "ghcr.io/neuralmagic/nm-vllm-openai:v0.2.0" and I am getting a bunch of blank lines.
Just tried each of these:
With this client:

```python
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

chat_completion = client.chat.completions.create(
    messages=[{
        "role": "system",
        "content": "You are a helpful assistant."
    }, {
        "role": "user",
        "content": "Who won the world series in 2020?"
    }, {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
    }, {
        "role": "user",
        "content": "Where was it played?"
    }],
    model=model,
)

print("Chat completion results:")
print(chat_completion)
```

And got:

```
ChatCompletion(id='cmpl-00e280aade9d47448768b2d903b9b04a', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The 2020 World Series was played in Arlington, Texas. The games took place at Globe Life Field and Globe Life Park, both of which are located in Arlington. This was due to the COVID-19 pandemic, which led to the games being played at a neutral site in order to minimize travel and potential exposure to the virus.', role='assistant', function_call=None, tool_calls=None), stop_reason=None)], created=1712972481, model='neuralmagic/OpenHermes-2.5-Mistral-7B-marlin', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=75, prompt_tokens=67, total_tokens=142))
```
Could you provide the exact client code you are running?
Thanks. Reproduced. Everything is working fine with the completions API, but not with the chat completions API. I believe I know what caused this issue and will work on resolving it.
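For anyone following along, a minimal sketch of why the two endpoints can behave differently (the payload shapes below are illustrative, not the maintainer's diagnosis): the completions API sends a raw prompt string straight to the model, while the chat completions API sends role-tagged messages that the server first renders through the model's chat template, so a bug in that extra step only shows up on the chat path.

```python
# Illustrative request bodies for the two OpenAI-compatible endpoints.
# The model name matches the one used in this thread.

# /v1/completions: a raw prompt, passed to the model as-is.
completions_payload = {
    "model": "neuralmagic/OpenHermes-2.5-Mistral-7B-marlin",
    "prompt": "Hello my name is",
    "max_tokens": 32,
}

# /v1/chat/completions: structured messages; the server applies the
# model's chat template to turn them into a single prompt string.
chat_payload = {
    "model": "neuralmagic/OpenHermes-2.5-Mistral-7B-marlin",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
    "max_tokens": 32,
}

# The blank-output bug reported here surfaces only on the chat path.
assert "prompt" in completions_payload and "messages" in chat_payload
```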
Thank you for working on this.
No problem. Thank you for reporting it :)
Okay, I tried with:
For whatever model and version of vLLM that I used (upstream or downstream), I had issues when the following was included:

```python
response_format={
    "type": "json_object"
},
```

Whenever this was removed, things worked properly. JSON guided decoding is a relatively new feature in vLLM. I am going to dive in tomorrow to see if I can debug and let the other maintainers know about the issue.
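Since the failure mode here is a blank reply when `response_format={"type": "json_object"}` is set, one cheap client-side check (a sketch of my own, not something from this thread — the helper name is made up) is to verify whether the returned text actually parses as JSON before using it:

```python
import json


def looks_like_json(text: str) -> bool:
    """Return True if the model's reply parses as a JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The symptom in this thread: a blank/whitespace-only reply, not valid JSON.
print(looks_like_json("   "))                                  # False
print(looks_like_json('{"winner": "Dodgers", "year": 2020}'))  # True
```

A guard like this makes the bug visible immediately instead of failing later wherever the empty string is consumed.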
I was wondering why it was not returning JSON as requested.
@sungkim11 I am working with the upstream maintainers to look into this.
Thank you! I was wondering why I was getting blanks from vLLM as well. This bug may have originated from there.
Your current environment
🐛 Describe the bug
I created a Docker image of 0.2.0 and ran the same model, neuralmagic/OpenHermes-2.5-Mistral-7B-marlin; it returns a series of blank spaces, whereas 0.1.0 works fine.