OpenAI API does not allow temperature=0.0 for llama-2-7b-chat-hf #139

Open
yutianchen666 opened this issue Mar 12, 2024 · 1 comment

@yutianchen666 (Collaborator)

When running the llama-2-7b-chat-hf model through the OpenAI API on gsm8k (a mathematical-reasoning benchmark), temperature needs to be set to 0.0.

But I get an unexpected error like this:

lm_eval --model local-chat-completions --model_args model=llama-2-7b-chat-hf,base_url=http://localhost:8000/v1 --task gsm8k
2024-03-12:16:09:56,344 INFO [main.py:225] Verbosity set to INFO
2024-03-12:16:09:56,344 INFO [init.py:373] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-03-12:16:10:01,070 INFO [main.py:311] Selected Tasks: ['gsm8k']
2024-03-12:16:10:01,070 INFO [main.py:312] Loading selected tasks...
2024-03-12:16:10:01,075 INFO [evaluator.py:129] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-03-12:16:10:01,419 INFO [evaluator.py:190] get_task_dict has been updated to accept an optional argument, task_managerRead more here:https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md#external-library-usage
2024-03-12:16:10:17,655 INFO [task.py:395] Building contexts for gsm8k on rank 0...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:06<00:00, 192.73it/s]
2024-03-12:16:10:24,524 INFO [evaluator.py:357] Running generate_until requests
0%| 2024-03-12:16:11:08,170 INFO [_client.py:1026] HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
2024-03-12:16:11:08,171 INFO [_base_client.py:952] Retrying request to /chat/completions in 0.788895 seconds
2024-03-12:16:11:09,010 INFO [_client.py:1026] HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
2024-03-12:16:11:09,011 INFO [_base_client.py:952] Retrying request to /chat/completions in 1.621023 seconds
2024-03-12:16:11:10,683 INFO [_client.py:1026] HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
Traceback (most recent call last):
  File "/home/yutianchen/Project/lm-evaluation-harness/lm_eval/models/utils.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "/home/yutianchen/Project/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 75, in completion
    return client.chat.completions.create(**kwargs)
  File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_utils/_utils.py", line 303, in wrapper
    return func(*args, **kwargs)
  File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 598, in create
    return self._post(
  File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 1088, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 853, in request
    return self._request(
  File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 916, in _request
    return self._retry_request(
  File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 958, in _retry_request
    return self._request(
  File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 916, in _request
    return self._retry_request(
  File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 958, in _retry_request
    return self._request(
  File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 930, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'generated_text': None, 'num_input_tokens': None, 'num_input_tokens_batch': None, 'num_generated_tokens': None, 'num_generated_tokens_batch': None, 'preprocessing_time': None, 'generation_time': None, 'timestamp': 1710259870.67887, 'finish_reason': None, 'error': {'object': 'error', 'message': 'Internal Server Error', 'internal_message': 'Internal Server Error', 'type': 'InternalServerError', 'param': {}, 'code': 500}}
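
For reference, the failing request can be reproduced directly with the OpenAI Python SDK. This is a minimal sketch; it assumes the same local llm-on-ray endpoint as the command above, and the API key is only a placeholder since the local server does not validate it:

from openai import OpenAI

# Point the SDK at the local llm-on-ray OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# This call returns HTTP 500 when temperature is exactly 0.0.
response = client.chat.completions.create(
    model="llama-2-7b-chat-hf",
    messages=[{"role": "user", "content": "What is 12 * 7?"}],
    temperature=0.0,
)
print(response.choices[0].message.content)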

The error is similar when testing temperature=0.0 directly with llm-on-ray's query_openai_sdk.py:

python examples/inference/api_server_openai/query_openai_sdk.py --model_name llama-2-7b-chat-hf --temperature 0.0
Traceback (most recent call last):
  File "/home/yutianchen/Project/latest_lib/llm-on-ray/examples/inference/api_server_openai/query_openai_sdk.py", line 98, in <module>
    for i in chunk_chat():
  File "/home/yutianchen/Project/latest_lib/llm-on-ray/examples/inference/api_server_openai/query_openai_sdk.py", line 75, in chunk_chat
    output = client.chat.completions.create(
  File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
  File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 663, in create
    return self._post(
  File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 1200, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 889, in request
    return self._request(
  File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 965, in _request
    return self._retry_request(
  File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 1013, in _retry_request
    return self._request(
  File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 965, in _request
    return self._retry_request(
  File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 1013, in _retry_request
    return self._request(
  File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 980, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'generated_text': None, 'num_input_tokens': None, 'num_input_tokens_batch': None, 'num_generated_tokens': None, 'num_generated_tokens_batch': None, 'preprocessing_time': None, 'generation_time': None, 'timestamp': 1710260245.4014304, 'finish_reason': None, 'error': {'object': 'error', 'message': 'Internal Server Error', 'internal_message': 'Internal Server Error', 'type': 'InternalServerError', 'param': {}, 'code': 500}}

Both the LLaMA (https://github.com/facebookresearch/llama/issues/687) and Transformers (https://github.com/huggingface/transformers/pull/25722) maintainers suggest "setting do_sample = False in case temperature = 0":
[Screenshot: Transformers guidance recommending do_sample=False when temperature is 0]
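
On the Hugging Face side, the recommended pattern is greedy decoding rather than temperature=0.0. A minimal sketch (the model id, prompt, and generation length are only for illustration):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is 12 * 7?", return_tensors="pt")
# Greedy decoding: deterministic output without passing temperature=0.0
# through the sampling path.
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))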

But the OpenAI API's
client.chat.completions.create(**kwargs)
does not expose a do_sample parameter, so there is no suitable argument to handle the temperature=0.0 case.
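
One possible server-side workaround is to translate temperature=0.0 from the OpenAI-style request into greedy decoding before it reaches the model. The sketch below is hypothetical (the helper name and defaults are not from llm-on-ray), but it illustrates the mapping this issue asks for:

from typing import Optional

def to_generate_kwargs(temperature: Optional[float],
                       top_p: Optional[float] = None,
                       max_new_tokens: int = 256) -> dict:
    """Map OpenAI-style sampling parameters to Hugging Face generate() kwargs."""
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature is None or temperature <= 0.0:
        # Treat temperature <= 0 as a request for greedy decoding,
        # as the LLaMA and Transformers maintainers suggest.
        kwargs["do_sample"] = False
    else:
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
        if top_p is not None:
            kwargs["top_p"] = top_p
    return kwargs

# temperature=0.0 becomes greedy decoding instead of an invalid sampling config.
print(to_generate_kwargs(0.0))  # {'max_new_tokens': 256, 'do_sample': False}
print(to_generate_kwargs(0.7))  # {'max_new_tokens': 256, 'do_sample': True, 'temperature': 0.7}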

@KepingYan (Contributor)

Please also paste the server log.
