[Inference] Integrate chat template in llm-on-ray #199
Conversation
Force-pushed from 41af76c to cc356f6 (compare)
default_chat_template: str = (
    "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    "{% if messages[0]['role'] == 'system' %}"
    "{% set loop_messages = messages[1:] %}"
    "{% set system_message = messages[0]['content'] %}"
    "{% else %}{% set loop_messages = messages %}"
    "{% set system_message = false %}{% endif %}"
    "{% for message in loop_messages %}"
    "{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}"
    "{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}"
    "{% endif %}"
    "{% if message['role'] == 'user' %}"
    "{{ '### Instruction: ' + message['content'].strip() }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ '### Response:' + message['content'].strip() }}"
    "{% endif %}{% endfor %}"
    "{% if add_generation_prompt %}{{'### Response:\n'}}{% endif %}"
)
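For readers unfamiliar with the format, the default template above can be exercised directly with jinja2 (the engine transformers uses for chat templates). The `raise_exception` helper is normally injected by transformers at render time, so this sketch supplies its own stand-in:

```python
from jinja2 import Environment


def raise_exception(message):
    # Stand-in for the helper transformers injects when rendering chat templates.
    raise ValueError(message)


default_chat_template = (
    "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    "{% if messages[0]['role'] == 'system' %}"
    "{% set loop_messages = messages[1:] %}"
    "{% set system_message = messages[0]['content'] %}"
    "{% else %}{% set loop_messages = messages %}"
    "{% set system_message = false %}{% endif %}"
    "{% for message in loop_messages %}"
    "{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}"
    "{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}"
    "{% endif %}"
    "{% if message['role'] == 'user' %}"
    "{{ '### Instruction: ' + message['content'].strip() }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ '### Response:' + message['content'].strip() }}"
    "{% endif %}{% endfor %}"
    "{% if add_generation_prompt %}{{'### Response:\n'}}{% endif %}"
)

env = Environment()
env.globals["raise_exception"] = raise_exception
rendered = env.from_string(default_chat_template).render(
    messages=[{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
)
print(rendered)
```

A single user turn with `add_generation_prompt=True` renders the intro line, an `### Instruction:` block, and a trailing `### Response:` cue for the model to complete.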
Do we need to set a default chat template? Is it better to load the model's default chat template? If a model does not have a chat template, we could also leave it as None by default.
Our priority order: user-configured chat_template > model's chat_template > our default template.
config:
  use_auth_token: ''
  chat_model_with_image: true
  chat_template: "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}"
I'm concerned this format may be difficult for users to understand and configure. Is there a way to simplify this setup?
I think I will create a new PR later and add Jinja usage instructions to the doc.
prompt:
  intro: 'Below is an instruction that describes a task. Write a response that appropriately
    completes the request.

    '
  human_id: '

    ### Instruction'
  bot_id: '

    ### Response'
  stop_words: []
Does gpt-j-6b have a default chat template in transformers? If not, please add a chat_template key for it. This is also required for the other models' configuration files.
Our default chat template is in gpt-j-6b format.
I also want to confirm whether the required UI changes will be implemented in a new PR. It seems that the UI cannot run successfully right now.
Yes, they have been implemented. I'll check the Web UI PR for CI issues.
Thank you @minmingzhu, this looks much clearer now. I left a few comments.
Force-pushed from 1b9a0b5 to 685c5f0 (compare)
Force-pushed from 7bc5fe5 to 591491b (compare)
human_id: ''
bot_id: ''
stop_words: []
chat_template: "llm_on_ray/common/templates/template_codellama.jinja"
We may need to use a path relative to the current yaml file; otherwise it's hard for users to specify the file, since they may put the template files elsewhere.
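As a sketch of this suggestion, a small helper (names hypothetical, not the project's actual code) could resolve a configured template path against the directory containing the yaml file, while leaving absolute paths untouched:

```python
from pathlib import Path


def resolve_template_path(yaml_path: str, chat_template: str) -> str:
    """Resolve chat_template relative to the yaml config file's directory."""
    candidate = Path(chat_template)
    if candidate.is_absolute():
        # Absolute paths are taken as-is.
        return str(candidate)
    # Relative paths are interpreted against the yaml file's directory.
    return str(Path(yaml_path).parent / candidate)
```

With this, a user could keep `template_codellama.jinja` next to the model's yaml file and reference it simply as `template_codellama.jinja`.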
tests/test_getting_started.sh
Outdated
@@ -33,7 +33,7 @@ curl $ENDPOINT_URL/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
     "model": "gpt2",
-    "messages": [{"role": "assistant", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
+    "messages": [{"role": "user", "content": "Hello!"}],
Were the previous messages a valid chat sequence? Why was this changed?
Chat messages must alternate user/assistant/user/assistant/..., so the sequence cannot start with an assistant message.
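The alternation rule the templates enforce can be sketched as a small predicate (a hypothetical helper, not code from this PR): after an optional leading system message, even positions must be user turns and odd positions assistant turns.

```python
def roles_alternate(messages):
    """True if roles go user, assistant, user, ... after an optional system turn."""
    # Skip a leading system message, mirroring the templates' loop_messages logic.
    body = messages[1:] if messages and messages[0]["role"] == "system" else messages
    return all(
        (m["role"] == "user") == (i % 2 == 0) for i, m in enumerate(body)
    )
```

This is why the old test payload, which opened with an assistant message, would make the template call `raise_exception`.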
@KepingYan could you check if this is the correct sequence? I am not sure why we need to change it.
llm_on_ray/inference/utils.py
Outdated
@@ -194,3 +195,15 @@ def module_import_and_init(module_name, clazz, **clazzs_kwargs):
     module = importlib.import_module(module_name)
     class_ = getattr(module, clazz)
     return class_(**clazzs_kwargs)
+
+
+def parse_jinja_file(chat_template: str):
Should this be Optional[str], since you are checking for None below? And should it return something if chat_template is None?
Our priority order for chat templates is as follows: user-configured template takes precedence over the model's template, which in turn takes precedence over our default template. Therefore, if the user-configured template is not provided (None), the default template must have a value.
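To make both branches of that contract explicit, here is a minimal sketch of `parse_jinja_file` with the `Optional[str]` signature suggested above; the file-reading body is an assumption for illustration, not the PR's actual implementation:

```python
from typing import Optional


def parse_jinja_file(chat_template: Optional[str]) -> Optional[str]:
    """Return the Jinja template text, or None when no template path is set."""
    if chat_template is None:
        # Returning None lets the caller fall through to the next template
        # in the priority chain (model's template, then the default).
        return None
    with open(chat_template, "r", encoding="utf-8") as f:
        return f.read()
```

Returning None on the missing-path branch is what makes the `or`-chained fallback below work.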
self.predictor.tokenizer.chat_template = (
parse_jinja_file(self.predictor.infer_conf.model_description.chat_template)
or self.predictor.tokenizer.chat_template
or parse_jinja_file(self.predictor.infer_conf.model_description.default_chat_template)
)
if input and isinstance(input[0], (ChatMessage, dict)): |
What does dict represent here? Is it different from ChatMessage?
ChatMessage is used by the OpenAI-compatible API; dict is used by the simple protocol.
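The distinction can be sketched with a small normalizer (hypothetical code: `ChatMessage` here is a dataclass stand-in for the actual pydantic model) that accepts either form and yields a plain dict:

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    # Stand-in for the OpenAI-protocol message model.
    role: str
    content: str


def to_dict(message):
    """Normalize a ChatMessage (OpenAI API) or dict (simple protocol) to a dict."""
    if isinstance(message, dict):
        return message
    return {"role": message.role, "content": message.content}
```

A normalizer like this would let the downstream template-rendering path handle both protocols uniformly.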
I see, could you add a comment explaining that?
Signed-off-by: minmingzhu <minming.zhu@intel.com>
2. add chat template unit test 3. fix comments Signed-off-by: minmingzhu <minming.zhu@intel.com>
Signed-off-by: minmingzhu <minming.zhu@intel.com>
* Update VLLM installation script and documentation Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> * nit Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> * Update vLLM installation message Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> * Update installation instructions for vLLM CPU Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> * Update Dockerfile.vllm Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> * Update VLLM version to 0.4.1 Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> * update doc Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> * nit Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> * nit Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> --------- Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* docker2sh test * codepath * codepath * codepath * add * add * add * add * add * add * df * docker.sh * docker bash * docker bash * docker bash * docker bash * inference docker bash * merge main0312 * merge main0312 * merge main0312 * test set-e * fix test * fix * fix * fix * test error * test error * add map * test install error * test install error * test install error * test install error * test * test * fix * fix * fix * only inference * fux * fux * fux * target * target * target * fix proxy * fix proxy * fix proxy * fix proxy * fix proxy * fix proxy * fix proxy * fix fuc * fix fuc * fix fuc * all inference * add finetune * fix * fix * fix * fix * fix finetune * fix finetune * fix review * fix review * fix review * add info output * Update proxy settings and Docker configurations Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> * fix vllm pr212 * fix * fix * change name --------- Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> Co-authored-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: minmingzhu <minming.zhu@intel.com>
Signed-off-by: minmingzhu <minming.zhu@intel.com>
* add llama-2-70b * nit * fix vllm inference ci * Revert "fix vllm inference ci" This reverts commit 36062bd.
Signed-off-by: minmingzhu <minming.zhu@intel.com>
Signed-off-by: minmingzhu <minming.zhu@intel.com>
Force-pushed from c6ef59c to 367d06a (compare)
Signed-off-by: minmingzhu <45281494+minmingzhu@users.noreply.github.com>
Signed-off-by: minmingzhu <minming.zhu@intel.com>
Signed-off-by: minmingzhu <minming.zhu@intel.com>
Signed-off-by: minmingzhu <minming.zhu@intel.com>
LGTM. Thanks for the work.