
[Inference ] Integrate chat template in llm-on-ray #199

Merged
merged 46 commits into intel:main on May 16, 2024

Conversation

minmingzhu (Collaborator)

No description provided.

@minmingzhu force-pushed the inference_chat_template branch 4 times, most recently from 41af76c to cc356f6 on April 28, 2024 05:50
Comment on lines 117 to 125
default_chat_template: str = (
"Below is an instruction that describes a task. Write a response that appropriately completes the request."
"{% if messages[0]['role'] == 'system' %}"
"{% set loop_messages = messages[1:] %}"
"{% set system_message = messages[0]['content'] %}"
"{% else %}{% set loop_messages = messages %}"
"{% set system_message = false %}{% endif %}"
"{% for message in loop_messages %}"
"{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}"
"{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}"
"{% endif %}"
"{% if message['role'] == 'user' %}"
"{{ '### Instruction: ' + message['content'].strip() }}"
"{% elif message['role'] == 'assistant' %}"
"{{ '### Response:' + message['content'].strip() }}"
"{% endif %}{% endfor %}"
"{% if add_generation_prompt %}{{'### Response:\n'}}{% endif %}"
)
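For context, a minimal sketch (not from this PR) of how a Jinja chat template string like the one above gets rendered. transformers does this internally in tokenizer.apply_chat_template() and injects the raise_exception helper; here we emulate that with jinja2 directly, using a shortened template:

from jinja2 import Environment

def raise_exception(message):
    # transformers injects this helper when rendering chat templates
    raise ValueError(message)

env = Environment()
env.globals["raise_exception"] = raise_exception
template = env.from_string(
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}{{ '### Instruction: ' + message['content'].strip() }}"
    "{% elif message['role'] == 'assistant' %}{{ '### Response: ' + message['content'].strip() }}"
    "{% endif %}{% endfor %}"
    "{% if add_generation_prompt %}{{ '### Response:' }}{% endif %}"
)
print(template.render(messages=[{"role": "user", "content": "Hello!"}],
                      add_generation_prompt=True))
# -> ### Instruction: Hello!### Response: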

Contributor:

Do we need to set a default chat template? Is it better to load the model's default chat template? If a model does not have a chat template, we can also set it to None by default.

minmingzhu (Collaborator, Author):

Our priority order: user-configured chat_template > model's chat_template > our default template.

config:
use_auth_token: ''
chat_model_with_image: true
chat_template: "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}"
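To illustrate that precedence chain with the public transformers API (a hedged sketch; the model name and DEFAULT_TEMPLATE below are placeholders, not llm-on-ray's actual values):

from transformers import AutoTokenizer

DEFAULT_TEMPLATE = "{% for m in messages %}{{ m['role'] ~ ': ' ~ m['content'] }}{% endfor %}"
user_template = None  # the yaml chat_template value, if the user configured one

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# user-configured template > model's own template > built-in default
tokenizer.chat_template = user_template or tokenizer.chat_template or DEFAULT_TEMPLATE
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}], tokenize=False
)
print(prompt)  # -> user: Hello!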
Contributor:

I'm not sure whether this format is too difficult for users to understand and configure. Is there a way to simplify this setup?

minmingzhu (Collaborator, Author):

I think I will create a new PR later and add Jinja usage instructions to the doc.

Comment on lines -19 to -30
prompt:
intro: 'Below is an instruction that describes a task. Write a response that appropriately
completes the request.

'
human_id: '

### Instruction'
bot_id: '

### Response'
stop_words: []
Contributor:

Does gpt-j-6b have a default chat template in transformers? If not, please add a chat_template key for it. This is also required for the other models' configuration files.

minmingzhu (Collaborator, Author):

Our default chat template uses the gpt-j-6b format.

llm_on_ray/inference/predictor_deployment.py (review thread outdated; resolved)
@KepingYan (Contributor):

I also want to confirm whether the required UI changes will be implemented in a new PR. It seems that the UI cannot run successfully now.

@minmingzhu (Collaborator, Author):

> I also want to confirm whether the required UI changes will be implemented in a new PR. It seems that the UI cannot run successfully now.

Yes, that has been done. I'll check the Web UI PR for CI issues.

@carsonwang (Contributor) left a comment:

Thank you @minmingzhu, this looks much clearer now. I left a few comments.

llm_on_ray/inference/chat_template_process.py (4 review threads outdated; resolved)
.github/workflows/config/gpt2-ci.yaml (review thread outdated; resolved)
llm_on_ray/inference/inference_config.py (review thread outdated; resolved)
llm_on_ray/inference/models/neural-chat-7b-v3-1.yaml (review thread outdated; resolved)
human_id: ''
bot_id: ''
stop_words: []
chat_template: "llm_on_ray/common/templates/template_codellama.jinja"
Contributor:

We may need to use a path relative to the current yaml file; otherwise it's hard for users to specify the file, since they may put the template files elsewhere.
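A possible sketch of that suggestion (an assumption, not the merged implementation): resolve a relative chat_template path against the directory of the yaml file that references it, and leave absolute paths as given:

from pathlib import Path

def resolve_template_path(yaml_path: str, chat_template: str) -> str:
    # Interpret a relative chat_template as relative to the yaml file's directory.
    template = Path(chat_template)
    if not template.is_absolute():
        template = Path(yaml_path).parent / template
    return str(template)

# e.g. resolve_template_path("llm_on_ray/inference/models/codellama.yaml",
#                            "templates/template_codellama.jinja")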

@@ -33,7 +33,7 @@ curl $ENDPOINT_URL/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
     "model": "gpt2",
-    "messages": [{"role": "assistant", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
+    "messages": [{"role": "user", "content": "Hello!"}],
Contributor:

Were the previous messages a valid chat sequence? Why was it changed?

minmingzhu (Collaborator, Author):

The chat template requires messages to alternate user/assistant/user/assistant/..., so the previous sequence, which started with an assistant message, was invalid.
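For example, a sequence that passes the template's alternation check (an optional system message first, then strictly alternating user/assistant turns):

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Tell me a joke."},
]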

Contributor:

@KepingYan could you check if this is the correct sequence? I am not sure why we need to change it.

@@ -194,3 +195,15 @@ def module_import_and_init(module_name, clazz, **clazzs_kwargs):
module = importlib.import_module(module_name)
class_ = getattr(module, clazz)
return class_(**clazzs_kwargs)


def parse_jinja_file(chat_template: str):
@xwu99 (Contributor) commented on May 14, 2024:

This should be Optional[str] since you are checking for None below, right? And should it return something if chat_template is None?

minmingzhu (Collaborator, Author):

Our priority order for chat templates is as follows: a user-configured template takes precedence over the model's own template, which in turn takes precedence over our default template. Therefore, even if the user-configured template is not provided (None), the default template guarantees there is always a value to fall back on.

self.predictor.tokenizer.chat_template = (
parse_jinja_file(self.predictor.infer_conf.model_description.chat_template)
or self.predictor.tokenizer.chat_template
or parse_jinja_file(self.predictor.infer_conf.model_description.default_chat_template)
)

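For reference, a minimal sketch of parse_jinja_file with the Optional[str] signature suggested above (an assumption; the merged function may differ):

from typing import Optional

def parse_jinja_file(chat_template: Optional[str]) -> Optional[str]:
    # Returning None when nothing is configured lets the caller's `or`
    # chain fall through to the next candidate template.
    if chat_template is None:
        return None
    with open(chat_template, "r", encoding="utf-8") as f:
        return f.read()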

if input and isinstance(input[0], (ChatMessage, dict)):
Contributor:

What does dict represent here? Is it different from ChatMessage?

minmingzhu (Collaborator, Author):

ChatMessage is used by the OpenAI-compatible API; dict is used by the simple API.

Contributor:

I see. Could you add a comment explaining that?
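Something like the following would address that (a sketch; the surrounding handler code is assumed):

# ChatMessage objects come from the OpenAI-compatible endpoint; plain dicts
# come from the simple (non-OpenAI) request format, so accept both here.
if input and isinstance(input[0], (ChatMessage, dict)):
    ...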

minmingzhu and others added 9 commits May 14, 2024 21:13 (Signed-off-by: minmingzhu <minming.zhu@intel.com>)
minmingzhu and others added 19 commits May 14, 2024 21:13, including:

2. add chat template unit test
3. fix comments

(Signed-off-by: minmingzhu <minming.zhu@intel.com>)
* Update VLLM installation script and documentation; update vLLM installation message and CPU installation instructions; update Dockerfile.vllm; update vLLM version to 0.4.1; update doc (Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>)
* docker2sh test: migrate the Docker-based CI for inference and finetune to bash scripts, fix proxy settings and Docker configurations, address review comments, and fix vllm pr212 (Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>; Co-authored-by: Wu, Xiaochang <xiaochang.wu@intel.com>; Signed-off-by: minmingzhu <minming.zhu@intel.com>)
* add llama-2-70b; fix vllm inference ci, then revert "fix vllm inference ci" (reverts commit 36062bd); fix comments (Signed-off-by: minmingzhu <minming.zhu@intel.com>)
minmingzhu and others added 7 commits May 14, 2024 21:18 (Signed-off-by: minmingzhu <minming.zhu@intel.com> and minmingzhu <45281494+minmingzhu@users.noreply.github.com>)
@carsonwang (Contributor) left a comment:

LGTM. Thanks for the work.

@carsonwang changed the title from "[Inference ] Integrate chat template on llm-n-ray" to "[Inference ] Integrate chat template in llm-on-ray" on May 16, 2024
@carsonwang carsonwang merged commit 620800f into intel:main May 16, 2024
25 checks passed