
[Inference ] Integrate chat template in llm-on-ray #199

Merged
merged 46 commits into intel:main on May 16, 2024

Conversation

minmingzhu (Collaborator)

No description provided.

@minmingzhu force-pushed the inference_chat_template branch 4 times, most recently from 41af76c to cc356f6 on April 28, 2024 05:50
Comment on lines 117 to 125
default_chat_template: str = (
"Below is an instruction that describes a task. Write a response that appropriately completes the request."
"{% if messages[0]['role'] == 'system' %}"
"{% set loop_messages = messages[1:] %}"
"{% set system_message = messages[0]['content'] %}"
"{% else %}{% set loop_messages = messages %}"
"{% set system_message = false %}{% endif %}"
"{% for message in loop_messages %}"
"{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}"
"{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}"
"{% endif %}"
"{% if message['role'] == 'user' %}"
"{{ '### Instruction: ' + message['content'].strip() }}"
"{% elif message['role'] == 'assistant' %}"
"{{ '### Response:' + message['content'].strip() }}"
"{% endif %}{% endfor %}"
"{% if add_generation_prompt %}{{'### Response:\n'}}{% endif %}"
)
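For context, a minimal sketch (not from this PR) of how a Jinja chat template string like the one above gets rendered. transformers does this internally in tokenizer.apply_chat_template() and injects the raise_exception helper; here we emulate that with jinja2 directly, using a shortened template:

from jinja2 import Environment

def raise_exception(message):
    # transformers injects this helper when rendering chat templates
    raise ValueError(message)

env = Environment()
env.globals["raise_exception"] = raise_exception
template = env.from_string(
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}{{ '### Instruction: ' + message['content'].strip() }}"
    "{% elif message['role'] == 'assistant' %}{{ '### Response: ' + message['content'].strip() }}"
    "{% endif %}{% endfor %}"
    "{% if add_generation_prompt %}{{ '### Response:' }}{% endif %}"
)
print(template.render(messages=[{"role": "user", "content": "Hello!"}],
                      add_generation_prompt=True))
# -> ### Instruction: Hello!### Response: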

Contributor:

Do we need to set a default chat template? Is it better to load the model's default chat template? If a model does not have a chat template, we can also set it to None by default.

minmingzhu (Collaborator, Author):

Our priority order: user-configured chat_template > model's chat_template > our default template.

config:
use_auth_token: ''
chat_model_with_image: true
chat_template: "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}"
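To illustrate that precedence chain with the public transformers API (a hedged sketch; the model name and DEFAULT_TEMPLATE below are placeholders, not llm-on-ray's actual values):

from transformers import AutoTokenizer

DEFAULT_TEMPLATE = "{% for m in messages %}{{ m['role'] ~ ': ' ~ m['content'] }}{% endfor %}"
user_template = None  # the yaml chat_template value, if the user configured one

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# user-configured template > model's own template > built-in default
tokenizer.chat_template = user_template or tokenizer.chat_template or DEFAULT_TEMPLATE
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}], tokenize=False
)
print(prompt)  # -> user: Hello!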
Contributor:

I'm not sure whether this format is too difficult for users to understand and configure. Is there a way to simplify this setup?

minmingzhu (Collaborator, Author):

I think I will create a new PR later and add Jinja usage instructions to the doc.

Comment on lines -19 to -30
prompt:
intro: 'Below is an instruction that describes a task. Write a response that appropriately
completes the request.

'
human_id: '

### Instruction'
bot_id: '

### Response'
stop_words: []
Contributor:

Does gpt-j-6b have a default chat template in transformers? If not, please add a chat_template key for it. This is also required for the other models' configuration files.

minmingzhu (Collaborator, Author):

Our default chat template uses the gpt-j-6b format.

llm_on_ray/inference/predictor_deployment.py (review thread outdated; resolved)
@KepingYan (Contributor):

I also want to confirm whether the required UI changes will be implemented in a new PR. It seems that the UI cannot run successfully now.

@minmingzhu (Collaborator, Author):

> I also want to confirm whether the required UI changes will be implemented in a new PR. It seems that the UI cannot run successfully now.

Yes, that has been done. I'll check the Web UI PR for CI issues.

@carsonwang (Contributor) left a comment:

Thank you @minmingzhu, this looks much clearer now. I left a few comments.

llm_on_ray/inference/chat_template_process.py (4 review threads outdated; resolved)
.github/workflows/config/gpt2-ci.yaml (review thread outdated; resolved)
llm_on_ray/inference/inference_config.py (review thread outdated; resolved)
llm_on_ray/inference/models/neural-chat-7b-v3-1.yaml (review thread outdated; resolved)
human_id: ''
bot_id: ''
stop_words: []
chat_template: "llm_on_ray/common/templates/template_codellama.jinja"
Contributor:

We may need to use a path relative to the current yaml file; otherwise it's hard for users to specify the file, since they may put the template files elsewhere.
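A possible sketch of that suggestion (an assumption, not the merged implementation): resolve a relative chat_template path against the directory of the yaml file that references it, and leave absolute paths as given:

from pathlib import Path

def resolve_template_path(yaml_path: str, chat_template: str) -> str:
    # Interpret a relative chat_template as relative to the yaml file's directory.
    template = Path(chat_template)
    if not template.is_absolute():
        template = Path(yaml_path).parent / template
    return str(template)

# e.g. resolve_template_path("llm_on_ray/inference/models/codellama.yaml",
#                            "templates/template_codellama.jinja")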

@@ -33,7 +33,7 @@ curl $ENDPOINT_URL/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
     "model": "gpt2",
-    "messages": [{"role": "assistant", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}],
+    "messages": [{"role": "user", "content": "Hello!"}],
Contributor:

Were the previous messages a valid chat sequence? Why was it changed?

minmingzhu (Collaborator, Author):

The chat template requires messages to alternate user/assistant/user/assistant/..., so the previous sequence, which started with an assistant message, was invalid.
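For example, a sequence that passes the template's alternation check (an optional system message first, then strictly alternating user/assistant turns):

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Tell me a joke."},
]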

Contributor:

@KepingYan could you check if this is the correct sequence? I am not sure why we need to change it.

@@ -194,3 +195,15 @@ def module_import_and_init(module_name, clazz, **clazzs_kwargs):
module = importlib.import_module(module_name)
class_ = getattr(module, clazz)
return class_(**clazzs_kwargs)


def parse_jinja_file(chat_template: str):
@xwu99 (Contributor) commented on May 14, 2024:

This should be Optional[str] since you are checking for None below, right? And should it return something if chat_template is None?

minmingzhu (Collaborator, Author):

Our priority order for chat templates is as follows: a user-configured template takes precedence over the model's own template, which in turn takes precedence over our default template. Therefore, even if the user-configured template is not provided (None), the default template guarantees there is always a value to fall back on.

self.predictor.tokenizer.chat_template = (
parse_jinja_file(self.predictor.infer_conf.model_description.chat_template)
or self.predictor.tokenizer.chat_template
or parse_jinja_file(self.predictor.infer_conf.model_description.default_chat_template)
)

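For reference, a minimal sketch of parse_jinja_file with the Optional[str] signature suggested above (an assumption; the merged function may differ):

from typing import Optional

def parse_jinja_file(chat_template: Optional[str]) -> Optional[str]:
    # Returning None when nothing is configured lets the caller's `or`
    # chain fall through to the next candidate template.
    if chat_template is None:
        return None
    with open(chat_template, "r", encoding="utf-8") as f:
        return f.read()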

if input and isinstance(input[0], (ChatMessage, dict)):
Contributor:

What does dict represent here? Is it different from ChatMessage?

minmingzhu (Collaborator, Author):

ChatMessage is used by the OpenAI-compatible API; dict is used by the simple API.

Contributor:

I see. Could you add a comment explaining that?
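Something like the following would address that (a sketch; the surrounding handler code is assumed):

# ChatMessage objects come from the OpenAI-compatible endpoint; plain dicts
# come from the simple (non-OpenAI) request format, so accept both here.
if input and isinstance(input[0], (ChatMessage, dict)):
    ...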

minmingzhu and others added 9 commits May 14, 2024 21:13 (Signed-off-by: minmingzhu <minming.zhu@intel.com>)
minmingzhu and others added 19 commits May 14, 2024 21:13, including:

2. add chat template unit test
3. fix comments

(Signed-off-by: minmingzhu <minming.zhu@intel.com>)
* Update VLLM installation script and documentation; update vLLM installation message and CPU installation instructions; update Dockerfile.vllm; update vLLM version to 0.4.1; update doc (Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>)
* docker2sh test: migrate the Docker-based CI for inference and finetune to bash scripts, fix proxy settings and Docker configurations, address review comments, and fix vllm pr212 (Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>; Co-authored-by: Wu, Xiaochang <xiaochang.wu@intel.com>; Signed-off-by: minmingzhu <minming.zhu@intel.com>)
* add llama-2-70b; fix vllm inference ci, then revert "fix vllm inference ci" (reverts commit 36062bd); fix comments (Signed-off-by: minmingzhu <minming.zhu@intel.com>)
minmingzhu and others added 7 commits May 14, 2024 21:18 (Signed-off-by: minmingzhu <minming.zhu@intel.com> and minmingzhu <45281494+minmingzhu@users.noreply.github.com>)
@carsonwang (Contributor) left a comment:

LGTM. Thanks for the work.

@carsonwang changed the title from "[Inference ] Integrate chat template on llm-n-ray" to "[Inference ] Integrate chat template in llm-on-ray" on May 16, 2024
@carsonwang carsonwang merged commit 620800f into intel:main May 16, 2024
25 checks passed