Integrate web UI with chat template #205
base: main
Conversation
Signed-off-by: minmingzhu <minming.zhu@intel.com>
…hu/llm-on-ray into inference_chat_template
Signed-off-by: minmingzhu <minming.zhu@intel.com>
2. modify chat template
Signed-off-by: minmingzhu <minming.zhu@intel.com>
2. add unit test
Signed-off-by: minmingzhu <minming.zhu@intel.com>
* update
* fix blocking
* update
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* update
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* fix setup and getting started
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* update
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* update
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* nit
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* Add dependencies for tests and update pyproject.toml
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* Update dependencies and test workflow
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* Update dependencies and fix torch_dist.py
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* Update OpenAI SDK installation and start ray cluster
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
---------
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
* single test
* single test
* single test
* single test
* fix hang error
* use base model mpt-7b instead of mpt-7b-chat
Signed-off-by: minmingzhu <minming.zhu@intel.com>
* manual setting specify tokenizer
Signed-off-by: minmingzhu <minming.zhu@intel.com>
* update
Signed-off-by: minmingzhu <minming.zhu@intel.com>
* update doc/finetune_parameters.md
Signed-off-by: minmingzhu <minming.zhu@intel.com>
---------
Signed-off-by: minmingzhu <minming.zhu@intel.com>
Signed-off-by: minmingzhu <minming.zhu@intel.com>
Force-pushed from 8698a17 to a8e7b38
@@ -6,16 +6,11 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: cpu
+device: "cpu"
There is no need to add extra quotes in YAML. Is it necessary to touch this part for your PR?
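The reviewer's point about the quotes can be checked directly: PyYAML loads a plain scalar and a double-quoted scalar to the same Python string, so quoting `cpu` changes nothing. A minimal sketch (assuming PyYAML is installed):

```python
import yaml  # PyYAML

# In YAML, a plain scalar and a double-quoted scalar both load as the
# same Python str, so `device: cpu` and `device: "cpu"` are equivalent.
unquoted = yaml.safe_load("device: cpu")
quoted = yaml.safe_load('device: "cpu"')
assert unquoted == quoted == {"device": "cpu"}
```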
@@ -6,17 +6,12 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: cpu
+device: CPU
Pay attention to using a lowercase device value for consistency.
@@ -6,16 +6,10 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: cpu
+device: CPU
Why change the device name to uppercase?
@@ -15,6 +15,7 @@ The following are the parameters supported in the finetuning workflow.
 |lora_config|task_type: CAUSAL_LM<br>r: 8<br>lora_alpha: 32<br>lora_dropout: 0.1|Will be passed to the LoraConfig `__init__()` method, then it'll be used as config to build Peft model object.|
 |deltatuner_config|"algo": "lora"<br>"denas": True<br>"best_model_structure": "/path/to/best_structure_of_deltatuner_model"|Will be passed to the DeltaTunerArguments `__init__()` method, then it'll be used as config to build [Deltatuner model](https://github.com/intel/e2eAIOK/tree/main/e2eAIOK/deltatuner) object.|
 |enable_gradient_checkpointing|False|enable gradient checkpointing to save GPU memory, but will cost more compute runtime|
+|chat_template|None|User-defined chat template.|
Add a description and a link to the Hugging Face documentation; otherwise users will not know what it is.
-prompt = "Once upon a time,"
+# prompt = "Once upon a time,"
+prompt = [
+    {"role": "user", "content": "Which is bigger, the moon or the sun?"},
Don't modify this: api_server_simple/query_single.py is for the simple protocol, which is not formatted like this. Focus on OpenAI API support; there is no need to support chat templates for the simple protocol if doing so requires changing the query format.
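The distinction the reviewer is drawing can be sketched as two differently shaped request bodies: the simple protocol takes a plain string prompt, while the OpenAI-compatible chat endpoint takes a list of role/content messages. The field names below are illustrative assumptions, not the project's exact schema (the `messages` shape follows the OpenAI chat format):

```python
def simple_request(prompt: str) -> dict:
    """Simple protocol: the prompt is a single plain string.
    The "text" field name is an assumption for illustration."""
    return {"text": prompt}

def openai_chat_request(model: str, messages: list) -> dict:
    """OpenAI-compatible protocol: the prompt is a list of
    {"role": ..., "content": ...} messages."""
    return {"model": model, "messages": messages}

# The same query, expressed in each protocol's shape:
simple_body = simple_request("Once upon a time,")
chat_body = openai_chat_request(
    "mpt-7b",
    [{"role": "user", "content": "Which is bigger, the moon or the sun?"}],
)
```

Keeping the chat-template logic on the OpenAI-compatible path leaves the simple protocol's string-in, string-out format untouched.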