[NeuralChat] Add Multi-Socket LLM Inference Example #1073

letonghan · 2023-12-25T08:38:21Z

Type of Change

Add NeuralChat example
API not changed

Description

Add a multi-socket LLM inference example for NeuralChat.
Related DeepSpeed PR: microsoft/DeepSpeed#4750 (not merged yet)

Expected Behavior & Potential Risk

Customers can run multi-socket LLM inference with DeepSpeed by following this example.

How has this PR been tested?

Tested locally on an SPR server.

Dependency Change?

No.

Signed-off-by: LetongHan <letong.han@intel.com>

letonghan added 2 commits December 25, 2023 15:58

add multi_host folder for Xeon code_gen

e3d2a49

Signed-off-by: LetongHan <letong.han@intel.com>

add multi_socket example for codegen

a84c4ff

Signed-off-by: LetongHan <letong.han@intel.com>

letonghan requested a review from lvliang-intel as a code owner December 25, 2023 08:38

letonghan added the draft label Dec 25, 2023

mengfei25 pushed a commit to mengfei25/intel-extension-for-transformers that referenced this pull request Dec 27, 2023

Chat index (intel#1073)

28ff3b2

Merge branch 'main' into letong/multi_socket

e28cf29
