
[Framework] Continuous Batching Support #357

Merged
merged 35 commits into main on May 15, 2024

Conversation

pujiang2018
Contributor

No description provided.

@pujiang2018 marked this pull request as draft on April 29, 2024 07:17
@pujiang2018 changed the title from [MOCK PR] Continuous batching check to [MOCK PR] Continuous Batching Check on Apr 29, 2024
@Duyi-Wang added the enhancement label on Apr 29, 2024
@Duyi-Wang added the continuous batching label on May 9, 2024
changqi1 and others added 25 commits May 15, 2024 08:45
pujiang2018 and others added 9 commits May 15, 2024 08:57
* [Common] Add sequenceMeta, sequenceGroup and sequencePool. (#343)

* merge batchSize and seqLen into one in TokenEmbedding

* merge batchSize and seqLen into one in TokenEmbedding (#350)

* [Common] Move Matrix into xft namespace. (#351)

* remove unused function in DecoderLayer

* [Layer] Remove unused functions in Decoder layer (#353)

* fix compile error of embeddingForward

* [Model] Fix compile error of embeddingForward in YaRNLlama (#358)

* [Common] Add sampling params into group seq. (#356)

* remove DecoderContext in computeSoftmax

* [Util] Remove DecoderContext in computeSoftmax (#362)

* [Common] Refactor sequence.h. (#363)

* [kernels] refactor flash attention for continuous batching (#361)

* [models] Add attnMeta for continuous batching (#364)

* [Layers] fix build error (#365)

* [Model] add interface for seq meta. (#366)

* refactor resize function in DecoderContext to support CB, and qkScores member removed

* [Common] Modify resize() in DecoderContext to support CB (#367)

* add some code to CommonDecoder::forward()

* SequenceMeta refactor

* [Model] New CommonDecoder::forward impl. skeleton (#369)

* new KVCacheMgr supporting CB

* fix typo & set default prefixId to -1 in addSequence()

* [Common] New KVCacheMgr to support CB (#371)

* [Sampling] Add repetition penalty for new seq type. (#373)

* New forward to support CB (CommonDecoder->DecoderBlock->DecoderLayer->Attention/MLP)

* add todo

* [Sampling] Add greedy search for cb path. (#376)

* logic issue fix

* code fix to make new forward work

* add maxSeqLen limitation

* cross attention impl. for CB

* DecoderContext::resize fix

* correct the output of the new forward

* add cb_check

* fix incorrect buffer size calculation

* 2 sequences -> 3 sequences

* better method to prepare KV cache

---------

Co-authored-by: Changqing Li <changqing.li@intel.com>
Co-authored-by: Duyi-Wang <duyi.wang@intel.com>
Co-authored-by: Meng,Chen <chen.meng@intel.com>
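
The squashed commits above introduce SequenceMeta, SequenceGroup, and SequencePool as the bookkeeping layer for continuous batching (#343, #356, #363). The PR conversation does not reproduce sequence.h, so the sketch below is only a rough illustration of how such types are typically laid out; every class and member name here is hypothetical, not the actual xFasterTransformer API.

```cpp
// Hypothetical sketch only -- not the actual xFasterTransformer sequence.h.
// Illustrates how per-sequence metadata can be pooled for continuous batching.
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct SequenceMeta {
    int32_t sequenceId = -1;            // ID used to look up KV cache slots
    std::vector<int32_t> inputTokens;   // prompt tokens (prefill) or last token (decode)
    int32_t pastSeqLen = 0;             // tokens already cached; 0 means prefill phase
    bool isPrompt() const { return pastSeqLen == 0; }
};

struct SamplingMeta {                   // per-group sampling parameters (cf. #356)
    float repetitionPenalty = 1.0f;
    bool doSample = false;              // false -> greedy search
};

struct SequenceGroup {                  // one request; may hold several sequences
    std::vector<SequenceMeta> seqs;
    SamplingMeta sampling;
};

class SequencePool {                    // owns all live groups, hands out batches
public:
    int32_t add(SequenceGroup group) {
        int32_t id = nextId++;
        groups.emplace(id, std::move(group));
        return id;
    }
    void remove(int32_t id) { groups.erase(id); }

private:
    int32_t nextId = 0;
    std::unordered_map<int32_t, SequenceGroup> groups;
};
```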
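Commits #367 and #371 rework DecoderContext::resize() and add a new KVCacheMgr so that cache is tracked per sequence, with addSequence() defaulting prefixId to -1. The outline below is an assumption-laden sketch of that idea (per-sequence cache slots, optional shared prefix), not the PR's actual class.

```cpp
// Hypothetical outline of a per-sequence KV cache manager for continuous batching.
// Names, layout, and prefix-sharing semantics are illustrative assumptions.
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

template <typename T>
class KVCacheMgr {
public:
    explicit KVCacheMgr(size_t perTokenKVSize) : perTokenKVSize(perTokenKVSize) {}

    // prefixId = -1 means the sequence shares no prefilled prefix cache.
    void addSequence(int32_t seqId, int32_t maxSeqLen, int32_t prefixId = -1) {
        auto cache = std::make_shared<std::vector<T>>();
        cache->reserve(static_cast<size_t>(maxSeqLen) * perTokenKVSize);
        if (prefixId != -1) {
            // Start from a shared prefix (e.g. a common system prompt).
            *cache = *caches.at(prefixId);
        }
        caches[seqId] = std::move(cache);
    }

    void removeSequence(int32_t seqId) { caches.erase(seqId); }

    std::vector<T> &get(int32_t seqId) { return *caches.at(seqId); }

private:
    size_t perTokenKVSize;              // bytes/elements per cached token
    std::unordered_map<int32_t, std::shared_ptr<std::vector<T>>> caches;
};
```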
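Several commits build the new forward path (CommonDecoder::forward skeleton in #369, the CB-aware forward through DecoderBlock/DecoderLayer/Attention/MLP, and the maxSeqLen limitation). The sketch below is a conceptual step loop for continuous batching under those names; all types and helpers are invented for illustration and are not the PR's code.

```cpp
// Conceptual continuous-batching step loop: each step packs prefill and decode
// sequences into one forward pass, retires finished sequences, and can admit
// new ones between steps. Hypothetical sketch, not xFasterTransformer code.
#include <cstdint>
#include <vector>

struct Seq {
    std::vector<int32_t> inputTokens;  // whole prompt (prefill) or one token (decode)
    int32_t pastSeqLen = 0;            // tokens already held in the KV cache
};

struct StepInput {
    std::vector<int32_t> tokens;       // all sequences' tokens, concatenated
    std::vector<int32_t> seqLens;      // tokens each sequence contributes this step
    std::vector<int32_t> pastLens;     // cached length per sequence, for attention
};

// Flatten all live sequences (prefill and decode alike) into one step input.
StepInput packBatch(const std::vector<Seq> &live) {
    StepInput in;
    for (const Seq &s : live) {
        in.tokens.insert(in.tokens.end(), s.inputTokens.begin(), s.inputTokens.end());
        in.seqLens.push_back(static_cast<int32_t>(s.inputTokens.size()));
        in.pastLens.push_back(s.pastSeqLen);
    }
    return in;
}

// Placeholder: a real implementation would run the model and sample one token
// per sequence; here it just emits the assumed EOS id 0 so the sketch terminates.
std::vector<int32_t> forwardAndSample(const StepInput &in) {
    return std::vector<int32_t>(in.seqLens.size(), 0);
}

void decodeLoop(std::vector<Seq> &live, int32_t maxSeqLen, int32_t eosTokenId) {
    while (!live.empty()) {
        StepInput in = packBatch(live);                       // mix prefill + decode
        std::vector<int32_t> next = forwardAndSample(in);

        std::vector<Seq> stillLive;
        for (size_t i = 0; i < live.size(); ++i) {
            live[i].pastSeqLen += in.seqLens[i];              // KV cache grew this step
            live[i].inputTokens = {next[i]};                  // next step feeds one token
            bool hitLimit = live[i].pastSeqLen + 1 >= maxSeqLen;  // maxSeqLen guard
            if (next[i] != eosTokenId && !hitLimit)
                stillLive.push_back(std::move(live[i]));
        }
        live.swap(stillLive);
        // New requests can be admitted here before the next step -- the
        // "continuous" part of continuous batching.
    }
}
```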
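For the sampling side, #373 adds repetition penalty for the new sequence type and #376 adds greedy search for the CB path. The snippet below is an illustrative stand-alone version of those two steps on a single logit vector, using the common CTRL-style penalty formulation; it is not the PR's implementation.

```cpp
// Illustrative repetition penalty + greedy search over one sequence's logits.
// Hedged sketch: the actual xFasterTransformer sampling code is not shown in this PR.
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <unordered_set>
#include <vector>

int32_t greedyWithRepetitionPenalty(std::vector<float> &logits,
                                    const std::vector<int32_t> &generated,
                                    float penalty) {
    // Penalize every token id that already appeared in this sequence.
    std::unordered_set<int32_t> seen(generated.begin(), generated.end());
    for (int32_t id : seen) {
        float &l = logits[id];
        l = (l > 0.0f) ? l / penalty : l * penalty;   // discourage repeats
    }
    // Greedy search: pick the highest-scoring token.
    auto it = std::max_element(logits.begin(), logits.end());
    return static_cast<int32_t>(std::distance(logits.begin(), it));
}
```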
@pujiang2018 marked this pull request as ready for review on May 15, 2024 01:34
@pujiang2018 changed the title from [MOCK PR] Continuous Batching Check to [Framework] Continuous Batching Support on May 15, 2024
@pujiang2018 merged commit 7a113f2 into main on May 15, 2024
1 check passed
Labels
continuous batching, enhancement

5 participants