
[Framework] Continuous Batching Support #357

Merged
merged 35 commits into main on May 15, 2024

Conversation

pujiang2018
Contributor

No description provided.

@pujiang2018 marked this pull request as draft on April 29, 2024 07:17
@pujiang2018 changed the title from [MOCK PR] Continuous batching check to [MOCK PR] Continuous Batching Check on Apr 29, 2024
@Duyi-Wang added the enhancement label on Apr 29, 2024
@Duyi-Wang added the continuous batching label on May 9, 2024
changqi1 and others added 25 commits May 15, 2024 08:45
pujiang2018 and others added 9 commits May 15, 2024 08:57
* [Common] Add sequenceMeta, sequenceGroup and sequencePool. (#343)

* merge batchSize and seqLen into one in TokenEmbedding

* merge batchSize and seqLen into one in TokenEmbedding (#350)

* [Common] Move Matrix into xft namespace. (#351)

* remove unused function in DecoderLayer

* [Layer] Remove unused functions in Decoder layer (#353)

* fix compile error of embeddingForward

* [Model] Fix compile error of embeddingForward in YaRNLlama (#358)

* [Common] Add sampling params into group seq. (#356)

* remove DecoderContext in computeSoftmax

* [Util] Remove DecoderContext in computeSoftmax (#362)

* [Common] Refactor sequence.h. (#363)

* [kernels] refactor flash attention for continuous batching (#361)

* [models] Add attnMeta for continuous batching (#364)

* [Layers] fix build error (#365)

* [Model] add interface for seq meta. (#366)

* refactor resize function in DecoderContext to support CB, and qkScores member removed

* [Common] Modify resize() in DecoderContext to support CB (#367)

* add some code to CommonDecoder::forward()

* SequenceMeta refactor

* [Model] New CommonDecoder::forward impl. skeleton (#369)

* new KVCacheMgr supporting CB

* fix typo & set default prefixId to -1 in addSequence()

* [Common] New KVCacheMgr to support CB (#371)

* [Sampling] Add repetition penalty for new seq type. (#373)

* New forward to support CB (CommonDecoder->DecoderBlock->DecoderLayer->Attention/MLP)

* add todo

* [Sampling] Add greedy search for cb path. (#376)

* logic issue fix

* code fix to make new forward work

* add maxSeqLen limitation

* cross attention impl. for CB

* DecoderContext::resize fix

* correct the output of the new forward

* add cb_check

* fix incorrect buffer size calculation

* 2 sequences -> 3 sequences

* better method to prepare KV cache

---------

Co-authored-by: Changqing Li <changqing.li@intel.com>
Co-authored-by: Duyi-Wang <duyi.wang@intel.com>
Co-authored-by: Meng,Chen <chen.meng@intel.com>
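
The squashed commits above introduce SequenceMeta, SequenceGroup, and SequencePool as the bookkeeping layer for continuous batching (#343, #356, #363). The PR conversation does not reproduce sequence.h, so the sketch below is only a rough illustration of how such types are typically laid out; every class and member name here is hypothetical, not the actual xFasterTransformer API.

```cpp
// Hypothetical sketch only -- not the actual xFasterTransformer sequence.h.
// Illustrates how per-sequence metadata can be pooled for continuous batching.
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct SequenceMeta {
    int32_t sequenceId = -1;            // ID used to look up KV cache slots
    std::vector<int32_t> inputTokens;   // prompt tokens (prefill) or last token (decode)
    int32_t pastSeqLen = 0;             // tokens already cached; 0 means prefill phase
    bool isPrompt() const { return pastSeqLen == 0; }
};

struct SamplingMeta {                   // per-group sampling parameters (cf. #356)
    float repetitionPenalty = 1.0f;
    bool doSample = false;              // false -> greedy search
};

struct SequenceGroup {                  // one request; may hold several sequences
    std::vector<SequenceMeta> seqs;
    SamplingMeta sampling;
};

class SequencePool {                    // owns all live groups, hands out batches
public:
    int32_t add(SequenceGroup group) {
        int32_t id = nextId++;
        groups.emplace(id, std::move(group));
        return id;
    }
    void remove(int32_t id) { groups.erase(id); }

private:
    int32_t nextId = 0;
    std::unordered_map<int32_t, SequenceGroup> groups;
};
```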
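Commits #367 and #371 rework DecoderContext::resize() and add a new KVCacheMgr so that cache is tracked per sequence, with addSequence() defaulting prefixId to -1. The outline below is an assumption-laden sketch of that idea (per-sequence cache slots, optional shared prefix), not the PR's actual class.

```cpp
// Hypothetical outline of a per-sequence KV cache manager for continuous batching.
// Names, layout, and prefix-sharing semantics are illustrative assumptions.
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

template <typename T>
class KVCacheMgr {
public:
    explicit KVCacheMgr(size_t perTokenKVSize) : perTokenKVSize(perTokenKVSize) {}

    // prefixId = -1 means the sequence shares no prefilled prefix cache.
    void addSequence(int32_t seqId, int32_t maxSeqLen, int32_t prefixId = -1) {
        auto cache = std::make_shared<std::vector<T>>();
        cache->reserve(static_cast<size_t>(maxSeqLen) * perTokenKVSize);
        if (prefixId != -1) {
            // Start from a shared prefix (e.g. a common system prompt).
            *cache = *caches.at(prefixId);
        }
        caches[seqId] = std::move(cache);
    }

    void removeSequence(int32_t seqId) { caches.erase(seqId); }

    std::vector<T> &get(int32_t seqId) { return *caches.at(seqId); }

private:
    size_t perTokenKVSize;              // bytes/elements per cached token
    std::unordered_map<int32_t, std::shared_ptr<std::vector<T>>> caches;
};
```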
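Several commits build the new forward path (CommonDecoder::forward skeleton in #369, the CB-aware forward through DecoderBlock/DecoderLayer/Attention/MLP, and the maxSeqLen limitation). The sketch below is a conceptual step loop for continuous batching under those names; all types and helpers are invented for illustration and are not the PR's code.

```cpp
// Conceptual continuous-batching step loop: each step packs prefill and decode
// sequences into one forward pass, retires finished sequences, and can admit
// new ones between steps. Hypothetical sketch, not xFasterTransformer code.
#include <cstdint>
#include <vector>

struct Seq {
    std::vector<int32_t> inputTokens;  // whole prompt (prefill) or one token (decode)
    int32_t pastSeqLen = 0;            // tokens already held in the KV cache
};

struct StepInput {
    std::vector<int32_t> tokens;       // all sequences' tokens, concatenated
    std::vector<int32_t> seqLens;      // tokens each sequence contributes this step
    std::vector<int32_t> pastLens;     // cached length per sequence, for attention
};

// Flatten all live sequences (prefill and decode alike) into one step input.
StepInput packBatch(const std::vector<Seq> &live) {
    StepInput in;
    for (const Seq &s : live) {
        in.tokens.insert(in.tokens.end(), s.inputTokens.begin(), s.inputTokens.end());
        in.seqLens.push_back(static_cast<int32_t>(s.inputTokens.size()));
        in.pastLens.push_back(s.pastSeqLen);
    }
    return in;
}

// Placeholder: a real implementation would run the model and sample one token
// per sequence; here it just emits the assumed EOS id 0 so the sketch terminates.
std::vector<int32_t> forwardAndSample(const StepInput &in) {
    return std::vector<int32_t>(in.seqLens.size(), 0);
}

void decodeLoop(std::vector<Seq> &live, int32_t maxSeqLen, int32_t eosTokenId) {
    while (!live.empty()) {
        StepInput in = packBatch(live);                       // mix prefill + decode
        std::vector<int32_t> next = forwardAndSample(in);

        std::vector<Seq> stillLive;
        for (size_t i = 0; i < live.size(); ++i) {
            live[i].pastSeqLen += in.seqLens[i];              // KV cache grew this step
            live[i].inputTokens = {next[i]};                  // next step feeds one token
            bool hitLimit = live[i].pastSeqLen + 1 >= maxSeqLen;  // maxSeqLen guard
            if (next[i] != eosTokenId && !hitLimit)
                stillLive.push_back(std::move(live[i]));
        }
        live.swap(stillLive);
        // New requests can be admitted here before the next step -- the
        // "continuous" part of continuous batching.
    }
}
```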
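For the sampling side, #373 adds repetition penalty for the new sequence type and #376 adds greedy search for the CB path. The snippet below is an illustrative stand-alone version of those two steps on a single logit vector, using the common CTRL-style penalty formulation; it is not the PR's implementation.

```cpp
// Illustrative repetition penalty + greedy search over one sequence's logits.
// Hedged sketch: the actual xFasterTransformer sampling code is not shown in this PR.
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <unordered_set>
#include <vector>

int32_t greedyWithRepetitionPenalty(std::vector<float> &logits,
                                    const std::vector<int32_t> &generated,
                                    float penalty) {
    // Penalize every token id that already appeared in this sequence.
    std::unordered_set<int32_t> seen(generated.begin(), generated.end());
    for (int32_t id : seen) {
        float &l = logits[id];
        l = (l > 0.0f) ? l / penalty : l * penalty;   // discourage repeats
    }
    // Greedy search: pick the highest-scoring token.
    auto it = std::max_element(logits.begin(), logits.end());
    return static_cast<int32_t>(std::distance(logits.begin(), it));
}
```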
@pujiang2018 marked this pull request as ready for review on May 15, 2024 01:34
@pujiang2018 changed the title from [MOCK PR] Continuous Batching Check to [Framework] Continuous Batching Support on May 15, 2024
@pujiang2018 merged commit 7a113f2 into main on May 15, 2024
1 check passed
Labels
continuous batching, enhancement

5 participants