h2o for kv cache compression #1468

Open. Wants to merge 50 commits into base: main

n1ck-guo
Collaborator

@n1ck-guo n1ck-guo commented Apr 10, 2024

Type of Change

feature

Description

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
paper

NTD

  • add an example
  • refactor the code to a consistent style
  • add a sequence-length API
  • support more models, in both sim and real modes
  • add a mean accumulated-score function
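
For readers unfamiliar with H2O, the eviction idea behind this list can be sketched as below. This is a minimal pure-Python toy, not the code in this PR: the class name `H2OCacheSketch`, the `use_mean` flag, and the reading of "mean accumulate score" as accumulated attention divided by the number of steps a token has been attended are all assumptions made for illustration. It tracks only which token positions stay in the cache, keeping a fixed recent window and evicting the lowest-scoring older token when the budget is exceeded.

```python
from dataclasses import dataclass, field


@dataclass
class H2OCacheSketch:
    """Toy single-head cache that records which token positions are kept."""

    heavy_budget: int        # slots reserved for heavy hitters
    recent_budget: int       # slots reserved for the most recent tokens
    use_mean: bool = False   # hypothetical flag: rank by mean instead of sum
    positions: list = field(default_factory=list)  # kept token positions
    scores: list = field(default_factory=list)     # accumulated attention per kept token
    counts: list = field(default_factory=list)     # steps each kept token was attended

    def step(self, position, attn_weights):
        """Add one token, accumulate attention, evict one token if over budget.

        attn_weights holds the new query's attention over the kept tokens
        followed by the new token itself.
        """
        self.positions.append(position)
        self.scores.append(0.0)
        self.counts.append(0)
        if len(attn_weights) != len(self.positions):
            raise ValueError("attn_weights must cover every kept token")
        for i, weight in enumerate(attn_weights):
            self.scores[i] += weight
            self.counts[i] += 1
        if len(self.positions) > self.heavy_budget + self.recent_budget:
            self._evict_one()

    def _rank(self, i):
        # Sum favors long-lived tokens; mean normalizes by time in cache.
        if self.use_mean:
            return self.scores[i] / self.counts[i]
        return self.scores[i]

    def _evict_one(self):
        # Tokens inside the recent window are never eviction candidates.
        n_candidates = len(self.positions) - self.recent_budget
        victim = min(range(n_candidates), key=self._rank)
        for buf in (self.positions, self.scores, self.counts):
            del buf[victim]


# Usage: token 0 receives most of the attention and survives as a heavy hitter.
cache = H2OCacheSketch(heavy_budget=1, recent_budget=2)
rows = [[1.0], [0.9, 0.1], [0.8, 0.1, 0.1], [0.7, 0.1, 0.1, 0.1], [0.7, 0.1, 0.1, 0.1]]
for pos, row in enumerate(rows):
    cache.step(pos, row)
print(cache.positions)  # [0, 3, 4]
```

With summed scores an old token can outrank a newer one simply because it has been in the cache longer; ranking by the mean removes that bias, which is presumably the motivation for the last item in the list.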

Expected Behavior & Potential Risk

None

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: n1ck-guo <heng.guo@intel.com>

github-actions bot commented Apr 10, 2024

⛈️ Required checks status: Has failure 🔴

Warning
If you do not have access to re-run the CI-Summary bot, please contact VincyZhang for help. If you push a new commit, all of the workflows will be re-triggered.

Groups summary

🔴 Format Scan Tests workflow
| Check ID | Status | Error details |
| --- | --- | --- |
| format-scan (pylint) | failure | download |
| format-scan (bandit) | success | |
| format-scan (cloc) | success | |
| format-scan (cpplint) | success | |

These checks are required after changes to:

  • intel_extension_for_transformers/transformers/modeling/kv_cache_compression/__init__.py
  • intel_extension_for_transformers/transformers/modeling/kv_cache_compression/h2o.py
  • intel_extension_for_transformers/transformers/modeling/kv_cache_compression/models/__init__.py
  • intel_extension_for_transformers/transformers/modeling/kv_cache_compression/models/modeling_bloom.py
  • intel_extension_for_transformers/transformers/modeling/kv_cache_compression/models/modeling_gpt_neox.py
  • intel_extension_for_transformers/transformers/modeling/kv_cache_compression/models/modeling_llama.py
  • intel_extension_for_transformers/transformers/modeling/kv_cache_compression/models/modeling_mistral.py
  • intel_extension_for_transformers/transformers/modeling/kv_cache_compression/models/modeling_mixtral.py
  • intel_extension_for_transformers/transformers/modeling/kv_cache_compression/models/modeling_opt.py

🟢 Optimize Unit Test workflow
| Check ID | Status | Error details |
| --- | --- | --- |
| optimize-unit-test-baseline | success | |
| optimize-unit-test-PR-test | success | |
| Genreate-OptimizeUT-Report | success | |

🟢 NeuralChat Unit Test
| Check ID | Status | Error details |
| --- | --- | --- |
| neuralchat-unit-test-baseline | success | |
| neuralchat-unit-test-PR-test | success | |
| Generate-NeuralChat-Report | success | |

🔴 Engine Unit Test workflow
| Check ID | Status | Error details |
| --- | --- | --- |
| engine-unit-test-baseline | failure | download |
| engine-unit-test-PR-test | cancelled 🚫 | |
| Genreate-Engine-Report | skipped | |

🟢 Chat Bot Test workflow
| Check ID | Status | Error details |
| --- | --- | --- |
| call-inference-llama-2-7b-chat-hf / inference test | success | |
| call-inference-mpt-7b-chat / inference test | success | |


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds for the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

BiaoFangAIA and others added 3 commits April 23, 2024 16:36
Signed-off-by: biao.fang <biao.fang@intel.com>
Signed-off-by: biao.fang <biao.fang@intel.com>
n1ck-guo and others added 4 commits April 25, 2024 03:01
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
n1ck-guo and others added 3 commits May 7, 2024 04:26
@VincyZhang VincyZhang added the WIP label May 13, 2024
n1ck-guo and others added 7 commits May 14, 2024 01:43
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: biao.fang <biao.fang@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
n1ck-guo and others added 7 commits May 16, 2024 21:14
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
n1ck-guo and others added 15 commits May 20, 2024 14:36
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo
Collaborator Author

pre-commit.ci autofix

pre-commit-ci bot and others added 5 commits May 21, 2024 07:09
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>

4 participants