
[NPU] support npu llama2-13B export & inference #8442

Merged · 2 commits · May 20, 2024

Conversation

ronny1996
Contributor

PR types

New features

PR changes

Others

Description

support npu llama2-13B export & inference


paddle-bot bot commented May 15, 2024

Thanks for your contribution!


codecov bot commented May 15, 2024

Codecov Report

Attention: Patch coverage is 0%, with 1 line of your changes missing coverage. Please review.

Project coverage is 54.29%. Comparing base (05acad5) to head (fefe28d).
Report is 13 commits behind head on develop.

Files Patch % Lines
...erimental/transformers/fused_transformer_layers.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8442      +/-   ##
===========================================
- Coverage    55.42%   54.29%   -1.14%     
===========================================
  Files          617      617              
  Lines        96281    96339      +58     
===========================================
- Hits         53366    52304    -1062     
- Misses       42915    44035    +1120     

☔ View full report in Codecov by Sentry.

llm/llama/npu/llama_npu_process_params.py (outdated, resolved)
llm/predictor.py (outdated, resolved)
@ronny1996 ronny1996 force-pushed the llama2_dev branch 3 times, most recently from a59da09 to 0ee4655 Compare May 16, 2024 11:55
@@ -0,0 +1,14 @@
# PaddleNLP Custom OPs
Collaborator

We suggest not creating a separate new csrc directory here, because many more hardware backends will be integrated later; instead, create an npu directory directly under the existing csrc directory.

Contributor Author

Done, renamed csrc_npu -> csrc/npu.


# 1. Install PaddleCustomDevice

Follow the [PaddleCustomDevice NPU installation guide](https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/npu/README_cn.md) to install.
Collaborator

Is there currently a prebuilt NPU package of CustomDevice?

Contributor Author

There is currently no prebuilt package for high-performance inference; building it from source yourself is recommended.

@@ -570,7 +570,7 @@ def compute_layernorm_before_qkv(self, src, i):
         return ln_out

     def compute_qkv_linear(self, ln_out, i):
-        if float(paddle.version.cuda()) < 11.6:
+        if paddle.version.cuda() == "False" or float(paddle.version.cuda()) < 11.6:
Collaborator

Does Kunlunxin inference also go through this logic? If so, does it affect Kunlunxin inference?

Contributor Author
@ronny1996 ronny1996 May 20, 2024

No impact. This only affects the paddle-cpu build (the NPU setup uses the paddle-cpu build of Paddle), making it take the branch above; otherwise float(paddle.version.cuda()) would raise an error, because paddle.version.cuda() returns the string "False" on the CPU build.
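The behavior described above can be sketched with a small standalone helper (the function name is hypothetical; only the guard condition mirrors the PR's change). Short-circuit evaluation of `or` means `float()` is never called on the `"False"` sentinel that CPU-only Paddle builds return:

```python
def needs_fallback_qkv(cuda_version: str) -> bool:
    """Return True when the fused QKV path is unavailable: either a
    CPU-only build (cuda_version == "False") or CUDA older than 11.6.
    The string check must come first, because float("False") raises
    ValueError -- this is exactly the bug the PR's guard avoids."""
    return cuda_version == "False" or float(cuda_version) < 11.6

# CPU-only build (the case NPU inference hits): takes the fallback branch
assert needs_fallback_qkv("False")
# Old CUDA: fallback branch
assert needs_fallback_qkv("11.2")
# Recent CUDA: fused path
assert not needs_fallback_qkv("11.7")
```

With only the original `float(paddle.version.cuda()) < 11.6` check, the CPU build would crash with `ValueError: could not convert string to float: 'False'` before reaching either branch.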

if predictor_args.device == "npu":
    from llama.npu.export_utils import process_params

    process_params(os.path.join(export_args.output_path, predictor_args.model_prefix))
Collaborator

What is the reason for modifying the op attrs of the NPU model here?

Contributor Author

For high-performance NPU inference:

  1. Matrix multiplication with transposed weights performs better.
  2. NPU dequant scales use a special format.
  3. Doing the modification here avoids introducing hardware-specific code into the model-definition code.
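Point 1 above can be sketched as a post-export weight pass. This is only an illustration with NumPy and a hypothetical function name, not the actual `process_params` from `llm/llama/npu/export_utils.py`; it shows why transposing offline is safe, since `x @ W == x @ (W.T).T`:

```python
import numpy as np

def transpose_linear_weight(weight: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: store a linear layer's [in, out] weight as a
    contiguous [out, in] array so a kernel that prefers the transposed
    layout can consume it directly, without a runtime transpose."""
    return np.ascontiguousarray(weight.T)

w = np.arange(6, dtype=np.float32).reshape(2, 3)   # [in=2, out=3]
wt = transpose_linear_weight(w)                    # [out=3, in=2]
assert wt.shape == (3, 2)

# The layer's result is unchanged: y = x @ W equals x @ W_t.T
x = np.ones((1, 2), dtype=np.float32)
assert np.allclose(x @ w, x @ wt.T)
```

Because the rewrite happens once at export time, the inference graph itself stays hardware-agnostic, which is point 3 in the reply above.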

@wawltor wawltor merged commit 87e4c4f into PaddlePaddle:develop May 20, 2024
7 of 12 checks passed
@ronny1996 ronny1996 deleted the llama2_dev branch May 20, 2024 14:38