
Attention weight conversion question #58

Open
noob-ctrl opened this issue Feb 19, 2024 · 2 comments

Comments

@noob-ctrl

The attention weight conversion code in this repository is as follows:
[screenshot: this repository's attention weight conversion code]

Looking at some other weight-conversion scripts, the attention weights go through a view/dimension-permutation step, like this:
[screenshot: attention weight permutation in another conversion script]

Both versions later call chunk to split the tensor, but the two approaches produce different results, don't they?

Why doesn't this repository's code perform the view/dimension transformation?
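For readers without the screenshots: the permutation the question most likely refers to is the one in HuggingFace's convert_llama_weights_to_hf.py, which reorders the rows of the Q/K projection weights from the interleaved rotary layout to the half-split layout. A minimal sketch of that pattern (the variable names here are assumptions, not the screenshot's code):

```python
import torch

def permute(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # Reorder rows within each head: interleaved rotary pairs (2i, 2i+1)
    # become the half-split layout (i, i + head_dim // 2).
    return (
        w.view(n_heads, dim1 // n_heads // 2, 2, dim2)
        .transpose(1, 2)
        .reshape(dim1, dim2)
    )
```

Both scripts then chunk the fused weight into Q/K/V along dim 0, and chunk alone is unaffected by row order inside each head, so with vs. without the permute the resulting Q/K weights really do differ unless the model code compensates.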

@noob-ctrl (Author)

@li-yi-dong Can you help me answer the above question, or is there a bug in the code?

@li-yi-dong (Collaborator)

In this repository, the places where Attention consumes the QKV weights were modified accordingly:
https://github.com/alibaba/Megatron-LLaMA/blob/main/megatron/model/transformer.py#L553
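In other words, instead of permuting the checkpoint weights at conversion time, the model applies rotary embeddings in the layout the original weights expect. A minimal sketch of the two rotary layouts involved (illustrative only, under that reading of the answer; this is not the code at the linked line):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Half-split layout (HF LLaMA): feature i is paired with i + d/2,
    # which matches checkpoints whose Q/K rows were permuted at conversion.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rotate_interleaved(x: torch.Tensor) -> torch.Tensor:
    # Interleaved layout (original Meta checkpoints): feature 2i is paired
    # with 2i + 1, so unpermuted weights can be loaded as-is.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)

# Either way, rotary is applied as: x * cos + rotate(x) * sin.
# If the model's rotate matches the checkpoint's layout, the conversion
# script needs no permute; if it uses the other layout, it does.
```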
