Error occurs when editing Baichuan-13B #7

Open
hiyouga opened this issue Jul 11, 2023 · 2 comments
Labels
solved This problem has been already solved.

Comments

@hiyouga
Owner

hiyouga commented Jul 11, 2023

loss 3.28 = 3.28 + 0.0 avg prob of [Rishi Sunak] 0.0498
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan

The gradient of the delta weight becomes NaN after the first backward pass.

Wrapping the backward pass in anomaly detection:

with torch.autograd.detect_anomaly():
    loss.backward()

raises the following runtime error:

RuntimeError: Function 'MmBackward0' returned nan values in its 0th output.
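
For readers who want to see this diagnostic in isolation, here is a minimal sketch. It does not use the editing script or Baichuan-13B; the helper name `backward_with_anomaly_check` and the toy loss are assumptions chosen only to trigger a NaN gradient from a finite loss, which is the same pattern as in the log above.

```python
import torch


def backward_with_anomaly_check(loss):
    """Run loss.backward() under anomaly detection.

    Returns the error text if a backward function produced NaN, else None.
    (Hypothetical helper, not part of the editing script.)
    """
    try:
        with torch.autograd.detect_anomaly():
            loss.backward()
    except RuntimeError as err:
        return str(err)
    return None


# Toy case: the forward value is finite (0.0), but the gradient of sqrt at 0
# is evaluated as 0 / 0 during the backward pass, which yields NaN.
x = torch.zeros(1, requires_grad=True)
loss = (torch.sqrt(x) * 0.0).sum()

message = backward_with_anomaly_check(loss)
print(loss.item(), message)
```

The reported function name identifies the first backward node that returned NaN, which narrows the search to the operation that produced its inputs. Anomaly detection slows training noticeably, so it is usually enabled only while debugging.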

I suspect this is related to the ALiBi attention masks of Baichuan-13B.

@hiyouga
Owner Author

hiyouga commented Jul 13, 2023

This may be caused by the ALiBi position encoding in the current implementation of the Baichuan-13B model. That implementation does not accept an attention mask, so it is incompatible with left-padding. We are trying to fix it by re-implementing the Baichuan-13B model.
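
As an illustration of what "accepting the attention mask" could look like, the sketch below folds a padding mask into an ALiBi bias. This is not the Baichuan-13B code or the eventual fix; the function names `alibi_slopes` and `alibi_bias_with_padding` are hypothetical, the slope formula assumes the number of heads is a power of two, and blocked positions are filled with the dtype minimum rather than `-inf` so that rows consisting only of padding still give a finite softmax.

```python
import torch


def alibi_slopes(num_heads):
    # Geometric sequence of per-head slopes from the ALiBi paper.
    # Assumption: num_heads is a power of two.
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])


def alibi_bias_with_padding(attention_mask, num_heads):
    """Build an additive bias of shape (batch, heads, query, key).

    attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding.
    """
    batch, seq_len = attention_mask.shape
    pos = torch.arange(seq_len)
    # distance[q, k] = k - q for past keys, clamped to 0 for future keys.
    distance = (pos[None, :] - pos[:, None]).clamp(max=0).float()
    bias = alibi_slopes(num_heads)[:, None, None] * distance  # (heads, q, k)

    causal = pos[None, :] > pos[:, None]             # key lies in the future
    padded = attention_mask[:, None, None, :] == 0   # key is a padding token
    blocked = causal[None, None] | padded            # (batch, 1, q, k)

    min_value = torch.finfo(bias.dtype).min
    bias = bias[None].expand(batch, -1, -1, -1)
    return bias.masked_fill(blocked, min_value)


# First sequence is left-padded by two tokens; second has no padding.
mask = torch.tensor([[0, 0, 1, 1], [1, 1, 1, 1]])
bias = alibi_bias_with_padding(mask, num_heads=2)
probs = torch.softmax(torch.zeros(2, 2, 4, 4) + bias, dim=-1)
print(torch.isnan(probs).any().item())
```

With this construction, real query positions assign zero weight to padded keys, and the padded query positions produce finite (if meaningless) rows that the loss should ignore.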

@hiyouga
Owner Author

hiyouga commented Jul 16, 2023

This problem has been fixed. Please replace the Baichuan-13B model file with the updated version in [1] and rerun the editing script.

[1] https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/main/tests/modeling_baichuan.py

hiyouga added the "solved" label on Jul 16, 2023