
transformer multihead attention scaling layer error #108

Open
skswldndi opened this issue Jan 29, 2022 · 0 comments

@skswldndi

Hi. I think there is a problem in the transformer scaling layer.
When I run UNMT, I get an exception in NMT/src/modules/multihead_attention.py at line 97.

line 97 : q *= self.scaling
line 30 : self.scaling = self.head_dim ** -0.5

I could not find the reason, so I just changed my code to

line 97 : q = q / math.sqrt(self.head_dim)

and it worked.
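
For reference, the two forms should be numerically equivalent, since multiplying by `head_dim ** -0.5` is the same as dividing by `sqrt(head_dim)`. Below is a minimal sketch (outside the repo, with made-up shapes: embed_dim = 512, num_heads = 8) checking that the workaround does not change the computed values:

```python
import math
import torch

# Hypothetical dimensions for illustration only.
embed_dim, num_heads = 512, 8
head_dim = embed_dim // num_heads

# Arbitrary example query tensor of shape (seq_len, batch, embed_dim).
q = torch.randn(10, 2, embed_dim)

# Original scaling: multiply by head_dim ** -0.5.
scaling = head_dim ** -0.5
q_mul = q * scaling

# Workaround: divide by sqrt(head_dim).
q_div = q / math.sqrt(head_dim)

print(torch.allclose(q_mul, q_div))  # True
```

So the change is a safe workaround for the exception, even if it does not explain why the in-place `q *= self.scaling` fails in the first place.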
