🐛 Bug
Hi! I was looking at the pseudocode in the Adafactor paper and noticed that it differs slightly from the fairseq implementation: fairseq's Adafactor uses mean to reduce
and
when they are all supposed to be sum instead? Here's the pseudo-code from the paper
I made a PR to fix this issue, if needed!
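For anyone wanting to check how much the choice of reduction matters, here is a minimal NumPy sketch that compares the two. It is not fairseq's code: `factored_estimate` is a hypothetical helper, the matrix `v` is random stand-in data for squared gradients, and it assumes the rank-1 estimate is normalized with the same reduction that produced the row and column statistics.

```python
import numpy as np

def factored_estimate(v, reduce):
    """Rank-1 estimate of v from its row and column statistics.

    `reduce` is np.sum (the row-sum / column-sum form) or np.mean.
    Hypothetical helper for illustration only.
    """
    r = reduce(v, axis=1, keepdims=True)  # shape (n, 1): one statistic per row
    c = reduce(v, axis=0, keepdims=True)  # shape (1, m): one statistic per column
    return r * c / reduce(r)              # normalize with the same reduction

rng = np.random.default_rng(0)
v = rng.random((4, 6)) ** 2  # stand-in for a matrix of squared gradients

est_sum = factored_estimate(v, np.sum)
est_mean = factored_estimate(v, np.mean)

# True when both reductions give the same estimate up to float error.
same = np.allclose(est_sum, est_mean)
print(same)
```

Under that assumption the constant factors from the means cancel, so the two estimates should agree. If the normalizer used a different reduction from the row and column statistics, the results would differ by a constant factor, so that is the part worth checking against the actual implementation.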