You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Well, you can look into the idea via code, in code, author used softmax in the first dimension, in other words, 'split' dimension, in code, they used 4. You can think it in this way: for each element in one splitted part, it will be calculated with softmax operation across all splitted parts in the same position.
The denominator of formula (8) is a vector in this papper. The author does not clearly explain the process of calculating the Softmax.
The text was updated successfully, but these errors were encountered: