
Fixed the missing token normalization for cross-attention computation #82

Open

wants to merge 1 commit into main

Conversation

@goutamyg commented Jul 31, 2023

On a downstream task, I observe better training convergence when both x and x_prev are normalized during the cross-attention computation here: https://github.com/apple/ml-cvnets/blob/main/cvnets/modules/transformer.py#L258

I am currently training the model with and without the proposed normalization of x_prev and will share the results for both cases. In the meantime, if this change makes sense, please consider including it. Let me know if you need any related information.
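If it helps to illustrate, here is a minimal sketch of the proposed change. nn.MultiheadAttention stands in for the repo's linear attention, and the class name, layer layout, and shared-norm choice are assumptions for illustration, not the actual cvnets code; only the normalization logic is the point.

```python
import torch
from torch import nn


class CrossAttnBlock(nn.Module):
    """Illustrative pre-norm cross-attention block.

    Stand-in for the cross-attention branch in
    cvnets/modules/transformer.py: x provides the queries and
    x_prev provides the keys/values.
    """

    def __init__(self, embed_dim: int, num_heads: int = 4) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, x_prev: torch.Tensor) -> torch.Tensor:
        res = x
        x = self.norm(x)            # current behavior: only x is normalized
        x_prev = self.norm(x_prev)  # proposed fix: normalize x_prev as well
        out, _ = self.attn(query=x, key=x_prev, value=x_prev)
        return out + res            # residual connection


# Usage: query tokens x attend to key/value tokens x_prev.
x = torch.randn(2, 16, 64)       # (batch, tokens, dim)
x_prev = torch.randn(2, 49, 64)
y = CrossAttnBlock(embed_dim=64)(x, x_prev)
print(y.shape)  # torch.Size([2, 16, 64])
```

Whether x_prev should share the same LayerNorm as x or get its own instance is a separate design choice; sharing it (as sketched above) keeps both token streams on the same scale going into the key/value projections.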
