
A problem in the PositionalEncoding model code #42

kir1to455 opened this issue Apr 29, 2024 · 3 comments

@kir1to455

Hi,
Thank you for developing Corigami!
I have encountered some problems when using Corigami to train on my own data.

After the encoder step, the transposed matrix is passed into the attention module.
At that point, the matrix x is a tensor of shape [batch_size, seq_length, embedding_dim], not [seq_length, batch_size, embedding_dim]!
Performing the step x = x + self.pe[:x.size(0)] will therefore add the wrong positional information.

I think the code may have made an error in the transpose after the encoder.
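For reference, here is a minimal sketch of the standard tutorial-style PositionalEncoding I am referring to (the exact Corigami code may differ slightly). Because pe is built as [max_len, 1, d_model], indexing it with x.size(0) only works for sequence-first input; with batch-first input it slices by batch size instead of sequence length:

```python
import math
import torch
import torch.nn as nn

# Sketch of the standard tutorial-style positional encoding; may differ from the Corigami code.
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)            # [max_len, 1, d_model]: sequence-first layout
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)

    def forward(self, x):
        # Correct only if x is [seq_len, batch, d_model].
        # With batch-first x ([batch, seq_len, d_model]), x.size(0) is the batch size,
        # so the wrong slice of pe is added and it broadcasts across positions.
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)
```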

Best wishes,
Kirito

@tanjimin
Owner

Hi @kir1to455, thank you for raising this issue!

The move_feature_forward function is actually for adjusting the difference between the CNN and the Transformer: a CNN has channels/hiddens as the second dimension, [batch, hidden, length], while NLP models (including transformers) usually have the hidden dimension at the end, [batch, length, hidden]. We need to swap the last two axes back and forth.
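As a minimal illustration of that swap (the helper name here just mirrors the idea, it is not necessarily the exact Corigami code):

```python
import torch

def move_feature_forward(x):
    # Swap the last two axes: CNN layout [batch, hidden, length]
    # <-> transformer layout [batch, length, hidden].
    return x.transpose(1, 2)

x_cnn = torch.randn(2, 256, 128)       # [batch, hidden, length] for the conv trunk
x_seq = move_feature_forward(x_cnn)    # [batch, length, hidden] for the transformer
x_back = move_feature_forward(x_seq)   # back to [batch, hidden, length]
```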

I think you are correct about the PE, which is a separate issue. One simple fix could be to change the dimension of the loaded pe and let PyTorch handle the broadcasting. You can try this:

x = x + self.pe[:x.size(1)].transpose(0, 1)
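In context, assuming the tutorial-style module where pe has shape [max_len, 1, d_model] and x is batch-first, the forward would look roughly like this:

```python
def forward(self, x):
    # x: [batch, seq_len, d_model] (batch-first)
    # self.pe[:x.size(1)] is [seq_len, 1, d_model]; transposing gives
    # [1, seq_len, d_model], which broadcasts correctly over the batch.
    x = x + self.pe[:x.size(1)].transpose(0, 1)
    return self.dropout(x)
```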

I think the decoder compensated for the pe issue, and I'll definitely fix this in the next version! Let me know if you have other questions.

Jimin

@kir1to455
Author

Hi, Jimin @tanjimin
Thank you for your reply.
Another question I have is why this decoder is used in the paper instead of a transformer decoder.
In the decoder, you use dilated convolutions to get a larger receptive field.
But why not use a transformer decoder?

Best wishes,
Kirito

@tanjimin
Owner

Hi @kir1to455 , the decoder here is actually a dilated 2D convolutional ResNet. It is very different from a typical transformer, unless you consider ViT. I named it a decoder because the model follows the encoder-decoder architecture, and this module decodes genomic features into the final Hi-C map.
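Roughly, each decoder block is a residual block of dilated 2D convolutions. This is only a sketch under that description, not the exact Corigami code; stacking blocks with growing dilation expands the receptive field over the 2D feature map quickly:

```python
import torch
import torch.nn as nn

# Illustrative dilated 2D conv residual block; not the exact Corigami decoder.
class DilatedResBlock2d(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.norm1 = nn.BatchNorm2d(channels)
        self.norm2 = nn.BatchNorm2d(channels)
        self.act = nn.ReLU()

    def forward(self, x):
        out = self.act(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return self.act(out + x)   # residual connection, spatial size unchanged

# Increasing the dilation at each block grows the receptive field quickly.
decoder = nn.Sequential(*[DilatedResBlock2d(64, 2 ** i) for i in range(4)])
x = torch.randn(1, 64, 256, 256)   # [batch, channels, bins, bins]
y = decoder(x)                     # same shape, larger receptive field
```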
