
Skip-connection in Transformer #17

Open · hoangcuong2011 opened this issue Jan 20, 2019 · 1 comment

Comments
@hoangcuong2011

Hello,

Thanks for a great project; it helps me build models on top of it.

I was wondering one thing: it seems like you do not implement skip connections (residual connections) in the Transformer?

Is it because you implemented it and didn't observe an improvement?

Or is it just because you didn't implement it?

I ask because when I use more layers, I actually get worse performance. I am not sure whether that is expected (i.e. having more layers does not help), or whether it is because I don't have skip connections, which usually help when building deeper models.

Best,

@lsdefine
Owner

There are skip connections.
See Add() in EncoderLayer/DecoderLayer.
The usual tricks (learning-rate scheduler, etc.) should still be used when the network is deep, even with skip connections.
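For readers landing here, a minimal sketch of what the owner is describing: a residual skip connection wired with Add() around each sub-layer, plus the warm-up learning-rate schedule from "Attention Is All You Need" that the comment alludes to. This is illustrative code using tf.keras (the repository's own EncoderLayer/DecoderLayer use custom attention layers), and the layer sizes are placeholder assumptions, not values from the repo.

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_layer(x, d_model=256, n_heads=4, d_ff=1024, dropout=0.1):
    # Self-attention sub-layer wrapped in a residual (skip) connection.
    attn = layers.MultiHeadAttention(num_heads=n_heads,
                                     key_dim=d_model // n_heads)(x, x)
    attn = layers.Dropout(dropout)(attn)
    x = layers.Add()([x, attn])      # skip connection: x + SelfAttention(x)
    x = layers.LayerNormalization()(x)

    # Position-wise feed-forward sub-layer, again with a skip connection.
    ff = layers.Dense(d_ff, activation='relu')(x)
    ff = layers.Dense(d_model)(ff)
    ff = layers.Dropout(dropout)(ff)
    x = layers.Add()([x, ff])        # skip connection: x + FFN(x)
    x = layers.LayerNormalization()(x)
    return x

def noam_lr(step, d_model=256, warmup=4000):
    # Warm-up schedule from the Transformer paper: linear warm-up followed
    # by inverse-square-root decay. Helps when training deeper stacks.
    step = max(step, 1)
    return (d_model ** -0.5) * min(step ** -0.5, step * warmup ** -1.5)
```

In practice the schedule value would be fed to the optimizer each step, for example via a custom tf.keras.optimizers.schedules.LearningRateSchedule, rather than a fixed learning rate.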
