This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Bug][Fix][WIP] Fix pre-layernormalization in Transformer #1488

Open
wants to merge 8 commits into master

Conversation

sxjscience
Member

Description

Fix the addition of the residual connection. The previous implementation was incorrect. I'm rerunning the Transformer-Big-pre-ln experiment.
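
For context, the intended pre-LN ordering can be summarized briefly: the sublayer consumes the normalized input, but the residual adds the raw, un-normalized input back. Below is a minimal NumPy sketch (not the actual gluonnlp code; `layer_norm`, `pre_ln_block`, and `post_ln_block` are hypothetical stand-ins, and `layer_norm` omits the learned parameters of a real LayerNorm):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last axis; omits the learned gamma/beta
    # parameters of a full LayerNorm for brevity.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pre_ln_block(x, sublayer):
    # Pre-LN: normalize, apply the sublayer (attention or FFN),
    # then add the *raw* input x as the residual.
    return x + sublayer(layer_norm(x))

def post_ln_block(x, sublayer):
    # Post-LN (original Transformer): add the residual first,
    # then normalize the sum.
    return layer_norm(x + sublayer(x))
```

The key invariant in the pre-LN variant is that the identity path carries the un-normalized input; the exact mistake in the previous implementation isn't spelled out in this description, but any ordering that feeds a normalized tensor into the residual addition breaks that invariant.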

@yongyi-wu

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

@sxjscience sxjscience requested a review from a team as a code owner January 18, 2021 01:06
@sxjscience sxjscience changed the title from "[Bug][Fix] Fix pre-layernormalization in Transformer" to "[Bug][Fix][WIP] Fix pre-layernormalization in Transformer" on Jan 18, 2021
@codecov

codecov bot commented Jan 18, 2021

Codecov Report

Merging #1488 (3c7c4c1) into master (c582b64) will increase coverage by 3.21%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1488      +/-   ##
==========================================
+ Coverage   81.98%   85.19%   +3.21%     
==========================================
  Files          52       52              
  Lines        6909     6822      -87     
==========================================
+ Hits         5664     5812     +148     
+ Misses       1245     1010     -235     
Impacted Files Coverage Δ
src/gluonnlp/data/batchify.py 88.72% <ø> (ø)
src/gluonnlp/layers.py 87.15% <100.00%> (+0.03%) ⬆️
src/gluonnlp/models/transformer.py 98.93% <100.00%> (-0.01%) ⬇️
conftest.py 76.31% <0.00%> (-8.69%) ⬇️
src/gluonnlp/data/loading.py 75.75% <0.00%> (-7.64%) ⬇️
src/gluonnlp/utils/lazy_imports.py 58.42% <0.00%> (-2.25%) ⬇️
src/gluonnlp/utils/misc.py 52.51% <0.00%> (-1.06%) ⬇️
src/gluonnlp/data/tokenizers/yttm.py 81.73% <0.00%> (-1.02%) ⬇️
src/gluonnlp/data/tokenizers/spacy.py 65.33% <0.00%> (-0.91%) ⬇️
src/gluonnlp/data/tokenizers/huggingface.py 71.06% <0.00%> (-0.78%) ⬇️
... and 22 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c582b64...3c7c4c1.

@yongyi-wu
Member

Looks good. It seems all issues related to pre-norm and the skip connection have been fixed.

@sxjscience
Member Author

I noticed that performance became worse after I changed the implementation. Still investigating the issue.
