
[WIP] LAMB optimizer #1460

Open
wants to merge 3 commits into
base: master

Conversation

francoishernandez
Member

@francoishernandez francoishernandez commented Jun 5, 2019

[DO NOT MERGE]

This is a WIP implementation of the LAMB optimizer introduced for BERT pre-training. It reportedly allows training to scale to very large batch sizes. There are still some ambiguities: the algorithm differs between v1 and v2/v3 of the paper, some definitions are blurry, there is no official implementation yet (a few unofficial ones exist but differ on some points), and the paper gives no clear learning_rate schedule despite its detailed experiments.
Also, significant tuning may be needed to find appropriate values for our tasks.
I'm opening this PR as a basis for future work, once we have more elements.

The current version here is based on https://github.com/cybertronai/pytorch-lamb, which is itself derived from torch.optim.Adam.
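
For reference, here is a minimal sketch of the core idea, loosely following the trust-ratio formulation used in pytorch-lamb: an Adam-style moment update whose step is rescaled per parameter tensor by ||w|| / ||update||. The class name `Lamb`, the zero-norm fallback, and the omission of bias correction are simplifying assumptions for illustration, not the final code in this PR.

```python
import torch
from torch.optim import Optimizer


class Lamb(Optimizer):
    """Sketch of a LAMB-style optimizer (Adam update + layer-wise trust ratio)."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-6, weight_decay=0.0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            beta1, beta2 = group['betas']
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad
                state = self.state[p]
                if len(state) == 0:
                    state['exp_avg'] = torch.zeros_like(p)
                    state['exp_avg_sq'] = torch.zeros_like(p)
                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']

                # Adam-style first and second moment estimates.
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                # Adam update direction, plus (decoupled) weight decay.
                update = exp_avg / (exp_avg_sq.sqrt().add_(group['eps']))
                if group['weight_decay'] != 0:
                    update = update.add(p, alpha=group['weight_decay'])

                # Layer-wise trust ratio: ||w|| / ||update||. Falling back to 1
                # when either norm is zero is a convention that varies between
                # implementations (one of the ambiguities mentioned above).
                w_norm = p.norm().item()
                u_norm = update.norm().item()
                trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

                p.add_(update, alpha=-group['lr'] * trust_ratio)
        return loss
```

Usage would mirror Adam, e.g. `optimizer = Lamb(model.parameters(), lr=1e-3, weight_decay=0.01)`; finding appropriate hyperparameter values for our tasks is exactly the tuning work that remains.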

@alphadl
Contributor

alphadl commented Jul 16, 2019

LGTM
