[bug] icemix.py port: Optimizer not handling weight decay for cls_token #713

Open · timinar opened this issue May 8, 2024 · 1 comment · Labels: bug

timinar (Collaborator) commented May 8, 2024
The DeepIce model contains a method called no_weight_decay() which is intended to specify that the cls_token parameter should not be subject to weight decay during training:

@torch.jit.ignore
def no_weight_decay(self) -> Set:
    """cls_tocken should not be subject to weight decay during training."""
    return {"cls_token"}

However, optimizer_grouped_parameters are not specified when the optimizer is constructed for training, so this method currently has no effect.
I believe that in the original 2nd-place code, fastai's wrapper around AdamW handled this automatically.
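For reference, a minimal sketch of how parameter groups could be built so the optimizer actually honours no_weight_decay(). The build_param_groups helper name, the default weight-decay value, and the extra bias/1-D exclusions are assumptions for illustration, not part of the existing code:

import torch
from torch import nn

def build_param_groups(model: nn.Module, weight_decay: float = 0.05):
    """Split parameters into decay / no-decay groups.

    Respects model.no_weight_decay() if it exists (DeepIce returns
    {"cls_token"}), and additionally skips biases and 1-D parameters
    such as normalization weights (a common convention, assumed here).
    """
    skip = set(model.no_weight_decay()) if hasattr(model, "no_weight_decay") else set()
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if any(s in name for s in skip) or name.endswith(".bias") or param.ndim <= 1:
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Usage (model would be the DeepIce instance in practice):
# optimizer = torch.optim.AdamW(build_param_groups(model), lr=1e-3)

With DeepIce this would put cls_token into the zero-weight-decay group while the remaining parameters keep the configured weight decay.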

ChenLi2049 (Collaborator) commented May 16, 2024

Yes, this method def no_weight_decay() is called in BEiT-2 during training. However, I searched and it is not called in the original 2nd-place solution or in fastai.

Also, I can find optimizer_grouped_parameters in BEiT-2, but not in the original 2nd-place solution or in fastai.

The 2nd-place solution uses fastai.vision.all.OptimWrapper, but neither fastai.vision.all.OptimWrapper nor its base class contains or calls this method.

I think this is a holdover from BEiT-2, so maybe def no_weight_decay() can be removed.
