
Add drop path schedule #1835

Open · wants to merge 19 commits into main
Conversation

@leng-yue (Contributor) commented May 31, 2023

Update the drop path schedule to adhere to the original implementation found in DINOv2 (see the schedule sketch below).
Add an efficient drop path to accelerate training. #1836

With a 40% drop rate, training is roughly 38% faster, while eval time is essentially unchanged:

- ViT-L/14 eval took 8.701655239999809
- ViT-L/14 with efficient drop path eval took 8.702854548999994
- ViT-L/14 train took 8.81138907400009
- ViT-L/14 with efficient drop path train took 5.4026294970001345

Ref: DINOv2.
Benchmark: Colab
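For reference, a minimal sketch of the two schedules in question, assuming DINOv2-style semantics; the function name is illustrative, not this PR's actual API:

```python
import torch

def build_drop_path_schedule(drop_path_rate: float, depth: int, mode: str = "linear"):
    # Illustrative helper: "linear" ramps the per-block rate from 0 at the first
    # block up to drop_path_rate at the last (the usual stochastic-depth rule
    # and DINOv2's default); "uniform" applies the same rate to every block
    # (DINOv2's drop_path_uniform=True).
    assert mode in ("linear", "uniform")
    if mode == "uniform":
        return [drop_path_rate] * depth
    return [r.item() for r in torch.linspace(0, drop_path_rate, depth)]
```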

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@leng-yue marked this pull request as ready for review June 3, 2023 01:24
@CodiumAI-Agent

Questions: summarize me the PR in 5 simple lines

Answer:
1. The PR introduces an efficient drop path schedule to accelerate training, inspired by the original implementation found in DINOv2.
2. A new class EfficientDropPathBlock is added to the vision_transformer.py file, which implements the efficient drop path.
3. The PR also includes the addition of a new optimizer, NAdamW, in the nadamw.py file (see the usage sketch after this list).
4. The drop path schedule can be either 'linear' or 'uniform', and this is asserted in the VisionTransformer's __init__ method.
5. Several new models are registered in the vision_transformer.py file, including 'vit_huge_patch14_224_ijepa', 'vit_huge_patch16_448_ijepa', and 'vit_gigantic_patch16_224_ijepa'.
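If the optimizer lands as summarized above, usage would look something like this sketch; the import path is assumed from the PR's nadamw.py, and the hyperparameters are ordinary AdamW-style values, not ones taken from the PR:

```python
import torch
from timm.optim.nadamw import NAdamW  # import path assumed from this PR's nadamw.py

model = torch.nn.Linear(128, 10)
# NAdamW is AdamW with Nesterov-style momentum; these hyperparameters are
# generic defaults, not values from the PR.
optimizer = NAdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0.01)

loss = model(torch.randn(4, 128)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```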

@CodiumAI-Agent

Questions: Write me a poem about the PR

Answer:
In the realm of code, where logic is the key,
A pull request emerges, for all to see.
A vision transformer, it seeks to enhance,
With an efficient drop path, it takes a stance.

A new class is born, EfficientDropPathBlock,
In the depths of Python, it begins to walk.
With layers and norms, it's a sight to behold,
A tale of performance improvement, waiting to be told.

Drop path schedules, linear and uniform,
In the dance of training, they perform.
A boost in speed, a significant gain,
Like a parched desert, welcoming the rain.

New models are added, with a careful touch,
vit_huge, vit_gigantic, they offer so much.
In the world of AI, they're the new stars,
Guiding us forward, towards Mars.

NAdamW optimizer, a new addition,
Enhancing the training with its mission.
With beta, learning rate, and decay,
It optimizes in a unique way.

A poem of code, of additions and more,
A story of progress, it has in store.
In the world of git, it's a common sight,
A pull request, shining bright.

@netw0rkf10w

@leng-yue Any insight on why this is faster? If the entire batch is dropped (instead of randomly chosen rows), then I guess this doesn't bring any gains?

@leng-yue (Contributor, Author) commented Jul 6, 2023

The previous implementation didn't actually drop paths: the tokens were still fed through the FFN and attention, whereas the efficient drop path truly skips those computations.
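A minimal sketch of that idea, assuming DINOv2-style semantics (drop_path_residual and residual_fn are illustrative names, not this PR's actual API):

```python
import torch

def drop_path_residual(x: torch.Tensor, residual_fn, drop_rate: float) -> torch.Tensor:
    # Instead of running the residual branch (attention or MLP) on the full
    # batch and zeroing the dropped rows, run it only on a random subset of
    # samples and scatter the result back.
    b = x.shape[0]
    keep = max(int(b * (1.0 - drop_rate)), 1)
    idx = torch.randperm(b, device=x.device)[:keep]

    # Compute the expensive branch only for the kept samples.
    out = residual_fn(x[idx])

    # Kept rows get x + scaled residual; dropped rows pass through unchanged.
    # The scale b / keep keeps the expected residual magnitude consistent
    # with the full-batch computation.
    return torch.index_add(x, 0, idx, out.to(dtype=x.dtype), alpha=b / keep)
```

Because whole samples are selected out before the branch runs, attention and the MLP see a genuinely smaller batch, which is why the gain shows up in training but not in eval, where the drop rate is 0.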
