
Add drop path schedule #1835

Open · wants to merge 19 commits into main
Conversation

@leng-yue (Contributor) commented May 31, 2023

Update the drop path schedule to adhere to the original implementation found in DINOv2 (see the schedule sketch below).
Add an efficient drop path to accelerate training. #1836

With a 40% drop rate, training is roughly 38% faster, while eval time is essentially unchanged:

- ViT-L/14 eval took 8.701655239999809
- ViT-L/14 with efficient drop path eval took 8.702854548999994
- ViT-L/14 train took 8.81138907400009
- ViT-L/14 with efficient drop path train took 5.4026294970001345

Ref: DINOv2.
Benchmark: Colab
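For reference, a minimal sketch of the two schedules in question, assuming DINOv2-style semantics; the function name is illustrative, not this PR's actual API:

```python
import torch

def build_drop_path_schedule(drop_path_rate: float, depth: int, mode: str = "linear"):
    # Illustrative helper: "linear" ramps the per-block rate from 0 at the first
    # block up to drop_path_rate at the last (the usual stochastic-depth rule
    # and DINOv2's default); "uniform" applies the same rate to every block
    # (DINOv2's drop_path_uniform=True).
    assert mode in ("linear", "uniform")
    if mode == "uniform":
        return [drop_path_rate] * depth
    return [r.item() for r in torch.linspace(0, drop_path_rate, depth)]
```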

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@leng-yue marked this pull request as ready for review June 3, 2023 01:24
@CodiumAI-Agent

Questions: summarize me the PR in 5 simple lines

Answer:
1. The PR introduces an efficient drop path schedule to accelerate training, inspired by the original implementation found in DINOv2.
2. A new class EfficientDropPathBlock is added to the vision_transformer.py file, which implements the efficient drop path.
3. The PR also includes the addition of a new optimizer, NAdamW, in the nadamw.py file (see the usage sketch after this list).
4. The drop path schedule can be either 'linear' or 'uniform', and this is asserted in the VisionTransformer's __init__ method.
5. Several new models are registered in the vision_transformer.py file, including 'vit_huge_patch14_224_ijepa', 'vit_huge_patch16_448_ijepa', and 'vit_gigantic_patch16_224_ijepa'.
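If the optimizer lands as summarized above, usage would look something like this sketch; the import path is assumed from the PR's nadamw.py, and the hyperparameters are ordinary AdamW-style values, not ones taken from the PR:

```python
import torch
from timm.optim.nadamw import NAdamW  # import path assumed from this PR's nadamw.py

model = torch.nn.Linear(128, 10)
# NAdamW is AdamW with Nesterov-style momentum; these hyperparameters are
# generic defaults, not values from the PR.
optimizer = NAdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0.01)

loss = model(torch.randn(4, 128)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```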

@CodiumAI-Agent

Questions: Write me a poem about the PR

Answer:
In the realm of code, where logic is the key,
A pull request emerges, for all to see.
A vision transformer, it seeks to enhance,
With an efficient drop path, it takes a stance.

A new class is born, EfficientDropPathBlock,
In the depths of Python, it begins to walk.
With layers and norms, it's a sight to behold,
A tale of performance improvement, waiting to be told.

Drop path schedules, linear and uniform,
In the dance of training, they perform.
A boost in speed, a significant gain,
Like a parched desert, welcoming the rain.

New models are added, with a careful touch,
vit_huge, vit_gigantic, they offer so much.
In the world of AI, they're the new stars,
Guiding us forward, towards Mars.

NAdamW optimizer, a new addition,
Enhancing the training with its mission.
With beta, learning rate, and decay,
It optimizes in a unique way.

A poem of code, of additions and more,
A story of progress, it has in store.
In the world of git, it's a common sight,
A pull request, shining bright.

@netw0rkf10w

@leng-yue Any insight on why this is faster? If the entire batch is dropped (instead of randomly chosen rows), then I guess this doesn't bring any gains?

@leng-yue (Contributor, Author) commented Jul 6, 2023

The previous implementation didn't actually drop paths: the tokens were still fed through the FFN and attention, whereas the efficient drop path truly skips those computations.
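A minimal sketch of that idea, assuming DINOv2-style semantics (drop_path_residual and residual_fn are illustrative names, not this PR's actual API):

```python
import torch

def drop_path_residual(x: torch.Tensor, residual_fn, drop_rate: float) -> torch.Tensor:
    # Instead of running the residual branch (attention or MLP) on the full
    # batch and zeroing the dropped rows, run it only on a random subset of
    # samples and scatter the result back.
    b = x.shape[0]
    keep = max(int(b * (1.0 - drop_rate)), 1)
    idx = torch.randperm(b, device=x.device)[:keep]

    # Compute the expensive branch only for the kept samples.
    out = residual_fn(x[idx])

    # Kept rows get x + scaled residual; dropped rows pass through unchanged.
    # The scale b / keep keeps the expected residual magnitude consistent
    # with the full-batch computation.
    return torch.index_add(x, 0, idx, out.to(dtype=x.dtype), alpha=b / keep)
```

Because whole samples are selected out before the branch runs, attention and the MLP see a genuinely smaller batch, which is why the gain shows up in training but not in eval, where the drop rate is 0.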
