
Update ML Decoder #2045

Draft · fffffgggg54 wants to merge 36 commits into main

Conversation

@fffffgggg54 (Contributor) commented Nov 27, 2023

Update ML Decoder's TransformerDecoderLayerOptimal module to comply with what nn.TransformerDecoder expects. Current changes work with resnet50.

add_ml_decoder_head needs to be updated for other models. In my limited testing, the following case works with RegNet:

elif hasattr(model, 'head'):    # ClassifierHead and ConvNeXt
    if hasattr(model.head, 'flatten'):  # ConvNeXt case
        model.head.flatten = nn.Identity()
    model.head.global_pool = nn.Identity()  # MLDecoder consumes the unpooled feature map
    del model.head.fc
    num_classes = model.num_classes
    num_features = model.num_features
    model.head.fc = MLDecoder(num_classes=num_classes, initial_num_features=num_features)
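For context, here is a minimal sketch (my own illustration under assumptions, not code from this PR) of a decoder layer whose forward signature accepts everything nn.TransformerDecoder passes to its layers, including the tgt_is_causal/memory_is_causal keywords forwarded by recent PyTorch (2.1+):

import torch
import torch.nn as nn

class CompatDecoderLayer(nn.Module):
    # Hypothetical sketch: self/cross attention + FFN with post-norm residuals.
    # The important part is the forward signature below, which mirrors the
    # keyword arguments nn.TransformerDecoder passes to each cloned layer.
    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, dim_feedforward), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(dim_feedforward, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, tgt, memory, tgt_mask=None, memory_mask=None,
                tgt_key_padding_mask=None, memory_key_padding_mask=None,
                tgt_is_causal=False, memory_is_causal=False):
        # (seq, batch, dim) layout, matching nn.MultiheadAttention defaults
        x = tgt
        x = self.norm1(x + self.dropout(self.self_attn(x, x, x, need_weights=False)[0]))
        x = self.norm2(x + self.dropout(self.cross_attn(x, memory, memory, need_weights=False)[0]))
        x = self.norm3(x + self.dropout(self.ffn(x)))
        return x

# nn.TransformerDecoder clones the layer num_layers times; query embeddings go in
# as tgt and the backbone feature tokens as memory.
decoder = nn.TransformerDecoder(CompatDecoderLayer(d_model=768, nhead=8), num_layers=1)
out = decoder(torch.zeros(80, 2, 768), torch.zeros(49, 2, 768))  # (queries, batch, dim)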

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@fffffgggg54 (Contributor, Author)

I currently have 3 variants: the original with compatibility changes for PyTorch 2.1+, a version identical to the original but with a performance fix that gives a ~35% training speedup, and a WIP reimplementation that updates the styling and changes the decoder block implementation. The last one involves architectural changes that remove some of the odd components from #1012 (residuals on dropouts and queries, MLP residual location, norm locations). I'm in the process of testing these changes.

@mrT23 do you have any comments? @rwightman are you aware of any PyTorch transformer decoder/cross-attention implementations that follow the style of this library, or that I could reference? I inferred the cross-attention impl from the ViT attn impl and the block structure from the original impl, but I would rather this impl follow something standard (aside from the self-attn in the decoder) than introduce other odd implementation choices.
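
For illustration, a rough cross-attention sketch patterned after the ViT Attention block style in timm; the x_q/x_kv naming and all defaults are my placeholders, not the PR's actual implementation:

import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    # Hypothetical sketch: separate q and kv projections, multi-head split,
    # scaled dot-product attention, output projection -- mirroring the shape
    # handling of timm's ViT Attention but with distinct query/context inputs.
    def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., proj_drop=0.):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim, bias=qkv_bias)
        self.kv = nn.Linear(dim, dim * 2, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x_q, x_kv):
        B, Nq, C = x_q.shape
        Nkv = x_kv.shape[1]
        q = self.q(x_q).reshape(B, Nq, self.num_heads, self.head_dim).transpose(1, 2)
        kv = self.kv(x_kv).reshape(B, Nkv, 2, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        k, v = kv.unbind(0)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = self.attn_drop(attn.softmax(dim=-1))
        x = (attn @ v).transpose(1, 2).reshape(B, Nq, C)
        return self.proj_drop(self.proj(x))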

Model compatibility is iffy: there are a few models with odd architectures (combining multiple feature maps, distillation architectures, etc.) that would be a pain to special-case and probably won't be used, and a few other more prominent architectures that don't work because they use NHWC. Overall it seems like it would be difficult to maintain along with #2048. @rwightman do you think a revised ClassifierHead that supports additional pooling and head formats would be better? There are quite a few structures that can be placed there (pool->ffn, various ML-Decoder-like mechanisms). Since many models already split forward into forward_features and forward_head, I hope that a change like this will help abstract the design of the head from the model and provide a convenient, unified way to modify or swap the head.
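
As a rough illustration of the forward_features/forward_head split that a unified head abstraction would build on (resnet50 and the pool->fc head below are arbitrary examples, not code from this PR):

import timm
import torch
import torch.nn as nn

# Create the backbone without its pooling/classifier, grab the unpooled NCHW
# feature map, and apply any head on top (pool->ffn, an ML-Decoder-like module, ...).
backbone = timm.create_model('resnet50', pretrained=False, num_classes=0, global_pool='')
feats = backbone.forward_features(torch.randn(2, 3, 224, 224))  # (2, 2048, 7, 7)
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(1), nn.Linear(backbone.num_features, 1588))
logits = head(feats)  # (2, 1588)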

@rwightman (Collaborator)

@fffffgggg54 curious what your goals are for this impl, what sort of applications, etc.

  1. For cross attn, don't have many examples in this codebase, but prefer to keep the naming of inputs parallel... ie x_q/x_kv, inputs_q/inputs_kv

  2. Making these sorts of pool + head or pool-only additions universally applicable to all the timm models is a bit of a pain due to past decisions to keep some compatibility with original weights; it would have been easier if I'd pulled all head-related weights out into their own submodule for ALL models. Still have some ideas to work on here but currently focused on a document AI project.

  3. The most common type of attention pooling / head impl I've been working with in recent months is attention pooling of the CLIP variety (attention_pool2d.py), and especially the SigLIP style with a latent q (attention_pool.py), which isn't too far from the perceiver resampler. Been fiddling with the latent style with a non-singular q seq_len and qk norms for another project...
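
For reference, a loose sketch of that latent-q pooling style (learned latent queries cross-attending over the token sequence, optional q/k norms); the class name and defaults are placeholders rather than timm's attention_pool.py implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentAttentionPool(nn.Module):
    # Hypothetical sketch: latent_len learned queries attend over the input tokens,
    # so latent_len=1 gives CLIP/SigLIP-style pooling and latent_len>1 approaches a
    # perceiver-resampler-like reduction.
    def __init__(self, dim, num_heads=8, latent_len=1, qk_norm=False):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.latent = nn.Parameter(torch.randn(1, latent_len, dim) * 0.02)
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.q_norm = nn.LayerNorm(self.head_dim) if qk_norm else nn.Identity()
        self.k_norm = nn.LayerNorm(self.head_dim) if qk_norm else nn.Identity()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, dim) token sequence
        B, N, C = x.shape
        q = self.q(self.latent.expand(B, -1, -1))
        q = q.reshape(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
        kv = self.kv(x).reshape(B, N, 2, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        k, v = kv.unbind(0)
        q, k = self.q_norm(q), self.k_norm(k)
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(B, -1, C)
        return self.proj(out)  # (B, latent_len, dim)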

@fffffgggg54 (Contributor, Author)

Goals are primarily performance, compatibility, and consistent styling with newer timm implementations. The legacy version provides compatibility and improved performance, while the reimplementation attempts to match other timm models and removes nn.TransformerDecoder and nn.MultiheadAttention. The original impl gives me around 2300 img/s, the new impls give 2850-2900 img/s, and a GAP head gives around 3000 img/s, at 100 groups and 1588 classes.

I work almost exclusively with a dataset (danbooru) that poses a multi-label, positive-unlabeled problem. I use MLDecoder for this sometimes and recently noticed the pt 2.1 and extensive model compat issues, along with the slow groupFC impl, odd dropout, etc., which prompted both versions. In addition to PU-specific techniques (often in math-heavy papers that are a bit of a headache to read), some researchers focus on aspects of the model, often what comes after the backbone (GNNs, text towers, MLDecoder, activation functions). The labeling scheme of danbooru is set up such that the labels present in an image can be mapped out hierarchically, similar to a scene graph; this is also done internally via label implications. I'm also working on a from-scratch impl of DependencyViT for this; it's not going well, but hopefully I can exploit the tree structure.

I have a drop-in replacement for ClassifierHead in a notebook right now that works as long as the self.head(x) and self.head.reset style of call is used (hence #2050, #2051). My plan for the head submodule would be to gradually change models to follow the regular ClassifierHead pattern and maintain compat with original weights either via aliases or a modified weight remap fn. I hope that this change will help standardize head impls for current/new models and help with integrating different head configs. I'm happy to impl this, just not sure how you'd like it done.
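
A minimal sketch of the calling convention being relied on (hypothetical class and method bodies, not the notebook code): the head is callable as self.head(x) and exposes a reset() in the spirit of ClassifierHead, so it can be swapped without touching forward_head.

import torch
import torch.nn as nn

class SwappableHead(nn.Module):
    # Hypothetical drop-in head: pool + flatten + fc, where fc could just as well
    # be an ML-Decoder-like module consuming the unpooled feature map instead.
    def __init__(self, in_features, num_classes, pool_type='avg'):
        super().__init__()
        self.in_features = in_features
        self._build_pool(pool_type)
        self.fc = nn.Linear(in_features, num_classes) if num_classes > 0 else nn.Identity()

    def _build_pool(self, pool_type):
        self.global_pool = nn.AdaptiveAvgPool2d(1) if pool_type == 'avg' else nn.Identity()
        self.flatten = nn.Flatten(1) if pool_type == 'avg' else nn.Identity()

    def reset(self, num_classes, pool_type=None):
        # rebuild the classifier (and optionally the pooling) for a new class count
        if pool_type is not None:
            self._build_pool(pool_type)
        self.fc = nn.Linear(self.in_features, num_classes) if num_classes > 0 else nn.Identity()

    def forward(self, x, pre_logits=False):
        x = self.flatten(self.global_pool(x))
        return x if pre_logits else self.fc(x)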

@fffffgggg54 (Contributor, Author) commented Dec 26, 2023

Added tests from my own testing script; they will fail because there are models that don't work, 95 variants specifically, mostly due to distillation heads, multiple feature maps, or wrong input shape. The code to add the head is messy; a universal head should fix that. MLDecoderHead is basically how the universal head would have to be configured in order for MLDecoder to work.
