
Example Doubt #19683

Closed
emi-dm opened this issue May 7, 2024 · 3 comments
Labels: stat:awaiting response from contributor, type:support (User is asking for help / asking an implementation question. Stackoverflow would be better suited.)

emi-dm commented May 7, 2024

Can someone explain to me why the CLS token is not included in this example, and how I could include it for any backend? https://keras.io/examples/vision/image_classification_with_vision_transformer/


sineeli commented May 8, 2024

Hi @emi-dm,

This design is inherited from the Transformer model for text, and we use it throughout the main
paper. An initial attempt at using only image-patch embeddings, globally average-pooling (GAP)
them, followed by a linear classifier—just like ResNet's final feature map—performed very poorly.
However, we found that this is neither due to the extra token, nor to the GAP operation. Instead,
the difference in performance is fully explained by the requirement for a different learning rate.

(Taken from the ViT paper.)

ViT can be constructed either with or without the CLS token, as described in the paper. If you want to use a CLS token, create an extra token embedding with the ViT hidden dimension (d_model) and prepend it to the projected patches.

This new embedding can be implemented as a single separate Keras layer holding one weight vector, and it works with all backends.

Example

import keras

class TokenLayer(keras.layers.Layer):
    """Prepends a learnable CLS token to a sequence of patch embeddings."""

    def build(self, input_shape):
        # One learnable d_model-sized vector, shared across the batch.
        self.cls_token = self.add_weight(
            name="cls",
            shape=(1, 1, input_shape[-1]),
            initializer="zeros",
        )

    def call(self, inputs):
        # Broadcast the token over the batch dimension...
        cls_token = self.cls_token + keras.ops.zeros_like(inputs[:, 0:1])
        # ...and prepend it along the sequence axis:
        # (batch, num_patches, d_model) -> (batch, 1 + num_patches, d_model).
        return keras.ops.concatenate([cls_token, inputs], axis=1)

Thanks and hope this helps.

@sachinprasadhs sachinprasadhs added type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited. stat:awaiting response from contributor labels May 8, 2024

emi-dm commented May 9, 2024

Thank you so much @sineeli!!! I couldn't dig deeply enough into the original paper, which is what caused my doubt! Really appreciated :)

@emi-dm emi-dm closed this as completed May 9, 2024