
New ViT findings via registers (2309.16588) #184

Open
Infinitay opened this issue Oct 1, 2023 · 0 comments
Infinitay commented Oct 1, 2023

There was a paper released very recently by Facebook (now Meta) and INRIA reporting an improvement when they added registers to ViT. I'm not too familiar with the space, so I won't pretend to understand it, but I will leave you with the abstract:

Transformers have recently emerged as a powerful tool for learning visual representations. In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks. The artifacts correspond to high-norm tokens appearing during inference primarily in low-informative background areas of images, that are repurposed for internal computations. We propose a simple yet effective solution based on providing additional tokens to the input sequence of the Vision Transformer to fill that role. We show that this solution fixes that problem entirely for both supervised and self-supervised models, sets a new state of the art for self-supervised visual models on dense visual prediction tasks, enables object discovery methods with larger models, and most importantly leads to smoother feature maps and attention maps for downstream visual processing.

Source: Vision Transformers Need Registers

Would BLIP be able to benefit from this new technique?
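For reference, the mechanism the abstract describes is simple to sketch in PyTorch. This is an illustrative toy, not the paper's actual implementation (class name, dimensions, and the plain `nn.TransformerEncoder` backbone are all my assumptions): learnable register tokens are appended to the patch sequence, take part in attention, and are discarded before the output is used.

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Toy sketch of "registers": extra learnable tokens appended to the
    ViT input sequence. They join attention as scratch space and are
    dropped from the output, so downstream heads never see them.
    (Hypothetical module, not the paper's code.)"""

    def __init__(self, dim=64, num_patches=16, num_registers=4, depth=2, heads=4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # The registers: learnable tokens shared across all images.
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.num_registers = num_registers

    def forward(self, patch_tokens):  # patch_tokens: (batch, num_patches, dim)
        b = patch_tokens.shape[0]
        x = torch.cat([self.cls_token.expand(b, -1, -1), patch_tokens], dim=1)
        x = x + self.pos_embed
        # Registers get no positional embedding; they just join the sequence.
        x = torch.cat([x, self.registers.expand(b, -1, -1)], dim=1)
        x = self.encoder(x)
        # Drop the register outputs: keep only [CLS] + patch tokens.
        return x[:, : -self.num_registers]

model = ViTWithRegisters()
out = model(torch.randn(2, 16, 64))
print(out.shape)  # (2, 17, 64): [CLS] + 16 patches, registers removed
```

If I understand the paper correctly, the appeal is that this is a drop-in change to the input sequence, which is why I'm wondering whether BLIP's ViT encoder could adopt it.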
