
Implement DINO strategy for learning. #203

Open · wants to merge 3 commits into main
Conversation

@brunosan (Member) commented Mar 29, 2024

This PR changes the learning method from MAE (Masked Autoencoder) to DINO (Distillation with No Labels); the architecture and outputs are unchanged.

Background on MAE:
MAE operates by masking a significant portion of the input data (typically 75%) and training the model to reconstruct the missing parts. This encourages the model to learn representations from the context provided by the unmasked portions, using a transformer encoder to generate detailed embeddings for each data patch. When a unique feature is confined to a single masked patch, however, MAE may not be able to infer its presence from the surrounding context.
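To make the contrast concrete, here is a minimal sketch of MAE-style random masking in PyTorch. This is illustrative, not code from this repo; the `random_masking` helper name is hypothetical, and the 75% ratio follows the original MAE paper:

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    # patches: (batch, num_patches, dim) tensor of patch embeddings.
    B, N, D = patches.shape
    num_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=patches.device)  # one random score per patch
    ids_shuffle = noise.argsort(dim=1)               # random permutation of patches
    ids_keep = ids_shuffle[:, :num_keep]             # only the first 25% survive

    # Gather the visible patches; the encoder only ever sees these.
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, ids_shuffle  # ids_shuffle lets the decoder restore patch order
```

The decoder is then asked to reconstruct the 75% of patches the encoder never saw, and the loss is computed only on those masked patches, which is what drives the learning signal.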

DINO:
DINO shifts the focus from reconstruction to a student-teacher framework (two models running in parallel). The "student" model learns to replicate the output of the "teacher" model, which is an exponential moving average (EMA) of the student's past weights. This method emphasizes learning from the entirety of the input data, rather than from the missing parts, aiming to refine the model's understanding and representation capabilities.
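For comparison, a minimal sketch of the DINO update, assuming `student` and `teacher` are two copies of the same backbone. This omits DINO's multi-crop augmentation and output centering for brevity; the helper names (`make_teacher`, `ema_update`, `dino_loss`) are illustrative, not this PR's code:

```python
import copy
import torch
import torch.nn.functional as F

def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad = False  # the teacher is never trained by backprop
    return teacher

@torch.no_grad()
def ema_update(student, teacher, momentum: float = 0.996):
    # Teacher weights drift slowly toward the student: an exponential moving average.
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

def dino_loss(student_out, teacher_out, tau_s: float = 0.1, tau_t: float = 0.04):
    # Cross-entropy between the sharpened, stop-gradient teacher distribution
    # and the student distribution over the projection head's outputs.
    t = F.softmax(teacher_out.detach() / tau_t, dim=-1)
    s = F.log_softmax(student_out / tau_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()
```

During training the student and teacher see different augmented views of the same image, the loss pulls their outputs together, and `ema_update` runs after each optimizer step, which is what slowly moves the target.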

Key Differences and Advantages:

  • Holistic Learning vs. Reconstruction by extrapolation: Unlike MAE, where learning is driven by the need to fill in gaps, DINO encourages the model to understand the full scope of the input data.

  • Dynamic Updating: The teacher model in DINO is dynamically updated, slowly moving the target towards better representations.

  • Patch-Level Embeddings: Both MAE and DINO generate detailed embeddings at the patch level, but DINO can capture more nuanced patterns within and around each patch, informed by the teacher's accumulated history (a quick way to inspect these embeddings is sketched below).
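As an aside on patch-level embeddings, here is one way to see them with a stock ViT, using `timm` purely as a stand-in (the actual encoder in this repo differs; this assumes a recent timm version, where `forward_features` returns the full token sequence):

```python
import torch
import timm

# Hypothetical stand-in backbone, not the encoder used in this repo.
vit = timm.create_model("vit_base_patch16_224", pretrained=False)
x = torch.randn(1, 3, 224, 224)      # one fake 3-band image

tokens = vit.forward_features(x)     # (1, 197, 768): CLS token + 196 patch tokens
patch_embeddings = tokens[:, 1:, :]  # drop CLS → one 768-d vector per 16x16 patch
print(patch_embeddings.shape)        # torch.Size([1, 196, 768])
```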

DINO downsides:

  • We need to maintain two copies of the model while training, so the memory footprint is larger.
  • The target is not fixed, so it might need more computation to converge.
  • More hyperparameters to tune, for each model and for their interaction.
  • MAE is sensitive to the smallest and most unique features. DINO, since it always looks at the whole image, might not give due attention to rare small features.

Currently running a small experiment over Bali with DINO; then I'll do the same with MAE and compare the runs.

@brunosan (Member, Author) commented Apr 3, 2024

[Screenshot 2024-04-03 at 10:14:58]

Promising training.
