
Implement DINO strategy for learning. #203

Open · wants to merge 3 commits into main
Conversation

@brunosan (Member) commented Mar 29, 2024

This PR changes the learning method from MAE (Masked Autoencoder) to DINO (Distillation with No Labels); the architecture and outputs are unchanged.

Background on MAE:
MAE operates by masking a significant portion of the input data (typically 75%) and training the model to reconstruct the missing parts. This encourages the model to learn representations from the context provided by the unmasked portions, using a transformer encoder to generate detailed embeddings for each data patch. When a unique feature is confined to a single masked patch, however, MAE may not be able to infer its presence from the surrounding context.
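To make the contrast concrete, here is a minimal sketch of MAE-style random masking in PyTorch. This is illustrative, not code from this repo; the `random_masking` helper name is hypothetical, and the 75% ratio follows the original MAE paper:

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    # patches: (batch, num_patches, dim) tensor of patch embeddings.
    B, N, D = patches.shape
    num_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=patches.device)  # one random score per patch
    ids_shuffle = noise.argsort(dim=1)               # random permutation of patches
    ids_keep = ids_shuffle[:, :num_keep]             # only the first 25% survive

    # Gather the visible patches; the encoder only ever sees these.
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, ids_shuffle  # ids_shuffle lets the decoder restore patch order
```

The decoder is then asked to reconstruct the 75% of patches the encoder never saw, and the loss is computed only on those masked patches, which is what drives the learning signal.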

DINO:
DINO shifts the focus from reconstruction to a student-teacher framework (two models running in parallel). The "student" model learns to replicate the output of the "teacher" model, which is an exponential moving average (EMA) of the student's past weights. This method emphasizes learning from the entirety of the input data, rather than from the missing parts, aiming to refine the model's understanding and representation capabilities.
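For comparison, a minimal sketch of the DINO update, assuming `student` and `teacher` are two copies of the same backbone. This omits DINO's multi-crop augmentation and output centering for brevity; the helper names (`make_teacher`, `ema_update`, `dino_loss`) are illustrative, not this PR's code:

```python
import copy
import torch
import torch.nn.functional as F

def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad = False  # the teacher is never trained by backprop
    return teacher

@torch.no_grad()
def ema_update(student, teacher, momentum: float = 0.996):
    # Teacher weights drift slowly toward the student: an exponential moving average.
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

def dino_loss(student_out, teacher_out, tau_s: float = 0.1, tau_t: float = 0.04):
    # Cross-entropy between the sharpened, stop-gradient teacher distribution
    # and the student distribution over the projection head's outputs.
    t = F.softmax(teacher_out.detach() / tau_t, dim=-1)
    s = F.log_softmax(student_out / tau_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()
```

During training the student and teacher see different augmented views of the same image, the loss pulls their outputs together, and `ema_update` runs after each optimizer step, which is what slowly moves the target.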

Key Differences and Advantages:

  • Holistic Learning vs. Reconstruction by extrapolation: Unlike MAE, where learning is driven by the need to fill in gaps, DINO encourages the model to understand the full scope of the input data.

  • Dynamic Updating: The teacher model in DINO is dynamically updated, slowly moving the target towards better representations.

  • Patch-Level Embeddings: Both MAE and DINO generate detailed embeddings at the patch level, but DINO can capture more nuanced patterns within and around each patch, informed by the teacher's accumulated history (a quick way to inspect these embeddings is sketched below).
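As an aside on patch-level embeddings, here is one way to see them with a stock ViT, using `timm` purely as a stand-in (the actual encoder in this repo differs; this assumes a recent timm version, where `forward_features` returns the full token sequence):

```python
import torch
import timm

# Hypothetical stand-in backbone, not the encoder used in this repo.
vit = timm.create_model("vit_base_patch16_224", pretrained=False)
x = torch.randn(1, 3, 224, 224)      # one fake 3-band image

tokens = vit.forward_features(x)     # (1, 197, 768): CLS token + 196 patch tokens
patch_embeddings = tokens[:, 1:, :]  # drop CLS → one 768-d vector per 16x16 patch
print(patch_embeddings.shape)        # torch.Size([1, 196, 768])
```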

DINO downsides:

  • We need to maintain two copies of the model while training, so the memory footprint is larger.
  • The target is not fixed, so it might need more computation to converge.
  • More hyperparameters to tune, for each model and for their interaction.
  • MAE is sensitive to the smallest and most unique features. DINO, since it always looks at the whole image, might not give due attention to rare small features.

Currently running a small experiment over Bali with DINO; then I'll do the same with MAE and compare the runs.

@brunosan (Member, Author) commented Apr 3, 2024

[Screenshot 2024-04-03 at 10:14:58]

Promising training.
