
Scale-MAE model #2057

Open · wants to merge 5 commits into main
Conversation

isaaccorley (Collaborator) commented:
Finally getting around to adding this.

Adds the Scale-MAE model (ViT encoder only) and pretrained weights.

I've verified that this reproduces kNN performance at different resolutions on UCMerced, and will repeat the check for other datasets.

@RitwikGupta let me know if this looks good. I cleaned up some of the code a bit so it works out of the box with our trainers (this required setting res when initializing the model instead of passing it dynamically, but I think it should still be fine).

@calebrob6 lmk if you want to team up on this one.
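
For reference, a rough usage sketch (names as currently proposed in this PR; the FMOW_RGB weight name and the exact res/forward semantics are my assumptions from the description above, not verified against the diff):

```python
import torch
from torchgeo.models import (
    ScaleMAE_ViTLarge16_Weights,  # names as proposed in this PR
    scalemae_vit_large_patch16,
)

# res is fixed at init time per the description above; FMOW_RGB is an
# assumed member name (Scale-MAE was pretrained on fMoW RGB imagery)
model = scalemae_vit_large_patch16(
    weights=ScaleMAE_ViTLarge16_Weights.FMOW_RGB, res=1.0
)
model.eval()
with torch.no_grad():
    feats = model(torch.rand(1, 3, 224, 224))
```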

isaaccorley self-assigned this May 11, 2024
github-actions bot added the documentation, models, and testing labels May 11, 2024
adamjstewart added this to the 0.6.0 milestone May 12, 2024
adamjstewart (Collaborator) left a review with threads on docs/api/misc_pretrained_weights.csv (outdated, resolved) and torchgeo/models/scale_mae.py (resolved).
adamjstewart (Collaborator) left a review:

Some minor renaming suggestions to make things more consistent with DOFA, and some major documentation improvement suggestions. I'm willing to help with both if needed.

Collaborator commented:

Two thoughts:

  1. I wonder if we should move both of these to the new "Sensor-Agnostic" section, because they technically work for (RGB-only) imagery from any sensor.
  2. Since both of these have evaluation results on fMoW, can we add additional columns with those performance metrics (assuming they are comparable)? If we move them to "Sensor-Agnostic", we may need two tables: one for models evaluated on GEO-Bench and one for models evaluated on fMoW.

Collaborator commented:

We may also want to add a short summary or table of which sensor-agnostic models provide which features. For example, DOFA enables explicit dynamic spectral band support (via model arch) and implicit dynamic resolution (via training data), while Scale-MAE has no dynamic spectral band support (RGB-only) but explicit dynamic resolution support (via model arch). Not sure about GASSL; maybe only implicit dynamic resolution (via training data)? It's worth mentioning that neither has dynamic temporal resolution support (maybe Satlas does?). I'm planning on highlighting this in our release notes, so I can also write something up if needed. Something like:

"The following pre-trained models offer dynamic spatial (resolution), temporal (time span), and/or spectral (wavelength) support, either via their training data (implicit) or via their model architecture (explicit):"

Model     | Spatial  | Temporal | Spectral
----------|----------|----------|---------
DOFA      | implicit | -        | explicit
GASSL     | implicit | -        | -
Scale-MAE | explicit | -        | -

We could also optionally specify the range of resolutions/time spans/wavelengths that the model was pre-trained on. Just want to give users more feedback as to which model to choose.

isaaccorley (Collaborator, Author) commented:

Should we save this for a different PR?

Collaborator commented:

I'm fine with that, just don't let me forget before the release.

isaaccorley (Collaborator, Author) commented:

I'll open a PR after this one so we don't forget to finish it.

Comment on lines +53 to +54
Scale-MAE Vision Transformer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Collaborator commented:

Suggested change:
- Scale-MAE Vision Transformer
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ Scale-MAE
+ ^^^^^^^^^

This is how we named DOFA, which also uses a ViT backbone.

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

"""Pre-trained Scale-MAE Vision Transformer models."""
Collaborator commented:

Suggested change:
- """Pre-trained Scale-MAE Vision Transformer models."""
+ """Pre-trained Scale-MAE models."""

return emb


class ScaleMAEViT(VisionTransformer): # type: ignore[misc]
Collaborator commented:

Suggested change:
- class ScaleMAEViT(VisionTransformer): # type: ignore[misc]
+ class ScaleMAE(VisionTransformer): # type: ignore[misc]

Weights.__deepcopy__ = lambda *args, **kwargs: args[0]


class ScaleMAE_ViTLarge16_Weights(WeightsEnum): # type: ignore[misc]
Collaborator commented:

Suggested change:
- class ScaleMAE_ViTLarge16_Weights(WeightsEnum): # type: ignore[misc]
+ class ScaleMAELarge16_Weights(WeightsEnum): # type: ignore[misc]

)


def scalemae_vit_large_patch16(
Collaborator commented:

Suggested change:
- def scalemae_vit_large_patch16(
+ def scalemae_large_patch16(

Collaborator commented:

Or is the image size customizable?

Collaborator commented:

Can we add additional sizes? I know we only have pre-trained weights available for large, but we might as well add functions to instantiate other sizes like they do in the source repo.
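
If we do add them, here's a rough sketch of what the extra size factories might look like, using the standard ViT-Base/Huge configs (the ScaleMAE class name follows the rename suggested above; the exact signatures here are an assumption, not the source repo's code):

```python
from torchgeo.models.scale_mae import ScaleMAE  # class defined in this PR


def scalemae_base_patch16(**kwargs):
    """Scale-MAE with a standard ViT-B/16 backbone (assumed config)."""
    return ScaleMAE(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)


def scalemae_huge_patch14(**kwargs):
    """Scale-MAE with a standard ViT-H/14 backbone (assumed config)."""
    return ScaleMAE(patch_size=14, embed_dim=1280, depth=32, num_heads=16, **kwargs)
```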

isaaccorley (Collaborator, Author) commented:

The image size can be changed and the positional embeddings will get interpolated.
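
For context, this works because Scale-MAE's positional embeddings are plain 2D sin-cos embeddings whose grid coordinates are scaled by the ground sample distance (GSD), so they can be generated for any grid size. A minimal sketch of the idea, with a hypothetical function name rather than the code in this PR:

```python
import torch


def gsd_sincos_pos_embed(embed_dim: int, grid_size: int, res: float) -> torch.Tensor:
    """2D sin-cos positional embedding with coordinates scaled by GSD (res)."""
    assert embed_dim % 4 == 0
    # Coarser imagery (larger res) yields larger coordinates for the same grid
    coords = torch.arange(grid_size, dtype=torch.float32) * res
    y, x = torch.meshgrid(coords, coords, indexing='ij')
    omega = torch.arange(embed_dim // 4, dtype=torch.float32) / (embed_dim // 4)
    omega = 1.0 / 10000**omega  # standard transformer frequency bands
    out_x = x.reshape(-1, 1) * omega  # (grid_size**2, embed_dim // 4)
    out_y = y.reshape(-1, 1) * omega
    return torch.cat([out_x.sin(), out_x.cos(), out_y.sin(), out_y.cos()], dim=1)


# e.g. a 224x224 image with patch size 16 -> 14x14 grid -> (196, 1024) for ViT-L
pos_embed = gsd_sincos_pos_embed(1024, 14, res=1.0)
```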

Collaborator commented:

Updated the diff to remove 244.

adamjstewart mentioned this pull request May 23, 2024