Scale-MAE model #2057
base: main
Conversation
Also see changes in https://github.com/microsoft/torchgeo/pull/2052/files
Some minor renaming suggestions to make things more consistent with DOFA, and some major documentation improvement suggestions. I'm willing to help with both if needed.
Two thoughts:
- I wonder if we should move both of these to the new "Sensor-Agnostic" section because they technically work for (RGB-only) imagery from any sensor
- Since both of these have evaluation results on fMoW, can we add additional columns with those performance metrics (assuming they are comparable)? If we move them to "Sensor-Agnostic", we may need two tables, one for things evaluated on GEO-Bench and one for things evaluated on fMoW.
We may also want to add a short summary or table of which sensor-agnostic models provide which features. For example, DOFA enables explicit dynamic spectral band support (via model arch) and implicit dynamic resolution (via training data), while Scale-MAE has no dynamic spectral band support (RGB-only) but explicit dynamic resolution support (via model arch). Not sure about GASSL, maybe only implicit dynamic resolution (via training data)? It's worth mentioning that neither has dynamic temporal resolution support (maybe Satlas does?). I'm planning on highlighting this in our release notes, so I can also write something up if needed. Something like:
"The following pre-trained models offer dynamic spatial (resolution), temporal (time span), and/or spectral (wavelength) support, either via their training data (implicit) or via their model architecture (explicit):"
| Model | Spatial | Temporal | Spectral |
|---|---|---|---|
| DOFA | implicit | - | explicit |
| GASSL | implicit | - | - |
| Scale-MAE | explicit | - | - |
We could also optionally specify the range of resolutions/time spans/wavelengths that the model was pre-trained on. Just want to give users more feedback as to which model to choose.
Should we save this for a different PR?
I'm fine with that, just don't let me forget before the release.
I'll open a PR after this one so we don't forget to finish it
Scale-MAE Vision Transformer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Suggested change:
-Scale-MAE Vision Transformer
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Scale-MAE
+^^^^^^^^^
This is how we named DOFA, which also uses a ViT backbone.
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

"""Pre-trained Scale-MAE Vision Transformer models."""
"""Pre-trained Scale-MAE Vision Transformer models.""" | |
"""Pre-trained Scale-MAE models.""" |
return emb


class ScaleMAEViT(VisionTransformer): # type: ignore[misc]
Suggested change:
-class ScaleMAEViT(VisionTransformer): # type: ignore[misc]
+class ScaleMAE(VisionTransformer): # type: ignore[misc]
Weights.__deepcopy__ = lambda *args, **kwargs: args[0]


class ScaleMAE_ViTLarge16_Weights(WeightsEnum): # type: ignore[misc]
Suggested change:
-class ScaleMAE_ViTLarge16_Weights(WeightsEnum): # type: ignore[misc]
+class ScaleMAELarge16_Weights(WeightsEnum): # type: ignore[misc]
)


def scalemae_vit_large_patch16(
Suggested change:
-def scalemae_vit_large_patch16(
+def scalemae_large_patch16(
Or is the image size customizable?
Can we add additional sizes? I know we only have pre-trained weights available for large, but we might as well add functions to instantiate other sizes like they do in the source repo.
The image size can be changed and the positional embeddings will get interpolated.
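To illustrate why a changed image size is fine: a ViT stores one positional embedding per patch, arranged on an (H/patch) × (W/patch) grid, so a new image size just means resampling that grid to the new patch count. The sketch below is a hypothetical, dependency-free illustration of bilinear grid resampling, not the PR's actual interpolation code (which would operate on torch tensors):

```python
def resize_grid(grid: list[list[float]], new_h: int, new_w: int) -> list[list[float]]:
    """Bilinearly resample a 2D grid of scalars to shape (new_h, new_w).

    Stand-in for interpolating one channel of a ViT positional-embedding grid.
    """
    old_h, old_w = len(grid), len(grid[0])
    out = []
    for i in range(new_h):
        # Map output row i back to a fractional row in the old grid.
        y = i * (old_h - 1) / max(new_h - 1, 1)
        y0, y1 = int(y), min(int(y) + 1, old_h - 1)
        wy = y - y0
        row = []
        for j in range(new_w):
            x = j * (old_w - 1) / max(new_w - 1, 1)
            x0, x1 = int(x), min(int(x) + 1, old_w - 1)
            wx = x - x0
            # Blend the four neighboring grid values.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bot = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bot * wy)
        out.append(row)
    return out

# A 224px image with 16px patches gives a 14x14 grid; 448px gives 28x28.
old = [[float(i + j) for j in range(14)] for i in range(14)]
new = resize_grid(old, 28, 28)
assert len(new) == 28 and len(new[0]) == 28
# Bilinear resampling preserves the corner embeddings exactly.
assert new[0][0] == old[0][0] and new[-1][-1] == old[-1][-1]
```

In practice this kind of resampling is done in one call (e.g. bicubic `torch.nn.functional.interpolate` over the embedding grid); the hand-rolled loop here is only to make the mechanics visible.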
Updated diff to remove 224.
Finally getting around to adding this.
Adds Scale-MAE model (ViT encoder only) and pretrained weights.
I've verified this reproduces KNN performance at different resolutions for UCMerced but will repeat for other datasets.
@RitwikGupta let me know if this looks good. I cleaned up some of the code a bit so it works out of the box with our trainers (this required setting the res when initializing the model instead of dynamically, but I think it should still be fine).
@calebrob6 lmk if you want to team up on this one.