Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved metadata #240

Closed
wants to merge 40 commits into from
Closed

Improved metadata #240

wants to merge 40 commits into from

Conversation

yellowcap
Copy link
Member

This needs another sanity check, and the actual band stats are in computation right now. Should update tomorrow morning with the stats from the calculations on batch.

srmsoumya and others added 30 commits April 18, 2024 21:25
- Add starter code for clay v1 model
- Add `model_clay_v1.py` that handles images of different size & spectral bands
- Add `metadata.yaml` that contains meta information of satellite data, like wavelengths & gsd
- Add cls tokens to ClayMAE
PyTorch Image Models!
- First working version of sampler

---------

Co-authored-by: Daniel Wiesmann <daniel@wiesmann.pt>
Copy link
Contributor

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yellowcap, I'm just cross-checking the wavelengths for now which look mostly ok. I do have a comment/suggestion about how we might want to consistently name the Near-infrared band(s) across Sentinel-2 and Landsat (and potentially NAIP).

configs/metadata.yaml Show resolved Hide resolved
band_order:
- vv
- vh
gsd: 10
Copy link
Contributor

@weiji14 weiji14 Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there is a subtle difference between Ground Sampling Distance (GSD) and pixel spacing (see e.g. https://mapscaping.com/understanding-ground-sampling-distance-gsd). The more technically correct term here would be pixel spacing, as the Sentinel-1 image is delivered as images with 10m pixels, but the actual ground sampling distance is more like 20m (xref https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-1-sar/products-algorithms/level-1-algorithms/ground-range-detected/iw).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. Maybe this could be renamed to pixel spacing. I think this is what the model need to know the most here, what size a single pixel represents. The GSD would be good additional information, but it would have to be specified per band for multiple systems. For instance, for Sentinel-2 we resample everything to a pixel spacing of 10m, although some bands have 20m GSD. So lets keep it as is for now and update naming later?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can keep the naming for now (otherwise the model code needs to be updated too).

configs/metadata.yaml Outdated Show resolved Hide resolved
configs/metadata.yaml Outdated Show resolved Hide resolved
configs/metadata.yaml Outdated Show resolved Hide resolved
Comment on lines +57 to +59
- nir08
- swir16
- swir22
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the near-infrared band for Landsat called nir08? Band 8 on Landsat 8/9 is the panchromatic band. Is this to match with Sentinel-2? Maybe we could follow the HLS naming scheme at https://lpdaac.usgs.gov/data/get-started-data/collection-overview/missions/harmonized-landsat-sentinel-2-hls-overview/#hls-spectral-bands if we want to keep the same band name. I.e. use NIR Broad, NIR Narrow, SWIR 1, SWIR 2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming comes as-is from the underlying STAC catalog. We have this baked into the code in 2-3 places now. So let's not change it right now and make this more consistent in a future iteration.

Example:

https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l1/items/LC09_L1GT_173239_20240430_20240430_02_T2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, then let's keep this as is for now then.

SRM and others added 7 commits April 30, 2024 11:44
This needs another sanity check, and the actual band stats are in computation right now. Should update tomorrow morning
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
@yellowcap yellowcap force-pushed the intelligent-sampler-metadata branch from bed94c4 to 4c0c846 Compare April 30, 2024 11:24
Copy link
Contributor

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving to move this along (you'll need to fix the merge conflicts first though). Still think we need to move away from mean/std normalization (#94), but could do it for model v2 I guess 🙂

Base automatically changed from intelligent-sampler to v1-run May 6, 2024 07:21
Base automatically changed from v1-run to dev May 6, 2024 07:40
Base automatically changed from dev to main May 24, 2024 08:50
@yellowcap
Copy link
Member Author

This has been merged in #253

@yellowcap yellowcap closed this May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants