Dataset creation questions #200

MaxLenormand · 2024-03-27T19:52:32Z

I've been digging through everything I could get my hands on for this project, and have a few questions (mostly random and in no particular order), mainly about the dataset generation choices.

The project you guys are working on sounds incredibly cool. I've seen @brunosan post about understanding embeddings and wanted to start by taking a closer look at how the dataset was created :)

As far as I can understand it, from the 'How to create a datacube' section of the docs and scripts/pipeline/datacube.py, the input data consists of:

11 Band Sentinel 2, but only 10 are really used?
2 Band Sentinel 1, using the MPC RTC product
Copernicus 30m DEM, resampled to 10m

My understanding is that this is based on the dataset from Cloud to Street (now Floodbase): it's something they implemented, and this initial v0.1 simply uses their dataset setup? That's probably the answer to the following questions, but here they are nonetheless:

Sentinel 1 RTC imagery & Copernicus DEM

Why use the RTC Sentinel 1 product? I see from the MPC docs that it uses the PlanetDEM product, which according to this is from the ALOS World 3D-30m + NASADEM. I hadn't compared this one to Copernicus DEM before but there might be differences between the two (ALOS is SAR, and Cop DEM is a downsampled version of Airbus's WorldDEM made from TerraSAR-X if I recall correctly, so there would at least be some similarities). This might lead to some inconsistency between the SAR corrected imagery & the DEM used? That being said I also understand this is probably a lot faster to implement with an off-the-shelf SAR dataset :D

Copernicus DEM nearest neighbour resampling (?)

I followed the basic tutorial to get MGRS tiles over Puri, India, checked the tiles in QGIS, and noticed that the Copernicus DEM (band 13) seems to be resampled with nearest-neighbour interpolation, i.e. each 30m pixel becomes a 3x3 block of 10m pixels that all share the same value. Using QGIS's Value Tool plugin, I can see the DEM raster not changing value across a 3x3 grid.

I was wondering if there was a reason for not interpolating with something like bilinear resampling, which might smooth the DEM more? I feel like nearest neighbour concentrates all the elevation change at the edges of these 3x3 blocks while the interior stays flat, when that's not really how the terrain is?
I didn't find the code that resamples all datasets to 10m, so I may have gotten that wrong; this is only based on my visual interpretation in QGIS.
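To make the difference concrete, here is a minimal NumPy sketch comparing the two approaches on a toy 2x2 grid upsampled by a factor of 3 (30m to 10m). It is not the project's resampling code, which this thread doesn't show; the function names `upsample_nearest` and `upsample_bilinear` and the elevation values are made up for illustration.

```python
import numpy as np


def upsample_nearest(dem, factor=3):
    """Repeat every coarse pixel `factor` times along both axes."""
    return np.repeat(np.repeat(dem, factor, axis=0), factor, axis=1)


def upsample_bilinear(dem, factor=3):
    """Separable linear interpolation between coarse pixel centres."""
    rows, cols = dem.shape
    # Centres of the fine pixels, expressed in coarse-pixel coordinates.
    fine_r = (np.arange(rows * factor) + 0.5) / factor - 0.5
    fine_c = (np.arange(cols * factor) + 0.5) / factor - 0.5
    # Interpolate along x for every row, then along y for every column.
    # np.interp holds the edge value constant outside the coarse centres.
    tmp = np.array([np.interp(fine_c, np.arange(cols), row) for row in dem])
    return np.array([np.interp(fine_r, np.arange(rows), col) for col in tmp.T]).T


# Hypothetical 30m elevations (metres), for illustration only.
dem_30m = np.array([[10.0, 20.0],
                    [30.0, 40.0]])

nearest = upsample_nearest(dem_30m)    # 6x6, flat 3x3 blocks
bilinear = upsample_bilinear(dem_30m)  # 6x6, values ramp between centres

print(nearest[:3, :3])   # every value in the block equals 10.0
print(bilinear[:3, :3])  # values vary within the same block
```

With nearest neighbour, each 3x3 block is constant and all of the change happens at block boundaries. With bilinear, the original value is preserved at each coarse pixel centre and the surface ramps linearly in between.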

Band choices

I'm not too sure why this code has 11 Sentinel 2 bands, while this here in the benchmark only keeps 10? And of those 10, why remove the SCL band?

Also in the benchmark script only VV and VH are kept for Sentinel 1, why not HH & HV?

These are probably details in the grand scheme of the project, but questions I had so here we are!
I'm quite amazed at all the work that you all have done in just a few months, and this gets me really excited for the next iterations to come!


weiji14 · 2024-03-27T23:54:17Z

Hi again @MaxLenormand! You've got some long questions, and some of these have been partially answered in separate issues already, but I'll link to them or answer them directly below!

Why use the RTC Sentinel 1 product? I see from the MPC docs that it uses the PlanetDEM product, which according to this is from the ALOS World 3D-30m + NASADEM. I hadn't compared this one to Copernicus DEM before but there might be differences between the two (ALOS is SAR, and Cop DEM is a downsampled version of Airbus's WorldDEM made from TerraSAR-X if I recall correctly, so there would at least be some similarities). This might lead to some inconsistency between the SAR corrected imagery & the DEM used? That being said I also understand this is probably a lot faster to implement with an off-the-shelf SAR dataset :D

Answered at #19 (comment)

Copernicus DEM nearest neighbour resampling (?)

Answered at #21 (comment)

Band choices

I'm not too sure why this code has 11 Sentinel 2 bands, while this here in the benchmark only keeps 10? And of those 10, why remove the SCL band?

Clay model v0.1 itself takes 10 Sentinel-2 bands. The SCL 'band' isn't an actual radiometric band but more of a quality flag, so it is not a model input. We keep the SCL band in the datacube because it is used in other places, e.g. @lillythomas's recent work on patch-level cloud cover percentages at #184.
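As a rough illustration of that split, the sketch below separates the SCL layer from the radiometric bands and uses it to estimate a cloud cover percentage for one patch. This is not the implementation from #184. The helper names, the toy band list, and the choice to count SCL classes 8, 9 and 10 (cloud medium probability, cloud high probability, thin cirrus) as cloud are all assumptions made for the example.

```python
import numpy as np

# Assumption: these Sentinel-2 L2A SCL classes count as cloud.
# 8 = cloud medium probability, 9 = cloud high probability, 10 = thin cirrus.
CLOUD_CLASSES = (8, 9, 10)


def split_scl(cube, band_names):
    """Split a (band, y, x) array into radiometric bands and the SCL layer."""
    scl_index = band_names.index("SCL")
    scl = cube[scl_index]
    radiometric = np.delete(cube, scl_index, axis=0)
    return radiometric, scl


def cloud_percentage(scl):
    """Percentage of pixels in the patch whose SCL class is a cloud class."""
    return 100.0 * np.isin(scl, CLOUD_CLASSES).mean()


# Toy 4x4 patch with two radiometric bands plus SCL (hypothetical layout).
band_names = ["B02", "B03", "SCL"]
cube = np.zeros((3, 4, 4))
cube[2, :2, :] = 9  # mark the top half of the patch as high-probability cloud

radiometric, scl = split_scl(cube, band_names)
pct = cloud_percentage(scl)
print(radiometric.shape, pct)
```

Only `radiometric` would be passed on as model input in this sketch, while `scl` is used to filter or annotate patches.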

Also in the benchmark script only VV and VH are kept for Sentinel 1, why not HH & HV?

Sentinel-1 HH and HV are typically available over the polar regions only (Greenland and Antarctica), and MPC's Sentinel-1 RTC product doesn't have them, so we didn't use them.

MaxLenormand · 2024-03-28T06:25:10Z

Thanks a bunch for answering all of those!

This was referenced Mar 27, 2024:

Re-sampling strategy #21 (Open)
Sentinel 1 input spec and retrieval #19 (Closed)
