Dataset creation questions #200

Open
MaxLenormand opened this issue Mar 27, 2024 · 2 comments
Labels
question Further information is requested

@MaxLenormand
Contributor

I've been digging through everything I could get my hands on for this project, and have a few questions (mostly random and in no particular order), mainly about the choices made for dataset generation.

The project you guys are working on sounds incredibly cool. I've seen @brunosan post about understanding embeddings, and as a first step I wanted to take a closer look at how the dataset was created :)

As far as I can understand it, from the 'How to create a datacube' section of the docs and scripts/pipeline/datacube.py, the input data consists of:

My understanding is that this is based on Cloud to Street (now Floodbase)'s dataset: it's something they implemented, and this initial v0.1 simply uses their dataset setup? That's probably the answer to the following questions, but here you go nonetheless:

  1. Sentinel 1 RTC imagery & Copernicus DEM

Why use the RTC Sentinel 1 product? I see from the MPC docs that it uses the PlanetDEM product, which according to this is from the ALOS World 3D-30m + NASADEM. I hadn't compared this one to Copernicus DEM before, but there might be differences between the two (ALOS is SAR, and Cop DEM is a downsampled version of Airbus's WorldDEM made from TerraSAR-X if I recall correctly, so there would at least be some similarities). This might lead to some inconsistency between the SAR-corrected imagery and the DEM used? That being said, I also understand this is probably a lot faster to implement with an off-the-shelf SAR dataset :D

  2. Copernicus DEM nearest neighbour resampling (?)

I followed the basic tutorial to get MGRS tiles over Puri, India and checked the tiles in QGIS. I noticed that the Copernicus DEM, band 13, seems to be resampled with nearest neighbour interpolation, i.e. each 30 m pixel becomes a 3x3 block of 10 m pixels that all share the same value. Here, using QGIS's Value Tool plugin, I can see 3 pixels with the DEM raster not changing value across a 3x3 grid.

Screenshot: CleanShot 2024-03-27 at 20 19 54

I was wondering if there was a reason for not using something like bilinear interpolation, which might smooth the DEM more? I feel like nearest neighbour concentrates all elevation changes at the edges of these 3x3 blocks and none at the central pixel, when that's not really how the terrain is.
I didn't find the code that resamples all datasets to 10 m, so I may have gotten that wrong; this is only based on my visual interpretation in QGIS.
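
As a rough illustration of the difference being asked about, here is a minimal pure-Python sketch. It is not the project's resampling code (which was not located above); the function names and the toy 2x2 DEM are hypothetical. It upsamples a coarse grid by a factor of 3, once by repeating values (nearest neighbour) and once by bilinear interpolation between coarse cell centres with clamped edges.

```python
def upsample_nearest(grid, factor):
    """Repeat each coarse cell factor x factor times (nearest neighbour)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def upsample_bilinear(grid, factor):
    """Bilinear interpolation between coarse cell centres, edges clamped."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(rows * factor):
        # Centre of the fine pixel expressed in coarse-grid coordinates.
        y = min(max((i + 0.5) / factor - 0.5, 0.0), rows - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, rows - 1)
        wy = y - y0
        line = []
        for j in range(cols * factor):
            x = min(max((j + 0.5) / factor - 0.5, 0.0), cols - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, cols - 1)
            wx = x - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            line.append(top * (1 - wy) + bottom * wy)
        out.append(line)
    return out


# Toy 30 m DEM (hypothetical values, in metres): a step from 10 m to 40 m.
dem_30m = [[10.0, 40.0], [10.0, 40.0]]

nearest_10m = upsample_nearest(dem_30m, 3)    # 6x6 grid, blocky step
bilinear_10m = upsample_bilinear(dem_30m, 3)  # 6x6 grid, gradual ramp

print(nearest_10m[0])
print([round(v, 2) for v in bilinear_10m[0]])
```

With nearest neighbour the whole 30 m elevation change sits on one block edge, whereas the bilinear version spreads it over the pixels between the two coarse cell centres.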

  3. Band choices

I'm not too sure why this code has 11 Sentinel 2 bands, while this here in the benchmark only keeps 10? And of those 10, why remove the SCL band?

Also in the benchmark script only VV and VH are kept for Sentinel 1, why not HH & HV?

These are probably details in the grand scheme of the project, but they're questions I had, so here we are!
I'm quite amazed at all the work that you all have done in just a few months. This gets me really excited for the next iterations to come!

@weiji14
Contributor

weiji14 commented Mar 27, 2024

Hi again @MaxLenormand! You've got some long questions, and some of these have been partially answered in separate issues already, but I'll link to them or answer them directly below!

Why use the RTC Sentinel 1 product? I see from the MPC docs that it uses the PlanetDEM product, which according to this is from the ALOS World 3D-30m + NASADEM. I hadn't compared this one to Copernicus DEM before, but there might be differences between the two (ALOS is SAR, and Cop DEM is a downsampled version of Airbus's WorldDEM made from TerraSAR-X if I recall correctly, so there would at least be some similarities). This might lead to some inconsistency between the SAR-corrected imagery and the DEM used? That being said, I also understand this is probably a lot faster to implement with an off-the-shelf SAR dataset :D

Answered at #19 (comment)

  2. Copernicus DEM nearest neighbour resampling (?)

Answered at #21 (comment)

  3. Band choices

I'm not too sure why this code has 11 Sentinel 2 bands, while this here in the benchmark only keeps 10? And of those 10, why remove the SCL band?

Clay model v0.1 itself takes 10 Sentinel-2 bands. The SCL 'band' isn't an actual radiometric band, but more of a quality flag. We have the SCL band in the datacube because it is used in other places, e.g. @lillythomas's recent work on patch-level cloud cover percentages at #184.
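
As a rough sketch of how the SCL layer can act as a quality flag rather than a model input, the snippet below computes a patch-level cloud cover percentage from SCL class codes. It is not the implementation from #184; the function name, the toy patch, and the choice of which classes count as cloudy are assumptions made for illustration. The class codes themselves follow the Sentinel-2 L2A scene classification (3 = cloud shadow, 8 = cloud medium probability, 9 = cloud high probability, 10 = thin cirrus).

```python
# Assumption: treat cloud shadow, both cloud classes and thin cirrus as "cloudy".
CLOUD_CLASSES = frozenset({3, 8, 9, 10})


def cloud_cover_percent(scl_patch, cloud_classes=CLOUD_CLASSES):
    """Return the percentage of pixels in an SCL patch flagged as cloudy."""
    pixels = [value for row in scl_patch for value in row]
    cloudy = sum(1 for value in pixels if value in cloud_classes)
    return 100.0 * cloudy / len(pixels)


# Toy 4x4 SCL patch (hypothetical values): 4 = vegetation, 5 = not vegetated,
# 6 = water, plus a few cloudy pixels on the right-hand side.
scl_patch = [
    [4, 4, 8, 9],
    [4, 5, 8, 9],
    [6, 6, 4, 10],
    [6, 6, 4, 3],
]

percent = cloud_cover_percent(scl_patch)
print(f"Cloud cover: {percent:.1f}%")
```

A percentage like this could be used to filter or weight patches, while the 10 radiometric bands remain the only Sentinel-2 inputs to the model.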

Also in the benchmark script only VV and VH are kept for Sentinel 1, why not HH & HV?

Sentinel-1 HH and HV are typically available over the polar regions only (Greenland and Antarctica), and MPC's Sentinel-1 RTC product doesn't have them, so we didn't use them.

@weiji14 added the question label Mar 27, 2024
@MaxLenormand
Contributor Author

Thanks a bunch for answering all of those!
