Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delayed/chunked opening (sentinel) SAFE data with bands as variables fails #761

Open
NiklasPhabian opened this issue Mar 25, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@NiklasPhabian
Copy link

NiklasPhabian commented Mar 25, 2024

Code Sample

Here, a cropped S2B_MSIL2A.SAFE.zip, containing only two 60 m bands.

import rioxarray

path_to_safe_zip = `S2B_MSIL2A.SAFE.zip`
safe_name = path_to_safe_zip.replace('.zip', '')
sds_path = 'SENTINEL2_L2A:/vsizip//{path_to_safe_zip}/{safe_name}/MTD_MSIL2A.xml:60m:EPSG_32611'

rds = rioxarray.open_rasterio(path, chunks=True, band_as_variable=True)
diff = (rds['band_1'] - rds['band_2']).compute()

Less abstract:

import rasterio
import rioxarray
from xarray.backends.file_manager import CachingFileManager,
from xarray.backends.locks import SerializableLock

RASTERIO_LOCK = SerializableLock()

manager = CachingFileManager(rasterio.open, path, mode='r', lock=RASTERIO_LOCK)
riods = manager.acquire()

dsr1 = rioxarray._io.SingleBandDatasetReader(riods, 0)
b1 = rioxarray.open_rasterio(dsr1, chunks=True).squeeze().drop_vars('band')

dsr2 = rioxarray._io.SingleBandDatasetReader(riods, 1)
b2 = rioxarray.open_rasterio(dsr2, chunks=True).squeeze().drop_vars('band')
diff = (b1-b2).compute()

Problem description

rioxarray fails to load subdatasets of sentinel2 L2A granules correctly when opened delayed/chunked and with band_as_variable=True.

Sentinel2 L2A is distributed in the SAFE format, described here. These granules contain 4 subdatasets; one for each band (10 m, 20 m, 60 m) and a (virtual) 10 m true color image (TCI) dataset. When pointed at the zipped SAFE folder, rioxarray loads 3 subdatasets (the 10 m subdataset is overwritten by the TCI sudataset, but that is a separate issue) and ignores the band_as_variable argument.

We can also point rioxarray.open_rasterio() directly to a subdataset by prepending SENTINEL2_L2A:/ and appending the subdataset's name, e.g. /MTD_MSIL2A.xml:60m:EPSG_32611 to the SAFE path.

When loading the granules with chunks=True and band_as_variable=True and performing computations on the delayed variable arrays (before loading them into memory), the data gets corrupted in the sense that all variable data is pointing to the last referenced variable data. This seems to be caused by the fact that the SingleBandDatasetReader instances of each variable is sharing the same name property (i.e. the path of the dataset), leaving dask to be unable to distinguish between them when loading the data.

Providing a disambiguation e.g. by appending the band id (bidx) to the name property of the SingleBandDatasetReader in rioxarray._io solves the issue:

    @property
    def name(self):
        """
        str: name of the dataset. Usually the path.
        """
        if isinstance(self._riods, rasterio.vrt.WarpedVRT):
            return self._riods.src_dataset.name + "-" + str(self._bidx)
        return self._riods.name + "-" + str(self._bidx)

Environment Information

rioxarray (0.15.2.dev0) deps:
  rasterio: 1.3.9
    xarray: 2024.1.1
      GDAL: 3.8.4
      GEOS: 3.12.1
      PROJ: 9.3.1
 PROJ DATA: /Users/griessban/Library/Application Support/proj:/Users/griessban/miniconda3/envs/rioxarray/share/proj:/Users/griessban/miniconda3/envs/rioxarray/share/proj
 GDAL DATA: /Users/griessban/miniconda3/envs/rioxarray/share/gdal

Other python deps:
     scipy: 1.12.0
    pyproj: 3.6.1
System:
    python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:01:35) [Clang 16.0.6 ]
executable: /Users/griessban/miniconda3/envs/rioxarray/bin/python
   machine: macOS-14.4-arm64-arm-64bit
@NiklasPhabian NiklasPhabian added the bug Something isn't working label Mar 25, 2024
@snowman2
Copy link
Member

Thanks for the analysis and a solution. A MR with the fix is welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants