rioxarray fails to load subdatasets of Sentinel-2 L2A granules correctly when they are opened lazily (chunked) with band_as_variable=True.
Sentinel-2 L2A is distributed in the SAFE format, described here. These granules contain 4 subdatasets: one for each resolution group (10 m, 20 m, 60 m) and a (virtual) 10 m true color image (TCI) dataset. When pointed at the zipped SAFE folder, rioxarray loads only 3 subdatasets (the 10 m subdataset is overwritten by the TCI subdataset, but that is a separate issue) and ignores the band_as_variable argument.
We can also point rioxarray.open_rasterio() directly at a subdataset by prepending SENTINEL2_L2A:/ to the SAFE path and appending the subdataset's name, e.g. /MTD_MSIL2A.xml:60m:EPSG_32611.
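A minimal sketch of how such a subdataset identifier could be assembled, following the pattern described above. The helper name `s2_subdataset` is hypothetical, and the sketch assumes the SAFE path is made absolute so that its leading "/" supplies the slash in `SENTINEL2_L2A:/`.

```python
import os


def s2_subdataset(safe_path: str, resolution: str, epsg: int) -> str:
    """Build a SENTINEL2_L2A subdataset identifier for one resolution group.

    Hypothetical helper: it only concatenates strings following the pattern
    "SENTINEL2_L2A:" + <absolute SAFE path> + "/MTD_MSIL2A.xml:<res>:EPSG_<code>".
    """
    # On POSIX an absolute path starts with "/", giving the "SENTINEL2_L2A:/" prefix.
    return f"SENTINEL2_L2A:{os.path.abspath(safe_path)}/MTD_MSIL2A.xml:{resolution}:EPSG_{epsg}"


if __name__ == "__main__":
    print(s2_subdataset("/data/S2B_MSIL2A.SAFE.zip", "60m", 32611))
```

The resulting string would then be passed as the first argument to `rioxarray.open_rasterio(..., chunks=True, band_as_variable=True)`. If GDAL reports different names for a given archive, `rasterio.open(path).subdatasets` lists the identifiers GDAL itself exposes and is the safer source.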
When the granules are loaded with chunks=True and band_as_variable=True and computations are performed on the lazy (dask-backed) variable arrays before they are loaded into memory, the data gets corrupted: every variable ends up holding the data of the last referenced variable. This seems to be caused by the SingleBandDatasetReader instances of all variables sharing the same name property (i.e. the path of the dataset), which leaves dask unable to distinguish between them when loading the data.
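The collision can be illustrated with plain dask, independent of rioxarray. This is a simplified model, assuming (as the report suggests) that the dask graph keys for each band end up derived from the reader's name: here the name is passed directly as the dask array name, and `path` plus the two bands are made-up stand-ins.

```python
import numpy as np
import dask.array as da


def stacked_from_named(arrays, names):
    """Wrap each NumPy array in a dask array with an explicit graph name, stack, compute.

    `names` plays the role of SingleBandDatasetReader.name in this toy model.
    """
    lazy = [da.from_array(a, chunks=a.shape, name=n) for a, n in zip(arrays, names)]
    return da.stack(lazy).compute()


band1 = np.zeros((2, 2))
band2 = np.ones((2, 2))
path = "S2B_MSIL2A.SAFE.zip"  # hypothetical dataset path

# Same name for both bands: their graph keys are identical, so when the graphs
# are merged one band's data silently replaces the other's.
collided = stacked_from_named([band1, band2], [path, path])

# Band index appended to the name, as in the proposed fix: keys stay distinct.
distinct = stacked_from_named([band1, band2], [f"{path}-1", f"{path}-2"])

print(np.array_equal(collided[0], collided[1]))
print(np.array_equal(distinct[0], band1), np.array_equal(distinct[1], band2))
```

dask treats equal keys as the same task, which is what makes deterministic names cheap to deduplicate; the flip side is that two different sources must never share a name.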
Disambiguating the name, e.g. by appending the band index (bidx) to the name property of SingleBandDatasetReader in rioxarray._io, solves the issue:
```python
# rioxarray/_io.py, class SingleBandDatasetReader (patched)
@property
def name(self):
    """ str: name of the dataset. Usually the path. """
    if isinstance(self._riods, rasterio.vrt.WarpedVRT):
        return self._riods.src_dataset.name + "-" + str(self._bidx)
    return self._riods.name + "-" + str(self._bidx)
```
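Until a fix like this is released, a possible workaround (an assumption, not verified against real granules) is to open the subdataset with band_as_variable=False, so that all bands come from a single reader and a single dask array, and split the band dimension afterwards. The helper `bands_to_variables` is hypothetical; it names variables `band_<index>` to mirror band_as_variable=True.

```python
import numpy as np
import xarray as xr


def bands_to_variables(arr: xr.DataArray) -> xr.Dataset:
    """Split a (band, y, x) DataArray into one Dataset variable per band.

    Hypothetical workaround helper: the band coordinate values become the
    variable suffixes, e.g. band 1 -> "band_1".
    """
    ds = arr.to_dataset(dim="band")  # one variable per band coordinate value
    return ds.rename({b: f"band_{b}" for b in ds.data_vars})


if __name__ == "__main__":
    # Synthetic stand-in for a two-band 60 m subdataset.
    demo = xr.DataArray(
        np.stack([np.zeros((2, 2)), np.ones((2, 2))]),
        dims=("band", "y", "x"),
        coords={"band": [1, 2]},
    )
    print(bands_to_variables(demo))
```

With a real granule this would read as `bands_to_variables(rioxarray.open_rasterio(subdataset, chunks=True))`.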
Code Sample
Here, a cropped S2B_MSIL2A.SAFE.zip containing only two 60 m bands is used.
Less abstract: