Writing a large tiff without specifying BIGTIFF="YES" silently fails writing some blocks #709

Open
alessioarena opened this issue Nov 6, 2023 · 5 comments
Labels
bug Something isn't working upstream Issue is related to a dependency (upstream package).

Comments

@alessioarena

Code Sample, a copy-pastable example if possible

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you:
http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

import dask.array as da
import numpy as np  # needed for np.linspace below
import rioxarray as rio
import xarray as xr

size = (30_000, 60_000)

data = xr.DataArray(
    data = da.random.random(size), 
    coords={'y':np.linspace(0, size[0]*10, size[0]), 'x':np.linspace(0, size[1]*10, size[1])},
    dims=('y', 'x'),
)
data = data.rio.set_crs(3857)

data[::100, ::100].plot()
# you should get something like the image in Expected Output

data.rio.to_raster('test.tif', COMPRESS="DEFLATE")

rio.open_rasterio('test.tif', chunks='auto', parallel=True, lock=False).isel(band=0)[::100, ::100].plot()
# you should get something like the partial image in Problem description

Problem description

I came across this issue recently, and it seems to be linked to using COMPRESS="DEFLATE".

If running the code above, saving the image succeeds with no issue or warning raised.
However, upon opening the image it looks partial.
[Image: plot of test.tif read back, showing only part of the raster]

Performing the exact same operation with rasterio instead raises this error:
https://gis.stackexchange.com/questions/368251/error-occurred-while-writing-dirty-block-from-gdalrasterbandirasterio
As the post explains, this is linked to not specifying BIGTIFF="YES".
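
One way to confirm which variant actually ended up on disk is to look at the first four bytes of the file: a TIFF starts with a byte-order mark ("II" or "MM") followed by a version number, 42 for classic TIFF and 43 for BigTIFF. A minimal sketch using only the standard library; `tiff_flavor` is a hypothetical helper name and is not part of rasterio or rioxarray:

```python
import struct


def tiff_flavor(path):
    """Return 'classic', 'bigtiff', or None if the header is not a TIFF header."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        return None

    # Bytes 0-1 give the byte order used for the rest of the header.
    byte_order = header[:2]
    if byte_order == b"II":
        fmt = "<H"  # little-endian
    elif byte_order == b"MM":
        fmt = ">H"  # big-endian
    else:
        return None

    # Bytes 2-3 hold the version: 42 for classic TIFF, 43 for BigTIFF.
    (version,) = struct.unpack(fmt, header[2:4])
    return {42: "classic", 43: "bigtiff"}.get(version)
```

Calling `tiff_flavor('test.tif')` on the file written above would show whether GDAL fell back to a classic TIFF. It only inspects the header, so it says nothing about whether every block was written.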

Expected Output

Either a correctly saved image, or the error being raised
[Image: plot of the complete raster]

Environment Information

  • python -c "import rioxarray; rioxarray.show_versions()"

Python version : 3.10.12
Platform : Linux
xarray : 2023.10.1
pandas : 2.1.1
dask : 2023.10.0
numpy : 1.23.4
rasterio : 1.3.9
rioxarray : 0.15.0
geopandas : 0.14.0
shapely : 2.0.2
zarr : 2.16.1
matplotlib : 3.8.0
cartopy : 0.22.0
nbic_utils : 2.0.0
xrutils : 2.0.0

Installation method

pypi

@alessioarena alessioarena added the bug Something isn't working label Nov 6, 2023
@snowman2
Member

snowman2 commented Nov 8, 2023

This is likely due to using a dask array when writing as it uses a different writing mechanism. Do you run into this issue with a numpy array?

@pfuhe1

pfuhe1 commented Dec 19, 2023

I have also had this issue - the silent failure seems related to using dask.

@RichardScottOZ
Contributor

I think I have seen this with rasterio, too... it will just write 4GB worth and the rest is empty.
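
For context, classic TIFF stores offsets as 32-bit values, which caps a file at 4 GiB, and that would be consistent with only "4GB worth" being written. A rough back-of-the-envelope check for the array in the original report, assuming float64 samples (what `da.random.random` produces) and ignoring compression and metadata overhead; `uncompressed_nbytes` is a hypothetical helper for illustration:

```python
# Rough size estimate for the example array; a sketch, not how GDAL decides.
CLASSIC_TIFF_LIMIT = 2**32  # bytes addressable with 32-bit offsets (4 GiB)


def uncompressed_nbytes(shape, itemsize):
    """Return the raw size in bytes of an array with the given shape."""
    n = itemsize
    for dim in shape:
        n *= dim
    return n


size = (30_000, 60_000)  # shape used in the report
nbytes = uncompressed_nbytes(size, itemsize=8)  # float64 -> 8 bytes per sample
exceeds_limit = nbytes > CLASSIC_TIFF_LIMIT

print(f"{nbytes / 2**30:.1f} GiB uncompressed, exceeds 4 GiB: {exceeds_limit}")
```

Whether the DEFLATE-compressed file stays under the limit depends on the data; random values tend to compress poorly, so the example is likely to exceed it even with compression.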

@snowman2
Member

snowman2 commented Mar 1, 2024

I am guessing this is related: #220
See: https://corteva.github.io/rioxarray/latest/examples/dask_read_write.html

@snowman2 snowman2 added the upstream Issue is related to a dependency (upstream package). label Apr 22, 2024
@snowman2
Member

From: https://gdal.org/drivers/raster/gtiff.html

Default: BIGTIFF=IF_NEEDED
Description: "will only create a BigTIFF if it is clearly needed (in the uncompressed case, and image larger than 4GB. So no effect when using a compression)."

In your example, COMPRESS="DEFLATE" is set, so you need to set BIGTIFF=YES for the write to succeed. For a more explicit error message, the change would likely need to happen in GDAL.
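
A minimal sketch of that workaround, assuming extra keyword arguments to `to_raster` are forwarded as GDAL creation options the same way COMPRESS is in the original example. `to_raster_bigtiff` is a hypothetical wrapper, not a rioxarray function, and it assumes `rioxarray` has already been imported so that the `.rio` accessor exists on `data`:

```python
def to_raster_bigtiff(data, path, **creation_options):
    """Write `data` via its `.rio` accessor, defaulting to BIGTIFF="YES".

    Hypothetical helper for illustration. A BIGTIFF value passed by the
    caller takes precedence over the default set here.
    """
    # Caller-supplied options override the BIGTIFF default.
    options = {"BIGTIFF": "YES", **creation_options}
    data.rio.to_raster(path, **options)
    return options
```

With the array from the report this would be called as `to_raster_bigtiff(data, 'test.tif', COMPRESS="DEFLATE")`, which is equivalent to `data.rio.to_raster('test.tif', COMPRESS="DEFLATE", BIGTIFF="YES")`.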


4 participants