Extended hang in GDALDataset destructor after (read-only) RasterIO call on GTiff (Windows specific) #9510
After some quality time in the debugger, it appears to me that what is happening -- in the case where there is a hang-on-exit -- is that, for each GDALRasterBand of the GTiffDataset, the code enters
I have to think that this cache flush shouldn't be occurring from within the destructor, since the dataset was opened read-only (and the code hasn't tried to open any on-disk file for write)? Please let me know if there's any more info I can provide.
Don't know if this helps any further, but whenever I randomly break during the apparent hang, the point of execution as per Visual Studio is inside
This might depend on the amount of RAM they have, which influences the size of the GDAL block cache. That size can be set with GDAL_CACHEMAX (https://gdal.org/user/configoptions.html#performance-and-caching). Do you observe this on release builds of GDAL? I wouldn't expect freeing just 2 GB of RAM to take 16 seconds...
Yes, I'm seeing the same slow-down on close (for input TIFF files and machines where it happens) in both debug and release builds of GDAL.
I agree, which is why I found this so odd! But I haven't been able to pin down what is different between input file/machine combinations where I see this problem, versus those where I don't. What would you recommend I try setting GDAL_CACHEMAX to, to influence this behavior?
First, run your code with CPL_DEBUG=ON and look at the "GDAL: GDAL_CACHEMAX = xxxx MB" trace in the slow and fast cases, then start experimenting with that information.
If your GeoTIFF is uncompressed, you may also try setting the GTIFF_DIRECT_IO=YES configuration option (see https://gdal.org/drivers/raster/gtiff.html#configuration-options)
If I On the other machine, a desktop with 128 GB RAM, I have to reduce it further and With With So to sum up, to observe the slowdown:
Thanks again, and please let me know if you'd like any more info!
I've investigated that, and it is indeed a Windows specific issue I could reproduce with the following toy code:
So I've tried regular malloc()/free(), _aligned_malloc()/_aligned_free(), and new[]/delete[], and all show the same issue: freeing a large number of allocations is extremely slow. Using a private heap, I've managed to get decent performance, but only when setting a non-zero maximum heap size at HeapCreate() time. However, while we might know that size from GDAL_CACHEMAX, this still consumes the corresponding virtual memory, which might not be desirable.
Another finding: the issue only appears if SINGLE_ALLOC_SIZE is strictly greater than 4 * 4096 bytes.
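The toy code itself did not survive in this copy of the thread. As a rough sketch of the kind of measurement described above (not the maintainer's actual program), the following allocates many blocks with malloc() and times only the free() loop. The function name `time_bulk_free`, the `FreeTiming` struct, and the sizes mentioned below are hypothetical choices for illustration; only the "strictly greater than 4 * 4096 bytes" threshold comes from the discussion.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Result of one run: how many blocks were released and how long that took.
struct FreeTiming {
    std::size_t blocks_freed;
    double seconds;
};

// Allocate roughly total_bytes in blocks of block_size bytes,
// then time only the loop that frees them.
FreeTiming time_bulk_free(std::size_t block_size, std::size_t total_bytes) {
    assert(block_size > 0);
    const std::size_t count = total_bytes / block_size;

    std::vector<void*> blocks;
    blocks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        void* p = std::malloc(block_size);
        if (p == nullptr) {
            break;  // out of memory: time whatever was obtained so far
        }
        static_cast<unsigned char*>(p)[0] = 1;  // touch the block so it is really committed
        blocks.push_back(p);
    }

    const auto start = std::chrono::steady_clock::now();
    for (void* p : blocks) {
        std::free(p);
    }
    const auto stop = std::chrono::steady_clock::now();

    return {blocks.size(), std::chrono::duration<double>(stop - start).count()};
}
```

Calling `time_bulk_free(4 * 4096 + 1, total)` and `time_bulk_free(4 * 4096, total)` with `total` on the order of the block cache size (for example 2 GB) would let you compare timings just above and at the reported threshold. No particular timings are claimed here; results will depend on the machine and allocator.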
What is the bug?
On Windows only (i.e. not Linux), I am seeing an extended hang of 15-20 seconds when freeing up a GDALDataset object that was created by opening a large (~ 2 GB) GeoTIFF file with GA_ReadOnly setting.
Curiously, I am seeing this for some files on both of the machines where I am testing, but for other files I see it on only one of the two machines (same GDAL version, same bit-for-bit file). Hence I suspect that an uninitialized variable may be influencing the behavior.
We have observed this issue both with the most recently-released GDAL 3.8.4, and also with the GDAL version we were using prior to upgrading (2.2.1).
Steps to reproduce the issue
I've written a minimal test program, pasted below, that uses the GDAL API to replicate the hang. When I do not call GDALRasterBand::RasterIO(), the hang does not happen, so perhaps it has something to do with a "dirty" flag.
When run against a file that doesn't exhibit the hanging behavior at close time, the output looks something like this:
When run against a file that does exhibit the hang at close, the output looks something like this:
Test program "testme.cpp" --
To run the test program, run it with one argument, which should be a large GeoTIFF file with 8-bit bands. E.g.:
Here is one possibility for obtaining such a file. Take note that I've been able to reproduce the hang with this file on only one of two machines I tested, so you might not immediately observe it.
dutch_wmts.zip
gdal_translate -projwin 100000 480147 100147 480000 -outsize 22400 22400 dutch_wmts.xml 2G.tif
Versions and provenance
GDAL 3.8.4, released 2024/02/08
Additional context
Thank you in advance for any help you can provide! If there is a work-around we could use in our own code that doesn't involve editing GDAL code, that would be most welcome information also.