Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended hang in GDALDataset destructor after (read-only) RasterIO call on GTiff (Windows specific) #9510

Open
kmccarty-dgi opened this issue Mar 19, 2024 · 9 comments

Comments

@kmccarty-dgi
Copy link

kmccarty-dgi commented Mar 19, 2024

What is the bug?

On Windows only (i.e. not Linux), I am seeing an extended hang of 15-20 seconds when freeing up a GDALDataset object that was created by opening a large (~ 2 GB) GeoTIFF file with GA_ReadOnly setting.

Curiously, I am seeing this for some files on both the machines where I am testing; but for other files, I see it only on one of the two machines. (Same GDAL version, same bit-for-bit file) Hence I am suspecting perhaps an uninitialized variable may be influencing the behavior.

We have observed this issue both with the most recently-released GDAL 3.8.4, and also with the GDAL version we were using prior to upgrading (2.2.1).

Steps to reproduce the issue

I've written a minimal test program, pasted below, that uses the GDAL API to replicate the hang. When I do not call GDALRasterBand::RasterIO() the hang does not happen so perhaps it has something to do with a "dirty" flag.

When run against a file that doesn't exhibit the hanging behavior at close time, the output looks something like this:

register: dt = 2 ms
open: dt = 0 ms
get band: dt = 0 ms
alloc: dt = 0 ms
read: dt = 589 ms
process: dt = 24 ms
dealloc: dt = 16 ms
destroy: dt = 34 ms

When run against a file that does exhibit the hang at close, the output looks something like this:

register: dt = 2 ms
open: dt = 0 ms
get band: dt = 0 ms
alloc: dt = 0 ms
read: dt = 912 ms
process: dt = 21 ms
dealloc: dt = 14 ms
destroy: dt = 16951 ms

Test program "testme.cpp" --

// Compile with flags /EHsc /std:c++20 /W4 /permissive-

#include "gdal_priv.h"

#include <cassert>
#include <chrono>
#include <cstdio>

static auto now()
{
    return std::chrono::high_resolution_clock::now();
}

static void report_time(const char * label, auto & t1)
{
    auto t2 = now();
    auto delta = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1);
    fprintf(stderr, "%s: dt = %lld ms\n", label, (long long)delta.count());
    t1 = now();
}

int main(int argc, char ** argv)
{
    unsigned char c = 255;
    auto t1 = now();

    GDALAllRegister();
    // Forbid generation of ".aux.xml" files
    CPLSetConfigOption("GDAL_PAM_ENABLED", "NO");

    report_time("register", t1);

    std::unique_ptr<GDALDataset>
      img{static_cast<GDALDataset*>(GDALOpen(argv[1], GA_ReadOnly))};
    report_time("open", t1);

    assert(img && img->GetRasterCount() > 0);
    auto band = img->GetRasterBand(1);
    assert(band->GetRasterDataType() == GDT_Byte);
    int x = band->GetXSize(), y = band->GetYSize();
    assert(x > 0 && y > 0);
    report_time("get band", t1);

    size_t n = size_t(x) * size_t(y);
    auto p = new unsigned char [n];
    report_time("alloc", t1);

#ifndef NO_RASTER_IO
    auto stat = band->RasterIO(GF_Read, 0, 0, x, y, p, x, y, GDT_Byte, 1, x);
    assert(stat == CE_None);
    report_time("read", t1);

    // Do something with the pixels so the optimizer doesn't skip anything
    for (size_t i = 0; i < n; ++i)
	c ^= p[i];

    report_time("process", t1);
#endif

    delete [] p;
    report_time("dealloc", t1);

    img.reset();
    report_time("destroy", t1);

    return n % (int(c) + 1);
}

To run the test program, run it with one argument, which should be a large GeoTIFF file with 8-bit bands. E.g.:

testme.exe 2G.tif

Here is one possibility for obtaining such a file. Take note that I've been able to reproduce the hang with this file on only one of two machines I tested, so you might not immediately observe it.

dutch_wmts.zip

  • Unzip the above to obtain an XML file for a WMS
  • gdal_translate -projwin 100000 480147 100147 480000 -outsize 22400 22400 dutch_wmts.xml 2G.tif

Versions and provenance

  • GDAL version: GDAL 3.8.4, released 2024/02/08
  • Windows 10
  • Self-built binaries using Visual C++ (cl.exe 19.29.30151)
  • Dependent library versions:
    • CURL: 8.4.0
    • ZLIB: 1.3.1
    • PROJ: 9.3.1
    • libgeotiff: 1.7.1
    • libtiff: 4.6.0

Additional context

Thank you in advance for any help you can provide! If there is a work-around we could use in our own code that doesn't involve editing GDAL code, that would be most welcome information also.

@kmccarty-dgi
Copy link
Author

kmccarty-dgi commented Mar 19, 2024

After some quality time in the debugger, it appears to me that what is happening -- in the case where there is a hang-on-exit -- is that, for each GDALRasterBand of the GTiffDataset, the code enters GDALArrayBandBlockCache::FlushCache(void), which then proceeds to iterate over all the blocks per column and blocks per row to call FlushBlock(). This is the call stack at this point:

	gdald.dll!GDALArrayBandBlockCache::FlushCache() Line 275	C++
 	gdald.dll!GDALRasterBand::FlushCache(bool bAtClosing) Line 1123	C++
 	gdald.dll!GDALDataset::FlushCache(bool bAtClosing) Line 569	C++
 	gdald.dll!GDALPamDataset::FlushCache(bool bAtClosing) Line 188	C++
 	gdald.dll!GTiffDataset::FlushCacheInternal(bool bAtClosing, bool bFlushDirectory) Line 2104	C++
 	gdald.dll!GTiffDataset::Finalize() Line 233	C++
 	gdald.dll!GTiffDataset::Close() Line 153	C++
 	gdald.dll!GTiffDataset::~GTiffDataset() Line 141	C++

I have to think that this cache flush shouldn't be occurring from within the destructor, since the dataset was opened read-only (and the code hasn't tried to open any on-disk file for write)? Please let me know if there's any more info I can provide.

@kmccarty-dgi
Copy link
Author

Don't know if this helps any further, but whenever I randomly break during the apparent hang, the point of execution as per Visual Studio is inside VSIFreeAligned(). (which on Windows does nothing more than call _aligned_free().)

@rouault
Copy link
Member

rouault commented Mar 19, 2024

Take note that I've been able to reproduce the hang with this file on only one of two machines I tested, so you might not immediately observe it.

This might be depending on the amount of RAM they have, which influences the size of the GDAL block cache which can be set with GDAL_CACHEMAX (https://gdal.org/user/configoptions.html#performance-and-caching). Do you observe this on release builds of GDAL? I wouldn't expect freeing just 2 GB of RAM to take 16 seconds...

@kmccarty-dgi
Copy link
Author

Yes, I'm seeing the same slow-down on close (for input TIFF files and machines where it happens) in both debug and release builds of GDAL.

I wouldn't expect freeing just 2 GB of RAM to take 16 seconds...

I agree, which is why I found this so odd! But I haven't been able to pin down what is different between input file/machine combinations where I see this problem, versus those where I don't. What would you recommend I try setting GDAL_CACHEMAX to, to influence this behavior?

@rouault
Copy link
Member

rouault commented Mar 19, 2024

What would you recommend I try setting GDAL_CACHEMAX to, to influence this behavior?

first, run your code with CPL_DEBUG=ON and look at the "GDAL: GDAL_CACHEMAX = xxxx MB" trace in the slow and fast cases, and start playing with that information

@rouault
Copy link
Member

rouault commented Mar 19, 2024

If your GeoTIFF is uncompressed, you may also try setting the GTIFF_DIRECT_IO=YES configuration option (see https://gdal.org/drivers/raster/gtiff.html#configuration-options)

@kmccarty-dgi
Copy link
Author

If I set GDAL_CACHEMAX=2% then I do not see the problem with either of my large test TIFF files ... whereas if I set it to 10% then I do see the issue with both of them. (This is on a laptop with 32 GB RAM.) So this gives me some hope that you might be able to reproduce locally by tuning that setting using the TIFF file I described above how to obtain, or maybe with any large GeoTIFF.

On the other machine, a desktop with 128 GB RAM, I have to reduce it further and set GDAL_CACHEMAX=1% in order not to observe the slowdown-on-close.

With CPL_DEBUG=ON, it looks to me like the slowdown-on-exit happens exactly in the cases when GDAL_CACHEMAX is large enough that this message does NOT appear: "GTiff: Disable aggressive band caching. Cache not big enough. At least 1702774400 bytes necessary"

With GTIFF_DIRECT_IO=YES I do not see the slowdown-on-exit at all regardless of setting of GDAL_CACHEMAX. But we'd like to support compressed as well as uncompressed TIFFs, so I don't see this setting as being something we can feasibly use ourselves.

So to sum up, to observe the slowdown:

  • The TIFF file has to be large enough (500 MB+ uncompressed?) that the slowdown-on-exit is clearly a problem
  • The GDAL_CACHEMAX setting (or default) has to be large enough that the uncompressed TIFF fits in cache
  • The GTIFF_DIRECT_IO setting should not be ON
  • (As mentioned in the original report) A call to GDALRasterBand::RasterIO() has been made

Thanks again, and please let me know if you'd like any more info!

@rouault
Copy link
Member

rouault commented Mar 21, 2024

I've investigated that, and it is indeed a Windows specific issue I could reproduce with the following toy code:

#include <windows.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

HANDLE heap = NULL;
int flag = 0; 

int SINGLE_ALLOC_SIZE = 21200;
int NUMBER_OF_ALLOCS = 21200 * 4;

class CMyClass
{
public:
  CMyClass()
  {
      // lpData = malloc(SINGLE_ALLOC_SIZE);
      // lpData = _aligned_malloc(SINGLE_ALLOC_SIZE, 64);
      lpData = HeapAlloc(heap, flag, SINGLE_ALLOC_SIZE);
      //lpData = new char[SINGLE_ALLOC_SIZE];
      assert(lpData);
  };

  ~CMyClass()
  {
      // free(lpData);
      //_aligned_free(lpData);
      HeapFree(heap, flag, lpData);
      //delete[] lpData;
  };

public:
  void* lpData;
};

int main()
{
    do
    {
#ifdef FAST
        size_t max_heap_size = size_t(2) * 1000 * 1000 * 1000;
#else
        size_t max_heap_size = 0;
#endif
        heap = HeapCreate(flag, flag, max_heap_size);
        printf("start\n");
        CMyClass *lpList = new CMyClass[NUMBER_OF_ALLOCS];
        printf("after alloc. starting freeing\n");
        delete [] lpList;
        printf("call HeapDestroy\n");
        HeapDestroy(heap);
        printf("end\n");
    }
    while(1);
    return 0;
}

So I've tried regular malloc()/free(), _aligned_malloc()/_aligne_free(), new[]/delete[], and all show the same issue: freeing a lot of allocation is extremely slow. Using a private heap, I've managed to get decent performance, but only when setting a non-zero maximum heap size at HeapCreate() time. However, while we might now it from GDAL_CACHEMAX, this still consumes the corresponding virtual memory, which might not be desirable.

@rouault
Copy link
Member

rouault commented Mar 21, 2024

Other findings, the issue only appears if SINGLE_ALLOC_SIZE is strictly greater than 4 * 4096 bytes.

@rouault rouault changed the title Extended hang in GDALDataset destructor after (read-only) RasterIO call on GTiff Extended hang in GDALDataset destructor after (read-only) RasterIO call on GTiff (Windows specific) Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants