Slow boundless read performance? #3077
-
@sanderjansen thanks for the questions! GDAL's block cache uses filenames in its keys. For a rasterio boundless read, the "filename" is the text of a VRT XML document. Since you're changing the extent of the VRT (via the changing window) with each read, the XML "filename" changes, and you won't see a cache hit. There is some overhead to a boundless read, yes. I see it in the difference between the times in rows 0 and 2 of your table. I have no explanation for the ~20% increase between rows 2 and 3. I'd expect the time to be slightly better with caching, because the blocks of file.tif could be usefully cached even if the VRT blocks are not. Precisely timing program execution can be tricky. For an application like this, the way to take advantage of caching would be to create one VRT up front, with an extent large enough to cover all your windows, and then make non-boundless reads using windows into that large VRT.
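A rough sketch of that last suggestion, in case it helps. It assumes `rasterio.vrt.WarpedVRT` is an acceptable way to build the single large VRT (other ways of building it may suit you better), that the dataset has a CRS (as far as I know `WarpedVRT` requires one), and that windows are plain `(col_off, row_off, width, height)` tuples in source pixel coordinates, possibly extending past the raster edges. The helper names `covering_extent`, `shift_window`, and `read_windows` are made up for this example.

```python
def covering_extent(windows):
    """Smallest pixel rectangle (col_off, row_off, width, height) covering all windows."""
    col_min = min(w[0] for w in windows)
    row_min = min(w[1] for w in windows)
    col_max = max(w[0] + w[2] for w in windows)
    row_max = max(w[1] + w[3] for w in windows)
    return col_min, row_min, col_max - col_min, row_max - row_min


def shift_window(window, extent):
    """Re-express a source-pixel window relative to the upper-left corner of the extent."""
    col_off, row_off, width, height = window
    return col_off - extent[0], row_off - extent[1], width, height


def read_windows(path, windows):
    """Read every window through one padded VRT using ordinary (non-boundless) reads."""
    # Imported here so the two helpers above can be used without rasterio installed.
    import rasterio
    from rasterio.vrt import WarpedVRT
    from rasterio.windows import Window
    from rasterio.windows import transform as window_transform

    extent = covering_extent(windows)
    with rasterio.open(path) as src:
        # Same pixel size as the source; only the origin moves to the extent's corner.
        vrt_transform = window_transform(Window(*extent), src.transform)
        with WarpedVRT(
            src, transform=vrt_transform, width=extent[2], height=extent[3]
        ) as vrt:
            # The VRT definition stays the same across reads; only the window changes.
            return [
                vrt.read(window=Window(*shift_window(w, extent))) for w in windows
            ]
```

For example, `read_windows("file.tif", [(-10, -10, 100, 100), (50, 20, 200, 100)])` would build one 260 x 130 pixel VRT and read both windows from it. Whether this actually beats boundless reads in your pipeline is something to measure rather than assume.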
-
As part of optimizing our rendering pipeline, we were hoping to take advantage of GDAL's native caching to do many windowed reads from the same file. However, we discovered that the boundless option completely negates any caching speedups:
In the contrived case above we simulate different windowed reads. Trying it with different parameters, I got:
Any clue what is going on? Is the in-memory VRT preventing optimal read caching? Or is the VRT operation itself just slow?
Also, I came across this highly questionable commit, which does the exact opposite of what its commit message says:
adbd848
I did try removing the shared flag, but that didn't seem to have any effect.