Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Tiled merge #2221

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft

[WIP] Tiled merge #2221

wants to merge 5 commits into from

Conversation

groutr
Copy link
Contributor

@groutr groutr commented Jun 25, 2021

An experiment to see if merge could be made more memory efficient using windowed read/write. Testing with the dataset in #2174 seems to produce good results when visually inspected (ie no lines of missing data). I've only manually tested at this point, but thought I would open a draft PR for early feedback.

Advantages:

  • Tries to keep only one or two tiles in memory at a time. Memory usage of tiles is tunable (see arrblocks). This probably allows arbitrary sized merges.
  • Using windowed reads/writes seem to side-step the rounding issues in Merge is prone to off-by-one pixel errors #2205
  • Each block is independent. Perhaps parallel merges are in the future?
  • Window reads can be aligned to the block sizes of the input datasets allowing for efficient reads. See the arrblocksxy function.
  • Merge loop seems to be shorter and more concise.

Disadvantages

  • Code is super new and therefore super untested.
  • Appears to trade speed for memory efficiency

@groutr groutr marked this pull request as draft June 25, 2021 04:04
@groutr
Copy link
Contributor Author

groutr commented Jun 25, 2021

In-memory merge method (master) (15s):
merge_old

Tiled merge (48s):
merge_new_tiled

While the runtime of the tiled merge does seem to be significantly slower, a quick glance at a profile reveals that almost 80% of the time is spent doing IO, reading and writing from/to a compressed GTiff.

@groutr
Copy link
Contributor Author

groutr commented Jun 25, 2021

TODO: handle resolution changes.
The code as it stands now, only works if everything is at the same resolution.

@groutr
Copy link
Contributor Author

groutr commented Sep 14, 2021

@sgillies Can I get your feedback? Is this the proper direction to expand merge?

There are two things happening here:

  1. Tiled merge functionality. Some preliminary testing shows that by threading the IO, a good portion of the IO costs can be mitigated. Not intended to replace in-memory merge, but complement it for out-of-core merges.
  2. Port the window bounds and intersection from gdal_merge. I found that gdal_merge uses a different rounding method for windows which appears to resolve Merge is prone to off-by-one pixel errors #2205 (at least in the cases I have seen so far). Perhaps it might be useful to add this type of rounding to rasterio in the event that the current rounding fails.

This PR is a proof-of-concept for those two things. If anything is useful here, it should be added in its own PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant