
Offer a "low memory mode" and multiprocessing of array chunks #79

Open
vlandau opened this issue Oct 2, 2020 · 11 comments

vlandau commented Oct 2, 2020

Often grids are so large that allocating arrays requires tons of memory. Each moving window solve is independent, so a low-memory mode may be relatively simple.


Two options come to mind:

1. Split the landscape into chunks, and solve the chunks in serial

The landscape could be split into adjacent, non-overlapping rectangular or square chunks. Each chunk (and its corresponding inputs) would need to be buffered by radius (an Omniscape argument) so that pixels at the edge of the unbuffered chunk aren't missing any data in their moving windows. The results for each chunk could then be stored and mosaicked, taking the max value. If done correctly, this would produce the exact same map as Omniscape v0.4.2 (a rough sketch of the chunk bookkeeping is included after option 2 below).

2. Lazy reading and writing for each moving window solve

A given moving window solve only needs the data directly pertinent to it, so there is no need to hold the entire raster inputs in memory. Just load what is needed for each moving window (GeoData.jl has functionality to do this), solve it, lazily write to an output raster, then remove/garbage collect the data and move on to the next solve.

Thread safety could be a little tricky. Each thread could write to its own output component, and the components could then be summed, but the summing would also need to be done lazily/by chunk; otherwise you'd end up loading all the components into memory, which would take as much memory as Omniscape's current approach. Another consideration is computing the coordinates and source strengths of the target pixels (the ones on which the moving windows are centered).

The biggest memory hog in Omniscape is the allocation of the array(s) that store current values. They have the same number of rows and columns as the resistance surface, but a third dimension equal to the number of parallel workers, and if normalized current flow is computed in addition to cumulative current, there are two of them! Because of this, even if the source strength layer needs to be loaded into memory in full, the memory savings would still be substantial.
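
For option 1, the work is mostly index bookkeeping. A minimal sketch in plain Julia (the `chunk_size` argument and the `solve_chunk` callback are placeholders for illustration, not Omniscape internals):

```julia
# Row/column ranges for non-overlapping chunks, each padded by `radius` so that
# pixels at a chunk's edge still see complete moving windows.
function chunk_ranges(nrows, ncols, chunk_size, radius)
    chunks = []
    for r in 1:chunk_size:nrows, c in 1:chunk_size:ncols
        rows = max(1, r - radius):min(nrows, r + chunk_size - 1 + radius)
        cols = max(1, c - radius):min(ncols, c + chunk_size - 1 + radius)
        push!(chunks, (rows = rows, cols = cols))
    end
    return chunks
end

# Solve the chunks in serial and mosaic the results by per-pixel maximum.
function solve_in_chunks(resistance, chunk_size, radius, solve_chunk)
    output = fill(-Inf, size(resistance))
    for chunk in chunk_ranges(size(resistance)..., chunk_size, radius)
        result = solve_chunk(resistance[chunk.rows, chunk.cols])  # same size as the chunk
        output[chunk.rows, chunk.cols] .= max.(output[chunk.rows, chunk.cols], result)
    end
    return output
end
```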


I tend to like option 2; it's the slickest, and if done in a thread-safe way I think it would be more straightforward to implement so that edge-case errors are less likely, but the I/O overhead might be significant -- not sure.

vlandau added the enhancement, help wanted, and performance labels on Oct 2, 2020

glaroc commented Feb 16, 2021

This would be a huge improvement for the scalability of Omniscape. I feel like solution 1 would be infinitely scalable, which isn't the case for solution 2, no?


vlandau commented Feb 17, 2021

Yeah, on a second read after leaving this for a while, I'm preferring option 1 above... I think it would be faster and also easy to test for correctness.

vlandau self-assigned this on Feb 17, 2021

glaroc commented Feb 17, 2021

But thinking about it more, option 2, if implemented well, could be better suited to HPC, where each thread would use very little memory. Ultimately, this might be the best solution.


vlandau commented Feb 17, 2021

Yeah, though I think option 1 has the opportunity to be much faster (lazy reads/writes, plus the associated locks/unlocks needed for thread safety across that many operations, could involve massive overhead).

Also, if in option 2 there were N output GeoTIFFs written to separately for thread safety, where N is the number of threads, then each of those N outputs would be the full size of the landscape, which could introduce additional memory bottlenecks when adding them together.
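
The chunk-wise summation mentioned under option 2 would look roughly like the sketch below. `read_block` and `write_block!` are hypothetical helpers for windowed raster I/O (e.g. thin wrappers around GDAL block reads/writes), passed in so the sketch stays library-agnostic:

```julia
# Sum N per-thread output rasters block by block so that only one block per
# raster is ever in memory at a time.
function sum_outputs_by_block(paths, nrows, ncols, block_size, read_block, write_block!)
    for r in 1:block_size:nrows, c in 1:block_size:ncols
        rows = r:min(nrows, r + block_size - 1)
        cols = c:min(ncols, c + block_size - 1)
        acc = zeros(length(rows), length(cols))
        for p in paths
            acc .+= read_block(p, rows, cols)   # one block of one component at a time
        end
        write_block!(rows, cols, acc)           # accumulate into the final output on disk
    end
end
```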


glaroc commented Mar 11, 2021

I tested option 1 by first splitting the input resistance map into overlapping tiles with Bash + GDAL, then running Omniscape on each tile, and finally re-mosaicking the tiles in Python using rasterio's merge with method="max". It works really well, and you don't see any stitch marks in the final output. I looked at the Omniscape code, and it would require some significant refactoring to do this within Omniscape. It also seems quite difficult to do the final merge in GDAL alone using the max pixel value. I don't know if there is an equivalent merge function in Julia.


vlandau commented Mar 11, 2021

Awesome! Glad it worked well. I think there might be a way to mosaic the TIFFs using GDAL.jl or ArchGDAL.jl? But actually, I use gdal_merge.py from the command line when merging rasters, and maybe that's not available in Julia...

We could possibly do it with GeoData.jl. There may be a way to create an empty GDALarray and then write to it lazily. The key is lazy writing, to ensure the most efficient use of RAM.
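
In the meantime, the max merge itself is simple to express in plain Julia once each tile and its (row, column) offset within the full grid are in memory. A rough sketch (reading the tiles from disk, e.g. via ArchGDAL.jl or GeoData.jl, is left out):

```julia
# Mosaic overlapping tiles into one array, keeping the per-pixel maximum
# (the equivalent of rasterio's merge with method="max").
function max_merge(tiles, offsets, nrows, ncols)
    out = fill(-Inf, nrows, ncols)
    for (tile, (r0, c0)) in zip(tiles, offsets)
        rows = r0:(r0 + size(tile, 1) - 1)
        cols = c0:(c0 + size(tile, 2) - 1)
        out[rows, cols] .= max.(out[rows, cols], tile)
    end
    return out
end
```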

cc @rafaqz, who might have some ideas/thoughts. I think GeoData.jl (or maybe ArchGDAL?) might have what we need.

I see the workflow as having these steps:

  1. Load in the raster inputs as GeoData.GDALarrays so they're not fully loaded into memory
  2. Identify tile extents (such that they overlap enough, based on user-provided params), with each "extent" being an Array{Number, 1} of length 4 (North, South, East, West coords), so we'd have an Array{Array{Number, 1}, 1} (see the sketch after this list)
  3. Loop through the Array{Array{Number, 1}, 1}, load that extent of each input raster into memory, and for each extent:
    • Run Omniscape
    • Write that Omniscape output to disk
    • Clear outputs from memory/garbage collect
  4. Create an empty GDALarray and iteratively add each Omniscape output to it lazily
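
As a concrete (if simplified) example of step 2, the buffered extents could be computed like this for a north-up raster; the function and argument names here are placeholders, not Omniscape or GeoData.jl API:

```julia
# Illustrative computation of buffered tile extents. `xmin`/`ymax` are the map
# coordinates of the grid's top-left corner, `cell` is the pixel size, `tile`
# the tile width in pixels, and `radius` the Omniscape radius argument.
function tile_extents(nrows, ncols, xmin, ymax, cell, tile, radius)
    extents = Vector{Vector{Float64}}()
    for r in 1:tile:nrows, c in 1:tile:ncols
        r1 = max(1, r - radius); r2 = min(nrows, r + tile - 1 + radius)
        c1 = max(1, c - radius); c2 = min(ncols, c + tile - 1 + radius)
        north = ymax - (r1 - 1) * cell
        south = ymax - r2 * cell
        west  = xmin + (c1 - 1) * cell
        east  = xmin + c2 * cell
        push!(extents, [north, south, east, west])  # length-4 extent, as described in step 2
    end
    return extents
end

# e.g. a 1000 x 1200 grid, 1 m cells, 100-pixel tiles, radius of 20:
# extents = tile_extents(1000, 1200, 0.0, 1000.0, 1.0, 100, 20)
```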

I'd like to start working on these components when I have time -- it might be a while before I have a solid chunk of time to do it, but I think we're starting to get a fairly clear plan of action at this point.


glaroc commented Mar 11, 2021

Just a note to mention that gdal_merge.py doesn't allow taking the maximum value of overlapping pixels. If you don't take the max, there will be some serious stitch marks.


vlandau commented Mar 11, 2021

Also, tiling the landscape could end up giving us a solid foundation from which to build out functionality for hierarchical parallelism.


rafaqz commented Mar 12, 2021

GeoData.jl has the open method that you use with a do block to write to the open ArchGDAL.jl/DiskArrays.jl dataset underneath. I haven't really tested it for writing, more just for reading larger-than-memory files, but this should be doable and easy in Julia. I haven't personally been pushed to make sure of it because my datasets are only ever that big along the time dimension. I would be happy to help out -- I think making this easy should be a goal of the Julia geo ecosystem.


vlandau commented Mar 12, 2021

Thanks @rafaqz! I'll keep you in the loop when I find time to work on this, and may solicit some feedback as I go 🙂

vlandau changed the title from 'Offer a "low memory mode"' to 'Offer a "low memory mode" and multiprocessing of array chunks' on Sep 8, 2021
vlandau added this to the Version 1.0 milestone on Sep 8, 2021

vlandau commented Sep 8, 2021

Option 1 will be the way I go, as it has several benefits, including making multiprocessing easy to do, which will enable hierarchical parallel processing (e.g. see #106).

vlandau removed the help wanted label on Oct 26, 2021