
Compute flow potential and current flow in separate loops to save memory #95

Open · vlandau opened this issue Apr 1, 2021 · 5 comments
Labels: performance (Related to compute and memory efficiency)

vlandau commented Apr 1, 2021

Right now both flow potential and cumulative current are solved in the same loop.

e.g.:

using Base.Threads: nthreads

flow_potential_storage_array = fill(0.0, (size(resistance)..., nthreads()))
cumulative_current_storage_array = fill(0.0, (size(resistance)..., nthreads()))
for i in moving_windows
    # solve flow potential for window i
    # add flow potential for window i to the flow potential storage array
    # solve current flow for window i
    # add current for window i to the cumulative current storage array
end
# sum flow potential along 3rd dim
# sum cumulative current along 3rd dim

So storage arrays (X by Y by N_THREADS, later summed along the N_THREADS dimension) need to be allocated for both quantities at the same time. If flow potential were solved first, it could be summed and stored as an X by Y array, and its storage array freed before allocating the array for cumulative current. This shouldn't take any longer, but it would be much more memory efficient, since only one N_THREADS-deep accumulator would exist at a time.

The new code would look like:

flow_potential_storage_array = fill(0.0, (size(resistance)..., nthreads()))
for i in moving_windows
    # solve flow potential for window i
    # add flow potential for window i to the flow potential storage array
end
# sum flow potential along 3rd dim (new object of size size(resistance))
flow_potential_storage_array = nothing

cumulative_current_storage_array = fill(0.0, (size(resistance)..., nthreads()))
for i in moving_windows
    # solve current flow for window i
    # add current for window i to the cumulative current storage array
end
# sum cumulative current along 3rd dim
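
For concreteness, a minimal runnable sketch of the two-phase version, assuming hypothetical per-window solvers solve_flow_potential(i) and solve_current_flow(i) that each return an array the size of resistance (these are placeholder names, not actual Omniscape functions):

using Base.Threads: @threads, nthreads, threadid

function solve_two_phase(resistance, moving_windows)
    # Phase 1: flow potential, one accumulator slice per thread.
    # Indexing by threadid() assumes each iteration stays pinned to one
    # thread, which is the (pre-1.8) static scheduling behavior of @threads.
    fp_storage = fill(0.0, (size(resistance)..., nthreads()))
    @threads for i in moving_windows
        fp_storage[:, :, threadid()] .+= solve_flow_potential(i)
    end
    flow_potential = dropdims(sum(fp_storage; dims=3); dims=3)
    fp_storage = nothing  # release the 3D accumulator before allocating the next

    # Phase 2: cumulative current, same pattern
    cc_storage = fill(0.0, (size(resistance)..., nthreads()))
    @threads for i in moving_windows
        cc_storage[:, :, threadid()] .+= solve_current_flow(i)
    end
    cumulative_current = dropdims(sum(cc_storage; dims=3); dims=3)
    return flow_potential, cumulative_current
end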

Need to give some more thought to memory management in general, though. For example, maybe there is a better way to make this threadsafe than allocating a separate array slice for each thread; I just didn't want to bother with locks/unlocks, as I know they can come with a compute-time penalty. For comparison, a lock-based version is sketched below.
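
A sketch of that lock-based alternative, assuming the same hypothetical solve_current_flow(i) as above: a single X by Y accumulator is shared across threads, and only the accumulation step is serialized.

using Base.Threads: @threads, SpinLock

function solve_with_lock(resistance, moving_windows)
    cumulative_current = fill(0.0, size(resistance))
    lk = SpinLock()
    @threads for i in moving_windows
        window_current = solve_current_flow(i)  # the expensive solve runs unlocked
        lock(lk) do
            cumulative_current .+= window_current  # only this step is serialized
        end
    end
    return cumulative_current
end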

vlandau added the performance label on Apr 1, 2021

ViralBShah commented Apr 1, 2021

The standard question in all of parallel computing. The answer often is: implement it both ways, and use whichever one works better (for some definition of better).

vlandau commented Apr 1, 2021

> implement it both ways

With the threadsafe 3D array vs. locks/unlocks on a 2D array? Yeah, I should, and run some formal benchmarks while I'm at it; it won't hurt for me to get more familiar with locks/unlocks either. A sketch of what that comparison might look like is below.
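
One way the comparison might be benchmarked, using BenchmarkTools.jl and the two illustrative implementations from the sketches above (solve_two_phase and solve_with_lock are hypothetical names, not Omniscape functions):

using BenchmarkTools

resistance = rand(1000, 1000)      # one example problem size; vary this
moving_windows = collect(1:256)    # placeholder set of window indices

@btime solve_two_phase($resistance, $moving_windows)
@btime solve_with_lock($resistance, $moving_windows)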

ViralBShah commented Apr 1, 2021

Yes. But you'll need to test it with big problem sizes. I mean you only have a handful of tasks, and I suppose they can do all their partial sums locally. I can't imagine the locking and unlocking for the final part to take more than a second. But would love to see what you find. Without knowing the sizes of the 3d array and how it grows with input size, it is hard to tell.
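
A sketch of that pattern, with the same hypothetical solve_current_flow(i) as above: each task accumulates into its own local array, and the lock is taken just once per task for the final merge, so contention should be negligible.

using Base.Threads: @spawn, nthreads

function solve_local_partials(resistance, moving_windows; ntasks = nthreads())
    total = fill(0.0, size(resistance))
    lk = ReentrantLock()
    chunk_size = cld(length(moving_windows), ntasks)
    tasks = map(Iterators.partition(moving_windows, chunk_size)) do chunk
        @spawn begin
            local_sum = fill(0.0, size(resistance))  # task-local partial sum
            for i in chunk
                local_sum .+= solve_current_flow(i)  # hypothetical per-window solver
            end
            lock(lk) do
                total .+= local_sum  # one locked merge per task
            end
        end
    end
    foreach(wait, tasks)
    return total
end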

vlandau commented Apr 1, 2021

Yeah, any benchmarks will definitely need to consider different map sizes, moving window sizes, and other options within Omniscape. I think it will be a valuable exercise.

vlandau commented Sep 8, 2021

This may be less of an issue if #106 and #79 can be implemented efficiently.

vlandau added this to the Version 1.0 milestone on Sep 8, 2021