Better parallelization using Dask? #15

Open
jgostick opened this issue Apr 19, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@jgostick
Member

We've had some great success with dask for simple single-machine parallelization. I think the current version uses an approach that is a bit problematic, requiring the script to be wrapped in an if __name__ == '__main__' block, right? Dask does not require any of this.
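
For context, a minimal sketch of the kind of guard multiprocessing-based code needs (hypothetical, assuming the current code uses something like multiprocessing.Pool rather than the project's actual implementation):

import numpy as np
from multiprocessing import Pool

def add_one(chunk):
    return chunk + 1

if __name__ == '__main__':  # required where processes are spawned (e.g. Windows)
    arr = np.zeros([20, 20])
    with Pool(2) as pool:
        chunks = pool.map(add_one, np.split(arr, 2))
    new_arr = np.vstack(chunks)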

jgostick added the enhancement label Apr 19, 2020
@TomTranter
Collaborator

Yup, can you post an example? I also started looking at shared-memory arrays for something else, but that only works on Linux.
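
A minimal sketch of that fork-based shared-array pattern, for reference (hypothetical; it works because forked children inherit the global buffer, which is why it is Linux-only):

import numpy as np
from multiprocessing import Process, RawArray

# A shared buffer viewed as a numpy array; forked children inherit it
buf = RawArray('d', 20 * 20)
arr = np.frombuffer(buf).reshape(20, 20)

def fill(value):
    arr[:] = value  # writes go into the shared buffer, visible to the parent

if __name__ == '__main__':
    p = Process(target=fill, args=(1.0,))
    p.start()
    p.join()
    assert np.all(arr == 1.0)  # holds under fork (Linux); fails under spawn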

@jgostick
Member Author

jgostick commented Apr 20, 2020

We've been using it to do operations in chunks, like this:

import numpy as np
from dask import delayed, compute

# This decorator tells dask to delay computation of this function
@delayed
def sumnums(arr, num):
    return arr + num


arr = np.zeros([20, 20])
res = []
for chunk in np.split(arr, 2):
    # Each call returns a lazy 'delayed' object instead of computing immediately
    res.append(sumnums(chunk, 1))
temp = compute(res)  # Here we tell dask to actually do the calc, in parallel
new_arr = np.vstack(temp[0])  # compute returns a tuple; temp[0] is the list of chunks
assert new_arr.shape == arr.shape
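
For what it's worth, the scheduler can also be chosen per call via compute's scheduler keyword (part of dask's documented API; 'threads' is the default for delayed):

temp = compute(res, scheduler='threads')    # default threaded scheduler
temp = compute(res, scheduler='processes')  # use a process pool instead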
