Poor scheduling with flox, leading to high memory usage and eventual failure #11026
Comments
The input chunk sizes are large here, so that's fixable in theory. I did notice that no root tasks were identified in the dashboard. @ivirshup's comment suggests that this is happening with Zarr inputs as well, not just in-memory random arrays.
@dcherian, adding an intermediate (computed) zarr store like this:

```python
import numpy as np
import dask.array as da
from pathlib import Path

M, N = 500_000, 20_000
N_CATEGORIES = 2_000

SCRATCH = Path("/scratch/tmp")
X_pth = SCRATCH / "X.zarr"

X_orig = da.random.normal(size=(M, N), chunks=(5_000, N))
X_orig.to_zarr(X_pth, overwrite=True, compute=True)
X = da.from_zarr(X_pth)

by = np.random.choice(N_CATEGORIES, size=M)
```

has the same behaviour for me.

@fjetter, what is that task representation? I'm not familiar enough with dask to know what my expectations should be of the high-level graph representations, but some of it does seem a little off. My expectation would be that each groupby-sum chunk would be much smaller than its input chunk.
That makes sense, but I guess I would have expected to have arrays that look like:
Instead of:
Why does the `groupby_sum-chunk` task output an array the same size as the input?

FWIW, I tried this with the threading scheduler and it seemed to work great with no memory issues, so that definitely speaks to the high cost of transfer. I was under the impression that the scheduler would try to reduce the number of data transfers necessary. Is this done explicitly, or is it just expected to happen due to last-in-first-out task distribution?
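For reference, the threaded run is just the standard dask scheduler switch; a minimal sketch, with `res` standing in for the flox result discussed below:

```python
# Run on the local threaded scheduler instead of the distributed one;
# all tasks share one process, so there are no inter-worker transfers.
res.compute(scheduler="threads")
```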
This is interesting; I don't actually update the sizes. @fjetter, would the scheduling algorithm work better if the sizes/shapes were more accurate at the blockwise step?
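As a rough illustration of what "more accurate sizes at the blockwise step" could look like (a toy sketch, not flox's actual implementation): dask lets a `map_blocks`/blockwise call declare the output chunk shape explicitly, so the graph's metadata reflects the reduced size rather than the input size.

```python
import dask.array as da

# Toy stand-in for a per-chunk groupby reduction: every large input block
# collapses to 10 "group" rows, and chunks=(10, 50) tells dask that up front,
# so downstream metadata reports the small output size, not the input size.
x = da.random.random((1_000_000, 50), chunks=(100_000, 50))

def partial_sum(block):
    # pretend grouping: fold each block down to 10 partial-sum rows
    return block.reshape(10, -1, block.shape[1]).sum(axis=1)

partials = x.map_blocks(partial_sum, chunks=(10, 50), dtype=x.dtype)
result = partials.reshape(-1, 10, 50).sum(axis=0)  # combine the partials
```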
As a heads-up, Florian is travelling today and may not be very responsive.
Describe the issue:
When performing an aggregation with flox, I quickly run out of memory, regardless of chunk size.
It appears that lower nodes in the graph (loading data) keep getting run while higher nodes (the reduction) sit uncomputed.
Opening here as requested in:
cc: @dcherian
Minimal Complete Verifiable Example:
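A sketch of the reproducer, pieced together from the sizes quoted in the comments above; the exact flox call (`flox.groupby_reduce` with `func="sum"`) and the transpose are assumptions:

```python
import numpy as np
import dask.array as da
import flox
from dask.distributed import Client

client = Client()  # distributed scheduler, as used in this report

M, N = 500_000, 20_000
N_CATEGORIES = 2_000

X = da.random.normal(size=(M, N), chunks=(5_000, N))
by = np.random.choice(N_CATEGORIES, size=M)

# Sum the rows of X within each category. flox.groupby_reduce aligns `by`
# with the trailing axis of its input, hence the transpose here; the call
# in the original report may have differed.
res, groups = flox.groupby_reduce(X.T, by, func="sum")
```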
Anything else we need to know?:
As an example, here's how my task graph completes if I run `X.sum(axis=0).compute()`:
Whereas `res.compute()` ends up with something more like:
Environment: