
Simplify blocked reprojection implementation by using dask and improve efficiency of parallel reprojection #314

Merged (22 commits) on Feb 28, 2023

Conversation

@astrofrog (Member)

This is an experiment to simplify the implementation of blocked reprojection added in #214 by using dask.

For now the usage of dask is internal and does not extend to the input/output arrays. However, with this in place we could potentially add an option to return the data as a dask array rather than a Numpy array. Having dask inputs/outputs is a separate topic, so I will leave it to another PR.

I am running into an issue where da.store() is not working but calling compute() directly is - see the FIXME in utils.py. I wonder if this might be a dask bug, but it's not clear.
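[Editor's note: a minimal, hypothetical sketch of what "internal" use of dask for blocked reprojection looks like - not the code in this PR. The function and file names are illustrative: chunk the output grid, reproject each chunk independently via map_blocks, then either compute() the result or stream it into a memmap with da.store() (the path the FIXME above says was misbehaving).]

```python
import dask.array as da
import numpy as np

def reproject_single_block(block, block_info=None):
    # Guard for dask's meta-inference pass, which may call us without info.
    if not block_info:
        return np.zeros(block.shape)
    # block_info[None]["array-location"] holds the output-array slices this
    # chunk covers; a real implementation would build a sub-WCS for that
    # window and run the core reprojection on just this chunk.
    (row0, _), (col0, _) = block_info[None]["array-location"]
    return np.full(block.shape, float(row0 + col0))  # placeholder values

shape_out = (2048, 2048)
result = da.empty(shape_out, chunks=(512, 512)).map_blocks(
    reproject_single_block, dtype=float
)

# Either materialise the result in memory ...
array_out = result.compute()

# ... or stream it into a pre-allocated memmap via da.store():
target = np.lib.format.open_memmap(
    "reprojected.npy", mode="w+", dtype=float, shape=shape_out
)
da.store(result, target)
```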

@codecov bot commented Oct 6, 2022

Codecov Report

Merging #314 (ad5c932) into main (8ad494b) will increase coverage by 0.59%.
The diff coverage is 95.23%.

```
@@            Coverage Diff             @@
##             main     #314      +/-   ##
==========================================
+ Coverage   92.02%   92.62%   +0.59%
==========================================
  Files          24       24
  Lines         803      786      -17
==========================================
- Hits          739      728      -11
+ Misses         64       58       -6
```

Impacted Files                         | Coverage Δ
-------------------------------------- | --------------------------------
reproject/utils.py                     | 84.67% <94.44%> (+0.56%) ⬆️
reproject/interpolation/core.py        | 82.08% <100.00%> (+0.83%) ⬆️
reproject/interpolation/high_level.py  | 100.00% <100.00%> (+12.00%) ⬆️


```python
# to avoid 0-length block sizes when num_cpu_cores is greater than the
# side of the image
for dim_idx in range(min(len(shape_out), 2)):
    if block_size[dim_idx] == 0:
        block_size[dim_idx] = shape_out[dim_idx]
```
@astrofrog (Member Author)

I've removed this to instead let dask decide how to chunk the array, though we might want to provide a keyword argument that specifies the typical number of elements in a chunk.
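[Editor's note: for illustration, dask's automatic chunking can already be steered globally via its configuration, so a hypothetical reproject keyword argument could simply forward a target chunk size like this. The 128 MiB value is just an example.]

```python
import dask
import dask.array as da

# "auto" chunking respects dask's array.chunk-size setting, so a keyword
# argument could forward a user-supplied target size to this option.
with dask.config.set({"array.chunk-size": "128 MiB"}):
    arr = da.empty((30000, 30000), chunks="auto")

print(arr.chunks)  # roughly square chunks near the 128 MiB target
```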

```diff
@@ -674,48 +674,3 @@ def test_blocked_against_single(parallel, block_size):
 
     np.testing.assert_allclose(array_test, array_reference, equal_nan=True)
     np.testing.assert_allclose(footprint_test, footprint_reference, equal_nan=True)
 
 
-def test_blocked_corner_cases():
```
@astrofrog (Member Author)

This test is no longer relevant if we don't try to set the chunk size ourselves.

```diff
@@ -630,7 +630,7 @@ def test_identity_with_offset(roundtrip_coords):
 
 
 @pytest.mark.parametrize("parallel", [True, 2, False])
-@pytest.mark.parametrize("block_size", [[10, 10], [500, 500], [500, 100], None])
+@pytest.mark.parametrize("block_size", [[30, 30], [500, 500], [500, 100], None])
```
@astrofrog (Member Author)

Changed this, as the test was quite slow before.

@astrofrog (Member Author)

There's still some work to be done, then, and of course I will need to run some performance benchmarks to compare this to the previous implementation - some of the tests seem slower, which would not be ideal if it holds up in benchmarks.

@astrofrog (Member Author)

I've now rebased this - I need to figure out how to adapt the code in utils.py to work properly with @svank's changes in #332. Some of the tests are currently failing because we need to adjust the block size in the case where it is passed using only the non-broadcast dimensions. In this case, should the extra dimensions be given a value of -1? (This could be risky because the broadcast dimensions could be arbitrarily large.) Or should we simply require that block_size, if specified, has a length that matches the dimensionality of the data array passed in?

In the case where the block size isn't passed, should we encourage dask not to chunk over the broadcast dimensions by passing e.g. (-1, 'auto', 'auto') as the chunk size (with as many -1s as there are extra dimensions)?

@svank - any thoughts?
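[Editor's note: to make the second option concrete, this is how the chunk specification would look in dask. Shapes are illustrative; -1 keeps an axis in a single chunk while "auto" lets dask choose.]

```python
import dask.array as da

# One broadcast (leading) dimension plus two WCS dimensions.
arr = da.empty((16, 4096, 4096), chunks=(-1, "auto", "auto"))

# The broadcast axis stays in a single chunk; dask only splits the
# trailing WCS axes, so coordinate transforms need not be repeated
# for every slab along the broadcast dimension.
print(arr.chunks)
```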

@astrofrog (Member Author)

Just a note that I am going to go ahead and do a release of reproject with the current implementation of the blocking, because this PR doesn't fundamentally change any user-facing API. After this PR the default block size might change, but to some extent that is an implementation detail.

@svank (Contributor) commented Feb 1, 2023

I don't have any knowledge of how dask works, but I bet the right choice is indeed to only chunk over the WCS dimensions. That would let each chunk compute its coordinate transform information in parallel and then re-use the transformation everywhere it's applicable. I suspect chunking the broadcast dimensions would result in unnecessary repetition of coordinate transformations.

On your comment about the risk if the broadcast dimensions are quite large: is that much of a risk in this PR, where the input and output arrays are still both numpy arrays, so we know the broadcast dimensions must be small enough for both arrays to fit in memory? Or is it that we need memory to hold the whole input, the whole output, plus n_procs * chunk_size of temporary memory, so chunk_size can't be too large? If that's the case, I think it would be better to limit chunk_size by making the chunks very small along the WCS axes and keeping them full-size along the broadcast axes - the numbers I was seeing suggested that the coordinate transformations are by far the largest part of the runtime for the interpolating algorithm, so avoiding repeated transformations at all costs may be well worth it.

@astrofrog (Member Author)

OK, so I think this is now working well - one issue I ran into, which was also happening with the original implementation before this PR, was that the input array would get copied to all the processes, causing memory usage to grow a lot. I've now made it so that in parallel mode we save the input array to a memmap and then load it (as a memmap) inside each process, which speeds things up and reduces memory usage.
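[Editor's note: a rough sketch of the memmap approach described here. The helper names are hypothetical, not the actual utils.py code: the input is written to disk once, and each worker re-opens it lazily instead of receiving a pickled copy.]

```python
import os
import tempfile

import numpy as np

def dump_input(array_in):
    # Write the input once, up front, in the parent process.
    path = os.path.join(tempfile.mkdtemp(), "input.npy")
    np.save(path, array_in)
    return path

def open_in_worker(path):
    # mmap_mode="r" maps the file read-only; pages are faulted in on
    # demand, so each worker avoids holding a private full copy in RAM.
    return np.load(path, mmap_mode="r")
```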

astrofrog changed the title from "Simplify blocked reprojection implementation by using dask" to "Simplify blocked reprojection implementation by using dask and improve efficiency of parallel reprojection" on Feb 28, 2023
@astrofrog (Member Author)

I stress-tested this by reprojecting a ~10k by 30k array to a 30k by 30k image (different coordinate system), and with 8 processes the reprojection is 4x faster than the serial version (still some overhead, but it's going to be hard to be 100% efficient!).

@astrofrog (Member Author)

A big part of the remaining inefficiency in parallel mode is due to #342 - if we switch to using vanilla scipy map_coordinates, the speedup is 6.5x.
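[Editor's note: for context, the "vanilla" scipy call referred to above is scipy.ndimage.map_coordinates, where the coordinates array has shape (ndim, npoints) in pixel units. This example is illustrative only.]

```python
import numpy as np
from scipy.ndimage import map_coordinates

image = np.random.default_rng(0).random((512, 512))

# Sample the image along a diagonal line of 1000 points, with linear
# interpolation and NaN for samples that fall outside the array.
coords = np.stack([np.linspace(0, 511, 1000), np.linspace(0, 511, 1000)])
values = map_coordinates(image, coords, order=1, mode="constant", cval=np.nan)
```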

astrofrog marked this pull request as ready for review on February 28, 2023 at 16:05
astrofrog merged commit 42b3ee5 into astropy:main on Feb 28, 2023