Speed up normalize_chunks #10648

Open: wants to merge 14 commits into main


@Illviljan (Contributor) commented Nov 24, 2023

This PR attempts to speed up normalize_chunks a bit, because it was one of the more significant bottlenecks when initializing many dask arrays.

  • Removes a lot of duplicated checks.
  • Adds type hints to further make sure things work as intended.
  • Closes #xxxx
  • Tests added / passed
  • Passes pre-commit run --all-files
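For readers unfamiliar with the function: normalize_chunks turns a chunk specification into an explicit tuple of chunk sizes per dimension. In the performance comparison, for example, both ((3, 3), (4, 4)) and (3, (4, 4)) with shape (6, 8) end up as ((3, 3), (4, 4)). The sketch below illustrates that core step in a single pass, with one type check per entry. It is a simplified, hypothetical helper (normalize_explicit_chunks is not a dask name). It covers only integer block sizes and explicit size sequences, and omits "auto", None/-1, dtype and limit handling.

```python
from __future__ import annotations

from numbers import Integral
from typing import Sequence, Union

# Hypothetical alias: one dimension's spec is a block size or explicit chunk sizes.
ChunkSpec = Union[int, Sequence[int]]


def normalize_explicit_chunks(
    chunks: Sequence[ChunkSpec], shape: Sequence[int]
) -> tuple[tuple[int, ...], ...]:
    """Expand per-dimension chunk specs into explicit chunk-size tuples (sketch).

    Only integer block sizes and explicit size sequences are handled;
    "auto", None/-1 and dtype-based sizing are deliberately left out.
    """
    if len(chunks) != len(shape):
        raise ValueError(f"chunks {chunks!r} and shape {shape!r} differ in length")

    out = []
    for spec, dim in zip(chunks, shape):
        if isinstance(spec, Integral):  # also accepts NumPy integer scalars
            size = int(spec)
            if size <= 0:
                raise ValueError(f"chunk size must be positive, got {size}")
            # Repeat the block size; a shorter final chunk holds any remainder.
            full, rem = divmod(dim, size)
            out.append((size,) * full + ((rem,) if rem else ()))
        else:
            # Explicit sizes: validate once that they cover the dimension exactly.
            sizes = tuple(int(c) for c in spec)
            if sum(sizes) != dim:
                raise ValueError(f"chunks {sizes} do not sum to dimension {dim}")
            out.append(sizes)
    return tuple(out)


print(normalize_explicit_chunks((3, (4, 4)), (6, 8)))  # ((3, 3), (4, 4))
```

Each entry is classified once and validated once. This is the kind of duplicated-check removal the PR describes, though dask's real function handles many more cases.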

Performance comparison:

chunks = ((3, 3), (4, 4))
shape = (6, 8)
%timeit normalize_chunks(chunks, shape)
6.02 µs ± 312 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) PR
13.1 µs ± 204 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) Main

chunks = (3, (4, 4))
shape = (6, 8)
%timeit normalize_chunks(chunks, shape)
17.6 µs ± 282 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) PR
26.1 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) Main

chunks = ("auto", (4, 4))
shape = (6, 8)
dtype=np.dtype(np.int32)
%timeit normalize_chunks(chunks, shape, dtype=dtype)
53.5 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) PR
63.5 µs ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) Main
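To read the comparison at a glance, the mean timings can be turned into speedup ratios (main / PR) and absolute savings per call. The numbers are copied from the timings above. The script ignores the reported standard deviations, so treat the ratios as approximate.

```python
# Mean timings in microseconds, (PR, main), copied from the comparison above.
timings_us = {
    "((3, 3), (4, 4))": (6.02, 13.1),
    "(3, (4, 4))": (17.6, 26.1),
    '("auto", (4, 4))': (53.5, 63.5),
}


def speedup(pr_us: float, main_us: float) -> float:
    """Ratio of main's mean time to the PR's mean time (>1 means the PR is faster)."""
    return main_us / pr_us


for case, (pr_us, main_us) in timings_us.items():
    print(f"{case:>18}: {speedup(pr_us, main_us):.2f}x, {main_us - pr_us:.1f} µs saved per call")
```

The relative speedup is largest for fully explicit chunks (about 2.2x) and smallest for "auto" (about 1.2x). The absolute saving stays roughly 7 to 10 µs per call in all three cases. That pattern would be consistent with the removed checks being a mostly fixed per-call overhead.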

github-actions bot added the array label Nov 24, 2023
@Illviljan (Contributor, Author) commented:

dask/array/core.py:3090: error: Statement is unreachable  [unreachable]
dask/array/core.py:3109: error: Incompatible types in assignment (expression has type "Tuple[Union[Tuple[Union[Union[int, float], Union[str, Literal['auto']]], ...], Tuple[Tuple[Union[int, float], ...], ...]]]", variable has type "Union[Tuple[Union[Union[int, float], Union[str, Literal['auto']]], ...], Tuple[Tuple[Union[int, float], ...], ...]]")  [assignment]
dask/array/core.py:3165: error: Argument 1 to "append" of "list" has incompatible type "str"; expected "Tuple[Union[Union[int, float], Literal['auto']], ...]"  [arg-type]
dask/array/core.py:3180: error: Argument 1 to "append" of "list" has incompatible type "str"; expected "Tuple[Union[Union[int, float], Literal['auto']], ...]"  [arg-type]
dask/array/core.py:3201: error: Argument 1 to "tuple" has incompatible type "List[Tuple[Union[Union[int, float], Literal['auto']], ...]]"; expected "Iterable[Tuple[Union[int, float], ...]]"  [arg-type]

@Illviljan (Contributor, Author) commented Nov 25, 2023

FAILED dask/array/tests/test_array_core.py::test_tiledb_roundtrip - ValueError: Chunk element is not supported. Got 3 from [3, 3]
FAILED dask/array/tests/test_array_core.py::test_tiledb_multiattr - ValueError: Chunk element is not supported. Got 100 from [100, 100]
= 2 failed, 11618 passed, 961 skipped, 84 xfailed, 56 xpassed, 3867 warnings in 750.50s (0:12:30) =

Odd errors. How are these elements not int or float?

Edit: They are numpy integers...
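The two tiledb failures fit that explanation. Chunk lists like [3, 3] hold NumPy integer scalars. NumPy's integer scalar types are not subclasses of Python's int (unlike np.float64, which does subclass float), so a check of the form isinstance(x, (int, float)) rejects them. That is presumably the kind of check that raised here. The sketch below demonstrates the behaviour and shows one way to accept both kinds of number: the numbers ABCs, which NumPy registers its scalar types with. is_chunk_element is a hypothetical name, not the helper used in the PR. Checking against np.integer/np.floating explicitly would work as well.

```python
from numbers import Integral, Real

import numpy as np

# Iterating a NumPy array yields NumPy scalars (np.int64 or np.int32 depending
# on platform), whereas .tolist() would yield plain Python ints.
chunk = list(np.array([3, 3]))
elem = chunk[0]

print(isinstance(elem, (int, float)))  # False: NumPy integer scalars do not subclass int
print(isinstance(elem, Integral))      # True: NumPy registers np.integer with numbers.Integral


def is_chunk_element(x: object) -> bool:
    """Hypothetical check: accept Python and NumPy integers and floats."""
    # numbers.Real covers int, float, np.integer (via Integral) and np.floating.
    return isinstance(x, Real)


print([is_chunk_element(c) for c in chunk])  # [True, True]
print(is_chunk_element("auto"))              # False
```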

@Illviljan marked this pull request as ready for review on November 25, 2023 12:43