Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add conditional to set _mv variable to np.uint64 data type #47

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

robgpita
Copy link

@robgpita robgpita commented May 7, 2024

Allow the processing of large rasters, shaped at (63708, 79780), for Flwdir object.

Issue addressed

Closes #46

Explanation

As described in issue #46, when large rasters are used, an IndexError was thrown from underlying numba code.

The upstream_count function was the cause of the Indexing Error, due to an incorrect value set in the _mv variable when using a np.uint64 data type.

Checklist

  • Updated tests or added new tests
  • Branch is up to date with main
  • Updated documentation if needed
  • Updated CHANGELOG.rst if needed

Additional Notes (optional)

The setting of the np.uint64 dtype in pyflwdir.py was not accounted for, and caused non-correct values in the _mv variable, when assigned in core.py. This was causing the C pointer references to not line up with the signed integer type that was being used in python. See numpy docs for np.intp()

The downstream effects

self._mv = core._mv
if idxs_ds.dtype == np.uint32:
self._mv = np.uint32(self._mv)
if idxs_ds.dtype == np.uint64:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just appears to be a dtype limitation?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fernando-aristizabal not exactly. The dtype assigned to self._mv needs to be the same as the dtype for idxs.ds. Those two dtypes need to line up, otherwise there were errors in numba code. In the downstream code, incorrect aliasing was occurring based on _mv's dtype. This was causing a mis-match in values between _mv and items in the idxs.ds array, which ultimately led to the IndexErrors being thrown.

In order to avoid the IndexErrors, for the large array indices case, where the idxs.ds.dtype is set to uint64, the dtype for _mv needs to be set to uint64 as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FlwdirRaster.accuflux() segmentation faults with a raster ~ (63000, 80000) on machine with >1Tb RAM
2 participants