FlwdirRaster.accuflux() segmentation faults with a raster ~ (63000, 80000) on machine with >1Tb RAM #46
I'm not too familiar with the code here, but thanks to the clear report I could follow along. You mention:

> This did not work.

Do you know why this didn't work? Did it use the "sort" method but also run out of memory there? From reading the issue I gather you have a d8 flow direction type (https://deltares.github.io/pyflwdir/latest/_examples/flwdir.html), which due to this line selects the "walk" method: Line 272 in 45c5e63
Which states that it uses a lot of memory, suggesting the "sort" method should be a good alternative: Lines 209 to 215 in 45c5e63
The fact that RAM use is only at 1/3 of capacity at segfault time may be misleading, because the "walk" method will try to allocate this memory at once: Line 70 in 45c5e63
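For intuition, the difference between the two orderings can be sketched in plain NumPy: the "sort" method first ranks every cell by its distance from a pit (a cell draining to itself or to a missing value) and then processes cells in rank order. The helper below is a hypothetical, unjitted toy that assumes the flow directions contain no cycles; it is not pyflwdir's actual `core.rank`:

```python
import numpy as np

def rank_cells(idxs_ds, mv):
    """Toy rank-based ("sort") ordering: the rank of a cell is its distance,
    in cells, from a pit. Illustrative only, not pyflwdir's implementation."""
    n = idxs_ds.size
    ranks = np.full(n, -1, dtype=np.int64)  # -1 = not yet ranked
    for i in range(n):
        path = []
        j = i
        # walk downstream until we hit an already-ranked cell or a pit
        while ranks[j] < 0 and idxs_ds[j] != j and idxs_ds[j] != mv:
            path.append(j)
            j = idxs_ds[j]
        if ranks[j] < 0:  # j is a pit (or drains to a missing value)
            ranks[j] = 0
        # unwind: each step back upstream is one rank higher
        r = ranks[j]
        for idx in reversed(path):
            r += 1
            ranks[idx] = r
    # stable argsort over the ranks yields a downstream-to-upstream sequence
    return ranks, np.argsort(ranks, kind="stable")

# tiny chain: cell 0 -> cell 1 -> cell 2 (a pit)
ranks, seq = rank_cells(np.array([1, 2, 2]), mv=-9)
print(ranks)  # [2 1 0]
print(seq)    # [2 1 0]
```

The point of the sketch is that only the rank array and the sorted index sequence are needed, rather than one big upfront allocation per walked path.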
@visr Thanks for the reply. For reference, below is the result of using `method="sort"`:

```
Traceback (most recent call last):
  File "/foss_fim/src/accumulate_headwaters.py", line 110, in <module>
    accumulate_flow(**vars(args))
  File "/foss_fim/src/accumulate_headwaters.py", line 73, in accumulate_flow
    flowaccum = flw.accuflux(headwaters, nodata=nodata, direction='up')
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 555, in accuflux
    seq=self.idxs_seq,
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/pyflwdir.py", line 272, in idxs_seq
    self.order_cells(method="sort")
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 211, in order_cells
    rnk, n = core.rank(self.idxs_ds, mv=self._mv)
  File "/usr/local/lib/python3.10/dist-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/usr/local/lib/python3.10/dist-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function setitem>) found for signature:

 >>> setitem(array(int32, 1d, C), float64, Literal[int](-1))

There are 16 candidate implementations:
  - Of which 14 did not match due to:
    Overload of function 'setitem': File: <numerous>: Line N/A.
      With argument(s): '(array(int32, 1d, C), float64, int64)':
     No match.
  - Of which 2 did not match due to:
    Overload in function 'SetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 176.
      With argument(s): '(array(int32, 1d, C), float64, int64)':
     Rejected as the implementation raised a specific error:
       NumbaTypeError: unsupported array index type float64 in [float64]
    raised from /usr/local/lib/python3.10/dist-packages/numba/core/typing/arraydecl.py:72

During: typing of setitem at /usr/local/lib/python3.10/dist-packages/pyflwdir/core.py (37)

File "usr/local/lib/python3.10/dist-packages/pyflwdir/core.py", line 37:
def rank(idxs_ds, mv=_mv):
    <source elided>
        while len(idxs_lst) > 0:
            ranks[idxs_lst.pop(-1)] = -1
            ^
```
Interesting. I hope I'm not leading you astray, but if I read that error correctly it tries to do a `setitem` with a `float64` index: Lines 35 to 38 in 45c5e63
So it hits a loop and is trying to mark its path, but is failing to do so since the index popped from `idxs_lst` is a `float64` rather than an integer. You could try removing the
Hmm, I should've tried asking Copilot earlier; this verbatim answer makes sense: "The issue might be with the way Numba is interpreting the types in your code. It might be incorrectly inferring the type of the index. To fix this, you can explicitly cast the index to an integer before using it: `ranks[int(idxs_lst.pop(-1))] = -1` and `ranks[int(idxs_lst.pop(-1))] = rnk`. This will ensure that the index is always an integer, which should resolve the error."
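The typing failure can be reproduced conceptually outside numba: plain NumPy likewise refuses a `float64` index, which is why the explicit `int()` cast resolves it. A minimal sketch (the variable names mirror the traceback but are illustrative, not pyflwdir's code):

```python
import numpy as np

ranks = np.zeros(5, dtype=np.int32)
idx = np.float64(3.0)  # an index that lost its integer dtype somewhere

try:
    ranks[idx] = -1  # NumPy rejects float indices outright
except IndexError as err:
    print("rejected:", err)

ranks[int(idx)] = -1  # the suggested explicit cast makes the index valid
print(ranks)
```

Numba enforces the same rule at compile time, which is why it surfaces as a `TypingError` rather than a runtime exception.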
@visr Thanks for your help and suggestions. After upgrading, the result is:

```
Fatal Python error: Segmentation fault

Extension modules:
.... (total: 97)

Segmentation fault (core dumped)
```

Unfortunately, the segmentation fault still occurs. I will try to upgrade our Python version next, as well as try to get more detailed debugging information as to which lines of code are responsible for the segmentation fault, in hopes of addressing the root of the problem.
Unfortunately, I personally don't have much time either to dig into this, but from my numba wrangling experience I can share a couple of tips.

If numba cannot allocate sufficient memory, it will generally report an appropriate error:

```python
import numpy as np
import numba as nb

@nb.njit
def allocate(n):
    return np.empty(n, dtype=np.float64)

a = allocate(int(1e12))  # my machine has 32 GB RAM
```

This raises an allocation error rather than crashing. Segfaults, however, are very easy to trigger: numba doesn't do any bounds checking. Perhaps all the segfaults that I've generated with numba were due to silly indexing mistakes. It may be that it doesn't show up (consistently) with smaller inputs; I guess as long as numba stays within the process memory, the OS doesn't kill it. In the example below, indexing with 11 is definitely out of bounds, but I simply get a garbage value. A much larger index guarantees I'm definitely trespassing and Python crashes:

```python
@nb.njit
def allocate_and_index():
    a = np.empty(10, dtype=np.float64)
    return a[10000000]
```

Bounds checking can be enabled since some time: https://numba.readthedocs.io/en/stable/reference/pysemantics.html#bounds-checking

```python
@nb.njit(boundscheck=True)
def allocate_and_index():
    a = np.empty(10, dtype=np.float64)
    return a[10000000]
```

This results in an IndexError instead. If it's tedious to set this on every decorator, you can also set it via an environment variable:

```python
import os
os.environ["NUMBA_BOUNDSCHECK"] = "1"
```

(This needs to go at the top of the script so that the jit decorator is aware of it before anything gets compiled. Of course, you can also just set it in your command line prior to starting Python.)

Generally, one of the first things I try when running into segfaults is disabling numba entirely through another environment variable:

```python
import os
os.environ["NUMBA_DISABLE_JIT"] = "1"
```

This may not be feasible if the segfault is only triggered with large inputs, as dynamic Python can be 300 times slower than numba, so it may take an inordinate amount of time to trigger the error. One (obvious) option here is splitting up the function: run the part up until the segfault with numba, get the intermediate products out, and try to run the subsequent part without numba.

Anyway, I'd start with the numba boundscheck. It doesn't mention a line number in my test (I'm on numba 0.59.1), but if it errors, it will at least provide a starting point. For what it's worth: all the numba segfaults that I can remember from the past also triggered errors in Python and were relatively straightforward to iron out...
Re-reading the title and OP another time: is it possible that we're looking at an int32 overflow or something? I'm not sure how that would then result in a segfault, but the example sizes are suggestive: 63000 × 80000 = 5,040,000,000 cells, well beyond the int32 maximum of 2,147,483,647. Interestingly, the other example (from the 3 m data) is big, but does fit.

Searching the pyflwdir project, I get 93 results in 18 files for `int32`. If you're feeling lucky, you could try running a search and replace, changing all of them to int64. Make sure to also update the uint32's, since 63000 by 80000 outsizes unsigned 32-bit integers as well. I get no hits for

Worth noting that an unjitted version may give this error, since NumPy seems to warn for overflow:

```python
In [9]: a = np.int32(np.iinfo(np.int32).max)

In [10]: a
Out[10]: 2147483647

In [11]: a + 1
<ipython-input-11-ca42ed42e993>:1: RuntimeWarning: overflow encountered in scalar add
  a + 1
Out[11]: -2147483648
```

Unfortunately, it only seems to check scalar types:

```python
In [15]: b = np.full(1, np.iinfo(np.int32).max)

In [16]: b
Out[16]: array([2147483647])

In [17]: b + 1
Out[17]: array([-2147483648])
```
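To make the size argument concrete, here is a quick check of whether every linear cell index of the raster from the title fits a given integer dtype (`cells_fit` is an ad hoc helper for this comment, not part of pyflwdir):

```python
import numpy as np

def cells_fit(nrows, ncols, dtype):
    # True if the largest linear cell index of an (nrows, ncols) raster
    # is representable in the given integer dtype
    return nrows * ncols - 1 <= np.iinfo(dtype).max

print(63000 * 80000)                        # 5040000000 cells in the failing raster
print(cells_fit(63000, 80000, np.int32))    # False
print(cells_fit(63000, 80000, np.uint32))   # False: exceeds 4294967295 too
print(cells_fit(63000, 80000, np.int64))    # True
```

So both signed and unsigned 32-bit indices are too small for the failing raster, while 64-bit indices are comfortably large enough.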
I don't think this is related to an int32 overflow; based on the data size a dtype is assigned, see also: https://github.com/Deltares/pyflwdir/blob/main/pyflwdir/pyflwdir.py#L167. Anyway, good to check if this is indeed the case in the
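For illustration, size-dependent dtype selection of the kind the linked line performs might look like the sketch below. This is an assumption for the sake of the discussion, not pyflwdir's actual code:

```python
import numpy as np

def index_dtype(n_cells):
    # hypothetical sketch: pick an unsigned index dtype wide enough for
    # every linear cell index (see pyflwdir/pyflwdir.py#L167 for the
    # real logic, which this does not reproduce)
    return np.uint32 if n_cells < np.iinfo(np.uint32).max else np.uint64

print(index_dtype(1000 * 1000))     # small raster: uint32 suffices
print(index_dtype(63000 * 80000))   # the failing raster: needs uint64
```

If the selection works like this, the 63000 × 80000 raster would already get 64-bit indices, which supports the maintainer's point that a plain int32 overflow is unlikely to be the cause.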
When enabling Numba bounds checking and inserting print statements, I was able to pinpoint the offending function: upstream_count seems to be the one causing it.
For reference, when
Further investigation of the dtype for both
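Conceptually, an upstream-cell count dereferences the downstream index of every cell, so any corrupted, overflowed, or mis-typed index surfaces exactly there — as an IndexError with bounds checking on, or potentially a segfault with it off. A toy, unjitted illustration (not pyflwdir's actual `upstream_count`):

```python
import numpy as np

def upstream_count(idxs_ds, mv):
    # Toy illustration: for each cell, count how many cells drain
    # directly into it. idxs_ds[i] is the linear index of cell i's
    # downstream neighbour; mv marks missing values, and a pit is a
    # cell that drains to itself.
    counts = np.zeros(idxs_ds.size, dtype=np.int64)
    for i in range(idxs_ds.size):
        ds = idxs_ds[i]
        if ds != mv and ds != i:
            # a garbage downstream index here would index out of bounds
            counts[ds] += 1
    return counts

print(upstream_count(np.array([2, 2, 2]), mv=-1))  # [0 0 2]
```

In the three-cell example, cells 0 and 1 both drain into cell 2 (a pit), which therefore has an upstream count of 2.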
…f large arrays. Ref Deltares#46
pyFlwDir version checks
Reproducible Example
Current behaviour
In trying to accumulate flow (accuflux) on larger rasters generated from 1 m LiDAR data, segmentation faults are occurring.
The specifics:
- 1 m LiDAR flow direction file & headwaters file (same size raster)
  - flow_direction_filename is 253M
  - head_waters_filename is 66M
- Rasters generated from 3 m LiDAR data will not segfault, and process successfully.
  - flow_direction_filename is 107M
  - head_waters_filename is 7.4M
When the Python script is called from a shell script, an exit status of 139 is observed (139 = 128 + 11, i.e. the process was terminated by SIGSEGV). Further debugging:
Naively, the source code was modified, hardcoding `method="sort"` in the call to order_cells. This did not work. As seen above, line 215 in order_cells calls core.idxs_seq, which appears to be the root of the problem. No further investigation has been made past this point.

Desired behaviour
Ideally, larger rasters would process without segmentation faults. If not, the exception could potentially be handled a little more elegantly from Python, with a message stating that the raster is too big to process, or....

Another option might entail providing documentation/examples to users on how to split larger rasters into chunks, and then providing the tool/utility to join/concatenate 'blocked' or 'chunked' rasters back into a single `pyflwdir.FlwdirRaster` object once processing (whether it be `accuflux` or other `FlwdirRaster` methods) is finished.

Additional context
Memory usage was tracked, and it was observed that less than 1/3 of the available RAM was in use when the segmentation fault occurred.
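On POSIX systems, one way to record this kind of measurement from inside the process itself is the standard library's `resource` module, for example:

```python
import resource

# peak resident set size of the current process so far
# (Linux reports KiB, macOS reports bytes)
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS:", peak)
```

Logging this just before the call that crashes can help confirm whether the process was genuinely far below the machine's capacity at the time of the fault.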