There are no nodata-aware reduction functions. Maybe these are supported by masked arrays in numpy? But really the bigger problem is not so much the representation and handling of missing values; the bigger problem is that integer math can be hard to reason about and implement correctly (without silent overflows). So I prefer to convert to float, then use the `nan{mean,sum,...}` family of functions, followed by conversion back to integer. `to_float` is also useful for plotting, as NaNs are automatically transparent, whereas nodata values are not.
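To make that concrete, here is a minimal sketch of the float round-trip pattern, assuming an integer `xarray.DataArray` with a known nodata value, and using xarray's `skipna` reductions in place of the raw numpy `nan*` functions. `to_float` and `from_float` are the helpers from the linked `_masking.py` module, though the exact signatures here are my assumption:

```python
import xarray as xr
from odc.algo import to_float, from_float  # helpers from odc-tools' _masking.py

def int_mean_over_time(xx: xr.DataArray, nodata: int) -> xr.DataArray:
    """Nodata-aware mean that round-trips through float32 (sketch)."""
    xx_f = to_float(xx, dtype="float32")     # nodata pixels become NaN
    mm = xx_f.mean(dim="time", skipna=True)  # NaN-aware reduction
    # Cast back to the original integer dtype, mapping NaN -> nodata
    return from_float(mm, dtype=xx.dtype, nodata=nodata)
```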
A step-by-step example of using some of these tools is available in the Cloud and Pixel Quality Masking notebook in deafrica-sandbox-notebooks, including the `mask_cleanup` function, which is pretty handy. It doesn't explicitly reference memory optimisation, but it might provide some boilerplate code for starting this notebook.
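For reference, `mask_cleanup` usage looks roughly like the sketch below; the `(operation, radius)` filter pairs are illustrative placeholders rather than recommended values:

```python
from odc.algo import mask_cleanup

# `cloud_mask` is a boolean DataArray where True marks cloudy/bad pixels
cleaned = mask_cleanup(
    cloud_mask,
    mask_filters=[
        ("opening", 2),   # drop small false-positive speckles
        ("dilation", 5),  # buffer remaining clouds to catch their fringes
    ],
)
```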
We should create a `Frequently_used_code` notebook that documents some useful techniques for optimising memory use when analysing DEA data. @Kirill888 has lots of useful tools for doing that here: https://github.com/opendatacube/odc-tools/blob/develop/libs/algo/odc/algo/_masking.py
E.g. you can use something like `fmask_to_bool` to produce a boolean mask from fmask flags: https://github.com/opendatacube/odc-tools/blob/develop/libs/algo/odc/algo/_masking.py#L517
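Something like the following (a sketch: the category names depend on the product's `flags_definition`, so treat `("cloud", "shadow")` as placeholders):

```python
from odc.algo import fmask_to_bool

# Boolean mask that is True wherever fmask flags cloud or cloud shadow
bad = fmask_to_bool(ds.fmask, categories=("cloud", "shadow"))
```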
Then pass that to `erase_bad` to set those "bad" values to the data's nodata value (still in the original data type): https://github.com/opendatacube/odc-tools/blob/develop/libs/algo/odc/algo/_masking.py#L97
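E.g. (a sketch; `ds` is the integer dataset loaded from the datacube and `bad` is the mask from the previous step):

```python
from odc.algo import erase_bad

# Write each band's nodata value into the masked pixels, keeping the
# original integer dtype (and dask laziness, if the inputs are dask-backed)
ds_clean = erase_bad(ds, bad)
```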
Then finally convert it to floats at the end using `to_float` (this is the first time the nodata values will be set to NaN): https://github.com/opendatacube/odc-tools/blob/develop/libs/algo/odc/algo/_masking.py#L204
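Roughly like this (a sketch continuing from the previous step; the dimension name is a placeholder):

```python
from odc.algo import to_float

# Only now pay the memory cost of float32; nodata becomes NaN here,
# which also means masked pixels plot as transparent
ds_f = to_float(ds_clean, dtype="float32")
result = ds_f.mean(dim="time", skipna=True).compute()  # lazy until .compute() if dask-backed
```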
The idea behind those funcs is to keep things as dask arrays and int datatypes until the last possible moment so that memory is kept better under control. I'm not entirely sure though whether there are options there for computing things like means/medians etc. on the data in its original data type (taking into account the custom nodata values), but this would also be good to include, as these are very common workflows.
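One possible avenue for that (a plain-numpy sketch, not something from `odc.algo`): numpy masked arrays can reduce over integer data while respecting a custom nodata value, although the reduction result is still float and dask's masked-array support is only partial:

```python
import numpy as np

nodata = -999
arr = np.array([[10, 20, nodata],
                [30, nodata, 50]], dtype=np.int16)

masked = np.ma.masked_equal(arr, nodata)  # hide nodata from reductions
print(masked.mean())         # 27.5 -- nodata values ignored
print(np.ma.median(masked))  # 25.0
```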