WIP: estimate minimum memory requirement #557

Closed
ungarj wants to merge 4 commits into main from estimate_memory_usage

ungarj (Owner) commented Jul 12, 2023

Attempt to fix #556

The idea is to let mapchete estimate how much memory is needed to process one tile. Peak memory usage depends on the tile size, the band count, the array data type, and whatever the process function does to produce the array.

The first three items (tile size, band count and data type) can be determined by combining information from the output with the process pyramid.

To estimate what the process function itself needs, we need additional information from that function. One option would be to mark the function with a decorator, much like pytest marks:

import mapchete

# This decorator indicates that the array is
# materialized up to two times somewhere within
# the process function, so it needs twice the
# memory one would estimate from the output
# array alone.
@mapchete.mark.mem_usage(tile_array_multiplier=2)
def execute(mp):
    ...

import mapchete

# This decorator indicates that the array is
# converted to another dtype during processing,
# which requires more memory than the output
# array with its own dtype would.
@mapchete.mark.mem_usage(dtype_cast="float16")
def execute(mp):
    ...
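
One way such a marker could work is sketched below. This is not mapchete's API: `mem_usage` as a standalone function, the `_mem_usage` attribute and `estimate_tile_memory` are hypothetical names used only for illustration. The decorator merely attaches hints to the function; an estimator then combines them with band count, tile size and output dtype. Taking the larger of the output and intermediate item sizes is an assumption about how `dtype_cast` might be interpreted.

```python
import numpy as np


def mem_usage(tile_array_multiplier=1, dtype_cast=None):
    """Hypothetical marker: attach memory hints to a process function."""

    def decorator(func):
        # Store the hints on the function object; its behavior is unchanged.
        func._mem_usage = {
            "tile_array_multiplier": tile_array_multiplier,
            "dtype_cast": dtype_cast,
        }
        return func

    return decorator


def estimate_tile_memory(func, count, tile_size, dtype):
    """Estimate the bytes needed to process one tile (assumed model)."""
    hints = getattr(func, "_mem_usage", {})
    multiplier = hints.get("tile_array_multiplier", 1)
    cast = hints.get("dtype_cast")
    itemsize = np.dtype(dtype).itemsize
    if cast is not None:
        # Assumption: the wider of output dtype and intermediate dtype dominates.
        itemsize = max(itemsize, np.dtype(cast).itemsize)
    return multiplier * count * tile_size * tile_size * itemsize


@mem_usage(tile_array_multiplier=2, dtype_cast="float16")
def execute(mp):
    ...


# 2 copies * 3 bands * 256 * 256 pixels * 2 bytes (float16)
print(estimate_tile_memory(execute, count=3, tile_size=256, dtype=np.uint8))  # 786432
```

An unmarked function falls back to a multiplier of 1 and the output dtype, so existing process functions would keep working without changes.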

coveralls commented Jul 12, 2023

coverage: 99.983% (-0.02%) from 100.0% when pulling 4acaf63 on estimate_memory_usage into 86460fa on main.

ungarj (Owner, Author) commented Jul 20, 2023

The function seems to estimate the right size in this example:

from memory_profiler import profile
import numpy as np
import sys

def minimum_worker_memory_usage(count, tile_size, dtype):
    itemsize = np.dtype(dtype).itemsize
    mem_usage_bytes = count * tile_size * tile_size * itemsize
    return mem_usage_bytes

def print_mb(num_bytes):
    print(f"{num_bytes / 1024 / 1024:.2f} MB")


@profile
def herbert(count=3, tile_size=256, dtype=np.uint8):
    before = None
    arr = np.ones((count, tile_size, tile_size), dtype=dtype)
    return arr


if __name__ == '__main__':
    count = 3
    tile_size = 256 * 32
    dtype = np.uint8
    arr = herbert(count, tile_size, dtype)
    print_mb(sys.getsizeof(arr))
    print_mb(minimum_worker_memory_usage(count, tile_size, dtype))

Running it with memory_profiler gives:

$ python -m memory_profiler test.py 
Filename: test.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    14     51.3 MiB     51.3 MiB           1   @profile
    15                                         def herbert(count=3, tile_size=256, dtype=np.uint8):
    16     51.3 MiB      0.0 MiB           1       before = None
    17    243.2 MiB    191.9 MiB           1       arr = np.ones((count, tile_size, tile_size), dtype=dtype)
    18    243.2 MiB      0.0 MiB           1       return arr


192.00 MB
192.00 MB
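
The 192 MB figure can also be checked by hand: 3 bands × 8192 × 8192 pixels × 1 byte per `uint8` value is 201,326,592 bytes, i.e. 192 MiB, in line with the 191.9 MiB increment reported by the profiler. The sketch below repeats the comparison on a smaller array using `ndarray.nbytes`, which counts only the data buffer. `sys.getsizeof` additionally includes a small object header, so the two values printed above are equal only after rounding. The smaller tile size is an arbitrary choice to keep the example light.

```python
import sys

import numpy as np

count, tile_size, dtype = 3, 256, np.uint8
arr = np.ones((count, tile_size, tile_size), dtype=dtype)

# Same formula as minimum_worker_memory_usage() above.
expected = count * tile_size * tile_size * np.dtype(dtype).itemsize

print(expected)    # 196608
print(arr.nbytes)  # size of the data buffer only
# Header overhead added by sys.getsizeof; the exact value varies by NumPy version.
print(sys.getsizeof(arr) - arr.nbytes)
```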

ungarj closed this Dec 19, 2023

Successfully merging this pull request may close these issues.

estimate minimum memory usage