Multi-task processing creates the same tmp file multiple times and processes the same raster multiple times #70

Open
justRishi opened this issue Jan 18, 2022 · 0 comments

justRishi commented Jan 18, 2022

Problem

If RASTER_USE_CELERY = True and RASTER_PARSE_SINGLE_TASK is False or not set, then
a temp file is created multiple times in def open_raster_file (class RasterLayerParser, parser.py).
Also, when the raster is not already in the right projection, the reprojection is done multiple times.
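
A minimal illustration of why each parallel task ends up with its own copy: tempfile.mkdtemp returns a new, unique directory on every call, even when the same parent directory is passed. The loop below only stands in for four parallel tasks; it is not the parser code itself.

```python
import os
import tempfile

# Stand-in for the raster work directory (a throwaway location for this demo).
raster_workdir = tempfile.mkdtemp()

# Each simulated "task" asks for a tmp dir under the same parent.
task_dirs = [tempfile.mkdtemp(dir=raster_workdir) for _ in range(4)]

# Every call produced a different directory, so a file copied into
# "its" tmp dir by one task is invisible to the other three.
unique_dirs = set(task_dirs)
print(len(unique_dirs), "distinct tmp dirs for 4 tasks")
```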

Why this is a problem

Big raster files are copied 4 times in my case, processed by GDAL 4 times, and sometimes (when not in the right projection) reprojected 4 times.

How tested

By adding self.log calls to print out each tmp file creation, which resulted in:
(image: screenshot of the log output)

How to mitigate

Set RASTER_PARSE_SINGLE_TASK = True in settings. This means the raster file will not be processed concurrently.
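
For reference, the mitigation would look roughly like this in a Django settings module. Both setting names come from this issue; the file name settings.py is an assumption about the project layout.

```python
# settings.py (assumed location of the Django settings)

# Keep using Celery for raster parsing.
RASTER_USE_CELERY = True

# Parse each raster in one task, so the file is copied (and, if needed,
# reprojected) only once. Trade-off: no parallel processing per raster.
RASTER_PARSE_SINGLE_TASK = True
```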

Possible solution to process in parallel without duplicating work

  1. Make sure that only 1 tmp folder is created:
    so this line in parser.py should change: self.tmpdir = tempfile.mkdtemp(dir=raster_workdir) (as the result is always unique)
  2. self.dataset in parser.py (in class RasterLayerParser) should be shared by all parallel tasks for the same raster file
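
Point 1 could be sketched along these lines, using only the standard library. The helpers shared_workdir and copy_once are hypothetical names, not part of django-raster, and the sketch assumes all workers see the same filesystem: the tmp dir is derived from the raster layer id instead of mkdtemp, and the file is copied only when no finished copy exists yet.

```python
import os
import shutil
import tempfile


def shared_workdir(base_dir, rasterlayer_id):
    """Return one deterministic directory per raster layer (hypothetical helper)."""
    path = os.path.join(base_dir, "rasterlayer-{}".format(rasterlayer_id))
    os.makedirs(path, exist_ok=True)  # safe if several tasks call it at once
    return path


def copy_once(source_path, workdir):
    """Copy source_path into workdir unless a finished copy is already there.

    Returns (target_path, copied), where copied tells whether this call
    did the copy.
    """
    target = os.path.join(workdir, os.path.basename(source_path))
    if os.path.exists(target):
        return target, False
    # Copy under a unique temporary name, then rename atomically, so
    # other tasks never see a half-written file at the final path.
    fd, partial = tempfile.mkstemp(dir=workdir, suffix=".partial")
    os.close(fd)
    shutil.copyfile(source_path, partial)
    os.replace(partial, target)
    return target, True
```

Two tasks that start at exactly the same moment can still both copy the file; the atomic rename only guarantees that the result is never corrupt. Avoiding that last duplicate, and sharing the opened dataset as in point 2, would need a lock or a dedicated preparation task that runs before the parallel ones.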