found issue with big 10GB raster file and solved it in a fork #65

Open
justRishi opened this issue Sep 8, 2021 · 0 comments
justRishi commented Sep 8, 2021

Hi, I made some changes here:
https://github.com/justRishi/django-raster/tree/improve-working-with-big-raster-files-needs-lot-of-ram

I am posting this as an issue because I am not sure whether you would like a pull request.

Description:

Problem statement

Issues with big raster files on AWS

  • /tmp storage fills up against the 20GB Docker image limit on AWS
  • it takes 7 to 8 hours to process a 10GB Sentinel 2 tif

The query written to the db in parser.py is too big

In process_quadrant, bulk_create produces a query that is too big for postgres to handle: the query string is simply too large, and postgres fails with an out-of-memory error.

Raster layers cannot be removed from the django admin

Changes

  • GDAL vsimem for creating tiles in memory (and not in tmp files) in parser.py:
    dest_file_name = os.path.join('/vsimem/', '{}.tif'.format(uuid.uuid4()))
  • added a second parameter to .bulk_create, with a default of 50, so that the query written to the db is not too big. This 2nd parameter (write the bulk in batches of that size) is new since Django (X?).
  • removed the following from admin.py:
            def has_delete_permission(self, request, obj=None):
                return False
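
To illustrate the vsimem change above, here is a minimal sketch of how a tile path in GDAL's in-memory filesystem can be built and later released. The helper names vsimem_tile_path and release_vsimem_file are hypothetical and do not come from the fork; the cleanup through gdal.Unlink is my assumption about how the memory could be freed, not a description of what parser.py does.

```python
import os
import uuid


def vsimem_tile_path(extension="tif"):
    """Return a unique path inside GDAL's in-memory /vsimem/ filesystem."""
    return os.path.join("/vsimem/", "{}.{}".format(uuid.uuid4(), extension))


def release_vsimem_file(path):
    """Free the memory held by a /vsimem/ file once the tile is stored.

    Assumption: the osgeo GDAL bindings are installed. The import is lazy
    so that the path helper above also works without them.
    """
    from osgeo import gdal

    gdal.Unlink(path)


# Same shape as the dest_file_name line quoted in the list above.
dest_file_name = vsimem_tile_path()
print(dest_file_name)
```

Files under /vsimem/ live in RAM until they are unlinked, so releasing each tile after it has been written to the db might help with the memory pressure described in the drawbacks further down.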

Solves

  • processing time reduced to 1.5 hours for 10GB S2 raster files (when using 16GB RAM and 2 CPUs)
  • no more "query string buffer is too big" errors from postgres

Drawbacks of using vsimem:

  • vsimem needs a lot more memory to process files; when there is not enough RAM, celery crashes
  • vsimem only seems to work well with the all_in_one parameter set (RASTER_PARSE_SINGLE_TASK = True)

Drawback of using the max batch-size parameter in bulk_create:

It will not work with old Django versions.
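
If support for old Django versions matters, one possible workaround is to split the objects into batches by hand and call bulk_create once per batch, which should have roughly the same effect on query size. This is only a sketch and not what the fork does: chunked, bulk_create_in_batches and RecordingManager are hypothetical names, and RecordingManager is a stand-in for a real model manager so the example runs without Django.

```python
def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_create_in_batches(manager, objs, batch_size=50):
    """Call manager.bulk_create once per batch; return the number of calls."""
    calls = 0
    for batch in chunked(list(objs), batch_size):
        # Each call sends one query that covers at most batch_size objects.
        manager.bulk_create(batch)
        calls += 1
    return calls


class RecordingManager:
    """Hypothetical stand-in for a Django model manager (demo only)."""

    def __init__(self):
        self.batch_sizes = []

    def bulk_create(self, objs):
        # Record the batch size instead of writing to a database.
        self.batch_sizes.append(len(objs))


manager = RecordingManager()
calls = bulk_create_in_batches(manager, range(120), batch_size=50)
print(calls, manager.batch_sizes)
```

With 120 objects and a batch size of 50, the stand-in manager receives three calls of 50, 50 and 20 objects. In a real project the manager would be something like a model's objects attribute.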
