Replace Async AWS Code with Threads #522

Open
omad opened this issue Nov 7, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@omad
Member

omad commented Nov 7, 2022

Background

To get good performance from AWS S3, it's necessary to parallelise requests.

The odc-aio library provides functions used in the odc-tools CLI applications, and is implemented using Async Python and the aiobotocore library.

This has worked well for several years, providing good performance.
However, using async Python, and aiobotocore in particular, comes with several significant drawbacks.

  • Because aiobotocore works by altering the internals of boto3 (the Python library for accessing AWS), it is tightly coupled to the version of boto3 used. This greatly complicates building a Python environment, since boto3 has new releases almost every day, while aiobotocore releases only every few months.
  • moto is a library for mocking AWS services within boto3 for testing. It does not work with aiobotocore, nor is there a drop-in replacement, so most tests of the ODC Cloud tools rely on externally managed S3 buckets that allow anonymous access. This makes it impossible to run tests offline, and has recently proved unreliable as access to some of those buckets has changed.

Proposal

An alternative to asynchronous functions for parallelising access to cloud resources is to use old-fashioned threads. Good S3 performance only needs somewhere between 10 and 50 parallel requests, which threads can easily handle. When used correctly, the boto3 library is thread-safe.
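As a rough illustration of the threaded approach, here is a minimal sketch using `concurrent.futures.ThreadPoolExecutor`. It is not taken from odc-tools: the names `fetch_object` and `fetch_many`, the default of 32 workers, and the choice of one boto3 session and client per thread are all assumptions made for this example. The per-thread client is a conservative reading of "used correctly", not a confirmed requirement.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-thread storage, so each worker thread builds its own client.
_thread_local = threading.local()


def _s3_client():
    """Return an S3 client owned by the calling thread (created on first use)."""
    client = getattr(_thread_local, "client", None)
    if client is None:
        import boto3  # imported lazily so the helpers below can be tested without AWS

        # Assumption: one Session per thread, since boto3 sessions are not shared safely.
        client = boto3.session.Session().client("s3")
        _thread_local.client = client
    return client


def fetch_object(bucket, key):
    """Download a single S3 object and return its contents as bytes."""
    response = _s3_client().get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def fetch_many(bucket, keys, max_workers=32, fetch=fetch_object):
    """Fetch many objects in parallel; results come back in the same order as keys."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda key: fetch(bucket, key), keys))
```

Because the `fetch` argument can be swapped for any callable, the parallel logic can be exercised offline with a stand-in function, and the plain boto3 client underneath is the kind of code moto is designed to mock.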

I think work should be put into migrating away from odc-aio and using a threaded solution instead.

History

This was raised in #332 but never got to the top of the priority list.

@omad omad added the enhancement New feature or request label Nov 7, 2022
@alexgleith
Contributor

There are a few examples in this repo that use threads instead, and I think they work fast and fine... it's much simpler than async! For example: https://github.com/opendatacube/odc-tools/blob/develop/apps/dc_tools/odc/apps/dc_tools/esa_worldcover_to_dc.py#L185
