Automate Sandbox-specific steps for setting up local Dask cluster #32

Open
robbibt opened this issue Feb 26, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@robbibt
Collaborator

robbibt commented Feb 26, 2020

A common science-team workflow on the Sandbox is to run several set-up steps before starting a local Dask cluster; these steps do not have to be run on the NCI (see GeoscienceAustralia/dea-notebooks#528).

Some of these steps, including dask.config.set, configure_s3_access and perhaps closing previous clients, seem like they could be run behind the scenes in the Sandbox, so users would not need to run them every time a notebook is launched there.

Making Dask easier to use in this way could greatly improve performance for a lot of the code on the Sandbox (e.g. speed improvements of 200% for several of our notebooks), which in turn would improve the user experience during demonstrations.

import os

import dask
from datacube.utils.dask import start_local_dask
from datacube.utils.rio import configure_s3_access
from IPython.display import display

# close previous client if any (needed on both the Sandbox and the NCI)
client = locals().get('client', None)
if client is not None:
    client.close()
    del client

# AWS_ACCESS_KEY_ID in the environment is used as the marker for the Sandbox
on_sandbox = 'AWS_ACCESS_KEY_ID' in os.environ

if on_sandbox:
    # configure dashboard link to go over proxy
    dask.config.set({
        "distributed.dashboard.link":
            os.environ.get('JUPYTERHUB_SERVICE_PREFIX', '/') + "proxy/{port}/status"
    })

# start up a local cluster
client = start_local_dask(mem_safety_margin='3Gb')

if on_sandbox:
    # configure GDAL for S3 access
    configure_s3_access(aws_unsigned=True, client=client)

# show the dask cluster settings
display(client)

@Kirill888 @tom-butler @alexgleith

@robbibt robbibt changed the title from "Automate Sandbox-specific steps to set up local Dask cluster" to "Automate Sandbox-specific steps for setting up local Dask cluster" Feb 26, 2020
@robbibt robbibt added the enhancement New feature or request label Feb 26, 2020
@robbibt
Collaborator Author

robbibt commented Mar 5, 2020

At this stage I think we're going to wrap the above code in a dea-notebooks function, so that we can at least use Dask in notebooks more easily without scaring users off with the code block above. @Kirill888 @tom-butler @alexgleith, can you let @cbur24 or me know if there's any progress on getting some of those Sandbox-specific steps automated in the Sandbox?
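
For reference, a rough sketch of what such a wrapper might look like. The names create_local_dask_cluster and dashboard_link are placeholders invented for this sketch, not an existing dea-notebooks API, and the datacube import paths are assumed to match the ones used for the code block above. Closing a previous client is left out, because a function cannot see the notebook's own variables; the caller would still do that.

```python
import os


def dashboard_link(environ):
    """Build the proxied dashboard URL template from the JupyterHub prefix."""
    prefix = environ.get("JUPYTERHUB_SERVICE_PREFIX", "/")
    # {port} is left as a placeholder for Dask to fill in later
    return prefix + "proxy/{port}/status"


def create_local_dask_cluster(mem_safety_margin="3Gb", environ=None):
    """Hypothetical wrapper: start a local Dask cluster, plus Sandbox extras."""
    # Imported lazily so dashboard_link() works without these packages
    import dask
    from datacube.utils.dask import start_local_dask
    from datacube.utils.rio import configure_s3_access

    environ = os.environ if environ is None else environ
    # Same Sandbox check as the original snippet
    on_sandbox = "AWS_ACCESS_KEY_ID" in environ

    if on_sandbox:
        # Route the dashboard link through the JupyterHub proxy
        dask.config.set({"distributed.dashboard.link": dashboard_link(environ)})

    client = start_local_dask(mem_safety_margin=mem_safety_margin)

    if on_sandbox:
        # Configure GDAL for unsigned S3 access
        configure_s3_access(aws_unsigned=True, client=client)

    return client
```

A notebook would then only need something like client = create_local_dask_cluster() followed by display(client).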

@tom-butler
Contributor

Most of this should probably be done in the Dask config file instead. A notebook could then just include a simple client connection line (similar to datacube).
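
A minimal sketch of the config-file approach, under some assumptions: Dask picks up YAML files from its config directory (e.g. ~/.config/dask/), and distributed substitutes environment variables such as JUPYTERHUB_SERVICE_PREFIX, as well as {port}, when it formats the dashboard link. The file name sandbox.yaml and the function name are made up for illustration. This only covers the dashboard link; configure_s3_access sets up GDAL rather than Dask, so it would need to be handled some other way.

```python
from pathlib import Path

# Assumption: distributed fills in {port} and environment variables
# (here JUPYTERHUB_SERVICE_PREFIX) when it formats this link.
SANDBOX_DASK_CONFIG = """\
distributed:
  dashboard:
    link: "{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status"
"""


def write_sandbox_dask_config(config_dir="~/.config/dask"):
    """Write the Sandbox defaults to a YAML file (hypothetical file name)."""
    config_dir = Path(config_dir).expanduser()
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "sandbox.yaml"
    path.write_text(SANDBOX_DASK_CONFIG)
    return path
```

If a file like this were shipped in the Sandbox image, the dask.config.set call could be dropped from notebooks, leaving only the line that starts or connects to the cluster.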

@tom-butler tom-butler reopened this Apr 1, 2020
@robbibt
Collaborator Author

robbibt commented Apr 2, 2020

I think that sounds great 👍
