
Support Azure Blobs as application resources in Spark #663

Open · shtratos opened this issue on Sep 27, 2018 · 1 comment
@shtratos (Contributor) commented on Sep 27, 2018

Hello @jafreck @timotheeguerin

Right now in the AZTK Spark SDK, when aztk.spark.client.Client.submit() is called, it assumes that the ApplicationConfiguration contains paths to local files in its jars and files fields.
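
For context, the call today looks roughly like this (a simplified sketch, not our exact code; the `ApplicationConfiguration` fields and `submit()` parameters are written from memory of the AZTK version we use and may differ slightly between versions):

```python
from aztk.spark import Client, models

# Secrets elided here; in practice this comes from .aztk/secrets.yaml.
client = Client(models.SecretsConfiguration())

app = models.ApplicationConfiguration(
    name="my-spark-job",
    application="/local/path/job.jar",       # local file, uploaded by submit()
    main_class="com.example.Main",
    jars=["/local/path/dep.jar"],            # local paths, also uploaded
    files=["/local/path/job-config.json"],
)

client.submit(cluster_id="my-cluster", application=app)
```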

In our case the Spark job resources are already uploaded to Azure Blob Storage, so we want to avoid downloading them and uploading them again.

From what I can see, aztk.spark.client.Client.submit() calls generate_task, which uploads the files to blob storage, generates ResourceFiles for them, replaces the local paths in the application config with file names, and uploads the config as an application.yml file to blob storage.
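
Paraphrasing that flow (an illustrative sketch of the behaviour described above, not the actual AZTK code; `blob_client` is assumed to be the legacy `BlockBlobService` that AZTK uses, and newer `azure-batch` SDKs rename `blob_source` to `http_url`):

```python
import os

import yaml
from azure.batch.models import ResourceFile


def generate_task_sketch(blob_client, container, application):
    """Rough outline of what generate_task appears to do today."""
    resource_files = []
    local_paths = [application.application] + list(application.jars) + list(application.files)
    for local_path in local_paths:
        blob_name = os.path.basename(local_path)
        # 1. Upload each local file to the cluster's storage container.
        blob_client.create_blob_from_path(container, blob_name, local_path)
        # 2. Wrap the uploaded blob in a Batch ResourceFile (the real code adds a SAS token).
        url = blob_client.make_blob_url(container, blob_name)
        resource_files.append(ResourceFile(file_path=blob_name, blob_source=url))

    # 3. Rewrite the application config to use file names instead of local paths
    #    and upload it as application.yml.
    app_definition = {
        "name": application.name,
        "application": os.path.basename(application.application),
        "jars": [os.path.basename(j) for j in application.jars],
        "files": [os.path.basename(f) for f in application.files],
    }
    with open("application.yml", "w") as f:
        yaml.dump(app_definition, f)
    blob_client.create_blob_from_path(container, "application.yml", "application.yml")
    return resource_files
```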

I would like an option to provide resource_files directly to Client.submit() and thus skip the upload step entirely.

Right now we use a workaround where we essentially reimplement generate_task and generate the resource_files for our blobs ourselves. This seems brittle, as it is coupled to the AZTK SDK implementation and can break when AZTK changes in the future.
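
Concretely, the workaround builds the ResourceFiles for the blobs we already have, along the lines of the sketch below. The SAS/ResourceFile calls match the legacy `azure-storage-blob`/`azure-batch` SDKs that AZTK pins as far as I can tell, and the `resource_files=` keyword on `submit()` at the end is the hypothetical option this issue asks for; it does not exist today.

```python
from datetime import datetime, timedelta

from azure.batch.models import ResourceFile
from azure.storage.blob import BlobPermissions, BlockBlobService


def resource_file_for_existing_blob(blob_client: BlockBlobService,
                                    container: str, blob_name: str) -> ResourceFile:
    """Point a Batch ResourceFile at a blob that is already in storage (no upload)."""
    sas_token = blob_client.generate_blob_shared_access_signature(
        container, blob_name,
        permission=BlobPermissions.READ,
        expiry=datetime.utcnow() + timedelta(days=7),
    )
    url = blob_client.make_blob_url(container, blob_name, sas_token=sas_token)
    # Older azure-batch SDKs take blob_source=; newer ones renamed it to http_url=.
    return ResourceFile(file_path=blob_name, blob_source=url)


# What we would like to be able to write (hypothetical API, not in AZTK today):
# resource_files = [resource_file_for_existing_blob(blob_client, "jobs", name)
#                   for name in ("job.jar", "dep.jar", "job-config.json")]
# client.submit(cluster_id="my-cluster", application=app, resource_files=resource_files)
```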

@jafreck (Member) commented on Oct 1, 2018

I think this is a great feature. We should support both scenarios - local upload and referencing existing files in storage. Thanks for the feature request!

@jafreck added the feature label on Nov 2, 2018