
question: PySpark Processing Jobs in Local Mode? #19

Open
dcompgriff opened this issue Sep 1, 2022 · 1 comment

Comments

@dcompgriff

Hello. I was wondering whether there is a tutorial for, or current support for, 1) running a PySpark processing job locally and 2) doing so with a custom base Docker (EMR) image. I see a tutorial for Dask using a ScriptProcessor, and also some code for an SKLearn-based processor. My goal is to set up a local testing/dev environment that runs SageMaker Spark processor code. I'm guessing this is more complicated than the other use cases, since this processor is usually backed by an EMR cluster.

@eitansela
Contributor

eitansela commented Sep 5, 2022

Hi @dcompgriff. PySparkProcessor will not work in local mode. It uses a SageMaker-managed Docker image and has nothing to do with EMR.
You can build your own Spark Docker image, use ScriptProcessor with it, the same as in the Dask example, and run it locally.
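As a rough sketch of that approach with the SageMaker Python SDK: the image URI, role ARN, script name, and paths below are placeholders you would swap for your own.

```python
# Sketch: run a custom Spark Docker image locally via ScriptProcessor.
# The image URI, role ARN, script name, and paths are placeholders.
from sagemaker.local import LocalSession
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor

# LocalSession makes the SDK run the job in a local Docker container
# instead of launching SageMaker instances.
sagemaker_session = LocalSession()
sagemaker_session.config = {"local": {"local_code": True}}

processor = ScriptProcessor(
    image_uri="my-spark-processing:latest",  # your own Spark image, built locally
    command=["python3"],                     # how the container runs your script
    role="arn:aws:iam::111111111111:role/dummy-role",  # local mode accepts a dummy role
    instance_count=1,
    instance_type="local",                   # "local" selects local mode
    sagemaker_session=sagemaker_session,
)

processor.run(
    code="process.py",                       # your PySpark processing script
    inputs=[ProcessingInput(
        source="./input_data",
        destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(
        output_name="output",
        source="/opt/ml/processing/output")],
)
```

The image just needs Spark installed and `python3` on the PATH; the processor then starts the container with your local Docker daemon, the same way the Dask example does.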
