Path for Spark checkpoints #2

Open
YathishK opened this issue Sep 17, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@YathishK

When running in YARN mode, Spark logs the following warning:

WARN SparkContext: Spark is not running in local mode, therefore the checkpoint directory must not be on the local filesystem. Directory '/tmp/spark_checkpoint/' appears to be on the local filesystem.

@ngmarchant
Member

ngmarchant commented Sep 17, 2019

It looks like you're using one of the example config files to submit a job with spark-submit. The examples assume you're running Spark locally, so the key checkpointPath is set to /tmp/spark_checkpoint/. If you're running Spark in cluster mode, you should instead set checkpointPath to a location on HDFS, for example hdfs:///my-project-name/checkpoints/.

You should also ensure that the output (MCMC samples, saved state, etc.) is saved to HDFS when running in cluster mode. To do this, you'll need to change the outputPath setting to an HDFS URI.
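As a rough illustration of the advice above, here is a minimal Python sketch that flags path settings which would still resolve to the local filesystem when Spark is not in local mode. It only approximates the check behind the warning: it assumes a path counts as local when it has no URI scheme or uses `file` or `local`. The function names and the example `outputPath` value are hypothetical and not part of this project.

```python
from urllib.parse import urlparse

# Assumption: these schemes (or no scheme at all) mean "local filesystem".
LOCAL_SCHEMES = {"", "file", "local"}


def is_local_master(master: str) -> bool:
    """True for masters such as 'local', 'local[4]' or 'local[*]'."""
    return master == "local" or master.startswith("local[")


def is_local_path(path: str) -> bool:
    """True if the path has no URI scheme or an explicitly local one."""
    return urlparse(path).scheme in LOCAL_SCHEMES


def local_path_settings(master: str, settings: dict) -> list:
    """Names of settings that point at the local filesystem even though
    Spark is not running in local mode (empty list in local mode)."""
    if is_local_master(master):
        return []
    return [name for name, path in settings.items() if is_local_path(path)]


# Hypothetical settings: checkpointPath is still the example default,
# outputPath has already been moved to HDFS.
settings = {
    "checkpointPath": "/tmp/spark_checkpoint/",
    "outputPath": "hdfs:///my-project-name/output/",
}
print(local_path_settings("yarn", settings))
```

With the YARN master, only `checkpointPath` is reported, which matches the setting named in the warning.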

Incidentally, we should probably make checkpointPath an optional setting so that it falls back to the default if not specified.
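One way the optional setting could behave, sketched in Python rather than in the project's own code. The `resolve_checkpoint_path` helper is hypothetical, and the default used here is simply the value from the example config files, which is an assumption about what "the default" would be.

```python
# Assumption: the fallback is the path used in the example config files.
DEFAULT_CHECKPOINT_PATH = "/tmp/spark_checkpoint/"


def resolve_checkpoint_path(config: dict,
                            default: str = DEFAULT_CHECKPOINT_PATH) -> str:
    """Return config['checkpointPath'] if it is set and non-blank,
    otherwise fall back to the default."""
    value = config.get("checkpointPath")
    if value is None or not value.strip():
        return default
    return value
```

A local default like this would still trigger the warning on a cluster, so a fallback does not remove the need to set an HDFS location there.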

@ngmarchant added the enhancement label Sep 20, 2019
@ngmarchant changed the title from "Directory '/tmp/spark_checkpoint/'" to "Path for Spark checkpoints" Sep 20, 2019
2 participants