Use sparkdl on jupyter notebook, without web connection #214

Open
christophelebrun opened this issue Dec 18, 2019 · 1 comment
Comments

@christophelebrun

Hello,

I am running a Jupyter notebook on an EMR instance without access to the web.
I have downloaded the sparkdl .jar file to an S3 bucket.

I tried:

from pyspark.sql import SparkSession

# Create the SparkSession, pointing spark.jars at the sparkdl JAR stored in S3
spark = (
    SparkSession.builder
    .config('spark.jars', "s3://my_bucket/libs/spark-deep-learning-1.5.0-spark2.4-s_2.11.jar")
    .getOrCreate()
)

This cell runs without error.

But the import then fails:

from sparkdl import DeepImageFeaturizer
ModuleNotFoundError: No module named 'sparkdl'

Any idea how to fix that?

@spark-water

Use spark.jars.packages instead of spark.jars. Also, I had no success using a local package (in your case, the one you put in an S3 bucket) because its parent dependencies are missing. You should pull the package from the Databricks Spark Packages site. I know this has limitations, but so far I have not been able to find another solution.
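
To illustrate the suggestion above, here is a minimal sketch of setting spark.jars.packages from a notebook. The helper names (sparkdl_coordinate, build_session) are hypothetical, and the coordinate is only inferred from the JAR file name in the question (databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11), so check it against the Spark Packages listing before relying on it. Note that Spark resolves this setting by downloading the package and its dependencies from a repository, so it probably does not help on a cluster with no outbound access at all unless an internal mirror is reachable.

```python
def sparkdl_coordinate(version="1.5.0", spark="2.4", scala="2.11"):
    """Build a group:artifact:version coordinate for sparkdl.

    Assumption: the layout mirrors the JAR file name from the question.
    """
    return f"databricks:spark-deep-learning:{version}-spark{spark}-s_{scala}"


def build_session(coordinate, repository=None):
    """Create a SparkSession that asks Spark to resolve `coordinate` at startup."""
    # Imported lazily so sparkdl_coordinate() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.config("spark.jars.packages", coordinate)
    if repository is not None:
        # Optional: an extra repository URL to search (for example, an internal mirror).
        builder = builder.config("spark.jars.repositories", repository)
    return builder.getOrCreate()


coordinate = sparkdl_coordinate()
```

In a notebook this would be used as spark = build_session(coordinate), followed by the sparkdl import. The setting only takes effect if it is applied before the Spark context starts; if a session already exists, getOrCreate() returns it and the package is not loaded.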
