Using sparkdl in a Jupyter notebook without a web connection #214

christophelebrun · 2019-12-18T10:43:00Z

Hello,

I am running a Jupyter notebook on an EMR instance that has no access to the web.
I have downloaded the sparkdl .jar file to an S3 bucket.

I tried:

    from pyspark.sql import SparkSession

    # Creating SparkSession
    spark = (SparkSession
             .builder
             .config('spark.jars', "s3://my_bucket/libs/spark-deep-learning-1.5.0-spark2.4-s_2.11.jar")
             .getOrCreate())

This cell runs without error.

But `from sparkdl import DeepImageFeaturizer` then fails with:

    ModuleNotFoundError: No module named 'sparkdl'

Any idea how to fix this?
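A likely reason for the `ModuleNotFoundError`: `spark.jars` only puts the jar on the driver and executor JVM classpaths, so the notebook's Python interpreter never looks inside it for Python code. A jar is a zip archive, and Python can import packages from any zip file listed on `sys.path`. The sketch below demonstrates that mechanism with a hypothetical stand-in jar and package (`fake-package.jar`, `fakepkg`); it assumes the real sparkdl jar bundles its `sparkdl` Python package at the archive root, which is worth checking with `unzip -l` before relying on it. In PySpark, `SparkContext.addPyFile` accepts `.jar` files (alongside `.zip` and `.egg`), ships them to executors, and adds them to the driver's `sys.path`. This only covers the Python side; the JVM classes the jar depends on still have to be on the classpath.

```python
import importlib
import os
import sys
import tempfile
import zipfile
from types import ModuleType


def build_fake_jar(directory: str) -> str:
    """Write a stand-in jar that bundles a tiny Python package at its root.

    Hypothetical: 'fakepkg' only imitates a jar that ships Python sources.
    """
    jar_path = os.path.join(directory, "fake-package.jar")
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.writestr("fakepkg/__init__.py", "VERSION = '0.0.1'\n")
    return jar_path


def import_from_jar(jar_path: str, module_name: str) -> ModuleType:
    """Put a jar on the local interpreter's sys.path and import a module from it.

    This works because a jar is a zip archive, and Python's zipimport
    machinery accepts any zip file listed on sys.path, whatever its extension.
    """
    if jar_path not in sys.path:
        sys.path.insert(0, jar_path)
    importlib.invalidate_caches()  # make sure finders notice the new path entry
    return importlib.import_module(module_name)


if __name__ == "__main__":
    jar = build_fake_jar(tempfile.mkdtemp())
    pkg = import_from_jar(jar, "fakepkg")
    print(pkg.VERSION)  # 0.0.1
```

For the real jar, the analogous step would be something like `spark.sparkContext.addPyFile(<path to the sparkdl jar>)` before the import.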


spark-water · 2020-02-10T18:14:52Z

Use `spark.jars.packages` instead of `spark.jars`. Also, I had no success using a local package (in your case, the jar you put in an S3 bucket) because its parent dependencies were missing. You should pull the package from the Databricks Spark Packages site instead. I know this has limitations, but so far I haven't been able to find a solution.
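A minimal sketch of the `spark.jars.packages` approach. It assumes the coordinate `databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11`, inferred from the jar name in the question. Unlike a bare jar path, a package coordinate lets Spark resolve the package's dependencies too, and as far as we know PySpark also puts jars resolved this way on the Python path. Without web access, `spark.jars.repositories` can point the resolver at an internal mirror; any mirror URL you pass is your own, and the helper names here are illustrative. These settings are read only when the JVM starts, so they have no effect if the notebook already has a running SparkSession.

```python
from typing import Dict, Optional

# Assumed coordinate, inferred from the jar name in the question.
SPARKDL_COORDINATE = "databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11"


def sparkdl_conf(coordinate: str = SPARKDL_COORDINATE,
                 repositories: Optional[str] = None) -> Dict[str, str]:
    """Build Spark settings that resolve sparkdl and its dependencies at startup."""
    conf = {"spark.jars.packages": coordinate}
    if repositories:
        # Comma-separated extra Maven repositories, e.g. an internal mirror when offline.
        conf["spark.jars.repositories"] = repositories
    return conf


def build_session(conf: Dict[str, str]):
    """Create a SparkSession with the given settings (requires pyspark)."""
    # Imported here so sparkdl_conf can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    # If a session already exists, getOrCreate() returns it and package
    # settings such as spark.jars.packages do not take effect.
    return builder.getOrCreate()


if __name__ == "__main__":
    print(sparkdl_conf())
```

In a fresh notebook kernel, `build_session(sparkdl_conf(repositories="<your mirror URL>"))` would then start Spark with the package resolved before `from sparkdl import DeepImageFeaturizer` is attempted.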
