
Error while importing sparkdl in google colab #226

Open
jai-dewani opened this issue Apr 30, 2020 · 3 comments

@jai-dewani

Here is the error traceback I get while importing sparkdl:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-3-4a9be7b8a3d0> in <module>()
----> 1 import sparkdl

1 frames
/usr/local/lib/python3.6/dist-packages/sparkdl/image/imageIO.py in <module>()
     23 
     24 # pyspark
---> 25 from pyspark import Row
     26 from pyspark import SparkContext
     27 from pyspark.sql.types import (BinaryType, IntegerType, StringType, StructField, StructType)

ModuleNotFoundError: No module named 'pyspark'

sparkdl version -> 0.2.2

@scook12

scook12 commented May 7, 2020

Hey @jai-dewani, this is expected behavior. Google Colab's environment doesn't ship with PySpark or Spark's other dependencies, hence the ModuleNotFoundError. You'll need to install those dependencies first.

This repo (https://github.com/asifahmed90/pyspark-ML-in-Colab) has an example of that, but it's a bit dated, so you might ask @asifahmed90 if you run into any issues. Good luck!
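For reference, recent PySpark releases are also published on PyPI with the Spark jars bundled, so a minimal sketch of the install step (assuming a pip-installable PySpark works with your sparkdl version; prefix each line with `!` when running in a Colab cell) could look like:

```shell
# Install PySpark from PyPI; the wheel bundles a local Spark runtime,
# so no separate Spark download is needed for local-mode use.
pip install -q pyspark

# Confirm the module is now importable before trying sparkdl again.
python -c "import pyspark; print(pyspark.__version__)"
```

Colab already includes a JDK, but if Spark complains about Java you may still need the `apt-get install openjdk-8-jdk-headless` step shown later in this thread.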

@jai-dewani
Author

jai-dewani commented May 8, 2020

Actually, I did all the necessary steps from the start, yet I still end up with this problem.
Here is the link to my Colab notebook: https://colab.research.google.com/drive/1nYq-rv6MT78UaiQPcSaFT-PHpsgVBe7R?usp=sharing

While running the notebook, just run the first two subsections and you will end up with the same result. I have looked hard for any minor mistake I could be making or something I missed, but I can't seem to find anything :/

Edit: A similar issue has been posted with the same problem
#209 AttributeError: module 'sparkdl' has no attribute 'graph'

@SaiNikhileshReddy

@jai-dewani, this setup worked for me:

# Install Java 8 and download a Spark distribution into the Colab VM
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
!tar xf spark-3.1.1-bin-hadoop3.2.tgz
!pip install -q findspark

# Point Spark at the installed Java and the unpacked distribution
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.1.1-bin-hadoop3.2"

# findspark puts SPARK_HOME's Python bindings on sys.path so pyspark imports work
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()

I arrived at that solution by looking at the Spark distribution page. You can do the same by checking https://downloads.apache.org/spark/ for the latest Spark and Hadoop versions, e.g. spark-X.X.X/spark-X.X.X-bin-hadoopX.X.tgz, and changing the filenames in the code above accordingly.
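One way to make that version substitution less error-prone (a sketch, assuming the archive naming scheme stays `spark-X.X.X-bin-hadoopX.X.tgz`; drop the `!` prefixes from the cell above if adapting this for a plain shell) is to factor the versions into variables:

```shell
# Sketch: only these two lines need editing when a newer release
# replaces this one on downloads.apache.org.
SPARK_VERSION=3.1.1
HADOOP_VERSION=3.2
SPARK_PKG="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"

wget -q "https://downloads.apache.org/spark/spark-${SPARK_VERSION}/${SPARK_PKG}.tgz"
tar xf "${SPARK_PKG}.tgz"
export SPARK_HOME="/content/${SPARK_PKG}"
```

Note that downloads.apache.org only hosts current releases; older versions move to https://archive.apache.org/dist/spark/, which is why pinned version numbers in notebooks eventually start failing with 404s.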
