This project contains Apache Spark experiments that I implemented while learning Spark.

These experiments are based on:

  • Python 3.5
  • Apache Spark 2.X
  • pyspark
  • IPython / Jupyter

The experiments take the form of runnable Jupyter notebooks.
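
A first cell like the one below (a sketch, not taken from this repo's notebooks) can confirm that your environment matches the versions listed above:

    # Environment check (illustrative sketch, not part of the experiments)
    import sys
    import pyspark

    print(sys.version)          # should report Python 3.5.x
    print(pyspark.__version__)  # should report a 2.x Spark release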

This guide assumes you're on a *nix / macOS system for the shell commands.

Installation

You'll need the above dependencies to run the notebooks. You can install them yourself, but if you don't already have them set up, an easier option is the ready-made jupyter/pyspark-notebook Docker image.

To install and run:

  1. Install Docker: https://docs.docker.com/engine/installation/
  2. From this project's directory, create and run a Docker container based on the jupyter/pyspark-notebook image:
> export SPARK_EXPERIMENTS=`pwd`
> docker run -d -p 8888:8888 -v $SPARK_EXPERIMENTS:/home/jovyan/work --name spark-experiments jupyter/pyspark-notebook
  3. Open http://127.0.0.1:8888 and create a new notebook (a quick check you can paste into it follows after these steps).
  4. If the above URL doesn't work, use the IP address of the Docker machine instead:
> docker-machine ip <name of your running Docker machine>   # See 'docker-machine ls' if you don't know the name
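
Once the notebook server is up, a minimal cell like the one below (a sketch; the app name is just illustrative, and it assumes you build the SparkSession yourself, as is usual with the jupyter/pyspark-notebook image) confirms that pyspark works inside the container:

    # Sanity-check sketch (illustrative; not one of the experiments in this repo)
    from pyspark.sql import SparkSession

    # Build (or reuse) a SparkSession; the image doesn't create one automatically,
    # so this is typically the first cell in a notebook.
    spark = SparkSession.builder.appName("spark-experiments-check").getOrCreate()

    print("Spark version:", spark.version)         # expect a 2.x release
    print("Row count:", spark.range(100).count())  # tiny distributed job: count 0..99

    spark.stop()

Because of the -v mount in the docker run command above, the work/ folder shown in the notebook UI corresponds to this project's directory.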

Once you're done, you can remove the above Docker container using:

> docker stop spark-experiments && docker rm spark-experiments

Exported notebooks

Following are HTML exports of the notebooks: