How to use HDFS/Spark Workbench

To start an HDFS/Spark Workbench:

    docker-compose up -d

docker-compose does not work for scaling up spark-workers; for a distributed setup, see the swarm folder.

Starting workbench with Hive support

Run the commands below one at a time. Before running the next command, check that the services started by the previous one are running correctly (with docker logs servicename).

    docker-compose -f docker-compose-hive.yml up -d namenode hive-metastore-postgresql
    docker-compose -f docker-compose-hive.yml up -d datanode hive-metastore
    docker-compose -f docker-compose-hive.yml up -d hive-server
    docker-compose -f docker-compose-hive.yml up -d spark-master spark-worker spark-notebook hue
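
The check-then-start routine above can be scripted. The following is a minimal sketch and is not part of this repository: wait_for and is_running are hypothetical helper names, and the container name passed to is_running is an assumption that depends on your compose project (look it up with docker ps). The probe only tests that the container is in the running state, so reading docker logs remains the more reliable check.

```shell
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_for() {
  wf_attempts="$1"
  wf_delay="$2"
  shift 2
  wf_i=1
  while [ "$wf_i" -le "$wf_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    wf_i=$((wf_i + 1))
    sleep "$wf_delay"
  done
  return 1
}

# Hypothetical probe: succeeds when Docker reports the container as running.
is_running() {
  [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Example use (container name is a placeholder, not taken from this repo):
# docker-compose -f docker-compose-hive.yml up -d namenode hive-metastore-postgresql
# wait_for 30 2 is_running NAMENODE_CONTAINER_NAME || exit 1
```

wait_for returns 0 as soon as the probe succeeds and 1 if every attempt fails, so it can gate the next docker-compose command with || exit 1.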

Interfaces

Namenode: http://localhost:50070
Datanode: http://localhost:50075
Spark-master: http://localhost:8080
Spark-notebook: http://localhost:9001
Hue (HDFS Filebrowser): http://localhost:8088/home
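
To see which of these interfaces are reachable, something like the following sketch may help. It is an illustration only: workbench_urls and check_workbench are hypothetical names, the ports are copied from the list above, and the host argument is an assumption for setups where Docker does not run on localhost.

```shell
# Print "name url" pairs for the workbench UIs on a given host (default: localhost).
workbench_urls() {
  wb_host="${1:-localhost}"
  printf '%s http://%s:%s\n' \
    Namenode "$wb_host" 50070 \
    Datanode "$wb_host" 50075 \
    Spark-master "$wb_host" 8080 \
    Spark-notebook "$wb_host" 9001 \
    Hue "$wb_host" 8088/home
}

# Request each UI once and print the HTTP status code curl reports.
check_workbench() {
  workbench_urls "$1" | while read -r cw_name cw_url; do
    cw_code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$cw_url" || true)"
    printf '%-15s %s -> %s\n' "$cw_name" "$cw_url" "$cw_code"
  done
}

# Example use:
# check_workbench              # checks localhost
# check_workbench 192.168.99.100   # placeholder docker-host-ip
```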

Important

When opening Hue, you might encounter a NoReverseMatch: u'about' is not a registered namespace error after login. I disabled the 'about' page (which is the default one) because it caused the Docker container to hang. If you hit this error, append /home to the URI to access Hue: http://docker-host-ip:8088/home

Docs

The motivation behind the repo and an example of its usage are described on the BDE2020 blog.

Count Example for Spark Notebooks

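    import org.apache.spark.sql.SparkSession

    // Create a SparkSession, or reuse an existing one
    val spark = SparkSession
      .builder()
      .appName("Simple Count Example")
      .getOrCreate()

    // Read /data.csv as a Dataset of lines and count them
    val tf = spark.read.textFile("/data.csv")
    tf.count()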
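
The example reads /data.csv, so that file has to exist in HDFS first. One option is the Hue file browser listed under Interfaces. Another is sketched below, under stated assumptions: put_in_hdfs is a hypothetical helper, the namenode container name is a placeholder you need to look up with docker ps, and the sketch assumes the hdfs command is available inside that container and that the notebook resolves /data.csv against HDFS.

```shell
# put_in_hdfs CONTAINER LOCAL_FILE HDFS_PATH
# Copies LOCAL_FILE into the container, then uploads it to HDFS_PATH.
# Set DOCKER=echo for a dry run that only prints the docker arguments.
put_in_hdfs() {
  ph_container="$1"
  ph_local="$2"
  ph_target="$3"
  ph_tmp="/tmp/$(basename "$ph_local")"
  ${DOCKER:-docker} cp "$ph_local" "$ph_container:$ph_tmp" &&
    ${DOCKER:-docker} exec "$ph_container" hdfs dfs -put -f "$ph_tmp" "$ph_target"
}

# Example use (container name is a placeholder, not taken from this repo):
# put_in_hdfs NAMENODE_CONTAINER_NAME ./data.csv /data.csv
```

The -f flag overwrites an existing /data.csv, which keeps the helper safe to re-run while experimenting.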

Maintainer

Ivan Ermilov @earthquakesan

Note: this repository was part of the BDE H2020 EU project and is no longer actively maintained by the project participants.
