Simulating concrete data engineering scenarios that arise within enterprises: generating business intelligence (BI) reports, delivering data insights, and deploying applied machine learning.

Spark Cluster with Docker & docker-compose

Prerequisites

  • Docker installed

  • Docker Compose installed

Build the image

docker build -t cluster-apache-spark:3.0.2 .
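
If the build succeeds, the tagged image should appear in your local image list:

docker image ls cluster-apache-spark:3.0.2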

Run docker-compose

To create your test cluster, run the compose file:

docker-compose up -d
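
You can verify that the master and worker containers came up (service names are inferred from the commands below; your compose file may use different ones):

docker-compose ps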

Start a worker shell

docker exec -it apache-spark-cluster-docker_spark-worker-a_1 bash
pip3 install requests
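
As a quick sanity check inside the container, you can confirm the Spark installation at the path used below:

/opt/spark/bin/spark-submit --version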

From the shell, run the apps as:

/opt/spark/bin/spark-submit  \
   --master spark://spark-master:7077  \
   --jars /opt/spark-apps/postgresql-42.2.22.jar  \
   --driver-memory 1G  \
   --executor-memory 1G \
   /opt/spark-apps/task1_1.py
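
To follow a job's progress, you can tail the master's logs; the container name here is an assumption, following the same compose naming pattern as the worker container above:

docker logs -f apache-spark-cluster-docker_spark-master_1

If the compose file also publishes the Spark master web UI (commonly port 8080), you can inspect running applications there as well.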
