DeepElement/docker-spark

Docker-Spark

Summary

Docker images for running various Spark/Hadoop version combinations as a standalone Spark cluster managed by Docker.

Pull images using the tag scheme deepelement/docker-spark:{Spark Version}-{Hadoop Version}. Examples:

  • deepelement/docker-spark:2.0.2-2.7
  • deepelement/docker-spark:2.0.0-2.7
  • deepelement/docker-spark:2.0.2-2.4
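The tag scheme above can be composed in a script. The following is a minimal sketch; the helper name spark_image is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Build an image reference from a Spark version and a Hadoop version,
# following the {Spark Version}-{Hadoop Version} tag scheme.
# spark_image is a hypothetical helper name used only for illustration.
spark_image() {
  spark_version="$1"
  hadoop_version="$2"
  printf 'deepelement/docker-spark:%s-%s\n' "$spark_version" "$hadoop_version"
}

# Print the reference for Spark 2.0.2 built against Hadoop 2.7.
spark_image 2.0.2 2.7
```

The result can be passed to Docker directly, for example docker pull "$(spark_image 2.0.2 2.7)". Only tags that have actually been published can be pulled.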

Workers discover the Spark master automatically at launch, using the network environment variables that Docker injects into linked containers.
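As a rough illustration of that discovery step, the sketch below derives a master URL from Docker link variables. It assumes the master is linked under the alias spark_master and exposes port 7077, in which case legacy Docker links inject SPARK_MASTER_PORT_7077_TCP_ADDR and SPARK_MASTER_PORT_7077_TCP_PORT. The function name master_url and the fallback values are hypothetical; the image's actual start-worker.sh may work differently.

```shell
#!/bin/sh
# Hypothetical sketch: derive the Spark master URL from Docker link variables.
# Assumes a link aliased "spark_master" that exposes port 7077.
master_url() {
  # Fall back to the link alias and the default port when the variables are unset.
  host="${SPARK_MASTER_PORT_7077_TCP_ADDR:-spark_master}"
  port="${SPARK_MASTER_PORT_7077_TCP_PORT:-7077}"
  printf 'spark://%s:%s\n' "$host" "$port"
}

# Print the URL a worker would register with.
master_url
```

A worker start script could pass this URL to Spark's worker class, for example "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker "$(master_url)".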

Usage

To run the cluster with Docker Compose, define the master and worker services in docker-compose.yml:

  spark_master:
    image: deepelement/docker-spark:2.0.2-2.7
    container_name: spark_master
    network_mode: 'bridge'
    command: /start-master.sh
    ports:
      - "7077:7077"
      - "8080:8080"

  spark_worker:
    image: deepelement/docker-spark:2.0.2-2.7
    command: /start-worker.sh
    network_mode: 'bridge'
    links:
      - spark_master

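To scale up workers, start the cluster and then use the standard Compose scale interface:

# start the master and one worker in the background
docker-compose up -d
# scale the worker service to five containers
# (newer Compose releases: docker-compose up -d --scale spark_worker=5)
docker-compose scale spark_worker=5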

The Spark setup follows the standard configuration layout, so settings can be overridden in the usual way. For example, this Dockerfile step makes logging less noisy by changing the root log level from INFO to WARN:

# Copy the log4j template into place, then change the root log level from INFO to WARN
RUN cp "$SPARK_HOME/conf/log4j.properties.template" "$SPARK_HOME/conf/log4j.properties" \
  && sed -i 's/log4j.rootCategory=INFO/log4j.rootCategory=WARN/g' "$SPARK_HOME/conf/log4j.properties"
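To check what the substitution does without building an image, the same sed expression can be applied to a sample line. This is a small sketch; the sample value is an assumption modelled on the root logger entry in Spark's log4j template.

```shell
#!/bin/sh
# Sample root logger line, assumed to match the entry in log4j.properties.template.
sample='log4j.rootCategory=INFO, console'

# Apply the same substitution used in the Dockerfile step.
result="$(printf '%s\n' "$sample" | sed 's/log4j.rootCategory=INFO/log4j.rootCategory=WARN/g')"

echo "$result"
```

Only the level changes; the appender list after the comma is left untouched.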

References

  • Spark
  • Hadoop
