
Hadoop cluster

Made for an assignment; not fit for anything remotely serious.

Getting started

You need at least one Docker Swarm manager. Initialize the swarm on the node that will act as manager:

# On manager
docker swarm init

This will print a docker swarm join command that you'll need to run on all the other nodes so they join the swarm.
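The printed command looks roughly like the following; the token and manager address here are placeholders, not real values — use exactly what your own docker swarm init printed:

```shell
# Run on each worker node. SWMTKN-1-<token> and <manager-ip> are
# placeholders for the values printed by `docker swarm init`.
docker swarm join --token SWMTKN-1-<token> <manager-ip>:2377
```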

Starting the cluster

We use docker stack deploy to start and manage the services.

docker stack deploy --compose-file=docker-compose.yml [name of the cluster]
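For reference, a stack's compose file might look like the minimal sketch below. This is an assumption-heavy illustration, not the repository's actual docker-compose.yml: the Hadoop image and service names (namenode, datanode) are hypothetical, while dockersamples/visualizer is the stock Swarm visualizer image that serves the dashboard mentioned below:

```yaml
# Minimal sketch of a Swarm stack file -- the real docker-compose.yml
# in this repo may differ. Image/service names below are hypothetical.
version: "3"
services:
  namenode:
    image: hadoop-namenode        # hypothetical image name
    deploy:
      placement:
        constraints: [node.role == manager]
  datanode:
    image: hadoop-datanode        # hypothetical image name
    deploy:
      replicas: 3                 # scale later with `docker service scale`
  visualizer:
    image: dockersamples/visualizer
    ports:
      - "8080:8080"               # cluster-health dashboard on every node
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints: [node.role == manager]
```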

Stopping a cluster

docker stack rm [name of the cluster]

Scaling a cluster

docker service scale [service name]=[number of desired replicas]
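For example, assuming the stack was deployed under the name hadoop and defines a datanode service (both names are placeholders — check docker service ls for the real ones):

```shell
# Scale the datanode service to 5 replicas.
# "hadoop_datanode" is a placeholder service name.
docker service scale hadoop_datanode=5
```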

Useful stuff

Visualize cluster health

You can access a visualizer by pointing a browser at [IP]:8080, where IP is the address of any node in the cluster (any node should work).

Query logs of a service (all containers of one kind)

To get an aggregated log of all the containers of a specific service, run

docker service logs [service name]

Docker service names can be found with

docker service ls

Enter a container to issue commands

Executing docker ps on a node will list all the containers running there. You can then open a shell in one of them:

docker exec -it [container name] bash
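Putting the two steps together — the container name below is a placeholder for whatever docker ps printed, and the hdfs command assumes the Hadoop binaries are on the PATH inside the container:

```shell
# List containers running on this node and note the one you want
docker ps

# Open an interactive shell inside it
# (container name is a placeholder from `docker ps` output)
docker exec -it hadoop_namenode.1.abc123 bash

# Inside the container, e.g. check HDFS health
# (assumes hadoop binaries are on the PATH)
hdfs dfsadmin -report
```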

Known issues

  • Running the cluster on non-Linux hosts may cause issues with Docker's DNS-based virtual IPs (VIP)
  • Restarting the master causes all the nodes to fail.

About

Running Hadoop in a docker swarm on multiple hosts
