The VCC is a framework for building containers that encapsulate parallel applications.
It supports both single-node and multi-node execution, where a system of linked processes needs to be executed on the same host or across different hosts respectively.
The orchestration of these processes is self-contained, offering a high degree of portability between container runtimes (Docker, Singularity, etc.) and external orchestration middleware (PBS, Kubernetes, etc.).
This allows fast setup and teardown of complex virtual environments to support different kinds of cluster and parallel applications, regardless of the underlying infrastructure.
The interactions between the components required to support the parallel application are modeled as a set of dependency-linked services and related hooks. The hooks must be run whenever the providers of a service, or the number of hosts running the application, change.
The VCC is mostly unopinionated about how the software you wish to run is actually installed. For this you might choose a tool like EasyBuild, Nix, Guix, or one that comes with a programming language (like npm for Node.js). Alternatively, your Dockerfile might just contain the directives to compile the exact environment from source.
The VCC aims to fill the gap between the OS and the isolated software, where the isolated software (or the libraries it depends on) expects a certain execution model. For example, it is trivial to package a parallel MPI program in a portable way. However, for execution as intended, it must be run on a system providing the correct interfaces for parallel MPI execution, i.e. a cluster. The same goes for other kinds of applications, such as Hadoop. If you don't have the appropriate system for execution available, or you have a different kind of system, you might be stuck. The VCC takes the approach that this dependency on a logical execution model is just as much a part of a reproducible experiment as the code, artefact and resulting publication.
To follow the instructions in this readme, you will need to install Docker.
This repository holds the JavaScript implementation of the Virtual Container Cluster. The technical documentation is on the Wiki pages.
A container image built using the VCC contains a large number of components. If you already have one, you can find out more about it by invoking the tool:
docker run -it --rm <IMAGE> --help
docker run -it --rm <IMAGE> --info
The general process for running any container built using the VCC is as follows:
- Start the discovery container (if multi-node execution is required)
- Start the first container
- Start subsequent containers on the same or other nodes
If you are just getting started, you probably want to do one of the following:
Running the pre-built images is the recommended way to get started and to test the solution. These images require a discovery container to be running, as they support multi-node execution.
Start the discovery service under Docker
docker run -d --name=discovery hpchud/vcc-discovery
The hpchud/vcc-torque image provides a full Torque/PBS cluster, with the MAUI scheduler, and demonstrates how to provision a cluster middleware in a VCC that can be dynamically scaled on top of an existing resource.
Start the head node for this system with the following command
docker run -d --name=headnode --link discovery:discovery \
hpchud/vcc-torque \
--cluster=test \
--storage-host=discovery \
--storage-port=2379 \
--service=headnode
On the same machine, we can also provision a worker node to test the functionality.
docker run -d --name=workernode --link discovery:discovery \
hpchud/vcc-torque \
--cluster=test \
--storage-host=discovery \
--storage-port=2379 \
--service=workernode
You can add as many workernodes as you like.
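As a minimal sketch of scaling out on a single Docker host, the worker command above can be wrapped in a loop. The container names workernode1..N and the DRY_RUN switch are assumptions made for this example (Docker only requires that each --name is unique); they are not part of the VCC tool itself.

```shell
#!/usr/bin/env bash
# Sketch: start several worker containers for the "test" cluster.
# Assumptions for illustration only: the names workernode1..N and the
# DRY_RUN variable are invented here and are not provided by the VCC.

start_workers() {
    local count="${1:?usage: start_workers <count>}"
    local i
    for i in $(seq 1 "$count"); do
        # Each container needs a unique --name, so the index is appended.
        # When DRY_RUN is set and non-empty, the command is printed
        # instead of being executed.
        ${DRY_RUN:+echo} docker run -d --name="workernode$i" \
            --link discovery:discovery \
            hpchud/vcc-torque \
            --cluster=test \
            --storage-host=discovery \
            --storage-port=2379 \
            --service=workernode
    done
}

# Example (prints three docker commands without running them):
#   DRY_RUN=1 start_workers 3
```

Running the function with DRY_RUN=1 first is a cheap way to check the generated commands before any containers are created.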
If you expose the containers to the network using --net=host, you may start the containers on different Docker hosts. In this case, use the real IP address of the host running the discovery container for the --storage-host option, instead of a Docker link.
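The multi-host case might look like the following sketch. It assumes the discovery container was itself started with --net=host on its own machine, so that port 2379 is reachable from other hosts. The function name, the DRY_RUN switch and the address 192.0.2.10 (a reserved documentation address) are placeholders for illustration, not values from the project.

```shell
#!/usr/bin/env bash
# Sketch: start a worker on a different Docker host from the discovery
# container. Assumptions for illustration only: the function name and
# the DRY_RUN variable are invented here.

start_remote_worker() {
    # IP address of the machine running the discovery container.
    local discovery_ip="${1:?usage: start_remote_worker <discovery-host-ip>}"
    # --net=host replaces the --link option used in the single-host case.
    # When DRY_RUN is set and non-empty, the command is only printed.
    ${DRY_RUN:+echo} docker run -d --net=host \
        hpchud/vcc-torque \
        --cluster=test \
        --storage-host="$discovery_ip" \
        --storage-port=2379 \
        --service=workernode
}

# Example (prints the command without running it):
#   DRY_RUN=1 start_remote_worker 192.0.2.10
```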
Now you can log in to the headnode and see that the cluster is running
docker exec -it headnode /bin/bash
After a few moments, run the command
pbsnodes
This should show output like the following
vnode_23323f91cbe6
     state = free
     power_state = Running
     np = 8
     ntype = cluster
     status = rectime=1488672384,cpuclock=Fixed <...>
     mom_service_port = 15002
     mom_manager_port = 15003
If you receive a message saying the node list is empty, the discovery process has not yet finished; wait a few more seconds and run the command again.
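If you would rather not retry by hand, the wait can be scripted from the Docker host. The sketch below polls pbsnodes until at least one node reports a state line like the one in the output above. The function name, the PBSNODES_CMD override and the default attempt and delay values are assumptions for this example, and it presumes pbsnodes is on the PATH for a non-interactive docker exec.

```shell
#!/usr/bin/env bash
# Sketch: poll the head node until pbsnodes lists at least one node.
# Assumptions for illustration only: the function name, the PBSNODES_CMD
# override and the default retry values are invented here.

wait_for_nodes() {
    local attempts="${1:-10}"   # how many times to poll
    local delay="${2:-3}"       # seconds to sleep between polls
    local i
    for i in $(seq 1 "$attempts"); do
        # A registered node prints a "state = ..." line; an empty node
        # list prints no such line, so grep fails and we retry.
        if ${PBSNODES_CMD:-docker exec headnode pbsnodes} 2>/dev/null \
                | grep -q 'state = '; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_nodes 20 3 && echo "cluster is ready"
```

The function returns a non-zero status if no node appears within the given number of attempts, so it can gate later steps in a script.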
A test job can be run to confirm that the cluster is working as expected: switch to the batchuser account and submit the hello.job file to the resource manager.
# su batchuser
$ cd /home/batchuser
$ qsub hello.job
$ qstat
This job will compile a short MPI test program and execute it. After a few moments, the expected output can be found in a new file in the current directory.
More information about this image can be found in the hpchud/vcc-torque repository.
The vcc script was designed to support an educational environment, and will provision a Torque/PBS cluster, with 1 head node and 1 worker node, using the Docker container runtime.
Download the script from the v1.1 release to somewhere on your path, perhaps $HOME/bin or /usr/local/bin:
wget -O $HOME/bin/vcc https://github.com/hpchud/vccjs/releases/download/v1.1/vcc
chmod a+x $HOME/bin/vcc
Run the update to make sure the Docker images are downloaded:
vcc update
Finally, provision the cluster:
vcc setuplab
You can enter the shell on the headnode either using docker exec or by typing:
vcc shell
See the wiki.
This repository contains the VCC tool and service daemons. It is written in Node.js, with shell scripts for the service hooks.
For development, just pull in both the runtime and development dependencies using the Node Package Manager.
npm install
You need a discovery container accessible via localhost for some tests to complete. Start one like so using Docker
docker run -d --net=host hpchud/vcc-discovery
Run the test suite using the command
npm test
You will see some errors and warnings; this is normal, as some tests deliberately check for errors.
We would love to receive pull requests and bug reports.
The code in this repository is licensed under the MIT License. See the LICENSE file for the full text.