Implement an infrastructure to manage jobs submitted to a cluster of Executors. Each client may submit a job to any of the executors receiving a job id as a return value. Through such job id, clients may check (contacting the same executor they submitted the job to) if the job has been executed and may retrieve back the results produced by the job. Executors communicate and coordinate among themselves in order to share load such that at each time every Executor is running the same number of jobs (or a number as close as possible to that). Assume links are reliable but processes (i.e., Executors) may fail (and resume back, re-joining the system immediately after). Choose the strategy you find more appropriate to organize communication and coordination. Use stable storage to cope with failures of Executors.
ReverseProxy
: Grand central loadbalancer and job dispatcherClusterNode
: Node on which jobs are executed
- Client connects to
ClusterNode
and submits as many jobs as he/she wants and for each job he/she submits a ticket is returned so that he/she can later check the job's result ClusterNode
forwards every incoming job from clients toReverseProxy
ReverseProxy
constantly collects info on everyClusterNode
jobQueueReverseProxy
distributes jobs equally among eachClusterNode
so that they have, almost always, the same amount of jobs in their queueReverseProxy
andClusterNode
constantly exchange messages to request/respond for/with results until every job result is fulfilled
Name | Payload type |
---|---|
INFO |
Integer |
REQUEST_OF_RES |
List<String> |
RESULT |
List<Tuple2<String, String>> |
RES_REQ |
Tuple2<List<Tuple2<String, String>>, List<String>> |
Name | Description |
---|---|
INFO |
Message containing an integer value used by ClusterNodes to notify how many jobs they have in their local job queue |
REQUEST_OF_RES |
Message containing a list of jobIds that both ReverseProxy and ClusterNode use to request those job results |
RESULT |
Message containing a list of tuples that both ReverseProxy and ClusterNode use to send jobs result.Each tuple contains a jobId and its result |
RES_REQ |
Message containing a list of jobIds to request and a list of tuples of job results. This message is used to optimize message communication and to avoid network cluttering |
docker pull ghcr.io/mattrighetti/cluster_node
docker pull ghcr.io/mattrighetti/loadbalancer
You can find the Docker images on my Github Container Registry cluster_node and loadbalancer
Name | Description | Default value |
---|---|---|
DISPATCH_PERIOD |
How frequently should the loadbalancer dispatch jobs (in ms) | 5000 |
MAX_NUM_JOBS_DISPATCH |
How many jobs are dispatched every DISPATCH_PERIOD |
30 |
REQUEST_RESULT_PERIOD |
How frequently should the loadbalancer request/send results to nodes (in ms) | 3000 |
MAX_NUM_NODES |
Maximum number of nodes that can connect to the loadbalancer | 10 |
You can override this values at any time by passing -e ENV_NAME=value
to the docker run
command
-
Make sure to create a
docker network
withdocker network create rp_cluster_network
-
To use persistent storage you have to create a volume for each instance you want to launch
docker volume create rp_v
docker volume create c1_v
- Run ReverseProxy with
docker run --rm \
--network-alias reverse-proxy \
--network rp_cluster_network \
--mount source=rp_v,target=/usr/src/app \
-e DISPATCH_PERIOD=3000 \
-e MAX_NUM_NODES=5 \
-e REQUEST_RESULT_PERIOD=3000 \
-e MAX_NUM_JOBS_DISPATCH=50 \
-it ghcr.io/mattrighetti/loadbalancer:latest 8080
- Run ClusterNode with
docker run --rm \
--network rp_cluster_network \
--mount source=c1_v,target=/usr/src/app \
-p 9000:9000 \
-it ghcr.io/mattrighetti/cluster_node:latest \
reverse-proxy 8080
- Connect to
ClusterNode
withnc <cluster_node_ip> <cluster_node_port>