Connection error for Data Caching benchmark containers with docker swarm on VMware ESXi virtual machines #353

Open
pritam27 opened this issue Jan 11, 2022 · 0 comments


I am running a single server and a single client of CloudSuite's Data Caching benchmark on two separate nodes. The nodes are VMware ESXi virtual machines (Ubuntu) running behind a virtual NAT appliance. Both nodes are part of a docker swarm with an overlay network named swarm-network. The server container runs on the manager node, while the client container runs on the worker node.

On the manager node, I run the following:
docker run --name dc-server --net swarm-network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
Running docker inspect on the dc-server container shows that swarm-network assigned it the IP address 10.0.1.2.
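For reference, the address lookup above can be scripted rather than read by eye. The following is only a sketch, assuming Python 3 is available on the manager node and that the `docker inspect` output follows the usual `NetworkSettings.Networks.<name>.IPAddress` layout; the helper name `overlay_ip` is hypothetical.

```python
import json


def overlay_ip(inspect_json: str, network: str = "swarm-network") -> str:
    """Return the container's IPv4 address on the given Docker network.

    `inspect_json` is the raw text printed by `docker inspect <container>`.
    """
    # docker inspect prints a JSON array with one object per inspected container.
    containers = json.loads(inspect_json)
    networks = containers[0]["NetworkSettings"]["Networks"]
    # Raises KeyError if the container is not attached to `network`.
    return networks[network]["IPAddress"]


# Example use (hypothetical file name): save the output first, then parse it.
#   docker inspect dc-server > dc-server.json
#   print(overlay_ip(open("dc-server.json").read()))
```

A `KeyError` here would indicate that the container is not attached to swarm-network at all, which is a different failure from the one reported in this issue.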

On the worker node, I run the following:
docker run -it --name dc-client --net swarm-network cloudsuite3/data-caching:client bash
and warm up the server as follows:

memcache@2092ffc8eb2b:/$ cd /usr/src/memcached/memcached_client/
memcache@2092ffc8eb2b:/usr/src/memcached/memcached_client$ vi docker_servers.txt 
memcache@2092ffc8eb2b:/usr/src/memcached/memcached_client$ cat docker_servers.txt 
dc-server, 11211
memcache@2092ffc8eb2b:/usr/src/memcached/memcached_client$ ./loader -a ../twitter_dataset/twitter_dataset_unscaled -o ../twitter_dataset/twitter_dataset_30x -s docker_servers.txt -w 1 -S 30 -D 4096 -j -T 1
stats_time = 1
Configuration:

nProcessors on system: 4
nWorkers: 1
runtime: -1
Get fraction: 0.900000
Naggle's algorithm: False


host: dc-server
address: 10.0.1.2
Loading key value file...Average Size = 1057.34758
Keys to Preload = 3557357
created uniform distribution 1000
rps -1 cpus 1
num_worker_connections 1
Connection error

However, I get a connection error even though dc-server resolves to 10.0.1.2, the IP address of the dc-server container running on the manager node. I tried running ping dc-server inside the dc-client container, but the ping command was not found. So, just to be sure, I ran a dummy container alpine1 on the worker node as follows:
docker run -it --name alpine1 --network swarm-network alpine
and checked network access to the dc-server container from inside the alpine1 container as follows:

/ # ping dc-server
PING dc-server (10.0.1.2): 56 data bytes
64 bytes from 10.0.1.2: seq=0 ttl=64 time=0.311 ms
64 bytes from 10.0.1.2: seq=1 ttl=64 time=0.144 ms
64 bytes from 10.0.1.2: seq=2 ttl=64 time=0.191 ms
64 bytes from 10.0.1.2: seq=3 ttl=64 time=0.186 ms
64 bytes from 10.0.1.2: seq=4 ttl=64 time=0.163 ms
64 bytes from 10.0.1.2: seq=5 ttl=64 time=0.195 ms
^C
--- dc-server ping statistics ---
6 packets transmitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 0.144/0.198/0.311 ms
/ # 

which works fine.
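Note that ping only exercises ICMP, whereas the loader opens a TCP connection to port 11211, so a TCP-level check is a closer match to what fails. A minimal sketch of such a check, assuming Python 3 is available in some container attached to swarm-network (this has not been verified for the images above); the helper name `can_connect` is hypothetical.

```python
import socket


def can_connect(host: str, port: int = 11211, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example use from a container on the overlay network:
#   print(can_connect("dc-server"))
```

If this returns False while ping succeeds, the problem is specific to TCP traffic on the overlay rather than to name resolution or basic reachability.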

Could anyone please help me figure out what I am doing wrong here? Similar network issues occur with other benchmarks too when the containers run on separate nodes in the docker swarm. I also tried opening TCP port 11211 in addition to the ports Docker requires, and disabling the firewall, but the connection error persists.
