
many timeout: yes spot out #21

Open
rmrf opened this issue Jun 13, 2017 · 4 comments

Comments

@rmrf

rmrf commented Jun 13, 2017

Within our internal cloud deployment, we have 10+ hosts in the array.

I see many "timed_out":"yes" lines in the log output, and statsd shows no metrics between 2 of these hosts. Yet the communication between them looks fine in tcpdump.

Is there any other reason this timeout could happen? Could it be caused by using host_name instead of an IP inside the internal array?

@rmrf
Author

rmrf commented Jun 13, 2017

I tried using IPs instead of host names, but it didn't help. Even after increasing the timeout to "timeout": "5s", the log still shows:

{"level":"info","ts":1497349039.773023,"msg":"Result","PID":25342,"hostname":"host1","service_name":"arachne","host":"host4.","source_port":32001,"2-way":"-","1-way":"-","timed_out":"yes"}
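To see whether the timeouts cluster on particular targets or source ports, the Result lines can be tallied. A minimal Go sketch, assuming each log line is a valid JSON object with the field names shown in the line above; countTimeouts is a hypothetical helper, not part of arachne.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// result holds the fields of an arachne "Result" log line used here.
// Field names are copied from the log line above; other fields are ignored.
type result struct {
	Msg        string `json:"msg"`
	Host       string `json:"host"`
	SourcePort int    `json:"source_port"`
	TimedOut   string `json:"timed_out"`
}

// countTimeouts (hypothetical helper) returns, per target host, how many
// Result lines reported "timed_out":"yes" on each source port.
func countTimeouts(logText string) map[string]map[int]int {
	counts := make(map[string]map[int]int)
	sc := bufio.NewScanner(strings.NewReader(logText))
	for sc.Scan() {
		var r result
		if err := json.Unmarshal([]byte(sc.Text()), &r); err != nil {
			continue // skip lines that are not valid JSON
		}
		if r.Msg != "Result" || r.TimedOut != "yes" {
			continue
		}
		if counts[r.Host] == nil {
			counts[r.Host] = make(map[int]int)
		}
		counts[r.Host][r.SourcePort]++
	}
	return counts
}

func main() {
	logText := `{"level":"info","ts":1497349039.773023,"msg":"Result","PID":25342,"hostname":"host1","service_name":"arachne","host":"host4.","source_port":32001,"2-way":"-","1-way":"-","timed_out":"yes"}`
	for host, ports := range countTimeouts(logText) {
		fmt.Println(host, ports)
	}
}
```

If one host dominates the counts while all of its source ports time out, the problem is more likely on the path to that host than on any single port.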

@rmrf
Author

rmrf commented Jun 13, 2017

{"level":"error","ts":1497349506.3300288,"msg":"host exists in resultStore, but not in remoteStore","PID":16496,
"hostname":"host4","service_name":"arachne","host":"100.100.19.42",

By the way, when I narrowed the test down to the communication between host1 and host4, I caught the error log above. The IP 100.100.19.42 in it is strange: I can't find it anywhere, and I'm not sure how it got into the log.

I'm not sure whether this is related.
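One lead worth checking: 100.100.19.42 falls inside 100.64.0.0/10, the shared address space that RFC 6598 reserves for carrier-grade NAT, so it may belong to the cloud provider's own network rather than to any host in the array. The Go sketch below only confirms range membership; whether the provider actually uses this range is an assumption to verify with them, and inSharedAddressSpace is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
)

// sharedAddressSpace is 100.64.0.0/10, reserved by RFC 6598 for
// carrier-grade NAT and not routed on the public Internet.
var _, sharedAddressSpace, _ = net.ParseCIDR("100.64.0.0/10")

// inSharedAddressSpace reports whether ip is a valid address inside 100.64.0.0/10.
func inSharedAddressSpace(ip string) bool {
	parsed := net.ParseIP(ip)
	return parsed != nil && sharedAddressSpace.Contains(parsed)
}

func main() {
	for _, ip := range []string{"100.100.19.42", "192.168.64.249"} {
		fmt.Printf("%s in 100.64.0.0/10: %v\n", ip, inSharedAddressSpace(ip))
	}
}
```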

@Vasilis
Contributor

Vasilis commented Jun 14, 2017

Hi @rmrf!
Are the timeouts happening only for source port 32001?
If you try a range of ephemeral source ports, do they all time out? You can troubleshoot that easily by checking the table printed to stdout when running in -foreground mode.
Could the listen port on host4 be firewalled?

@rmrf
Author

rmrf commented Mar 22, 2018

@Vasilis
I'm not sure whether a firewall is in place there; that is out of my control (we are running arachne in a public cloud environment). But I ran tcpdump on the target port, and I can see the network packets:

10:29:17.255501 IP 192.168.79.1.41013 > 192.168.64.249.54111: Flags [S], seq 4188118887, win 43690, length 0
10:29:17.255522 IP 192.168.79.1.41014 > 192.168.64.249.54111: Flags [S], seq 4188118887, win 43690, length 0
10:29:17.255531 IP 192.168.79.1.41015 > 192.168.64.249.54111: Flags [S], seq 4188118887, win 43690, length 0
10:29:17.255667 IP 192.168.64.249.54111 > 192.168.79.1.41000: Flags [R.], seq 0, ack 4188118888, win 0, length 0
10:29:17.255710 IP 192.168.64.249.54111 > 192.168.79.1.41001: Flags [R.], seq 0, ack 4188118888, win 0, length 0
10:29:17.255761 IP 192.168.64.249.54111 > 192.168.79.1.41002: Flags [R.], seq 0, ack 4188118888, win 0, length 0
10:29:17.255804 IP 192.168.64.249.54111 > 192.168.79.1.41003: Flags [R.], seq 0, ack 4188118888, win 0, length 0
10:29:17.255846 IP 192.168.64.249.54111 > 192.168.79.1.41004: Flags [R.], seq 0, ack 4188118888, win 0, length 0
10:29:17.255854 IP 192.168.64.249.54111 > 192.168.79.1.41005: Flags [R.], seq 0, ack 4188118888, win 0, length 0

Here I ran tcpdump on the 192.168.79.1 host, with "base_src_tcp_port": 41000 and "target_tcp_port": 54111 in the configuration.
The main difference compared with the working nodes is these length 0 replies with [R.] (RST/ACK) flags coming back from the target port, which is what makes the timeouts happen.
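To see at a glance which source ports get a reset back, the one-line tcpdump output can be tallied per port. A rough Go sketch, assuming tcpdump's default IPv4 one-line format as in the capture above (in tcpdump, "[S]" is a SYN and "[R.]" is RST plus ACK); summarize is a hypothetical helper, not part of arachne or tcpdump.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// tcpdumpLine matches "<time> IP <src>.<port> > <dst>.<port>: Flags [<flags>], ...".
var tcpdumpLine = regexp.MustCompile(`IP (\S+)\.(\d+) > (\S+)\.(\d+): Flags \[([^\]]+)\]`)

type portStats struct {
	SYNsSent int // SYN packets sent from this local port
	RSTsRecv int // RST replies received on this local port
}

// summarize counts, per local source port on localIP, SYNs sent and RSTs received.
func summarize(dump, localIP string) map[string]*portStats {
	stats := make(map[string]*portStats)
	get := func(port string) *portStats {
		if stats[port] == nil {
			stats[port] = &portStats{}
		}
		return stats[port]
	}
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		m := tcpdumpLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not a TCP line in the expected format
		}
		srcIP, srcPort, dstIP, dstPort, flags := m[1], m[2], m[3], m[4], m[5]
		switch {
		case srcIP == localIP && flags == "S":
			get(srcPort).SYNsSent++
		case dstIP == localIP && strings.Contains(flags, "R"):
			get(dstPort).RSTsRecv++
		}
	}
	return stats
}

func main() {
	dump := `10:29:17.255501 IP 192.168.79.1.41013 > 192.168.64.249.54111: Flags [S], seq 4188118887, win 43690, length 0
10:29:17.255667 IP 192.168.64.249.54111 > 192.168.79.1.41000: Flags [R.], seq 0, ack 4188118888, win 0, length 0`
	stats := summarize(dump, "192.168.79.1")
	ports := make([]string, 0, len(stats))
	for p := range stats {
		ports = append(ports, p)
	}
	sort.Strings(ports)
	for _, p := range ports {
		fmt.Printf("port %s: %d SYN sent, %d RST received\n", p, stats[p].SYNsSent, stats[p].RSTsRecv)
	}
}
```

Run over a full capture, a port with SYNs sent but only RSTs (and no SYN-ACKs) coming back is one the target actively rejected rather than silently dropped.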

The weird thing is:
if I run arachne between only these 2 hosts, everything works fine and they can see each other. But when I use the internal configuration with 13 hosts, these 2 hosts can't see each other.

I tried increasing "num_src_tcp_ports" from 16 to 160, but this didn't help, and I noticed latency increased by roughly 5x after the change.
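The port settings determine which source ports show up in the capture. A small Go sketch, assuming ports are allocated consecutively from base_src_tcp_port (consistent with the 41000 to 41015 ports visible in the tcpdump output for 16 ports); checkPorts is a hypothetical helper that also flags a range running past 65535 or covering the target port.

```go
package main

import "fmt"

// checkPorts returns the source port range [first, last] implied by base and
// num (assuming consecutive allocation) and reports obvious config problems.
func checkPorts(base, num, target int) (first, last int, err error) {
	first, last = base, base+num-1
	if num < 1 || last > 65535 {
		return first, last, fmt.Errorf("source ports %d-%d out of range", first, last)
	}
	if target >= first && target <= last {
		return first, last, fmt.Errorf("target port %d inside source range %d-%d", target, first, last)
	}
	return first, last, nil
}

func main() {
	for _, num := range []int{16, 160} {
		first, last, err := checkPorts(41000, num, 54111)
		if err != nil {
			fmt.Println("config problem:", err)
			continue
		}
		fmt.Printf("num_src_tcp_ports=%d: source ports %d-%d\n", num, first, last)
	}
}
```

With 16 ports the range is 41000-41015; with 160 it becomes 41000-41159, so ten times as many source ports probe each target.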

I'm not sure what is happening.
