Tee: on detecting a failed receiver #557
Comments
Dear @OCrylic, To confirm: today there is nothing built into pmacct that could help. I can think of two possible ways to tackle this:
Let me know your thoughts on the two options above, and whether you see something else or different. I would then slightly re-title this issue and mark it as Enhancement. Paolo
Hi @paololucente, Regarding the first solution: if you can automatically exclude the failed server at any time during the NetFlow replication, why is it not possible to bring it back as soon as it answers ICMP again (the ICMP checks would still be running during that time, even while the receiver does not answer)? Unless you are talking about an ICMP ping from the NetFlow receiver (Graylog in my case) to the NetFlow sender (pmacct). But maybe the ICMP checks could be made automatically from pmacct to Graylog, every 30 seconds for example, using the IP addresses in tee_receivers.lst, and as soon as a ping fails (or 3, to be sure) the failed server is excluded. This can be improved, I think, but it is a good start :) Let me explain: The second solution you propose does not relate to my case. I am not sure I understood it completely, but it could work. If I understand it correctly, instead of using ICMP checks, you rely on the registration process between two pmacct nodes.
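The "exclude after 3 failed checks, re-admit on the next success" idea from the comment above can be sketched as follows. This is only an illustration and not pmacct code: the `ReceiverPool` class, its method names, and the example addresses are hypothetical, and the actual check (ICMP ping, REST call, or anything else) is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReceiverPool:
    """Hypothetical pool that tracks consecutive failed health checks."""

    receivers: List[str]
    max_failures: int = 3  # assumption: 3 failed checks in a row exclude a receiver
    failures: Dict[str, int] = field(default_factory=dict)

    def record_check(self, receiver: str, healthy: bool) -> None:
        """Store the outcome of one periodic check (e.g. run every 30 seconds)."""
        if healthy:
            # A single successful check brings the receiver back immediately.
            self.failures[receiver] = 0
        else:
            self.failures[receiver] = self.failures.get(receiver, 0) + 1

    def active(self) -> List[str]:
        """Receivers that should currently get replicated flows."""
        return [
            r for r in self.receivers
            if self.failures.get(r, 0) < self.max_failures
        ]


pool = ReceiverPool(["192.0.2.10:2055", "192.0.2.11:2055"])
for _ in range(3):
    pool.record_check("192.0.2.11:2055", healthy=False)
print(pool.active())  # only the first receiver remains
pool.record_check("192.0.2.11:2055", healthy=True)
print(pool.active())  # both receivers are active again
```

Requiring several consecutive failures avoids dropping a receiver on one lost probe, at the cost of forwarding to a dead node for a few check intervals.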
Hi @OCrylic, Thanks for your feedback. The first solution would rely on ICMP Unreachables, not ICMP Ping. So the first packet sent to a remote UDP port that triggers an ICMP Unreachable in return takes that node off the list. This is why, once the node is off, something external should bring it back on once it is restored. In other words, I was not speaking about adding extra ICMP checks to pmacct, i.e. ICMP Pings, since those would be very coarse-grained: the whole host would have to go down for the recovery mechanism to kick in. And you are entirely correct on the high-level picture of the second solution I was proposing. Paolo
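To illustrate the ICMP Unreachable approach described above, here is a minimal sketch of how a sender can notice that a UDP port is closed. It is not how pmacct implements anything; the function name `udp_port_refused` is hypothetical. It assumes an OS (such as Linux) that reports a received ICMP Port Unreachable on a connected UDP socket as a "connection refused" error, and it sends a dummy byte where a real replicator would simply observe the error on its normal flow packets.

```python
import socket


def udp_port_refused(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a datagram to host:port came back as ICMP Port Unreachable.

    False means no error arrived within the timeout: the port may be open,
    or the ICMP message may have been filtered or lost on the way.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # Connecting the UDP socket lets the kernel associate incoming
        # ICMP errors with this socket and surface them to the application.
        sock.connect((host, port))
        try:
            sock.send(b"\x00")  # dummy payload, for illustration only
            sock.recv(1)        # a pending ICMP error is raised here
        except ConnectionRefusedError:
            return True
        except socket.timeout:
            return False
        return False


if __name__ == "__main__":
    # Assumption: 2055 is just an example collector port.
    print(udp_port_refused("127.0.0.1", 2055))
```

As noted in the comment, this only detects the failure: a receiver whose process has stopped answers with an ICMP Unreachable, but nothing signals when it is back, so re-adding it needs a separate mechanism. A host that is down entirely sends no ICMP Unreachable at all, so that case is not covered by this check.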
Description
Hi Paolo,
This is a feature request since, from my understanding, it is not implemented in pmacct.
With nfacctd and the tee plugin, we can load-balance the forwarding of flows to several servers (2 Graylog nodes in my case).
However, if one of the Graylog servers crashes, flows are still forwarded to that node since there are no health checks in pmacct.
Health checks are a bit tricky since UDP does not send back acknowledgements and the NetFlow protocol does not use responses (unlike DNS, for example). We have to use another type of health check (a REST API call to get the status of the Graylog cluster, or ICMP checks), but using HAProxy or Nginx is not possible since:
Do you have any advice or another solution for this? Do you think health checks could be integrated into pmacct one day?