ASN Statistics broken #72

Open
vidister opened this issue Oct 26, 2021 · 18 comments
Labels
bug Something isn't working

Comments

@vidister

In our setups with exports from Junos-based routers (using both sFlow and NetFlow), the default dashboard doesn't display any ASN statistics.
When I remove the AND FlowDirection = conditions from the queries, everything works fine.
Maybe our routers don't add FlowDirection attributes to the flow samples?
Is it possible to remove the condition?

@leoluk
Member

leoluk commented Oct 26, 2021

That's odd - if you query some raw data, does it have any FlowDirection values at all?

kubectl exec -it chi-netmeta-netmeta-0-0-0 -c clickhouse -- clickhouse-client <<< 'select * from flows_raw limit 10 format JSONEachRow'

@vidister
Author

Okay, this is a funny one: We have two distinct (or kinda related) problems here.

Junos IPFIX

On the IPFIX setup I get flow entries like this:

{
  "Date": "2021-10-25",
  "FlowType": "FLOWUNKNOWN",
  "SequenceNum": "3014",
  "TimeReceived": "1635174093",
  "SamplingRate": "2000",
  "FlowDirection": 255,
  "SamplerAddress": "REDACTED",
  "TimeFlowStart": "1635174074",
  "TimeFlowEnd": "1635174074",
  "Bytes": "52",
  "Packets": "1",
  "SrcAddr": "REDACTED",
  "DstAddr": "REDACTED",
  "EType": 2048,
  "Proto": 6,
  "SrcPort": REDACTED,
  "DstPort": REDACTED,
  "InIf": 528,
  "OutIf": 552,
  "SrcMac": "0",
  "DstMac": "0",
  "SrcVlan": 101,
  "DstVlan": 0,
  "VlanId": 101,
  "IngressVrfId": 0,
  "EgressVrfId": 0,
  "IPTos": 0,
  "ForwardingStatus": 0,
  "IPTTL": 55,
  "TCPFlags": 16,
  "IcmpType": 0,
  "IcmpCode": 0,
  "IPv6FlowLabel": 0,
  "FragmentId": 0,
  "FragmentOffset": 0,
  "BiFlowDirection": 0,
  "SrcAS": REDACTED,
  "DstAS": REDACTED,
  "NextHop": "REDACTED",
  "NextHopAS": 0,
  "SrcNet": 15,
  "DstNet": 24
}

So every entry has "FlowDirection": 255 (WHERE FlowDirection != 255 returns 0 rows).
This blog post describes the issue:

They export 255 to avoid reporting the wrong flow direction when a packet is sampled by both ingress and egress PFE

https://www.plixer.com/blog/juniper-mx240-ipfix-support-direction-problems/

So an easy solution would be to match FlowDirection != 1 instead of FlowDirection == 0 and vice versa.
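A minimal sketch of the relaxed match proposed above, written in Python rather than the dashboard's ClickHouse SQL. It assumes the IPFIX flowDirection convention (0 = ingress, 1 = egress) and treats any other value, such as the 255 exported by Junos, as unknown. The function names and sample flows are hypothetical.

```python
# IPFIX flowDirection values: 0 = ingress, 1 = egress.
INGRESS, EGRESS = 0, 1


def counts_as_ingress(flow_direction: int) -> bool:
    # Relaxed form of `FlowDirection == 0`: anything not explicitly egress.
    return flow_direction != EGRESS


def counts_as_egress(flow_direction: int) -> bool:
    # Relaxed form of `FlowDirection == 1`: anything not explicitly ingress.
    return flow_direction != INGRESS


# Made-up sample flows; 255 is the "unknown" value seen in the dump above.
flows = [
    {"FlowDirection": 0, "Bytes": 100},
    {"FlowDirection": 1, "Bytes": 200},
    {"FlowDirection": 255, "Bytes": 52},
]

ingress_bytes = sum(f["Bytes"] for f in flows if counts_as_ingress(f["FlowDirection"]))
egress_bytes = sum(f["Bytes"] for f in flows if counts_as_egress(f["FlowDirection"]))
```

With this predicate a flow marked 255 contributes to both totals, so the per-direction sums overlap whenever a sampler does not report a direction.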

Junos sFlow

Then there's a second setup using sFlow on Junos. It only has entries with "FlowDirection": 0:

$ kubectl exec -it chi-netmeta-netmeta-0-0-0 -c clickhouse -- clickhouse-client <<< 'select count(*) from flows_raw WHERE FlowDirection == 0 limit 10 format JSONEachRow'
{"count()":"463967"}

$ kubectl exec -it chi-netmeta-netmeta-0-0-0 -c clickhouse -- clickhouse-client <<< 'select count(*) from flows_raw WHERE FlowDirection != 0 limit 10 format JSONEachRow'
{"count()":"0"}

It is definitely sampling both directions, so I don't know why it sets the FlowDirection to 0. I'll grab some pcaps and try to figure out what's going on there.

But even then it should display something on the SrcASN Graph, right? Well, just Reserved-ASN 0. This is because SrcAS and DstAS are both set to 0. So I guess we have to check if the value is zero and perform another lookup in the risinfo dict.

sFlow Dump:

{
  "Date": "2021-10-22",
  "FlowType": "FLOWUNKNOWN",
  "SequenceNum": "51378",
  "TimeReceived": "1634864823",
  "SamplingRate": "4000",
  "FlowDirection": 0,
  "SamplerAddress": "REDACTED",
  "TimeFlowStart": "1634864823",
  "TimeFlowEnd": "1634864823",
  "Bytes": "1498",
  "Packets": "1",
  "SrcAddr": "REDACTED",
  "DstAddr": "REDACTED",
  "EType": 2048,
  "Proto": 17,
  "SrcPort": REDACTED,
  "DstPort": REDACTED,
  "InIf": 542,
  "OutIf": 508,
  "SrcMac": "REDACTED",
  "DstMac": "REDACTED",
  "SrcVlan": 10,
  "DstVlan": 1,
  "VlanId": 0,
  "IngressVrfId": 0,
  "EgressVrfId": 0,
  "IPTos": 0,
  "ForwardingStatus": 0,
  "IPTTL": 63,
  "TCPFlags": 0,
  "IcmpType": 0,
  "IcmpCode": 0,
  "IPv6FlowLabel": 0,
  "FragmentId": 9561,
  "FragmentOffset": 0,
  "BiFlowDirection": 0,
  "SrcAS": 0,
  "DstAS": 0,
  "NextHop": "::",
  "NextHopAS": 0,
  "SrcNet": 0,
  "DstNet": 0
}
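The zero-ASN fallback suggested above could look roughly like the following sketch. The prefix table is a hypothetical stand-in for the risinfo dict, filled with documentation prefixes and ASNs rather than real routing data, and the function names are illustrative only.

```python
import ipaddress

# Hypothetical stand-in for the risinfo dictionary: prefix -> origin ASN.
RIS_PREFIXES = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
    ipaddress.ip_network("198.51.100.128/25"): 64498,
}


def lookup_asn(addr: str) -> int:
    """Return the origin ASN of the longest matching prefix, or 0 if none matches."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in RIS_PREFIXES if ip in net]
    if not matches:
        return 0
    longest = max(matches, key=lambda net: net.prefixlen)
    return RIS_PREFIXES[longest]


def effective_asn(exported_asn: int, addr: str) -> int:
    """Keep the ASN exported by the router unless it is 0, then fall back to the lookup."""
    return exported_asn if exported_asn != 0 else lookup_asn(addr)
```

For a flow like the sFlow dump above, `effective_asn(flow["SrcAS"], flow["SrcAddr"])` would replace the 0 with whatever the prefix table knows about the address.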

@leoluk
Member

leoluk commented Oct 27, 2021

Thanks for debugging!

So an easy solution would be to match FlowDirection != 1 instead of FlowDirection == 0 and vice versa.

Happy to implement this - if I understood the linked article correctly, this would result in correct-ish data by counting all ingress/egress traffic in both tables, right?

I suppose we could also implement #60 and use interface IDs to figure out the flow direction instead, which would also solve the problem with sFlow where no FlowDirection is included.

We already do something similar for the other graphs: if(FlowDirection == 1, 'out', 'in') AS FlowDirection

But even then it should display something on the SrcASN Graph, right? Well, just Reserved-ASN 0.

It definitely should do exactly that, this is what it looks like on one of my sFlow samplers:

image

The solution here is to fill in ASN data using the risinfo dict at capture time to have proper historic data - this is already on the short-term backlog and shouldn't be hard to do.

@vidister
Author

vidister commented Nov 5, 2021

Happy to implement this - if I understood the linked article correctly, this would result in correct-ish data by counting all ingress/egress traffic in both tables, right?

this is correct.

We already do something similar for the other graphs: if(FlowDirection == 1, 'out', 'in') AS FlowDirection

Yes, in that case every flow is currently labeled as "in", which can be a bit confusing.
What should we do with these? We could add a third string, "Unknown", but I think that could also be confusing in the dashboard. Maybe leaving the string empty would do the job? It's still a bit messy when mixed sources with differently broken flow export implementations are involved, but it's probably as good as it gets...
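As a sketch of the empty-string option, here is a three-way label in Python. It assumes 0 and 1 are the only trustworthy values; `direction_label` is a hypothetical name, not an existing NetMeta function.

```python
def direction_label(flow_direction: int) -> str:
    """Label a flow as 'in' or 'out', or leave the label empty if unknown."""
    if flow_direction == 0:
        return "in"
    if flow_direction == 1:
        return "out"
    # Anything else (e.g. 255 from Junos IPFIX) is treated as unknown.
    return ""
```

This only helps samplers that export an explicit unknown value. The sFlow setup above reports 0 for every flow, so it would still be labeled "in" throughout.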

@leoluk
Member

leoluk commented Nov 5, 2021

Sounds to me like the "correct" solution is to fix up the data at ingestion time. Your data seems to have correct InIf and OutIf values, so presumably one could deduce the correct flow direction from those?

@vidister
Author

vidister commented Nov 5, 2021

But how do you know whether an interface is an edge port or some internal/backbone interface? Set a flag in the interfaceMap?

@leoluk
Member

leoluk commented Nov 5, 2021

Yup, that was the idea - just have a map of all interfaces and which way they're facing, possibly determined from Netbox and/or SNMP. Does that sound workable?
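A rough sketch of what such a map could enable, assuming each (sampler, ifIndex) pair is tagged as "external" or "internal". The map layout, the sampler address, and which interface faces outward are all assumptions made for illustration; only the ifIndex values 528 and 552 come from the IPFIX dump above.

```python
from typing import Optional

# Hypothetical interface map: (sampler address, ifIndex) -> facing.
INTERFACE_FACING = {
    ("192.0.2.1", 528): "external",
    ("192.0.2.1", 552): "internal",
    ("192.0.2.1", 600): "external",
}


def infer_direction(sampler: str, in_if: int, out_if: int) -> Optional[str]:
    """Derive 'in'/'out' from interface facing; None if it cannot be decided."""
    in_facing = INTERFACE_FACING.get((sampler, in_if))
    out_facing = INTERFACE_FACING.get((sampler, out_if))
    if in_facing == "external" and out_facing == "internal":
        return "in"
    if in_facing == "internal" and out_facing == "external":
        return "out"
    # Transit (external to external), internal-only, or unmapped interfaces.
    return None
```

Flows that enter and leave on external interfaces stay undecided here, which is one reason the map alone may not be enough for transit networks.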

@vidister
Author

vidister commented Nov 5, 2021

Not nice, but probably the best solution.

@leoluk
Member

leoluk commented Nov 5, 2021

Yeah... I don't think we can avoid it unless the device tells us the physical flow direction, which it doesn't want to...

How about having a list of "local" CIDR ranges and using that to determine direction? AS won't work but IPs might.
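The CIDR idea can be sketched with Python's standard `ipaddress` module. The list of local ranges below uses documentation prefixes and is purely hypothetical, as are the function names.

```python
import ipaddress
from typing import Optional

# Hypothetical list of "local" ranges; a real deployment would configure its own.
LOCAL_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_local(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    # Membership tests across IP versions simply evaluate to False.
    return any(ip in net for net in LOCAL_NETWORKS)


def direction_from_addresses(src: str, dst: str) -> Optional[str]:
    """Return 'in' or 'out' based on which end is local; None if ambiguous."""
    src_local, dst_local = is_local(src), is_local(dst)
    if dst_local and not src_local:
        return "in"
    if src_local and not dst_local:
        return "out"
    # Both local (internal traffic) or neither local (transit).
    return None
```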

@vidister
Author

vidister commented Nov 6, 2021

This would work fine for hosting-provider-like networks, but not so well for ISP networks with downstream ASNs (like ours).
I think FastNetMon solves this by receiving the routes via BGP... which seems a bit overkill here.

@leoluk
Member

leoluk commented Nov 6, 2021

Hmm...we could do that! No half measures 😆

What would that look like - use BGP/BMP to figure out local networks?

@vidister
Author

vidister commented Nov 7, 2021

Yup. BGP/BMP integration could become useful anyway. We could work with BGP communities.

@leoluk
Member

leoluk commented Nov 7, 2021

Okay, let's do that - sounds like the "correct" solution.

(I was meaning to add BMP support anyway, to get the AS path and avoid the risinfo trick)

We could work with BGP communities.

i.e. have a config setting which communities mark "local" networks?

@vidister
Author

vidister commented Nov 7, 2021

i.e. have a config setting which communities mark "local" networks?

Yes. And we can generalize this to use other communities for filtering as well, for example customer networks, region/city communities, etc.
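One way the community-based tagging might be sketched, assuming routes and their communities have already been learned via BGP/BMP. The config mapping, route table, community values, and `tags_for` helper are all hypothetical and use documentation prefixes and ASNs.

```python
import ipaddress

# Hypothetical config: which communities map to which tag.
COMMUNITY_TAGS = {
    "64496:100": "local",
    "64496:200": "customer",
}

# Hypothetical routes as they might be learned via BGP/BMP: (prefix, communities).
ROUTES = [
    (ipaddress.ip_network("192.0.2.0/24"), ["64496:100"]),
    (ipaddress.ip_network("198.51.100.0/24"), ["64496:200", "64496:310"]),
    (ipaddress.ip_network("203.0.113.0/24"), []),
]


def tags_for(addr: str) -> set:
    """Return the configured tags of the longest matching route for addr."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, comms) for net, comms in ROUTES if ip in net]
    if not matches:
        return set()
    _, communities = max(matches, key=lambda m: m[0].prefixlen)
    # Communities without a configured tag (e.g. 64496:310) are ignored.
    return {COMMUNITY_TAGS[c] for c in communities if c in COMMUNITY_TAGS}
```

A flow could then be treated as inbound when `"local" in tags_for(dst)` but not `"local" in tags_for(src)`, and the same tags could drive the customer or region filters mentioned above.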

@fionera
Collaborator

fionera commented May 10, 2022

The ASN lookup is now fixed on the development branch by #87 / #88. Can the issue be closed then, or do we still have the FlowDirection issue?

@leoluk
Member

leoluk commented May 10, 2022

This is now implemented, thanks again :)

@leoluk leoluk closed this as completed May 10, 2022
@fionera
Collaborator

fionera commented Jul 22, 2022

The FlowDirection issue is still open :/

@fionera fionera reopened this Jul 22, 2022
@fionera fionera added the bug Something isn't working label Dec 11, 2022
@fionera
Collaborator

fionera commented Dec 11, 2022

The FlowDirection issue also partly affects portmirror deployments, because the flow data that gets ingested only has one InIf/OutIf set. Because of this, all graphs display the sum of all traffic from the other direction. We should probably find a way to infer the direction ourselves, since that is the broader solution to issues like this.

3 participants