Bad values time_flow_start_ns, time_flow_end_ns when sending Newflow v9 and exporting them to Kafka #305

Open
doup123 opened this issue Apr 3, 2024 · 4 comments
Labels
bug Something isn't working


doup123 commented Apr 3, 2024

Describe the bug
The goflow2 collector decodes the time_flow_start_ns and time_flow_end_ns fields incorrectly. In my case, it spreads 10 seconds of data across half a day, and the resulting range does not even include the time range of the NetFlow data.

To Reproduce

  1. Data are sent via tcpreplay (data cannot be shared) using a packet capture of 10 seconds.
  2. netsampler/goflow2:latest is used
  3. Data are stored in our DB and plotted as illustrated below:
    image

Expected behavior
The results have been compared with both the Wireshark output and another tool that collects NetFlow data (pmacct). In both, our data fall within the 10 seconds of our packet capture.
Plot from pmacct data:
image

Sampler device:
Data are replayed via tcpreplay using a packet capture of NetFlow data.

GoFlow2:

  • Version: GoFlow2 v2.1.3-1-g1b390e4
  • Environment: Kubernetes
  • OS: Alpine Linux v3.19
doup123 added the bug (Something isn't working) label Apr 3, 2024
Member

lspgn commented Apr 3, 2024

Hello,
Thank you for reporting, but it will be difficult to troubleshoot without the actual capture.
Was the replay done at 18:52? Does pmacct discard the timestamp by any chance and just aggregate?

What is the original source of the capture that's being replayed?

What's the SQL query used to plot those graphs?

What are the data template fields used for timestamp?

When outputting JSON, what are the actual values of the timestamps and how do they compare to what's shown in Wireshark?

Could you configure -producer=raw and show me the values as well?

Thank you

Author

doup123 commented Apr 4, 2024

Hello @lspgn, and thank you for your rapid reply.
Unfortunately, I cannot share the pcap directly, since it contains sensitive data.
But I will try to answer your questions inline.

Was the replay done at 18:52? Does pmacct discard the timestamp by any chance and just aggregate?

The captured traffic contains NetFlow data spanning from Mar 13, 2024 16:52:19 (1710341539 -> first packet's reported timestamp in Unix seconds in Wireshark) to Mar 13, 2024 16:52:33 (1710341553 -> last packet's reported timestamp in Unix seconds in Wireshark).

What is the original source of the capture that's being replayed?

Those are production NetFlow v9 data sent to a collector.

What's the SQL query used to plot those graphs?

I am evaluating Druid, and I have selected the time_flow_start_ns that goflow2 generates as __time.

SELECT
  TIME_FLOOR(__time, 'PT1S') AS timem,
  SUM(packets) * 1000 AS total_packets
FROM "goflow2"
GROUP BY 1

What are the data template fields used for timestamp?

goflow2 -> time_flow_start_ns
pmacct -> timestamp_start

When outputting JSON, what are the actual values of the timestamps and how do they compare to what's shown in Wireshark?

I have selected a single flow to illustrate the behaviour in goflow2:

{"type":"NETFLOW_V9","time_received_ns":1712219044800653208,"sequence_num":112310449,"sampling_rate":0,"sampler_address":"XXXXXXXX","time_flow_start_ns":1710281495000000000,"time_flow_end_ns":1710331885000000000

The time_received_ns seems fine, as I assume it reports the time at which goflow2 received this flow.

What does not seem correct in my case is time_flow_start_ns and time_flow_end_ns. The difference between them is approximately 14 hours (50390 seconds), and they also report dates that are not correct.
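
To make the two values easier to read, here is a minimal Python sketch that converts them to UTC and computes the gap between them. The helper name ns_to_utc is only illustrative; the two constants are copied from the JSON record above.

```python
from datetime import datetime, timezone

# Values copied from the goflow2 JSON record above (Unix nanoseconds).
time_flow_start_ns = 1710281495000000000
time_flow_end_ns = 1710331885000000000

NS_PER_SECOND = 1_000_000_000


def ns_to_utc(ns: int) -> datetime:
    """Convert Unix nanoseconds to a timezone-aware UTC datetime (whole seconds)."""
    return datetime.fromtimestamp(ns // NS_PER_SECOND, tz=timezone.utc)


start_utc = ns_to_utc(time_flow_start_ns)
end_utc = ns_to_utc(time_flow_end_ns)
duration_s = (time_flow_end_ns - time_flow_start_ns) // NS_PER_SECOND

print("start:", start_utc.isoformat())
print("end:  ", end_utc.isoformat())
print("duration (s):", duration_s)
```

Both dates fall before the capture window (which starts at 1710341539), and the duration comes out as 50390 seconds, whereas Wireshark shows FIRST_SWITCHED and LAST_SWITCHED about 50.39 seconds apart.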

The NetFlow packet in Wireshark reports the following values:
SysUptime: 1047938.855000000
CurrentSecs: 1710341553
Flow StartTime (FIRST_SWITCHED): 1047878.797000000
Flow EndTime (LAST_SWITCHED): 1047929.187000000

In pmacct, I did the same experiment and the following data are reported:
"timestamp_start": "2024-03-13 14:51:33.000000", "timestamp_end": "2024-03-13 14:52:24.000000"
which seem correct assuming that timestamp_start = CurrentSecs - (SysUptime - Flow StartTime)
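
The formula above can be sketched in Python using the Wireshark values. This is only an illustration of the arithmetic, not the code of pmacct or goflow2; the function name flow_time_ns is made up, and I am assuming the uptime fields are millisecond counters (Wireshark displays them as seconds with a fractional part).

```python
# Wireshark values from the packet above, as integer milliseconds / seconds.
sys_uptime_ms = 1_047_938_855      # SysUptime 1047938.855 s
first_switched_ms = 1_047_878_797  # FIRST_SWITCHED 1047878.797 s
last_switched_ms = 1_047_929_187   # LAST_SWITCHED 1047929.187 s
current_secs = 1_710_341_553       # CurrentSecs (Unix seconds)


def flow_time_ns(current_secs: int, sys_uptime_ms: int, switched_ms: int) -> int:
    """Absolute flow time in Unix nanoseconds.

    Implements: CurrentSecs - (SysUptime - switched), with the uptime
    difference converted from milliseconds before it is subtracted.
    """
    age_ms = sys_uptime_ms - switched_ms  # how long before export the event occurred
    return current_secs * 1_000_000_000 - age_ms * 1_000_000


start_ns = flow_time_ns(current_secs, sys_uptime_ms, first_switched_ms)
end_ns = flow_time_ns(current_secs, sys_uptime_ms, last_switched_ms)

print(start_ns, end_ns, (end_ns - start_ns) / 1e9)
```

With these inputs the start lands within a second of the timestamp_start that pmacct reports, and the flow lasts 50.39 seconds. Working in integer milliseconds and nanoseconds avoids floating-point rounding.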

Could you configure -producer=raw and show me the values as well?

I could not pinpoint this flow in the raw output.

Member

lspgn commented Apr 4, 2024

I'll have a look at the code. It seems the (correct) calculation is

Current time - sys uptime + first_switched
1710341553 - 1047938 + 1047878 = 1710341493

where GoFlow2 returns

1710281495

(diff between the two: 59998 seconds)

Member

lspgn commented May 19, 2024

Could you check whether #325 solves your issue?

The buggy calculation was

1710341553 - 1047938855 +  1047878797

The correct calculation was

1710341553 - 1047938.855 +  1047878.797
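
For anyone who wants to check the numbers, here is a small Python sketch of the two calculations. It only reproduces the arithmetic quoted in this thread and is not the actual goflow2 code; the variable names are illustrative.

```python
# Values reported by Wireshark for the example flow in this issue.
current_secs = 1_710_341_553       # CurrentSecs, Unix seconds
sys_uptime_ms = 1_047_938_855      # SysUptime, milliseconds
first_switched_ms = 1_047_878_797  # FIRST_SWITCHED, milliseconds
last_switched_ms = 1_047_929_187   # LAST_SWITCHED, milliseconds

# Buggy: millisecond counters are mixed directly with a value in seconds.
buggy_start_s = current_secs - sys_uptime_ms + first_switched_ms
buggy_end_s = current_secs - sys_uptime_ms + last_switched_ms

# Correct: convert the millisecond counters to seconds first.
correct_start_s = current_secs - sys_uptime_ms / 1000 + first_switched_ms / 1000
correct_end_s = current_secs - sys_uptime_ms / 1000 + last_switched_ms / 1000

# Offset between the correct and buggy start, using whole-second uptimes
# as in the earlier comment.
whole_second_start = current_secs - sys_uptime_ms // 1000 + first_switched_ms // 1000
offset_s = whole_second_start - buggy_start_s

print(buggy_start_s, buggy_end_s)
print(correct_start_s, correct_end_s)
print(offset_s)
```

The buggy results match the reported time_flow_start_ns and time_flow_end_ns (divided by 10^9), and the offset matches the 59998 seconds noted earlier.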
