Add flow risks #151

Open
wants to merge 10 commits into
base: master

Conversation

stefanDeveloper

@stefanDeveloper stefanDeveloper commented Jan 5, 2023

Add nDPI flow risks

Description

I have added the nDPI flow risks. To do so, I adapted lib_engine.c and the Python wrapper accordingly.

Info:
I will update the documentation and tests if this general approach is alright.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

from nfstream import NFStreamer


my_streamer = NFStreamer(source="tests/pcaps/tls-esni-fuzzed.pcap",  # Offline pcap file.
                         # Keep L7 dissection enabled so that nDPI can report flow risks.
                         n_dissections=2,
                         statistical_analysis=True,
                         system_visibility_poll_ms=100,
                         system_visibility_mode=1)
                         
for flow in my_streamer:
    print(flow)  # print it.

Results in:

NFlow(id=0,
      expiration_id=0,
      src_ip=192.168.1.12,
      src_mac=28:37:37:00:6d:c8,
      src_oui=28:37:37,
      src_port=49897,
      dst_ip=104.22.71.197,
      dst_mac=10:13:31:f1:39:76,
      dst_oui=10:13:31,
      dst_port=443,
      protocol=6,
      ip_version=4,
      vlan_id=0,
      tunnel_id=0,
      bidirectional_first_seen_ms=1590680391590,
      bidirectional_last_seen_ms=1590680391590,
      bidirectional_duration_ms=0,
      bidirectional_packets=1,
      bidirectional_bytes=770,
      src2dst_first_seen_ms=1590680391590,
      src2dst_last_seen_ms=1590680391590,
      src2dst_duration_ms=0,
      src2dst_packets=1,
      src2dst_bytes=770,
      dst2src_first_seen_ms=0,
      dst2src_last_seen_ms=0,
      dst2src_duration_ms=0,
      dst2src_packets=0,
      dst2src_bytes=0,
      bidirectional_min_ps=770,
      bidirectional_mean_ps=770.0,
      bidirectional_stddev_ps=0.0,
      bidirectional_max_ps=770,
      src2dst_min_ps=770,
      src2dst_mean_ps=770.0,
      src2dst_stddev_ps=0.0,
      src2dst_max_ps=770,
      dst2src_min_ps=0,
      dst2src_mean_ps=0.0,
      dst2src_stddev_ps=0.0,
      dst2src_max_ps=0,
      bidirectional_min_piat_ms=0,
      bidirectional_mean_piat_ms=0.0,
      bidirectional_stddev_piat_ms=0.0,
      bidirectional_max_piat_ms=0,
      src2dst_min_piat_ms=0,
      src2dst_mean_piat_ms=0.0,
      src2dst_stddev_piat_ms=0.0,
      src2dst_max_piat_ms=0,
      dst2src_min_piat_ms=0,
      dst2src_mean_piat_ms=0.0,
      dst2src_stddev_piat_ms=0.0,
      dst2src_max_piat_ms=0,
      bidirectional_syn_packets=0,
      bidirectional_cwr_packets=0,
      bidirectional_ece_packets=0,
      bidirectional_urg_packets=0,
      bidirectional_ack_packets=1,
      bidirectional_psh_packets=1,
      bidirectional_rst_packets=0,
      bidirectional_fin_packets=0,
      src2dst_syn_packets=0,
      src2dst_cwr_packets=0,
      src2dst_ece_packets=0,
      src2dst_urg_packets=0,
      src2dst_ack_packets=1,
      src2dst_psh_packets=1,
      src2dst_rst_packets=0,
      src2dst_fin_packets=0,
      dst2src_syn_packets=0,
      dst2src_cwr_packets=0,
      dst2src_ece_packets=0,
      dst2src_urg_packets=0,
      dst2src_ack_packets=0,
      dst2src_psh_packets=0,
      dst2src_rst_packets=0,
      dst2src_fin_packets=0,
      application_name=TLS,
      application_category_name=Web,
      application_is_guessed=0,
      application_confidence=6,
      requested_server_name=,
      client_fingerprint=957015a0b1e2500d8777219893a09495,
      server_fingerprint=,
      user_agent=,
      content_type=,
      flow_risk={'Missing SNI TLS Extn': {'risk_severity': 'Medium', 'risk_score_total': 300, 'risk_score_client': 210, 'risk_score_server': 90}, 'Unidirectional Traffic': {'risk_severity': 'Low', 'risk_score_total': 500, 'risk_score_client': 430, 'risk_score_server': 70}})
NFlow(id=1,
      expiration_id=0,
      src_ip=192.168.1.12,
      src_mac=28:37:37:00:6d:c8,
      src_oui=28:37:37,
      src_port=49887,
      dst_ip=104.16.125.175,
      dst_mac=10:13:31:f1:39:76,
      dst_oui=10:13:31,
      dst_port=443,
      protocol=6,
      ip_version=4,
      vlan_id=0,
      tunnel_id=0,
      bidirectional_first_seen_ms=1590680387847,
      bidirectional_last_seen_ms=1590680387847,
      bidirectional_duration_ms=0,
      bidirectional_packets=1,
      bidirectional_bytes=770,
      src2dst_first_seen_ms=1590680387847,
      src2dst_last_seen_ms=1590680387847,
      src2dst_duration_ms=0,
      src2dst_packets=1,
      src2dst_bytes=770,
      dst2src_first_seen_ms=0,
      dst2src_last_seen_ms=0,
      dst2src_duration_ms=0,
      dst2src_packets=0,
      dst2src_bytes=0,
      bidirectional_min_ps=770,
      bidirectional_mean_ps=770.0,
      bidirectional_stddev_ps=0.0,
      bidirectional_max_ps=770,
      src2dst_min_ps=770,
      src2dst_mean_ps=770.0,
      src2dst_stddev_ps=0.0,
      src2dst_max_ps=770,
      dst2src_min_ps=0,
      dst2src_mean_ps=0.0,
      dst2src_stddev_ps=0.0,
      dst2src_max_ps=0,
      bidirectional_min_piat_ms=0,
      bidirectional_mean_piat_ms=0.0,
      bidirectional_stddev_piat_ms=0.0,
      bidirectional_max_piat_ms=0,
      src2dst_min_piat_ms=0,
      src2dst_mean_piat_ms=0.0,
      src2dst_stddev_piat_ms=0.0,
      src2dst_max_piat_ms=0,
      dst2src_min_piat_ms=0,
      dst2src_mean_piat_ms=0.0,
      dst2src_stddev_piat_ms=0.0,
      dst2src_max_piat_ms=0,
      bidirectional_syn_packets=0,
      bidirectional_cwr_packets=0,
      bidirectional_ece_packets=0,
      bidirectional_urg_packets=0,
      bidirectional_ack_packets=1,
      bidirectional_psh_packets=1,
      bidirectional_rst_packets=0,
      bidirectional_fin_packets=0,
      src2dst_syn_packets=0,
      src2dst_cwr_packets=0,
      src2dst_ece_packets=0,
      src2dst_urg_packets=0,
      src2dst_ack_packets=1,
      src2dst_psh_packets=1,
      src2dst_rst_packets=0,
      src2dst_fin_packets=0,
      dst2src_syn_packets=0,
      dst2src_cwr_packets=0,
      dst2src_ece_packets=0,
      dst2src_urg_packets=0,
      dst2src_ack_packets=0,
      dst2src_psh_packets=0,
      dst2src_rst_packets=0,
      dst2src_fin_packets=0,
      application_name=TLS,
      application_category_name=Web,
      application_is_guessed=0,
      application_confidence=6,
      requested_server_name=,
      client_fingerprint=957015a0b1e2500d8777219893a09495,
      server_fingerprint=,
      user_agent=,
      content_type=,
      flow_risk={'Unidirectional Traffic': {'risk_severity': 'Low', 'risk_score_total': 500, 'risk_score_client': 430, 'risk_score_server': 70}})
NFlow(id=2,
      expiration_id=0,
      src_ip=192.168.1.12,
      src_mac=28:37:37:00:6d:c8,
      src_oui=28:37:37,
      src_port=49886,
      dst_ip=104.27.129.77,
      dst_mac=10:13:31:f1:39:76,
      dst_oui=10:13:31,
      dst_port=443,
      protocol=6,
      ip_version=4,
      vlan_id=0,
      tunnel_id=0,
      bidirectional_first_seen_ms=1590680386576,
      bidirectional_last_seen_ms=1590680386576,
      bidirectional_duration_ms=0,
      bidirectional_packets=1,
      bidirectional_bytes=770,
      src2dst_first_seen_ms=1590680386576,
      src2dst_last_seen_ms=1590680386576,
      src2dst_duration_ms=0,
      src2dst_packets=1,
      src2dst_bytes=770,
      dst2src_first_seen_ms=0,
      dst2src_last_seen_ms=0,
      dst2src_duration_ms=0,
      dst2src_packets=0,
      dst2src_bytes=0,
      bidirectional_min_ps=770,
      bidirectional_mean_ps=770.0,
      bidirectional_stddev_ps=0.0,
      bidirectional_max_ps=770,
      src2dst_min_ps=770,
      src2dst_mean_ps=770.0,
      src2dst_stddev_ps=0.0,
      src2dst_max_ps=770,
      dst2src_min_ps=0,
      dst2src_mean_ps=0.0,
      dst2src_stddev_ps=0.0,
      dst2src_max_ps=0,
      bidirectional_min_piat_ms=0,
      bidirectional_mean_piat_ms=0.0,
      bidirectional_stddev_piat_ms=0.0,
      bidirectional_max_piat_ms=0,
      src2dst_min_piat_ms=0,
      src2dst_mean_piat_ms=0.0,
      src2dst_stddev_piat_ms=0.0,
      src2dst_max_piat_ms=0,
      dst2src_min_piat_ms=0,
      dst2src_mean_piat_ms=0.0,
      dst2src_stddev_piat_ms=0.0,
      dst2src_max_piat_ms=0,
      bidirectional_syn_packets=0,
      bidirectional_cwr_packets=0,
      bidirectional_ece_packets=0,
      bidirectional_urg_packets=0,
      bidirectional_ack_packets=1,
      bidirectional_psh_packets=1,
      bidirectional_rst_packets=0,
      bidirectional_fin_packets=0,
      src2dst_syn_packets=0,
      src2dst_cwr_packets=0,
      src2dst_ece_packets=0,
      src2dst_urg_packets=0,
      src2dst_ack_packets=1,
      src2dst_psh_packets=1,
      src2dst_rst_packets=0,
      src2dst_fin_packets=0,
      dst2src_syn_packets=0,
      dst2src_cwr_packets=0,
      dst2src_ece_packets=0,
      dst2src_urg_packets=0,
      dst2src_ack_packets=0,
      dst2src_psh_packets=0,
      dst2src_rst_packets=0,
      dst2src_fin_packets=0,
      application_name=TLS,
      application_category_name=Web,
      application_is_guessed=0,
      application_confidence=6,
      requested_server_name=,
      client_fingerprint=957015a0b1e2500d8777219893a09495,
      server_fingerprint=,
      user_agent=,
      content_type=,
      flow_risk={'Unidirectional Traffic': {'risk_severity': 'Low', 'risk_score_total': 500, 'risk_score_client': 430, 'risk_score_server': 70}})
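
As a rough illustration of how the `flow_risk` dictionary shown above could be consumed, the sketch below filters risks by severity. It works on a plain dict copied from the output for flow 0, so it does not need NFStream installed. The helper name `risks_at_or_above` and the severity ordering are assumptions made for this example, not part of NFStream or nDPI.

```python
# Severity ordering is an assumption for illustration only.
SEVERITY_ORDER = ["Low", "Medium", "High", "Severe"]


def risks_at_or_above(flow_risk, min_severity):
    """Return the sorted names of risks whose severity is >= min_severity."""
    threshold = SEVERITY_ORDER.index(min_severity)
    return sorted(
        name
        for name, details in flow_risk.items()
        if SEVERITY_ORDER.index(details["risk_severity"]) >= threshold
    )


# flow_risk value copied from flow id=0 in the output above.
example = {
    "Missing SNI TLS Extn": {"risk_severity": "Medium", "risk_score_total": 300,
                             "risk_score_client": 210, "risk_score_server": 90},
    "Unidirectional Traffic": {"risk_severity": "Low", "risk_score_total": 500,
                               "risk_score_client": 430, "risk_score_server": 70},
}

print(risks_at_or_above(example, "Medium"))  # ['Missing SNI TLS Extn']
```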

Test Configuration:

  • OS version: Linux nixos-work 5.15.85 #1-NixOS SMP Wed Dec 21 16:36:38 UTC 2022 x86_64 GNU/Linux
  • Python version: 3.9.10
  • Hardware: ThinkPad T14s (nothing special)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

@aouinizied
Member

@stefanDeveloper Thanks a lot for your valuable contribution. I will review it next week.

Thanks!
Zied

@smith558
Contributor

smith558 commented Feb 9, 2023

This looks like a promising feature.

Member

@aouinizied aouinizied left a comment


@stefanDeveloper Can you please make these changes? I'm planning to merge it before the next release.

Many Thanks!

nfstream/flow.py (outdated, resolved)
nfstream/flow.py (outdated, resolved)
@smith558
Contributor

smith558 commented Mar 1, 2023

@aouinizied @stefanDeveloper Should the flow risks perhaps be one-hot encoded rather than kept in the current format? That would be far more suitable for any ML use. In that case, there could be a streamer attribute to enable or disable the flow risk output.

The current format when exported to CSV is rather unusable.

@stefanDeveloper
Author

> @aouinizied @stefanDeveloper Should the flow risks be perhaps one hot encoded rather than in the current format? That would be way more suitable for any ML use. In that case, there could probably be a streamer attribute to enable/disable flow risks output.
>
> The current format when exported to CSV is rather unusable.

Representing the risk itself as one-hot encoded is a valid point.
How would you represent risk_score_client, risk_score_server, and risk_score_total as one-hot encoded?
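
One possible answer, sketched under assumptions: keep a 0/1 indicator column per risk name and put the three scores in separate numeric columns, filled with 0 when the risk is absent. The function `flatten_flow_risk` and the column naming scheme are hypothetical, and the list of known risks has to be fixed up front so that every row ends up with the same columns.

```python
def _slug(name):
    """Turn a risk name into a column-friendly token."""
    return name.lower().replace(" ", "_")


def flatten_flow_risk(flow_risk, known_risks):
    """Flatten one flow_risk dict into fixed-width, CSV-friendly columns."""
    row = {}
    for risk in known_risks:
        details = flow_risk.get(risk, {})
        prefix = "risk_" + _slug(risk)
        row[prefix] = 1 if risk in flow_risk else 0  # one-hot indicator
        for score in ("risk_score_total", "risk_score_client", "risk_score_server"):
            # e.g. risk_unidirectional_traffic_score_total
            row[prefix + "_" + score[len("risk_"):]] = details.get(score, 0)
    return row


# Hypothetical fixed vocabulary; a real one would list every risk of interest.
known = ["Missing SNI TLS Extn", "Unidirectional Traffic"]

# flow_risk value copied from flow id=1 in the PR description.
flow_1 = {"Unidirectional Traffic": {"risk_severity": "Low", "risk_score_total": 500,
                                     "risk_score_client": 430, "risk_score_server": 70}}

row = flatten_flow_risk(flow_1, known)
print(row)
```

With this layout, the indicator columns carry the one-hot part and the score columns stay ordinary numeric features, so the row width no longer depends on which risks a flow triggered.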

@aouinizied
Member

@stefanDeveloper One-hot encoding is part of post-processing. We do not have to deal with it at this point.
The idea is to fix what is missing in this PR and to make a new release with it.

@smith558
Contributor

smith558 commented Mar 15, 2023

> @stefanDeveloper One hot encoding is part of the post processing. We do not have to deal with it at this point. The idea is to fix what is missing in this PR and to make a new release with it.

That would normally be true. But because of how the flow_risk column dictionaries are currently built (no fixed format, varying keys and lengths, and arbitrary, even multi-level, nesting of further dictionaries), post-processing is extremely difficult, if not practically impossible. I built the version with flow risks from source, spent a whole week trying to post-process the output, and failed.

@drnpkr
Member

drnpkr commented Mar 15, 2023

Can you quantify how much overhead flow risk calculation brings to NFStream's basic functionality?

@aouinizied
Member

@smith558 I understand your point. Switching to a format where the key is the risk and the value is a pair of client/server scores will make this easier (but still imperfect).
As I said, the idea is to release a version with it. We can then implement a post-processing function and make it available to the community under utils or somewhere similar.

Zied
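
The format suggested above (risk name as key, pair of client/server scores as value) could be derived from the current nested output roughly as follows. This is only a sketch: `to_score_pairs` is a hypothetical helper, not an existing NFStream utility.

```python
def to_score_pairs(flow_risk):
    """Map each risk name to a (client_score, server_score) tuple."""
    return {
        name: (details["risk_score_client"], details["risk_score_server"])
        for name, details in flow_risk.items()
    }


# flow_risk value copied from flow id=0 in the PR description.
flow_0 = {
    "Missing SNI TLS Extn": {"risk_severity": "Medium", "risk_score_total": 300,
                             "risk_score_client": 210, "risk_score_server": 90},
    "Unidirectional Traffic": {"risk_severity": "Low", "risk_score_total": 500,
                               "risk_score_client": 430, "risk_score_server": 70},
}

pairs = to_score_pairs(flow_0)
print(pairs)
```

In the sample output from the PR description, each total equals the client score plus the server score (210 + 90 = 300, 430 + 70 = 500), so for those flows the total can be recomputed from the pair.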

@aouinizied
Member

@drnpkr This is not an easy question.

The overhead of a feature depends on several stages:

  • Computing the feature, whether it is simple arithmetic or an ML model inference call. Think of this as the computation cost.
  • Serializing it.
  • Transmitting it over IPC.
  • Deserializing it.

Today, NFStream relies on Pipe for IPC and on the default pickle for serialization.

Even a dummy feature stored in a complex object can add overhead purely through serialization.
The same applies to statistical analysis, where the computational cost is very low but serialization and deserialization still add overhead.

We can assess the overhead of risk analysis later by manually disabling all exports between the streamer and the meters and replacing each message with the same dummy flow export.
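
To give a feel for the serialization part only, here is a minimal sketch that compares the pickle round trip of a small flat dict with the same dict carrying a nested `flow_risk` value. The dicts are toy stand-ins for real flow objects, the helper `roundtrip` is hypothetical, and the sketch measures neither the Pipe transfer nor the nDPI computation, so the numbers it prints are not representative of NFStream itself.

```python
import pickle
import timeit

# Toy stand-ins for an exported flow; real flows carry many more fields.
flat_flow = {"id": 0, "bidirectional_packets": 1, "bidirectional_bytes": 770}
risky_flow = dict(
    flat_flow,
    flow_risk={
        "Missing SNI TLS Extn": {"risk_severity": "Medium", "risk_score_total": 300,
                                 "risk_score_client": 210, "risk_score_server": 90},
        "Unidirectional Traffic": {"risk_severity": "Low", "risk_score_total": 500,
                                   "risk_score_client": 430, "risk_score_server": 70},
    },
)


def roundtrip(obj):
    """Serialize and deserialize obj with the default pickle protocol."""
    return pickle.loads(pickle.dumps(obj))


for label, obj in (("without flow_risk", flat_flow), ("with flow_risk", risky_flow)):
    size = len(pickle.dumps(obj))
    seconds = timeit.timeit(lambda: roundtrip(obj), number=10_000)
    print(f"{label}: {size} bytes, {seconds:.4f} s per 10,000 round trips")
```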

@drnpkr
Member

drnpkr commented Mar 15, 2023

Not easy, indeed, but a relevant one. You might remember that we observed a performance drop at higher speeds.

Having new features is fine, but we should be aware of their costs.

Nonetheless, I agree that this can be evaluated at some later point.

A note should be added somewhere that this may limit performance, at least until it has been tested.

@stefanDeveloper
Author

Any idea why the pipeline for Windows is failing?

@stefanDeveloper
Author

Have you had time to review the changes I made to address your change requests?

Member

@aouinizied aouinizied left a comment


Could you please add a test for it?

@stefanDeveloper
Author

@aouinizied I've rebased my branch to resolve the merge conflicts and added a test.
Could you please have another look? I'm looking forward to integrating this feature!

4 participants