
Adding risk message validation++ #92

Open
wants to merge 9 commits into main

Conversation

cmcginley-splunk (Collaborator) commented Jan 4, 2024

Context

  • As part of PEX-363, we wanted to expand integration testing to validate risk message content
  • If invalid fields are referenced in the risk message, they silently fail to render within ES

Code changes

  • Added new RiskEvent model for the events returned by ES
  • Refined search criteria in CorrelationSearch to only search for events matching the appropriate search name
  • Risk validation expanded to check:
    • Risk message contains no $...$ literals
    • Risk message in detection matches risk message in risk event (regex conversion)
    • Risk score matches detection
    • Analytic stories match detection
    • MITRE ATT&CK IDs match detection
  • Added new NotableEvent model for the events returned by ES (will support additional notable validation, potentially added in the future)
  • Fixed a bug where we were initially sleeping for 60s twice (should save ~20h of cumulative compute time across 40 instances, about 30min of total runtime)
  • Refined cleanup logic
  • Refactored format_pbar_string such that it uses the start_time instance attribute if none is provided explicitly
  • Linted/formatted observable.py
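
The $...$ literal check and the detection-to-event message matching described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names and the exact wildcard conversion for $field$ tokens are assumptions.

```python
import re

# Hypothetical helpers illustrating the two risk message checks;
# names and the token pattern are assumptions, not the PR's real code.
FIELD_TOKEN = re.compile(r"\$[\w.]+\$")

def has_unrendered_fields(message: str) -> bool:
    """True if a rendered risk message still contains a $...$ literal,
    i.e. a field substitution silently failed in ES."""
    return FIELD_TOKEN.search(message) is not None

def message_template_to_regex(template: str) -> re.Pattern:
    """Convert a detection's risk message template into a regex by
    escaping the literal text and replacing each $field$ token with a
    wildcard, so it can be matched against the rendered risk event."""
    literal_parts = FIELD_TOKEN.split(template)
    return re.compile(".+".join(re.escape(p) for p in literal_parts))

# A rendered message should match its detection's template...
pattern = message_template_to_regex("User $user$ disabled service $service$")
assert pattern.fullmatch("User alice disabled service Defender")
# ...and a message with a failed substitution gets flagged
assert has_unrendered_fields("User $user$ disabled service Defender")
```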

New detection failures

I spot checked these but did not do a deep dive on every single one; I believe they are all legitimate validation issues.

  • 59 detections are failing this new validation (tested on v4.29.0)
    • 51 detections create risk events where the risk message contains a $...$ literal -> represents a bad field substitution, likely because the referenced field doesn't exist in final SPL output
    • 8 detections have mismatches between the analytic stories listed in the detection and what's observed in the risk event -> spot checking shows this is mostly due to typos/casing in analytic story names in the detection which fails to link the actual analytic story to the risk event in ES

NOTE: this testing was performed locally, and some detections failed due to networking issues (likely my ISP's bandwidth), so there may be more legitimate failures similar to the above not captured in this initial test

Testing

Will post some results from an SCA pipeline when that run completes

TODO

  • Disable extra logging in CorrelationSearch
  • Post results from SCA test pipelines

Future work

  • This PR also adds some commented out code which matches risk events to observables in the detection
  • This feature is still in progress as it does generate some false positives
  • That said, it did expose some legitimate issues (e.g. Attacker role observables creating risk events), so it will be a good addition once that validation logic has been refined to remove false positives.
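
The commented-out risk-event/observable matching could look roughly like the sketch below. The event and observable field names (risk_object_field, field, role) are assumptions about the ES schema, not the PR's actual code; it illustrates the kind of legitimate issue mentioned above (Attacker-role observables creating risk events).

```python
# Hypothetical sketch of matching risk events back to a detection's
# observables; schema field names are assumptions for illustration.
def validate_risk_objects(risk_events: list[dict],
                          observables: list[dict]) -> list[str]:
    """Return a list of validation problems found when checking each
    risk event's risk object against the detection's observables."""
    problems = []
    roles_by_field = {o["field"]: o["role"] for o in observables}
    for event in risk_events:
        field = event["risk_object_field"]
        role = roles_by_field.get(field)
        if role is None:
            # Risk event points at a field no observable declares
            problems.append(f"no observable found for risk object '{field}'")
        elif role == "Attacker":
            # Attacker observables are not expected to create risk events
            problems.append(f"Attacker observable '{field}' created a risk event")
    return problems

events = [{"risk_object_field": "user"}, {"risk_object_field": "src"}]
observables = [{"field": "user", "role": "Victim"},
               {"field": "src", "role": "Attacker"}]
print(validate_risk_objects(events, observables))
```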

@pyth0n1c pyth0n1c added the Draft label Feb 17, 2024
@cmcginley-splunk cmcginley-splunk changed the base branch from feature/add-integration-testing to main March 19, 2024 23:41
…ily with risk message checks and the scaffolding for forthcoming risk/observable matching
@@ -416,7 +412,7 @@ def test_detection(self, detection: Detection) -> None:
"""
# TODO: do we want to return a failure here if no test exists for a production detection?
cmcginley-splunk (Collaborator, Author):
@pyth0n1c there is a check elsewhere that handles the case where a production detection lacks test cases, correct?

Contributor:

Yes, contentctl validate fails today when a production detection does not have tests (verified by removing tests from a production detection).



  # Suppress logging by default; enable for local testing
- ENABLE_LOGGING = False
+ ENABLE_LOGGING = True
cmcginley-splunk (Collaborator, Author):

Will set this back to False after a final full test within an SCA pipeline

Contributor:

@cmcginley-splunk: Should we set this back to False now that we have run this a few times in the SCA repo?

…ttern matching; added docstrings and JIRA ticket IDs
@cmcginley-splunk cmcginley-splunk changed the title Feature/risk message validation Adding risk message validation++ Apr 8, 2024
@cmcginley-splunk cmcginley-splunk marked this pull request as ready for review April 8, 2024 23:39
@cmcginley-splunk cmcginley-splunk self-assigned this Apr 8, 2024
cmcginley-splunk (Collaborator, Author) commented:
Some detections seem to be generating false positives (test failures that are not actual failures); e.g.: Windows Excessive Disabled Services Event

Resolve conflicts associated with update to Pydantic2

3 participants