Show prevalence of rules in the output #1737

Aayush-Goel-04 · 2023-08-19T10:05:03Z

relates to #520

Checklist

No CHANGELOG update needed

No new tests needed

No documentation update needed

capa/render/default.py

Aayush-Goel-04 · 2023-08-19T10:51:32Z

We can add a note at the bottom explaining what "unknown" means.

mr-tz

Good start!
Do you have an idea on if/how to display this data for the other output modes (verbose, very verbose, and also JSON)?

assets/rules_prevalence.pickle

capa/render/default.py

try.py

Aayush-Goel-04 · 2023-08-27T10:11:33Z

We can also add a new prevalence field to RuleMetadata or RuleMatches.
We can store the prevalence (rare | common | unknown, the last when the rule is not found) directly while building the ResultDocument. That way, rendering needs only slight modifications to the JSON, -v, -vv, and default render modes.

capa/capa/render/result_document.py

Lines 559 to 575 in 9d21add

    class ResultDocument(FrozenModel):
        meta: Metadata
        rules: Dict[str, RuleMatches]

        @classmethod
        def from_capa(cls, meta: Metadata, rules: RuleSet, capabilities: MatchResults) -> "ResultDocument":
            rule_matches: Dict[str, RuleMatches] = {}
            for rule_name, matches in capabilities.items():
                rule = rules[rule_name]
                if rule.meta.get("capa/subscope-rule"):
                    continue
                rule_matches[rule_name] = RuleMatches(
                    meta=RuleMetadata.from_capa(rule),
                    source=rule.definition,
                    matches=tuple(

What are your thoughts @mr-tz

mr-tz · 2023-08-28T09:20:46Z

> We can also add a new prevalence field to RuleMetadata or RuleMatches.

That could work well if we find a place that requires few modifications and is flexible. I think we'd want to keep prevalence data and rule information separate (with a separate DB as you're proposing here).
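
To make the idea discussed above concrete, here is a minimal sketch of looking up prevalence from a separate mapping while building per-rule metadata. It is not the PR's code: `RuleMetadataSketch`, `build_metadata`, and `PREVALENCE_DB` are hypothetical names, the real RuleMetadata model has more fields, and the entries in the mapping are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Mapping

# Hypothetical stand-in for the separate prevalence database.
# The entries are illustrative, not real prevalence data.
PREVALENCE_DB: Dict[str, str] = {
    "receive data": "common",
    "hypothetical rare rule": "rare",
}


@dataclass(frozen=True)
class RuleMetadataSketch:
    """Simplified stand-in for RuleMetadata (assumption: only three fields)."""

    name: str
    namespace: str
    prevalence: str = "unknown"


def build_metadata(rule_meta: Mapping[str, str], db: Mapping[str, str]) -> RuleMetadataSketch:
    """Build metadata for one rule, falling back to "unknown" when the rule is not in the DB."""
    name = rule_meta.get("name", "")
    return RuleMetadataSketch(
        name=name,
        namespace=rule_meta.get("namespace", ""),
        prevalence=db.get(name, "unknown"),
    )


known = build_metadata({"name": "receive data", "namespace": "communication"}, PREVALENCE_DB)
missing = build_metadata({"name": "not in the database"}, PREVALENCE_DB)
```

Because the lookup happens once while the document is built, the renderers only need to read the stored field, and the rule files themselves stay untouched.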

Aayush-Goel-04 · 2023-09-06T14:49:19Z

For verbose mode, we can render it as follows:

receive data (2 matches)
namespace    communication
description  all known techniques for receiving data from a potential C2 server
prevalence   common
scope        function
matches      0x10003A13

We can do something similar for vverbose.
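
A rough sketch of how such a verbose block could be assembled, assuming the renderer already has the rule name, match count, and key/value rows in hand. `render_rule_verbose` is a hypothetical helper for illustration, not a function from capa's renderers.

```python
from typing import List, Sequence, Tuple


def render_rule_verbose(name: str, match_count: int, rows: Sequence[Tuple[str, str]]) -> str:
    """Render a rule header followed by aligned key/value rows."""
    # Pad every key to the width of the longest one so the values line up.
    width = max((len(key) for key, _ in rows), default=0)
    lines: List[str] = [f"{name} ({match_count} matches)"]
    for key, value in rows:
        lines.append(f"{key.ljust(width)}  {value}")
    return "\n".join(lines)


example = render_rule_verbose(
    "receive data",
    2,
    [
        ("namespace", "communication"),
        ("description", "all known techniques for receiving data from a potential C2 server"),
        ("prevalence", "common"),
        ("scope", "function"),
        ("matches", "0x10003A13"),
    ],
)
print(example)
```

Treating prevalence as just another row keeps the change to the verbose renderer small; a vverbose variant could reuse the same row.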

capa/render/default.py

capa/render/result_document.py

williballenthin · 2023-11-14T11:15:46Z

capa/render/result_document.py

+    CD = Path(__file__).resolve().parent.parent.parent
+    file = CD / "assets" / "rules_prevalence_data" / "rules_prevalence.json.gz"


use get_default_root()

capa/capa/main.py

Line 444 in 210a13d

def get_default_root() -> Path:

Aayush-Goel-04 · 2023-11-16

Using get_default_root works well locally, but it causes a circular import during the PyInstaller build.
@williballenthin I suggest moving such functions to capa.helpers.

mr-tz · 2023-11-17

moving it makes sense (see #1821 also)

Aayush-Goel-04 · 2023-11-26

@mr-tz I suggest we move ahead with proposal 3 in the above-mentioned PR.
We can move the functions below to a new capa.loader module, or alternatively to capa.helpers:

has_file_limitation, is_supported_format, is_supported_arch, get_arch, is_supported_os, get_os, is_running_standalone, get_default_root, get_default_signatures, get_workspace, get_extractor, get_file_extractors, get_signatures, get_sample_analysis, collect_metadata, compute_dynamic_layout, compute_static_layout, compute_layout

mr-tz · 2023-11-26

sounds good to me!

capa/render/result_document.py

williballenthin · 2023-11-14T11:22:11Z

capa/render/result_document.py

@@ -521,6 +544,7 @@ def from_capa(cls, rule: capa.rules.Rule) -> "RuleMetadata":
        return cls(
            name=rule.meta.get("name"),
            namespace=rule.meta.get("namespace"),
+            prevalence=load_rules_prevalence().get(rule.meta.get("name"), "unknown"),


is the rule prevalence database distributed with capa the library? i think it's important that people be able to use capa the library without maintaining this database. so perhaps we want to handle the case of the database not existing here?

Aayush-Goel-04 · 2023-11-16

If the database is not present, all rule matches will have prevalence "unknown" in the results.

mr-tz · 2023-11-17

maybe we can provide a warning if no db is found (in case that's not already there), pointing to one and briefly explaining what it does
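
The fallback-plus-warning behavior discussed in this thread could look roughly like the sketch below. It assumes a gzip-compressed JSON file that maps rule names to prevalence strings; `load_prevalence_or_empty` and the warning text are hypothetical and not taken from the PR.

```python
import gzip
import json
import logging
from pathlib import Path
from typing import Dict

logger = logging.getLogger(__name__)


def load_prevalence_or_empty(path: Path) -> Dict[str, str]:
    """Load the prevalence mapping, or return an empty mapping when the file is absent."""
    if not path.exists():
        # Library users may not ship the database, so warn and continue rather than fail.
        logger.warning(
            "rule prevalence database not found at %s; prevalence will be reported as 'unknown'",
            path,
        )
        return {}
    # Text mode ("rt") lets json read the decompressed stream directly.
    with gzip.open(path, "rt", encoding="utf-8") as gzfile:
        return json.load(gzfile)
```

With an empty mapping, a lookup such as `db.get(rule_name, "unknown")` yields "unknown" for every rule, which matches the behavior described above.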

mr-tz · 2023-11-17T10:02:53Z

capa/render/result_document.py

+    if not file.exists():
+        return {}
+    with gzip.open(file, "rb") as gzfile:
+        return json.loads(gzfile.read().decode("utf-8"))


while we're at it, is it worth defining a pydantic data model for the DB file/format?

williballenthin · 2023-11-17

looks like the format is dict[rule name, prevalence] which will be hard to represent in pydantic, unless we enumerate all the rule names as potential values. i think the type hint above is a good start. still, adding some comments here showing a snippet of the file would be valuable.
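
As an illustration of the format described here, the following sketch documents an assumed file snippet in a comment and checks a parsed mapping with plain Python. The allowed values come from the rare | common | unknown proposal earlier in the thread; `validate_prevalence_db` and the sample entries are hypothetical.

```python
from typing import Dict

# Values proposed earlier in this conversation.
ALLOWED_PREVALENCE = {"rare", "common", "unknown"}

# Assumed shape of the database after decompression and JSON parsing
# (entries are illustrative, not real data):
#
#   {
#       "receive data": "common",
#       "hypothetical rare rule": "rare"
#   }


def validate_prevalence_db(raw: object) -> Dict[str, str]:
    """Check that raw is a mapping of rule name to an allowed prevalence value."""
    if not isinstance(raw, dict):
        raise ValueError("prevalence database must be a mapping")
    for name, value in raw.items():
        if not isinstance(name, str):
            raise ValueError(f"rule name must be a string: {name!r}")
        if not isinstance(value, str) or value not in ALLOWED_PREVALENCE:
            raise ValueError(f"unexpected prevalence for {name!r}: {value!r}")
    return dict(raw)


checked = validate_prevalence_db({"receive data": "common", "hypothetical rare rule": "rare"})
```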

Aayush-Goel-04 · 2024-02-03T16:16:24Z

Apologies for disappearing there for a bit – college placement stuff got pretty intense.

Back to the PR: tests are failing in the PyInstaller build due to a circular import. It occurs when load_rules_prevalence in result_document uses get_default_root to locate the rules prevalence database. We have two options:

Move get_default_root to capa.helpers.
Move the prevalence database into Python files, as we did with the COM database.

What are your thoughts, @mr-tz @williballenthin?

mr-tz · 2024-02-05T09:47:47Z

I think moving to Python files analogous to the COM DB files sounds good.

There's been a bunch of changes recently on the API, so please ensure the PR is up to date with master.

Aayush-Goel-04 · 2024-02-05T16:45:35Z

I have converted the database to a Python file. Now we just need the actual prevalence values for the rules, and this will be good to go.
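
For readers unfamiliar with this approach, here is a hedged sketch of what converting a mapping into an importable Python file can look like. The variable name `RULES_PREVALENCE`, the helper `to_python_source`, and the sample entries are assumptions for illustration; the actual file layout in the PR may differ.

```python
import pprint
from typing import Dict


def to_python_source(db: Dict[str, str], variable: str = "RULES_PREVALENCE") -> str:
    """Emit Python source that defines the mapping as a module-level constant."""
    # pprint.pformat produces a valid dict literal with keys in sorted order.
    body = pprint.pformat(db)
    return f"# generated from the prevalence data; do not edit by hand\n{variable} = {body}\n"


sample_db = {"receive data": "common", "hypothetical rare rule": "rare"}
source = to_python_source(sample_db)

# Round-trip check: executing the generated source recreates the mapping.
namespace: dict = {}
exec(source, namespace)
restored = namespace["RULES_PREVALENCE"]
```

Once the data lives in a module, loading it is an ordinary import, so there is no file path to resolve at runtime.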

VascoSch92 · 2024-03-13T14:09:02Z

capa/render/result_document.py

+    """
+    Load and return a dictionary containing prevalence information for rules defined in capa.
+
+    Returns:


Suggested change

-    Returns:
+    Return:

Aayush-Goel-04 added 12 commits July 29, 2023 11:41

Entropy Methods

7603f85

Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520

f5b38d5

Sort rules in render based on match probability

bf1f59b

Rendering rules into two sections. * for interesting rules.

31bd6b3

Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520

9ca4f9d

Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520

78877f2

update

f5f3e87

Update default.py Update default.py update color format Update default.py

Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520

a6797de

Update default.py

0b5a326

Merge branch 'Aayush-Goel-04/Issue#520' of https://github.com/Aayush-…

def2d98

…Goel-04/capa into Aayush-Goel-04/Issue#520

Update utils.py

039fdbd

Update default.py Update CHANGELOG.md

Merge branch 'master' into Aayush-Goel-04/Issue#520

8a0e61b

Aayush-Goel-04 commented Aug 19, 2023

capa/render/default.py

mr-tz reviewed Aug 23, 2023

assets/rules_prevalence.pickle

capa/render/default.py

try.py

Update default.py

f6058b1

Delete try.py, rules_prevalence.pickle

Aayush-Goel-04 force-pushed the Aayush-Goel-04/Issue#520 branch from cdcd32f to f6058b1 August 27, 2023 10:57

Merge branch 'master' into Aayush-Goel-04/Issue#520

dc399c3

Aayush-Goel-04 force-pushed the Aayush-Goel-04/Issue#520 branch from 26e4096 to 712cee3 August 27, 2023 11:15

prevalence db update

c5302cd

Aayush-Goel-04 force-pushed the Aayush-Goel-04/Issue#520 branch from 712cee3 to c5302cd August 27, 2023 11:36

Update default.py

430bde6

mr-tz reviewed Aug 28, 2023

capa/render/default.py

Update capa/render/default.py

7f1566d

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

Aayush-Goel-04 added 2 commits September 6, 2023 16:35

Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520

24541b6

updated default render

6787555

Update utils.py

7c84926

mr-tz reviewed Nov 12, 2023

capa/render/default.py

williballenthin reviewed Nov 14, 2023

capa/render/result_document.py

Aayush-Goel-04 added 2 commits November 16, 2023 14:54

Imports, Paths, Comments & Exceptions handled

5102ca1

Update result_document.py

07553a6

mr-tz reviewed Nov 17, 2023

Update result_document.py

2c4931d

Comments on loading rules_prevalence and warning if file not found

mr-tz added the "dont merge" label (indicates a PR that is still being worked on) Jan 31, 2024

Aayush-Goel-04 added 5 commits February 3, 2024 19:08

Merge branch 'master' into Aayush-Goel-04/Issue#520

c531a15

Added prevalence to verbose

61e7459

linter checks

66d0ab7

Revert "linter checks"

e3ca32b

This reverts commit 66d0ab7.

Update result_document.py

f084040

Aayush-Goel-04 added 3 commits February 5, 2024 18:38

Merge branch 'master' into Aayush-Goel-04/Issue#520

b07d600

Convert database to python files

10d2140

Lint checks

9bebffc

Aayush-Goel-04 requested review from williballenthin and mr-tz February 5, 2024 16:44

Aayush-Goel-04 and others added 4 commits February 25, 2024 06:10

Delete rules_prevalence.json.gz

fa89f44

Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520

d93f135

Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520

08ea4a9

Merge branch 'master' into Aayush-Goel-04/Issue#520

7992b1b

VascoSch92 reviewed Mar 13, 2024
