Edi-Codutti/Cybersec-Report-Analysis-Tools

Cybersec Analysis Tools

Brief introduction

This project is composed of four Python scripts, each with its own function:

  1. build_dictionary is a helper script that generates files used by the other scripts
  2. report_analyzer analyzes a report (webpage, PDF or .txt file) and builds a MITRE ATT&CK Navigator layer containing the mentioned techniques
  3. scraper, as the name says, analyzes the content of CISA Cybersecurity Alerts and Advisories and creates or updates a CSV file with various information about every report on that website
  4. db_analyzer is a tool to analyze a CSV file generated by scraper

Required libraries and Python versions

This project was built with Python 3.10.12, so I recommend using this version or a higher one.

For the required libraries, see the requirements.txt file.

You can install all the libraries by executing the following command:

python3 -m pip install -r requirements.txt

Basic Usage

Creating the necessary files

First, create the necessary files using: python3 build_dictionary.py.

This will generate a folder called TTLists containing CSV files that list all the tactics or techniques of a given matrix.

You now have two options:

Option 1: Generate a MITRE ATT&CK Navigator layer from a report

You can analyze a report using the report_analyzer script. The fastest way is to do the following:

python3 report_analyzer.py -i [report to analyze] -m [matrix] -t [techniques CSV file to use] -o [output file name]
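The layer produced by this command is an ATT&CK Navigator JSON document. As a rough illustration of what such a layer looks like (a hypothetical helper using the standard Navigator fields, not the script's actual code), a minimal layer can be assembled like this:

```python
import json

def make_layer(name, domain, technique_ids, color="#ff6666"):
    """Build a minimal ATT&CK Navigator layer dict.

    `domain` is e.g. "enterprise-attack"; `technique_ids` are the
    technique IDs found in the report.
    """
    return {
        "name": name,
        "versions": {"layer": "4.5"},
        "domain": domain,
        "techniques": [
            {"techniqueID": tid, "color": color, "enabled": True}
            for tid in technique_ids
        ],
    }

layer = make_layer("layer", "enterprise-attack", ["T1059", "T1566"])
print(json.dumps(layer, indent=2))
```

The resulting JSON can be pasted or uploaded into the ATT&CK Navigator to highlight the listed techniques.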

Option 2: Scraping and analysis

You can scrape the content of the home page of CISA Cybersecurity Alerts and Advisories using the following command:

python3 scraper.py -t id

This will generate a file called db.csv containing some useful information from the CISA reports at the specified page.

You can then see the most frequent tactics and techniques mentioned across all the reports by issuing the following command:

python3 db_analyzer.py -i db.csv

More detailed script usage

build_dictionary.py

This script gathers the MITRE ATT&CK STIX data in order to fetch tactic and/or technique IDs and names and puts them in CSV files, all contained in a folder called TTLists.

A MITRE matrix must be specified; the generated file is called "[matrix_type]_tactics.csv" or "[matrix_type]_techniques.csv", where [matrix_type] is one of "enterprise", "mobile" or "ics".
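In the ATT&CK STIX bundles, techniques are attack-pattern objects whose ATT&CK ID lives in an external_references entry with source_name "mitre-attack". A simplified sketch of the extraction step (hypothetical helper, not the script's actual code), fed with an inlined STIX-like sample:

```python
import csv
import io

def extract_techniques(stix_objects):
    """Yield (ID, Name) pairs from STIX attack-pattern objects."""
    for obj in stix_objects:
        if obj.get("type") != "attack-pattern":
            continue
        for ref in obj.get("external_references", []):
            if ref.get("source_name") == "mitre-attack":
                yield ref["external_id"], obj["name"]
                break

# Inlined sample standing in for the downloaded STIX bundle.
sample = [
    {
        "type": "attack-pattern",
        "name": "Phishing",
        "external_references": [
            {"source_name": "mitre-attack", "external_id": "T1566"}
        ],
    },
    {"type": "relationship"},  # non-technique objects are skipped
]

buf = io.StringIO()
csv.writer(buf).writerows(extract_techniques(sample))
print(buf.getvalue().strip())  # → T1566,Phishing
```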

Here are the options you can use:

  • -m: Choose a matrix type among (e)nterprise, (m)obile, (i)cs or (a)ll. If 'all' is selected (the default), an extra file containing all techniques and/or all tactics from all matrices, called compendium_techniques.csv/compendium_tactics.csv, is created
  • -g: Choose whether to generate dictionaries for techniques (t), tactics (T) or (a)ll. 'All' is the default
  • -h, --help: Prints a help screen and exits

I recommend launching this script with the default option values before running the other scripts in this project.

report_analyzer.py

This script takes as input one or more reports (a webpage, a PDF or a .txt file) and builds a MITRE ATT&CK Navigator layer with all the mentioned techniques it can find.

Here are the options you can use:

  • -i: Input(s) of the program: a URL or a path to a PDF or .txt file; multiple inputs are separated by whitespace. If multiple reports are specified, a single layer containing all techniques from all reports is produced.
  • -m: Choose the layer matrix from enterprise (e), mobile (m) or ics (i)
  • -o: Specifies the output filename. If not specified, the output is printed on stdout
  • -l: Specifies the layer name. If not specified, the layer is called "layer"
  • -c: Specifies the color of the cells as a triplet of values from 0 to 255, separated by whitespace
  • -s: Search techniques in the text by technique ID (id), by technique name (name) or both (default)
  • -t: Specify a CSV file with the techniques to search in the report (compatible with the output files generated from build_dictionary)
  • -h, --help: Prints a help screen and exits
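Searching by technique ID (-s id) essentially amounts to pattern matching on ATT&CK identifiers, which look like T1059 or T1059.001 for sub-techniques, and then keeping only IDs that exist in the chosen techniques CSV. A hedged sketch of that idea (not the script's actual implementation):

```python
import re

# ATT&CK technique IDs: T + 4 digits, optional .NNN sub-technique suffix.
TECH_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")

def find_technique_ids(text, known_ids):
    """Return the known technique IDs mentioned in `text`, sorted."""
    return sorted(set(TECH_ID.findall(text)) & set(known_ids))

report = "The actor used T1566.002 for initial access and T1059 later."
print(find_technique_ids(report, {"T1059", "T1566.002", "T1021"}))
# → ['T1059', 'T1566.002']
```

Intersecting with the known-ID set filters out strings that merely look like technique IDs, which is why searching by id tends to be precise.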

scraper.py

This script is made to scrape the content of CISA Cybersecurity Alerts and Advisories. From this website, every report on every page is analyzed and the following information is collected:

  1. Report ID
  2. Report Title
  3. Creation/Last Update Date
  4. URL of the report
  5. MITRE matrices found
  6. Tactics belonging to the Enterprise matrix found
  7. Tactics belonging to the ICS matrix found
  8. Tactics belonging to the Mobile matrix found
  9. Techniques belonging to the Enterprise matrix found
  10. Techniques belonging to the ICS matrix found
  11. Techniques belonging to the Mobile matrix found
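Since one advisory can reference several matrices, the found IDs have to be split per matrix. One way to picture this is to check each ID against the per-matrix lists produced by build_dictionary; the sketch below inlines those lists as sets (in the real scripts they would come from the TTLists CSV files):

```python
def classify_by_matrix(found_ids, matrix_ids):
    """Group found technique IDs by the matrix whose list contains them.

    `matrix_ids` maps a matrix name to the set of IDs loaded from its
    TTLists CSV file.
    """
    return {
        matrix: sorted(i for i in found_ids if i in ids)
        for matrix, ids in matrix_ids.items()
    }

# Inlined stand-ins for e.g. TTLists/enterprise_techniques.csv.
matrix_ids = {
    "enterprise": {"T1059", "T1566"},
    "ics": {"T0886"},
    "mobile": {"T1406"},
}
print(classify_by_matrix({"T1059", "T0886"}, matrix_ids))
# → {'enterprise': ['T1059'], 'ics': ['T0886'], 'mobile': []}
```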

For every report, then, report_analyzer is used to produce:

  1. MITRE ATT&CK Navigator layer concerning all Enterprise techniques found
  2. MITRE ATT&CK Navigator layer concerning all ICS techniques found
  3. MITRE ATT&CK Navigator layer concerning all Mobile techniques found

Then, for every report, a row with all the information listed above is added to a CSV file called db.csv. If the file doesn't exist, it is created; otherwise it is updated. As a bonus, a folder called ScraperLayers containing all the layers generated by the invocations of report_analyzer is produced.
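The create-or-update behaviour can be pictured as: load any existing rows keyed by report ID, overwrite or add the new row, and write everything back. A simplified sketch (the field names and helper are assumptions, not the script's actual code):

```python
import csv
import os
import tempfile

def upsert_row(path, row, key="Report ID"):
    """Add `row` to the CSV at `path`, replacing any row with the same key."""
    rows = {}
    fields = list(row)
    if os.path.exists(path):
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            fields = reader.fieldnames or fields
            rows = {r[key]: r for r in reader}
    rows[row[key]] = row  # create or update the entry for this report
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows.values())

path = os.path.join(tempfile.mkdtemp(), "db.csv")
upsert_row(path, {"Report ID": "AA23-001A", "Report Title": "Example"})
upsert_row(path, {"Report ID": "AA23-001A", "Report Title": "Updated"})
```

Running the scraper twice on the same page therefore refreshes existing rows instead of duplicating them.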

The usable options are:

  • -t: Search techniques in the reports by technique ID (id), by technique name (name) or both (default)
  • -T: Search tactics in the reports by tactic ID (id), by tactic name (name) or both (default)
  • -h, --help: Prints a help screen and exits

I recommend setting the technique search to id and the tactic search to both for a good balance between precision and recall.

db_analyzer.py

This script is used to extract information from the db.csv file generated by the scraper. In particular, it can be used to count the number of occurrences of a given tactic or technique, either globally or for a specified year. The count is also shown as a percentage of the total number of reports considered.
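Conceptually, the counting boils down to: optionally filter the rows by year, count those mentioning the item, and divide by the number of rows considered. A sketch over an in-memory table (the column names and list separator are assumptions):

```python
def count_item(rows, item, year=None):
    """Count rows mentioning `item` and return (count, percentage).

    `rows` are dicts with a "Date" (YYYY-MM-DD) field and a
    "Techniques" field holding a semicolon-separated list of IDs.
    """
    if year is not None:
        rows = [r for r in rows if r["Date"].startswith(str(year))]
    hits = sum(item in r["Techniques"].split(";") for r in rows)
    pct = 100 * hits / len(rows) if rows else 0.0
    return hits, pct

db = [
    {"Date": "2023-05-01", "Techniques": "T1059;T1566"},
    {"Date": "2023-06-10", "Techniques": "T1059"},
    {"Date": "2022-01-15", "Techniques": "T1566"},
]
print(count_item(db, "T1059"))        # globally: 2 of 3 reports
print(count_item(db, "T1059", 2023))  # → (2, 100.0)
```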

In order for this script to work properly, you should first run build_dictionary to generate all the tactics and techniques files.

The available options are:

  • -i: Database input file in CSV format, result of the scraper
  • -y: Year to restrict the count to. If unspecified, the program counts globally
  • -t: Tactic or technique to count. If unspecified, all tactics and techniques are counted and sorted in descending order
  • -h, --help: Prints a help screen and exits

Note for Mac users

A warning message may appear when using build_dictionary.py, report_analyzer.py or scraper.py that says:

urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with [name of your library]. See: https://github.com/urllib3/urllib3/issues/3020

The various scripts should work anyway, but you can find discussions (and possible solutions) in the urllib3 issue linked above.