mohamedbenchikh/MDML

Malware Detection using machine learning (MDML)

Running it locally

streamlit run app.py

Author

Mohamed Benchikh

Analysis modules:

  • Static: Features are extracted from PE file headers (mainly the Optional Header), YARA rules, and the digital signature.

  • Dynamic: Features are the API calls traced using Cuckoo Sandbox.

Datasets construction

  • Static

Malware samples were acquired from MalwareBazaar, while benign samples were acquired from multiple online hosting websites (e.g. CNET). We then used the pefile module in Python to parse the PE headers and extract relevant features (chosen using benchmarks). We also used YARA rule matches, the digital signature, and packing as features.
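To give a feel for what "features from the Optional Header" means, here is a minimal sketch that reads a few header fields from raw PE bytes. It is not the project's extraction code: the project uses pefile, while this sketch uses only the standard `struct` module so that it runs without dependencies. The function name `optional_header_features` and the choice of fields are assumptions made for illustration.

```python
import struct


def optional_header_features(data: bytes) -> dict:
    """Read a few COFF / Optional Header fields from raw PE bytes.

    Illustrative only: the selected fields are an assumption, not the
    feature set used by the project.
    """
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")

    # e_lfanew (offset 0x3C in the DOS header) points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")

    # The 20-byte COFF file header follows the 4-byte signature.
    coff_offset = pe_offset + 4
    (machine, number_of_sections, _timestamp, _symtab, _nsyms,
     _opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, coff_offset)

    # The Optional Header follows the COFF header. The first 20 bytes have
    # the same layout in PE32 and PE32+ images.
    opt_offset = coff_offset + 20
    (magic, major_linker, minor_linker, size_of_code, size_of_init_data,
     size_of_uninit_data, entry_point) = struct.unpack_from("<HBBIIII", data, opt_offset)

    return {
        "Machine": machine,
        "NumberOfSections": number_of_sections,
        "Characteristics": characteristics,
        "Magic": magic,
        "MajorLinkerVersion": major_linker,
        "MinorLinkerVersion": minor_linker,
        "SizeOfCode": size_of_code,
        "SizeOfInitializedData": size_of_init_data,
        "SizeOfUninitializedData": size_of_uninit_data,
        "AddressOfEntryPoint": entry_point,
    }
```

With pefile, the same values are exposed as attributes such as `pe.OPTIONAL_HEADER.SizeOfCode` and `pe.FILE_HEADER.NumberOfSections` after loading a file with `pefile.PE(path)`.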

  • Dynamic

We adapted the APIMDS dataset from hksecurity, converting it from a dataset of API call sequences into a dataset of binary values over a predetermined set of features.
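The conversion described above could look roughly like the following sketch. It assumes that each binary value marks whether a given API call appears anywhere in a sample's trace. The README does not spell out the exact encoding, so treat this as one plausible reading. The function name `sequences_to_binary` and the API names in the usage example are hypothetical.

```python
def sequences_to_binary(sequences, feature_names):
    """Turn API call sequences into fixed-length 0/1 rows.

    Assumption: a feature is 1 if the API call occurs at least once in the
    sequence and 0 otherwise. Calls outside feature_names are ignored.
    """
    column = {name: i for i, name in enumerate(feature_names)}
    rows = []
    for sequence in sequences:
        row = [0] * len(feature_names)
        for call in sequence:
            i = column.get(call)
            if i is not None:
                row[i] = 1
        rows.append(row)
    return rows


# Usage with made-up traces and a made-up feature list.
features = ["CreateFileW", "RegSetValueExW", "VirtualAlloc"]
traces = [
    ["VirtualAlloc", "CreateFileW", "VirtualAlloc"],
    ["Sleep", "RegSetValueExW"],
]
rows = sequences_to_binary(traces, features)
```

Order and repetition of calls are discarded by this encoding, which is the trade-off of moving from sequences to a fixed set of binary columns.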

Algorithm used

We compared multiple algorithms using 10-fold stratified cross-validation and settled on the Extreme Gradient Boosting (XGBoost) classification algorithm, as it had the highest F1 score.
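As a rough illustration of this evaluation protocol, the sketch below implements stratified k-fold splitting and the F1 score in plain Python. It is not the project's evaluation code, and it does not train XGBoost: the `fit` and `predict` callables are hypothetical placeholders for whichever classifier is being compared. In practice a library such as scikit-learn provides `StratifiedKFold` for the same purpose.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Split sample indices into k folds with roughly equal class ratios."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    # Deal each class's samples round-robin so every fold gets its share.
    for indices in by_class.values():
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    return folds


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validated_f1(fit, predict, X, y, k=10):
    """Mean F1 over k stratified folds.

    fit(X_train, y_train) returns a model; predict(model, X_test) returns
    predicted labels. Both are supplied by the caller.
    """
    scores = []
    for held_out in stratified_folds(y, k):
        held_out_set = set(held_out)
        train = [i for i in range(len(y)) if i not in held_out_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict(model, [X[i] for i in held_out])
        scores.append(f1_score([y[i] for i in held_out], predictions))
    return sum(scores) / len(scores)
```

Running `cross_validated_f1` once per candidate algorithm and keeping the one with the highest mean F1 mirrors the comparison described above.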

Project interfaces

Static

Static interface

Dynamic

Dynamic interface