Distributed denial of service attack detection using machine learning

KolanHarsha/DDos-detection-Using-Machine-Learning

DDos-detection-Using-Machine-Learning

What is DDoS (Distributed Denial of Service)?

A Distributed Denial of Service (DDoS) attack is a malicious attempt to disrupt the regular functioning of a network, service, website, or online resource by overwhelming it with a flood of internet traffic. In a DDoS attack, multiple compromised computers or devices (collectively referred to as a "botnet") are coordinated to send an excessive volume of requests or traffic to the target, making it difficult or impossible for legitimate users to access the targeted resource. DDoS attacks usually target layer 7 (application layer), layer 4 (transport layer), or layer 3 (network layer) of the OSI model. In this work we try to detect layer-7 DDoS attacks using machine-learning algorithms (Random Forests and Gradient Boosting).

What is a Botnet?

A botnet is a network of compromised computers or devices controlled by a single entity, often a cybercriminal or hacker, without the owners' knowledge. These compromised devices, referred to as "bots" or "zombies," can be infected with malware, allowing the attacker to commandeer them remotely. In DDoS attacks, botnets are used to amplify and distribute attack traffic. The attacker instructs the bots to simultaneously send a flood of requests to the target, overwhelming its resources. Since botnets can consist of thousands or even millions of devices, they generate a massive volume of traffic, making it difficult for the target to distinguish legitimate requests from the malicious ones.

Different types of DDoS attacks in layer 7

Slowloris attack

A Slowloris attack is a type of DDoS attack that targets web servers. It works by opening multiple connections to the server and sending partial HTTP requests, keeping them open by sending data very slowly. This ties up server resources, preventing new connections and legitimate requests. Slowloris needs very little bandwidth and does not require a large number of attacking machines, so its traffic is hard to tell apart from slow but legitimate clients. It focuses on resource exhaustion, causing the server to become slow or unresponsive.

HTTP GET/POST flood attack

An HTTP GET/POST flood attack is a type of DDoS attack that targets web servers. Attackers send a massive number of GET or POST requests to overwhelm the server's capacity. GET requests retrieve data, while POST requests send data to the server, both tying up server resources. This flood of requests can slow down or crash the server, making the targeted website or application inaccessible.

Dataset description

The dataset comes in two versions, a balanced one and an imbalanced one, both with 84 features. The balanced dataset has 50% benign flows and 50% DDoS flows. The main goal of this work is to detect application-layer DDoS attacks, where attack traffic makes up a smaller proportion of the traffic than benign flows; hence the imbalanced dataset is used, which has 83% benign flows and 17% DDoS flows. The imbalanced dataset contains 6,321,980 benign flows and 1,294,529 DDoS flows.
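
As a quick sanity check, the class proportions quoted above can be recomputed from the two flow counts. This is a minimal sketch; the function name `class_shares` is only illustrative and is not taken from the notebook.

```python
# Flow counts of the imbalanced dataset, as stated in the description above.
benign_flows = 6_321_980
ddos_flows = 1_294_529


def class_shares(counts):
    """Return each class's fraction of the total number of flows."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = class_shares({"benign": benign_flows, "ddos": ddos_flows})
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

The two shares come out at roughly 83% and 17%, matching the description.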

Tools

Jupyter Notebook, Python, Azure

How to run the Notebook

Install necessary packages

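ipaddress (part of the Python standard library since Python 3.3, so this step is only needed on older interpreters):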

pip install ipaddress

Numpy:

pip install numpy

Pandas:

pip install pandas

Matplotlib:

pip install matplotlib

Seaborn:

pip install seaborn

Scikit-learn:

pip install scikit-learn
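
After installing, it can help to confirm that every package is importable before opening the notebook. The following is a small sketch using only the standard library; the helper name `check_packages` is an assumption for illustration, and note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names of the packages listed above (scikit-learn is imported as "sklearn").
REQUIRED = ["ipaddress", "numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def check_packages(names):
    """Return {import_name: True/False} depending on whether the module can be found."""
    return {name: find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, ok in status.items():
    print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

Any package reported as MISSING can be installed with the corresponding `pip install` command above.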

Running on Azure Cloud Platform

  1. Go to the Azure Machine Learning platform and launch Azure Machine Learning studio.
  2. Once the studio is launched, go to the compute section and choose a compute instance. The compute instance I chose has the following specifications: Standard_E8s_v3 (8 cores, 64 GB RAM, 128 GB disk).
  3. After the compute instance is created, launch the Jupyter notebook, which can also be found in the compute section.
  4. Install the packages listed above and run the "Ddos.pynb" file.

Methodology

Random-Forests:

  1. A Random Forest is an ensemble model that builds on the concepts of bagging and decision trees.
  2. It creates multiple decision trees, typically considering only a random subset of the features at each split.
  3. Each tree is trained on a bootstrapped sample of the training data.
  4. During prediction, the Random Forest combines the outputs of all the trees, typically using majority voting for classification and averaging for regression.
  5. The key to the success of Random Forests is the diversity introduced by combining bootstrapped samples with random feature subsets, which makes the individual trees less correlated with each other.
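
The mechanism in the list above can be sketched in plain Python. This is a deliberately simplified illustration, not the notebook's model: it uses one-split "stumps" instead of full decision trees, and the toy flows (requests per second, mean packet size) are made up. Because a stump has a single split, drawing one feature subset per stump is the same as drawing it per split.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap: sample with replacement
        feature_ids = rng.sample(range(d), n_features)   # random subset of the features
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature_ids))
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Made-up toy flows: (requests per second, mean packet size in bytes).
rows = [(5, 900), (8, 850), (12, 800), (300, 120), (450, 100), (500, 90)]
labels = ["benign", "benign", "benign", "ddos", "ddos", "ddos"]

forest = fit_forest(rows, labels)
print(predict(forest, (1000, 50)), predict(forest, (3, 950)))
```

Individual stumps can be poor (for example when a bootstrap sample happens to contain only one class), but the majority vote across many differently trained stumps is what makes the ensemble robust.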

Hyper-parameter Tuning:

  1. The model is tuned with a grid search over the hyper-parameters, using 3-fold cross-validation to reduce the risk of over-fitting.
  2. Once the grid search is done, the best parameters are selected and the model is trained again using them.
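
A minimal sketch of this tuning procedure with scikit-learn is shown below. The synthetic data, the parameter grid, the `f1` scoring choice, and the split size are all assumptions made for illustration; the grid actually searched in the notebook is not stated here and may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the flow dataset (roughly 83% / 17% class split).
X, y = make_classification(n_samples=600, n_features=10, weights=[0.83, 0.17], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0
)

# Hypothetical grid; the values searched in the notebook may differ.
param_grid = {"n_estimators": [20, 50], "max_depth": [4, None]}

# Grid search with 3-fold cross-validation on the training split only.
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="f1", n_jobs=1
)
search.fit(X_train, y_train)

# Retrain on the full training split with the best parameters found.
best_model = RandomForestClassifier(random_state=0, **search.best_params_)
best_model.fit(X_train, y_train)
print(search.best_params_, best_model.score(X_val, y_val))
```

With the default `refit=True`, `GridSearchCV` already retrains the best configuration and exposes it as `search.best_estimator_`; the explicit retraining here simply mirrors the second step in the list.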

Evaluating the performance of the Model

Accuracy: The model reaches an accuracy of over 99.99% on the validation data, with only 4 misclassifications out of 2,497,689 flows.

Confusion-Matrix:

[Figure: confusion matrix]
  1. True Positives (TP): 427,005 (DDoS cases correctly predicted as DDoS)

  2. True Negatives (TN): 2,070,680 (Normal cases correctly predicted as Normal)

  3. False Positives (FP): 0 (Normal cases incorrectly predicted as DDoS)

  4. False Negatives (FN): 4 (DDoS cases incorrectly predicted as Normal)

  5. Precision: Precision is the ratio of correctly predicted DDoS cases to all cases predicted as DDoS.

    Precision = TP / (TP + FP) = 427,005 / (427,005 + 0) = 1.0

    The precision is 1.0, indicating that when the model predicts an instance as DDoS, it is always correct.

  6. Recall (Sensitivity): Recall is the ratio of correctly predicted DDoS cases to all actual DDoS cases.

    Recall = TP / (TP + FN) = 427,005 / (427,005 + 4) ≈ 0.99999

    The recall is very close to 1.0, indicating that the model effectively identifies nearly all of the actual DDoS cases.

  7. F1-Score: The F1-score is the harmonic mean of precision and recall and provides a balance between the two metrics.

    F1-Score = 2 * (Precision * Recall) / (Precision + Recall)

    F1-Score = 2 * (1.0 * 0.99999) / (1.0 + 0.99999) ≈ 0.999995

    The F1-score is very close to 1.0, indicating that the model achieves a high balance between precision and recall.

Based on the precision, recall, and F1-score, the model performs exceptionally well for DDoS detection on the validation data. Its precision of 1.0 means that every flow it flagged as DDoS was indeed an attack, and its high recall indicates that it captures nearly all the actual DDoS cases.
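
The metrics above can be recomputed directly from the four confusion-matrix counts. This is a small sketch in plain Python; the function name `classification_metrics` is illustrative only.

```python
# Confusion-matrix counts reported above for the validation data.
tp, tn, fp, fn = 427_005, 2_070_680, 0, 4


def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, F1-score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


metrics = classification_metrics(tp, tn, fp, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

Printing six decimal places makes the small gap between recall and 1.0 visible, which rounding to four places would hide.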

ROC-Curve: The AUC (area under the ROC curve) measures how well a binary classifier separates the two classes across all decision thresholds. Here the AUC is close to 1.0, which indicates that the model classifies almost perfectly.

[Figure: ROC curve]
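
To make the AUC concrete: it equals the probability that a randomly chosen DDoS flow receives a higher score than a randomly chosen benign flow (ties count as one half). The sketch below computes it by comparing all such pairs; the labels and scores are made-up toy values, not outputs of the trained model.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = DDoS, 0 = benign; one benign flow (0.85) outranks one DDoS flow (0.80).
labels = [0, 0, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.90, 0.80, 0.85, 0.95]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of samples, so for a dataset of this size a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.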

Top-10 Features

[Figure: top 10 feature importances of the trained RF model]
  1. The importance is a score assigned to each feature: the higher the score, the larger the role the feature plays in the split decisions made while building the trees. The top 10 most important features returned by the trained RF model are shown in the figure above.

  2. "Src Ip", "Dst Ip", "Src Port", and "Dst Port" are among the top 10 features, which makes sense because these four are combined to determine the "Flow ID" that describes the entire flow.
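
A ranking like this can be obtained from a fitted scikit-learn forest through its `feature_importances_` attribute. The sketch below uses synthetic data and placeholder feature names (`feature_0` ... `feature_19`), which are assumptions for illustration; in the notebook the names would be the dataset's column names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with placeholder feature names (not the real dataset columns).
feature_names = [f"feature_{i}" for i in range(20)]
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Pair each feature with its impurity-based importance and sort in descending order.
ranked = sorted(zip(feature_names, model.feature_importances_), key=lambda p: p[1], reverse=True)
top10 = ranked[:10]
for name, score in top10:
    print(f"{name:12s} {score:.4f}")
```

The importances are normalized to sum to 1.0, so each score can be read as a feature's share of the total impurity reduction across the forest.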

Thanks for reading!