Web Vulnerability Detection with Deep Learning

This is a detection method that combines a Convolutional Neural Network (CNN) with a member of the Recurrent Neural Network (RNN) family. The model analyzes features and relationships in user requests and predicts whether each request is an attack attempt or not.

Model Architecture

This is a compact architecture with two channels. For channel A, I use three layer types: Conv1D, MaxPooling, and GlobalMaxPooling. For channel B, I use two layers from the RNN family (RNN, LSTM, or GRU). For extremely large datasets, the model can be scaled up with more channels and more layers to match the size of the dataset.


Vulnerability Detection

  • Cross-Site Scripting
  • SQL Injection
  • Path Traversal (LFI)
  • Command Injection
  • Remote File Inclusion (RFI)
  • JSON & XML Injection
  • HTML5 Injection
  • Server Side Includes (SSI) Injection


The dataset is split 70:30 into training and test sets. On the 70% training portion, I use k-fold cross-validation with k=5 to train the model.
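The split described above can be sketched with index lists only. This is a minimal illustration, assuming a plain shuffled (non-stratified) split and a fixed seed; the function names and the sample count of 1000 are hypothetical, and the project may well use a library helper instead.

```python
import random


def train_test_split_indices(n_samples, test_ratio=0.3, seed=42):
    """Shuffle sample indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (training, validation) index lists; each index is validated once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Hypothetical dataset of 1000 requests: 700 for training, 300 held out for testing.
train_idx, test_idx = train_test_split_indices(1000)
splits = list(k_fold(train_idx, k=5))
for fold_number, (training, validation) in enumerate(splits, start=1):
    print(fold_number, len(training), len(validation))
```

Each of the five rounds trains on four folds and validates on the remaining one, while the 30% test set is never touched during cross-validation.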

Dataset            Sample                                                  Access
CISC2010           61065 (SQLi, XSS, CSRF, ...)                            Public
HTTPPram           31066 -> 10852 (SQLi), 532 (XSS), 89 (CMDi), 290 (LFI)  Public
Shah's             44605 -> 13686 (XSS), 30919 (SQLi)                      Public
Generate Dataset   592479 -> 331129 (Normal), 261350 (Abnormal)            Private

Data Decoder

The decoder is built from multiple decoding layers, including base64, URL, Unicode, UTF-8, data cleaning, ....

Original: <object data="data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg=="></object>
Decoded:  <objectdata="data:text/html;base64,<script>alert(1)</script>"></object>
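A layered decoder of this kind might look roughly like the sketch below. It is not the project's decoder: the function names, the regular expressions, the layer order, and the `max_rounds` limit are all assumptions. It covers only URL decoding, `\uXXXX` / `%uXXXX` escapes, and base64 bodies inside `data:` URIs, and its cleaning step is just a whitespace trim.

```python
import base64
import binascii
import re
from urllib.parse import unquote

# Base64 body that directly follows ";base64," (as in data: URIs).
DATA_URI_B64 = re.compile(r"(?<=;base64,)[A-Za-z0-9+/]+={0,2}")
# \u003c and %u003c style escapes.
UNICODE_ESCAPE = re.compile(r"(?:\\u|%u)([0-9a-fA-F]{4})")


def decode_base64_payloads(text):
    """Replace base64 bodies with their UTF-8 text when they decode cleanly."""
    def _decode(match):
        try:
            return base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return match.group(0)  # not valid base64 or not text: keep as-is
    return DATA_URI_B64.sub(_decode, text)


def decode_unicode_escapes(text):
    """Turn \\uXXXX and %uXXXX escapes into the characters they encode."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_request(text, max_rounds=5):
    """Apply every layer repeatedly until the text stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        decoded = decode_unicode_escapes(decoded)
        decoded = decode_base64_payloads(decoded)
        if decoded == text:
            break
        text = decoded
    return text.strip()


sample = '<object data="data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg=="></object>'
print(decode_request(sample))
```

Running the layers in a loop lets the sketch handle nested encodings, for example a payload that was URL-encoded twice.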

Data Processing

Decoded requests are converted to embedding vectors with SentenceTransformers, a Python framework for state-of-the-art sentence, text and image embeddings.

Original: /etc/mixmaster/remailer/pgponly.hlp
Encoder:  [-2.79157665e-02 7.86799937e-02 -1.95519626e-02 -4.09332477e-02 9.84075591e-02 -8.66753384e-02 -4.61700819e-02 -2.39454824e-02 ...]
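The encoding step could be wired up as in the sketch below. The README does not name the embedding model; `all-MiniLM-L6-v2` is an assumption, picked only because it produces 384-dimensional vectors, which matches the model's input shape. The helper names `load_encoder` and `embed_requests` are hypothetical.

```python
def load_encoder(model_name="all-MiniLM-L6-v2"):
    """Load a SentenceTransformers model (the model name is an assumption)."""
    # Imported lazily so the rest of the sketch works without the package.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name)


def embed_requests(requests, encoder, expected_dim=384):
    """Encode request strings into fixed-length vectors and check their size."""
    vectors = encoder.encode(list(requests))
    for vector in vectors:
        if len(vector) != expected_dim:
            raise ValueError(
                f"expected {expected_dim} values per request, got {len(vector)}"
            )
    return vectors


# Usage (downloads the model on first run):
# encoder = load_encoder()
# vectors = embed_requests(["/etc/mixmaster/remailer/pgponly.hlp"], encoder)
```

The dimension check guards against swapping in an encoder whose output no longer matches the classifier's input layer.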

Model Summary

Model: "model_3"
 Layer (type)                Output Shape                 Param #   Connected to
 input_4 (InputLayer)        [(None, 384)]                0         []

 reshape_3 (Reshape)         (None, 384, 1)               0         ['input_4[0][0]']

 conv1d_15 (Conv1D)          (None, 382, 32)              128       ['reshape_3[0][0]']

 max_pooling1d_15 (MaxPooli  (None, 380, 32)              0         ['conv1d_15[0][0]']

 conv1d_16 (Conv1D)          (None, 378, 64)              6208      ['max_pooling1d_15[0][0]']

 max_pooling1d_16 (MaxPooli  (None, 376, 64)              0         ['conv1d_16[0][0]']

 conv1d_17 (Conv1D)          (None, 374, 128)             24704     ['max_pooling1d_16[0][0]']

 max_pooling1d_17 (MaxPooli  (None, 372, 128)             0         ['conv1d_17[0][0]']

 conv1d_18 (Conv1D)          (None, 370, 256)             98560     ['max_pooling1d_17[0][0]']

 gru_15 (GRU)                (None, 384, 32)              3360      ['reshape_3[0][0]']

 max_pooling1d_18 (MaxPooli  (None, 368, 256)             0         ['conv1d_18[0][0]']

 gru_16 (GRU)                (None, 384, 64)              18816     ['gru_15[0][0]']

 conv1d_19 (Conv1D)          (None, 366, 512)             393728    ['max_pooling1d_18[0][0]']

 gru_17 (GRU)                (None, 384, 128)             74496     ['gru_16[0][0]']

 max_pooling1d_19 (MaxPooli  (None, 364, 512)             0         ['conv1d_19[0][0]']

 gru_18 (GRU)                (None, 384, 256)             296448    ['gru_17[0][0]']

 global_max_pooling1d_3 (Gl  (None, 512)                  0         ['max_pooling1d_19[0][0]']

 gru_19 (GRU)                (None, 512)                  1182720   ['gru_18[0][0]']

 dropout_9 (Dropout)         (None, 512)                  0         ['global_max_pooling1d_3[0][0]']

 dropout_10 (Dropout)        (None, 512)                  0         ['gru_19[0][0]']

 multiply_3 (Multiply)       (None, 512)                  0         ['dropout_9[0][0]', 'dropout_10[0][0]']

 dropout_11 (Dropout)        (None, 512)                  0         ['multiply_3[0][0]']

 dense_18 (Dense)            (None, 512)                  262656    ['dropout_11[0][0]']

 dense_19 (Dense)            (None, 256)                  131328    ['dense_18[0][0]']

 dense_20 (Dense)            (None, 128)                  32896     ['dense_19[0][0]']

 dense_21 (Dense)            (None, 64)                   8256      ['dense_20[0][0]']

 dense_22 (Dense)            (None, 32)                   2080      ['dense_21[0][0]']

 dense_23 (Dense)            (None, 1)                    33        ['dense_22[0][0]']

Total params: 2536417 (9.68 MB)
Trainable params: 2536417 (9.68 MB)
Non-trainable params: 0 (0.00 Byte)
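The summary above can be cross-checked with a little arithmetic, and it is consistent with a model built roughly like the sketch below. The kernel size of 3 and the pool size of 3 with stride 1 are inferred from the output shapes and parameter counts. The activations, dropout rate, optimizer, and loss are assumptions, since the summary does not show them. All function names are hypothetical.

```python
CONV_FILTERS = (32, 64, 128, 256, 512)     # channel A: filters of the five Conv1D layers
GRU_UNITS = (32, 64, 128, 256, 512)        # channel B: units of the five GRU layers
DENSE_UNITS = (512, 256, 128, 64, 32, 1)   # classifier head after the Multiply merge
KERNEL_SIZE = 3  # inferred: every Conv1D shortens the sequence by 2
POOL_SIZE = 3    # inferred: every MaxPooling1D (stride 1) shortens it by 2


def conv1d_params(in_channels, filters):
    return KERNEL_SIZE * in_channels * filters + filters


def gru_params(in_features, units):
    # Keras GRU default (reset_after=True): 3 gates, each with input weights,
    # recurrent weights and two bias vectors.
    return 3 * (units * (in_features + units) + 2 * units)


def dense_params(in_features, units):
    return in_features * units + units


def channel_a_length(length=384):
    # "valid" padding: each conv and each pool removes (size - 1) steps.
    for _ in CONV_FILTERS:
        length -= (KERNEL_SIZE - 1) + (POOL_SIZE - 1)
    return length


def total_params():
    total = 0
    for sizes, count in ((CONV_FILTERS, conv1d_params), (GRU_UNITS, gru_params)):
        prev = 1  # both channels read the (384, 1) reshaped embedding
        for size in sizes:
            total += count(prev, size)
            prev = size
    prev = CONV_FILTERS[-1]  # Multiply keeps the 512-wide vector
    for units in DENSE_UNITS:
        total += dense_params(prev, units)
        prev = units
    return total


def build_model(embedding_dim=384, dropout_rate=0.5):
    # TensorFlow is imported lazily so the arithmetic above runs without it.
    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(embedding_dim,))
    x = layers.Reshape((embedding_dim, 1))(inputs)

    a = x  # channel A: Conv1D + MaxPooling blocks, then GlobalMaxPooling
    for filters in CONV_FILTERS:
        a = layers.Conv1D(filters, KERNEL_SIZE, activation="relu")(a)
        a = layers.MaxPooling1D(pool_size=POOL_SIZE, strides=1)(a)
    a = layers.Dropout(dropout_rate)(layers.GlobalMaxPooling1D()(a))

    b = x  # channel B: stacked GRUs, only the last one returns a single vector
    for units in GRU_UNITS[:-1]:
        b = layers.GRU(units, return_sequences=True)(b)
    b = layers.Dropout(dropout_rate)(layers.GRU(GRU_UNITS[-1])(b))

    h = layers.Dropout(dropout_rate)(layers.Multiply()([a, b]))
    for units in DENSE_UNITS[:-1]:
        h = layers.Dense(units, activation="relu")(h)
    outputs = layers.Dense(DENSE_UNITS[-1], activation="sigmoid")(h)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


print(total_params(), channel_a_length())  # 2536417 364
```

The two printed values match the "Total params" line and the sequence length entering `global_max_pooling1d_3` in the summary.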


1852/1852 [==============================] - 86s 46ms/step - loss: 0.0560 - accuracy: 0.9815
1852/1852 [==============================] - 81s 44ms/step
Accuracy: 98.15%
              precision    recall  f1-score   support

           0       0.98      0.99      0.98     33162
           1       0.98      0.98      0.98     26086

    accuracy                           0.98     59248
   macro avg       0.98      0.98      0.98     59248
weighted avg       0.98      0.98      0.98     59248
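The report above has the layout of scikit-learn's classification report. As a reminder of how such figures are defined, here is a small pure-Python sketch. The 0.5 cutoff on the sigmoid output is an assumption, and the labels and probabilities are toy values, not the project's data.

```python
def threshold(probabilities, cutoff=0.5):
    """Turn sigmoid outputs into 0 (normal) / 1 (abnormal) predictions."""
    return [int(p >= cutoff) for p in probabilities]


def binary_report(y_true, y_pred):
    """Return per-class precision, recall, F1 and support, plus accuracy."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in (0, 1):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": tp + fn}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return report, accuracy


# Toy example: four normal and four abnormal requests.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = threshold([0.1, 0.2, 0.7, 0.3, 0.9, 0.8, 0.4, 0.6])
report, accuracy = binary_report(y_true, y_pred)
print(accuracy, report[1])
```

The macro average is the plain mean of the two class rows, while the weighted average weights each row by its support.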

Latest model trained on 11-11-2023