lhw-1/pytorch-phishing-rnn

pytorch-phishing-rnn

This was made as part of a project for CS3244: Machine Learning, a module I took at the National University of Singapore (NUS).

This repository contains two Jupyter notebooks. Each contains a Recurrent Neural Network (RNN) model that takes a URL as input and predicts, with relatively high accuracy, whether it is likely to be a phishing or a non-phishing website. Each URL is broken down into 16 attributes, which the network then processes to produce a probability for each class.
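As a rough illustration of the "URL to 16 attributes" step, here is a minimal sketch using only the Python standard library. The README does not list the actual attributes used in the notebooks, so every name in `FEATURE_NAMES` and the `extract_features` helper itself are hypothetical: they are common lexical URL features chosen purely for illustration, and the notebooks may use a different set.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical attribute names (assumption): the README does not say which
# 16 attributes the notebooks use, so these are illustrative only.
FEATURE_NAMES = [
    "url_length", "host_length", "path_length", "num_dots",
    "num_hyphens", "num_digits", "num_subdomains", "num_query_params",
    "num_path_segments", "has_at_symbol", "has_ip_host", "uses_https",
    "has_port", "has_double_slash_in_path", "num_special_chars", "digit_ratio",
]


def extract_features(url: str) -> List[float]:
    """Turn a URL string into a fixed-length vector of 16 numeric attributes."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    path = parsed.path

    # parsed.port raises ValueError when the port is malformed.
    try:
        has_port = parsed.port is not None
    except ValueError:
        has_port = False

    num_digits = sum(ch.isdigit() for ch in url)
    is_ip = re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None

    values = [
        len(url),
        len(host),
        len(path),
        url.count("."),
        url.count("-"),
        num_digits,
        max(host.count(".") - 1, 0),                    # labels before the registered name
        len([p for p in parsed.query.split("&") if p]),
        len([s for s in path.split("/") if s]),
        "@" in url,
        is_ip,
        parsed.scheme == "https",
        has_port,
        "//" in path,
        sum(url.count(ch) for ch in "?=&%_~"),
        num_digits / len(url) if url else 0.0,
    ]
    return [float(v) for v in values]


features = extract_features("http://192.168.0.1:8080/login//verify?user=a&id=42")
for name, value in zip(FEATURE_NAMES, features):
    print(f"{name}: {value}")
```

A fixed-length numeric vector like this is what would be fed to the network; how the notebooks arrange the 16 values into an input sequence is not stated in this README.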

The first model is a standard RNN trained and tested on a set of 10,988 URLs taken from the PhishTank database (recent as of March 31, 2022). The second is an identical model, trained and tested on a combination of two datasets: the initial PhishTank dataset and an additional set of 11,000 URLs generated by a Generative Adversarial Network (GAN). The first model showed good accuracy on the test data, with the average accuracy ranging from 88% to 91%, though overfitting remains a possibility. The second model showed lower accuracy, as expected given the possibly noisy GAN-generated URLs, with the average accuracy ranging from 72% to 78%.
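The evaluation setup described above (combine the two datasets, split into train and test sets, then average the test accuracy over several runs) could be sketched as follows. This is only an outline under stated assumptions: the helper names, the 20% test fraction, the seed, and the toy placeholder samples are all hypothetical and are not taken from the notebooks, and the numbers it prints are not results from the project.

```python
import random
from statistics import mean


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy placeholders (assumption): (url, label) pairs standing in for the
# PhishTank set and the GAN-generated set. Labels here are arbitrary.
phishtank_samples = [(f"phishtank-url-{i}", i % 2) for i in range(10)]
gan_samples = [(f"gan-url-{i}", i % 2) for i in range(10)]

# The second model uses the combination of both datasets.
combined = phishtank_samples + gan_samples
train_set, test_set = train_test_split(combined, test_fraction=0.2, seed=0)
print(len(train_set), len(test_set))

# Averaging per-run test accuracy, shown with made-up predictions.
run_accuracies = [
    accuracy([1, 0, 1, 1], [1, 0, 0, 1]),
    accuracy([0, 0, 1, 1], [0, 0, 1, 1]),
]
print(mean(run_accuracies))
```

Reporting a range of averages, as the README does, suggests the accuracy was measured over repeated runs; fixing the seed per run keeps each split reproducible.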

The implementation is planned to be updated in the future to incorporate state-of-the-art extensions and improvements.

About

Recurrent Neural Network (RNN) project for NUS CS module CS3244: Machine Learning
