pytorch-phishing-rnn

This was made as part of a project for CS3244: Machine Learning, a module I took at the National University of Singapore (NUS).

This repository contains two Jupyter notebooks. Each contains a Recurrent Neural Network (RNN) model that takes a URL as input and predicts, with relatively high accuracy, whether it is likely to belong to a phishing or a non-phishing website. Each URL is broken down into 16 attributes, which the network then processes to produce a probability for each class.
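As a rough illustration of the "URL to 16 attributes" step, here is a minimal sketch using only the Python standard library. The README does not list the actual attributes used in the notebooks, so every feature below, the `SUSPICIOUS_WORDS` tuple, and the `extract_features` function name are assumptions chosen for illustration only.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical keyword list; the notebooks may use different (or no) keywords.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update")


def extract_features(url: str) -> List[float]:
    """Map a URL to a fixed-length vector of 16 illustrative attributes."""
    # Add a scheme when missing so urlparse can find the hostname.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    path = parsed.path or ""
    features = [
        len(url),                                   # 1. total URL length
        len(host),                                  # 2. hostname length
        len(path),                                  # 3. path length
        url.count("."),                             # 4. number of dots
        url.count("-"),                             # 5. number of hyphens
        url.count("@"),                             # 6. number of '@' signs
        url.count("?"),                             # 7. number of '?'
        url.count("="),                             # 8. number of '='
        url.count("&"),                             # 9. number of '&'
        url.count("%"),                             # 10. number of '%'
        sum(c.isdigit() for c in url),              # 11. number of digits
        max(host.count(".") - 1, 0),                # 12. rough subdomain depth
        int(parsed.scheme == "https"),              # 13. uses HTTPS
        int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),  # 14. IPv4 host
        int(parsed.port is not None),               # 15. explicit port given
        int(any(w in url.lower() for w in SUSPICIOUS_WORDS)),     # 16. keyword hit
    ]
    return [float(f) for f in features]


features = extract_features("http://192.168.0.1:8080/login?user=a&id=1")
```

A fixed-length numeric vector like this is what would then be passed to the network; in practice the values would usually be normalized first.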

The first model is a standard RNN trained and tested on a set of 10,988 URLs taken from the PhishTank database (recent as of March 31, 2022). The second is an identical model, trained and tested on a combination of two datasets: the initial PhishTank dataset and an additional set of 11,000 URLs generated by a Generative Adversarial Network (GAN). The first model reached an average accuracy of 88% to 91% on the test data, although overfitting remains a possibility. The second model reached a lower average accuracy of 72% to 78%, as expected, possibly because the GAN-generated URLs add noise to the data.
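The setup described above (one RNN classifier, trained either on a single dataset or on a combination of two) can be sketched in PyTorch roughly as follows. This is not the code from the notebooks: the `PhishingRNN` class, the hidden size, the choice to treat each of the 16 attributes as one time step, and the 80/20 split are all assumptions, and the tensors are random placeholders rather than PhishTank or GAN-generated data.

```python
import torch
from torch import nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset, random_split


class PhishingRNN(nn.Module):
    """Hypothetical RNN classifier over 16 URL attributes."""

    def __init__(self, hidden_size: int = 32, num_classes: int = 2):
        super().__init__()
        # Assumption: each attribute is one time step with a single feature.
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 16) -> (batch, 16, 1)
        _, h_n = self.rnn(x.unsqueeze(-1))
        # h_n: (num_layers, batch, hidden); use the last layer's final state.
        return self.fc(h_n[-1])  # logits of shape (batch, num_classes)


torch.manual_seed(0)

# Random stand-ins for the two datasets (real URLs and GAN-generated URLs).
real = TensorDataset(torch.rand(80, 16), torch.randint(0, 2, (80,)))
synthetic = TensorDataset(torch.rand(80, 16), torch.randint(0, 2, (80,)))
combined = ConcatDataset([real, synthetic])
train_set, test_set = random_split(combined, [128, 32])

model = PhishingRNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training epoch over the combined data.
model.train()
for x, y in DataLoader(train_set, batch_size=32, shuffle=True):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

# Evaluate: softmax turns logits into per-class probabilities.
model.eval()
correct = 0
with torch.no_grad():
    for x, y in DataLoader(test_set, batch_size=32):
        probs = torch.softmax(model(x), dim=1)
        correct += (probs.argmax(dim=1) == y).sum().item()
accuracy = correct / len(test_set)
```

Training the first model would correspond to using only the `real` dataset in place of `combined`. With random placeholder data the resulting accuracy carries no meaning; it only shows where the reported test accuracy would come from.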

The implementation is planned to be updated in the future to incorporate state-of-the-art extensions and improvements.
