Skip to content

Phishers use the websites which are visually similar to those real websites. So, we developed this website so that user can know whether the URL is phishing or not before using it. URL -

Notifications You must be signed in to change notification settings

asrith-reddy/Phishing-detector

Repository files navigation

Phishing URL Detection

image2

image1

Objective

A phishing website is a common social engineering method that mimics trustful uniform resource locators (URLs) and webpages. The objective of this project is to train machine learning models and deep neural nets on the dataset created to predict phishing websites. Both phishing and benign URLs of websites are gathered to form a dataset and from them required URL and website content-based features are extracted. The performance level of each model is measures and compared.

Installation

To install the required packages and libraries, run this command in the project directory after Forking and cloning this repository:

pip install -r requirements.txt

Technologies Used

Feature Extraction

The system starts by retrieving URLs to be checked for phishing. These URLs can be collected from user input in the webpage created. Once the URLs are obtained, the system extracts relevant features from the web pages. These features are essential for training and evaluating the machine learning models. Various features were extracted from the URL database based on Domain, HTML and Address bar of the URLs.

Machine Learning Models

Various machine learning models are compared and The machine learning model with high accuracy is selected which predicts whether the URL is a phishing site or not. It provides a probability score or a binary classification (phishing or not phishing) based on the trained model's decision boundary. The system categorize URLs into "phishing" or "legitimate" and the result is finally displayed on the webpage.

Refer Phishingproject.ipynb for more details.

Result

Accuracy of various model used for URL detection


ML Model Accuracy f1_score Recall Precision
0 Gradient Boosting Classifier 0.974 0.977 0.994 0.986
1 CatBoost Classifier 0.972 0.975 0.994 0.989
2 Multi-layer Perceptron 0.969 0.973 0.995 0.981
3 Random Forest 0.967 0.971 0.993 0.990
4 Support Vector Machine 0.964 0.968 0.980 0.965
5 Decision Tree 0.960 0.964 0.991 0.993
6 K-Nearest Neighbors 0.956 0.961 0.991 0.989
7 Logistic Regression 0.934 0.941 0.943 0.927
8 Naive Bayes Classifier 0.605 0.454 0.292 0.997

Conclusion

  1. The final take away form this project is to explore various machine learning models, perform Exploratory Data Analysis on phishing dataset and understanding their features.
  2. Creating this notebook helped me to learn a lot about the features affecting the models to detect whether URL is safe or not, also I came to know how to tuned model and how they affect the model performance.
  3. The final conclusion on the Phishing dataset is that the some feature like "HTTTPS", "AnchorURL", "WebsiteTraffic" have more importance to classify URL is phishing URL or not.
  4. Gradient Boosting Classifier currectly classify URL upto 97.4% respective classes and hence reduces the chance of malicious attachments.

About

Phishers use the websites which are visually similar to those real websites. So, we developed this website so that user can know whether the URL is phishing or not before using it. URL -

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published