Phishing-website-prediction

Phishing is a type of fraud wherein an attacker impersonates a reputable company or person in order to get sensitive information such as login credentials or account information via email or other communication channels. Phishing is popular among attackers because it is easier to persuade someone to click a malicious link that appears to be authentic than it is to break through a computer's protection measures.

This project presents the applicability of Machine Learning algorithms in predicting phishing attacks and concludes the positives and negatives. There are many Machine Learning algorithms that are inspected to present the best machine learning algorithm to predict and prevent the phishing attacks. We have extracted features from the URL like Address Bar based Features, Abnormal Based Features, HTML & JavaScript based Features and Domain based Features. We also present the numerical results of the machine learning algorithms like Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbour, Logistic Regression, Naive Bayes, AdaBoost Classifier and Hybrid Ensembler. And determine if a website is legitimate or not.

Proposed Architecture

In our work we use different types of features from the given URL like:

• URL-Based Features: The URL is the first thing to look at when determining whether or not a website is phishing. As previously stated, phishing domain URLs contain several distinguishing characteristics. When the URL is parsed, features linked to these points are acquired.

• Page-Based Features: Page-Based Features make use of data from reputation ranking systems to calculate information about pages. Some of these characteristics indicate how trustworthy a website is.

• Content-Based Features: Obtaining these features necessitates an active scan of the target domain. The contents of the page are analysed to see if the target domain is being used for phishing.

And we will be using different machine learning algorithms like • Logistic Regression • Naive Bayes • K-Nearest Neighbours • Decision Tree • SVM • Random Forest • AdaBoost • Hybrid Ensembler And compare the error percent and execution time and propose the best algorithm suited with the additional features we think can be useful to accurately predict.

Conclusion

Phishing is a serious threat to users' personal information these days. Because detecting phishing websites is a time-consuming task, the number of phishers is continually expanding. To overcome the issue, researchers and experts worked on many approaches and techniques, but it resulted in low rates of detection. We used a range of algorithm in our work. Out of which Hybrid Ensembler gave the maximum training accuracy of “0.9863” and test accuracy of “0.9620” and minimum training accuracy of “0.7096” and test accuracy of “0.7015” as shown in the table below. Hence our Hybrid Ensembler is the best classifier for predicting phishing websites.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Final_Report.pdf		Final_Report.pdf
PhishingDomainDetection.ipynb		PhishingDomainDetection.ipynb
README.md		README.md
features.ipynb		features.ipynb
finalized_model.sav		finalized_model.sav
phishing_detection.csv		phishing_detection.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Final_Report.pdf

Final_Report.pdf

PhishingDomainDetection.ipynb

PhishingDomainDetection.ipynb

README.md

README.md

features.ipynb

features.ipynb

finalized_model.sav

finalized_model.sav

phishing_detection.csv

phishing_detection.csv

Repository files navigation

Phishing-website-prediction

Proposed Architecture

Conclusion

About

Releases

Packages

Languages

laxmanbalaraman/Phishing-website-prediction

Folders and files

Latest commit

History

Repository files navigation

Phishing-website-prediction

Proposed Architecture

Conclusion

About

Topics

Resources

Stars

Watchers

Forks

Languages