Malicious Web Content Detection using Machine Learning

Install all the required packages with pip install -r requirements.txt. Run pip -V to confirm that pip belongs to the Python version you are using.
Move the project folder to your web server's localhost document root, for example /Library/WebServer/Documents on a Mac.
(If you are using a Mac) Grant write permission on the markup file with sudo chmod 777 markup.txt.
Set the path of your Python 2.x installation in clientServer.php.
(If you are using anything other than a Mac) Change the localhost path in features_extraction.py to your own localhost path (or host the application on a remote server and make the necessary changes).
Go to chrome://extensions, enable developer mode, click 'Load unpacked' and select the 'Extension' folder from this project.
You can now open any web page and click the extension icon in the top-right panel of your Chrome window. Click the 'Safe of not?' button and wait a second for the result.
Done!
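
The pip check in the first step can be automated. The sketch below is not part of the project; the helper names pip_python_version and pip_matches_interpreter are hypothetical. It assumes the usual pip -V output, which ends with the interpreter version in parentheses, e.g. "(python 2.7)".

```python
import re
import sys


def pip_python_version(pip_version_output):
    """Return the (major, minor) Python version named in `pip -V` output, or None."""
    # Assumption: pip prints something like
    # "pip 20.3.4 from /path/to/site-packages/pip (python 2.7)".
    match = re.search(r"\(python (\d+)\.(\d+)\)", pip_version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pip_matches_interpreter(pip_version_output, interpreter_version=None):
    """Return True when pip reports the same major.minor version as the interpreter."""
    if interpreter_version is None:
        # Default to the interpreter that is running this script.
        interpreter_version = sys.version_info[:2]
    return pip_python_version(pip_version_output) == tuple(interpreter_version)
```

To use it, run pip -V, paste the printed line into pip_matches_interpreter, and run the script with the same Python that clientServer.php points to. A False result suggests the packages were installed for a different interpreter.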

Inexperienced browser users have no view of what happens behind a web page. They can be tricked into giving away their credentials or downloading malicious data.
Our aim is to create a Chrome extension that acts as middleware between users and malicious websites, reducing the risk of users falling victim to such websites.
Harmful content cannot be collected exhaustively, because it keeps evolving. To counter this, we use machine learning: the tool is trained to assign each new page it sees to a category so that the corresponding action can be taken.
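
Before a classifier can categorize a page, the page has to be reduced to numeric features. The sketch below illustrates that step only. The feature set and the names url_features and to_vector are assumptions chosen for illustration, and they are not taken from features_extraction.py. It also assumes Python 3 for urllib.parse and ipaddress, whereas the project itself targets Python 2.x.

```python
import ipaddress
from urllib.parse import urlparse

# Fixed feature order so every URL maps to a vector with the same layout.
FEATURE_ORDER = ["host_is_ip", "has_at_symbol", "uses_https", "subdomain_count"]


def url_features(url):
    """Extract a few illustrative URL-based signals (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        # A raw IP address in place of a domain name.
        "host_is_ip": host_is_ip,
        # An '@' anywhere in the URL can hide the real destination.
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Dots beyond the one separating domain and TLD.
        "subdomain_count": 0 if host_is_ip else max(host.count(".") - 1, 0),
    }


def to_vector(features):
    """Turn the feature dict into a list of ints that a classifier could consume."""
    return [int(features[name]) for name in FEATURE_ORDER]
```

For example, to_vector(url_features("https://www.google.com")) produces one row of input. A trained model would take many such rows with known labels and learn which combinations indicate a phishing page.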

A few snapshots of our system running on different web pages:

Fig 1. A safe website - www.spit.ac.in (College website)

Fig 2. A phishing website which looks just like Google Drive.

Fig 3. A phishing website which looks just like Dropbox.

Fig 4. A safe website - www.google.com


philomathic-guy/Malicious-Web-Content-Detection-Using-Machine-Learning