SMS Spam Detection Using Machine Learning

This project is used as a starting point for the course Release Engineering for Machine Learning Applications (REMLA), taught at the Delft University of Technology by Prof. Luís Cruz and Prof. Sebastian Proksch.

The codebase was originally adapted from: https://github.com/rohan8594/SMS-Spam-Detection

Instructions for Compiling

a) Clone repo.

$ git clone https://gitlab.com/nata1y/SMS-Spam-Detection
$ cd SMS-Spam-Detection
$ mkdir output
$ mkdir dataset

The easiest way to run the project is to follow the docker-compose instructions in step b3.

b) Install dependencies.

$ python -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

b2) Alternatively, use Docker for dependencies and volumes.

$ docker build --progress plain . -t docker-sms
$ docker run -it --rm -v ${PWD}:/root/project -p "8080:8080" docker-sms
$ cd project  # inside the container shell

c) Run the training scripts, in this order.

$ python train_model/get_data.py
$ python train_model/read_data.py
$ python train_model/text_preprocessing.py
$ python train_model/text_classification.py
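
The four scripts above are meant to be run one after another. As a convenience, the sequence could be driven from a single Python script. The sketch below assumes it is started from the repository root with the virtual environment active; the names `TRAINING_STEPS`, `build_commands` and `run_pipeline` are illustrative and not part of the repository.

```python
import subprocess
import sys

# Script paths as listed in step c); the order matters.
TRAINING_STEPS = [
    "train_model/get_data.py",
    "train_model/read_data.py",
    "train_model/text_preprocessing.py",
    "train_model/text_classification.py",
]


def build_commands(python=sys.executable, steps=TRAINING_STEPS):
    """Return one argv list per training step, using the given interpreter."""
    return [[python, step] for step in steps]


def run_pipeline(commands=None):
    """Run each step in turn and stop at the first failure."""
    for argv in commands or build_commands():
        print("Running:", " ".join(argv))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed earlier step.
        subprocess.run(argv, check=True)
```

Calling `run_pipeline()` with no arguments runs all four steps with the current interpreter.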

d) Serve the model as a REST API

NOTE: when running inside Docker, add the host="0.0.0.0" parameter to the app.run call in deploy_model/serve_model.py. The default, 127.0.0.1, is not reachable from outside the container.

$ python deploy_model/serve_model.py
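
Editing the file by hand for Docker and reverting it for local runs is easy to forget. One option, shown here only as a sketch, is to read the bind address from an environment variable. The variable name `SMS_API_HOST` and the helper `bind_host` are assumptions made for this example; the commented `app.run` line refers to whatever `app` object serve_model.py defines.

```python
import os


def bind_host(default="127.0.0.1"):
    """Return the address the server should listen on.

    Reads the hypothetical SMS_API_HOST environment variable and falls
    back to the loopback address, which is only reachable from the same
    machine (or the same container).
    """
    return os.environ.get("SMS_API_HOST", default)


# Inside deploy_model/serve_model.py the call could then become, e.g.:
#   app.run(host=bind_host(), port=8080)
# with the container started with SMS_API_HOST=0.0.0.0.
```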

b3) Or, use docker-compose to train and host the model automatically.

Be aware that the regression model is trained in this step, which takes a while.

For Linux:

$ docker-compose -f docker-compose.train.yml build
$ docker-compose -f docker-compose.train.yml up -d && ./get_training_data.sh && docker-compose -f docker-compose.train.yml down

For Windows:

$ docker-compose -f docker-compose.train.yml build
$ docker-compose -f docker-compose.train.yml up -d && ./get_training_data.bat && docker-compose -f docker-compose.train.yml down

From now on, use this command to run the system without retraining everything.

$ docker-compose up --build
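
The containers may need a moment before the API accepts connections. A minimal sketch of a readiness check, assuming the API is published on port 8080 of the local machine as in the other steps; `wait_for_port` is a hypothetical helper, not part of the repository.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8080, timeout=60.0, interval=1.0):
    """Return True once a TCP connection succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A successful connection only shows that the port is open; it does not prove that the model has finished loading.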

e) Production endpoint

Retrieve the dataset and split it off from the first 1000 labels, on which the model is trained. Generate drifts based on the incoming data for experimentation. Get predictions from the model via HTTP requests, as in an actual deployment setup.

NOTE: to get predictions from inside another Docker container, start it with docker run -it --rm -v "$(pwd)":/root/project --net=host docker-sms. The server's port is already published, so the new container has to join the host network to reach it. Alternatively, if you use docker-compose, open a shell with docker exec -it <container_id> bash and run the deploy script from there.

$ python production_endpoint/get_data.py
$ python production_endpoint/generate_drifts.py
$ python production_endpoint/get_predictions.py
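
The split described in this step can be pictured as follows. This is a sketch only: it assumes the dataset is an ordered sequence of (label, message) pairs and that everything after the first 1000 rows counts as incoming data. `split_train_incoming` and `batches` are hypothetical helpers, not the functions in production_endpoint/. How the repository generates drifts is not described here, so that part is left out.

```python
def split_train_incoming(rows, n_train=1000):
    """Split rows into the first n_train items and everything after them."""
    rows = list(rows)
    return rows[:n_train], rows[n_train:]


def batches(rows, size):
    """Yield consecutive chunks of at most `size` rows, mimicking data arriving over time."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

Each batch of incoming rows could then be sent to the prediction endpoint in turn.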

You can test the API using the following:

$ curl -X POST "http://127.0.0.1:8080/predict" -H "accept: application/json" -H "Content-Type: application/json" -d '{"sms": "hello world!"}'
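
The same request can be sent from Python using only the standard library. This is a sketch, assuming the endpoint accepts a JSON body of the form {"sms": ...} as in the curl command and answers with JSON; the function names `build_predict_request` and `predict` are illustrative.

```python
import json
import urllib.request


def build_predict_request(sms, base_url="http://127.0.0.1:8080"):
    """Build a POST request for /predict with a JSON body of the form {"sms": ...}."""
    body = json.dumps({"sms": sms}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def predict(sms, base_url="http://127.0.0.1:8080", timeout=10):
    """Send the request and return the decoded JSON response (assumed to be JSON)."""
    request = build_predict_request(sms, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `predict("hello world!")` returns whatever JSON the API responds with.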

Alternatively, you can access the UI in your browser at http://127.0.0.1:8080/apidocs.

To view Prometheus, navigate to http://127.0.0.1:9090.

To view Grafana, navigate to http://127.0.0.1:3000/. You then need to add Prometheus as a data source, setting the host to http://prometheus:9090/, and add the relevant metrics to a new dashboard. These settings are saved for future use.
