MachineLearning

Some fundamental machine learning and data analysis techniques are revisited here through practical projects.
Almost every project was developed in a Jupyter notebook. The notebooks have also been exported as clean PDF files.

The Machine Learning Pipeline used for every project (a code sketch follows the list)

  1. Question and required data
  2. Acquire the data
  3. Data preprocessing
  4. Prepare the data for the machine learning model
  5. Train the model
  6. Make predictions on the test data
  7. Evaluate the model
  8. If the performance is not satisfactory, adjust the model
  9. Interpret the model and report results visually and numerically
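A minimal sketch of these steps in scikit-learn, using a toy dataset; the dataset and model choices here are illustrative and not taken from the projects below:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# 1-2. Question and data: can the cultivar of a wine be predicted from its features?
X, y = load_wine(return_X_y=True)

# 3-4. Preprocess and prepare: split into train/test sets, then scale the features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 5-6. Train the model and make predictions on the test data
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# 7-9. Evaluate; if unsatisfactory, adjust the model, then report the results
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```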

Table of Contents

  1. Logistic Regression: Nudging customers toward paid products using data produced by an app
  2. Random Forests: Wine quality predictor
  3. SVM: Disease predictor
  4. kMeans Clustering: Image compression
  5. Neural Nets: Autism Spectrum Disorder predictor
  6. Deep Neural Nets: Bank customer exit predictor




Logistic Regression

Nudging customers toward paid products using data produced by apps.
Companies often offer free premium products or services in an attempt to transition their customers to a premium membership. This case study examines the services offered by a mobile app whose customers get a 24-hour window of free premium membership. Our goal is to determine which users are less likely to subscribe to the paid membership, so that they can be targeted with further marketing.
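A minimal sketch of the modeling step, assuming a hypothetical CSV file and a hypothetical target column ('enrolled', marking users who subscribed after the trial); the project's actual schema may differ:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

df = pd.read_csv('app_usage.csv')   # hypothetical file of per-user app usage features
X = df.drop(columns=['enrolled'])   # hypothetical target column
y = df['enrolled']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)

clf = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Probability of *not* subscribing; high values mark users to target with marketing
p_no_sub = clf.predict_proba(scaler.transform(X_test))[:, 0]
likely_non_subscribers = X_test[p_no_sub > 0.5]
```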

Random Forests

Our goal is to predict the quality of wines given a set of features (acidity, density, pH, etc.). The original paper uses SVM, neural networks, and multiple regression. Here a random forest model is investigated, which achieves 97.31% accuracy.
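A minimal sketch, assuming the UCI red wine quality CSV (semicolon-separated, with a 'quality' column); the number of trees is illustrative:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv('winequality-red.csv', sep=';')   # assumed file layout
X, y = df.drop(columns=['quality']), df['quality']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# An ensemble of 500 decision trees, each trained on a bootstrap sample
rf = RandomForestClassifier(n_estimators=500, random_state=42).fit(X_train, y_train)
print(accuracy_score(y_test, rf.predict(X_test)))
```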

SVM

We explore a scikit-learn dataset of malignant and benign breast cancers. Each sample consists of a 30-dimensional feature vector and a class label describing whether the cancer is malignant or benign. Our goal is to train a machine learning algorithm to classify unseen samples.

SVM achieves the best result with 98.25% accuracy, followed by logistic regression (lasso) with 97.37% and random forests with 95.61%. Our SVM even outperforms the SVM model in the referenced paper, probably due to scikit-learn's state-of-the-art implementations, but also because hyperparameter tuning led to a different kernel choice than the one used in the paper.
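A minimal sketch of the SVM baseline on scikit-learn's built-in dataset; the hyperparameter grid below is an assumption, not the project's exact search space:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # 30 features, binary target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Tune the kernel and regularization strength via cross-validation
grid = GridSearchCV(SVC(), {'kernel': ['linear', 'rbf', 'poly'], 'C': [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```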

kMeans Clustering

KMeans belongs to the category of prototype-based clustering algorithms, where each cluster is represented by a prototype, usually a centroid. It is an unsupervised machine learning algorithm whose purpose is to uncover latent structure in the data. More information on the algorithm can be found here.

KMeans clustering and image compression

The idea is to find n clusters (32, for instance) in the image and reduce the 256^3 possible color combinations by creating a new image in which each original color is replaced by the color of the closest cluster centroid. This is straightforward to apply, since an image can be viewed as a NumPy array whose length equals the image height and whose elements are arrays with length equal to the image width; each entry of those width arrays holds the RGB values of a single pixel. The immediate downside of this approach is that the compression comes at the cost of reduced image quality.
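A minimal sketch of this idea with scikit-learn and Pillow; the file names and the number of clusters are assumptions:

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open('input.jpg'))        # shape: (height, width, 3)
pixels = img.reshape(-1, 3).astype(np.float64)   # one row of RGB values per pixel

# Find 32 representative colors (the cluster centroids)
kmeans = KMeans(n_clusters=32, random_state=0).fit(pixels)

# Replace every pixel with the color of its closest centroid
compressed = kmeans.cluster_centers_[kmeans.labels_].reshape(img.shape).astype(np.uint8)
Image.fromarray(compressed).save('compressed.jpg')
```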

Neural Nets

The task is to build a neural network model that can predict autism spectrum disorder. The model achieves a 99% accuracy rate on the test set.
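A minimal Keras sketch of such a binary classifier; the architecture and the placeholder data are assumptions, not the project's exact model:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Placeholder data standing in for the preprocessed ASD screening features
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20)).astype('float32')
y_train = rng.integers(0, 2, size=500).astype('float32')

model = Sequential([
    Dense(16, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid'),   # probability of an ASD diagnosis
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.1)
```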

Deep Neural Nets

The task is to predict whether a bank customer will leave the bank, given a set of features. The deep learning model was built with Keras, and hyperparameter optimization was performed with talos.
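A minimal sketch of a Keras model wired into a talos scan; the parameter grid, the model shape, and the placeholder data are assumptions, not the project's actual setup:

```python
import numpy as np
import talos
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Placeholder data standing in for the preprocessed bank customer features
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10)).astype('float32')
y = rng.integers(0, 2, size=(1000, 1)).astype('float32')   # 1 = customer left

params = {'units': [16, 32], 'dropout': [0.0, 0.2], 'epochs': [20]}

def churn_model(x_train, y_train, x_val, y_val, params):
    model = Sequential([
        Dense(params['units'], activation='relu', input_shape=(x_train.shape[1],)),
        Dropout(params['dropout']),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                        epochs=params['epochs'], verbose=0)
    return history, model   # talos expects (history, model) back

scan = talos.Scan(x=X, y=y, params=params, model=churn_model, experiment_name='churn')
```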