Skip to content

Deep Neural Network built from scratch to tackle ML binary classification predicting Titanic survival (Kaggle Competition). Best DNN AdamW 77.03% and best RF 78.95%.

Notifications You must be signed in to change notification settings

IasonC/Titanic-ML-Kaggle

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Titanic: Machine Learning from Disaster

Data Cleaning & Processing

The first step of the model is data visualisation and analysis with pandas. To predict survivability, the following features from the Kaggle dataset were used: ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]. Sex is a binary categorical variable so it was converted to a quantitative variable by one-hot encoding. These features were selected because they were the most highly-correlated with Survivability in the training set. The "Cabin" variable was dumped because it was categorical, non-binary and missing too much data, meaning it would be hard to impute accurately without inducing noise.

Further, it was found by pandas analysis that the Age variable was missing datapoints. The data was imputed via Multiple Imputation by Chained Equations (MICE), which is a robust imputation algorithm.

Deep Neural Network (DNN) solution

Hyperparameter-tuned DNN with AdamW optimiser (with weight decay regularisation), early stoppage, input data normalisation and holdout cross-validation.

α = 0.00014341, λ = 0.009, layer_dims = [6, 15, 10, 5, 1].

Random Forest Classifer (RF) solution

Random forest classifier built using sklearn. As shown in the code, the RF was tuned to the best depth and other hyperparameters using the MAE performance metric from sklearn.

Discussion

From online reading, it seems that 80%+ performance on the test set is considered very good. The NN was slightly outperformed by the RF, which means that there is future work in identifying a better set of hyperparameters α, λ, layer_dims (and also β1, β2, ε) to outperform RF.

About

Deep Neural Network built from scratch to tackle ML binary classification predicting Titanic survival (Kaggle Competition). Best DNN AdamW 77.03% and best RF 78.95%.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages