
kwantommy/breast-cancer-diagnosis


Breast cancer diagnosis using statistical techniques in Python (Scikit-learn, XGBoost, Keras)

  • includes grid search, SMOTE oversampling, and visualization using principal component analysis (PCA)
  • the data comes from the UCI Machine Learning Repository
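As a rough illustration of the SMOTE idea mentioned above, the sketch below creates synthetic minority-class samples by interpolating between a sample and one of its nearest minority-class neighbours. It is a simplified sketch, not the imbalanced-learn implementation and not necessarily what this repository does: the helper `smote_sketch` is hypothetical, and it assumes scikit-learn's bundled copy of the Wisconsin Diagnostic data (`load_breast_cancer`) stands in for the UCI file.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import NearestNeighbors


def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Hypothetical helper: build n_new synthetic samples by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    # Ask for k + 1 neighbours because a point is typically its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, neighbour_idx = nn.kneighbors(X_minority)

    base = rng.integers(0, len(X_minority), size=n_new)  # starting samples
    pick = rng.integers(1, k + 1, size=n_new)            # neighbour column (0 is the point itself)
    neighbours = X_minority[neighbour_idx[base, pick]]
    gap = rng.random((n_new, 1))                         # position along the segment, in [0, 1)
    return X_minority[base] + gap * (neighbours - X_minority[base])


# Assumption: scikit-learn's bundled dataset is used in place of the UCI download.
X, y = load_breast_cancer(return_X_y=True)
counts = np.bincount(y)
minority = int(np.argmin(counts))
n_new = int(counts.max() - counts.min())

X_new = smote_sketch(X[y == minority], n_new)
X_balanced = np.vstack([X, X_new])
y_balanced = np.concatenate([y, np.full(n_new, minority)])
```

In practice, oversampling is usually applied to the training split only, so that synthetic points never leak into validation or test data.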

For most models, k-fold cross-validation was combined with grid search to find the best hyperparameters.
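A minimal sketch of that workflow with scikit-learn's `GridSearchCV` is shown below. The parameter grid, the 5-fold split, the 80/20 train/test split, and the choice of a random forest are all illustrative assumptions, not the repository's actual settings; the data is loaded from scikit-learn's bundled copy of the dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grid only; the values searched in this repository are not stated.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X_train, y_train)  # fits every grid point on every fold, then refits the best

best_params = search.best_params_
cv_accuracy = search.best_score_               # mean validation accuracy of the best grid point
test_accuracy = search.score(X_test, y_test)   # accuracy of the refit model on held-out data
```

Keeping the test split out of the search means `test_accuracy` is measured on data that played no part in choosing the hyperparameters.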

XGBoost

XGBoost (extreme gradient boosting) is a gradient-boosted decision tree algorithm and a popular choice in machine learning competitions, particularly on tabular data.

Test accuracy achieved with XGBoost

Example tree

Feature importance

Training/validation error

Training/validation loss

Random Forest Classifier

Test accuracy achieved with RFC

Example tree

Support Vector Machine

  • note that PCA was used to reduce the data to 2 dimensions so that the decision boundary could be plotted
  • a support vector machine classifies samples by finding a hyperplane that separates the classes while maximizing the margin, i.e. the distance to the nearest training points of each class
  • as a distance-based method, SVM requires feature scaling so that no feature takes precedence over the others
  • documentation at https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC
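The points above can be sketched as a single scikit-learn pipeline: scale the features, project onto 2 principal components, then fit an `SVC` with a linear or RBF kernel. This is an illustrative sketch under assumptions (default `SVC` hyperparameters, an 80/20 split, scikit-learn's bundled copy of the data), not the repository's tuned models.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

accuracy = {}
for kernel in ("linear", "rbf"):
    # Scaling comes first so that PCA and the SVM are not dominated by
    # features that happen to have large numeric ranges.
    model = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    accuracy[kernel] = model.score(X_test, y_test)
```

Putting the scaler and PCA inside the pipeline means both are fitted on the training split only, which avoids leaking test-set statistics into the model.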

Accuracy achieved with SVM

Linear decision boundary

Radial basis function decision boundary

K-Neighbours

Accuracy achieved with KNC

Nearest neighbours classification plot

Deep Learning

  • a simple deep learning model with residual connections, similar to ResNet v1, so that "deep" networks train at least as well as "shallow" networks
  • training the network means learning its weights and biases by minimizing a loss function with gradient-based optimization
  • documentation at https://keras.io/
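To show why a residual connection lets a deeper network fall back to the behaviour of a shallower one, here is a framework-free NumPy sketch of one dense residual block, `y = relu(x + F(x))`. It is not the repository's Keras model: the function names, layer widths, and the two-layer branch are assumptions chosen for illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, W1, b1, W2, b2):
    """Forward pass of a dense residual block: relu(x + F(x)),
    where F is a two-layer branch (dense -> relu -> dense)."""
    hidden = relu(x @ W1 + b1)
    branch = hidden @ W2 + b2
    return relu(x + branch)  # the shortcut adds the input back unchanged


rng = np.random.default_rng(0)
x = rng.random((4, 8))  # 4 samples, 8 non-negative features
width = x.shape[1]

W1 = rng.normal(size=(width, width))
b1 = np.zeros(width)
W2_zero = np.zeros((width, width))  # branch output is exactly zero
b2 = np.zeros(width)

# With a zeroed branch the block reduces to relu(x), which equals x here
# because x is non-negative: the extra layers do no harm.
out_identity = residual_block(x, W1, b1, W2_zero, b2)
```

Because the block can represent the identity simply by driving the branch towards zero, stacking more blocks does not have to make the training fit worse, which is the motivation behind the ResNet design.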

Model

Oversampled results

Non-oversampled results

