Project workflow:
-
Framing the problem:
- Cancer detection: Classification problem
- Choose an evaluation metric: recall
-
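A minimal sketch of why recall is preferred over accuracy for this kind of problem. The labels are made up for illustration, and the encoding (1 = malignant, 0 = benign) is an assumption, not something fixed by the dataset.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives predicted positive: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Illustrative labels only (assumed encoding: 1 = malignant, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two of the three cancers

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

Accuracy looks acceptable here even though most malignant cases are missed, which is the failure mode recall is meant to expose.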
Getting the data:
- Use a publicly available dataset: the Breast Cancer Wisconsin dataset
-
Explore, prep and feature engineering:
- Missing data (we remove one column with missing data)
- Target class distribution
- Feature distributions, data types, and characteristics
- Feature correlation
- PCA for dimensionality reduction
- Feature scaling
-
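Two of the steps above, feature scaling and feature correlation, can be sketched in plain Python. This is only an illustration under assumptions: the values are made up, the helper names are hypothetical, and scaling is taken to mean z-score standardization. In practice a library would usually handle these steps, along with PCA.

```python
import math


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std.

    Assumes the values are not all identical (std > 0).
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


def pearson(xs, ys):
    """Pearson correlation as the mean product of standardized values."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Made-up measurements for two features (not real dataset rows).
radius = [10.0, 12.0, 14.0, 16.0, 18.0]
area = [310.0, 450.0, 620.0, 800.0, 1020.0]

scaled_radius = standardize(radius)
r = pearson(radius, area)
print(f"correlation(radius, area) = {r:.3f}")
```

A correlation close to 1 between two features suggests they carry largely redundant information, which is one motivation for dimensionality reduction.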
Creating the model:
- Logistic regression with grid search cross-validation to optimise hyperparameters
- Compare learning curves and cross-validation scores
- Select the decision threshold that gives the highest recall (100%)
-
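The threshold-selection step can be sketched as follows. The scores stand in for predicted probabilities of the positive class and are invented for illustration. The rule shown (take the highest threshold that still reaches 100% recall) is one reasonable reading of the step, not necessarily the exact procedure used in the project.

```python
def recall_precision(y_true, scores, threshold):
    """Recall and precision when predicting positive for score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


def highest_threshold_with_full_recall(y_true, scores):
    """The largest threshold that still flags every positive is the lowest
    score among the positives. Requires at least one positive label."""
    return min(s for t, s in zip(y_true, scores) if t == 1)


# Invented scores (assumed encoding: 1 = malignant, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.92, 0.71, 0.38, 0.45, 0.30, 0.12, 0.08, 0.05]

threshold = highest_threshold_with_full_recall(y_true, scores)
recall, precision = recall_precision(y_true, scores, threshold)
print(f"threshold={threshold} recall={recall:.2f} precision={precision:.2f}")
```

Choosing the highest such threshold keeps recall at 100% while giving up as little precision as possible. The threshold should be chosen on validation data rather than the test set.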
Presenting results:
- Results: Confusion matrix highlighting recall and precision
- Explaining the next steps and how the model will be used
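A small sketch of the confusion matrix behind the results, with recall and precision read off its cells. The predictions are made up, and the layout (rows = actual class, columns = predicted class, 0 before 1) is an assumed convention.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows = actual class, columns = predicted."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Made-up labels and predictions (assumed encoding: 1 = malignant, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]

matrix = confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = matrix

recall = tp / (tp + fn)      # share of malignant cases that were caught
precision = tp / (tp + fp)   # share of malignant predictions that were right

print("            pred benign  pred malignant")
print(f"benign      {tn:^11}  {fp:^14}")
print(f"malignant   {fn:^11}  {tp:^14}")
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this layout the bottom-left cell (false negatives) is the one a recall-focused model aims to drive to zero, at the cost of more false positives in the top-right cell.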