wise-saint/Breast-Cancer-Detection

Breast-Cancer-Detection

Breast cancer detection using machine learning models.

1. Dataset

We used the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI Machine Learning Repository.
Link: http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29
The dataset was created by Dr. William H. Wolberg, a physician at the University of Wisconsin Hospital in Madison, Wisconsin, USA.

2. Programming Language, Libraries and IDE

Programming Language: Python 3
Libraries: pandas, numpy, seaborn, and scikit-learn (sklearn)
IDE: Jupyter Notebook

3. Basic Mathematics

3.1 Mean

The mean is the average of a set of numbers: the sum of the values divided by how many values there are.
Mean of a random variable X with n observations: $\mu = \frac{1}{n}\sum_{i=1}^{n} X_i$
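
As a minimal sketch of the formula above, the mean can be computed in plain Python. The function name `mean` and the sample values are illustrative assumptions, not taken from the project notebook or the dataset.

```python
from statistics import fmean


def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    values = list(values)
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Made-up sample values for illustration only (not from the dataset).
radii = [12.0, 14.5, 20.0, 13.5]
print(mean(radii))   # 15.0
print(fmean(radii))  # the standard library gives the same result
```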

3.2 Standard Deviation

Standard deviation is a measure of how dispersed the data is in relation to the mean.
Standard deviation of a population X: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2}$
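
The population formula (dividing by n rather than n - 1) can be sketched as follows. The helper name `population_std` and the sample data are assumptions made for illustration.

```python
import math
from statistics import pstdev


def population_std(values):
    """Population standard deviation: sqrt of the mean squared deviation."""
    values = list(values)
    n = len(values)
    if n == 0:
        raise ValueError("population_std requires at least one value")
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


# Made-up sample values for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_std(data))  # 2.0
print(pstdev(data))          # standard-library equivalent
```

Note that library defaults differ: `numpy.std` divides by n by default, while `pandas.DataFrame.std` divides by n - 1 unless `ddof=0` is passed.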

3.3 Correlation

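Correlation describes the strength and direction of the linear association between two variables.
The Pearson correlation coefficient between two random variables X and Y ranges from -1 to 1 and can be calculated by the formula:
$\rho_{X,Y} = \frac{\sum_{i=1}^{n}(X_i - \mu_X)(Y_i - \mu_Y)}{\sqrt{\sum_{i=1}^{n}(X_i - \mu_X)^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \mu_Y)^2}}$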
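
A minimal pure-Python sketch of the Pearson coefficient, computed as the sum of products of deviations divided by the square root of the product of the summed squared deviations. The function name `pearson` and the inputs are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson([1, 2, 3], [3, 2, 1]))        # perfectly linear, decreasing
```

With pandas, `DataFrame.corr()` computes the same coefficient for every pair of columns, which is a common input for a seaborn heatmap.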

3.4 Standardization

Standardization scales each input variable separately by subtracting the mean and dividing by the standard deviation to shift the distribution to have a mean of zero and a standard deviation of one.
Formula for standardization: $x_{new} = \frac{x_{old} - \mu}{\sigma}$
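
The transformation can be sketched for a single feature column as below. The helper name `standardize` and the sample values are assumptions for illustration; the population standard deviation (dividing by n) is used, matching section 3.2.

```python
import math


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    values = list(values)
    n = len(values)
    if n == 0:
        raise ValueError("standardize requires at least one value")
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(x - mu) / sigma for x in values]


# Made-up sample values for illustration only (mean 5, std 2).
scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

In scikit-learn, `StandardScaler` applies the same per-feature transformation. Fitting it on the training split only and reusing the fitted statistics on the test split avoids leaking test information into training.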

4. Machine Learning Models

  1. Logistic Regression Classifier
  2. Nearest Neighbor Classifier
  3. Support Vector Machines Classifier
  4. Kernel SVM Classifier
  5. Naive Bayes Classifier
  6. Decision Tree Classifier
  7. Random Forest Classifier
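
The sketch below shows one way the seven classifiers listed above could be trained and compared with scikit-learn. It is not the project's notebook: the bundled `load_breast_cancer` copy of the dataset, the 80/20 stratified split, the random seeds, and every hyperparameter are assumptions, so the scores it prints need not match the results reported in this README.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# scikit-learn ships a copy of the Wisconsin Diagnostic dataset.
# In this copy, label 0 means malignant and label 1 means benign.
X, y = load_breast_cancer(return_X_y=True)

# Assumed split: 80% train, 20% test, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Assumed hyperparameters; the originals are not given in this README.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine (linear)": SVC(kernel="linear"),
    "Kernel SVM (RBF)": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # The scaler is fitted on the training split only, inside the pipeline.
    clf = make_pipeline(StandardScaler(), model)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # pos_label=0 treats malignant as the positive class.
    f1 = f1_score(y_test, y_pred, pos_label=0)
    results[name] = (accuracy, f1)
    print(f"{name}: accuracy={accuracy:.4f}, F1={f1:.4f}")
```

Wrapping each model in a pipeline with `StandardScaler` keeps the standardization step from section 3.4 tied to the training data, which matters most for the distance-based and margin-based models.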

5. Metrics

  1. F1 Score
  2. Accuracy Score

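Confusion Matrix:
The confusion matrix counts test predictions in four cells, which the formulas below use:
TP (true positives): positive cases predicted as positive
FP (false positives): negative cases predicted as positive
FN (false negatives): positive cases predicted as negative
TN (true negatives): negative cases predicted as negative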

Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F1 Score = 2*(Precision * Recall)/(Precision + Recall)
Accuracy Score = (TP + TN)/(TP + FP + FN + TN)
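
The four formulas above can be checked with a small helper that takes the confusion-matrix counts directly. The function name `classification_metrics` and the counts in the example are hypothetical, not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one prediction is required")
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / total
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration; each metric is approximately 0.8.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In scikit-learn, `confusion_matrix`, `accuracy_score`, and `f1_score` from `sklearn.metrics` compute the same quantities from the true and predicted labels.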

6. Result

Accuracy Score:

  1. Logistic Regression — 97.36%
  2. Nearest Neighbor — 94.73%
  3. Support Vector Machines — 95.61%
  4. Kernel SVM — 98.24%
  5. Naive Bayes — 96.49%
  6. Decision Tree Algorithm — 95.61%
  7. Random Forest Classification — 97.36%

F1 Score:

  1. Logistic Regression — 96.47%
  2. Nearest Neighbor — 93.02%
  3. Support Vector Machines — 94.25%
  4. Kernel SVM — 97.61%
  5. Naive Bayes — 95.23%
  6. Decision Tree Algorithm — 93.97%
  7. Random Forest Classification — 96.38%
