
akl21/diagnosing_breast_cancer


diagnosing_breast_cancer

An accurate diagnosis of breast cancer is critical to the patient's well-being. Data from fine needle aspirate (FNA) images of cell nuclei sampled from benign and malignant breast tumors can be used to develop a statistical learning model that classifies tumors as malignant or benign from measurements taken on similar FNA images. The data set used in this study is a cleaned version of the 1993 Street et al. data from the University of Wisconsin and consists of 569 observations of women with breast tumors. The dependent variable is whether the tumor is malignant or benign, and the 30 features are measures of the shape, size, and texture of the tumor cell nuclei derived from the FNA images.

Past models have achieved an estimated accuracy of 97.5% on this data set, and the objective of this research is to improve on that rate by applying several classification techniques. The best method will be selected through repeated tests on validation sets randomly sampled from the data. Models to be investigated include logistic regression, tree methods such as random forests, support vector machines with linear kernels, and k-nearest neighbors. Variable selection procedures will be used to refine these models and to identify the most important features. Health care professionals can implement the selected model in R to better diagnose breast cancer.
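The model-selection procedure of repeated tests on randomly sampled validation sets can be sketched as follows. This is a minimal illustration in Python, not the author's R code. It assumes a 30% hold-out fraction, uses plain Euclidean k-nearest neighbors as the example classifier, and the function names (`knn_predict`, `repeated_validation_accuracy`) are hypothetical. The toy data at the end is invented for illustration and is not the Wisconsin FNA data.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical aliases for this sketch: a feature vector, and a classifier that takes
# training rows, training labels, and one query row, and returns a predicted label.
Row = Sequence[float]
Classifier = Callable[[List[Row], List[str], Row], str]


def knn_predict(train_X: List[Row], train_y: List[str], x: Row, k: int = 5) -> str:
    """Label x by majority vote among its k nearest training rows (Euclidean distance)."""
    neighbors = sorted(zip((math.dist(row, x) for row in train_X), train_y))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def repeated_validation_accuracy(
    X: List[Row],
    y: List[str],
    classifier: Classifier,
    n_repeats: int = 100,
    val_fraction: float = 0.3,  # assumed hold-out share; not stated in the original
    seed: int = 0,
) -> float:
    """Average validation accuracy over n_repeats random train/validation splits."""
    rng = random.Random(seed)
    n_val = max(1, round(len(X) * val_fraction))
    accuracies = []
    for _ in range(n_repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)  # draw a fresh random validation set on each repeat
        val_idx, train_idx = idx[:n_val], idx[n_val:]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(classifier(train_X, train_y, X[i]) == y[i] for i in val_idx)
        accuracies.append(correct / n_val)
    return sum(accuracies) / len(accuracies)


# Toy, made-up data (two well-separated clusters), NOT the Wisconsin FNA data.
toy_X = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
toy_y = ["B", "B", "B", "M", "M", "M"]
acc = repeated_validation_accuracy(
    toy_X, toy_y, lambda tX, ty, x: knn_predict(tX, ty, x, k=1), n_repeats=20
)
print(f"mean validation accuracy: {acc:.3f}")
```

Any of the candidate models can be plugged in as `classifier`, so each one is scored on the same sequence of random splits when the seed is fixed. With real FNA features, the measurements should also be standardized using each training split, since k-NN distances are dominated by features with large numeric ranges.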

About

Using data from fine needle aspirate images of breast tumor cell nuclei to diagnose the tumors as malignant or benign.
