This notebook looks into using various Python-based machine learning and data science libraries in an attempt to build a machine-learning model capable of predicting whether or not someone has heart disease based on their medical attributes.
We're going to take the following approach:
- Problem definition
- Data
- Evaluation
- Features
- Modelling
- Experimentation
In a statement,
Given clinical parameters about a patient, can we predict whether or not they have heart disease?
The original data came from the Cleavland data from UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/heart+disease
There is also a version of it available on Kaggle. https://www.kaggle.com/ronitf/heart-disease-uci
If we can 95% accuracy at predicting whether or not a patient has heart disease during the proof of concept, we'll pursue the project.
This is where you'll get different information about each of the features in your data.
Data dictionary
- age. The age of the patient.
- sex. The gender of the patient. (1 = male, 0 = female).
- cp. Type of chest pain. (1 = typical angina, 2 = atypical angina, 3 = non — anginal pain, 4 = asymptotic).
- trestbps. Resting blood pressure in mmHg.
- chol. Serum Cholestero in mg/dl.
- fbs. Fasting Blood Sugar. (1 = fasting blood sugar is more than 120mg/dl, 0 = otherwise).
- restecg. Resting ElectroCardioGraphic results (0 = normal, 1 = ST-T wave abnormality, 2 = left ventricular hyperthrophy).
- thalach. Max heart rate achieved.
- exang. Exercise induced angina (1 = yes, 0 = no).
- oldpeak. ST depression induced by exercise relative to rest.
- slope. Peak exercise ST segment (1 = upsloping, 2 = flat, 3 = downsloping).
- ca. Number of major vessels (0–3) colored by flourosopy.
- thal. Thalassemia (3 = normal, 6 = fixed defect, 7 = reversible defect).
- num. Diagnosis of heart disease (0 = absence, 1, 2, 3, 4 = present).
Every Figure will be static and dynamic. For static plots Matplotlib is used, For dynamic plots Plotly is used