
This is an end-to-end classification machine learning project to classify heart disease.


Saksham093/ML-Project-Heart-Disease-Classification


Introduction to the Project

This is an introductory project that covers machine learning and data science concepts by exploring the problem of heart disease classification.

It is intended to be an end-to-end example of what a Data Science and Machine Learning proof of concept might look like.

We'll look at the following topics:

  • Exploratory data analysis (EDA) - the process of reviewing a dataset and finding out more about it.
  • Model training - creating one or more models that learn to predict a target variable based on other variables.
  • Model evaluation - evaluating a model's predictions using problem-specific evaluation metrics.
  • Model comparison - comparing several different models to find the best one.
  • Model fine-tuning - once we've found a good model, how can we improve it?
  • Feature importance - since we're predicting the presence of heart disease, which features matter most for the prediction?
  • Cross validation - if we build a good model, can we be sure it will work on unseen data?
  • Reporting what we've found - if we had to present our work, what would we show someone?
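The model training, evaluation, comparison and cross-validation steps listed above can be sketched with Scikit-Learn roughly as follows. This is a minimal sketch, not the project's notebook: the candidate models, the 80/20 split and the 5 folds are assumptions, and `make_classification` produces synthetic data (13 features, binary target) that only stands in for the heart disease table.

```python
# Sketch of training, evaluating, comparing and cross-validating classifiers.
# Assumption: synthetic data stands in for the heart disease features; the
# model choices and split sizes are illustrative, not the project's settings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=42):
    """Fit several candidate models; report hold-out and 5-fold CV accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    models = {
        # Scaling matters for distance-based (KNN) and linear models.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "random_forest": RandomForestClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        holdout = model.score(X_test, y_test)  # accuracy on the unseen test split
        # cross_val_score refits a fresh clone of the model on each of 5 folds.
        cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        results[name] = {"holdout_acc": float(holdout), "cv_mean_acc": float(np.mean(cv_scores))}
    return results


if __name__ == "__main__":
    X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)
    for name, scores in compare_models(X_demo, y_demo).items():
        print(f"{name}: {scores}")
```

Reporting both a single hold-out score and the cross-validated mean makes it easier to see whether one lucky split is flattering a model.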

To work through these topics, we'll use Pandas, Matplotlib and NumPy for data analysis, and Scikit-Learn for machine learning and modeling tasks.
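For the fine-tuning and feature importance topics, one common Scikit-Learn approach is a grid search over a model's hyperparameters followed by inspecting the fitted coefficients. The sketch below assumes logistic regression, an illustrative grid for `C`, and a helper named `tune_and_rank` that is hypothetical (not from the project); coefficient magnitude is only a rough, linear notion of importance.

```python
# Sketch: tune logistic regression with GridSearchCV, then rank features by
# the absolute value of their (scaled) coefficients.
# Assumptions: X is a DataFrame with named columns; the C grid is illustrative.
import pandas as pd
from sklearn.datasets import make_classification  # used only by the demo below
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_and_rank(X: pd.DataFrame, y, cv=5):
    """Hypothetical helper: return best params, best CV accuracy, feature ranking."""
    pipe = Pipeline([
        ("scale", StandardScaler()),  # common scale so coefficients are comparable
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    grid = GridSearchCV(pipe, param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
                        cv=cv, scoring="accuracy")
    grid.fit(X, y)
    coefs = grid.best_estimator_.named_steps["clf"].coef_[0]
    # Larger |coefficient| means a stronger linear association with the target.
    importance = pd.Series(coefs, index=X.columns).sort_values(key=abs, ascending=False)
    return grid.best_params_, grid.best_score_, importance


if __name__ == "__main__":
    X_arr, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)
    X_demo = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])  # hypothetical names
    best_params, best_score, ranking = tune_and_rank(X_demo, y_demo)
    print(best_params, round(best_score, 3))
    print(ranking)
```

Keeping the scaler inside the pipeline means each cross-validation fold is scaled using only its own training portion, which avoids leaking test information into the tuning.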

Data Source

We downloaded a pre-formatted version of the dataset from Kaggle: https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset
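Once the CSV is downloaded, a first round of exploratory data analysis with Pandas might look like the sketch below. It assumes the file has a `target` column and a `sex` column as described in the data dictionary; `quick_eda` is a hypothetical helper name, and the local filename is up to you.

```python
# Sketch: basic EDA summaries for the heart disease DataFrame.
# Assumption: columns include "target" and "sex" as in the data dictionary.
import pandas as pd


def quick_eda(df: pd.DataFrame) -> dict:
    """Hypothetical helper returning a few first-look summaries."""
    return {
        "shape": df.shape,                                  # (rows, columns)
        "missing_per_column": df.isna().sum(),              # NaN count per column
        "duplicate_rows": int(df.duplicated().sum()),       # fully repeated rows
        "target_balance": df["target"].value_counts(),      # class counts (1 = disease)
        # Share of patients with disease, split by sex (1 = male, 0 = female).
        "disease_rate_by_sex": df.groupby("sex")["target"].mean(),
    }
```

Usage, assuming the download was saved as `heart.csv` (a hypothetical path): `quick_eda(pd.read_csv("heart.csv"))`. Checking class balance early matters because accuracy is only a meaningful metric when the two classes are reasonably balanced.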

Heart Disease Data Dictionary

The following are the features we'll use to predict our target variable (heart disease or no heart disease).

  1. age - age in years
  2. sex - (1 = male; 0 = female)
  3. cp - chest pain type
    • 0: Typical angina: chest pain related to decreased blood supply to the heart
    • 1: Atypical angina: chest pain not related to the heart
    • 2: Non-anginal pain: typically esophageal spasms (not heart-related)
    • 3: Asymptomatic: chest pain not showing signs of disease
  4. trestbps - resting blood pressure (in mm Hg on admission to the hospital)
    • anything above 130-140 is typically cause for concern
  5. chol - serum cholesterol in mg/dl
    • serum = LDL + HDL + 0.2 * triglycerides
    • above 200 is cause for concern
  6. fbs - (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
    • > 126 mg/dL signals diabetes
  7. restecg - resting electrocardiographic results
    • 0: Nothing to note
    • 1: ST-T wave abnormality
      • can range from mild symptoms to severe problems
      • signals non-normal heart beat
    • 2: Possible or definite left ventricular hypertrophy
      • Enlarged heart's main pumping chamber
  8. thalach - maximum heart rate achieved
  9. exang - exercise induced angina (1 = yes; 0 = no)
  10. oldpeak - ST depression induced by exercise relative to rest
    • looks at stress of the heart during exercise
    • unhealthy heart will stress more
  11. slope - the slope of the peak exercise ST segment
    • 0: Upsloping: better heart rate with exercise (uncommon)
    • 1: Flatsloping: minimal change (typical healthy heart)
    • 2: Downsloping: signs of unhealthy heart
  12. ca - number of major vessels (0-3) colored by fluoroscopy
    • colored vessel means the doctor can see the blood passing through
    • the more blood movement the better (no clots)
  13. thal - thallium stress result
    • 1,3: normal
    • 6: fixed defect: used to be defect but ok now
    • 7: reversible defect: no proper blood movement when exercising
  14. target - have disease or not (1 = yes, 0 = no) (the predicted attribute)

Note: No personally identifiable information (PII) can be found in the dataset.
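When reporting results, it can help to translate the numeric codes from the data dictionary into readable labels and to flag the thresholds it mentions. The sketch below maps only a subset of columns, and the labels, the `decode_codes`/`add_threshold_flags` helper names and the default limits (140 mm Hg as the upper end of the stated 130-140 range, 200 mg/dl for cholesterol) are assumptions taken from the dictionary's wording, not part of the dataset itself.

```python
# Sketch: decode coded columns into text labels and add threshold flags.
# Assumptions: labels paraphrase the data dictionary; limits are illustrative.
import pandas as pd

CODE_LABELS = {
    "sex": {0: "female", 1: "male"},
    "cp": {0: "typical angina", 1: "atypical angina", 2: "non-anginal pain", 3: "asymptomatic"},
    "restecg": {0: "nothing to note", 1: "ST-T wave abnormality", 2: "left ventricular hypertrophy"},
    "slope": {0: "upsloping", 1: "flat", 2: "downsloping"},
    "target": {0: "no disease", 1: "disease"},
}


def decode_codes(df: pd.DataFrame, mapping=CODE_LABELS) -> pd.DataFrame:
    """Return a copy with coded columns replaced by labels.

    Codes missing from the mapping become NaN, which makes unexpected values easy to spot.
    """
    out = df.copy()
    for column, labels in mapping.items():
        if column in out.columns:  # skip columns the DataFrame doesn't have
            out[column] = out[column].map(labels)
    return out


def add_threshold_flags(df: pd.DataFrame, bp_limit=140, chol_limit=200) -> pd.DataFrame:
    """Return a copy with 0/1 flags for resting blood pressure and cholesterol."""
    out = df.copy()
    out["high_bp"] = (out["trestbps"] > bp_limit).astype(int)
    out["high_chol"] = (out["chol"] > chol_limit).astype(int)
    return out
```

Both helpers return copies, so the numeric columns the models train on stay untouched.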
