ihnguyen/nutrition

Prediction and Classification of Sleep

The purpose of this study is to predict sleeping hours and classify sleep trouble using study, demographic, physical measurement, health, and lifestyle variables. The data cover 5,000 American participants in the National Health and Nutrition Examination Survey (NHANES), a survey program that has run since the early 1960s.

Predicting sleeping hours with Multiple Linear Regression and Decision Tree (Regression Tree)

Multiple Linear Regression

In the multiple regression model, an increase in age decade is associated with a decrease in the number of sleeping hours, with the 50-59 decade showing the largest decrease. Among the Hispanic, Mexican, White, and Other race categories, Mexican is associated with an increase in sleeping hours, while Other has the fewest hours. By education, participants in the high school group are associated with a decrease in sleeping hours relative to the other education groups. Those with lower household income are associated with an increase in sleeping hours compared to those with higher household income. Those with poor general health are associated with a decrease in sleeping hours compared to those with very good general health. Those who use the computer more than four hours a day are associated with an increase in sleeping hours compared to those who use it less than four hours. Those who have smoked at least 100 cigarettes are associated with a decrease in sleeping hours compared to those who have not.
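Each categorical comparison above ("poor" versus "very good" general health, for example) is read against a reference level. The sketch below shows one common way such a variable could be dummy-coded before fitting a linear model. The function name and the full list of health levels are assumptions for illustration; the source only names "poor" and "very good", and it does not state how the original analysis encoded its variables.

```python
def dummy_encode(value, levels, reference):
    """Return 0/1 indicators for every level except the reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    # The reference level has no indicator, so it is encoded as all zeros.
    return {level: int(value == level) for level in levels if level != reference}


# Hypothetical level names (assumption): only "poor" and "very good"
# appear in the write-up above.
HEALTH_LEVELS = ["excellent", "very good", "good", "fair", "poor"]

poor_row = dummy_encode("poor", HEALTH_LEVELS, reference="very good")
reference_row = dummy_encode("very good", HEALTH_LEVELS, reference="very good")
print(poor_row)
print(reference_row)
```

With this coding, the coefficient on the "poor" indicator is the estimated difference in sleeping hours between the poor and very good groups, holding the other predictors fixed.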

Decision Tree (Regression Tree)

In the regression tree model, the two main predictors of the number of sleeping hours on a weekday/workday night are the number of days of poor mental health and age decade. Those with 5.5 or more days of poor mental health are predicted to have 6.2 hours of sleep. Those with fewer than 5.5 days of poor mental health who are in the 20-69 age range are predicted to have 6.9 hours of sleep. Those with fewer than 5.5 days of poor mental health who are older than 69 are predicted to have 7.3 hours of sleep.
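The three leaves described above can be written out as plain rules. This is a minimal sketch that restates the reported splits and leaf values; the function and argument names are hypothetical, and it assumes every participant is at least 20 years old, since the write-up only mentions the 20-69 range and ages above 69.

```python
def predict_sleep_hours(days_poor_mental_health, age):
    """Apply the regression tree splits reported in the text.

    Assumes age >= 20 (assumption: younger ages are not described).
    """
    if days_poor_mental_health >= 5.5:
        return 6.2  # 5.5 or more days of poor mental health
    if age > 69:
        return 7.3  # fewer than 5.5 days, older than 69
    return 6.9      # fewer than 5.5 days, age 20-69


print(predict_sleep_hours(10, 45))
print(predict_sleep_hours(2, 45))
print(predict_sleep_hours(2, 75))
```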

Figure 1: Decision Tree

decision_tree

Comparing the models, the multiple regression model explains 7% of the variability in the number of hours of sleep per weekday or workday night, whereas the decision tree model explains only 3%. Because the linear model also has the lowest RMSE (1.29) and MAE (1.01), the multiple linear regression model is the best at predicting the number of sleeping hours per weekday/workday night in the NHANES data set.
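For reference, the three metrics used in this comparison (RMSE, MAE, and the share of variability explained, R²) can be computed as in the sketch below. The data are a made-up toy example, not values from the study, and the function names are illustrative. The toy prediction is a null model that always predicts the mean, which by construction explains none of the variability.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of variance explained: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy data (assumption): three hypothetical participants.
hours = [6.0, 7.0, 8.0]
null_pred = [sum(hours) / len(hours)] * len(hours)  # always predict the mean

print(rmse(hours, null_pred), mae(hours, null_pred), r_squared(hours, null_pred))
```

A model is only useful to the extent that it beats this null baseline, which is why the null model is evaluated alongside the other two.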

Table 1a: Evaluation of Null Model

null2

Table 1b: Evaluation of Multiple Linear Regression Model

mlr2

Table 1c: Evaluation of Regression Tree Model

regression_tree2

Classifying sleep trouble with Logistic Regression, kNN, C5.0, Random Forests, Naive Bayes

Figure 2: Missingness

missingness

Model performance on sleep trouble classification:

  • Logistic regression: 76.0% accuracy, 70.8% ROC AUC
  • kNN: 89.3% accuracy, 94.0% ROC AUC
  • C5.0: 88.0% accuracy, 91.6% ROC AUC
  • Random forests: 86.3% accuracy, 91.3% ROC AUC
  • Naive Bayes: 74.6% accuracy, 68.9% ROC AUC

Comparing the five models, kNN performed best, with both the highest ROC AUC value and the highest accuracy.
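Accuracy and ROC AUC measure different things: accuracy depends on a decision threshold, while ROC AUC is the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one. The sketch below computes both on made-up scores. It is illustrative only; the function names, the 0.5 threshold, and the toy data are assumptions, and it is not the code used to produce the figures above.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (assumption): 1 = sleep trouble, 0 = no sleep trouble.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(accuracy(labels, scores))  # 0.75
print(roc_auc(labels, scores))   # 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; library implementations use a sort-based approach instead.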

Figure 3: ROC AUC Plot - Model Comparison

rocauc

Table 2: ROC AUC Values - Model Comparison

roc_auc_models_all2

Conclusion

For predicting sleeping hours, the best model is the multiple linear regression model. The most significant predictors are age decade, race, education, household income, general health, computer/gaming usage, and smoking at least 100 cigarettes. For classifying whether a participant has sleep trouble, the best model is the kNN model.

How to improve models and model performance?

  • C5.0 by boosting
  • Decision tree by cost or trials
  • Logistic and Linear Regression by regularization
  • Random Forests by tuning
  • Naive Bayes by Laplace smoothing
  • Cross-Validation
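As an illustration of the last item, k-fold cross-validation splits the data into k parts and holds each part out once for evaluation while training on the rest. The sketch below only generates the train/test index splits; the function name, the fold count, and the seed are assumptions, and it is not tied to any particular modelling library.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical sizes (assumption): 10 rows, 5 folds.
splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print(len(train), len(test))
```

Averaging a metric such as RMSE or ROC AUC over the k held-out folds gives a steadier estimate of performance than a single train/test split.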
