This is my machine learning project, which uses several methods to predict wine quality and wine type from physicochemical measurements.
To build the report, clone the repository and run `make` from its root:

```sh
git clone https://github.com/erictleung/ml-final-proj.git
cd ml-final-proj
make report
```
The data comes from the University of California, Irvine (UCI) Machine Learning Repository, where it is listed as the Wine Quality Data Set.
It is split into two datasets, one for red wine and one for white wine. All of the wines are from Portugal.
Each dataset has eleven input variables (such as citric acid content and pH) and one output variable, quality, which is scored on a scale from zero to ten.
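Asking whether red and white wine can be told apart requires stacking the two datasets and labeling each row with its wine type. The project's own scripts are written in R; the following is only a minimal Python sketch using the standard library. It assumes the UCI file layout (semicolon-delimited, one header row of quoted column names), and `load_wine` is a hypothetical helper, not part of this repository.

```python
import csv
import io

# Hypothetical helper, not part of this repository. Assumes each UCI file is
# semicolon-delimited with a single header row of (quoted) column names.
def load_wine(red_file, white_file):
    """Stack the red and white datasets into one list of rows with a 'type' label."""
    rows = []
    for label, handle in (("red", red_file), ("white", white_file)):
        for record in csv.DictReader(handle, delimiter=";"):
            # Every column, including quality, is numeric in the source files.
            row = {name: float(value) for name, value in record.items()}
            row["quality"] = int(row["quality"])  # quality is an integer score
            row["type"] = label                   # new column: "red" or "white"
            rows.append(row)
    return rows

# Tiny made-up sample with three of the twelve columns, for illustration only.
red = io.StringIO('"pH";"alcohol";"quality"\n3.51;9.4;5\n')
white = io.StringIO('"pH";"alcohol";"quality"\n3.0;8.8;6\n3.26;10.1;6\n')
wines = load_wine(red, white)
print(len(wines), wines[0]["type"], wines[-1]["type"])  # → 3 red white
```

With the real data, pass file handles opened with `newline=""`, e.g. `open("winequality-red.csv", newline="")` and the white-wine equivalent (assuming the file names used on the UCI page).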
The project explores the following questions:

- With the two datasets combined, can we distinguish red wine from white wine?
- Can we predict perceived wine quality from the input variables?
- Do any variables contain redundant information? (In other words, are any variables correlated?)
- Which variables are most important for predicting perceived wine quality?
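The redundancy question can be approached by computing pairwise correlations between the input variables and flagging strongly correlated pairs. In R, the built-in `cor()` returns the full correlation matrix; the following is a minimal Python sketch of the same idea. The cutoff of 0.7 is an arbitrary assumption, `correlated_pairs` is a hypothetical helper, and Pearson's coefficient only captures linear relationships (it is also undefined for a constant column).

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical helper; threshold=0.7 is an assumed cutoff for "redundant".
def correlated_pairs(columns, threshold=0.7):
    """List (name_a, name_b, r) for every pair of columns with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs

# Made-up columns: "b" decreases exactly as "a" increases; "c" is only loosely related.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [4.0, 3.0, 2.0, 1.0],
    "c": [1.0, 3.0, 2.0, 3.0],
}
print(correlated_pairs(columns))  # → [('a', 'b', -1.0)]
```

Checking the absolute value of r flags strong negative correlations as well as positive ones, since either kind means one variable largely predicts the other.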
The repository is organized as follows:

```
.
├── Makefile
├── README.md
├── bin
│   ├── decision-trees.R
│   ├── naive-bayes.R
│   ├── splitdf.R
│   └── svm.R
└── report
    ├── leung-final-report.Rmd
    └── refs.bib

2 directories, 8 files
```
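The contents of `bin/splitdf.R` are not shown here; judging only by its name, it likely splits a data frame into training and test sets before the models are fit. The following is a minimal Python sketch of such a split, not a translation of that script: the 70/30 default ratio, the fixed seed, and the name `split_rows` are all assumptions.

```python
import random

# Hypothetical stand-in for what bin/splitdf.R might do; the 70/30 ratio and
# fixed seed are assumptions, not values taken from that script.
def split_rows(rows, train_fraction=0.7, seed=42):
    """Randomly partition rows into (train, test) without modifying the input."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # local RNG makes the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = split_rows(range(8), train_fraction=0.75)
print(len(train), len(test))  # → 6 2
```

Using a dedicated `random.Random(seed)` instance rather than seeding the global generator keeps the split reproducible without affecting any other code that draws random numbers.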