In this project, we compared two implementations of the Lasso (L1) regression algorithm:
- the R function cv.glmnet() in the glmnet package,
- the LassoWithSGD algorithm (trained via LassoWithSGD.train()) in Spark MLlib.
Our objective was to compare the two implementations in terms of:
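For reference, both implementations target essentially the same penalized least-squares objective, written below in our own notation. It follows the glmnet documentation for the Gaussian family with `alpha = 1` (the default, i.e. the Lasso). MLlib's linear-methods guide describes `LassoWithSGD` as minimizing the squared loss $\tfrac12(w^\top x_i - y_i)^2$, averaged over the $n$ examples, plus an L1 penalty weighted by `regParam`. To our understanding, it fits no intercept by default ($\beta_0 = 0$). One difference worth keeping in mind is how $\lambda$ is chosen. `cv.glmnet()` selects it by k-fold cross-validation (10 folds by default). `LassoWithSGD` instead takes a fixed `regParam` and minimizes the objective by stochastic gradient descent.

$$
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2 + \lambda\,\lVert \beta \rVert_1
$$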
- usability,
- performance,
- accuracy.
The following files are part of this project:
File | Content |
---|---|
CC.R | The main R script that both creates the input files and analyses each file with the cv.glmnet() function. |
properties.R | A properties file that controls the Latin hypercube sampling. |
workingDir.R | This file is not submitted because it contains the hardcoded path variables of the local user. |
ScalaShell.scala | A Scala script that can be run in the Spark shell. |
CompareResultsLasso_CompareSparkvsR.csv | A CSV file with the combined results of the R and Spark simulations. |
analyzeResults.R | A script to analyze the above CSV file. |
Spark_versus_R.pdf | A poster that summarizes the results of the project. |
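The sampling settings in `properties.R` are not reproduced here. As a rough illustration of what Latin hypercube sampling does, the following Scala sketch draws `n` points in `d` dimensions on the unit hypercube. Each axis is split into `n` equal strata, and every stratum is hit by exactly one point. This is a hypothetical example, not the project's R code. The object name `LatinHypercube` and its `sample` and `scale` helpers are illustrative assumptions.

```scala
import scala.util.Random

// Hypothetical sketch of Latin hypercube sampling on [0, 1)^d.
// Not the project's properties.R / CC.R logic; names are illustrative only.
object LatinHypercube {

  /** Draws n points in d dimensions. For every dimension, the interval [0, 1)
    * is cut into n equal strata and each stratum receives exactly one point. */
  def sample(n: Int, d: Int, rng: Random): Array[Array[Double]] = {
    require(n > 0 && d > 0, "n and d must be positive")
    val points = Array.ofDim[Double](n, d)
    for (j <- 0 until d) {
      // A random permutation assigns a distinct stratum to each point on axis j.
      val strata = rng.shuffle((0 until n).toVector)
      for (i <- 0 until n) {
        // Place the point uniformly at random inside stratum [k/n, (k+1)/n).
        points(i)(j) = (strata(i) + rng.nextDouble()) / n
      }
    }
    points
  }

  /** Maps a unit-interval value onto [lo, hi), e.g. a simulation parameter range. */
  def scale(u: Double, lo: Double, hi: Double): Double = lo + u * (hi - lo)

  def main(args: Array[String]): Unit = {
    val pts = sample(n = 5, d = 2, rng = new Random(42))
    pts.foreach(p => println(p.map(v => f"$v%.3f").mkString(", ")))
  }
}
```

Stratifying each axis separately keeps the marginal coverage even with few simulation runs, which is the usual reason to prefer it over plain random sampling of the parameter space.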