Lasso Regression

Regularized regression on the forest fire data set supplied by Cortez and Morais (2007)

Authors: Patrik Mirzai and Huixin Zhong

This project predicts the burned area of wildfires using Lasso regression and compares the results with multiple regression and regression trees. A summary of the Lasso procedure is given below. See the attached source code "Lasso implementation of wildfires data set.R" for full details on the project.

Code used for the Lasso implementation

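#Load packages
library(readxl)  #Reads Excel files; not needed for the code below (optional)
library(glmnet)  #For Lasso regression

#Read month and day as factors so the train and test model matrices share the same dummy columns
df = read.table('forestfires.csv', sep=",", header = TRUE, stringsAsFactors = TRUE)
df$area = log(df$area+1)  #Log-transform the response: log(area + 1)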

set.seed(2)
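
The transform applied to area above can be illustrated in isolation. The following is a minimal sketch using made-up area values (they are assumptions for illustration, not rows from forestfires.csv); it shows that log(area + 1) matches base R's log1p() and that expm1() maps values back to the original scale.

```r
# Hypothetical burned areas in hectares (illustrative, not taken from forestfires.csv)
area = c(0, 0.5, 10, 1000)

# Same transform as in the code above: zero-area fires stay at 0, large fires are compressed
log_area = log(area + 1)

# log1p() is the built-in equivalent; expm1() maps values back to hectares
same_as_log1p = isTRUE(all.equal(log_area, log1p(area)))
area_back = expm1(log_area)
```

Because the model is fitted on the transformed response, its predictions and error measures are on the log(area + 1) scale unless they are back-transformed in this way.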

Let's divide the data into a training set and a test set

#Row indices for the training data (70% of the rows)
train_index = sample(1:nrow(df), size = floor(nrow(df)*0.7), replace = FALSE)

#Select train and test set
train_data = df[train_index,]
test_data = df[-train_index,]

#Model matrix for the training data; [,-1] drops the intercept column
x = model.matrix(area~., train_data)[,-1]
y = train_data$area
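
To see what model.matrix() produces, here is a small sketch on a toy data frame. The data frame `toy` and its values are hypothetical and only serve as an illustration. It shows how a factor column is expanded into dummy variables, and that a subset of the rows still yields the same columns as long as the column is stored as a factor.

```r
# Toy data frame (hypothetical values) with one factor and one numeric predictor
toy = data.frame(
  area  = c(0.0, 1.2, 0.4, 2.1),
  month = factor(c("aug", "sep", "aug", "mar")),
  temp  = c(22.1, 18.3, 25.0, 9.8)
)

# Design matrix without the intercept: one dummy per non-reference month level, plus temp
x_toy = model.matrix(area ~ ., toy)[, -1]
colnames(x_toy)

# A subset keeps all factor levels, so its columns match those of the full matrix
x_sub = model.matrix(area ~ ., toy[1:2, ])[, -1]
colnames(x_sub)
```

Matching columns matter because predict() expects the test matrix to have the same variables as the matrix used for fitting.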

Let's plot the coefficients against the L1 norm

#Fit the Lasso path over a grid of lambda values and plot the coefficients against the L1 norm
fit = glmnet(x, y)
plot(fit)
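
For reference, the objective that glmnet minimizes for a Gaussian response with alpha = 1 (the default) can be written as below, following Friedman et al. (2010). The symbols n (number of observations), p (number of predictors), \beta_0 (intercept) and \beta (coefficient vector) are notation introduced here, not names from the code.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
```

The horizontal axis of the coefficient plot is the L1 norm \sum_j |\beta_j| of the fitted coefficients. A smaller lambda permits a larger norm, so more coefficients become nonzero towards the right of the plot.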

Now let's choose the tuning parameter lambda through cross-validation

#Create a sequence of tuning parameter values for the cross-validation
lambda_seq = 10^seq(2, -2, by = -.1)

#Train model with different tuning parameters
set.seed(2)
cv_output = cv.glmnet(x, y, alpha = 1, lambda = lambda_seq, type.measure="mse")

#Cross validation plot
plot(cv_output)
best_lam = cv_output$lambda.min

#Fit lasso model again with the best lambda
best_lasso = glmnet(x, y, alpha = 1, lambda = best_lam)

coef(best_lasso) #Get coefficients

The cross-validation plot displays the mean squared error for each value of lambda, estimated with 10-fold cross-validation (the default in cv.glmnet)
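
How a value such as lambda.min is read off the cross-validation curve can be sketched in base R. The vectors below are made-up numbers standing in for the lambda, cvm and cvsd components that cv.glmnet returns; they are not results from the wildfire data. The sketch also derives the more conservative lambda.1se, which cv.glmnet reports alongside lambda.min.

```r
# Hypothetical CV results (illustrative numbers, not from the wildfire data)
lambda = c(1, 0.5, 0.1, 0.05, 0.01)
cvm    = c(2.30, 2.05, 1.98, 1.97, 2.00)   # mean cross-validated error per lambda
cvsd   = c(0.10, 0.09, 0.09, 0.10, 0.10)   # standard error of that mean

i_min      = which.min(cvm)
lambda_min = lambda[i_min]                  # lambda with the lowest CV error

# One-standard-error rule: largest lambda whose error is within one SE of the minimum
threshold  = cvm[i_min] + cvsd[i_min]
lambda_1se = max(lambda[cvm <= threshold])
```

With these numbers lambda_min is 0.05 and lambda_1se is 0.5. The latter trades a slightly higher estimated error for a sparser model.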

Finally, let's compute the mean squared error on the test data

#Predicting
x_test = model.matrix(area~., test_data)[,-1]
pred = predict(best_lasso, x_test)

actual_test = test_data$area
mse = mean((actual_test - pred)^2)  #Test MSE on the log(area + 1) scale: 2.049039
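
A test MSE is easier to judge next to a simple baseline. The sketch below uses made-up numbers, and `train_mean` is a hypothetical stand-in for the mean of the training response. It compares the MSE of a model's predictions with the MSE obtained by always predicting the training mean, both on the log(area + 1) scale.

```r
# Hypothetical held-out values and predictions on the log(area + 1) scale
actual     = c(0.0, 1.5, 3.2, 0.7)
pred       = c(0.9, 1.1, 1.4, 1.0)
train_mean = 1.1   # hypothetical mean of the training response

# Same MSE formula as above, applied to the model and to the constant baseline
mse_model    = mean((actual - pred)^2)
mse_baseline = mean((actual - train_mean)^2)
```

In this toy case the model MSE is 1.075 and the baseline MSE is 1.485. The same comparison could be made with the wildfire data by using mean(y) as the constant prediction for test_data.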

References

Cortez, P. and Morais, A. (2007), ‘A Data Mining Approach to Predict Forest Fires using Meteorological Data’, New Trends in Artificial Intelligence, Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, pp. 512–523.

Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007), ‘Pathwise Coordinate Optimization’, The Annals of Applied Statistics 1(2), 302–332.

Friedman, J., Hastie, T. and Tibshirani, R. (2009), The Elements of Statistical Learning, second edn, New York: Springer Verlag.

Friedman, J., Hastie, T. and Tibshirani, R. (2010), ‘Regularized Paths for Generalized Linear Models Via Coordinate Descent’, Journal of Statistical Software 33, 1–22.

Hastie, T., Tibshirani, R. and Wainwright, M. (2015), Statistical Learning with Sparsity: The Lasso and Generalizations, first edn, Chapman & Hall/CRC.

Tibshirani, R. (1996), ‘Regression Shrinkage and Selection via the Lasso’, Journal of the Royal Statistical Society. Series B (Methodological) 58(1), 267–288.
