LASSO with $\lambda = 0$ and OLS produce different results in R glmnet

r least-squares lasso-regression lm

You're using the function wrong. The x argument should be the model matrix, not the raw predictor vector. When you pass the model matrix, the coefficients match those from lm() (up to glmnet's convergence tolerance):

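    library(glmnet)

    x <- rnorm(500)
    y <- rnorm(500)
    mod1 <- lm(y ~ x)

    # Use the model matrix from lm() as the predictor matrix for glmnet
    xmm <- model.matrix(mod1)
    mod2 <- glmnet(xmm, y, alpha = 1, lambda = 0)

    # coef(mod2) lists an extra "(Intercept)" row for the constant column of xmm,
    # because glmnet also fits its own intercept
    coef(mod1)
    coef(mod2)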
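As background for why the two fits should agree at all, here is the objective that the glmnet documentation gives for the gaussian family, written out as a sketch. The symbols $N$, $\beta_0$, $\beta$ and $x_i$ follow that documentation rather than anything in this thread. Setting $\lambda = 0$ removes the penalty term, leaving the ordinary least-squares criterion that lm() minimises, so any remaining gap between the two sets of coefficients has to come from the numerical optimisation rather than from the model.

```latex
\min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
  + \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2+\alpha\,\lVert\beta\rVert_1\Bigr]
\quad\xrightarrow{\;\lambda=0\;}\quad
\min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
```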


I had the same problem, asked around to no avail, then I emailed the package maintainer (Trevor Hastie), who gave the answer. The problem occurs when series are highly correlated. The solution is to tighten the convergence threshold by passing a smaller thresh directly in the glmnet() function call (rather than via glmnet.control()). The code below uses the built-in dataset EuStockMarkets and applies a VAR with lambda=0. For XSMI, the OLS coefficient is below 1, the default glmnet coefficient is above 1 with a difference of about 0.03, and the glmnet coefficient with thresh=1e-14 is very close to the OLS coefficient (a difference of 1.8e-7).

    library(glmnet)

    # Use built-in panel data with integrated series
    data("EuStockMarkets")
    selected_market <- 2

    # Take logs for good measure
    EuStockMarkets <- log(EuStockMarkets)

    # Get dimensions
    num_entities <- dim(EuStockMarkets)[2]
    num_observations <- dim(EuStockMarkets)[1]

    # Build the response with the most recent observations at the top
    Y <- as.matrix(EuStockMarkets[num_observations:2, selected_market])
    X <- as.matrix(EuStockMarkets[(num_observations - 1):1, ])

    # Run OLS, which adds an intercept by default
    ols <- lm(Y ~ X)
    ols_coef <- coef(ols)

    # Run glmnet with lambda = 0
    fit <- glmnet(y = Y, x = X, lambda = 0)
    lasso_coef <- coef(fit)

    # Run again, but with a stricter threshold
    fit_threshold <- glmnet(y = Y, x = X, lambda = 0, thresh = 1e-14)
    lasso_threshold_coef <- coef(fit_threshold)

    # Build a data frame to compare the two approaches
    comparison <- data.frame(ols = ols_coef,
                             lasso = lasso_coef[1:length(lasso_coef)],
                             lasso_threshold = lasso_threshold_coef[1:length(lasso_threshold_coef)])
    comparison$difference <- comparison$ols - comparison$lasso
    comparison$difference_threshold <- comparison$ols - comparison$lasso_threshold

    # Show the two values for the autoregressive parameter and their difference
    comparison[1 + selected_market, ]

R returns:

               ols    lasso lasso_threshold  difference difference_threshold
    XSMI 0.9951249 1.022945       0.9951248 -0.02782045         1.796699e-07
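To see why a convergence threshold matters more when predictors are highly correlated, here is a minimal base R sketch of coordinate descent for unpenalized least squares. It is an illustration under stated assumptions, not glmnet's implementation: the function `cd_least_squares`, its stopping rule (largest coefficient change in a sweep below `tol`) and the synthetic data are all made up for this example.

```r
# Minimal coordinate-descent sketch for unpenalized least squares (base R only).
# Hypothetical helper: NOT glmnet's code and NOT its exact stopping rule.
cd_least_squares <- function(X, y, tol, max_iter = 100000L) {
  # Centre the data so the intercept can be left out of the updates
  Xc <- scale(X, center = TRUE, scale = FALSE)
  yc <- y - mean(y)
  beta <- numeric(ncol(Xc))
  col_ss <- colSums(Xc^2)
  resid <- yc
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (j in seq_along(beta)) {
      # Exact minimiser over coordinate j with the other coefficients held fixed
      step <- sum(Xc[, j] * resid) / col_ss[j]
      beta[j] <- beta[j] + step
      resid <- resid - Xc[, j] * step
      max_change <- max(max_change, abs(step))
    }
    # Stop once no coefficient moved by more than tol in a full sweep
    if (max_change < tol) break
  }
  list(beta = beta, iterations = iter, rss = sum(resid^2))
}

# Synthetic, deterministic data with two almost collinear predictors
n <- 200
grid <- seq(0, 1, length.out = n)
x1 <- grid
x2 <- grid + 0.01 * sin(20 * grid)
X <- cbind(x1 = x1, x2 = x2)
y <- 1 + 0.5 * x1 + 0.5 * x2 + 0.1 * cos(7 * grid)

ols <- lm(y ~ X)
loose <- cd_least_squares(X, y, tol = 1e-4)
tight <- cd_least_squares(X, y, tol = 1e-10)

# Slopes side by side: exact OLS, loose tolerance, tight tolerance
print(rbind(ols = coef(ols)[-1], loose = loose$beta, tight = tight$beta))
print(c(loose_iterations = loose$iterations, tight_iterations = tight$iterations))
```

With nearly collinear columns, each sweep moves the coefficients only a little, so a stopping rule based on small changes can trigger well before the least-squares solution is reached. Tightening the tolerance forces more sweeps, which is the same idea as passing a smaller thresh to glmnet().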


I ran the following code on the "prostate" example dataset from Hastie's book:

    library(glmnet)

    # yy: data frame holding the prostate data (not defined in this snippet)
    out.lin1 = lm(lpsa ~ ., data = yy)
    out.lin1$coeff

    out.lin2 = glmnet(as.matrix(yy[, -9]), yy$lpsa, family = "gaussian", lambda = 0, standardize = TRUE)
    coefficients(out.lin2)

and the resulting coefficients are similar. With standardize = TRUE, the coefficients returned by glmnet() are still in the original units of the input variables. Also check that you are using the "gaussian" family.
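The point about units can be checked in base R without glmnet. The sketch below fits OLS on standardized predictors, maps the coefficients back to the original scale, and compares them with a fit on the raw predictors. The simulated data and all variable names are assumptions made for this illustration, and it shows only the back-transformation idea, not glmnet's internal code.

```r
# Base R sketch: coefficients fitted on standardized predictors can be mapped
# back to the original units. Data and names are made up for illustration.
set.seed(1)
n <- 100
X <- cbind(a = rnorm(n, mean = 10, sd = 5), b = rnorm(n, mean = 0, sd = 0.1))
y <- 1 + 2 * X[, "a"] - 3 * X[, "b"] + rnorm(n)

# Fit on the raw predictors
raw_fit <- lm(y ~ X)

# Fit on centred and scaled predictors
Xs <- scale(X)
std_fit <- lm(y ~ Xs)
means <- attr(Xs, "scaled:center")
sds <- attr(Xs, "scaled:scale")

# Undo the scaling: slope_j = standardized slope_j / sd_j,
# intercept = standardized intercept - sum(slope_j * mean_j)
slopes_back <- coef(std_fit)[-1] / sds
intercept_back <- coef(std_fit)[1] - sum(slopes_back * means)

print(rbind(raw = unname(coef(raw_fit)),
            back_transformed = unname(c(intercept_back, slopes_back))))
```

For an unpenalized fit the two rows agree up to floating-point error, which is why standardizing internally and then reporting coefficients on the original scale does not by itself explain a gap between glmnet() and lm() at lambda = 0.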
