How does glmnet's standardize argument handle dummy variables?

r dataset machine-learning glmnet

In short, yes: glmnet will standardize the dummy variables, and there is a reason for doing so. The glmnet function takes a matrix, not a data frame, as its x argument, so it cannot distinguish columns that came from factors the way a function taking a data.frame could. If you take a look at the R function, glmnet codes the standardize argument internally as
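As a small illustration of why glmnet cannot tell a dummy column from any other, here is a sketch in base R. The data frame `df` and its columns are made up for this example; the point is only that `model.matrix` turns a factor into ordinary 0/1 numeric columns before glmnet ever sees them.

```r
# Hypothetical toy data: one continuous predictor and one factor.
df <- data.frame(
  age   = c(23, 35, 47, 51, 62, 29),
  group = factor(c("a", "b", "c", "a", "b", "c"))
)

# Expand the factor into dummy columns and drop the intercept column,
# since glmnet fits its own intercept by default.
x <- model.matrix(~ age + group, data = df)[, -1]

is.numeric(x)          # the whole matrix is numeric
colnames(x)            # dummy columns sit next to the continuous one
class(x[, "groupb"])   # a dummy column is just a numeric vector
```

A matrix like `x` is what you would pass to glmnet, and nothing in it marks `groupb` or `groupc` as categorical.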

    isd = as.integer(standardize)

This converts the R logical to a 0 or 1 integer that is passed to the internal FORTRAN routines (elnet, lognet, et al.).

If you go even further by examining the FORTRAN code (fixed width - old school!), you'll see the following block:

          subroutine standard1 (no,ni,x,y,w,isd,intr,ju,xm,xs,ym,ys,xv,jerr)    989
          real x(no,ni),y(no),w(no),xm(ni),xs(ni),xv(ni)                        989
          integer ju(ni)                                                        990
          real, dimension (:), allocatable :: v
          allocate(v(1:no),stat=jerr)                                           993
          if(jerr.ne.0) return                                                  994
          w=w/sum(w)                                                            994
          v=sqrt(w)                                                             995
          if(intr .ne. 0)goto 10651                                             995
          ym=0.0                                                                995
          y=v*y                                                                 996
          ys=sqrt(dot_product(y,y)-dot_product(v,y)**2)                         996
          y=y/ys                                                                997
    10660 do 10661 j=1,ni                                                       997
          if(ju(j).eq.0)goto 10661                                              997
          xm(j)=0.0                                                             997
          x(:,j)=v*x(:,j)                                                       998
          xv(j)=dot_product(x(:,j),x(:,j))                                      999
          if(isd .eq. 0)goto 10681                                              999
          xbq=dot_product(v,x(:,j))**2                                          999
          vc=xv(j)-xbq                                                         1000
          xs(j)=sqrt(vc)                                                       1000
          x(:,j)=x(:,j)/xs(j)                                                  1000
          xv(j)=1.0+xbq/vc                                                     1001
          goto 10691                                                           1002

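Take a look at the lines marked 1000: they compute the (weighted) variance of column j, take its square root, and divide the column by that standard deviation. This is basically applying the standardization formula to the X matrix, and it is applied to every column alike.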
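To make the arithmetic concrete, here is a rough R transcription of those Fortran statements for a single column. This is only a sketch for illustration, not glmnet's actual R code; the vector `xj` and the equal weights are assumptions made for the example, and the variable names are borrowed from the Fortran.

```r
xj <- c(0, 1, 0, 0, 1, 1, 0, 1)   # hypothetical dummy column
n  <- length(xj)
w  <- rep(1, n)                   # assumed equal observation weights

w   <- w / sum(w)                 # w=w/sum(w)
v   <- sqrt(w)                    # v=sqrt(w)
xw  <- v * xj                     # x(:,j)=v*x(:,j)
xv  <- sum(xw * xw)               # weighted mean of xj^2
xbq <- sum(v * xw)^2              # squared weighted mean of xj
vc  <- xv - xbq                   # weighted variance (divisor n, not n - 1)
xs  <- sqrt(vc)                   # standard deviation used for scaling
xw_scaled <- xw / xs              # x(:,j)=x(:,j)/xs(j)

# With equal weights, xs is the "population" standard deviation of xj,
# which differs from R's sd() by a factor of sqrt((n - 1) / n).
all.equal(xs, sd(xj) * sqrt((n - 1) / n))
```

Note that the divisor is n rather than n - 1, so scaling a column by hand with `sd()` gives slightly different numbers.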

Now, statistically speaking, one generally does not standardize categorical variables, in order to retain the interpretability of the estimated coefficients. However, as pointed out by Tibshirani, "The lasso method requires initial standardization of the regressors, so that the penalization scheme is fair to all regressors. For categorical regressors, one codes the regressor with dummy variables and then standardizes the dummy variables". So while this causes arbitrary scaling between continuous and categorical variables, it is done so that all regressors are penalized equally.
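A short worked example shows what that scaling looks like for a dummy. A 0/1 column in which a fraction p of the rows are 1 has (population) standard deviation sqrt(p(1 - p)), so after dividing by it a "1" becomes 1 / sqrt(p(1 - p)). The proportions below are made-up values chosen for illustration.

```r
# Hypothetical proportions of 1s in three dummy columns.
p <- c(0.5, 0.2, 0.05)

sd_dummy   <- sqrt(p * (1 - p))   # standard deviation of each dummy
scaled_one <- 1 / sd_dummy        # size of a "1" after dividing by the sd

round(scaled_one, 2)              # 2.00 2.50 4.59

# Check the formula against an actual 0/1 vector with p = 0.05.
x <- rep(c(1, 0), times = c(5, 95))
all.equal(sqrt(mean(x^2) - mean(x)^2), sd_dummy[3])
```

So the rescaling of a dummy depends on how common its level is, which is the sense in which the scaling relative to continuous variables is arbitrary.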

glmnet doesn't know anything about dummy variables, because it doesn't have a formula interface (and hence doesn't touch model.frame or model.matrix). If you want them to be treated specially, you'll have to do it yourself.
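One way of "doing it yourself" might look like the following sketch: scale only the continuous columns by hand and pass `standardize = FALSE` so glmnet leaves the matrix alone. The data, the column names, and the choice to leave the dummies on their raw 0/1 scale are all assumptions made for illustration; whether that choice is appropriate is a modeling decision, since the penalty then treats the dummies on a different scale from the other columns.

```r
# Hypothetical toy data.
df <- data.frame(
  age    = c(23, 35, 47, 51, 62, 29, 44, 58),
  income = c(31, 48, 52, 75, 66, 39, 58, 81),
  group  = factor(c("a", "b", "c", "a", "b", "c", "a", "b"))
)
y <- c(1.2, 2.3, 2.9, 3.1, 4.0, 1.8, 2.7, 3.9)

# Build the numeric design matrix, dropping the intercept column.
x <- model.matrix(~ age + income + group, data = df)[, -1]

# Scale only the continuous columns; the 0/1 dummies stay untouched.
# scale() uses the n - 1 divisor, unlike glmnet's internal n divisor.
continuous <- c("age", "income")
x[, continuous] <- scale(x[, continuous])

# Turn off glmnet's own standardization so it uses x as given.
# The call is guarded so the sketch still runs without glmnet installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x, y, standardize = FALSE)
}
```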
