how to get pandas get_dummies to emit N-1 variables to avoid collinearity?

python pandas machine-learning dummy-variable

Pandas version 0.18.0 implemented exactly what you're looking for: the drop_first option. Here's an example:

In [1]: import pandas as pdIn [2]: pd.__version__Out[2]: u'0.18.1'In [3]: s = pd.Series(list('abcbacb'))In [4]: pd.get_dummies(s, drop_first=True)Out[4]:      b    c0  0.0  0.01  1.0  0.02  0.0  1.03  1.0  0.04  0.0  0.05  0.0  1.06  1.0  0.0

python pandas machine-learning dummy-variable

There are a number of ways of doing so.

Possibly the simplest is replacing one of the values by None before calling get_dummies. Say you have:

import pandas as pdimport numpy as nps = pd.Series(list('babca'))>> s0    b1    a2    b3    c4    a

Then use:

>> pd.get_dummies(np.where(s == s.unique()[0], None, s))    a   c0   0   01   1   02   0   03   0   14   1   0

to drop b.

(Of course, you need to consider if your category column doesn't already contain None.)

Another way is to use the prefix argument to get_dummies:

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False)
prefix: string, list of strings, or dict of strings, default None - String to append DataFrame column names Pass a list with length equal to the number of columns when calling get_dummies on a DataFrame. Alternativly, prefix can be a dictionary mapping column names to prefixes.

This will append some prefix to all of the resulting columns, and you can then erase one of the columns with this prefix (just make it unique).

CodeHunter

how to get pandas get_dummies to emit N-1 variables to avoid collinearity?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last