Keep same dummy variable in training and testing data

python dataframe scikit-learn prediction dummy-variable

You can also just get the missing columns and add them to the test dataset:

# Get missing columns in the training testmissing_cols = set( train.columns ) - set( test.columns )# Add a missing column in test set with default value equal to 0for c in missing_cols:    test[c] = 0# Ensure the order of column in the test set is in the same order than in train settest = test[train.columns]

This code also ensure that column resulting from category in the test dataset but not present in the training dataset will be removed

python dataframe scikit-learn prediction dummy-variable

Assume you have identical feature's names in train and test dataset. You can generate concatenated dataset from train and test, get dummies from concatenated dataset and split it to train and test back.

You can do it this way:

import pandas as pdtrain = pd.DataFrame(data = [['a', 123, 'ab'], ['b', 234, 'bc']],                     columns=['col1', 'col2', 'col3'])test = pd.DataFrame(data = [['c', 345, 'ab'], ['b', 456, 'ab']],                     columns=['col1', 'col2', 'col3'])train_objs_num = len(train)dataset = pd.concat(objs=[train, test], axis=0)dataset_preprocessed = pd.get_dummies(dataset)train_preprocessed = dataset_preprocessed[:train_objs_num]test_preprocessed = dataset_preprocessed[train_objs_num:]

In result, you have equal number of features for train and test dataset.

python dataframe scikit-learn prediction dummy-variable

train2,test2 = train.align(test, join='outer', axis=1, fill_value=0)

train2 and test2 have the same columns. Fill_value indicates the value to use for missing columns.

CodeHunter

Keep same dummy variable in training and testing data

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last