pandas data frame transform INT64 columns to boolean


df['column_name'] = df['column_name'].astype('bool')

For example:

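import pandas as pd
import numpy as np

# np.random.random_integers is deprecated; randint's upper bound is exclusive,
# so (0, 2) yields a mix of 0s and 1s.
df = pd.DataFrame(np.random.randint(0, 2, size=5),
                  columns=['foo'])
print(df)
#    foo
# 0    0
# 1    1
# 2    0
# 3    1
# 4    1

df['foo'] = df['foo'].astype('bool')
print(df)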

yields

     foo
0  False
1   True
2  False
3   True
4   True

Given a list of column_names, you could convert multiple columns to bool dtype using:

df[column_names] = df[column_names].astype(bool)
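
As a small illustration of the list-based form, here is a sketch on a made-up frame. The column names `is_active`, `is_admin` and `age` are hypothetical and do not come from the original question.

```python
import pandas as pd

# Hypothetical data: two 0/1 flag columns and one ordinary integer column.
df = pd.DataFrame({
    "is_active": [1, 0, 1],
    "is_admin": [0, 0, 1],
    "age": [34, 27, 45],
})

# Only the listed columns are converted; 'age' keeps its integer dtype.
column_names = ["is_active", "is_admin"]
df[column_names] = df[column_names].astype(bool)

print(df.dtypes)
```

Listing the columns explicitly is probably the safest choice when only some integer columns are really flags.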

If you don't have a list of column names, but wish to convert, say, all numeric columns, then you could use

column_names = df.select_dtypes(include=[np.number]).columns
df[column_names] = df[column_names].astype(bool)
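
To make the effect of the numeric-column selection concrete, here is a minimal sketch on invented data. The columns `name`, `flag` and `visits` are assumptions for illustration only. It also shows a side effect worth keeping in mind: every nonzero number becomes True, not just 1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column and two numeric columns.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "flag": [0, 1, 1],
    "visits": [0, 3, 7],
})

# select_dtypes picks out 'flag' and 'visits' and skips the text column.
numeric_cols = df.select_dtypes(include=[np.number]).columns
df[numeric_cols] = df[numeric_cols].astype(bool)

# 'name' is left alone; in 'visits', 3 and 7 have both turned into True.
print(df)
```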


Reference: Stack Overflow unutbu (Jan 9 at 13:25), BrenBarn (Sep 18 2017)

I had numerical columns like age and ID which I did not want to convert to Boolean. So after identifying the numerical columns as unutbu showed, I filtered out the columns whose maximum was greater than 1 and converted only the ones whose maximum was exactly 1.

# code as per unutbu
column_names = df.select_dtypes(include=[np.number]).columns

# Re-extracting the columns of numerical type (using the awesome np.number :)),
# then getting the max of each one and storing the result in a temporary variable m.
m = df[df.select_dtypes(include=[np.number]).columns].max().reset_index(name='max')

# I then did a filter like BrenBarn showed in another post to extract the rows
# which had max == 1 and stored them in a temporary variable n.
n = m.loc[m['max'] == 1, 'max']

# I then extracted the indexes of the rows from n and stored them in a temporary variable p.
# These indexes are positions within column_names, so they pick out the matching
# column labels of my original dataframe 'df'.
p = column_names[n.index]

# I then used the final piece of the code from unutbu, passing the columns
# which had max == 1, as stored in my variable p.
# If I used column_names directly instead of p, all my numerical columns would turn into Booleans.
df[p] = df[p].astype(bool)
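
A possible variation on the same idea, offered as a sketch and not as what either answer author wrote: test whether every value in a column is 0 or 1, instead of checking only the maximum. The frame and its column names (`id`, `age`, `smoker`, `member`) are hypothetical. This differs from the max == 1 filter in two ways: an all-zero column is converted, and a column such as [-3, 0, 1] is left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'id' and 'age' should stay numeric, the flags should not.
df = pd.DataFrame({
    "id": [101, 102, 103],
    "age": [34, 27, 45],
    "smoker": [0, 1, 0],
    "member": [1, 1, 0],
})

numeric = df.select_dtypes(include=[np.number])

# One boolean per column: True when every value in that column is 0 or 1.
# A NaN fails the isin test, so columns with missing values are skipped.
mask = numeric.isin([0, 1]).all()
binary_cols = mask[mask].index.tolist()

df[binary_cols] = df[binary_cols].astype(bool)
print(df.dtypes)
```

Whether skipping columns with missing values is the right behaviour depends on the data, since `astype(bool)` would otherwise turn NaN into True.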