Import pandas dataframe column as string not int


Just want to reiterate this will work in pandas >= 0.9.1:

In [2]: read_csv('sample.csv', dtype={'ID': object})
Out[2]:
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

I'm creating an issue about detecting integer overflows also.

EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247
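To see the effect of the dtype argument without a file on disk, here is a minimal sketch. The CSV text and the short IDs are made up for illustration (they stand in for sample.csv), and io.StringIO is used only so the snippet is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for sample.csv.
csv_text = "ID\n00123\n00456\n"

# Default type inference parses the column as integers,
# which drops the leading zeros.
df_default = pd.read_csv(io.StringIO(csv_text))

# Forcing object dtype keeps each value as the original text.
df_object = pd.read_csv(io.StringIO(csv_text), dtype={"ID": object})

print(df_default["ID"].tolist())
print(df_object["ID"].tolist())
```

The first list holds integers and the second holds the untouched strings, so the leading zeros survive only in the second case.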

Update, since it may help others:

To have all columns as str, one can do this (from the comment):

pd.read_csv('sample.csv', dtype=str)

To read only selected columns as str, one can do this:

# list of column names that need to be read as strings
lst_str_cols = ['prefix', 'serial']
# use a dictionary comprehension to build the dict of dtypes
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the dict as the dtype argument
pd.read_csv('sample.csv', dtype=dict_dtypes)
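A small self-contained sketch of both variants, assuming a made-up CSV with the columns prefix, serial and count (only the first two names come from the snippet above; count and the data are hypothetical):

```python
import io

import pandas as pd

# Hypothetical CSV content; 'count' is an extra column added for illustration.
csv_text = "prefix,serial,count\n007,0042,3\n010,0100,5\n"

# Selected columns as str: build the dtype mapping from a list of names.
lst_str_cols = ["prefix", "serial"]
dict_dtypes = {col: str for col in lst_str_cols}
df_some = pd.read_csv(io.StringIO(csv_text), dtype=dict_dtypes)

# All columns as str: pass the type directly.
df_all = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(df_some.dtypes)
print(df_all.dtypes)
```

In df_some the count column is still inferred as an integer, while prefix and serial keep their leading zeros. In df_all every column is read as text.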


This probably isn't the most elegant way to do it, but it gets the job done.

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.genfromtxt('/Users/spencerlyon2/Desktop/test.csv', dtype=str)[1:], columns=['ID'])
In [4]: df
Out[4]:
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

Just replace '/Users/spencerlyon2/Desktop/test.csv' with the path to your file.


Since pandas 1.0 this has become much more straightforward. The following reads column 'ID' as dtype 'string':

pd.read_csv('sample.csv', dtype={'ID': 'string'})

As we can see in this Getting started guide, a dedicated 'string' dtype has been introduced (previously, strings were stored with dtype 'object').
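A minimal sketch of the 'string' dtype in action, assuming pandas >= 1.0. The CSV text, the name column and the deliberately empty ID in the second row are all made up for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content; the second row has an empty ID on purpose.
csv_text = "ID,name\n00123,a\n,b\n"

df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": "string"})

# The column uses the dedicated string extension dtype.
print(df["ID"].dtype)

# Leading zeros are preserved, and the empty field is a missing value.
print(df["ID"].tolist())
print(df["ID"].isna().tolist())
```

The dtype check can also be done programmatically with isinstance(df["ID"].dtype, pd.StringDtype), which may be handy when validating a loaded file.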