python read_fwf error: 'dtype is not supported with python-fwf parser'
Instead of specifying dtypes, specify a converter for the column you want to keep as str, building on @TomAugspurger's example:
    from io import StringIO
    import pandas as pd

    data = StringIO(u"""
    121301234
    121300123
    121300012
    """)

    pd.read_fwf(data, colspecs=[(0,3),(4,8)], converters = {1: str})
Leads to
        \n Unnamed: 1
    0  121       0123
    1  121       0012
    2  121       0001
Converters are a mapping from a column name or index to a function that converts the value in each cell (e.g. int converts the values to integers, float to floats, and so on).
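As a minimal sketch of the converter approach on data without the leading blank line: the column boundaries (0, 4) and (4, 9) and header=None are assumptions chosen here so that the full four- and five-character fields are read, not part of the example above.

```python
from io import StringIO

import pandas as pd

# Same three records as above, without a leading blank line.
data = StringIO("121301234\n121300123\n121300012\n")

df = pd.read_fwf(
    data,
    colspecs=[(0, 4), (4, 9)],  # assumed boundaries: 4 chars, then 5 chars
    header=None,                # the sample has no header row
    converters={1: str},        # keep column 1 as text so leading zeros survive
)

print(df)
```

Column 0 has no converter, so it is still inferred as an integer column; only column 1 is kept as strings.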
The documentation is probably incorrect there. I think the same base docstring is used for several readers. As for a workaround, since you know the widths ahead of time, I think you can prepend the zeros after the fact.
With this file and widths [4, 5]
    121301234
    121300123
    121300012
we get:
    In [38]: df = pd.read_fwf('tst.fwf', widths=[4,5], header=None)

    In [39]: df
    Out[39]:
          0      1
    0  1213   1234
    1  1213    123
    2  1213     12
To fill in the missing zeros, would this work?
    In [45]: df[1] = df[1].astype('str')

    In [53]: df[1] = df[1].apply(lambda x: ''.join(['0'] * (5 - len(x))) + x)

    In [54]: df
    Out[54]:
          0      1
    0  1213  01234
    1  1213  00123
    2  1213  00012
The 5 in the lambda above comes from the correct width. You'd need to select out all the columns that need leading zeros and apply the function (with the correct width) to each.
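One way to apply that padding to several columns at once is sketched below. It uses Series.str.zfill in place of the lambda, and pad_widths is a hypothetical mapping from column label to field width that you would fill in yourself; the frame is built by hand to mirror the parsed result shown earlier.

```python
import pandas as pd

# Stand-in for the frame returned by read_fwf above (leading zeros already lost).
df = pd.DataFrame({0: [1213, 1213, 1213], 1: [1234, 123, 12]})

# Hypothetical mapping: column label -> fixed width of that field in the file.
pad_widths = {1: 5}

for col, width in pad_widths.items():
    # Convert to text, then left-pad with zeros up to the field width.
    df[col] = df[col].astype(str).str.zfill(width)

print(df)
```

Columns not listed in pad_widths are left untouched, so numeric fields stay numeric.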
In pandas versions after 0.20.2, passing dtype to read_fwf works:
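    from io import StringIO
    import pandas as pd

    data = StringIO(u"""
    121301234
    121300123
    121300012
    """)

    # np.str was only an alias for the built-in str and was removed in NumPy 1.24,
    # so the built-in str is used here instead.
    pd.read_fwf(data, colspecs=[(0,3),(4,8)], header = None, dtype = {0: str, 1: str})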
Output:
         0     1
    0  NaN   NaN
    1  121  0123
    2  121  0012
    3  121  0001