When to apply(pd.to_numeric) and when to astype(np.float64) in python?


If your column already has a numeric dtype (int8|16|32|64, float64, boolean), you can convert it to another numeric dtype with the pandas .astype() method.

Demo:

In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)

In [91]: df
Out[91]:
         a        b        c
0  9059440  9590567  2076918
1  5861102  4566089  1947323
2  6636568   162770  2487991
3  6794572  5236903  5628779
4   470121  4044395  4546794

In [92]: df.dtypes
Out[92]:
a    int64
b    int64
c    int64
dtype: object

In [93]: df['a'] = df['a'].astype(float)

In [94]: df.dtypes
Out[94]:
a    float64
b      int64
c      int64
dtype: object

This won't work for object (string) columns that contain values which can't be converted to numbers:

In [95]: df.loc[1, 'b'] = 'XXXXXX'

In [96]: df
Out[96]:
           a        b        c
0  9059440.0  9590567  2076918
1  5861102.0   XXXXXX  1947323
2  6636568.0   162770  2487991
3  6794572.0  5236903  5628779
4   470121.0  4044395  4546794

In [97]: df.dtypes
Out[97]:
a    float64
b     object
c      int64
dtype: object

In [98]: df['b'].astype(float)
...skipped...
ValueError: could not convert string to float: 'XXXXXX'

In that case, use pd.to_numeric() with errors='coerce', which replaces values it cannot parse with NaN:

In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')

In [100]: df
Out[100]:
           a          b        c
0  9059440.0  9590567.0  2076918
1  5861102.0        NaN  1947323
2  6636568.0   162770.0  2487991
3  6794572.0  5236903.0  5628779
4   470121.0  4044395.0  4546794

In [101]: df.dtypes
Out[101]:
a    float64
b    float64
c      int64
dtype: object
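To reproduce the steps above outside an IPython session, here is a minimal self-contained sketch. The small frame and its values are made up for illustration; only astype() and pd.to_numeric() come from the answer itself.

```python
import pandas as pd

# Hypothetical data: column 'b' is an object column with one non-numeric value.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": pd.Series(["10", "oops", "30"], dtype=object),
})

# astype() works when every value is convertible.
a_float = df["a"].astype(float)

# astype() raises as soon as it meets a string it cannot parse.
try:
    df["b"].astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric(errors='coerce') turns unparseable values into NaN instead.
b_numeric = pd.to_numeric(df["b"], errors="coerce")

print(a_float.dtype)       # float64
print(astype_failed)       # True
print(b_numeric.tolist())  # [10.0, nan, 30.0]
```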


I don't have a technical explanation for this, but I have noticed that pd.to_numeric() raises the following error when converting the string 'nan':

In [10]: df = pd.DataFrame({'value': 'nan'}, index=[0])

In [11]: pd.to_numeric(df.value)
Traceback (most recent call last):
  File "<ipython-input-11-98729d13e45c>", line 1, in <module>
    pd.to_numeric(df.value)
  File "C:\Users\joshua.lee\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\tools\numeric.py", line 133, in to_numeric
    coerce_numeric=coerce_numeric)
  File "pandas/_libs/src\inference.pyx", line 1185, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "nan" at position 0

whereas astype(float) does not:

In [12]: df.value.astype(float)
Out[12]:
0   NaN
Name: value, dtype: float64
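Whether this difference shows up may depend on the pandas version, so it is worth checking in your own environment. A small sketch, assuming only that pandas is installed; the variable names are illustrative, and the to_numeric outcome is deliberately not assumed either way:

```python
import math
import pandas as pd

s = pd.Series(["nan"], dtype=object)

# astype(float) accepts the string 'nan' and yields a float NaN.
as_float = s.astype(float)
print(math.isnan(as_float.iloc[0]))  # True

# to_numeric may or may not accept the string, depending on the pandas
# version, so catch the error rather than assume either outcome.
try:
    converted = pd.to_numeric(s)
    print("to_numeric parsed it:", converted.iloc[0])
except ValueError as exc:
    print("to_numeric raised:", exc)
```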


You can use this:

pd.to_numeric(df.value, errors='coerce').fillna(0, downcast='infer')

Values that cannot be parsed become NaN, and fillna(0) then replaces every NaN with zero.
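As a sketch of the same idea without the downcast argument (which newer pandas releases deprecate, as far as I can tell), one can coerce, fill, and then cast explicitly. The sample values are invented for illustration:

```python
import pandas as pd

# Hypothetical input: two numeric strings, one bad string, one missing value.
values = pd.Series(["1", "2", "abc", None], dtype=object)

# Unparseable or missing entries become NaN, then NaN becomes 0.
cleaned = pd.to_numeric(values, errors="coerce").fillna(0)

# Cast explicitly once no NaN remains; this assumes whole numbers are expected.
as_int = cleaned.astype("int64")

print(as_int.tolist())  # [1, 2, 0, 0]
```

Keep in mind that replacing NaN with 0 makes bad input indistinguishable from a real zero, so this only fits when zero is a safe default.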