Python & Pandas - pd.Series difference between int32 and int64
The two are semantically different. In the first version you pass a dict with a single scalar value, so the dtype becomes int64. In the second you pass a range, which can be trivially converted to a numpy array, and that array is int32:
In [57]: np.array(range(6)).dtype
Out[57]: dtype('int32')
So constructing the pandas Series involves dtype matching in the first case but not in the second: a range is convertible to a numpy array, and numpy has decided that int32 is the preferred dtype here.
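If the difference matters for your code, one way around it is to state the dtype yourself rather than rely on inference. This is a minimal sketch, not taken from the question; the variable names are illustrative, and the inferred dtype it prints depends on your platform and library versions.

```python
import numpy as np
import pandas as pd

# What numpy infers for a range on this machine (int32 or int64,
# depending on platform and numpy version).
inferred = np.array(range(6)).dtype

# Passing dtype explicitly removes the platform dependence.
s = pd.Series(range(6), dtype="int64")

print("inferred by numpy:", inferred)
print("explicit Series dtype:", s.dtype)
```

With an explicit dtype the Series is int64 regardless of what numpy would have chosen.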
Update
It looks like this depends on your numpy version and maybe your pandas version. I'm running Python 3.6, numpy 1.12.1 and pandas 0.20.3 on Windows 7 64-bit, and I get the result above. @jeremycg is running pandas 0.19.2 and numpy 1.11.2 and observes the same result, whilst @coldspeed is running numpy 1.13.1 and observes int64.
The takeaway is that the dtype will largely be determined by what numpy does.
I believe this is the line that gets called when we pass a range in this case:
subarr = np.array(arr, dtype=object, copy=copy)
The returned type is determined by numpy and the OS. In my case, Windows defines a C long as 32 bits. See related: numpy array dtype is coming as int32 by default in a windows 10 64 bit machine
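To check what your own platform does, you can ask numpy for the size of a C long and compare it with the integer dtype numpy picks by default. This is a rough sketch under the assumption that the type code 'l' maps to the platform's C long. The values printed will vary by OS and numpy version, and releases newer than the ones mentioned here may not tie the default integer to C long at all.

```python
import numpy as np

# 'l' is numpy's type code for the platform's C long.
c_long_bits = np.dtype("l").itemsize * 8

# The integer dtype numpy picks when none is specified.
default_int = np.array([1, 2, 3]).dtype

print("C long width in bits:", c_long_bits)
print("default integer dtype:", default_int)
```

On the setup described above (Windows, numpy 1.12.1) both would be expected to report 32 bits.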