Pandas vs. Numpy Dataframes
pandas focuses on tabular data structures, and when it performs arithmetic operations (addition, subtraction, etc.) it aligns the operands by their labels, not by their positions.
Consider the following DataFrame:
df = pd.DataFrame(np.random.randn(5, 3), index=list('abcde'), columns=list('xyz'))
Here, df[1:] is:
    df[1:]
    Out:
              x         y         z
    b  1.003035  0.172960  1.160033
    c  0.117608 -1.114294 -0.557413
    d -1.312315  1.171520 -1.034012
    e -0.380719 -0.422896  1.073535
And df[:-1] is:
    df[:-1]
    Out:
              x         y         z
    a  1.367916  1.087607 -0.625777
    b  1.003035  0.172960  1.160033
    c  0.117608 -1.114294 -0.557413
    d -1.312315  1.171520 -1.034012
If you do df[1:] / df[:-1], pandas divides row b by row b, row c by row c, and row d by row d. For rows a and e, it cannot find a corresponding row in the other DataFrame (a is missing from the first one, e from the second one), so it returns NaN:
    df[1:] / df[:-1]
    Out:
         x    y    z
    a  NaN  NaN  NaN
    b  1.0  1.0  1.0
    c  1.0  1.0  1.0
    d  1.0  1.0  1.0
    e  NaN  NaN  NaN
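The alignment behaviour described above can be reproduced with fixed numbers instead of random ones. This is only a minimal sketch: the values produced by np.arange are an assumption chosen for illustration, not the data from the example above.

```python
import numpy as np
import pandas as pd

# Illustrative frame with fixed values (assumed data, not the random
# numbers used above) so the result is reproducible.
df = pd.DataFrame(
    np.arange(1.0, 16.0).reshape(5, 3),
    index=list("abcde"),
    columns=list("xyz"),
)

# Division aligns on row and column labels, not on position.
ratio = df[1:] / df[:-1]

# Rows 'b', 'c', 'd' exist in both operands and are divided by themselves;
# rows 'a' and 'e' exist in only one operand and come out as NaN.
print(ratio)
```

The result carries the union of both indexes, which is why rows a and e appear at all even though each is present in only one operand.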
If you just want element-wise division that ignores the labels, accessing the underlying NumPy array of one of the frames with .values is a way of telling pandas to ignore labels. Since NumPy arrays don't have labels, pandas will just do position-based, element-wise operations:
    df[1:]/df[:-1].values
    Out:
               x         y         z
    b   0.733258  0.159028 -1.853749
    c   0.117252 -6.442482 -0.480515
    d -11.158359 -1.051357  1.855018
    e   0.290112 -0.360981 -1.038223
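The positional division can be sketched the same way with fixed numbers. The data is again an illustrative assumption, and .to_numpy() is used here as the newer spelling of .values. The shift-based variant at the end is not part of the answer above; it is shown only as one possible alternative that gives the same "each row divided by the previous row" result while staying within pandas.

```python
import numpy as np
import pandas as pd

# Illustrative frame with fixed values (assumed data).
df = pd.DataFrame(
    np.arange(1.0, 16.0).reshape(5, 3),
    index=list("abcde"),
    columns=list("xyz"),
)

# The right-hand side is a plain NumPy array, so there are no labels to
# align on: each row is divided by the row one position above it.
by_position = df[1:] / df[:-1].to_numpy()
print(by_position)

# Alternative (not from the text above): shift the rows down by one so
# that the labels line up, then drop the first row, which has no predecessor.
by_shift = (df / df.shift(1)).iloc[1:]
```

The result keeps the labels of the left-hand frame (b through e), since that is the only operand that has any.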