
Pandas vs. Numpy Dataframes


pandas focuses on tabular data structures, and when it performs arithmetic operations (addition, subtraction, etc.) it aligns the operands by their labels, not by their positions.

Consider the following DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 3), index=list('abcde'), columns=list('xyz'))

Here, df[1:] is:

df[1:]
Out:
          x         y         z
b  1.003035  0.172960  1.160033
c  0.117608 -1.114294 -0.557413
d -1.312315  1.171520 -1.034012
e -0.380719 -0.422896  1.073535

And df[:-1] is:

df[:-1]
Out:
          x         y         z
a  1.367916  1.087607 -0.625777
b  1.003035  0.172960  1.160033
c  0.117608 -1.114294 -0.557413
d -1.312315  1.171520 -1.034012

If you compute df[1:] / df[:-1], pandas divides row b by row b, row c by row c, and row d by row d. Rows a and e each appear in only one of the two DataFrames, so pandas cannot find a matching row in the other one and returns NaN for them:

df[1:] / df[:-1]
Out:
     x    y    z
a  NaN  NaN  NaN
b  1.0  1.0  1.0
c  1.0  1.0  1.0
d  1.0  1.0  1.0
e  NaN  NaN  NaN
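
Because the frame above is filled with random numbers, the exact values differ on every run. The label alignment itself can be reproduced with fixed data. The following is a minimal sketch; the values 1 to 15 are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data (1.0 .. 15.0) instead of random values,
# so the result is the same on every run.
df = pd.DataFrame(
    np.arange(1.0, 16.0).reshape(5, 3),
    index=list("abcde"),
    columns=list("xyz"),
)

# Both operands are DataFrames, so pandas aligns them by row and column labels.
aligned = df[1:] / df[:-1]
print(aligned)

# Rows a and e exist in only one operand, so every cell in them is NaN.
only_one_side = aligned.loc[["a", "e"]]
print(only_one_side.isna().all().all())

# Rows b, c and d exist in both operands and are divided by themselves.
both_sides = aligned.loc[["b", "c", "d"]]
print((both_sides == 1.0).all().all())
```

The result carries all five row labels, because pandas uses the union of the two indexes before dividing.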

If you want element-wise division by position, ignoring the labels, take the underlying NumPy array of one of the frames with .values. NumPy arrays have no labels, so pandas has nothing to align and simply operates element by element; the result keeps the labels of the remaining DataFrame:

df[1:]/df[:-1].values
Out:
           x         y         z
b   0.733258  0.159028 -1.853749
c   0.117252 -6.442482 -0.480515
d -11.158359 -1.051357  1.855018
e   0.290112 -0.360981 -1.038223
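
The same positional division can also be written with DataFrame.to_numpy() in place of .values. This is a minimal sketch, assuming pandas 0.24 or later (where to_numpy() is available) and using made-up fixed values rather than the random frame above.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data (1.0 .. 15.0), chosen only to make the output reproducible.
df = pd.DataFrame(
    np.arange(1.0, 16.0).reshape(5, 3),
    index=list("abcde"),
    columns=list("xyz"),
)

# Turning the right-hand operand into a plain array removes its labels,
# so each row is divided by the row that precedes it in position.
ratio = df[1:] / df[:-1].to_numpy()
print(ratio)

# The same computation done entirely in NumPy, for comparison.
expected = df.to_numpy()[1:] / df.to_numpy()[:-1]
print(np.allclose(ratio.to_numpy(), expected))
```

Both operands must have the same shape for this to work (here 4 rows by 3 columns), since without labels there is nothing else to match them on. The result keeps the row labels b to e of the left-hand frame.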