
Why does pandas apply calculate twice


This behavior is intended, as an optimization.

See the docs:

In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
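One way to check this on your own installation is to count the calls. The snippet below is only a minimal sketch: the frame `df` and the helper `counting_func` are made up for illustration, and the number of calls you see depends on the pandas version you run.

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

calls = []  # one entry is appended for every call to counting_func

def counting_func(col):
    # Side effect: record that we were called, then do the real work.
    calls.append(len(col))
    return col * 2

result = df.apply(counting_func)

print(pd.__version__)
# More entries than columns means a column was evaluated more than once.
print(len(calls), "calls for", df.shape[1], "columns")
```

If the function has side effects (printing, appending to a list, writing to a file), this counter is what would be duplicated by an extra call.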


This is probably related to this issue. With groupby, the applied function is called one extra time to see whether certain optimizations can be done. I'd guess something similar is going on here. It doesn't look like there's any way around it at the moment (although I could be wrong about the source of the behavior you're seeing). Is there a reason you need it not to make that extra call?

Also, calling it four times when you apply on the column is normal. When you select one column, you get a Series, not a DataFrame. apply on a Series applies the function to each element. Since your column has four elements, the function is called four times.
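To make the Series case concrete, here is a small sketch. The column values and the helper `record` are assumptions for illustration, not the data from the question.

```python
import pandas as pd

# Hypothetical frame with a single four-element column.
df = pd.DataFrame({"a": [1.0, 2.0, 0.67, 1.34]})

seen = []  # collects every value the function receives

def record(x):
    # On a Series, apply passes one element at a time, not the whole column.
    seen.append(x)
    return x * 2

column = df["a"]
doubled = column.apply(record)

print(type(column).__name__)  # selecting one column gives a Series
print(len(seen))              # number of calls equals the number of elements
```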


This behavior was fixed in pandas 1.1; please upgrade!

Now, apply and applymap on a DataFrame evaluate the first row/column only once.

Initially, both GroupBy.apply and Series/df.apply evaluated the first group twice. The first group was evaluated twice because apply wanted to know whether it could "optimize" the calculation (sometimes this is possible if apply receives a numpy or cythonized function). This behavior was fixed for GroupBy.apply in pandas 0.25, and in pandas 1.1 it was also fixed for df.apply.
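The same counting idea can be used to check the groupby case. This is a sketch under assumptions: the frame, the column names `key` and `val`, and the helper `total` are all invented for the example.

```python
import pandas as pd

# Hypothetical data with two groups, "x" and "y".
df = pd.DataFrame({"key": ["x", "x", "y"], "val": [1, 2, 3]})

group_calls = []  # one entry per call to total

def total(values):
    # Record the size of the group we were called with, then aggregate.
    group_calls.append(len(values))
    return values.sum()

sums = df.groupby("key")["val"].apply(total)

print(pd.__version__)
# More calls than groups would indicate that a group was evaluated twice.
print(len(group_calls), "calls for", df["key"].nunique(), "groups")
```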


Old Behavior [pandas <= 1.0.X]

    pd.__version__
    # '1.0.4'

    df.apply(mul2)
    hello
    hello

          a
    0  2.00
    1  4.00
    2  1.34
    3  2.68

New Behavior [pandas >= 1.1]

    pd.__version__
    # '1.1.0.dev0+2004.g8d10bfb6f'

    df.apply(mul2)
    hello

          a
    0  2.00
    1  4.00
    2  1.34
    3  2.68
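The definitions of df and mul2 are not shown above. A setup along the following lines would be consistent with the printed output, but it is an assumed reconstruction, not the original code.

```python
import pandas as pd

# Assumed reconstruction: one column "a" whose doubled values match the output above.
df = pd.DataFrame({"a": [1.00, 2.00, 0.67, 1.34]})

def mul2(col):
    # Visible side effect: prints once per call, so duplicate calls show up.
    print("hello")
    return col * 2

out = df.apply(mul2)
print(out)
```

With a single column, each "hello" line corresponds to one evaluation of that column.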