pandasUDF and pyarrow 0.15.0


It's not a bug. We made an important change to the IPC protocol in 0.15.0 that makes the default behavior of pyarrow incompatible with older versions of Arrow in Java. Your Spark environment seems to be using one of those older versions.

Your options are:

  • Set the environment variable ARROW_PRE_0_15_IPC_FORMAT=1 in the environment where Python runs.
  • Downgrade to pyarrow < 0.15.0 for now.
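The first option can also be applied from inside a Python script. The sketch below is only illustrative: the helper names (`parse_version`, `needs_legacy_ipc`, `enable_legacy_ipc_if_needed`) are hypothetical and not part of pyarrow or Spark, and the check looks only at the Python-side pyarrow version, so it assumes the Java side is still on an older Arrow release. Setting the variable this way affects only the current process; Python workers on executors need it as well.

```python
import os

# Variable name taken from the answer above.
LEGACY_FLAG = "ARROW_PRE_0_15_IPC_FORMAT"


def parse_version(version):
    """Turn '0.15.1' into (0, 15, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as '0.15' so tuple comparison behaves.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_legacy_ipc(pyarrow_version):
    """True when pyarrow defaults to the new IPC format (0.15.0 and later)."""
    return parse_version(pyarrow_version) >= (0, 15, 0)


def enable_legacy_ipc_if_needed(pyarrow_version, env=os.environ):
    """Set the flag in `env` when required; return True if it was set."""
    if needs_legacy_ipc(pyarrow_version):
        env[LEGACY_FLAG] = "1"
        return True
    return False


# Typical use, before any pandas UDF runs (assumes pyarrow is installed):
#   import pyarrow
#   enable_legacy_ipc_if_needed(pyarrow.__version__)
```

Passing a plain dict as `env` makes it easy to try the helper without touching the real process environment.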

Hopefully the Spark community will be able to upgrade to 0.15.0 in Java soon so this issue goes away.

This is discussed in http://arrow.apache.org/blog/2019/10/06/0.15.0-release/


In Spark, try appending the following options to your spark-submit command:

spark-submit --deploy-mode cluster --conf spark.yarn.appExecutorEnv.ARROW_PRE_0_15_IPC_FORMAT=1 --conf spark.yarn.appMasterEnv.ARROW_PRE_0_15_IPC_FORMAT=1 --conf spark.executorEnv.ARROW_PRE_0_15_IPC_FORMAT=1
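If the session is created in code rather than through spark-submit flags, roughly the same effect for executors can be sketched as follows. This is a minimal example under assumptions, not a definitive recipe: `legacy_ipc_conf` and `build_session` are hypothetical helper names, and the app name is a placeholder. Only the `spark.executorEnv.*` setting is applied here, because the YARN application master environment is fixed at submit time and cannot be changed from inside a running driver.

```python
import os

# Variable name taken from the answer above.
FLAG = "ARROW_PRE_0_15_IPC_FORMAT"


def legacy_ipc_conf(flag=FLAG, value="1"):
    """Spark conf entries that export the flag to executor processes."""
    return {"spark.executorEnv." + flag: value}


def build_session(app_name="pandas-udf-legacy-ipc"):
    """Create a SparkSession with the legacy IPC flag set (hypothetical helper)."""
    # Imported lazily so the module loads even where pyspark is not installed.
    from pyspark.sql import SparkSession

    # Driver-side Python process.
    os.environ[FLAG] = "1"

    builder = SparkSession.builder.appName(app_name)
    for key, val in legacy_ipc_conf().items():
        builder = builder.config(key, val)
    return builder.getOrCreate()
```

The configuration must be in place before the session is first created; `getOrCreate()` on an already running session will not relaunch executors with the new environment.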