dask dataframe apply meta
meta is the prescription of the names and types of the output of the computation. It is required because apply() is flexible enough to produce just about anything from a dataframe. As you can see, if you don't provide a meta, dask actually computes part of the data to see what the types should be. That is fine, but you should know it is happening. When you know what the output should look like, you can avoid this pre-computation (which can be expensive) and be more explicit by providing a zero-row version of the output (dataframe or series), or just the types.
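One way to get a zero-row version of the output is to run the same operation on a small pandas sample and keep none of its rows. This is only a minimal sketch: the sample frame and the column name Column A are hypothetical, and only Column B comes from the question.

```python
import pandas as pd

# Hypothetical small sample with the same columns as the real data.
sample = pd.DataFrame({"Column A": [1, 2, 3], "Column B": [10, 20, 10]})

# Run the operation in plain pandas, then slice off every row.
# What remains is an empty series that still carries the dtype and
# index name, which is the information meta is meant to convey.
meta = sample.groupby("Column B").apply(len).iloc[:0]
```

The resulting empty series can then be passed as meta= to the dask call, provided the sample is representative of the real output types.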
The output of your computation is actually a series, so the following is the simplest that works:
(dask_df.groupby('Column B').apply(len, meta=('int'))).compute()
but more accurate would be:
(dask_df.groupby('Column B').apply(len, meta=pd.Series(dtype='int', name='Column B')))