
dask dataframe apply meta


meta describes the names and dtypes of the output of the computation. Dask needs it because apply() is flexible enough to produce just about anything from a dataframe, so dask cannot know the shape of the result in advance. As you can see from the warning, if you don't provide meta, dask runs your function on a small sample of data to infer what the types should be. That is fine, but you should know it is happening. When you already know what the output should look like, you can skip this inference step (which can be expensive) and be more explicit by providing a zero-row version of the output (a DataFrame or Series), or just the types: a (name, dtype) tuple for a Series, or a {column: dtype} dict for a DataFrame.
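One way to choose meta without guessing is to run the same function eagerly in pandas on a few sample rows and keep a zero-row slice of the result. This is a minimal sketch: the frame sample and the output name "n_rows" are hypothetical stand-ins, and the dict form is shown only to illustrate what a DataFrame-shaped meta looks like.

```python
import pandas as pd

# Hypothetical sample rows with the same columns/dtypes as the real data.
sample = pd.DataFrame({"Column A": [1, 2, 3], "Column B": ["x", "x", "y"]})

# Run the function eagerly in pandas on the sample...
result = sample.groupby("Column B").apply(len)

# ...and keep zero rows: this carries the output type and dtype, but no data.
meta_from_sample = result.iloc[:0]

# Lighter alternatives that only describe names/dtypes (hypothetical names):
meta_series = pd.Series(dtype="int64", name="n_rows")    # explicit zero-row Series
meta_tuple = ("n_rows", "int64")                         # (name, dtype) -> Series
meta_dict = {"Column A": "int64", "Column B": "object"}  # {column: dtype} -> DataFrame
```

Any of these objects can then be passed as meta=... to the dask call; the zero-row slice is the most faithful, the tuple and dict are the shortest.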

The output of your computation is actually a Series, so the following is the simplest version that works:

(dask_df.groupby('Column B')
        .apply(len, meta=('Column B', 'int64'))  # (name, dtype) shorthand for a Series
        .compute())

but a more explicit version passes the zero-row Series itself:

import pandas as pd

(dask_df.groupby('Column B')
        .apply(len, meta=pd.Series(dtype='int64', name='Column B'))  # empty Series with the output dtype
        .compute())