dask.multiprocessing or pandas + multiprocessing.pool: what's the difference?


There is no fundamental difference. Dask does essentially what your custom code does: it uses pandas for the computation and a thread pool or multiprocessing pool for parallelism.
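For comparison, here is a minimal sketch of the hand-written pattern, assuming a group-wise mean as the task. The column names `key` and `value`, the helper names, and the chunk and process counts are made up for illustration and are not taken from the question.

```python
import multiprocessing as mp

import numpy as np
import pandas as pd


def partial_sum_count(chunk):
    """Per-chunk work: sum and count of `value` for each `key`."""
    return chunk.groupby("key")["value"].agg(["sum", "count"])


def parallel_group_mean(df, n_chunks=4, processes=2):
    """Group-wise mean computed by hand across a process pool."""
    # Split the frame into roughly equal row blocks.
    bounds = np.linspace(0, len(df), n_chunks + 1, dtype=int)
    chunks = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

    # Run the per-chunk aggregation in worker processes.
    with mp.Pool(processes=processes) as pool:
        partials = pool.map(partial_sum_count, chunks)

    # Combine step: add up per-chunk sums and counts, then divide.
    totals = pd.concat(partials).groupby(level=0).sum()
    return totals["sum"] / totals["count"]


if __name__ == "__main__":
    df = pd.DataFrame({"key": list("abab") * 25, "value": range(100)})
    print(parallel_group_mean(df))
```

Note that the mean cannot simply be averaged across chunks; the code has to carry sums and counts and combine them at the end. That split/apply/combine bookkeeping is the part you write yourself in this approach.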

You might prefer Dask for a few reasons:

  1. It figures out how to write the parallel algorithms automatically
  2. You may want to scale to a cluster in the future

But if what you have works well for you, then I would just stay with that.