dask.multiprocessing or pandas + multiprocessing.pool: what's the difference?
There is no fundamental difference. Dask does just what you are doing in your custom code: it uses pandas and a thread or multiprocessing pool for parallelism.
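For illustration, here is a minimal sketch of that shared pattern: split the DataFrame into chunks, run ordinary pandas code on each chunk in a pool, then combine the partial results. The column names (`key`, `value`), the helper names `partial_sums` and `parallel_group_sum`, and the row-wise chunking are assumptions made up for this example. They are not taken from your code or from Dask's internals. It uses `multiprocessing.pool.ThreadPool` so it runs anywhere as-is; `multiprocessing.Pool` has the same `map` interface if you want processes instead.

```python
from multiprocessing.pool import ThreadPool

import pandas as pd


def partial_sums(chunk: pd.DataFrame) -> pd.Series:
    # Per-chunk work: a plain pandas groupby on one piece of the data.
    return chunk.groupby("key")["value"].sum()


def parallel_group_sum(df: pd.DataFrame, n_chunks: int = 4) -> pd.Series:
    # Split the frame into roughly equal row-wise chunks (hypothetical scheme).
    size = max(1, -(-len(df) // n_chunks))  # ceiling division
    chunks = [df.iloc[i:i + size] for i in range(0, len(df), size)]

    # Run the per-chunk function in a pool of workers.
    with ThreadPool(n_chunks) as pool:
        partials = pool.map(partial_sums, chunks)

    # Combine step: add up the partial sums that share a key.
    return pd.concat(partials).groupby(level=0).sum()


df = pd.DataFrame(
    {"key": ["a", "b", "a", "c", "b", "a"], "value": [1, 2, 3, 4, 5, 6]}
)
result = parallel_group_sum(df, n_chunks=3)
print(result)
```

The split and combine steps are the part you have to design by hand for each operation; a sum combines trivially, but something like a mean or a merge needs more care.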
You might prefer Dask for a few reasons:
- It figures out how to write the parallel algorithms automatically
- You may want to scale to a cluster in the future
But if what you have works well for you, then I would just stay with that.