Managing worker memory on a Dask LocalCluster


The argument memory_limit can be passed to the __init__() methods of both Client and LocalCluster.

general remarks

Calling Client() on its own is a shortcut for first calling LocalCluster() and then calling Client with the created cluster (Dask: Single Machine). When Client is called without an instance of LocalCluster, every argument accepted by LocalCluster.__init__() can be passed to the Client call instead. That is why memory_limit (like other arguments such as n_workers) is not documented in the API documentation of the Client class.

However, the argument memory_limit does not seem to be properly documented in the API documentation of LocalCluster (see Dask GitHub Issue #4118).
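The forwarding described above can be checked directly. The following is a minimal sketch, not a definitive recipe: the small values (2 workers, 512MiB) are assumptions chosen so it runs on a modest machine, and processes=False is an extra LocalCluster argument used here only so the snippet runs without an `if __name__ == "__main__":` guard.

```python
from dask.distributed import Client, LocalCluster

# No cluster is passed, so Client builds a LocalCluster itself and forwards
# the keyword arguments to it.
# processes=False keeps the workers as threads inside this process
# (an assumption made for this sketch; drop it for real workloads).
client = Client(n_workers=2,
                threads_per_worker=1,
                memory_limit="512MiB",
                processes=False)

# the cluster that Client created on our behalf
cluster_is_local = isinstance(client.cluster, LocalCluster)
worker_count = len(client.cluster.workers)
print(cluster_is_local, worker_count)

# closing the client also shuts down the cluster it created
client.close()
```

If the keyword arguments reach LocalCluster as expected, client.cluster is a LocalCluster instance with the requested number of workers.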

solution

A working example would be the following. I added some more arguments, which might be useful for people finding this question/answer.

```python
# load/import classes
from dask.distributed import Client, LocalCluster

# set up cluster and workers
cluster = LocalCluster(n_workers=4,
                       threads_per_worker=1,
                       memory_limit='64GB')
client = Client(cluster)

# have a look at your workers
client

# do some work
## ...

# close workers and cluster
client.close()
cluster.close()
```

The shortcut would be

```python
# load/import classes
from dask.distributed import Client

# set up cluster and workers
client = Client(n_workers=4,
                threads_per_worker=1,
                memory_limit='64GB')

# have a look at your workers
client

# do some work
## ...

# close workers and cluster
client.close()
```
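To confirm that the limit was actually applied, you can ask the scheduler what each worker reports. This is a hedged sketch: the small values (2 workers, 512MiB) and processes=False are assumptions made so it runs quickly on a laptop. Note that memory_limit applies to each worker, not to the cluster as a whole, and, as far as I can tell, Dask caps the value at the memory available on the machine, so the reported number may be lower than the one requested.

```python
from dask.distributed import Client
from dask.utils import format_bytes

# small illustrative values; processes=False avoids the need for an
# `if __name__ == "__main__":` guard in this sketch
client = Client(n_workers=2,
                threads_per_worker=1,
                memory_limit="512MiB",
                processes=False)

# the scheduler keeps one entry per worker, including its memory limit in bytes
workers = client.scheduler_info()["workers"]
limits = {address: info["memory_limit"] for address, info in workers.items()}

for address, limit in limits.items():
    print(address, format_bytes(limit))

# the limit is per worker, so the cluster-wide budget is the sum over workers
total_limit = sum(limits.values())
print("total:", format_bytes(total_limit))

client.close()
```

With the values from the answer (n_workers=4, memory_limit='64GB'), the same reasoning gives a cluster-wide budget of up to 4 × 64GB, so pick the per-worker value with the total RAM of the machine in mind.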

further reading