
dask: difference between client.persist and client.compute


Relevant doc page is here: http://distributed.readthedocs.io/en/latest/manage-computation.html#dask-collections-to-futures

As you say, both Client.compute and Client.persist take lazy Dask collections and start them running on the cluster. They differ in what they return.

  1. Client.persist returns a copy for each of the dask collections with their previously-lazy computations now submitted to run on the cluster. The task graphs of these collections now just point to the currently running Future objects.

    So if you persist a dask dataframe with 100 partitions you get back a dask dataframe with 100 partitions, with each partition pointing to a future currently running on the cluster.

  2. Client.compute returns a single Future for each collection. This future refers to a single Python object result collected on one worker. This is typically used for small results.

    So if you compute a dask.dataframe with 100 partitions you get back a Future pointing to a single Pandas dataframe that holds all of the data.

More pragmatically, I recommend using persist when your result is large and needs to be spread among many computers and using compute when your result is small and you want it on just one computer.

In practice I rarely use Client.compute, preferring instead to use persist for intermediate staging and dask.compute to pull down final results.

    df = dd.read_csv('...')
    df = df[df.name == 'alice']
    df = df.persist()  # compute up to here, keep results in memory

    >>> df.value.max().compute()
    100
    >>> df.value.min().compute()
    0

When using delayed

Delayed objects only have one "partition" regardless, so compute and persist are more interchangeable. Persist will give you back a lazy dask.delayed object while compute will give you back an immediate Future object.