
Databricks Exception: Total size of serialized results is bigger than spark.driver.maxResultSize


You need to change this parameter in the cluster configuration. Go into the cluster settings, under Advanced Options select the Spark tab, and paste spark.driver.maxResultSize 0 (for unlimited) or whatever value suits you. Using 0 is not recommended; you should instead optimize the job by repartitioning.
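
For example, the line pasted into the Spark Config box could look like this (4g is only an illustrative value; pick a size that fits your driver):

spark.driver.maxResultSize 4g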


It looks like your driver has a limited size for storing results, and your result has crossed that limit, so you can increase the limit with the following command in your notebook.

sqlContext.getConf("spark.driver.maxResultSize")
res19: String = 20g

This returns the current maximum result size; in my case it was 20 GB.

sqlContext.setConf("spark.driver.maxResultSize", "30g")

To increase the maxResultSize you can use the above command.

This is not recommended, because it reduces the performance of your cluster: it minimizes the free space available for the temporary files used during processing. But it should solve your issue.


You need to increase the maxResultSize value for the cluster.

The maxResultSize must be set BEFORE the cluster is started -- trying to set the maxResultSize in the notebook after the cluster is started will not work.

"Edit" the cluster and set the value in the "Spark Config" section under "Advanced Options".

Here is a screenshot of Configure Cluster for Databricks in AWS, but something similar probably exists for Databricks in Azure.

[screenshot: cluster configuration]

In your notebook you can verify that the value is already set by including the following command:

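For example, something along these lines (a sketch using spark.conf.get from the Scala API; the 8g shown is simply the value configured on my cluster) returns the setting:

spark.conf.get("spark.driver.maxResultSize")
res0: String = 8g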

Of course 8g may not be large enough in your case, so keep increasing it until the problem goes away -- or something else blows up! Best of luck.

Note: When I ran into this problem, my notebook was attempting to write to S3, not directly trying to "collect" the data, so to speak.