
Job 65 cancelled because SparkContext was shut down


Your job is getting aborted at the write step; "Job aborted." is the exception message for that, and it is what leads to the SparkContext being shut down.

Look into optimising the write step. maxRecordsPerFile might be the culprit; try a lower number, since you currently have 1M records per file!
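If you want a concrete starting point, here is a minimal sketch (the input/output paths and the 100k threshold are assumptions, not values from your job) that caps the records per output file at write time:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("write-tuning").getOrCreate()
val df = spark.read.parquet("/path/to/input") // placeholder input

df.write
  .option("maxRecordsPerFile", 100000L) // well below the current ~1M rows per file
  .mode("overwrite")
  .parquet("/path/to/output")           // placeholder output
```

The same limit can also be set cluster-wide via the spark.sql.files.maxRecordsPerFile configuration, if you would rather not repeat the option on every write.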


In general, Job ${job.jobId} cancelled because SparkContext was shut down just means an exception occurred due to which the DAG couldn't continue and had to error out. It's the Spark scheduler throwing this error when it faces an exception; that might be an exception unhandled in your code, or a job failure for any other reason. And as the DAG scheduler is stopped, the entire application gets stopped (this message is part of the cleanup).
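Because the cancellation message is emitted during cleanup, the root cause usually sits in the first failed stage, not in that message. As a hedged illustration (the paths are placeholders), wrapping the failing action lets you log the underlying exception directly:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("debug-write").getOrCreate()
val df = spark.read.parquet("/path/to/input") // placeholder input

try {
  df.write.mode("overwrite").parquet("/path/to/output") // the failing action
} catch {
  case e: Exception =>
    // This stack trace points at the real failure, not the generic
    // "cancelled because SparkContext was shut down" cleanup message.
    e.printStackTrace()
    throw e
}
```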


To your questions -

When a SparkContext shuts down, does that mean my bridge to the Spark cluster is down?

SparkContext represents the connection to a Spark cluster, so if it's dead you can't run jobs on it, as you have lost the link! On Zeppelin, you can just restart the SparkContext (Menu -> Interpreter -> Spark Interpreter -> restart).
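Outside Zeppelin (e.g. in a standalone application or a shell session), the usual way to get a fresh bridge is simply to build a new session. A small sketch, with the app name being an assumption:

```scala
import org.apache.spark.sql.SparkSession

// getOrCreate() hands back a working session (and with it a new
// SparkContext) once the previous one is gone.
val spark = SparkSession.builder().appName("rebuilt-session").getOrCreate()
val sc = spark.sparkContext // the new link to the cluster
```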

And, if that's the case, how can I cause the bridge to the spark cluster to go down?

With a SparkException/Error in your jobs, or manually by calling sc.stop().
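For example, this minimal sketch (the local master and the toy RDD are assumptions, just for demonstration) shows the manual route:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shutdown-demo")
  .master("local[*]") // assumption: a local cluster for demonstration
  .getOrCreate()
val sc = spark.sparkContext

sc.parallelize(1 to 10).count() // runs fine while the context is alive
sc.stop()                       // manually bring the bridge down

// Any action attempted now fails: Spark refuses to run jobs on a stopped
// SparkContext, and jobs still in flight at shutdown are cancelled with
// the "SparkContext was shut down" message.
// sc.parallelize(1 to 10).count()
```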