Details of Stage in Spark


[Stage 13:================================> (119 + 8) / 200]

  1. What is Stage 13?

Each Spark job is divided into stages. The job in this case is the saving of a DataFrame as a text file, and "stage 13" is one of the multiple stages for that job.

  1. What is (119+8)/200?

Examining the source code can help answer this:

    val bar = stages.map { s =>
      val total = s.numTasks()
      val header = s"[Stage ${s.stageId()}:"
      val tailer = s"(${s.numCompletedTasks()} + ${s.numActiveTasks()}) / $total]"
      ...
    }.mkString("")

Each stage is divided into tasks. 119 is the number of completed tasks for this stage (i.e., stage 13), 8 is the number of active tasks for this stage, and 200 is the total number of tasks for this stage.
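To make the mapping from counters to the console line concrete, here is a minimal, Spark-free sketch that follows the header/tailer pattern in the excerpt above. The `StageProgress` case class, the `ProgressBarSketch` object, and the fixed width of 60 characters are assumptions made for illustration; they are not Spark's API, and the way the `=` bar is filled is a plausible reading of the elided part, not a copy of it.

```scala
// Hypothetical stand-in for the per-stage counters (not a Spark class).
final case class StageProgress(stageId: Int, completed: Int, active: Int, total: Int)

object ProgressBarSketch {
  // Render one stage, aiming for `width` characters in total.
  def render(s: StageProgress, width: Int): String = {
    val header = s"[Stage ${s.stageId}:"
    val tailer = s"(${s.completed} + ${s.active}) / ${s.total}]"
    // Space left for the "=" bar between header and tailer.
    val w = width - header.length - tailer.length
    val bar =
      if (w > 0 && s.total > 0) {
        // Number of bar cells that correspond to completed tasks.
        val filled = w * s.completed / s.total
        (0 until w).map { i =>
          if (i < filled) "=" else if (i == filled) ">" else " "
        }.mkString("")
      } else ""
    header + bar + tailer
  }

  def main(args: Array[String]): Unit =
    // 119 completed, 8 active, 200 total, as in the line for stage 13.
    println(render(StageProgress(13, 119, 8, 200), 60))
}
```

Only completed tasks advance the bar in this sketch; active tasks appear in the counter but not in the `=` run.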

[Stage 18:=============>(199 + 1) / 200][Stage 27:============> (173 + 3) / 200]

  1. What does this line mean?
  2. Previously only one stage was running, but here two stages are running. When do multiple stages run in parallel?

Again, looking at the source code is useful:

    /**
     * ... If multiple stages run in the same time, the status
     * of them will be combined together, showed in one line.
     */
    ...
    if (stages.length > 0) {
      show(now, stages.take(3))  // display at most 3 stages in same time
    }

Stages that do not depend on one another can run concurrently, so more than one stage may be active when the progress bar is refreshed. In this case, stages 18 and 27 are running at the same time, and their progress is combined into a single line. The code limits the display to at most three concurrently running stages.
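The combining step can be sketched without Spark as follows. The `Stage` case class, the `CombinedBarSketch` object, and the even split of the terminal width among the displayed stages are assumptions for illustration; only the cap of three stages comes from the quoted `stages.take(3)`.

```scala
object CombinedBarSketch {
  // Hypothetical stand-in for a running stage's counters (not a Spark class).
  final case class Stage(id: Int, completed: Int, active: Int, total: Int)

  // Mirrors stages.take(3) in the quoted source.
  val MaxShown = 3

  // Render one stage, aiming for `width` characters.
  private def segment(s: Stage, width: Int): String = {
    val header = s"[Stage ${s.id}:"
    val tailer = s"(${s.completed} + ${s.active}) / ${s.total}]"
    val w = width - header.length - tailer.length
    val bar =
      if (w > 0 && s.total > 0) {
        val filled = w * s.completed / s.total
        (0 until w).map(i => if (i < filled) "=" else if (i == filled) ">" else " ").mkString
      } else ""
    header + bar + tailer
  }

  // Combine the first MaxShown stages into one line.
  // Assumption: the available width is shared evenly among them.
  def line(stages: Seq[Stage], terminalWidth: Int): String = {
    val shown = stages.take(MaxShown)
    if (shown.isEmpty) ""
    else shown.map(segment(_, terminalWidth / shown.size)).mkString
  }

  def main(args: Array[String]): Unit = {
    // Counters taken from the two-stage line above.
    val running = Seq(Stage(18, 199, 1, 200), Stage(27, 173, 3, 200))
    println(line(running, 80))
  }
}
```

If a fourth stage were running at the same moment, this sketch would simply leave it off the line, which matches the "at most 3 stages" comment in the source.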