How to perform an operation one time only at the end of a Scalding job?


The execution order in a Scalding job is a bit tricky:

  1. The initializer statements in the Job class are executed and the operation tree (which connects Pipes, Taps, etc.) is built.
  2. The tree is handed off to the optimizer, which creates the execution plan.
  3. The job starts executing. Hadoop Map and Reduce steps are kicked off according to the plan.
  4. The main program waits for everything to complete and exits.

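In your code, the println statement is one of the initializer statements of the Job class, so it executes in step 1, while the operation tree is still being built, rather than after the job has finished.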
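To make the ordering concrete, here is a minimal, dependency-free sketch. It does not use the Scalding API: `SketchJob`, `MyJob`, `events`, and `runDemo` are hypothetical names invented for illustration. It assumes the real base class exposes an overridable `run` method that blocks until the flow completes; check this against the `Job` class in the Scalding version you use, since the exact signature may differ.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a job framework; this is NOT the Scalding API.
object ExecutionOrderSketch {
  // Records the order in which things happen.
  val events = ListBuffer.empty[String]

  // Models a base job class: constructing a subclass only describes the work,
  // while run() executes it and returns once it has finished.
  abstract class SketchJob {
    def run(): Boolean = {
      events += "steps 2-4: plan created, map/reduce steps executed"
      true // pretend the job succeeded
    }
  }

  class MyJob extends SketchJob {
    // Constructor body: runs while the job is being defined (step 1).
    events += "step 1: initializer statement (where a bare println would run)"

    // Wrap run() so the extra work happens once, after everything completes.
    override def run(): Boolean = {
      val succeeded = super.run()
      events += "after completion: one-time operation"
      succeeded
    }
  }

  // Builds the job, runs it, and returns the recorded events in order.
  def runDemo(): List[String] = {
    events.clear()
    val job = new MyJob // step 1 happens here
    job.run()           // steps 2-4, then the one-time operation
    events.toList
  }

  def main(args: Array[String]): Unit =
    runDemo().foreach(println)
}
```

Running `main` prints the three events in that order. The design point is that anything placed in the class body runs at construction time, so a one-time final operation has to be attached to whatever call actually waits for the job to finish.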