ALS.checkpointInterval and SparkContext.setCheckpointDir
SparkContext.setCheckpointDir sets the global checkpoint directory. It is not limited to ALS or any other specific algorithm, but it is required for RDD.checkpoint to work.
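To illustrate the global nature of this setting, here is a minimal sketch that checkpoints a plain RDD with no ALS involved. The object name, the helper `run`, the local master, and the `/tmp/spark-checkpoints` path are all placeholders chosen for the example; on a real cluster the directory would normally live on a fault-tolerant file system such as HDFS.

```scala
import org.apache.spark.sql.SparkSession

object RddCheckpointSketch {
  // Hypothetical helper: checkpoints a small RDD and reports whether it worked.
  def run(checkpointDir: String): Boolean = {
    val spark = SparkSession.builder()
      .master("local[*]")               // assumption: local run for illustration
      .appName("rdd-checkpoint-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Global setting: every RDD checkpoint in this application writes here.
    sc.setCheckpointDir(checkpointDir)

    val rdd = sc.parallelize(1 to 100).map(_ * 2)
    rdd.checkpoint() // only marks the RDD; must be called before any action on it
    rdd.count()      // the first action materializes the checkpoint

    val done = rdd.isCheckpointed
    spark.stop()
    done
  }

  def main(args: Array[String]): Unit =
    println(run("/tmp/spark-checkpoints"))
}
```

If `setCheckpointDir` is skipped, the call to `rdd.checkpoint()` fails because no checkpoint directory has been set in the SparkContext.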
ALS.checkpointInterval is an algorithm-specific property and doesn't affect any global settings. From the ML docs:
Param for set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations.
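As a rough sketch, the interval is set on the estimator itself, so it only affects that ALS instance. The object name, the column names, and the values below are placeholders, not anything taken from the question.

```scala
import org.apache.spark.ml.recommendation.ALS

object AlsIntervalSketch {
  // Column names are placeholders for whatever the ratings DataFrame uses.
  val als: ALS = new ALS()
    .setUserCol("userId")
    .setItemCol("itemId")
    .setRatingCol("rating")
    .setMaxIter(30)
    .setCheckpointInterval(10) // -1 would disable checkpointing for this estimator
}
```

Nothing here touches the SparkContext; the checkpoint directory still has to be configured separately.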
Putting these two things together:
- these two settings work in completely different contexts and have different consequences
- both are required for proper checkpointing in ALS. If the checkpoint directory is not set, ALS won't checkpoint even if the checkpoint interval is set:

```scala
val shouldCheckpoint: Int => Boolean = (iter) =>
  sc.checkpointDir.isDefined && checkpointInterval != -1 && (iter % checkpointInterval == 0)
```
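To see how the two settings interact, the predicate quoted above can be modelled without Spark. This is a simplified stand-in, not the library code: the checkpoint directory is passed in as an `Option[String]` instead of being read from `sc`, and the iteration range `1 to 30` is an assumption made purely for illustration.

```scala
object ShouldCheckpointSketch {
  // Same boolean logic as the quoted predicate, with the directory
  // supplied explicitly instead of coming from the SparkContext.
  def shouldCheckpoint(checkpointDir: Option[String], checkpointInterval: Int): Int => Boolean =
    iter => checkpointDir.isDefined && checkpointInterval != -1 && (iter % checkpointInterval == 0)

  // Assumed iteration numbering, for illustration only.
  val iterations: Range = 1 to 30

  // Directory set and interval 10: some iterations qualify.
  val bothSet: Seq[Int] = iterations.filter(shouldCheckpoint(Some("/tmp/spark-checkpoints"), 10))
  // Interval set but no directory: nothing qualifies.
  val noDir: Seq[Int] = iterations.filter(shouldCheckpoint(None, 10))
  // Directory set but interval -1: checkpointing is disabled.
  val disabled: Seq[Int] = iterations.filter(shouldCheckpoint(Some("/tmp/spark-checkpoints"), -1))

  def main(args: Array[String]): Unit = {
    println(s"dir + interval 10: ${bothSet.mkString(", ")}")
    println(s"no dir, interval 10: ${noDir.mkString(", ")}")
    println(s"dir, interval -1: ${disabled.mkString(", ")}")
  }
}
```

Under these assumptions only the first combination selects any iterations (10, 20 and 30); the other two select none, which matches the point that both settings are needed.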