How do I set up Apache Spark to use the local hard disk when the data does not fit in RAM in local mode?
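Look at the RDD persistence section of the programming guide: http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
You can choose among several storage levels depending on your needs. MEMORY_AND_DISK is the one that will solve your problem: partitions that do not fit in memory are spilled to disk and read back from there when needed. If memory is tight, use MEMORY_AND_DISK_SER, which stores the data in serialized form. That is more space-efficient, at the cost of some extra CPU time to deserialize the data when it is read.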
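As a rough illustration, here is a minimal sketch of persisting an RDD with one of these storage levels in local mode. The application name, the local[*] master, and the synthetic dataset are assumptions made up for this example; substitute your own job and data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name; local[*] runs Spark in local mode on all cores.
    val conf = new SparkConf()
      .setAppName("persist-sketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Synthetic data standing in for a dataset that may not fit in RAM.
    val rdd = sc.parallelize(1 to 1000000).map(i => (i, i.toString * 10))

    // Keep partitions in memory when possible and spill the rest to disk.
    // Swap in StorageLevel.MEMORY_AND_DISK_SER to store serialized objects.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)

    // The first action materializes the persisted RDD; later actions reuse it.
    println(rdd.count())
    println(rdd.filter(_._1 % 2 == 0).count())

    rdd.unpersist()
    sc.stop()
  }
}
```

The storage level only takes effect once an action (here, count) computes the RDD, so call persist before the first action that uses it.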