
Spark Scala list folders in directory


We are using Hadoop 1.4, which doesn't have the listFiles method, so we use listStatus to get the directory contents. It doesn't have a recursive option, but recursive lookup is easy to manage.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val status = fs.listStatus(new Path(YOUR_HDFS_PATH))
status.foreach(x => println(x.getPath))
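
As a rough illustration of that recursive lookup, here is a minimal sketch; the helper name listDirsRecursively is mine, not from the original answer, and it assumes the root path exists and is readable:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Recursively collect every directory under the given root.
// On Hadoop 1.x FileStatus exposes isDir; on Hadoop 2+ use isDirectory.
def listDirsRecursively(fs: FileSystem, root: Path): Seq[Path] = {
  val dirs = fs.listStatus(root).filter(_.isDir).map(_.getPath).toSeq
  dirs ++ dirs.flatMap(dir => listDirsRecursively(fs, dir))
}

val fs = FileSystem.get(new Configuration())
listDirsRecursively(fs, new Path(YOUR_HDFS_PATH)).foreach(println)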


In Spark 2.0+,

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val hdfsPath = "YOUR_HDFS_PATH" // your path here
fs.listStatus(new Path(hdfsPath)).filter(_.isDir).map(_.getPath).foreach(println)

Hope this is helpful.


In Ajay Ahuja's answer, isDir is deprecated.

Use isDirectory instead. Please see the complete example and output below.

package examples

import org.apache.log4j.Level
import org.apache.spark.sql.SparkSession
import org.apache.hadoop.fs.{FileSystem, Path}

object ListHDFSDirectories extends App {
  val logger = org.apache.log4j.Logger.getLogger("org")
  logger.setLevel(Level.WARN)

  val spark = SparkSession.builder()
    .appName(this.getClass.getName)
    .config("spark.master", "local[*]")
    .getOrCreate()

  val hdfspath = "." // your path here

  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
  fs.listStatus(new Path(s"${hdfspath}")).filter(_.isDirectory).map(_.getPath).foreach(println)
}

Result:

file:/Users/user/codebase/myproject/target
file:/Users/user/codebase/myproject/Rel
file:/Users/user/codebase/myproject/spark-warehouse
file:/Users/user/codebase/myproject/metastore_db
file:/Users/user/codebase/myproject/.idea
file:/Users/user/codebase/myproject/src