Spark Scala list folders in directory
We are using Hadoop 1.4, which doesn't have the listFiles method, so we use listStatus to get the directories. listStatus has no recursive option, but a recursive lookup is easy to implement yourself.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new Configuration())
    val status = fs.listStatus(new Path(YOUR_HDFS_PATH))
    status.foreach(x => println(x.getPath))
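The recursive lookup mentioned above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the object name RecursiveDirs and the helper listDirsRecursively are hypothetical names introduced here for illustration, and it assumes that root exists and that FileStatus.isDirectory is available (on older Hadoop versions that only provide isDir, substitute that call).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object, not part of Hadoop or Spark.
object RecursiveDirs {
  // Returns every directory below `root`, depth-first, excluding `root` itself.
  // Assumes `root` exists and is a directory.
  def listDirsRecursively(fs: FileSystem, root: Path): Seq[Path] = {
    // Direct child directories of `root`
    val children = fs.listStatus(root).filter(_.isDirectory).map(_.getPath).toSeq
    // Each child, followed by everything found beneath it
    children.flatMap(dir => dir +: listDirsRecursively(fs, dir))
  }
}
```

It could then be called as RecursiveDirs.listDirsRecursively(fs, new Path(YOUR_HDFS_PATH)).foreach(println), reusing the fs value from the snippet above.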
In Spark 2.0+:
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    // hdfsPath is a placeholder: a String holding your HDFS directory
    fs.listStatus(new Path(hdfsPath)).filter(_.isDir).map(_.getPath).foreach(println)
Hope this is helpful.
In Ajay Ahuja's answer, isDir is deprecated; use isDirectory instead. Please see the complete example and output below.
    package examples

    import org.apache.log4j.Level
    import org.apache.spark.sql.SparkSession

    object ListHDFSDirectories extends App {
      val logger = org.apache.log4j.Logger.getLogger("org")
      logger.setLevel(Level.WARN)

      val spark = SparkSession.builder()
        .appName(this.getClass.getName)
        .config("spark.master", "local[*]")
        .getOrCreate()

      val hdfspath = "." // your path here

      import org.apache.hadoop.fs.{FileSystem, Path}
      val fs = org.apache.hadoop.fs.FileSystem.get(spark.sparkContext.hadoopConfiguration)
      fs.listStatus(new Path(s"${hdfspath}")).filter(_.isDirectory).map(_.getPath).foreach(println)
    }
Result:
    file:/Users/user/codebase/myproject/target
    file:/Users/user/codebase/myproject/Rel
    file:/Users/user/codebase/myproject/spark-warehouse
    file:/Users/user/codebase/myproject/metastore_db
    file:/Users/user/codebase/myproject/.idea
    file:/Users/user/codebase/myproject/src