
how to get the partitions info of hive table in Spark


Assuming that your real goal is to restrict execution of unbounded queries, I think it would be easier to get the query's execution plan and look at its FileScan / HiveTableScan leaf nodes to see whether any partition filters are being applied. For partitioned tables, the number of partitions the query is actually going to scan is shown there as well. So, something like this should do:

scala> val df_unbound = spark.sql("select * from hottab")
df_unbound: org.apache.spark.sql.DataFrame = [id: int, descr: string ... 1 more field]

scala> val plan1 = df_unbound.queryExecution.executedPlan.toString
plan1: String =
"*(1) FileScan parquet default.hottab[id#0,descr#1,loaddate#2] Batched: true, Format: Parquet, Location: CatalogFileIndex[hdfs://ns1/user/hive/warehouse/hottab], PartitionCount: 365, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int,descr:string>"

scala> val df_filtered = spark.sql("select * from hottab where loaddate='2019-07-31'")
df_filtered: org.apache.spark.sql.DataFrame = [id: int, descr: string ... 1 more field]

scala> val plan2 = df_filtered.queryExecution.executedPlan.toString
plan2: String =
"*(1) FileScan parquet default.hottab[id#17,descr#18,loaddate#19] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[hdfs://ns1/user/hive/warehouse/hottab/loaddate=2019-07-31], PartitionCount: 1, PartitionFilters: [isnotnull(loaddate#19), (loaddate#19 = 2019-07-31)], PushedFilters: [], ReadSchema: struct<id:int,descr:string>"

This way, you also don't have to deal with SQL parsing to find the table name(s) in queries, or interrogate the metastore yourself.

As a bonus, you'll also be able to see whether "regular" filter pushdown occurs (for storage formats that support it) in addition to partition pruning.
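If you want to reject unbounded queries programmatically rather than eyeballing the plan string, here is a minimal sketch along the same lines. It assumes file-based sources (e.g. Parquet) that compile to FileSourceScanExec nodes; the name scansArePartitionPruned is mine:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.FileSourceScanExec

// Sketch: true if every file scan in the physical plan carries at least
// one partition filter. Hive-serde tables compile to HiveTableScanExec
// instead of FileSourceScanExec and would need an extra case here.
def scansArePartitionPruned(df: DataFrame): Boolean = {
  val scans = df.queryExecution.executedPlan.collect {
    case s: FileSourceScanExec => s
  }
  scans.nonEmpty && scans.forall(_.partitionFilters.nonEmpty)
}

// e.g. if (!scansArePartitionPruned(df_unbound)) sys.error("refusing unbounded scan")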


You can use Scala's Try class and execute show partitions on the required table.

import scala.util.{Try, Success, Failure}

val numPartitions = Try(spark.sql("show partitions database.table").count) match {
  case Success(v) => v
  case Failure(e) => -1L
}

Later you can check numPartitions. If the value is -1, the table is not partitioned (SHOW PARTITIONS fails with an AnalysisException on a non-partitioned table).
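Wrapped into a reusable helper, a hedged sketch of the same idea (partitionCount is a name I made up):

import scala.util.Try
import org.apache.spark.sql.SparkSession

// -1 means the table is not partitioned (or the query failed for another
// reason, e.g. the table does not exist -- both surface as a Failure).
def partitionCount(spark: SparkSession, table: String): Long =
  Try(spark.sql(s"show partitions $table").count).getOrElse(-1L)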


import org.apache.spark.sql.catalyst.TableIdentifier

val listPartitions = spark.sessionState.catalog.listPartitionNames(TableIdentifier("table_name", Some("db_name")))
listPartitions: Seq[String] = ArrayBuffer(partition1=value1, ...)  // partitioned table
listPartitions: Seq[String] = ArrayBuffer()                        // non-partitioned table
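If you need the partition values as key/value pairs rather than raw name=value strings, a small sketch on top of the result above (the helper toSpec is mine; it assumes the default name=value/name=value encoding and does not URL-unescape special characters):

// Parse e.g. "year=2019/month=07" into Map("year" -> "2019", "month" -> "07").
def toSpec(partitionName: String): Map[String, String] =
  partitionName.split("/").map { kv =>
    val Array(k, v) = kv.split("=", 2)
    k -> v
  }.toMap

val specs: Seq[Map[String, String]] = listPartitions.map(toSpec)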