
how to get the partitions info of hive table in Spark


Assuming that your real goal is to restrict execution of unbounded queries, I think it would be easier to get the query's execution plan and look at its FileScan / HiveTableScan leaf nodes to see whether any partition filters are being applied. For partitioned tables, the number of partitions the query is actually going to scan is shown there as well. So, something like this should do:

scala> val df_unbound = spark.sql("select * from hottab")
df_unbound: org.apache.spark.sql.DataFrame = [id: int, descr: string ... 1 more field]

scala> val plan1 = df_unbound.queryExecution.executedPlan.toString
plan1: String =
"*(1) FileScan parquet default.hottab[id#0,descr#1,loaddate#2] Batched: true, Format: Parquet, Location: CatalogFileIndex[hdfs://ns1/user/hive/warehouse/hottab], PartitionCount: 365, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int,descr:string>"

scala> val df_filtered = spark.sql("select * from hottab where loaddate='2019-07-31'")
df_filtered: org.apache.spark.sql.DataFrame = [id: int, descr: string ... 1 more field]

scala> val plan2 = df_filtered.queryExecution.executedPlan.toString
plan2: String =
"*(1) FileScan parquet default.hottab[id#17,descr#18,loaddate#19] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[hdfs://ns1/user/hive/warehouse/hottab/loaddate=2019-07-31], PartitionCount: 1, PartitionFilters: [isnotnull(loaddate#19), (loaddate#19 = 2019-07-31)], PushedFilters: [], ReadSchema: struct<id:int,descr:string>"

This way, you also don't have to deal with SQL parsing to find the table name(s) in queries, or interrogate the metastore yourself.

As a bonus, you'll also be able to see whether "regular" filter pushdown occurs (for storage formats that support it) in addition to partition pruning.
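If you want to reject unbounded queries programmatically rather than eyeballing the plan string, here is a minimal sketch along the same lines. It assumes file-based sources (e.g. Parquet) that compile to FileSourceScanExec nodes; the name scansArePartitionPruned is mine:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.FileSourceScanExec

// Sketch: true if every file scan in the physical plan carries at least
// one partition filter. Hive-serde tables compile to HiveTableScanExec
// instead of FileSourceScanExec and would need an extra case here.
def scansArePartitionPruned(df: DataFrame): Boolean = {
  val scans = df.queryExecution.executedPlan.collect {
    case s: FileSourceScanExec => s
  }
  scans.nonEmpty && scans.forall(_.partitionFilters.nonEmpty)
}

// e.g. if (!scansArePartitionPruned(df_unbound)) sys.error("refusing unbounded scan")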


You can use Scala's Try class and execute show partitions on the required table.

import scala.util.{Try, Success, Failure}

val numPartitions = Try(spark.sql("show partitions database.table").count) match {
  case Success(v) => v
  case Failure(e) => -1L
}

Later you can check numPartitions. If the value is -1, the table is not partitioned (SHOW PARTITIONS fails with an AnalysisException on a non-partitioned table).
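Wrapped into a reusable helper, a hedged sketch of the same idea (partitionCount is a name I made up):

import scala.util.Try
import org.apache.spark.sql.SparkSession

// -1 means the table is not partitioned (or the query failed for another
// reason, e.g. the table does not exist -- both surface as a Failure).
def partitionCount(spark: SparkSession, table: String): Long =
  Try(spark.sql(s"show partitions $table").count).getOrElse(-1L)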


import org.apache.spark.sql.catalyst.TableIdentifier

val listPartitions = spark.sessionState.catalog.listPartitionNames(TableIdentifier("table_name", Some("db_name")))
listPartitions: Seq[String] = ArrayBuffer(partition1=value1, ...)  // partitioned table
listPartitions: Seq[String] = ArrayBuffer()                        // non-partitioned table
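If you need the partition values as key/value pairs rather than raw name=value strings, a small sketch on top of the result above (the helper toSpec is mine; it assumes the default name=value/name=value encoding and does not URL-unescape special characters):

// Parse e.g. "year=2019/month=07" into Map("year" -> "2019", "month" -> "07").
def toSpec(partitionName: String): Map[String, String] =
  partitionName.split("/").map { kv =>
    val Array(k, v) = kv.split("=", 2)
    k -> v
  }.toMap

val specs: Seq[Map[String, String]] = listPartitions.map(toSpec)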