
Create an external table in Azure Databricks


You can perform this operation once the Azure Data Lake Storage is configured.

If you want all users in the Databricks workspace to have access to the mounted Azure Data Lake Storage Gen2 account, create a mount point using the method described below. The service principal that you use to access the Azure Data Lake Storage Gen2 account should be granted access only to that account; it should not be granted access to other resources in Azure.

Once a mount point is created through a cluster, users of that cluster can immediately access the mount point. To use the mount point in another running cluster, users must run dbutils.fs.refreshMounts() on that running cluster to make the newly created mount point available for use.
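For example, a minimal sketch of what a user on another running cluster would run (nothing here is specific to any particular mount):

# Refresh this cluster's view of the mount table so newly created mount points become visible
dbutils.fs.refreshMounts()

# Optionally list all mounts to confirm the new mount point is now available
display(dbutils.fs.mounts())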

There are three primary ways of accessing Azure Data Lake Storage Gen2 from a Databricks cluster:

  1. Mounting an Azure Data Lake Storage Gen2 filesystem to DBFS using a service principal with delegated permissions and OAuth 2.0 (see the sketch after this list).
  2. Using a service principal directly.
  3. Using the Azure Data Lake Storage Gen2 storage account access key directly.
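As an illustration of the first option, here is a minimal sketch of mounting an Azure Data Lake Storage Gen2 filesystem with a service principal and OAuth 2.0. The container, storage account, secret scope, key names, and tenant ID below are placeholders, not values from this question:

# OAuth settings for the service principal (all angle-bracketed values are placeholders)
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<application-id>",
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope = "<scope-name>", key = "<secret-name>"),
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

# Mount the filesystem to DBFS so all workspace users can reach it under /mnt/<mount-name>
dbutils.fs.mount(
  source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
  mount_point = "/mnt/<mount-name>",
  extra_configs = configs)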

For more details, refer to the "Azure Data Lake Storage Gen2" documentation.

Hope this helps.


Sort of, if you can use Python (or Scala).

Start by making the connection:

TenantID = "blah"

# Set the Spark configuration needed to authenticate to the lake with a service principal
def connectLake():
  spark.conf.set("fs.azure.account.auth.type", "OAuth")
  spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
  spark.conf.set("fs.azure.account.oauth2.client.id", dbutils.secrets.get(scope = "LIQUIX", key = "lake-sp"))
  spark.conf.set("fs.azure.account.oauth2.client.secret", dbutils.secrets.get(scope = "LIQUIX", key = "lake-key"))
  spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/" + TenantID + "/oauth2/token")

connectLake()

# Root path of the lake filesystem
lakePath = "abfss://liquix@mystorageaccount.dfs.core.windows.net/"

Using Python, you can register a table with:

spark.sql("CREATE TABLE DimDate USING PARQUET LOCATION '"+lakePath+"/PRESENTED/DIMDATE/V1'")

You can now query that table if you have executed the connectLake() function - which is fine in your current session/notebook.
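For example, in the same notebook that ran connectLake() and registered the table (reusing the DimDate example above):

# Read the external table that was just registered
spark.table("DimDate").show(5)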

The problem now is that if a new session comes in and tries to run select * from that table, it will fail unless connectLake() has been run first. There is no way around that limitation, as you have to provide credentials to access the lake.
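So, as a sketch, every new session has to re-establish credentials before the query works; this assumes connectLake() is defined in that session (for example by keeping it in a shared notebook that each user runs first):

# In a fresh session/notebook: set the lake credentials first...
connectLake()

# ...and only then query the external table
spark.sql("SELECT * FROM DimDate").show()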

You may want to consider ADLS Gen2 credential passthrough: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/adls-passthrough.html

Note that this requires using a High Concurrency cluster.
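As a rough sketch of what that looks like once credential passthrough is enabled in the cluster configuration (the path simply reuses the example lake path from above; your own path will differ), users read the lake with their own Azure AD identity and no connectLake() call is needed:

# With ADLS Gen2 credential passthrough enabled on a High Concurrency cluster,
# reads are authorised with the signed-in user's Azure AD identity.
df = spark.read.parquet("abfss://liquix@mystorageaccount.dfs.core.windows.net/PRESENTED/DIMDATE/V1")
df.show()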