
How to access Azure Data Lake using the WebHDFS API


I am indebted to this forum post by Matthew Hicks, which outlined how to do this with curl. I took it and wrapped it in PowerShell. I'm sure there are many ways to accomplish this, but here's one that works.

First, set up an AAD application so that you can fill in the client_id and client_secret mentioned below. (This assumes you want to automate the process rather than log in interactively. If you want an interactive login, there's a link to that approach in the forum post above.)

Then fill in the settings in the first 5 lines and run the following PowerShell script:

$client_id = "<client id>";
$client_secret = "<secret>";
$tenant = "<tenant>";
$adlsAccount = "<account>";
cd D:\path\to\curl

#authenticate
$cmd = { .\curl.exe -X POST https://login.microsoftonline.com/$tenant/oauth2/token `
    -F grant_type=client_credentials `
    -F resource=https://management.core.windows.net/ `
    -F client_id=$client_id `
    -F client_secret=$client_secret };
$responseToken = Invoke-Command -scriptblock $cmd;
$accessToken = (ConvertFrom-Json $responseToken).access_token;

#list root folders
$cmd = {.\curl.exe -X GET -H "Authorization: Bearer $accessToken" https://$adlsAccount.azuredatalakestore.net/webhdfs/v1/?op=LISTSTATUS };
$foldersResponse = Invoke-Command -scriptblock $cmd;

#loop through directories
(ConvertFrom-Json $foldersResponse).FileStatuses.FileStatus | ForEach-Object { $_.pathSuffix }

#list files in one folder
$cmd = {.\curl.exe -X GET -H "Authorization: Bearer $accessToken" https://$adlsAccount.azuredatalakestore.net/webhdfs/v1/weather/?op=LISTSTATUS };
$weatherResponse = Invoke-Command -scriptblock $cmd;
(ConvertFrom-Json $weatherResponse).FileStatuses.FileStatus | ForEach-Object { $_.pathSuffix }

#download one file
$cmd = {.\curl.exe -L "https://$adlsAccount.azuredatalakestore.net/webhdfs/v1/weather/2007small.csv?op=OPEN" -H "Authorization: Bearer $accessToken" -o d:\temp\curl\2007small.csv };
Invoke-Command -scriptblock $cmd;

#upload one file
$cmd = {.\curl.exe -i -X PUT -L "https://$adlsAccount.azuredatalakestore.net/webhdfs/v1/weather/new2007small.csv?op=CREATE" -T "D:\temp\weather\smallcsv\new2007small.csv" -H "Authorization: Bearer $accessToken" };
Invoke-Command -scriptblock $cmd;
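
The script builds the same WebHDFS URL shape several times and reads every LISTSTATUS response through the same FileStatuses.FileStatus path. Here is a minimal sketch of how those two pieces could be factored out and tried without touching the network. The function names Get-WebHdfsUrl and Get-WebHdfsEntryNames, the account name "myaccount", and the sample JSON are all made up for illustration. A real response carries more fields per entry (owner, permission, modificationTime and so on).

```powershell
# Hypothetical helper (not part of the original script): builds a WebHDFS URL
# from an ADLS account name, a path inside the store, and an operation name.
function Get-WebHdfsUrl {
    param(
        [Parameter(Mandatory)] [string] $Account,
        [string] $Path = "",
        [Parameter(Mandatory)] [string] $Operation
    )
    # Drop leading slashes only, so "weather/" and "/weather/" give the same URL.
    $cleanPath = $Path.TrimStart('/')
    # ${...} keeps the variable names from running into the text that follows them.
    return "https://${Account}.azuredatalakestore.net/webhdfs/v1/${cleanPath}?op=${Operation}"
}

# Hypothetical helper: pulls the entry names out of a LISTSTATUS JSON response,
# optionally keeping only entries of one type ("FILE" or "DIRECTORY").
function Get-WebHdfsEntryNames {
    param(
        [Parameter(Mandatory)] [string] $Json,
        [string] $Type
    )
    $entries = (ConvertFrom-Json $Json).FileStatuses.FileStatus
    if ($Type) {
        $entries = $entries | Where-Object { $_.type -eq $Type }
    }
    return @($entries | ForEach-Object { $_.pathSuffix })
}

# Made-up, trimmed-down sample of a LISTSTATUS response.
$sampleResponse = @'
{
  "FileStatuses": {
    "FileStatus": [
      { "pathSuffix": "weather", "type": "DIRECTORY", "length": 0 },
      { "pathSuffix": "2007small.csv", "type": "FILE", "length": 1024 }
    ]
  }
}
'@

# Usage: build the URL for listing the weather folder, then list directory names.
Get-WebHdfsUrl -Account "myaccount" -Path "weather/" -Operation "LISTSTATUS"
Get-WebHdfsEntryNames -Json $sampleResponse -Type "DIRECTORY"
```

With helpers like these, each curl call in the script would only differ in the path, the operation and the curl flags, which makes it easier to see what each step actually requests.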