Passing script files on hdfs to impala-shell

Impala shell can accept query text from STDIN. As described here, option -f

-f query_file or --query_file=query_file
query_file=path_to_query_file
Passes a SQL query from a file. Multiple statements must be semicolon (;) delimited. In Impala 2.3 and higher, you can specify a filename of - to represent standard input. This feature makes it convenient to use impala-shell as part of a Unix pipeline where SQL statements are generated dynamically by other tools.

So in your case, your shell script can simply do something like

$ hdfs dfs -cat <hdfs_file_name> | impala-shell -i <impala_daemon> -f -

shell hadoop hdfs oozie impala

If you have the fixed number of queries, or you can collect (cat) them into one file, then you can pass the name of this file(s) as a parameter out of the <action> using <capture-output/> tag:

$ hdfs hdfs -cat /user/impala/sql/custom_script_name.sql

CREATE TABLE default.t1(n INT);INSERT INTO default.t1 VALUES(1);

$ hdfs hdfs -cat /oozie/shell/prepare-implala-sql.sh

#!/bin/bashecho HDFS_IMPALA_SCRIPT:/user/impala/sql/custom_script_name.sql

$ hdfs hdfs -cat /user/oozie/workflow/wf_impala_env/wf_impala_env.xml

<workflow-app name="wf_impala_env" xmlns="uri:oozie:workflow:0.5">  <start to="a1"/>  <kill name="a0">    <message>Error: [${wf:errorMessage(wf:lastErrorNode())}]</message>  </kill>  <action name="a1">    <shell xmlns="uri:oozie:shell-action:0.2">      <job-tracker>${resourceManager}</job-tracker>      <name-node>${nameNode}</name-node>      <exec>bash</exec>      <argument>prepare-implala-sql.sh</argument>      <file>/oozie/shell/prepare-implala-sql.sh#prepare-implala-sql.sh</file>      <capture-output/>    </shell>    <ok to="a2"/>    <error to="a0"/>  </action>  ...

And then use it in Impala step as a <file> parameter:

  ...  <action name="a2">    <shell xmlns="uri:oozie:shell-action:0.2">      <job-tracker>${resourceManager}</job-tracker>      <name-node>${nameNode}</name-node>      <exec>impala-shell</exec>      <argument>-i</argument>      <argument>${impalad}</argument>      <argument>-f</argument>      <argument>query.sql</argument>      <env-var>PYTHON_EGG_CACHE=./myeggs</env-var>      <file>${wf:actionData("a1")["HDFS_IMPALA_SCRIPT"]}#query.sql</file>      <capture-output/>    </shell>    <ok to="a99"/>    <error to="a0"/>  </action>  <end name="a99"/></workflow-app>

Just don't forget about PYTHON_EGG_CACHE for impala-shell (or bash -> impala-shell).

CodeHunter

Passing script files on hdfs to impala-shell

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last