Passing script files on HDFS to impala-shell


impala-shell can accept query text from STDIN. As described in the impala-shell documentation for the -f option:

-f query_file or --query_file=query_file
(configuration file setting: query_file=path_to_query_file)

Passes a SQL query from a file. Multiple statements must be semicolon (;) delimited. In Impala 2.3 and higher, you can specify a filename of - to represent standard input. This feature makes it convenient to use impala-shell as part of a Unix pipeline where SQL statements are generated dynamically by other tools.

So in your case, your shell script can simply do something like:

$ hdfs dfs -cat <hdfs_file_name> | impala-shell -i <impala_daemon> -f -
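
The same stdin mechanism works when the SQL is generated on the fly; for example (the statement shown here is just illustrative):

$ echo "SHOW TABLES IN default;" | impala-shell -i <impala_daemon> -f -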


If you have a fixed number of queries, or you can collect (cat) them into one file, then you can pass the name of that file out of the <action> as a parameter, using the <capture-output/> tag:

$ hdfs dfs -cat /user/impala/sql/custom_script_name.sql

CREATE TABLE default.t1(n INT);
INSERT INTO default.t1 VALUES(1);

$ hdfs dfs -cat /oozie/shell/prepare-implala-sql.sh

#!/bin/bash
echo HDFS_IMPALA_SCRIPT:/user/impala/sql/custom_script_name.sql

$ hdfs dfs -cat /user/oozie/workflow/wf_impala_env/wf_impala_env.xml

<workflow-app name="wf_impala_env" xmlns="uri:oozie:workflow:0.5">
  <start to="a1"/>
  <kill name="a0">
    <message>Error: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="a1">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>bash</exec>
      <argument>prepare-implala-sql.sh</argument>
      <file>/oozie/shell/prepare-implala-sql.sh#prepare-implala-sql.sh</file>
      <capture-output/>
    </shell>
    <ok to="a2"/>
    <error to="a0"/>
  </action>
  ...

And then use it in the Impala step as a <file> parameter:

  ...
  <action name="a2">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>impala-shell</exec>
      <argument>-i</argument>
      <argument>${impalad}</argument>
      <argument>-f</argument>
      <argument>query.sql</argument>
      <env-var>PYTHON_EGG_CACHE=./myeggs</env-var>
      <file>${wf:actionData("a1")["HDFS_IMPALA_SCRIPT"]}#query.sql</file>
      <capture-output/>
    </shell>
    <ok to="a99"/>
    <error to="a0"/>
  </action>
  <end name="a99"/>
</workflow-app>

Just don't forget to set PYTHON_EGG_CACHE for impala-shell (or for bash -> impala-shell), as done with the <env-var> element above.
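
For the bash -> impala-shell variant, a minimal wrapper sketch (the positional arguments are my own convention, and ./myeggs simply mirrors the <env-var> value above; neither is mandatory):

#!/bin/bash
# Hypothetical wrapper: stream an HDFS-hosted SQL file into impala-shell.
# PYTHON_EGG_CACHE must point to a writable local directory, otherwise
# impala-shell may fail extracting its Python eggs inside the YARN container.
export PYTHON_EGG_CACHE=./myeggs

# $1 - HDFS path of the .sql file, $2 - impalad host
hdfs dfs -cat "$1" | impala-shell -i "$2" -f -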