
Integration testing Hive jobs


Ideally one would be able to test Hive queries with LocalJobRunner rather than resorting to mini-cluster testing. However, due to HIVE-3816, running Hive with mapred.job.tracker=local results in a call to the Hive CLI executable installed on the system (as described in your question).

Until HIVE-3816 is resolved, mini-cluster testing is the only option. Below is a minimal mini-cluster setup for Hive tests that I have verified against CDH 4.4.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;

Configuration conf = new Configuration();

/* Build MiniDFSCluster */
MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();

/* Build MiniMR Cluster */
System.setProperty("hadoop.log.dir", "/path/to/hadoop/log/dir"); // MAPREDUCE-2785
int numTaskTrackers = 1;
int numTaskTrackerDirectories = 1;
String[] racks = null;
String[] hosts = null;
MiniMRCluster miniMR = new MiniMRCluster(numTaskTrackers, miniDFS.getFileSystem().getUri().toString(),
                                         numTaskTrackerDirectories, racks, hosts, new JobConf(conf));

/* Set JobTracker URI */
System.setProperty("mapred.job.tracker", miniMR.createJobConf(new JobConf(conf)).get("mapred.job.tracker"));
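To actually run a query against this mini-cluster, one option is Hive's embedded Driver API. The sketch below is only illustrative, assuming Hive libraries matching your cluster version are on the test classpath; the table name and queries are placeholders, and the configuration keys are wired up explicitly rather than relying on the system properties set above.

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

// Hedged sketch: point HiveConf at the mini-cluster and run a query
// through the embedded Hive Driver.
HiveConf hiveConf = new HiveConf();
hiveConf.set("fs.default.name", miniDFS.getFileSystem().getUri().toString());
hiveConf.set("mapred.job.tracker", System.getProperty("mapred.job.tracker"));
SessionState.start(hiveConf);
Driver driver = new Driver(hiveConf);
driver.run("CREATE TABLE test_table (id INT, value STRING)"); // placeholder DDL
driver.run("SELECT COUNT(*) FROM test_table");                // triggers a MapReduce job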

There is no need to run a separate HiveServer or HiveServer2 process for testing. You can test with an embedded HiveServer2 process by setting your JDBC connection URL to jdbc:hive2:///
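As a minimal sketch of that approach (assuming hive-jdbc and its dependencies are on the test classpath; the table name is illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// An empty host/port in the URL makes the Hive JDBC driver
// spin up an embedded HiveServer2 inside the test JVM.
Class.forName("org.apache.hive.jdbc.HiveDriver");
try (Connection conn = DriverManager.getConnection("jdbc:hive2:///", "", "");
     Statement stmt = conn.createStatement()) {
    stmt.execute("CREATE TABLE IF NOT EXISTS demo (id INT)"); // illustrative DDL
    stmt.execute("SELECT COUNT(*) FROM demo");
}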


I came across one pretty good tool: HiveRunner. It is a framework on top of JUnit for testing Hive scripts. Under the hood it starts a standalone HiveServer with an in-memory HSQLDB as the metastore.
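A minimal HiveRunner test might look like the sketch below. The runner, annotation, and HiveShell types come from the HiveRunner API as documented in the project's README; the database and table names are placeholders.

import com.klarna.hiverunner.HiveShell;
import com.klarna.hiverunner.StandaloneHiveRunner;
import com.klarna.hiverunner.annotations.HiveSQL;
import java.util.List;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(StandaloneHiveRunner.class)
public class HelloHiveRunnerTest {

    // HiveRunner injects a shell backed by the embedded HiveServer;
    // files = {} means we issue statements directly instead of loading scripts.
    @HiveSQL(files = {})
    private HiveShell shell;

    @Test
    public void tableIsCreated() {
        shell.execute("CREATE DATABASE source_db");
        shell.execute("CREATE TABLE source_db.test_table (a STRING)");
        List<String> tables = shell.executeQuery("SHOW TABLES IN source_db");
        Assert.assertTrue(tables.contains("test_table"));
    }
}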


I have implemented HiveRunner.

https://github.com/klarna/HiveRunner

We tested it on Mac and had some trouble on Windows; however, with a few changes the tool served us well.

Here are the changes that were needed to get HiveRunner working in a Windows environment. After these changes, unit testing is possible for all Hive queries.

1. Clone the project at https://github.com/steveloughran/winutils to anywhere on your computer. Add a new environment variable, HADOOP_HOME, pointing to the /bin directory of that folder. No forward slashes or spaces allowed.
2. Clone the project at https://github.com/sakserv/hadoop-mini-clusters to anywhere on your computer. Add a new environment variable, HADOOP_WINDOWS_LIBS, pointing to the /lib directory of that folder. Again, no forward slashes or spaces allowed.
3. I also installed Cygwin, assuming several Linux utilities might be available through it.
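If you prefer not to set a system-wide environment variable, a hedged alternative (the path below is a placeholder for your winutils checkout) is to point Hadoop at winutils from the test code itself; Hadoop's shell utilities resolve hadoop.home.dir and look for bin\winutils.exe beneath it, so the property should name the folder that contains bin:

import org.junit.BeforeClass;

public class WindowsHiveTestBase {
    @BeforeClass
    public static void setUpWinutils() {
        // Placeholder path: the directory containing bin\winutils.exe.
        System.setProperty("hadoop.home.dir", "C:\\tools\\winutils");
    }
}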

This pull request on GitHub helped with making it work on Windows: https://github.com/klarna/HiveRunner/pull/63