
Integration testing Hive jobs


Ideally one would be able to test Hive queries with LocalJobRunner rather than resorting to mini-cluster testing. However, due to HIVE-3816, running Hive with mapred.job.tracker=local results in a call to the Hive CLI executable installed on the system (as described in your question).

Until HIVE-3816 is resolved, mini-cluster testing is the only option. Below is a minimal mini-cluster setup for Hive tests that I have verified against CDH 4.4.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;

Configuration conf = new Configuration();

/* Build MiniDFSCluster */
MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();

/* Build MiniMR Cluster */
System.setProperty("hadoop.log.dir", "/path/to/hadoop/log/dir"); // MAPREDUCE-2785
int numTaskTrackers = 1;
int numTaskTrackerDirectories = 1;
String[] racks = null;
String[] hosts = null;
MiniMRCluster miniMR = new MiniMRCluster(numTaskTrackers, miniDFS.getFileSystem().getUri().toString(),
                                         numTaskTrackerDirectories, racks, hosts, new JobConf(conf));

/* Set JobTracker URI */
System.setProperty("mapred.job.tracker", miniMR.createJobConf(new JobConf(conf)).get("mapred.job.tracker"));
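To actually run a query against this mini-cluster, one option is Hive's embedded Driver API. The sketch below is only illustrative, assuming Hive libraries matching your cluster version are on the test classpath; the table name and queries are placeholders, and the configuration keys are wired up explicitly rather than relying on the system properties set above.

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

// Hedged sketch: point HiveConf at the mini-cluster and run a query
// through the embedded Hive Driver.
HiveConf hiveConf = new HiveConf();
hiveConf.set("fs.default.name", miniDFS.getFileSystem().getUri().toString());
hiveConf.set("mapred.job.tracker", System.getProperty("mapred.job.tracker"));
SessionState.start(hiveConf);
Driver driver = new Driver(hiveConf);
driver.run("CREATE TABLE test_table (id INT, value STRING)"); // placeholder DDL
driver.run("SELECT COUNT(*) FROM test_table");                // triggers a MapReduce job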

There is no need to run a separate HiveServer or HiveServer2 process for testing. You can test with an embedded HiveServer2 process by setting your JDBC connection URL to jdbc:hive2:///
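As a minimal sketch of that approach (assuming hive-jdbc and its dependencies are on the test classpath; the table name is illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// An empty host/port in the URL makes the Hive JDBC driver
// spin up an embedded HiveServer2 inside the test JVM.
Class.forName("org.apache.hive.jdbc.HiveDriver");
try (Connection conn = DriverManager.getConnection("jdbc:hive2:///", "", "");
     Statement stmt = conn.createStatement()) {
    stmt.execute("CREATE TABLE IF NOT EXISTS demo (id INT)"); // illustrative DDL
    stmt.execute("SELECT COUNT(*) FROM demo");
}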


I came across one pretty good tool: HiveRunner. It is a framework on top of JUnit for testing Hive scripts. Under the hood it starts a standalone HiveServer with an in-memory HSQLDB as the metastore.
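A minimal HiveRunner test might look like the sketch below. The runner, annotation, and HiveShell types come from the HiveRunner API as documented in the project's README; the database and table names are placeholders.

import com.klarna.hiverunner.HiveShell;
import com.klarna.hiverunner.StandaloneHiveRunner;
import com.klarna.hiverunner.annotations.HiveSQL;
import java.util.List;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(StandaloneHiveRunner.class)
public class HelloHiveRunnerTest {

    // HiveRunner injects a shell backed by the embedded HiveServer;
    // files = {} means we issue statements directly instead of loading scripts.
    @HiveSQL(files = {})
    private HiveShell shell;

    @Test
    public void tableIsCreated() {
        shell.execute("CREATE DATABASE source_db");
        shell.execute("CREATE TABLE source_db.test_table (a STRING)");
        List<String> tables = shell.executeQuery("SHOW TABLES IN source_db");
        Assert.assertTrue(tables.contains("test_table"));
    }
}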


I have implemented HiveRunner.

https://github.com/klarna/HiveRunner

We tested it on Mac and had some trouble on Windows; however, with a few changes the tool served us well.

Here are the changes that were needed to get HiveRunner working in a Windows environment. After these changes, unit testing is possible for all Hive queries.

1. Clone the project at https://github.com/steveloughran/winutils to anywhere on your computer. Add a new environment variable, HADOOP_HOME, pointing to the /bin directory of that folder. No forward slashes or spaces allowed.
2. Clone the project at https://github.com/sakserv/hadoop-mini-clusters to anywhere on your computer. Add a new environment variable, HADOOP_WINDOWS_LIBS, pointing to the /lib directory of that folder. Again, no forward slashes or spaces allowed.
3. I also installed Cygwin, assuming several Linux utilities might be available through it.
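If you prefer not to set a system-wide environment variable, a hedged alternative (the path below is a placeholder for your winutils checkout) is to point Hadoop at winutils from the test code itself; Hadoop's shell utilities resolve hadoop.home.dir and look for bin\winutils.exe beneath it, so the property should name the folder that contains bin:

import org.junit.BeforeClass;

public class WindowsHiveTestBase {
    @BeforeClass
    public static void setUpWinutils() {
        // Placeholder path: the directory containing bin\winutils.exe.
        System.setProperty("hadoop.home.dir", "C:\\tools\\winutils");
    }
}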

This pull request on GitHub helped with making it work on Windows: https://github.com/klarna/HiveRunner/pull/63