How to return Struct from Hive UDF?
Here is a very simple example of such a UDF. It receives a User-Agent string, parses it using an external library (user-agent-utils), and returns a struct with four text fields:
STRUCT<type: string, family: string, os: string, device: string>
You need to extend the GenericUDF class and override its two most important methods: initialize and evaluate.
initialize() describes the structure itself and defines the data types of its fields.
evaluate() fills the structure with actual values.
You don't need any special class for the return value: a struct<> in Hive is just an array of Java objects.
```java
import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

import eu.bitwalker.useragentutils.UserAgent;

public class UAStructUDF extends GenericUDF {

    private Object[] result;

    @Override
    public String getDisplayString(String[] arg0) {
        return "My display string";
    }

    @Override
    public ObjectInspector initialize(ObjectInspector[] arg0) throws UDFArgumentException {
        // Define the field names for the struct<> and their types
        ArrayList<String> structFieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> structFieldObjectInspectors = new ArrayList<ObjectInspector>();

        // fill struct field names
        // type
        structFieldNames.add("type");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // family
        structFieldNames.add("family");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // OS name
        structFieldNames.add("os");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // device
        structFieldNames.add("device");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);

        StructObjectInspector si = ObjectInspectorFactory.getStandardStructObjectInspector(
                structFieldNames, structFieldObjectInspectors);
        return si;
    }

    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        if (args == null || args.length < 1) {
            throw new HiveException("args is empty");
        }
        if (args[0].get() == null) {
            throw new HiveException("args contains null instead of object");
        }
        Object argObj = args[0].get();

        // get the argument as a String
        String argument = null;
        if (argObj instanceof Text) {
            argument = ((Text) argObj).toString();
        } else if (argObj instanceof String) {
            argument = (String) argObj;
        } else {
            throw new HiveException("Argument is neither a Text nor String, it is a "
                    + argObj.getClass().getCanonicalName());
        }
        // parse the UA string and return the struct, which is just an Object[]
        return parseUAString(argument);
    }

    private Object parseUAString(String argument) {
        result = new Object[4];
        UserAgent ua = new UserAgent(argument);
        result[0] = new Text(ua.getBrowser().getBrowserType().getName());
        result[1] = new Text(ua.getBrowser().getGroup().getName());
        result[2] = new Text(ua.getOperatingSystem().getName());
        result[3] = new Text(ua.getOperatingSystem().getDeviceType().getName());
        return result;
    }
}
```
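Once the class is compiled and packaged, you can register and call it from Hive roughly like this (the jar path and function name here are illustrative, not part of the original code):

```sql
-- jar path and function name are assumptions; adjust to your build
ADD JAR /tmp/ua-struct-udf.jar;
CREATE TEMPORARY FUNCTION parse_ua AS 'UAStructUDF';

-- the UDF returns a struct; fields are accessed with dot notation
SELECT parse_ua(user_agent).type,
       parse_ua(user_agent).os
FROM   access_log;
```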
There is a concept of a SerDe (serializer/deserializer) in Hive that can be used with the kind of data format you are working with. It serializes (complex) objects and then deserializes them as needed. For instance, if you have a JSON file containing objects and values, you need a way to store that content in Hive. For that you would use a JsonSerDe, which is actually a jar file containing parser code written in Java for working with JSON data.
So now you have a jar (the SerDe), and the other requirement is a schema to store that data. For example, XML files need an XSD; similarly, for JSON you define the relations between objects, arrays, and structs. You can check this link: http://thornydev.blogspot.in/2013/07/querying-json-records-via-hive.html
Please let me know if this helps and solves your purpose :)
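As a rough sketch of what that looks like in practice (the jar path is an assumption, and the SerDe class name is the one used in the linked post; adjust both to the jar you actually use):

```sql
-- jar path is an assumption; SerDe class is from the linked Hive-JSON-Serde
ADD JAR /tmp/hive-json-serde.jar;

CREATE TABLE events (
  user STRUCT<id:INT, name:STRING>,
  tags ARRAY<STRING>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
```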