
Create multiple columns from single Hive UDF


I just use GenericUDTF. After you write a UDF that extends GenericUDTF, your UDTF should implement two important methods: initialize and process.

  • In initialize, you check the argument types and set the return object type. For example, with ObjectInspectorFactory.getStandardStructObjectInspector you specify the output columns: the column names come from the structFieldNames argument and the column value types from structFieldObjectInspectors. The number of output columns is the size of the structFieldNames list. Note that there are two type systems, java and hadoop: the Java ObjectInspectors begin with javaXXObjectInspector, while the Hadoop ones begin with writableXXObjectInspector (see the short snippet after this list).
  • In process, the logic is similar to a common UDF, except that you should use the ObjectInspector saved in initialize() to convert each incoming Object to a concrete value such as String, Integer, etc. Then call the forward function to output a row; in the row object forwardColObj you set each column's value.
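
To make the two type systems concrete, here is a minimal sketch; both inspectors are existing fields of Hive's PrimitiveObjectInspectorFactory:

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// Java type system: the inspector hands back a plain java.lang.String
ObjectInspector javaOI = PrimitiveObjectInspectorFactory.javaStringObjectInspector;
// Hadoop type system: the inspector hands back an org.apache.hadoop.io.Text writable
ObjectInspector writableOI = PrimitiveObjectInspectorFactory.writableStringObjectInspector;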

The following is a simple example:


import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde.serdeConstants;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

public class UDFExtractDomainMethod extends GenericUDTF {

    private static final Integer OUT_COLS = 2;
    // the output columns size
    private transient Object[] forwardColObj = new Object[OUT_COLS];

    private transient ObjectInspector[] inputOIs;

    /**
     * @param argOIs check that the argument is valid.
     * @return the output column structure.
     * @throws UDFArgumentException
     */
    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        if (argOIs.length != 1
                || argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE
                || !argOIs[0].getTypeName().equals(serdeConstants.STRING_TYPE_NAME)) {
            throw new UDFArgumentException("split_url only takes one argument of type string");
        }

        inputOIs = argOIs;
        List<String> outFieldNames = new ArrayList<String>();
        List<ObjectInspector> outFieldOIs = new ArrayList<ObjectInspector>();
        outFieldNames.add("host");
        outFieldNames.add("method");
        outFieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        // writableStringObjectInspector corresponds to org.apache.hadoop.io.Text
        outFieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(outFieldNames, outFieldOIs);
    }

    @Override
    public void process(Object[] objects) throws HiveException {
        try {
            // need the input OI to convert the raw Object to a Java String
            String inUrl = ((StringObjectInspector) inputOIs[0]).getPrimitiveJavaObject(objects[0]);
            URI uri = new URI(inUrl);
            forwardColObj[0] = uri.getHost();
            forwardColObj[1] = uri.getRawPath();
            // output a row with two columns
            forward(forwardColObj);
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void close() throws HiveException {
    }
}
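
Once compiled and packaged into a jar, the UDTF is used like any other table-generating function. A usage sketch follows; the jar path and the table access_log with its string column url are hypothetical names for illustration:

ADD JAR /path/to/udf-example.jar;   -- hypothetical jar name
CREATE TEMPORARY FUNCTION split_url AS 'UDFExtractDomainMethod';

-- Used alone in the SELECT list, the UDTF's output columns become the result columns
SELECT split_url(url) AS (host, method) FROM access_log;

-- LATERAL VIEW keeps the other columns of each row alongside the generated ones
SELECT t.url, v.host, v.method
FROM access_log t
LATERAL VIEW split_url(t.url) v AS host, method;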