
Create multiple columns from single Hive UDF


I just use GenericUDTF. After you write a UDF that extends GenericUDTF, your UDTF should implement two important methods: initialize and process.

  • In initialize, you check the argument types and set the return object type. For example, with ObjectInspectorFactory.getStandardStructObjectInspector you specify the output columns: the column names come from the structFieldNames argument and the column value types from structFieldObjectInspectors. The number of output columns is the size of the structFieldNames list. Note that there are two type systems, java and hadoop: the Java ObjectInspectors begin with javaXXObjectInspector, while the Hadoop ones begin with writableXXObjectInspector (see the short snippet after this list).
  • In process, the logic is similar to a common UDF, except that you should use the ObjectInspector saved in initialize() to convert each incoming Object to a concrete value such as String, Integer, etc. Then call the forward function to output a row; in the row object forwardColObj you set each column's value.
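
To make the two type systems concrete, here is a minimal sketch; both inspectors are existing fields of Hive's PrimitiveObjectInspectorFactory:

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// Java type system: the inspector hands back a plain java.lang.String
ObjectInspector javaOI = PrimitiveObjectInspectorFactory.javaStringObjectInspector;
// Hadoop type system: the inspector hands back an org.apache.hadoop.io.Text writable
ObjectInspector writableOI = PrimitiveObjectInspectorFactory.writableStringObjectInspector;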

The following is a simple example:


import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde.serdeConstants;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

public class UDFExtractDomainMethod extends GenericUDTF {

    private static final Integer OUT_COLS = 2;
    // the output columns size
    private transient Object[] forwardColObj = new Object[OUT_COLS];

    private transient ObjectInspector[] inputOIs;

    /**
     * @param argOIs check that the argument is valid.
     * @return the output column structure.
     * @throws UDFArgumentException
     */
    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        if (argOIs.length != 1
                || argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE
                || !argOIs[0].getTypeName().equals(serdeConstants.STRING_TYPE_NAME)) {
            throw new UDFArgumentException("split_url only takes one argument of type string");
        }

        inputOIs = argOIs;
        List<String> outFieldNames = new ArrayList<String>();
        List<ObjectInspector> outFieldOIs = new ArrayList<ObjectInspector>();
        outFieldNames.add("host");
        outFieldNames.add("method");
        outFieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        // writableStringObjectInspector corresponds to org.apache.hadoop.io.Text
        outFieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(outFieldNames, outFieldOIs);
    }

    @Override
    public void process(Object[] objects) throws HiveException {
        try {
            // need the input OI to convert the raw Object to a Java String
            String inUrl = ((StringObjectInspector) inputOIs[0]).getPrimitiveJavaObject(objects[0]);
            URI uri = new URI(inUrl);
            forwardColObj[0] = uri.getHost();
            forwardColObj[1] = uri.getRawPath();
            // output a row with two columns
            forward(forwardColObj);
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void close() throws HiveException {
    }
}
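
Once compiled and packaged into a jar, the UDTF is used like any other table-generating function. A usage sketch follows; the jar path and the table access_log with its string column url are hypothetical names for illustration:

ADD JAR /path/to/udf-example.jar;   -- hypothetical jar name
CREATE TEMPORARY FUNCTION split_url AS 'UDFExtractDomainMethod';

-- Used alone in the SELECT list, the UDTF's output columns become the result columns
SELECT split_url(url) AS (host, method) FROM access_log;

-- LATERAL VIEW keeps the other columns of each row alongside the generated ones
SELECT t.url, v.host, v.method
FROM access_log t
LATERAL VIEW split_url(t.url) v AS host, method;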