Hive UDF Text to array


Actually, the 'UDF' class does support returning an array.

Return ArrayList<Text> or even ArrayList<String> instead of Text[]. Hive exposes a returned list as an array column.

Your code should look like this:

    import java.util.ArrayList;
    import java.util.StringTokenizer;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class WordSplit extends UDF {

      // Returning a list (not Text[]) lets Hive expose the result as an array.
      public ArrayList<String> evaluate(final Text text) {
        // Hive passes null for SQL NULL; return null rather than throw.
        if (text == null) {
          return null;
        }
        ArrayList<String> splitList = new ArrayList<String>();
        // Default delimiters: space, tab, newline, carriage return, form feed.
        StringTokenizer tokenizer = new StringTokenizer(text.toString());
        while (tokenizer.hasMoreTokens()) {
          splitList.add(stemWord(tokenizer.nextToken()));
        }
        return splitList;
      }

      /**
       * Stems words to normal form. Currently this only lower-cases the word.
       *
       * @param word the token to normalize
       * @return Stemmed word.
       */
      private String stemWord(String word) {
        return word.toLowerCase();
      }
    }
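If you want to check the splitting logic before packaging the jar, the same tokenizing and lower-casing steps can be run in plain Java with no Hive classes on the classpath. This is only a minimal sketch: the class name WordSplitDemo and the method split are hypothetical names used for illustration, and the null guard is an assumption (NULL in, NULL out), not something the UDF above requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-alone harness; not part of Hive or of the UDF itself.
final class WordSplitDemo {

  // Mirrors the UDF's evaluate(): split on whitespace, lower-case each token.
  static List<String> split(String text) {
    // Assumption: treat a null input like a SQL NULL and pass it through.
    if (text == null) {
      return null;
    }
    List<String> words = new ArrayList<String>();
    StringTokenizer tokenizer = new StringTokenizer(text);
    while (tokenizer.hasMoreTokens()) {
      words.add(tokenizer.nextToken().toLowerCase());
    }
    return words;
  }

  public static void main(String[] args) {
    // Spaces and tabs are both default StringTokenizer delimiters.
    System.out.println(split("Hello World\tFOO")); // prints [hello, world, foo]
  }
}
```

Once the logic behaves as expected, the UDF jar can be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION, and then called like any built-in function; the function name you register it under is up to you.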