Hive UDF Text to array
Actually, the 'UDF' base class does support returning an array. Return ArrayList<Text>, or even ArrayList<String>, instead of Text[].
Your code should look like this:
    import java.util.ArrayList;
    import java.util.StringTokenizer;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class WordSplit extends UDF {

        public ArrayList<String> evaluate(final Text text) {
            // Hive passes null for NULL column values; return NULL in that case.
            if (text == null) {
                return null;
            }
            ArrayList<String> splitList = new ArrayList<String>();
            // Default delimiters: space, tab, newline, carriage return, form feed.
            StringTokenizer tokenizer = new StringTokenizer(text.toString());
            while (tokenizer.hasMoreTokens()) {
                String word = stemWord(tokenizer.nextToken());
                splitList.add(word);
            }
            return splitList;
        }

        /**
         * Stems words to normal form (currently this only lowercases them).
         *
         * @param word
         * @return Stemmed word.
         */
        private String stemWord(String word) {
            word = word.toLowerCase();
            return word;
        }
    }
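If you want to check the splitting logic without a Hive installation, here is a rough standalone sketch that uses only the JDK. The names WordSplitDemo and split are made up for this illustration, and it takes a plain String instead of Text, so treat it as an approximation of what evaluate() does rather than the UDF itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class WordSplitDemo {

    // Hypothetical helper: same tokenizing and lowercasing as evaluate(),
    // but with a plain String so no Hive or Hadoop classes are needed.
    static List<String> split(String text) {
        if (text == null) {
            return null; // mirrors returning NULL for a NULL input
        }
        List<String> words = new ArrayList<String>();
        // Default delimiters are whitespace; runs of delimiters are skipped.
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [hive, udf, text, to, array]
        System.out.println(split("Hive UDF\tText to  Array"));
    }
}
```

Note that repeated whitespace does not produce empty strings in the result, which differs from String.split(" ").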
I don't think the plain 'UDF' class will provide what you want. You want to use GenericUDF instead. I would use the source of the built-in split UDF (GenericUDFSplit) as a guide.