COPY gzip-compressed HDFS data into Vertica
Yes, there is GZIP support; you just need to compile the GZip filter library from the SDK examples. (The Vertica guys finally helped me :))
Here are the steps:
- # cd /opt/vertica/sdk/examples/
- # make
- # vsql -f FilterFunctions.sql
- dbadmin=> CREATE LIBRARY GZipLib AS '/opt/vertica/sdk/examples/build/GZipLib.so';
- dbadmin=> CREATE FILTER GZip AS LANGUAGE 'C++' NAME 'GZipUnpackerFactory' LIBRARY GZipLib;
- dbadmin=> COPY abc002 SOURCE Hdfs(url='http://hadoop-namenode.com:50070/webhdfs/v1/03-01.txt.gz', username='xyz') FILTER GZip() DELIMITER E'\t';
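If the load fails, it helps to rule out the input file first. The GZip filter decompresses the stream before the parser applies DELIMITER E'\t', so the file has to be a valid gzip stream with tab-separated fields. Here is a minimal sketch, assuming a local POSIX shell with gzip installed. The file name sample.txt and its contents are hypothetical:

```shell
#!/bin/sh
set -e
# Hypothetical tab-separated sample data; substitute your own file
printf '1\talice\n2\tbob\n' > sample.txt
# -c writes the compressed stream to stdout and leaves sample.txt untouched
gzip -c sample.txt > sample.txt.gz
# -t checks integrity; with set -e, a corrupt archive aborts the script here
gzip -t sample.txt.gz
echo "sample.txt.gz is a valid gzip stream"
# Decompress to stdout and show the first line to confirm the tabs survived
gzip -dc sample.txt.gz | head -n 1
```

You can run the same check against the copy already on HDFS by piping the WebHDFS stream into gzip, e.g. curl -sL "http://hadoop-namenode.com:50070/webhdfs/v1/03-01.txt.gz?op=OPEN&user.name=xyz" | gzip -t. Here op=OPEN and user.name are standard WebHDFS query parameters, -L follows the namenode's redirect to a datanode, and the host, path and user are the ones from the COPY above.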
Adding to roy's answer:
Here are the steps to build the library by hand (this is step 2 of roy's answer, the make):
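- $ sudo apt-get install g++
- $ sudo apt-get install zlib1g-dev   # zlib headers and library, needed for gzip
- $ g++ -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value -fPIC -o /opt/vertica/sdk/examples/build/GZipLib.so /opt/vertica/sdk/examples/FilterFunctions/GZip.cpp /opt/vertica/sdk/include/Vertica.cpp -lz
Hint: the -lz flag links GZipLib.so against the zlib library (dynamically by default, not statically). Keep it after the source files. Linkers that run with --as-needed, as Ubuntu's g++ does by default, drop a -lz that appears before the code that uses it, and the library then fails to load with unresolved zlib symbols.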
It doesn't look like COPY from HDFS supports GZIP? I don't see it in that doc, in any case.