COPY GZIP HDFS data into vertica


Yes, there is GZIP support; you just need to compile the GZIP library that ships with the SDK examples. (The Vertica guys finally helped me with this :)

Here are the steps:

  1. # cd /opt/vertica/sdk/examples/
  2. # make
  3. # vsql -f FilterFunctions.sql
  4. dbadmin=> CREATE LIBRARY GZipLib AS '/opt/vertica/sdk/examples/build/GZipLib.so';
  5. dbadmin=> CREATE FILTER GZip AS LANGUAGE 'C++' NAME 'GZipUnpackerFactory' LIBRARY GZipLib;
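For repeated setups, the steps above could be collected into one script. This is only a sketch: the `run` and `register_gzip_filter` helpers and the `SDK_EXAMPLES` and `DRY_RUN` variables are names made up for illustration, and it assumes `vsql` can connect as dbadmin without extra options. By default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: prints the build-and-register commands by default.
# Set DRY_RUN=0 to actually run them (needs a Vertica node and dbadmin access).
SDK_EXAMPLES="${SDK_EXAMPLES:-/opt/vertica/sdk/examples}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Mirrors steps 1-5 from the list above.
register_gzip_filter() {
    run make -C "$SDK_EXAMPLES"
    run vsql -f "$SDK_EXAMPLES/FilterFunctions.sql"
    run vsql -c "CREATE LIBRARY GZipLib AS '$SDK_EXAMPLES/build/GZipLib.so';"
    run vsql -c "CREATE FILTER GZip AS LANGUAGE 'C++' NAME 'GZipUnpackerFactory' LIBRARY GZipLib;"
}

register_gzip_filter
```

Reviewing the printed commands first and then re-running with `DRY_RUN=0` keeps the script from touching the database by accident.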

COPY abc002 SOURCE Hdfs(url='http://hadoop-namenode.com:50070/webhdfs/v1/03-01.txt.gz', username='xyz') filter GZip() DELIMITER E'\t';
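Since the COPY above expects a gzip stream containing tab-delimited rows, it may be worth checking a local copy of the file before uploading it to HDFS. The helper below is a rough sketch, not part of Vertica: `check_gzip_tsv` is a hypothetical name, and the expected column count is something you would supply for your own table.

```shell
#!/bin/sh
# Hypothetical helper: verify that a file is a valid gzip stream and that
# its first line has the expected number of tab-separated fields.
check_gzip_tsv() {
    file=$1
    expected_fields=$2

    # gzip -t tests the integrity of the compressed file without unpacking it to disk.
    if ! gzip -t "$file" 2>/dev/null; then
        echo "not a valid gzip file: $file"
        return 1
    fi

    # Count the tab-separated fields on the first decompressed line.
    fields=$(gzip -dc "$file" | awk -F '\t' 'NR == 1 { print NF; exit }')
    if [ "$fields" != "$expected_fields" ]; then
        echo "expected $expected_fields fields, found $fields"
        return 1
    fi

    echo "ok"
}
```

For example, `check_gzip_tsv 03-01.txt.gz 5` would report whether the file decompresses cleanly and whether its first row has five columns (the five here is just an assumed column count for abc002).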


Adding to roy's answer:

The commands for the build (step 2 in roy's answer) are given below:

sudo apt-get install g++
sudo apt-get install zlib1g-dev  # for gzip
g++ -lz -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value -fPIC -o /opt/vertica/sdk/examples/build/GZipLib.so /opt/vertica/sdk/examples/FilterFunctions/GZip.cpp /opt/vertica/sdk/include/Vertica.cpp

Hint: the -lz flag links GZipLib.so against the zlib library.

See the Vertica documentation on compiling UDFs.