What is use of hcatalog in hadoop?

hadoop hive hbase hcatalog

In short, HCatalog opens up the hive metadata to other mapreduce tools. Every mapreduce tools has its own notion about HDFS data (example Pig sees the HDFS data as set of files, Hive sees it as tables). With having table based abstraction, HCatalog supported mapreduce tools do not need to care about where the data is stored, in which format and storage location (HBase or HDFS).

We do get the facility of WebHcat to submit jobs in an RESTful way if you configure webhcat along Hcatalog.

hadoop hive hbase hcatalog

Here is a very basic example of how ho use HCATALOG.

I have a table in hive ,TABLE NAME is STUDENT which is stored in one of the HDFS location:

neethu 90malini 90sunitha 98mrinal 56ravi 90joshua 8

Now suppose I want to load this table to pig for further transformation of data,In this scenario I can use HCATALOG:

When using table information from the Hive metastore with Pig, add the -useHCatalog option when invoking pig:

pig -useHCatalog

(you may want to export HCAT_HOME 'HCAT_HOME=/usr/lib/hive-hcatalog/')

Now loading this table to pig:A = LOAD 'student' USING org.apache.hcatalog.pig.HCatLoader();

Now you have loaded the table to pig.To check the schema , just do a DESCRIBE on the relation.

DESCRIBE A

Thanks

hadoop hive hbase hcatalog

Adding other great posts I would like add an image for clear understating of how HCatalog works and which layer it sits in cluster

how <code>HCatalog</code> works and where it sits in cluster

Q: how exactly it works?

As you mentioned "HCatalog is a table and storage management layer for Hadoop" Which gives high level abstraction to other frameworks like MR, Spark and Pig by performing I/O operations to Distributed storage layer for Hive tables.

HCatalog comprises 3 key elements

SerDe : Serialization and deserialization lib to process various data formats.
Meta store DB : Uses to stores the schema of Hive tables.
WebHCat/HCatalog REST : UI/REST layer on top of meta store DB for web clients.

Q: how to use it?

Once HCatalog installed and running successfully you do the following on CLI

usage: hcat { -e "<query>" | -f "<filepath>" }    [ -g "<group>" ] [ -p "<perms>" ]    [ -D"<name> = <value>" ]-D <property = value>    use hadoop value for given property-e <exec>                hcat command given from command line-f <file>                hcat commands in file-g <group>               group for the db/table specified in CREATE statement-h,--help                Print help information-p <perms>               permissions for the db/table specified in CREATE statement

Example:

./hcat –e "SELECT * FROM employee;"

CodeHunter

What is use of hcatalog in hadoop?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last