
Can the raw data layer of a Data Lake contain a Table?


I totally agree with the answer above, but a NoSQL store (HBase or Cassandra) can be a better option for accessing IoT (streaming) data, since IoT devices push an enormous volume of data every second or at regular intervals.

At that volume, accessing the information can become tedious. If you need reporting, you can push the data into Hive on an hourly basis for reporting and analytics.

Even if the Hive metastore is down or corrupted, you can still fetch the data back from Cassandra at that point.

IoT with Spark Streaming (or something else) -> Cassandra/HBase -> Hive/Impala -> Looker/Presto would be a good option. One disadvantage is that you need to push the data into two stores. On the positive side, you can recover the data from a NoSQL store like Cassandra at any point in time.
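The two-store flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries are in-memory stand-ins for Cassandra/HBase (`raw_store`) and Hive (`reporting_store`), and the function names `ingest`, `load_hour`, and `rebuild_reporting` are hypothetical, not part of any library.

```python
from collections import defaultdict
from datetime import datetime

# In-memory stand-ins (assumption): a real pipeline would use
# Cassandra/HBase for raw_store and Hive for reporting_store.
raw_store = {}                        # (device_id, timestamp) -> reading
reporting_store = defaultdict(list)   # hour bucket -> list of rows


def hour_bucket(ts: datetime) -> str:
    """Truncate a timestamp to its hour, e.g. '2024-01-01T10'."""
    return ts.strftime("%Y-%m-%dT%H")


def ingest(device_id: str, ts: datetime, value: float) -> None:
    """Write every incoming event to the raw store first."""
    raw_store[(device_id, ts)] = value


def load_hour(bucket: str) -> None:
    """Hourly batch job: copy one hour of raw events into the reporting store."""
    reporting_store[bucket] = [
        (device_id, ts, value)
        for (device_id, ts), value in sorted(raw_store.items())
        if hour_bucket(ts) == bucket
    ]


def rebuild_reporting() -> None:
    """Recovery path: recreate the reporting store from the raw store alone."""
    reporting_store.clear()
    for bucket in sorted({hour_bucket(ts) for (_, ts) in raw_store}):
        load_hour(bucket)
```

Because `load_hour` overwrites its bucket instead of appending, re-running a batch is harmless, and `rebuild_reporting` is just the same job replayed over every hour present in the raw store.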

Further reading: which database is good for IoT


I am ingesting streaming data from some IoT devices. Can I then put this data directly into a Table?

IMHO this is one way to do it. Some projects also put the raw data in Cassandra/HBase, when the access pattern suits a NoSQL store.

If you need to access the raw data (to see what data arrived) using a BI or query tool such as Looker or Presto, then it's ideal to put the data into Hive.

Another idea is to store the data in S3 as Parquet files partitioned by date (not timestamp) and then define a Hive external table over those Parquet files. This pattern ensures that even if the Hive metastore is corrupted or something goes wrong with your Hadoop cluster, S3 still has the data and you can re-run the script to recreate the tables.
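As a rough sketch of that layout, the Python below derives a date-level partition directory from an event timestamp and generates the Hive DDL you would re-run to recreate the table. The bucket path, table name, column list, the `dt` partition column, and the `s3a://` scheme are all assumptions for illustration; adjust them to your own schema and cluster setup.

```python
from datetime import datetime, timezone


def partition_prefix(base: str, event_time: datetime) -> str:
    """Return the date-level partition directory for an event (UTC date only)."""
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{base.rstrip('/')}/dt={day}/"


def external_table_ddl(table: str, location: str) -> str:
    """Build Hive DDL for an external Parquet table partitioned by a dt column."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  device_id STRING,\n"      # hypothetical columns
        "  event_time TIMESTAMP,\n"
        "  reading DOUBLE\n"
        ")\n"
        "PARTITIONED BY (dt STRING)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}';\n"
        # Registers the dt=... directories already present under LOCATION.
        f"MSCK REPAIR TABLE {table};"
    )
```

Since the table is external, dropping it or losing the metastore leaves the Parquet files in S3 untouched; running the generated statements again should bring the table and its partitions back.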

It all depends on the use case and on what you need in terms of data security, reliability, and reporting.