Are RDBMS that bad as described in Hadoop: The definitive guide?

postgresql nosql rdbms hbase

Their assertion that an RDMBS query planner takes that solution to the query is incorrect, at least for Postgresql 9.0, and I should imagine for other platforms too. I did a quick test with a similar query:

explain select * from client_attribute where client_attribute_type_code in ('UAG', 'RFR', 'IPA', 'FVD') order by client_attribute_id desc limit 10;                                                      QUERY PLAN----------------------------------------------------------------------------------------------------------------------- Limit  (cost=0.00..0.93 rows=10 width=85)   ->  Index Scan Backward using client_attribute_pkey on client_attribute  (cost=0.00..15516.47 rows=167234 width=85)         Filter: (client_attribute_type_code = ANY ('{UAG,RFR,IPA,FVD}'::bpchar[]))(3 rows)

Here client_attribute_id is indexed, so it does exactly as desired- walks back through the index, applies the filter and stops when the output hits the limit.

If the ordering column is not indexed, a table scan and sort is requierd, but only one table scan:

explain analyze select * from client_attribute where client_attribute_type_code in ('UAG', 'RFR', 'IPA', 'FVD') order by updated desc limit 10;                                                              QUERY PLAN--------------------------------------------------------------------------------------------------------------------------------------- Limit  (cost=13647.00..13647.03 rows=10 width=85) (actual time=180.961..180.964 rows=10 loops=1)   ->  Sort  (cost=13647.00..14065.09 rows=167234 width=85) (actual time=180.960..180.961 rows=10 loops=1)         Sort Key: updated         Sort Method:  top-N heapsort  Memory: 26kB         ->  Seq Scan on client_attribute  (cost=0.00..10033.14 rows=167234 width=85) (actual time=0.010..106.791 rows=208325 loops=1)               Filter: (client_attribute_type_code = ANY ('{UAG,RFR,IPA,FVD}'::bpchar[]))

This uses a heapsort to maintain the top 10 results through the course of the sequential scan, which sounds exactly like the solution they wrote themselves.

postgresql nosql rdbms hbase

I don't think that Tom White is saying that relational databases are "bad"; they aren't optimal for non-relational, non-set based data.

It's been well known for a long time that deep object graphs don't lend themselves well to relational databases. They're typically found in problems like CAD representations of geometric data, where assemblies are made up of assemblies of assemblies of parts. The reference chains are very long, indeed.

Object and graph databases have been the solutions to that kind of problems since I was aware of them in the early 90s.

Relational databases are terrific for relational, set-based data. But all data doesn't fall into that category. That's why NoSQL is gaining mind share.

I think that's what the example you cite is saying.

postgresql nosql rdbms hbase

RDBMS is for the queries you haven't thought of. Once you are certain exactly what you want, you can then apply the most optimum solution.

CodeHunter

Are RDBMS that bad as described in Hadoop: The definitive guide?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last