What is meant by sparse data/ datastore/ database?

database hadoop database-schema hbase sparse-matrix

In a regular database, rows are sparse but columns are not. When a row is created, storage is allocated for every column, irrespective of whether a value exists for that field (a field being storage allocated for the intersection of a row and and a column).

This allows fixed length rows greatly improving read and write times. Variable length data types are handled with an analogue of pointers.

Sparse columns will incur a performance penalty and are unlikely to save you much disk space because the space required to indicate NULL is smaller than the 64-bit pointer required for the linked-list style of chained pointer architecture typically used to implement very large non-contiguous storage.

Storage is cheap. Performance isn't.

database hadoop database-schema hbase sparse-matrix

At the storage level, all data is stored as a key-value pair. Each storage file contains an index so that it knows where each key-value starts and how long it is.

As a consequence of this, if you have very long keys (e.g. a full URL), and a lot of columns associated with that key, you could be wasting some space. This is ameliorated somewhat by turning compression on.

See:http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

for more information on HBase storage

database hadoop database-schema hbase sparse-matrix

Sparse in respect to HBase is indeed used in the same context as a sparse matrix. It basically means that fields that are null are free to store (in terms of space).

I found a couple of blog posts that touch on this subject in a bit more detail:

http://blog.rapleaf.com/dev/2008/03/11/matching-impedance-when-to-use-hbase/

http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable

CodeHunter

What is meant by sparse data/ datastore/ database?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last