Hadoop sequential data access


This is not really specific to Hadoop.

A sequential access pattern is one where you read your data in order, often from start to finish. Think of reading a novel: you start with page 1, then move on to page 2, and so on.

The other common pattern is random access, where you jump from one place in the data to another, possibly even backwards. For a book example, consider a dictionary. You don't read it the way you read a novel. Instead, you open it somewhere in the middle to find your word. When you're done with that word, you may look up another one that is hundreds of pages away from the page you currently have open. Finding the place where you should start reading is called a "seek".
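To make the difference concrete, here is a minimal Python sketch that writes a file of fixed-size records and reads it in both ways, counting how often each pattern seeks. The 16-byte record layout and the `CountingReader` wrapper are hypothetical, made up for illustration. The counter tracks logical `seek()` calls only; what the physical drive actually does also depends on OS caching and buffering.

```python
import os
import random
import tempfile

# Hypothetical fixed-size record layout: "record" + 9 digits + newline = 16 bytes.
RECORD_SIZE = 16


class CountingReader:
    """Wraps a binary file object and counts calls to seek()."""

    def __init__(self, f):
        self.f = f
        self.seeks = 0

    def seek(self, offset):
        self.seeks += 1
        self.f.seek(offset)

    def read(self, n):
        return self.f.read(n)


def read_sequential(reader, num_records):
    # Novel-style: seek once to the beginning, then read every record in order.
    reader.seek(0)
    return [reader.read(RECORD_SIZE) for _ in range(num_records)]


def read_random(reader, indices):
    # Dictionary-style: seek to each requested record before reading it.
    records = []
    for i in indices:
        reader.seek(i * RECORD_SIZE)
        records.append(reader.read(RECORD_SIZE))
    return records


def demo(num_records=1000, lookups=100, seed=42):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            for i in range(num_records):
                f.write(f"record{i:09d}\n".encode("ascii"))

        with open(path, "rb") as f:
            seq_reader = CountingReader(f)
            read_sequential(seq_reader, num_records)

        rng = random.Random(seed)
        indices = [rng.randrange(num_records) for _ in range(lookups)]
        with open(path, "rb") as f:
            rnd_reader = CountingReader(f)
            records = read_random(rnd_reader, indices)

    return seq_reader.seeks, rnd_reader.seeks, indices, records


if __name__ == "__main__":
    seq_seeks, rnd_seeks, _, _ = demo()
    print(f"sequential: {seq_seeks} seek(s), random: {rnd_seeks} seeks")
```

Reading all 1000 records sequentially costs a single seek, while 100 random lookups cost 100 seeks, one per lookup.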

When you access data sequentially, you only need to seek once and can then read until you're done with that data. With random access, you need to seek every time you move to a different place in the file. On magnetic hard drives this can be a significant performance hit, because every seek means physically moving the read/write head to the right track and waiting for the platter to rotate into position.
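A rough back-of-the-envelope model shows why seeks dominate on spinning disks: total time is approximately the number of seeks times the seek time, plus the bytes read divided by the transfer rate. The 10 ms seek time, 100 MB/s throughput and 4,000-byte record size below are illustrative round numbers chosen as assumptions, not measurements of any particular drive.

```python
# Illustrative round numbers (assumptions, not measurements of any real drive):
SEEK_TIME_S = 0.010        # average seek + rotational delay on a magnetic disk
THROUGHPUT_BPS = 100e6     # sustained transfer rate once the head is in place


def estimated_read_time(total_bytes, num_seeks,
                        seek_time=SEEK_TIME_S, throughput=THROUGHPUT_BPS):
    """Simple model: time spent seeking plus time spent transferring bytes."""
    return num_seeks * seek_time + total_bytes / throughput


TOTAL_BYTES = 100_000_000   # 100 MB of data to read
RECORD_BYTES = 4_000        # hypothetical record size for the random-access case

sequential = estimated_read_time(TOTAL_BYTES, num_seeks=1)
random_access = estimated_read_time(TOTAL_BYTES, num_seeks=TOTAL_BYTES // RECORD_BYTES)

print(f"sequential:    {sequential:.2f} s")     # 1.01 s
print(f"random access: {random_access:.2f} s")  # 251.00 s
```

Under these assumptions, reading the same 100 MB takes roughly 250 times longer when each 4,000-byte record needs its own seek, and almost all of that extra time is spent seeking rather than transferring data.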