Resources for Database Sharding and Partitioning


I agree with the other answers that you should look at your schema and indexes before resorting to sharding. 10 million rows is well within the capabilities of any of the major database engines.

However, if you want some resources for learning about sharding, then try these:


I agree with Mike Woodhouse's observation that the current size should not be an issue - and the questioner agrees.

Most of the commercial DBMS provide support for fragmented tables in some form or another, under one name or another. One of the key questions is whether there is a sensible way of splitting the data into fragments. One common way is to do so based on a date, so all the values for, say, November 2008 go in one fragment, those for October 2008 into another, and so on. This has advantages when it comes time to remove old data. You can probably drop the fragment containing data from October 2001 (seven years' data retention) without affecting the other fragments. This sort of fragmentation can also help with 'fragment elimination'; if the query clearly cannot need to read the data from a given fragment, then that fragment will be left unread, which can give you a magnificent performance benefit. (For example, if the optimizer knows that the query is for a date in October 2008, it will ignore all fragments except the one that contains the data from October 2008.)
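To make the date-based scheme concrete, here is a minimal Python sketch of it. It is a toy in-memory model, not how any particular DBMS implements fragmentation; the class name MonthlyFragmentedTable and its methods are hypothetical names invented for this illustration.

```python
from collections import defaultdict
from datetime import date


class MonthlyFragmentedTable:
    """Toy table whose rows are split into one fragment per calendar month.

    Hypothetical illustration only; a real DBMS does this inside the engine.
    """

    def __init__(self):
        # (year, month) -> list of (row_date, payload) rows
        self.fragments = defaultdict(list)

    def insert(self, row_date, payload):
        # The fragment is chosen purely from the row's date.
        self.fragments[(row_date.year, row_date.month)].append((row_date, payload))

    def query(self, start, end):
        """Return (rows, fragments_read) for start <= row_date <= end."""
        lo, hi = (start.year, start.month), (end.year, end.month)
        rows, fragments_read = [], 0
        for key in sorted(self.fragments):
            if key < lo or key > hi:
                continue  # fragment elimination: this month cannot match
            fragments_read += 1
            rows.extend(r for r in self.fragments[key] if start <= r[0] <= end)
        return rows, fragments_read

    def drop_fragment(self, year, month):
        """Discard one month of data without touching any other fragment."""
        return len(self.fragments.pop((year, month), []))


table = MonthlyFragmentedTable()
table.insert(date(2001, 10, 5), "old order")
table.insert(date(2008, 10, 3), "order A")
table.insert(date(2008, 10, 20), "order B")
table.insert(date(2008, 11, 1), "order C")

# A query for October 2008 reads one fragment out of three.
rows, fragments_read = table.query(date(2008, 10, 1), date(2008, 10, 31))

# Retention: remove October 2001 in one step.
dropped = table.drop_fragment(2001, 10)
```

The point of the sketch is that both the pruning decision and the retention drop depend only on the fragment key, never on the rows inside the other fragments.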

There are other fragmentation techniques - round robin distributes the load across multiple disks, but because any row can land in any fragment, you cannot benefit from fragment elimination.
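For contrast, a rough sketch of round-robin placement, again as a toy Python model rather than any real engine's behaviour. The function name round_robin_fragments and the sample rows are made up for the example.

```python
from datetime import date
from itertools import cycle


def round_robin_fragments(rows, n_fragments):
    """Deal rows out to fragments in turn, ignoring their content."""
    fragments = [[] for _ in range(n_fragments)]
    for target, row in zip(cycle(range(n_fragments)), rows):
        fragments[target].append(row)
    return fragments


# Eight rows, all dated October 2008 (hypothetical sample data).
rows = [(date(2008, 10, day), f"order-{day}") for day in range(1, 9)]
fragments = round_robin_fragments(rows, 4)

# The load is even across fragments...
sizes = [len(f) for f in fragments]

# ...but every fragment holds October rows, so a query for October
# has to read all of them; none can be eliminated.
fragments_with_october = sum(
    any(d.month == 10 for d, _ in f) for f in fragments
)
```

Placement here depends on arrival order alone, which is why the optimizer has nothing to prune on.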


10 million rows is really not large in DBMS terms. I'd look at my indexing and query plans first, before starting to plan a physical distribution of data with shards or partitions; that shouldn't really be necessary until your table has grown by a couple of orders of magnitude.

All IMHO, of course.