Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows?



A simple answer to your question: yes, InnoDB would be a perfectly good choice for a multi-billion row data set.

There is a whole host of optimizations possible.

The most obvious optimization is a large buffer pool. The buffer pool is the single most important setting for InnoDB, because InnoDB caches both data and indexes in it. If you have a dedicated MySQL server running only InnoDB tables, you can give InnoDB up to roughly 80% of the available RAM.
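As a minimal my.cnf sketch (the 26G figure assumes a hypothetical dedicated box with 32 GB of RAM; adjust to your hardware):

    [mysqld]
    # Roughly 80% of RAM on a dedicated 32 GB server (assumption).
    innodb_buffer_pool_size = 26G
    # Leave the remainder for the OS cache, connection buffers, etc.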

Another important optimization is having proper indexes on the table (keeping the data access/update pattern in mind), both primary and secondary. (Remember that InnoDB automatically appends the primary key columns to every secondary index.)
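To illustrate that last point (the table and column names here are made up): because the primary key rides along in every secondary index, an index on a single column can already cover queries that only ask for the primary key.

    -- Hypothetical table with BIGINT primary key `id`.
    CREATE INDEX idx_events_user ON events (user_id);

    -- Covered by the index alone: user_id is the key,
    -- and id is stored in each index entry as the row pointer.
    SELECT id FROM events WHERE user_id = 42;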

With InnoDB there are some extra goodies as well, such as page checksums that detect data corruption, automatic crash recovery, etc.

As for increasing write performance, you should set up your transaction (redo) log files to be as large as possible, up to the combined 4 GB limit that MySQL 5.5 imposes.
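A sketch of the corresponding 5.5 settings (two files just under 2 GB each stay below the 4 GB combined limit):

    [mysqld]
    innodb_log_file_size      = 2047M
    innodb_log_files_in_group = 2
    # On 5.5, resizing the redo log requires a clean shutdown and
    # removing the old ib_logfile* files before restarting.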

One other thing that you can do is partition the table.
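A range-partitioning sketch, with a hypothetical table and date column; keep in mind that in MySQL 5.5 the partitioning column must be part of every unique key, including the primary key:

    ALTER TABLE measurements
        PARTITION BY RANGE (YEAR(recorded_at)) (
            PARTITION p2010 VALUES LESS THAN (2011),
            PARTITION p2011 VALUES LESS THAN (2012),
            PARTITION pmax  VALUES LESS THAN MAXVALUE
        );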

You can eke out a bit more performance by setting binlog_format to ROW and innodb_autoinc_lock_mode to 2 (which ensures that InnoDB does not hold a table-level lock when inserting into auto-increment columns; it is only safe with row-based binary logging).
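Those two settings translate to this in my.cnf:

    [mysqld]
    binlog_format            = ROW
    innodb_autoinc_lock_mode = 2   # interleaved inserts, no table-level AUTO-INC lock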

If you need any specific advice, feel free to contact me; I would be more than willing to help.


Optimizations

  • Take care not to have too many indexes. They are expensive to maintain when inserting.
  • Make your datatypes fit your data, as tight a fit as you can (so don't go saving IP addresses in a TEXT or a BLOB, if you know what I mean; see the sketch after this list). Look into VARCHAR vs. CHAR. Don't forget that because VARCHAR is more flexible, you are trading some things away. If you know a lot about your data it might help to use CHAR, or it might be clearly better to use VARCHAR.
  • Do you read at all from this table? If so, you might want to do all the reading from a replicated slave, although your connection should be good enough for that amount of data.
  • If you have big inserts (aside from the number of inserts), make sure your IO is actually quick enough to handle the load.
  • I don't think there is any reason MySQL wouldn't support this. Things that can slow you down as you go from "thousands" to "millions" to "billions" are things like the aforementioned indexes. There is, as far as I know, no "MySQL is full" problem.
  • Look into partial indexes. From Wikipedia (the quickest source I could find; I didn't check the references, but I'm sure you can manage):

MySQL as of version 5.4 does not support partial indexes.[3] In MySQL, the term "partial index" is sometimes used to refer to prefix indexes, where only a truncated prefix of each value is stored in the index. This is another technique for reducing index size.[4]
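To make the datatype point above concrete (table and column names are invented), an IPv4 address fits in a 4-byte INT UNSIGNED rather than a VARCHAR(15):

    CREATE TABLE hits (
        id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        ip INT UNSIGNED    NOT NULL
    ) ENGINE=InnoDB;

    INSERT INTO hits (ip) VALUES (INET_ATON('192.168.1.10'));
    SELECT INET_NTOA(ip) FROM hits;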


No idea on the MySQL/InnoDB part (I'd assume it'll cope). But if you end up looking at alternatives, PostgreSQL can manage a DB of unlimited size on paper. (At least one 32TB database exists according to the FAQ.)

Can you tell me what sort of optimizations I can do to help speed things up?

Your mileage will vary depending on your application. But with billions of rows, you're at least looking into partitioning your data in order to work on smaller tables.

In the case of PostgreSQL, you'd also look into creating partial indexes where appropriate.
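A minimal PostgreSQL sketch (table and columns invented): a partial index covers only the rows a hot query actually touches, so it stays far smaller than a full index on a multi-billion row table.

    -- Only rows with status = 'open' are indexed.
    CREATE INDEX orders_open_idx
        ON orders (customer_id)
        WHERE status = 'open';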