What are the implications of of SSD usage on fundamental database assumptions? What are the implications of of SSD usage on fundamental database assumptions? database database

What are the implications of of SSD usage on fundamental database assumptions?


First, SSDs don't make random access free. Just cheaper. In particular, random writes remain very expensive, though that's mitigated in small random writes by a durable write-back cache.

WAL would be very expensive on SSDs if the SSD truly flushed it to the underlying media - but it doesn't. It accumulates it in write-back cache and periodicaly flushes it in whole erase-block sized chunks. So WAL actually works really well on SDDs, as there's never any need for a read/modify/write cycle for a partial erase-block write.

I'm sure there are opportunities to be had in tree structure storage for indexes on SSDs. That's not something we've really explored in PostgreSQL yet.

Most of the SSD-based DB servers I work with remain thoroughly disk I/O bound for normal operation. SSDs are fast, but not magic. Even PCI-E integrated SSDs can't compete with RAM, and big workloads tend to quickly saturate the SSD's write-back cache and queues.

Similarly, walking an adjacency list in a RDBMS is still far from free in computational terms, the on disk representation is less compact than in a graph DB, etc. There's a lot to be gained from specialization where you need it.

To truly look at what ultra-fast storage does to DBs you need to go a step further and look at PCIe RAM-based storage devices that're insanely, ridiculously fast.

BTW, in a great many ways an SSD isn't that different to a SCSI HBA with a big battery-backed write cache. These have been around for a long time. An SSD will tend to have better random reads, but it's otherwise pretty similar.