Database sharding on Heroku Database sharding on Heroku database database

Database sharding on Heroku


As the author of the first article happy to chime in further. When it comes to sharding one of the very key components is what key are you sharding on. The complexity of sharding really comes into play when you have data that is intermingled across different physical nodes. If you're something like a multi-tenant app then modeling all your data around this idea of a tenant or customer can fit very cleanly in this setup. In that case you're going to want to break up all tables that are related to customer and shard them the same way as other tenant related tables.

As for doing this on Heroku, there are two options. You can roll your own with Heroku Postgres and application logic, or using something like Citus (which is an add-on that helps manage more of this for you.

For rolling your own, you'll first create the various application logic to handle creating all your shards and knowing where to route the appropriate queries to. For Rails there are some gems to help wtih this like activerecord-multi-tenant or apartment. When it comes to actually moving to sharding and that migration, what you'll want to do is create a Heroku follower to start. During the migration you'll have it start un-following. Then you'll remove half of the data from the original primary and the other half from the follower you separated accordingly.


I am not sure I would call this "sharding."

In LedgerSMB here is how we do things. Each company (business entity) is a separate database with fully separate data. Data cannot be shared between companies. One postgreSQL cluster can run any number of company databases. We have an administrative interface that creates the database and loads the schema. The administrative interface can also create new users, which can be shared between companies (optionally). I don't know quite how well it would work to share users between dbs on Heroku but I am including that detail in terms of how we work with PostgreSQL.

So this is a viable approach.

What you really need is something to spin up databases and manage users in an automated way. From there you can require that the user specifies a company name that you can map to a database however you'd like (this mapping could be stored in another database for example).

I know this is fairly high level. It should get you started however.