Switching between multiple databases in Rails without breaking transactions Switching between multiple databases in Rails without breaking transactions ruby-on-rails ruby-on-rails

Switching between multiple databases in Rails without breaking transactions


This is a tricky problem, because of tight coupling inside ActiveRecord, but I've managed to create some proof of concept that works. Or at least it looks like it works.

Some background

ActiveRecord uses a ActiveRecord::ConnectionAdapters::ConnectionHandler class that is responsible for storing connection pools per model. By default there is only one connection pool for all models, because usual Rails app is connected to one database.

After executing establish_connection for different database in particular model, new connection pool is created for that model. And also for all models that may inherit from it.

Before executing any query, ActiveRecord first retrieves connection pool for relevant model and then retrieves the connection from the pool.

Note that above explanation may not be 100% accurate, but it should be close.

Solution

So the idea is to replace the default connection handler with custom one that will return connection pool based on provided shard description.

This can be implemented in many different ways. I did it by creating the proxy object that is passing shard names as disguised ActiveRecord classes. Connection handler is expecting to get AR model and looks at name property and also at superclass to walk the hierarchy chain of model. I've implemented DatabaseModel class that is basically shard name, but it is behaving like AR model.

Implementation

Here is example implementation. I've used sqlite database for simplicity, you can just run this file without any setup. You can also take a look at this gist

# Define some required dependenciesrequire "bundler/inline"gemfile(false) do  source "https://rubygems.org"  gem "activerecord", "~> 4.2.8"  gem "sqlite3"endrequire "active_record"class User < ActiveRecord::BaseendDatabaseModel = Struct.new(:name) do  def superclass    ActiveRecord::Base  endend# Setup database connections and create databases if not presentconnection_handler = ActiveRecord::ConnectionAdapters::ConnectionHandler.newresolver = ActiveRecord::ConnectionAdapters::ConnectionSpecification::Resolver.new({  "users_shard_1" => { adapter: "sqlite3", database: "users_shard_1.sqlite3" },  "users_shard_2" => { adapter: "sqlite3", database: "users_shard_2.sqlite3" }})databases = %w{users_shard_1 users_shard_2}databases.each do |database|  filename = "#{database}.sqlite3"  ActiveRecord::Base.establish_connection({    adapter: "sqlite3",    database: filename  })  spec = resolver.spec(database.to_sym)  connection_handler.establish_connection(DatabaseModel.new(database), spec)  next if File.exists?(filename)  ActiveRecord::Schema.define(version: 1) do    create_table :users do |t|      t.string :name      t.string :email    end  endend# Create custom connection handlerclass ShardHandler  def initialize(original_handler)    @original_handler = original_handler  end  def use_database(name)    @model= DatabaseModel.new(name)  end  def retrieve_connection_pool(klass)    @original_handler.retrieve_connection_pool(@model)  end  def retrieve_connection(klass)    pool = retrieve_connection_pool(klass)    raise ConnectionNotEstablished, "No connection pool for #{klass}" unless pool    conn = pool.connection    raise ConnectionNotEstablished, "No connection for #{klass} in connection pool" unless conn    puts "Using database \"#{conn.instance_variable_get("@config")[:database]}\" (##{conn.object_id})"    conn  endendUser.connection_handler = ShardHandler.new(connection_handler)User.connection_handler.use_database("users_shard_1")User.create(name: "John Doe", email: "john.doe@example.org")puts User.countUser.connection_handler.use_database("users_shard_2")User.create(name: "Jane Doe", email: "jane.doe@example.org")puts User.countUser.connection_handler.use_database("users_shard_1")puts User.count

I think this should give an idea how to implement production ready solution. I hope I didn't miss anything obvious here. I can suggest couple of different approaches:

  1. Subclass ActiveRecord::ConnectionAdapters::ConnectionHandler and overwrite those methods responsible for retrieving connection pools
  2. Create completely new class implementing the same api as ConnectionHandler
  3. I guess it is also possible to just overwrite retrieve_connection method. I don't remember where it is defined, but I think it is in ActiveRecord::Core.

I think approaches 1 and 2 are the way to go and should cover all cases when working with databases.