Introduction
By default activerecord works well with single db, it's applicable for most of websites with small/medium traffic, but if you website grows fast and gets much more reads than writes, you should definitly set up master slave replication for your databse. All inserts/updates are sent to master db, and reads are sent to slave db, it will reduce read load on your master db.
Master slave replication allows to set up as many slave dbs as you need, it's scalable, that means you can easily increase you db read throughput by adding more slave dbs. It also allows you to move some tasks like analytics on slave db without affecting your master db.
Replication in rails
How do we config master slave replication in rails app? There are a lot of choices, pick up one and setup according to its document. I don't want to discuss about these tools here, I will tell you how to use master slave replication in rails above these tools.
Problems
Master slave replication looks well, but it has a big problem in practice - replication lag. There is a lag between data inserted in master db and sync to slave db, let's see a case.
a user create a post on your application.
the post is inserted to master db.
your application redirects user to post show page.
your application read from slave db, but the post is not sync yet.
a 404 page is shown. :-(
the post is sync to slave db. (too late)
Lots of similar issues will raise after you applying master slave replication, how to solve them?
Solution
The solution is send some reads to master db to promise get fresh data.
By default all reads will be sent to master db in one db transaction, like
BEGIN
SELECT * from users where id = 1;
INSERT INTO posts(title, user_id) VALUES('test', 1);
COMMIT
In the following cases I will send reads to master db as well
- queries in background job, like delayed_job, resque, workling, etc.
clas Post < ActiveRecord::Base
after_create :notify
protected
def notify
Delayed::Job.enqueue(DelayedJob::NotifyAdmin.new(self.id))
end
end
class DelayedJob::NotifyAdmin < Struct.new(:post_id)
def perform
post = Post.find(post_id)
......
end
end
It's probably the post does not exist when reading it from slave db in background job.
- queries in the request which follows a redirect reponse
class PostsController < ApplicationController
def show
@post = Post.find(params[:id])
end
def create
@post = Post.new(params[:post])
if @post.save
redirect_to post_path(@post)
else
render :new
end
end
end
This case is too common, creating/updating then redirecting, if the resource is not sync to slave db before next request, user will get a 404 page or get some fake data.
We know when we should explictly send reads to master db, but how can we do that. It's
ActiveRecord::Base.with_master {
User.find(post.user_id)
}
Almost all of replication gem provide with_master method, any queries in the block will be sent to master db. I added a monkey patch to background job, wrapping it with with_master.
I added add a monkey patch to action controller as well, adding a parameter if the response is a redirect, then add a around_filter to controller to check if the reads in such request should be sent to master or slave db.
class ApplicationController < ActionController::Base
around_filter :manage_slaving
def manage_slaving
if force_master?
ActiveRecord::Base.with_master { yield }
else
yield
end
end
end
force_master? is a convenient way to manage your master/slave db on controller levels, you can also enable/disalbe master/slave for some specfied requests.
Finally test your application and add ActiveRecord::Base.with_mater {} if necessary.