master slave replication in rails

Introduction

By default ActiveRecord works with a single database, which is fine for most websites with small or medium traffic. But if your website grows fast and gets many more reads than writes, you should definitely set up master slave replication for your database. All inserts and updates are sent to the master db and reads are sent to a slave db, which reduces the read load on your master db.

Master slave replication lets you set up as many slave dbs as you need, so it scales: you can increase your db read throughput by adding more slave dbs. It also lets you move tasks like analytics to a slave db without affecting your master db.

Replication in rails

How do we configure master slave replication in a rails app? There are a lot of gems to choose from; pick one and set it up according to its documentation. I don't want to discuss these tools here. Instead I will tell you how to use master slave replication in rails on top of them.

Problems

Master slave replication looks good, but it has a big problem in practice: replication lag. There is a delay between data being inserted into the master db and being synced to the slave db. Let's look at a case.

  1. A user creates a post on your application.

  2. The post is inserted into the master db.

  3. Your application redirects the user to the post show page.

  4. Your application reads from the slave db, but the post is not synced yet.

  5. A 404 page is shown. :-(

  6. The post is synced to the slave db. (too late)
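
The steps above can be reproduced with a toy model in plain ruby. This is only a sketch: ToyMaster and ToySlave are made-up classes that stand in for the two databases, and sync is called by hand where a real slave would catch up on its own some time later.

```ruby
# Toy model of asynchronous replication. ToyMaster and ToySlave are
# hypothetical classes, not part of any replication gem.
class ToyMaster
  attr_reader :rows, :log

  def initialize
    @rows = {}
    @log = [] # changes that the slave has not applied yet
  end

  def insert(id, attrs)
    @rows[id] = attrs
    @log << [id, attrs]
  end
end

class ToySlave
  def initialize(master)
    @master = master
    @rows = {}
  end

  # Apply every pending change from the master's log.
  def sync
    @master.log.shift(@master.log.size).each { |id, attrs| @rows[id] = attrs }
  end

  def find(id)
    @rows[id]
  end
end

master = ToyMaster.new
slave = ToySlave.new(master)

master.insert(1, title: "test") # steps 1 and 2: the write goes to the master
before_sync = slave.find(1)     # step 4: the slave does not have the post yet
slave.sync                      # step 6: replication catches up
after_sync = slave.find(1)

puts "before sync: #{before_sync.inspect}"
puts "after sync:  #{after_sync.inspect}"
```

The read between the insert and the sync is the one that turns into a 404 page.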

Lots of similar issues will arise after you apply master slave replication. How do we solve them?

Solution

The solution is to send some reads to the master db to guarantee fresh data.

By default, all reads inside a db transaction are sent to the master db, like

BEGIN
SELECT * from users where id = 1;
INSERT INTO posts(title, user_id) VALUES('test', 1);
COMMIT

In the following cases I send reads to the master db as well:

  • queries in background job, like delayed_job, resque, workling, etc.
class Post < ActiveRecord::Base
  after_create :notify

  protected
  def notify
    Delayed::Job.enqueue(DelayedJob::NotifyAdmin.new(self.id))
  end
end

class DelayedJob::NotifyAdmin < Struct.new(:post_id)
  def perform
    post = Post.find(post_id)
    ......
  end
end

The post probably does not exist yet when the background job reads it from the slave db.

  • queries in the request that follows a redirect response
class PostsController < ApplicationController
  def show
    @post = Post.find(params[:id])
  end

  def create
    @post = Post.new(params[:post])
    if @post.save
      redirect_to post_path(@post)
    else
      render :new
    end
  end
end

This case is very common: create or update, then redirect. If the resource is not synced to the slave db before the next request, the user will get a 404 page or stale data.

We know when we should explicitly send reads to the master db, but how can we do that? Like this:

ActiveRecord::Base.with_master {
  User.find(post.user_id)
}

Almost all replication gems provide a with_master method; any queries in the block are sent to the master db. I added a monkey patch to the background job, wrapping it with with_master.
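
To make the idea concrete, here is a minimal sketch of how a with_master block and a job wrapper could work. It is not the implementation of any particular gem: ConnectionRouter, MasterJob and the thread-local key are hypothetical names, and the router only reports which db a read would go to instead of opening connections.

```ruby
# Hypothetical router; real replication gems have their own internals.
class ConnectionRouter
  KEY = :connection_router_force_master

  # Force every read inside the block to the master, then restore the
  # previous state, even if the block raises.
  def self.with_master
    previous = Thread.current[KEY]
    Thread.current[KEY] = true
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # Which db a read issued right now would be sent to.
  def self.target_for_read
    Thread.current[KEY] ? :master : :slave
  end
end

# Hypothetical wrapper that runs any job object with reads forced to master.
class MasterJob
  def initialize(job)
    @job = job
  end

  def perform
    ConnectionRouter.with_master { @job.perform }
  end
end

# Stand-in job that just reports where its reads would go.
class ReportingJob
  def perform
    ConnectionRouter.target_for_read
  end
end

puts ConnectionRouter.target_for_read          # read outside the block
puts MasterJob.new(ReportingJob.new).perform   # read inside the wrapped job
```

Keeping the flag in a thread-local means the block only affects the request or job running on the current thread.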

I added a monkey patch to action controller as well, which adds a parameter if the response is a redirect. Then I added an around_filter to the controller to check whether the reads in such a request should be sent to the master or the slave db.

class ApplicationController < ActionController::Base
  around_filter :manage_slaving

  def manage_slaving
    if force_master?
      ActiveRecord::Base.with_master { yield }
    else
      yield
    end
  end
end

force_master? is a convenient way to manage your master/slave db at the controller level. You can also enable or disable master/slave for specific requests.
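
One possible shape for the redirect check is sketched below in plain ruby. It assumes the flag is kept in a session-like hash, which is only one of several options (a query parameter or a cookie would also work); MasterStickiness and its method names are hypothetical and are not taken from my monkey patch.

```ruby
# Hypothetical helper: remembers that the previous response was a redirect
# so the next request can be forced to read from the master.
class MasterStickiness
  FLAG = :force_master
  REDIRECT_STATUSES = [301, 302, 303, 307, 308].freeze

  # Call after the action has run, with the response status code.
  def self.remember_redirect(session, status)
    session[FLAG] = true if REDIRECT_STATUSES.include?(status)
  end

  # Call at the start of the next request. The flag is consumed, so only
  # the request right after the redirect is forced to the master.
  def self.force_master?(session)
    session.delete(FLAG) == true
  end
end

session = {}
MasterStickiness.remember_redirect(session, 302)  # create, then redirect
first = MasterStickiness.force_master?(session)   # the show request
second = MasterStickiness.force_master?(session)  # any later request
puts "first request after redirect forced to master: #{first}"
puts "following request forced to master: #{second}"
```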

Finally, test your application and add ActiveRecord::Base.with_master {} where necessary.

Posted in  rails mysql


bullet 2.3.0 released

bullet is a gem that helps you increase your application's performance by reducing the number of sql requests it makes. Today I released bullet 2.3.0, which better supports rails 3.1 and 3.2 and improves performance. It has been a long time since I made any changes to bullet, so let me tell you the story of my work on bullet 2.3.0.

At the beginning of this month, bullet got its 1000th watcher on github, and I realized it was time to improve it, e.g. speed it up and make it compatible with edge rails.

The first thing I did was refactor the tests. I had created several rspec tests before, but they were more like integration tests than unit tests, so I moved them to the spec/integration/ directory. Then I added a bunch of unit tests to cover all the code, which helps guarantee the correctness of further refactoring. I also use guard instead of watchr to run tests automatically. Why do I prefer guard? It's much easier to use and has more extensions, like guard-rspec.

Then I moved the AR models used by the integration tests to spec/models, and moved the db connection, db schema, db seed and test helpers to spec/support/. Now my tests look much cleaner and run much faster (they connect to the db only once).

After refactoring the tests, I tried to improve bullet's performance. I had already created a benchmark script; bullet 2.2.1 with rails 3.0.12 took about 30s to complete it.

bullet 2.2.1 with rails 3.0.12
                                                                             user     system      total        real
Querying & Iterating 1000 Posts with 10000 Comments and 100 Users       29.970000   0.270000  30.240000 ( 30.452083)

Then I used perftools.rb to measure cpu time per method. The top results were garbage_collector, String#=~ and Kernel#caller:

  1. garbage_collector: it depends on how many objects are allocated
  2. String#=~: bullet uses a regexp to check if caller contains load_target
  3. Kernel#caller: bullet uses caller to tell which code caused an n+1 query

I found the easiest one to mitigate was String#=~. Since bullet only matches against the constant string load_target, I simply used .include?("load_target") instead.
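
If you want to compare the two checks yourself, a small script like the one below is enough. It is a sketch, not bullet's benchmark script: the backtrace lines are made up, and the timings depend on your ruby version and machine, so treat the numbers as a rough guide only.

```ruby
require "benchmark"

# Made-up backtrace lines, shaped like what Kernel#caller returns.
# The matching line is last so both checks scan the whole array.
BACKTRACE = [
  "/app/controllers/posts_controller.rb:8:in `index'",
  "/app/models/post.rb:12:in `comments_count'",
  "/gems/activerecord/lib/active_record/associations/association.rb:145:in `load_target'"
].freeze

# The old check: a regexp match against every line.
def regexp_match?(lines)
  lines.any? { |line| line =~ /load_target/ }
end

# The new check: a plain substring search.
def include_match?(lines)
  lines.any? { |line| line.include?("load_target") }
end

ITERATIONS = 100_000

Benchmark.bm(10) do |x|
  x.report("=~")       { ITERATIONS.times { regexp_match?(BACKTRACE) } }
  x.report("include?") { ITERATIONS.times { include_match?(BACKTRACE) } }
end
```

Both methods give the same answer for a constant string, which is why the swap is safe.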

bullet 2.3.0 with rails 3.0.12
                                                                             user     system      total        real
Querying & Iterating 1000 Posts with 10000 Comments and 100 Users       26.120000   0.430000  26.550000 ( 27.179304)

Another change is to store the object's ar_key instead of the object itself, i.e. to go from

{#<Post id: 1, title: "post1", body: "post body", created_at: ..., updated_at: ...> => [:comments]}

to

{"Post:1" => [:comments]}

It speeds up hash comparison and reduces the hash size.
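
Here is a small sketch of the idea in plain ruby. The Post struct stands in for an ActiveRecord model and the ar_key helper below is written only for this example, so the real code in bullet may differ.

```ruby
# Stand-in for an ActiveRecord model, just for this example.
Post = Struct.new(:id, :title, :body)

# Hypothetical helper that builds a "ClassName:id" string for a record.
def ar_key(record)
  "#{record.class}:#{record.id}"
end

post = Post.new(1, "post1", "post body")

# Before: the whole object is the hash key.
by_object = { post => [:comments] }

# After: only a short string is the hash key.
by_key = { ar_key(post) => [:comments] }

puts by_object.keys.first.inspect
puts by_key.keys.first.inspect
```

A lookup then only needs the class name and the id, not the loaded record.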

For rails 3.1 and 3.2 I also hacked ActiveRecord::Associations::SingularAssociation#reader instead of ActiveRecord::Associations::Association#load_target. It fixes activerecord 3.1 and 3.2 compatibility, and since there is no need to call caller in Association#load_target, it runs much faster in rails 3.1 and 3.2. The following are the benchmark results.

bullet 2.3.0 with rails 3.2.2
                                                                             user     system      total        real
Querying & Iterating 1000 Posts with 10000 Comments and 100 Users       16.460000   0.190000  16.650000 ( 16.968246)

bullet 2.3.0 with rails 3.1.4
                                                                             user     system      total        real
Querying & Iterating 1000 Posts with 10000 Comments and 100 Users       14.600000   0.130000  14.730000 ( 14.937590)

Enjoy the new bullet gem!

Posted in  rails activerecord bullet


multiple_mailers - send emails by different smtp accounts

I use gmail to send email notifications on my website. It's really easy to set up with actionmailer:

ActionMailer::Base.smtp_settings = {
  :address => 'smtp.gmail.com',
  :port => 587,
  :domain => 'railsbp.com',
  :authentication => :plain,
  :user_name => 'notification@railsbp.com',
  :password => 'password'
}

But I found that it does not allow setting up 2 different smtp accounts, e.g. I want to send notification emails from notification@railsbp.com and exception notifier emails from exception.notifier@railsbp.com. After googling, I hacked my mailer classes with

class NotificationMailer < ActionMailer::Base
  if Rails.env.production?
    class <<self
      def smtp_settings
        options = YAML.load_file("#{Rails.root}/config/mailers.yml")[Rails.env]['notification']
        @@smtp_settings = {
          :address              => options["address"],
          :port                 => options["port"],
          :domain               => options["domain"],
          :authentication       => options["authentication"],
          :user_name            => options["user_name"],
          :password             => options["password"]
        }
      end
    end
  end
end

then add a new config file config/mailers.yml

production:
  common: &common
    address: 'smtp.gmail.com'
    port: 587
    domain: 'rails-bestpractices.com'
    authentication: 'plain'

  notification:
    <<: *common
    user_name: 'notification@rails-bestpractices.com'
    password: 'password'

  exception.notifier:
    <<: *common
    user_name: 'exception.notifier@rails-bestpractices.com'
    password: 'password'

That allows me to set up one smtp account per actionmailer class. Keep in mind that you should only override smtp_settings in the environments where you really want to send emails (here, production). If you don't check Rails.env, it will send emails even in the development and test environments.

Now it works fine and I can send emails through as many smtp accounts as I like, but it looks ugly; I don't like hacking code all over my mailer classes. So I extracted it into a new gem, multiple_mailers. As with the hack above, you define the config file config/mailers.yml, and for each mailer class all you need to do is declare its mailer account name

class NotificationMailer < ActionMailer::Base
  mailer_account "notification"
end

class ExceptionNotifier
  class Notifier < ActionMailer::Base
    mailer_account "exception.notifier"
  end
end
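
To show roughly how such a class-level declaration can pick settings out of the yaml file, here is a sketch in plain ruby. It is not the source of the multiple_mailers gem: the MailerAccount module, the account_settings reader and FakeNotificationMailer are hypothetical, the yaml is inlined as a string instead of being read from config/mailers.yml, and passwords are left out.

```ruby
require "yaml"

# Inlined, shortened copy of config/mailers.yml for the example.
MAILERS_YML = <<~YML
  production:
    common: &common
      address: 'smtp.gmail.com'
      port: 587
    notification:
      <<: *common
      user_name: 'notification@rails-bestpractices.com'
    exception.notifier:
      <<: *common
      user_name: 'exception.notifier@rails-bestpractices.com'
YML

# Hypothetical mixin that adds a mailer_account class macro.
module MailerAccount
  def mailer_account(name, env: "production", source: MAILERS_YML)
    # aliases: true is needed because the file uses &common / *common.
    options = YAML.safe_load(source, aliases: true).fetch(env).fetch(name)
    # Symbolize keys, like the smtp_settings hash in the hack above.
    @account_settings = options.each_with_object({}) { |(k, v), h| h[k.to_sym] = v }
  end

  def account_settings
    @account_settings
  end
end

# Stand-in for a mailer class; a real one would inherit ActionMailer::Base.
class FakeNotificationMailer
  extend MailerAccount
  mailer_account "notification"
end

puts FakeNotificationMailer.account_settings.inspect
```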

Posted in  rails actionmailer


passenger with http_gzip_static_module

Rails 3.1 was released a long time ago and the asset pipeline is becoming more and more popular, so I upgraded my rails website as well.

I use nginx + passenger for my rails projects. By default nginx only does dynamic gzip (compressing at runtime), but there is an http_gzip_static_module for nginx that serves precompressed .gz files, which makes full use of the rails asset pipeline.

I don't like having to customize my Nginx installation during the passenger installation. I found a pull request that adds http_gzip_static_module, so I changed the source code of the passenger gem and then installed nginx with the defaults. :-)

Posted in  rails passenger


rake arguments

I began writing rake tasks long ago. It's simple, but there was no instruction on how to add arguments to a rake task. What I did before was use environment variables.

task :try_argument do
  ENV['GLOBAL_ARGUMENT1'] or ENV['GLOBAL_ARGUMENT2']
end

GLOBAL_ARGUMENT1=xxx GLOBAL_ARGUMENT2=yyy rake try_argument

As you can see, I have to set environment variables to pass arguments to a rake task.

But there is another way to pass arguments to a rake task, via []

task :try_argument, [:key1, :key2] do |t, args|
  args.with_defaults(:key1 => value1, :key2 => value2)
  args[:key1] or args[:key2]
end

rake "try_argument[xxx,yyy]"

It looks like the difference between hash arguments and normal arguments.

Both of them have disadvantages:

  • ENV arguments also change the environment variables that the rake process sees.
  • normal arguments don't explain themselves when calling, so it is difficult to remember what each argument means.

Both work fine; which one to use is up to you.
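
For the [] style, here is a small runnable sketch. It assumes the rake gem is installed and invokes the task from ruby instead of from the command line; the default values and the RESULTS array are made up for the example.

```ruby
require "rake"

# Collects what the task saw, so the result is easy to inspect.
RESULTS = []

task :try_argument, [:key1, :key2] do |_t, args|
  # Fill in any argument that was not given on invocation.
  args.with_defaults(key1: "default1", key2: "default2")
  RESULTS << [args[:key1], args[:key2]]
end

# Same as running: rake "try_argument[xxx]"
Rake::Task[:try_argument].invoke("xxx")

puts RESULTS.inspect
```

Only key1 is passed here, so key2 falls back to its default.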

Posted in  rake

