ActiveRecord performance vs self-executing statements

One of the battles of Rails development is dealing with performance problems. If you have a database-intensive operation, using ActiveRecord can be an incredible slowdown to your system. You will know this because the CPU usage of your Rails app during the operation will be pegged at 100% (this means your server is busy running Ruby code), while your MySQL daemon will be coasting along at 5% or less.

What I have found is that, for an INSERT query, directly executing the query using ActiveRecord::Base.connection.execute (alternately, use the model name of the table you’re using, e.g., Post.connection.execute) will yield roughly a 10 times speedup in performance versus using create!. Consider the following steps that Rails will execute to run create!:

  • Initialize and build a new ActiveRecord model from the data you passed to create!
  • Validate the model using any validations you have placed on it
  • Before and after each step, check to see if there are any callbacks, and if so, execute them
  • Build the query using several levels of abstraction, passing through many functions, etc.
  • Hundreds of calls to #read_attribute to read the data out of the ActiveRecord model.

Compare this to what Ruby has to do if you create the query yourself:

  • One call to sprintf.
  • One call to ActiveRecord::Base.connection.execute, which then invokes the Mysql adapter.

Be wary, of course, that you get none of the benefits of ActiveRecord: no validations, no callbacks, no quoting, etc. I do not recommend doing this if the data is coming from the end-user unless it is carefully validated and escaped first. Note also that you will not get the resulting ActiveRecord objects.

To summarize, if you need to create 100 objects, change this:

for d in data
  Something.create!(d)
end

To this:

for d in data
  Something.connection.execute(sprintf(
    "INSERT INTO somethings (a, b, c, d) VALUES (%d, %d, %d, %d)", 
    d[:a], d[:b], d[:c], d[:d]))
end

Taking It Further: Multi-inserts

Pundits may realize that there is even a better way to do this: MySQL supports inserting multiple records at once, known as a multi-insert or extended insert. This can yield a 2x to 5x increase above executing the queries singly, depending on the speed of your database. Using the previous code as a base, you could change it to a multi-insert by doing the following:

values = data.collect do |d|
  sprintf("(%d, %d, %d, %d)", d[:a], d[:b], d[:c], d[:d])
end
Something.connection.execute(
  "INSERT INTO somethings (a, b, c, d) VALUES " + values.join(", "))