ActiveRecord pitfall that could cause a disaster
Today I was trying to optimize a SidekiqJob which was consuming a lot of memory and causing troubles.
I quickly identified the source of memory bloat. It looked like
contracts = scope1 + scope2
contracts.each do |contract|
scope2 are methods that are supposed to return an
ActiveRecord::Relation . By doing
scope1 + scope2 you load all the contracts and consume a huge amount of memory if you have lots of contracts. Even if that wasn’t a problem,
contracts.each would still load all the contracts and cause memory bloat.
This is somewhat typical. I’ve seen and fixed many of these, and if I’m honest, I’ve written these myself too without thinking much about how many records my code would be dealing with. But that’s not the pitfall I’m talking about.
So I’ve changed the code to something like
contracts = scope1.union(scope2)
contracts.find_each do |contract|
Tests were passing. Easy-peasy.
Before opening a pull request to have my colleagues review this two-line change, I wanted to quickly check these queries on staging and make sure the default batch size of
find_each was appropriate in this case.
I was trying different scenarios and in one case
scope2.count were both
scope1.union(scope2).count was hundreds of thousands. That makes no sense.
It turns out that the method
scope2 could return an empty array
 in some cases, which was fine when
scope2 were later concatenated by
+ because it would be just a concatenation of two arrays.
But now with
union, it’s a different story.
First of all, I was a bit surprised that
union accepted an array as an argument. But apparently it did, and I expected it to add nothing to the result. So what I expected was something like this:
# => 0
But in reality, it returns all contracts (plus whatever is in the first part):
# => <number of all contracts>
If you look at the SQL query generated by ActiveRecord, it becomes more obvious what’s going on:
Contract.none.union().to_sql=> "SELECT \"contracts\".* FROM ( (SELECT \"contracts\".* FROM \"contracts\" WHERE (1=0)) UNION (SELECT \"contracts\".* FROM \"contracts\") ) \"contracts\""
As you can see, the part after
UNION is literally querying all contracts.
So why does this happen?
relation_or_where_arg, which is the first argument (and the only one in this case) to
union is not an
ActiveRecord::Relation. It’s basically equivalent of
Contract.where() in this case.
Now, what does ActiveRecord do? Sure enough, it doesn’t do anything to the scope when the argument is empty:
# If the condition is any blank-ish object, then #where is a no-op and returns
# the current relation.
elsif args.length == 1 && args.first.blank?
In this case,
Contract.where() would be equivalent of
Don’t use an empty array as a replacement for
ActiveRecord::Relation. If you want one that’s guaranteed to result in an empty result, use
none instead (e.g.
Also, it shows how important it is to test for things that “should not happen”, not just what “should happen”. Because if you had tests that make sure that nothing is happening to contracts that are not subject to whatever the Sidekiq job does, you’d realize something was wrong.