ActiveRecord pitfall that could cause a disaster

ITOH Akihiko
3 min readJun 22, 2022

Today I was trying to optimize a SidekiqJob which was consuming a lot of memory and causing troubles.

I quickly identified the source of memory bloat. It looked like

contracts = scope1 + scope2
contracts.each do |contract|
...
end

scope1 and scope2 are methods that are supposed to return an ActiveRecord::Relation . By doing scope1 + scope2 you load all the contracts and consume a huge amount of memory if you have lots of contracts. Even if that wasn’t a problem, contracts.each would still load all the contracts and cause memory bloat.

This is somewhat typical. I’ve seen and fixed many of these, and if I’m honest, I’ve written these myself too without thinking much about how many records my code would be dealing with. But that’s not the pitfall I’m talking about.

So I’ve changed the code to something like

contracts = scope1.union(scope2)
contracts.find_each do |contract|
...
end

Tests were passing. Easy-peasy.

Before opening a pull request to have my colleagues review this two-line change, I wanted to quickly check these queries on staging and make sure the default batch size of find_each was appropriate in this case.

I was trying different scenarios and in one case scope1.count and scope2.count were both 0, but scope1.union(scope2).count was hundreds of thousands. That makes no sense.

It turns out that the method scope2 could return an empty array [] in some cases, which was fine when scope1 and scope2 were later concatenated by + because it would be just a concatenation of two arrays.

But now with union, it’s a different story.

First of all, I was a bit surprised that union accepted an array as an argument. But apparently it did, and I expected it to add nothing to the result. So what I expected was something like this:

Contract.none.union([]).count
# => 0

But in reality, it returns all contracts (plus whatever is in the first part):

Contract.none.union([]).count
# => <number of all contracts>

If you look at the SQL query generated by ActiveRecord, it becomes more obvious what’s going on:

Contract.none.union([]).to_sql=> "SELECT \"contracts\".* FROM ( (SELECT \"contracts\".* FROM \"contracts\" WHERE (1=0)) UNION (SELECT \"contracts\".* FROM \"contracts\") ) \"contracts\""

As you can see, the part after UNION is literally querying all contracts.

So why does this happen?

We have to first look at the active_record_union gem where the union method is coming from. And it does this:

@klass.where(relation_or_where_arg, *args)

in case relation_or_where_arg, which is the first argument (and the only one in this case) to union is not an ActiveRecord::Relation. It’s basically equivalent of Contract.where([]) in this case.

Now, what does ActiveRecord do? Sure enough, it doesn’t do anything to the scope when the argument is empty:

# If the condition is any blank-ish object, then #where is a no-op and returns
# the current relation.
def where(*args)
if args.empty?
WhereChain.new(spawn)
elsif args.length == 1 && args.first.blank?
self
else
spawn.where!(*args)
end
end

In this case, Contract.where([]) would be equivalent of Contract.all.

Conclusion

Don’t use an empty array as a replacement for ActiveRecord::Relation. If you want one that’s guaranteed to result in an empty result, use none instead (e.g. Contract.none).

Also, it shows how important it is to test for things that “should not happen”, not just what “should happen”. Because if you had tests that make sure that nothing is happening to contracts that are not subject to whatever the Sidekiq job does, you’d realize something was wrong.

--

--