Taming Large Rails Applications with Private ActiveRecord Models
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system. — Gall’s Law
A new ActiveRecord instance has 373 public methods.
Scaling a Rails application is difficult, but not for the reasons you might expect. The framework scales horizontally well enough for many businesses, especially small data businesses. But small data businesses might have a lot of complexity where precision counts and cents matter. (Hello, Gusto!)
All of this business complexity cannot be built in up front. The product would never ship. So what do we do? We start small and keep our aspirations in check. As we see success, our technology patterns need to grow with the business demands.
I still love Rails. But the Rails defaults can be difficult to grow with. I’ve spent the last few years at Gusto watching a Rails monolith scale far beyond expectations and still be okay to work in.
One of the crucial elements to doing so is improving the default Rails notion of data privacy. Defaulting everything to public—especially the models—does not fly in large code bases. The tangle of dependencies, callbacks, and relationships can grind your entire engineering organization to a halt.
To solve this, a few folks at work and I have been experimenting with private ActiveRecord models. We’ve gotten some great mileage out of this, and have been able to accomplish some tricky data refactors using this technique.
Let’s see how it works.
Motivation
As a system grows, we should aim for more encapsulation. We don’t need microservices or Rails engines to accomplish this; some little-known Ruby primitives will do just the trick.
Why do we want encapsulation? As an organization grows, any public interface will eventually be used. This is Murphy’s Law as applied to data hiding. An explosion of public interfaces means that changing behavior is difficult, because all permutations of call sites need to be considered. By default, a column-less ActiveRecord model has 373 public methods and the ability to traverse the application’s object graph.
So we start to take a defensive approach to modeling, only making public what we are sure that we can support. This makes these models easier to support because they have a smaller surface. They are easier to change internally, too, because there are fewer different uses.
At Gusto, we use this practice to enable significant changes to underlying data structures without changing behavior. Changing the structure allows us to keep more precise records of the state of the system at certain points in time. We call this effective dating. If a business changes its name, different tax agencies might have different ideas of when that should be reported. We need to know both answers, but that complexity should not leak into all parts of the application. Hiding the data of the effective dates of those changes is one way of accomplishing this.
Reducing the surface area of our ActiveRecord models by making them private makes the act of effective dating the information—a task with some complicated database schemas and migrations—much easier.
It has also helped teams collaborate and create contracts for interactions. This is the software equivalent of “Trying to make everyone happy only makes yourself miserable.” I’ve seen specific and enforced interactions in the code lead to more constructive professional relationships. By creating a layer above the ActiveRecord models, we are able to have precise business-level validations that go beyond the ActiveRecord validations framework.
Finally, there are other libraries, like ROM or Sequel, that will get you much closer to this data access pattern. Unfortunately, swapping out a technology so fundamental to an application is not an easy technical migration. Chances are, your Rails application was started with ActiveRecord. This blog post assumes you have a growing or mature Rails application where a technical migration to swap out ActiveRecord would be too expensive.
So if you find yourself chanting “The database is not an interface!” with your colleagues, this might be the pattern for you.
Implementation
First, let’s talk about a very simple model, Account. This has the columns email, name, and number_of_licenses. It has no relationships. Here’s its definition:
class Account < ApplicationRecord
end
Let’s see how many public methods it has:
irb(main):005:0> Account.public_methods.size
=> 685
685! Boy howdy! That is quite a large surface area!
For the purpose of this example, let’s assume this is a popular model. Let’s pretend it’s used all over the application and the backing table has millions of rows.
So let’s make this model private. Here’s the goal of how this might look:
module Accounts
  # Some important stuff up here, which we'll get to in a bit
  class Model < ApplicationRecord
    self.table_name = 'accounts'
  end
  private_constant :Model
  # Some important stuff down here, which we'll get to in a bit
end
This is our end goal. We are still using the same table name as our original model, because it has millions of rows. We’re using the name Model here, but whatever we choose doesn’t matter because it’s private. It’s made private by the private_constant call after the model definition (private_constant is a method, not a keyword). The self.table_name= assignment tells Rails which table backs the model.
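To see what private_constant buys us without loading Rails, here is a minimal plain-Ruby sketch. The Vault module and its Secret class are hypothetical stand-ins for Accounts and Model; only private_constant itself comes from the example above.

```ruby
# Hypothetical stand-in for the Accounts module; no Rails required.
module Vault
  # Public entry point: code written inside the module can still see Secret.
  def self.reveal
    Secret.new.value
  end

  class Secret
    def value
      42
    end
  end
  private_constant :Secret
end

# Works, because the reference to Secret lives inside Vault.
puts Vault.reveal

# Outside callers cannot name the constant directly.
begin
  Vault::Secret
rescue NameError => e
  puts "NameError: #{e.message}"
end
```

This is a guard rail rather than a security boundary: reflection such as Vault.const_get(:Secret) can still reach the class. In practice that is fine, because the goal is to make the supported path obvious, not to stop a determined colleague.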
Let’s fill in a bit more detail.
When we were looking at usages of the original Account model, we noticed that users of the model were only performing two actions: creating them and retrieving them. All usages could be explained by this, so we have our public API. Let’s see how that looks:
# app/models/accounts.rb
module Accounts
  def self.fetch(id:)
    # To be implemented
  end
  def self.create(name:)
    # To be implemented
  end
  class Model < ApplicationRecord
    self.table_name = 'accounts'
  end
  private_constant :Model
end
Here we’ve created two public methods, Accounts.fetch and Accounts.create, that will handle this behavior. Let’s fill in the implementation:
# app/models/accounts.rb
module Accounts
  def self.fetch(id:)
    Model.find(id)
  end
  def self.create(name:)
    Model.create!(name: name)
  end
  class Model < ApplicationRecord
    self.table_name = 'accounts'
  end
  private_constant :Model
end
So we could ship this, but this implementation is less than ideal. We’re leaking a rich ActiveRecord model with its large surface area through our API. We’re giving users the keys to our house, car, and object graph. We need something else. Enter the Plain Old Ruby Object (PORO). In Domain-Driven Design parlance, we would call this an Entity.
Let’s see how that plugs in:
# app/models/accounts.rb
module Accounts
  # --- Public APIs
  def self.fetch(id:)
    db_object = Model.find(id)
    Account.new(
      id: db_object.id,
      name: db_object.name,
    )
  end
  def self.create(name:)
    db_object = Model.create!(name: name)
    Account.new(
      id: db_object.id,
      name: db_object.name,
    )
  end
  # --- Private ActiveRecord model
  class Model < ApplicationRecord
    self.table_name = 'accounts'
  end
  private_constant :Model
  # --- Entity for the outside world
  class Account
    attr_reader :id, :name
    def initialize(id:, name:)
      @id = id
      @name = name
    end
  end
end
Whenever returning the concept of an account, we now return our new Account object. This is a simple object with two attributes, id and name. In our example, these are the only attributes the outside world needs.
This object does not have access to the ActiveRecord API and therefore cannot run amok. It defines only 2 public methods of its own, id and name.
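If you want to check that claim for yourself, a quick plain-Ruby sketch like the following should do. It reproduces the Account entity from above so it runs without Rails; the sample values are made up for illustration.

```ruby
# Same shape as the Account entity above, reproduced so this runs without Rails.
module Accounts
  class Account
    attr_reader :id, :name

    def initialize(id:, name:)
      @id = id
      @name = name
    end
  end
end

# Sample values are invented for this sketch.
account = Accounts::Account.new(id: 1, name: "Acme")

# Passing false lists only the methods defined on Account itself,
# leaving out everything inherited from Object.
own_methods = Accounts::Account.public_instance_methods(false).sort
puts own_methods.inspect

# There are no writers and no persistence methods, so callers cannot
# mutate or save the entity through its public API.
puts account.respond_to?(:name=)
puts account.respond_to?(:save!)
```

Compare that with the 685 we counted on the ActiveRecord class earlier. The entity still inherits the usual methods from Object, but nothing that touches the database or the object graph.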
Expanding This Model
Our product likely isn’t going to stay still for very long. Let’s see how this behaves when augmenting this with new behavior. Let’s say our sales team wants to get notified if an account adds their 5th seat.
We might implement a new public method and model like so:
module Accounts
  def self.add_seat(id:)
    Model.transaction do
      db_object = Model.find(id)
      db_object.number_of_licenses += 1
      db_object.save!
      if db_object.number_of_licenses == 5
        SalesNotification.create!(account_id: id)
      end
    end
  end
  class SalesNotification < ApplicationRecord
    # Assumption: this may need an explicit class_name option so the
    # association resolves to the private Model rather than the Account entity.
    belongs_to :account
  end
  private_constant :SalesNotification
  # Rest of implementation...
end
In traditional Rails, we would probably add a method to the original ActiveRecord model and maybe a callback. But doing that enough times will quickly create a Ball of Mud. Every model will know about another model.
Using the approach outlined here, there’s a clean separation between data-related APIs and behavior-related APIs. It’s easy to see where transactions begin and end. The control flow is crystal clear.
Writing tests for this code is also straightforward, because there is effectively only one way to add a seat to an Account in our system. With vanilla Rails, we could do what we did above or use #update_column, .update_all, or many other things that would give me a headache.
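To make the testing point concrete, here is a rough sketch of what exercising that single entry point could look like. It swaps the database for an in-memory stand-in so it runs without Rails. FakeAccounts, its Row struct, and the sales_notified? reader are all hypothetical names invented for this sketch; only the add_seat rule (notify on the 5th seat) comes from the example above.

```ruby
# Hypothetical in-memory stand-in for the Accounts module, so the example
# runs without a database. Only the add_seat rule mirrors the post.
module FakeAccounts
  Row = Struct.new(:id, :number_of_licenses)

  ROWS = {}          # plays the role of the accounts table
  NOTIFICATIONS = [] # plays the role of SalesNotification rows
  private_constant :Row, :ROWS, :NOTIFICATIONS

  def self.create(id:)
    ROWS[id] = Row.new(id, 0)
    nil
  end

  # The only way to add a seat, so it is the only path a test must cover.
  def self.add_seat(id:)
    row = ROWS.fetch(id)
    row.number_of_licenses += 1
    NOTIFICATIONS << id if row.number_of_licenses == 5
    row.number_of_licenses
  end

  # Hypothetical read API so a test can observe the side effect.
  def self.sales_notified?(id:)
    NOTIFICATIONS.include?(id)
  end
end

FakeAccounts.create(id: 1)
4.times { FakeAccounts.add_seat(id: 1) }
puts FakeAccounts.sales_notified?(id: 1) # false: only four seats so far

FakeAccounts.add_seat(id: 1)
puts FakeAccounts.sales_notified?(id: 1) # true: the fifth seat triggers it
```

Because callers cannot reach the rows directly, there is no second code path that bumps number_of_licenses and skips the notification.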
Things Not Said
This blog post cannot cover everything to do with data hiding in Rails in one go. Here are a few things to consider, which I may take the time to write future posts about:
- This pattern is not the Rails default. You will likely grow out of just using the app/ folder for organization. As such, it may look foreign to some folks with Rails experience.
- This pattern is not suitable for all ActiveRecord models in your application. My recommendation: don’t apply this immediately and instead wait for the design to emerge. A potential design is emerging when you say, “Hm, we always modify an Account and a User at the same time. Maybe those should be part of the same module.”
- Molding your current models into the structure above is hard and takes time. It can take dozens, hundreds, or thousands of small steps. For some complicated models, I’ve seen this take a great team 6 months. The recipes from Refactoring will be your friend.
- This work is all but required as your team looks to extract separate applications. Creating value-based boundaries is a great pit stop on the way to a service-oriented architecture. Because of this, many consider this activity to be “no regret” because it is architecture-agnostic.
If you have other considerations here, please let me know!
Conclusion
This pattern feels so simple it’s hardly worth writing about. My colleagues and I have gotten a lot of mileage out of this so far, and are curious to hear other teams’ approaches as well. If you have any feedback or war stories, please drop me an email or send me a note on Twitter.
This idea has really been spearheaded by my coworkers: Matan Zruya, Bo Sorensen, Brian Buchalter, and a few others. I’m just condensing some of their great work into a blog post.
If you would like to pull down this code to play around with it, I’ve pushed a copy to my GitHub.
Special thanks to Matan Zruya, Wenley Tong, David Mitchell, Jeff Federman, Cody Sehl, Brian Buchalter, Bo Sorensen, Kent Beck, Hector Virgen, Jeff Carbonella, and Edward Ocampo-Gooding for reading early drafts of this post and providing feedback.