A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system. — Gall’s Law
A new ActiveRecord instance has 373 public methods.
Scaling a Rails application is difficult, but not for the reasons you might expect. The framework scales horizontally well for many businesses, especially small-data businesses. But a small-data business can still carry a lot of complexity where precision counts and cents matter. (Hello, Gusto!)
None of this business complexity can be built in up front; the product would never ship. So what do we do? We start small and keep our aspirations in check. As we see success, our technology patterns need to grow with the business demands.
I still love Rails. But the Rails defaults can be difficult to grow with.1 I’ve spent the last few years at Gusto watching a Rails monolith scale far beyond expectations and still be okay to work in.
One of the crucial elements to doing so is improving the default Rails notion of data privacy. Defaulting everything to public—especially the models—does not fly in large code bases. The tangle of dependencies, callbacks, and relationships can grind your entire engineering organization to a halt.
To solve this, a few folks at work and I have been experimenting with private ActiveRecord models. We’ve gotten some great mileage out of this, and have been able to accomplish some tricky data refactors using this technique.
Let’s see how it works.
As a system grows, we should aim for more encapsulation. We don’t need microservices or Rails engines to accomplish this; a few little-known Ruby primitives will do the trick.
Why do we want encapsulation? As an organization grows, any public interface will eventually be used. This is Murphy’s Law as applied to data hiding. An explosion of public interfaces means that changing behavior is difficult, because all permutations of call sites need to be considered. By default, a column-less ActiveRecord model has 373 public methods and the ability to traverse the application’s object graph.
So we start to take a defensive approach to modeling, only making public what we are sure that we can support. This makes these models easier to support because they have a smaller surface. They are easier to change internally, too, because there are fewer different uses.
At Gusto, we use this practice to enable significant changes to underlying data structures without changing behavior. Changing the structure allows us to keep more precise records of the state of the system at certain points in time. We call this effective dating. If a business changes its name, different tax agencies might have different ideas of when that should be reported. We need to know both answers, but that complexity should not leak into all parts of the application. Hiding the effective dates of those changes is one way to accomplish this.
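A minimal plain-Ruby sketch of that idea, assuming a hypothetical `Businesses` module that records each name change together with the date it takes effect. Callers ask for the name in effect on a given date and never see the change history itself. The module, struct, and method names are illustrative assumptions, not Gusto’s actual schema, and an in-memory array stands in for the database table.

```ruby
require 'date'

module Businesses
  # One record per name change, with the date it takes effect (hypothetical schema).
  NameChange = Struct.new(:business_id, :name, :effective_on)
  private_constant :NameChange

  @changes = [] # in-memory stand-in for the backing table

  class << self
    def rename(business_id:, name:, effective_on:)
      @changes << NameChange.new(business_id, name, effective_on)
      nil
    end

    # Public API: the name in effect on the given date, or nil if there is none yet.
    def name_as_of(business_id:, date:)
      @changes
        .select { |c| c.business_id == business_id && c.effective_on <= date }
        .max_by(&:effective_on)
        &.name
    end
  end
end

Businesses.rename(business_id: 1, name: 'Acme LLC', effective_on: Date.new(2020, 1, 1))
Businesses.rename(business_id: 1, name: 'Acme Inc.', effective_on: Date.new(2021, 7, 1))

# Two agencies asking about different dates each get the answer for their date.
p Businesses.name_as_of(business_id: 1, date: Date.new(2021, 6, 30)) # prints "Acme LLC"
p Businesses.name_as_of(business_id: 1, date: Date.new(2021, 7, 1))  # prints "Acme Inc."
```

The history lives behind `private_constant`, so the rest of the application only ever depends on `name_as_of`, and the storage can change without touching callers.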
Making our ActiveRecord models private shrinks their surface area, which makes effective-dating the information (a task involving some complicated database schemas and migrations) much easier.
It has also helped teams collaborate and create contracts for their interactions. This is the software equivalent of “Trying to make everyone happy only makes yourself miserable.” I’ve seen specific, enforced interactions in the code lead to more constructive professional relationships. By creating a layer above the ActiveRecord models, we can express precise business-level validations that go beyond the ActiveRecord validations framework.
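As an illustration, here is a sketch assuming a hypothetical rule that an account may never have fewer licenses than seats currently in use. The rule spans two pieces of data, so it sits awkwardly in a model’s `validates` calls, but in the module’s public method it is one explicit check before any write. The `InvalidChange` error, `set_licenses`, `licenses`, and the `seats_in_use` field are all assumptions, and an in-memory hash stands in for the database.

```ruby
module Accounts
  # Raised when a requested change breaks a business rule (hypothetical error class).
  class InvalidChange < StandardError; end

  Row = Struct.new(:id, :number_of_licenses, :seats_in_use)
  @rows = { 1 => Row.new(1, 10, 7) } # in-memory stand-in for the accounts table
  private_constant :Row

  def self.set_licenses(id:, count:)
    row = @rows.fetch(id)
    # Business-level rule spanning two fields, checked before anything is written.
    if count < row.seats_in_use
      raise InvalidChange,
            "account #{id} has #{row.seats_in_use} seats in use; cannot drop to #{count} licenses"
    end

    row.number_of_licenses = count
  end

  def self.licenses(id:)
    @rows.fetch(id).number_of_licenses
  end
end

Accounts.set_licenses(id: 1, count: 8) # allowed: 8 >= 7 seats in use

begin
  Accounts.set_licenses(id: 1, count: 5) # rejected: would drop below seats in use
rescue Accounts::InvalidChange => e
  puts e.message
end
```

Because the only way to change licenses is through `set_licenses`, the rule cannot be bypassed by a stray `update_all` elsewhere in the codebase.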
Finally, there are other libraries, such as ROM or Sequel, that will get you much closer to this data access pattern. Unfortunately, swapping out a technology so fundamental to an application is not an easy technical migration. Chances are, your Rails application was started with ActiveRecord. This blog post assumes you have a growing or mature Rails application where replacing ActiveRecord would be too expensive.
So if you find yourself chanting “The database is not an interface!” with your colleagues, this might be the pattern for you.
First, let’s talk about a very simple model, `Account`. It has the columns `name` and `number_of_licenses`, and it has no relationships. Here’s its definition:
```ruby
class Account < ApplicationRecord
end
```
Let’s see how many public methods the `Account` class itself has:
```
irb(main):005:0> Account.public_methods.size
=> 685
```
685! Boy howdy! That is quite a large surface area!
For the purpose of this example, let’s assume this is a popular model. Let’s pretend it’s used all over the application and the backing table has millions of rows.
So let’s make this model private. Here’s roughly what we’re aiming for:
```ruby
module Accounts
  # Some important stuff up here, which we'll get to in a bit

  class Model < ApplicationRecord
    self.table_name = 'accounts'
  end
  private_constant :Model

  # Some important stuff down here, which we'll get to in a bit
end
```
This is our end goal. We are still using the same table name as our original model, because it has millions of rows. We’re using the name `Model` here, but whatever we choose doesn’t matter because it’s private. It’s made private by the `private_constant` call after the model definition. The `self.table_name=` assignment tells Rails which table backs the model.
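Because `private_constant` is plain Ruby, its effect can be checked without Rails. A minimal sketch, with a bare class standing in for the ActiveRecord model; `backing_table` is a hypothetical public method added only to show that code inside the module can still use the private constant.

```ruby
module Accounts
  # Plain-Ruby stand-in for the ActiveRecord model, so this runs without Rails.
  class Model
    def self.table_name
      'accounts'
    end
  end
  private_constant :Model

  # Hypothetical public method: code inside the module can still refer to Model.
  def self.backing_table
    Model.table_name
  end
end

puts Accounts.backing_table # prints accounts

begin
  Accounts::Model # scoped lookup from outside the module
rescue NameError => e
  puts "Blocked: #{e.class}" # prints Blocked: NameError
end
```

Reflection such as `Accounts.const_get(:Model)` can still reach the constant, so `private_constant` is a guardrail against accidental coupling rather than a hard security boundary.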
Let’s fill in a bit more detail.
When we were looking at usages of the original `Account` model, we noticed that users of the model were only performing two actions: creating accounts and retrieving them.2 Every usage fell into one of those two, so we have our public API. Let’s see how that looks:
```ruby
# app/models/accounts.rb
module Accounts
  def self.fetch(id:)
    # To be implemented
  end

  def self.create(name:)
    # To be implemented
  end

  class Model < ApplicationRecord
    self.table_name = 'accounts'
  end
  private_constant :Model
end
```
Here we’ve created two public methods, `Accounts.fetch` and `Accounts.create`, that will handle this behavior. Let’s fill in the implementation:
```ruby
# app/models/accounts.rb
module Accounts
  def self.fetch(id:)
    Model.find(id)
  end

  def self.create(name:)
    Model.create!(name: name)
  end

  class Model < ApplicationRecord
    self.table_name = 'accounts'
  end
  private_constant :Model
end
```
So we could ship this, but this implementation is less than ideal. We’re leaking a rich ActiveRecord model, with its large surface area, through our API. We’re giving users the keys to our house, our car, and our object graph. We need something else. Enter the Plain Old Ruby Object (PORO). In Domain-Driven Design parlance, we would call this an Entity.
Let’s see how that plugs in:
```ruby
# app/models/accounts.rb
module Accounts
  # --- Public APIs
  def self.fetch(id:)
    db_object = Model.find(id)
    Account.new(
      id: db_object.id,
      name: db_object.name,
    )
  end

  def self.create(name:)
    db_object = Model.create!(name: name)
    Account.new(
      id: db_object.id,
      name: db_object.name,
    )
  end

  # --- Private ActiveRecord model
  class Model < ApplicationRecord
    self.table_name = 'accounts'
  end
  private_constant :Model

  # --- Entity for the outside world
  class Account
    attr_reader :id, :name

    def initialize(id:, name:)
      @id = id
      @name = name
    end
  end
end
```
Whenever returning the concept of an account, we now return our new `Account` object. This is a simple object with two attributes, `id` and `name`. In our example, these are the only attributes the outside world needs.
This object does not have access to the ActiveRecord API and therefore cannot run amok. Apart from the methods every Ruby object inherits, it has just two public methods: `id` and `name`.
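Both `fetch` and `create` in the code above build an `Account` by hand. One way to keep that mapping in a single place is a module-level helper hidden with `private_class_method`, another plain-Ruby primitive. This is a sketch under assumptions: `to_entity` is a hypothetical helper name, and `Model` is a tiny in-memory stand-in for the ActiveRecord model so the example runs without Rails.

```ruby
module Accounts
  # In-memory stand-in for the private ActiveRecord model (assumption for this demo).
  class Model
    Row = Struct.new(:id, :name, :number_of_licenses)
    @rows = {}

    def self.create!(name:)
      row = Row.new(@rows.size + 1, name, 0)
      @rows[row.id] = row
    end

    def self.find(id)
      @rows.fetch(id) # raises KeyError, standing in for ActiveRecord::RecordNotFound
    end
  end
  private_constant :Model

  # Entity for the outside world: only id and name are exposed.
  class Account
    attr_reader :id, :name

    def initialize(id:, name:)
      @id = id
      @name = name
    end
  end

  def self.fetch(id:)
    to_entity(Model.find(id))
  end

  def self.create(name:)
    to_entity(Model.create!(name: name))
  end

  # Hypothetical helper: the single place that maps a database row to an entity.
  def self.to_entity(db_object)
    Account.new(id: db_object.id, name: db_object.name)
  end
  private_class_method :to_entity
end

account = Accounts.create(name: 'Acme')
p Accounts.fetch(id: account.id).name                   # prints "Acme"
p Accounts::Account.public_instance_methods(false).sort # prints [:id, :name]
```

`public_instance_methods(false)` lists only the methods `Account` defines itself, which is why it shows `[:id, :name]`; `number_of_licenses` never escapes the module. Calling `Accounts.to_entity(...)` from outside raises `NoMethodError`.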
Expanding This Model
Our product likely isn’t going to stay still for very long. Let’s see how this design holds up as we add new behavior. Say our sales team wants to be notified when an account adds its 5th seat.
We might implement a new public method and model like so:
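```ruby
module Accounts
  def self.add_seat(id:)
    Model.transaction do
      # Lock the row (SELECT ... FOR UPDATE) so two concurrent calls can't both read 4.
      db_object = Model.lock.find(id)
      db_object.number_of_licenses += 1
      db_object.save!

      if db_object.number_of_licenses == 5
        SalesNotification.create!(account_id: id)
      end
    end
  end

  class SalesNotification < ApplicationRecord
    # Point at the private ActiveRecord model, not the public Account entity.
    belongs_to :account, class_name: 'Model'
  end
  private_constant :SalesNotification

  # Rest of implementation...
end
```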
In traditional Rails, we would probably add a method to the original ActiveRecord model and maybe a callback. But doing that enough times will quickly create a Ball of Mud. Every model will know about another model.
Using the approach outlined here, there’s a clean separation between data-related APIs and behavior-related APIs. It’s easy to see where transactions begin and end. The control flow is crystal clear.
Writing tests for this code is also straightforward, because there is effectively only one way to add a seat to an `Account` in our system. With vanilla Rails, we could do what we did above, or use `.update_all`, or many other things that would give me a headache.
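A plain-Ruby sketch of why such a test stays small, assuming in-memory stand-ins for the two private models and a `Mutex` in place of the database transaction and row lock. `sales_notifications_for` is a hypothetical read method added so a test has something to check; it is not part of the original example.

```ruby
module Accounts
  # In-memory stand-ins for the private ActiveRecord models (assumptions for this sketch).
  Row = Struct.new(:id, :number_of_licenses)
  @rows = { 1 => Row.new(1, 0) }
  @notifications = []
  @lock = Mutex.new # stands in for the database transaction and row lock
  private_constant :Row

  class << self
    # Hypothetical read-only query used by the test below.
    def sales_notifications_for(id)
      @notifications.count { |account_id| account_id == id }
    end

    # The single write path for seats.
    def add_seat(id:)
      @lock.synchronize do
        row = @rows.fetch(id)
        row.number_of_licenses += 1
        @notifications << id if row.number_of_licenses == 5
      end
    end
  end
end

# Only Accounts.add_seat can add a seat, so the test exercises exactly that path.
4.times { Accounts.add_seat(id: 1) }
puts Accounts.sales_notifications_for(1) # prints 0
Accounts.add_seat(id: 1)
puts Accounts.sales_notifications_for(1) # prints 1
Accounts.add_seat(id: 1)
puts Accounts.sales_notifications_for(1) # prints 1 (the 6th seat does not notify again)
```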
Things Not Said
This blog post cannot cover everything to do with data hiding in Rails in one go. Here are a few things to consider, which I may take the time to write future posts about:
- This pattern is not the Rails default. You will likely grow out of just using the `app/` folder for organization. As such, it may look foreign to some folks with Rails experience.
- This pattern is not suitable for all ActiveRecord models in your application. My recommendation: don’t apply it immediately; instead, wait for the design to emerge. A potential design is emerging when you say, “Hm, we always modify an `Account` and a `User` at the same time. Maybe those should be part of the same module.”
- Molding your current models into the structure above is hard and takes time. It can take dozens, hundreds, or thousands of small steps. For some complicated models, I’ve seen this take a great team 6 months. The recipes from Refactoring will be your friend.
- This work is all but required as your team looks to extract separate applications. Creating value-based boundaries is a great pit stop on the way to a service-oriented architecture. Because of this, many consider this activity to be “no regret” because it is architecture-agnostic.
If you have other considerations here, please let me know!
This pattern feels so simple it’s hardly worth writing about. My colleagues and I have gotten a lot of mileage out of it so far, and we are curious to hear other teams’ approaches as well. If you have any feedback or war stories, please drop me an email or send me a note on Twitter.
This idea has really been spearheaded by my coworkers: Matan Zruya, Bo Sorensen, Brian Buchalter, and a few others. I’m just condensing some of their great work into a blog post.
If you would like to pull down this code to play around with it, I’ve pushed a copy to my GitHub.
Special thanks to Matan Zruya, Wenley Tong, David Mitchell, Jeff Federman, Cody Sehl, Brian Buchalter, Bo Sorensen, Kent Beck, Hector Virgen, Jeff Carbonella, and Edward Ocampo-Gooding for reading early drafts of this post and providing feedback.