I’ve Made a Huge Mistake — GoRuCo 2018 Talk

You can also see the slides on SpeakerDeck.

The following is text of the talk with some editing. It’s not an exact transcription.

This talk tells the story of a small team that failed to break apart its Rails monolith and the lessons they learned along the way. It covers some of the pitfalls that you might experience when trying to modularize a large application.

The talk concludes with 5 concrete tips you can employ in your day job to help ease the transition to a more modular world.

Part I: About Me

My name is Kelly Sutton, I work as a software engineer in San Francisco. I live with my fiancé, Amelie, and our dog, Greta. I work for a company called Gusto, which does payroll, benefits, and HR software for small businesses.

Gusto serves about 1% of small businesses in the US. We operate mostly out of a single monolith and have over 80 engineers. The platform moves more than $1 billion per month for tens of thousands of businesses.

Payroll software is a careful combination of time, geography, money, and people. Folks need to be paid on time for the time they worked. Single tax jurisdictions can be as small as a city block, or as large as the country. Employees need to be paid accurately to the cent, otherwise you’ll make the IRS unhappy.

Finally, all of this is being run by people. We are all human and to be human is to make mistakes. A payroll system must be designed to be malleable, and to be able to correct mistakes.

Which brings us to an important caveat in this talk: Gusto might make different tradeoffs than your business. One of the biggest tradeoffs we make is the tradeoff between correctness and performance. We will always choose accuracy and precision over speed. This comes into play when modeling our data or choosing caching schemes.

Every business is different, so some of the things ahead may not apply to you.

Part II: Breaking the Monolith

Before we discuss how to break a monolith, we need to describe it. I like to call a large monolith a Swamp. You may know it as the Ball of Mud.

I find Swamp a better term, because it is something you’re in. A Ball of Mud is something that is just over there, on a pedestal. As you work in a Swamp, things become slower and slower. What used to take milliseconds now takes seconds. What used to take seconds now takes minutes.

Here’s what our Swamp looks like. It’s all in the same monolith with different parts in charge of Payroll, Benefits, HR, and general Infrastructure.

The seams between these domains are fuzzy. Models and database tables are shared. None of the interaction patterns are well-defined. Oftentimes you just reach directly into the other domain’s tables if you need information from them.

It doesn’t take long operating in a world like this before someone comes back from a conference and says, “Let’s extract a Service!”

You want to get back to that rails new feeling, so your team talks it over and you decide you’ll extract a new HR service.

So you new up an HR v2 service and get to work.

You work with the team to slowly connect the existing functionality to the new service. You meticulously move over behaviors one by one.

But this ends up taking a lot longer than you thought. Furthermore, the HR responsibilities were much larger than you thought. It soon becomes someone’s full-time job to track “What’s in the original HR system and what’s in HR v2?”

Time drags on and the project feels like it will never ship. PMs and other stakeholders are breathing down your neck, so you eventually call things done enough. You decide to ship what you’ve got and move onto the next project.

But while you’ve moved over 90% of the behavior to the new system, the old HR system still is doing critical work for the company. You need to keep the old HR system around.

And now you’re in a worse situation than you started with. You now have 2 HR systems!

This is exactly where tribal knowledge comes from. You have 2 ways of doing something, and knowing which does what is a learned skill instead of something that is self-evident. The question “Where are Social Security Numbers stored?” now has a learned answer: “Oh, we didn’t get around to moving those, so those are still in the old HR system.”

These questions make it more difficult to onboard as a new engineer, and make maintaining and reasoning about the system difficult for seasoned engineers.

To talk about our better way, we first need to make the distinction between Applications and Services. These are distinctions we use at Gusto, and may differ from the nomenclature at your own company.

For us, Applications live on their own host, have their own datastore, can scale independently, and could be written in another language entirely. These are what result from a rails new command. This is often what folks mean when discussing microservices. It’s implied that the operating environments are distinct from each other.

Services, on the other hand, could just be different modules within the same application. They share the same process and resources, and might use in-process calls for communicating among each other. They may use the same underlying datastore, like the monolith’s MySQL instance, but they don’t reach directly into another Service’s tables or models.

Armed with this nomenclature, we now can define what we want to do. Our goal is to extract Services first, and then Applications. Turning a Service into an Application should be a small amount of work, and it should be the last thing that we do. Migrating to a Service without extracting an Application still leaves the whole system in a better-organized state.

How well we draw our Service boundaries can be measured by how easy it is to extract an Application.

So let’s move back to our Swamp and the same scenario: We want to extract an HR Application.

First, we’ll sit down with the 2 teams responsible for these domains and have a conceptual discussion. What should the boundaries between these domains or Bounded Contexts be? Where should responsibilities lie? We’ll want to reach deep into our knowledge of Domain-Driven Design to help sculpt the answers to these questions.

Once we have a conceptual agreement between our 2 teams, we need to figure out how that maps to the current state of the code. There’s a good chance that there will be a large gap between your conceptual agreement and the state of the code.

Our next job is to make these interactions more explicit between our domains in the code. Rather than reaching into database models and tables, define service classes that return Value Objects.

These service classes that communicate with Value Objects help us to be strict between domains. With cross-domain communication, we do not want to use ActiveRecord or any other rich objects. These objects have too much state, and give the keys to the kingdom away. Instead we want to reach for simple values or Plain Old Ruby Objects (POROs).

This will feel unlike traditional Rails development. You will find yourself mapping to and from these Value Objects, but you will have the well-defined behaviors between domains.
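
As a rough illustration of a service class that returns a Value Object, here is a minimal sketch in plain Ruby. Every name in it (the HR module, EmployeeLookup, EmployeeSummary, and the in-memory EmployeeRecord that stands in for an ActiveRecord model) is hypothetical and not taken from Gusto’s code.

```ruby
# Hypothetical sketch: none of these names come from the talk.
module HR
  # Simple value that is allowed to cross the domain boundary.
  EmployeeSummary = Struct.new(:id, :first_name, :email, keyword_init: true)

  # Stand-in for an ActiveRecord model that stays private to HR.
  class EmployeeRecord
    ROWS = {
      1 => { id: 1, first_name: "Ada", email: "ada@example.com", ssn: "000-00-0000" }
    }.freeze

    def self.find(id)
      ROWS.fetch(id)
    end
  end

  # The only entry point other domains are meant to call.
  class EmployeeLookup
    def self.call(employee_id)
      row = EmployeeRecord.find(employee_id)
      # Map the rich record to a frozen value, leaving sensitive fields behind.
      EmployeeSummary.new(
        id: row[:id], first_name: row[:first_name], email: row[:email]
      ).freeze
    end
  end
end

summary = HR::EmployeeLookup.call(1)
puts summary.first_name
```

The caller receives only the three fields it asked for. It cannot reach the record, its other columns, or any query methods, which is the strictness between domains described above.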

As you work through hammering out these relationships, they will over time be the only way that these domains communicate. These strong interfaces become your de facto Services.

You’ll be using in-process method calls, but you will now have the flexibility to swap in different transport mechanisms. Whether you want to switch to JSON over HTTP, GraphQL over HTTP, or gRPC will be a small change because all communication can be easily serialized.¹
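
To make the serialization point concrete, here is a small sketch of a value being sent through JSON and back. The EmployeeSummary struct and the helper method names are assumptions for illustration; a real network transport would also need timeouts and error handling, which this sketch leaves out.

```ruby
require "json"

# Hypothetical value object; the name and fields are assumptions.
EmployeeSummary = Struct.new(:id, :first_name, :email, keyword_init: true)

# Today: an in-process call hands the value straight to the caller.
def lookup_in_process(id)
  EmployeeSummary.new(id: id, first_name: "Ada", email: "ada@example.com")
end

# Later: the same value could cross a network boundary as JSON.
def to_wire(summary)
  JSON.generate(summary.to_h)
end

# Rebuild the value object on the receiving side.
def from_wire(payload)
  EmployeeSummary.new(**JSON.parse(payload, symbolize_names: true))
end

original = lookup_in_process(1)
round_tripped = from_wire(to_wire(original))
puts round_tripped == original
```

Because the value holds nothing but simple fields, the round trip loses no information. An ActiveRecord object, with its connection and loaded associations, would not survive the same trip so cleanly.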

Once we have the boundaries between our Services well-defined, extracting an Application is an easier effort.

Which brings us to the concept of Conceptual Compression. @dhh mentioned this term in his RailsConf 2018 keynote. Rails does a great job of “compressing” concepts that we don’t need to care about until necessary.

Chances are, your Rails app doesn’t require a DBA or anyone else with a strong knowledge of SQL. ActiveRecord and its underlying technologies give you safe defaults that will work for the first few years of your application. Your app may never grow large enough to have concerns with ActiveRecord’s ability to work with a database.

This is a great thing! Worrying about things before you need to can distract a company from focusing on the needs of the business.

But if you are in the process of breaking apart your monolith, you likely need to embark on some Conceptual Expansion. You and your team will need to carefully examine the different concepts that Rails compresses to see which are compatible with your architecture goals.

The following are some of the non-obvious tips we’ve found to expand some of the concepts that Rails compresses as you break apart your monolith.

Part III: 5 Concrete Tips

These 5 tips are things that we’ve found at Gusto to be important considerations as our Rails app grows. Take them with a grain of salt, as they might not be appropriate for your situation. As always, discuss these with your team before enacting them outright.

As your code base grows, question Rails bidirectional relationships. For many of us, wiring up bidirectional relationships is muscle memory.

For a simple Company and Employee relationship, that usually means Company declares has_many :employees and Employee declares belongs_to :company.

But we’ve found ourselves asking if we really need relationships in both directions.

Although we are not importing and exporting code, we still draw dependency graphs with every line of code we write in Ruby and Rails.

Each bidirectional relationship creates a circular dependency. Circular dependencies can flatten your layered architecture and muddy your object graphs. For this reason, we ask whether we can keep our object graph simple by omitting some bidirectional relationships.
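
One way to see the cycle is to write the dependency graph down and ask Ruby’s standard tsort library for its strongly connected components. This is only an illustrative sketch: the two graphs below are hand-written assumptions about which class references which, not output from any Rails tooling.

```ruby
require "tsort"

# Hypothetical dependency graphs: each key depends on the classes in its array.
BIDIRECTIONAL = {
  "Company"  => ["Employee"], # has_many :employees
  "Employee" => ["Company"]   # belongs_to :company
}.freeze

ONE_WAY = {
  "Company"  => [],           # no has_many declared
  "Employee" => ["Company"]   # belongs_to :company only
}.freeze

# Returns the groups of classes that depend on each other in a cycle.
def cycles(graph)
  each_node  = ->(&block) { graph.each_key(&block) }
  each_child = ->(node, &block) { graph.fetch(node).each(&block) }
  TSort.strongly_connected_components(each_node, each_child)
       .select { |component| component.size > 1 }
end

p cycles(BIDIRECTIONAL)
p cycles(ONE_WAY)
```

The bidirectional graph reports Company and Employee as one mutually dependent group, while the one-way graph reports no cycles at all.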

When traversing the edges or arrows between domains or layers of your application, use Value Objects.

Let’s take a look at an example: a simple service class, CompanySignedUp, that handles what to do after a company has signed up for our app. This looks like a pretty normal service class in a growing Rails app.

This service class sends off an email using CompanyMailer and tracks some stats via StatsTracker.

But we’ll find that all 3 of these classes (CompanySignedUp, CompanyMailer, and StatsTracker) are coupled to the shape of company. This means that should the Company class ever change, we will need to make changes in 3 parts of our code. This is Shotgun Surgery.
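
A rough reconstruction of that coupled version might look like the following. It is a plain-Ruby sketch, not the code from the slides: Company is a Struct standing in for an ActiveRecord model, and the method names welcome and track, along with the returned strings, are assumptions.

```ruby
# Hypothetical reconstruction; attribute and method names are assumptions.
Company = Struct.new(:id, :user_first_name, :email, keyword_init: true)

class CompanyMailer
  # Reaches into the company object for the fields it needs.
  def self.welcome(company)
    "To: #{company.email} | Welcome, #{company.user_first_name}!"
  end
end

class StatsTracker
  # Also reaches into the company object.
  def self.track(event, company)
    "#{event} company_id=#{company.id}"
  end
end

class CompanySignedUp
  # Passes the whole rich object down to both collaborators.
  def self.call(company)
    [
      CompanyMailer.welcome(company),
      StatsTracker.track("company.signed_up", company)
    ]
  end
end

results = CompanySignedUp.call(
  Company.new(id: 42, user_first_name: "Ada", email: "ada@example.com")
)
puts results
```

Renaming user_first_name or email on Company would break CompanyMailer, and changing how a company is identified would break StatsTracker, even though neither class looks like it owns anything about companies.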

So instead we want to “bail out” of our rich objects into values at the right times. At Gusto, we’ve found the sooner the better, generally.

Here we want to peel off just the values that CompanyMailer and StatsTracker need to do their job. In this case, we only need user_first_name, email, and company.id.

Now should the shape of company ever change, we only need to make a change in a single class: CompanySignedUp.
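
Sketched in plain Ruby, the value-passing version could look like this. As before, Company is a Struct standing in for an ActiveRecord model, and the keyword argument names and returned strings are assumptions made for illustration.

```ruby
# Hypothetical sketch; attribute and method names are assumptions.
Company = Struct.new(:id, :user_first_name, :email, keyword_init: true)

class CompanyMailer
  # Accepts plain values, so it no longer knows what a company looks like.
  def self.welcome(first_name:, email:)
    "To: #{email} | Welcome, #{first_name}!"
  end
end

class StatsTracker
  # Only needs an identifier, not the object it came from.
  def self.track(event, company_id:)
    "#{event} company_id=#{company_id}"
  end
end

class CompanySignedUp
  # The only class that reads attributes off the company.
  def self.call(company)
    [
      CompanyMailer.welcome(first_name: company.user_first_name, email: company.email),
      StatsTracker.track("company.signed_up", company_id: company.id)
    ]
  end
end

results = CompanySignedUp.call(
  Company.new(id: 42, user_first_name: "Ada", email: "ada@example.com")
)
puts results
```

CompanyMailer and StatsTracker can now be tested with bare strings and integers, with no company object in sight.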

Callbacks are a crucial part of Rails. They give us an expressive interface to perform an action after another. Unfortunately, Rails callbacks can flatten layered architecture.

Let’s see how. Consider a simple callback on a Company class that sends an email using CompanyMailer. (Let’s not worry about the fact that sending an email should be a background job here.)

If we look at the implicit dependency graph that we’ve drawn here, we’ve got a circular dependency. Company knows about CompanyMailer, and CompanyMailer knows about the shape of Company. These circular dependencies are exactly what makes an app difficult to maintain.

If you zoom into the Ball of Mud, you’ll see thousands of these circular dependencies. Callbacks make it very easy to create these circular dependencies.
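
To show the cycle without loading Rails, the sketch below fakes a callback with a tiny Callbacks module. That module is a stand-in written for this example and is not how ActiveRecord implements after_create; the name attribute and the SENT array are likewise assumptions.

```ruby
# Stand-in for ActiveRecord's after_create; this is not Rails code.
module Callbacks
  def after_create(method_name)
    (@after_create ||= []) << method_name
  end

  def create(**attrs)
    record = new(**attrs)
    (@after_create || []).each { |name| record.send(name) }
    record
  end
end

class CompanyMailer
  SENT = [] # collects "sent" emails so the example has visible output

  # The mailer knows the shape of Company...
  def self.welcome(company)
    SENT << "Welcome, #{company.name}!"
  end
end

class Company
  extend Callbacks
  attr_reader :name

  after_create :send_welcome_email

  def initialize(name:)
    @name = name
  end

  private

  # ...and Company knows about CompanyMailer, which closes the cycle.
  def send_welcome_email
    CompanyMailer.welcome(self)
  end
end

Company.create(name: "Acme")
puts CompanyMailer::SENT
```

Nothing at the call site hints that creating a company sends an email. The dependency on CompanyMailer is hidden inside the model.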

Rather than reaching for callbacks, we look to use simple service classes instead to encapsulate behavior. Here, we’ve moved the logic that was contained in a model and a mailer into a CreateCompany service class that wraps our behavior.

If we now look at the dependency graph we’ve drawn, we have another node! But we have also eliminated the cycle.

We’ve found that it’s better to introduce a new node into your dependency graph than to introduce a cycle when designing your applications. This keeps the structure of the application simpler and easier to change.
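
A minimal sketch of such a service class follows. Company is a plain Struct standing in for a callback-free model (a real Rails version would persist the record), and the keyword arguments and the SENT array are assumptions for illustration.

```ruby
# Hypothetical stand-in for the Company model, now free of callbacks.
Company = Struct.new(:name, :email, keyword_init: true)

class CompanyMailer
  SENT = [] # collects "sent" emails so the example has visible output

  # Takes plain values and knows nothing about Company.
  def self.welcome(name:, email:)
    SENT << "To: #{email} | Welcome, #{name}!"
  end
end

# The new node in the graph: it depends on Company and CompanyMailer,
# while neither of them depends on it or on each other.
class CreateCompany
  def self.call(name:, email:)
    company = Company.new(name: name, email: email)
    CompanyMailer.welcome(name: company.name, email: company.email)
    company
  end
end

company = CreateCompany.call(name: "Acme", email: "owner@acme.example")
puts CompanyMailer::SENT
```

Every arrow now points away from CreateCompany, so Company and CompanyMailer can each change, or be tested, without the other.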

Hopefully the little story above convinced you: Extract Services first and Applications last!

Finally, when breaking apart your monolith, move slowly. For Gusto, extracting a single service required hundreds of Pull Requests over several months. It takes time and trust in your team.

Part IV: Wrap Up

Breaking apart a Rails monolith is a risky endeavour. We derisk the activity by constantly breaking down our work into the smallest possible units. We set a vision and set sail. We trust our teams to do the right thing, and we have buy-in from the business.

Thanks!

References

Bernhardt, Gary. “Boundaries.” 2012.
Bernhardt, Gary. “Functional Core, Imperative Shell.” 2012.
Evans, Eric. “Domain-Driven Design: Tackling Complexity in the Heart of Software.” 2003.
Fowler, Martin, et al. “Refactoring: Improving the Design of Existing Code.” 1999.
Feathers, Michael. “Working Effectively with Legacy Code.” 2004.
Hickey, Rich. “Simple Made Easy.” 2011.
Scott, James C. “Seeing Like a State.” 1999.
Searls, Justin. “My Favorite Way to TDD.” 2017.
Spolsky, Joel. “Things You Should Never Do, Part I.” 2006.

Special thanks to Noa Elad, Matan Zruya, Sihui Huang, Natalie Wong, Karlo Hoa, Amelie Meyer-Robinson, and Quentin Balin for providing feedback on early drafts of this presentation.

Another thank you to GoRuCo for having me out to present!

¹ An important note: Switching from an in-process method call to something that uses the network introduces new failure scenarios. Do not forget to account for those!