At Gusto, we’ve been knee-deep in a substantial refactor of our system for running payrolls.
Running a payroll requires several different inputs: how much an employee should get paid, where they worked, how much they worked, how much tax they should pay, how much tax they have already paid this year, and so on.
As a company that offers a payroll service, keeping this piece of the system in tip-top shape is important for the business. Customers love Gusto for its simplicity and speed when it comes to running payroll.
Over the years, this system grew beyond its original mandate. Rather than serving payroll for just one state, it now serves all 50 states and the District of Columbia. Although customers love our payroll, new engineers had a difficult time understanding the code and making changes safely. This system needed a tune-up, so we embarked on a sizable refactor.
Because the process of calculating what you need for a payroll is one big formula, we set the goal of making this system “more functional” as in functional programming. We wanted to take the process of calculating a payroll and make it one big stateless operation.
The server-side code at Gusto is written in Ruby, a language usually known for its object-oriented and metaprogrammatic roots. Nonetheless, we wanted to integrate some more functional concepts into our code in the hopes of increasing the system’s safety and clarity. The result has been maintainable code that is easier to reason about and safer to change.
Embracing the PFaaO
Ruby is an expressive language, but it does not lend itself to some common functional practices. Although Ruby allows for closures and first-class functions via Procs, one does not see many Procs passed around as objects in idiomatic Ruby.
Throughout our work, we discovered that you can create expressive interfaces with clean internals by embracing both OO and functional aspects of Ruby.
A pure function is a function without observable side effects that always returns the same value for a given set of inputs. That means no talking to the database, no modifying the state of other objects, no accessing the system clock, etc. When we write a PFaaO (a Pure Function as an Object) in Ruby, we want to build an object that has no side effects.
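To make the definition concrete, here is a minimal sketch contrasting an impure method with a pure one. The method names and the payday scenario are hypothetical and chosen only for illustration; they are not taken from Gusto’s code.

```ruby
require "date"

# Impure: the result silently depends on the system clock.
def days_until_payday_impure(payday)
  (payday - Date.today).to_i
end

# Pure: every input is an explicit argument, so the same arguments
# always produce the same result.
def days_until_payday(payday, today)
  (payday - today).to_i
end

payday = Date.new(2024, 1, 15)
days_until_payday(payday, Date.new(2024, 1, 10)) # => 5
```

The pure version can be tested with fixed dates, while the impure version returns a different answer depending on the day the test runs.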
A simple PFaaO might look like the following:
    class PayrollCalculator
      def self.calculate(payroll)
        new(payroll).calculate
      end

      def initialize(payroll)
        @payroll = payroll
      end
      private_class_method :new

      def calculate
        PayrollResult.new(
          payroll: @payroll,
          paystubs: paystubs,
          taxes: taxes,
          debits: debits
        )
      end

      def paystubs
        # ...
      end

      def taxes
        # ...
      end

      def debits
        # ...
      end
    end
There’s quite a bit going on here, so let’s break it down bit by bit.
First, our class has only one effective public interface: PayrollCalculator.calculate. Because we’ve declared the constructor private using private_class_method :new, the instance method #calculate is effectively private.[1]
This means that all of the other instance methods we declare are implicitly private, even though there is no explicit private block within this class. Because there’s no way to .new up an instance, there is not a vector to call any instance methods.
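The effect of private_class_method :new can be seen in a small standalone sketch. The Doubler class below is a hypothetical stand-in, not part of the payroll code; it only shows that the class method can still build an instance while outside callers cannot.

```ruby
class Doubler
  # The only entry point: builds a single-use instance and runs it.
  def self.call(number)
    new(number).call
  end

  def initialize(number)
    @number = number
  end
  private_class_method :new

  def call
    @number * 2
  end
end

Doubler.call(21) # => 42

begin
  Doubler.new(21)
rescue NoMethodError => error
  # Outside callers cannot construct an instance, so #call on an
  # instance is unreachable through the normal public interface.
  puts error.message
end
```

Inside self.call, new is invoked with an implicit receiver, which Ruby permits for private methods. An explicit Doubler.new from outside the class raises NoMethodError.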
Our class has only one public interface, and its operation is effectively stateless, so we only need to exercise one interface in our tests: put some data in, and assert that the data coming out is what we expected.
Referential Transparency for Free
In our above example, let’s say that the process of calculating taxes is expensive from a time perspective.[2] Thus, we want to make a time/space tradeoff and consume more memory to minimize the number of times we need to compute taxes. In our example, calculating both #paystubs and #debits will require the result of #taxes.
Now because each of these private methods is a pure function, we have referential transparency. This means we can replace a method call and its arguments with its return value. Think of it like algebra: given the function f(x) = x + 5, you can safely replace any occurrence of f(2) with the value 7.
What does this mean for a Rubyist? Free and safe memoization:
    def paystubs
      calculate_paystubs(taxes, ...)
    end

    def debits
      calculate_debits(taxes, ...)
    end

    def taxes
      @taxes ||= calculate_taxes(@payroll)
    end
Memoization is a form of caching, and can be fraught with issues if the memoized value does not actually come from a pure function. But because we make everything within the PFaaO pure, we can safely memoize this method call.
This is interesting because it looks like this class is no longer stateless: it now assigns an instance variable. However, because the only interface is the single .calculate class method, each instance of our PFaaO is single-use. Any intermediate state can never be accessed externally. Because this cached state is not observable externally, our function is still technically pure.
Much in the way a developer can abstract synchronous and asynchronous behavior, you can do the same with functional purity. Any local state changes are irrelevant in the lifecycle of the PFaaO. These local state changes are not observable from the outside world.
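The memoization argument can be sketched end to end with a small, self-contained PFaaO. ReceiptCalculator, its arguments, and the tax rate below are all hypothetical and far simpler than a real payroll; the point is only that the cached value lives inside a single-use instance.

```ruby
class ReceiptCalculator
  def self.calculate(subtotal_cents, tax_rate)
    new(subtotal_cents, tax_rate).calculate
  end

  def initialize(subtotal_cents, tax_rate)
    @subtotal_cents = subtotal_cents
    @tax_rate = tax_rate
  end
  private_class_method :new

  def calculate
    { tax_cents: tax_cents, total_cents: total_cents }
  end

  def total_cents
    @subtotal_cents + tax_cents
  end

  # Memoized: computed on first use, then reused by #calculate and
  # #total_cents within this single-use instance.
  def tax_cents
    @tax_cents ||= (@subtotal_cents * @tax_rate).round
  end
end

ReceiptCalculator.calculate(10_000, 0.0825)
# => {tax_cents: 825, total_cents: 10825} (older Rubies print {:tax_cents=>825, :total_cents=>10825})
```

Calling ReceiptCalculator.calculate twice with the same arguments returns equal results, because the cached @tax_cents never outlives the instance that created it. One caveat with ||= in general: it recomputes whenever the cached value is nil or false, so it fits best when the memoized result is always truthy.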
As I’ve grown in my career, I have become less interested in how software is written and more interested in how it is maintained. Software maintenance is the blessing and the curse of any successful project: Congratulations! You have a business with lasting value. Our condolences! You must now pay for all of your mistakes. Nonetheless, it is always preferable to have a business that exists with technical debt than to have a bankrupt company with a pristine code base.
PFaaOs in Ruby are great because they are easy to maintain. Not only are they easy to test, but they are predisposed to healthy growth.
What do I mean by that?
Let’s again take the example of our #taxes method. Early in Gusto’s history (back when it was still known as ZenPayroll), we only offered payroll services in California. Thus, we only needed to worry about payroll taxes for California.
In the grand scheme of things, California is a simple state when it comes to payroll taxes. Our taxes method might have looked like nothing more than the following:
    def taxes
      federal_taxes(@payroll) +
        california_taxes(@payroll) +
        local_taxes(@payroll)
    end
Now let’s say we expanded into a new state, New York. Now our method grows a little bit:
    def taxes
      federal_taxes(@payroll) +
        california_taxes(@payroll) +
        new_york_taxes(@payroll) +
        local_california_taxes(@payroll) +
        local_new_york_taxes(@payroll)
    end
As we expand into every state,[3] this method will grow to be quite large! Furthermore, each of these methods adds to the length of our PayrollCalculator class. Without constant gardening, the class could become difficult to understand.
But because each of our methods within a PFaaO is itself a pure function, we are able to extract classes as we see fit and make each one a new PFaaO. We can safely replace our growing methods with new PFaaOs:
    def taxes
      PayrollCalculator::Taxes.calculate(@payroll)
    end
As we tease apart these different PFaaOs, we also get a much better idea of the input requirements for these service classes. Our @payroll is a large parameter object, and each extracted PFaaO probably only needs a subset of its data.
So we can get away with something like:
    def taxes
      PayrollCalculator::Taxes.calculate(
        @payroll.only_pay_and_location_data
      )
    end
Here we assume that Payroll#only_pay_and_location_data returns a slice of the total data within the instance as a new Value Object. This Value Object represents only the data required to calculate the taxes part of running a payroll.
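One way such a slicing method could look is sketched below. The Payroll class here is a plain-Ruby stand-in, and the PayAndLocationData struct and its field names (gross_pay_cents, work_state, memo) are assumptions made for illustration, not Gusto’s actual attributes.

```ruby
# Hypothetical value object holding only what a tax calculation needs.
PayAndLocationData = Struct.new(:gross_pay_cents, :work_state, keyword_init: true)

# Plain-Ruby stand-in for the large payroll parameter object.
class Payroll
  def initialize(gross_pay_cents:, work_state:, memo:)
    @gross_pay_cents = gross_pay_cents
    @work_state = work_state
    @memo = memo
  end

  # Returns a new, frozen slice; the payroll itself is not modified.
  def only_pay_and_location_data
    PayAndLocationData.new(
      gross_pay_cents: @gross_pay_cents,
      work_state: @work_state
    ).freeze
  end
end

payroll = Payroll.new(gross_pay_cents: 500_000, work_state: "CA", memo: "March bonus run")
slice = payroll.only_pay_and_location_data
slice.work_state         # => "CA"
slice.respond_to?(:memo) # => false
```

Because the slice exposes only two fields, the signature of the tax calculation documents exactly which data it depends on.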
Data is Immutable by Default
Another important ingredient for scalable PFaaOs is the requirement that all data be immutable by default. This is a drastic change from how most folks traditionally write Ruby.
Every time you reach for your =, you’ll need to replace it with a #put. Rather than modifying objects in place, you will get used to returning new copies with new values. (Hamster, which provides great immutable data structures, can save you from having to hand-roll FP functionality.)
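The copy-instead-of-mutate habit can be sketched with nothing but core Ruby. This example deliberately does not use Hamster; a frozen Hash and Hash#merge stand in for an immutable collection and its #put, and the paystub fields are hypothetical.

```ruby
paystub = { employee: "Ada", net_pay_cents: 150_000 }.freeze

# Mutating in place is rejected because the hash is frozen.
begin
  paystub[:net_pay_cents] = 160_000
rescue FrozenError => error
  puts "rejected: #{error.class}"
end

# Instead, build a new frozen copy that carries the changed value.
adjusted = paystub.merge(net_pay_cents: 160_000).freeze

paystub[:net_pay_cents]  # => 150000
adjusted[:net_pay_cents] # => 160000
```

Note that freeze is shallow: nested objects inside the hash would need to be frozen separately, which is part of what a dedicated immutable collection library takes care of.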
What does this mean for Rails? It will often mean creating functions or classes that take ActiveRecord objects and convert them into immutable value objects. For us, we carve out these value objects into the namespace of what we’re doing. For example, here are the two representations of a payroll in our system:
    # app/models/payroll.rb
    class Payroll < ActiveRecord::Base
    end

    # app/services/payroll_calculator/payroll.rb
    class PayrollCalculator::Payroll < ValueObject
    end
The ActiveRecord version of a payroll represents the data that lives in the database. It is a superset of the data required for actually running a payroll. Although the two classes have the same name, they do not have the same attributes. For example, the ActiveRecord version of Payroll will have a processed_at attribute, whereas the Payroll that lives in the calculation domain does not.
In the words of Domain-Driven Design, each namespace here is a different Bounded Context. We implement adapters to take ActiveRecord payrolls and turn them into PayrollCalculator payrolls, and vice versa.
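An adapter of this kind might look like the following sketch, which covers only the direction from the database model into the calculation context. PayrollRecord is a Struct standing in for the ActiveRecord model so the example runs without Rails; the Calculation namespace, the adapter name, and all field names other than processed_at are assumptions.

```ruby
# Stand-in for the ActiveRecord model; a real one would be backed by a table.
PayrollRecord = Struct.new(:id, :gross_pay_cents, :work_state, :processed_at,
                           keyword_init: true)

module Calculation
  # Value object for the calculation context: no id, no processed_at.
  Payroll = Struct.new(:gross_pay_cents, :work_state, keyword_init: true)

  module PayrollAdapter
    # Copies only the attributes the calculation needs into a frozen value object.
    def self.from_record(record)
      Payroll.new(
        gross_pay_cents: record.gross_pay_cents,
        work_state: record.work_state
      ).freeze
    end
  end
end

record = PayrollRecord.new(id: 1, gross_pay_cents: 500_000,
                           work_state: "NY", processed_at: Time.at(0))
value = Calculation::PayrollAdapter.from_record(record)
value.work_state                 # => "NY"
value.respond_to?(:processed_at) # => false
```

If a column is added to or removed from the database model, only from_record needs to change; the calculation code keeps seeing the same value object.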
The upside of this is the same that you might see in any other large system with well-defined abstractions: changes in models don’t cross domains. In our example, we can change the structure of the Payroll in our database without needing to change the calculation code. We would only need to change our adapter. Furthermore, this context is entirely separate from the machinations of Rails. We could easily and safely pull this into its own gem or a separate service entirely.
If we let ActiveRecord objects be parameters to our calculator, adding or removing columns from the ActiveRecord objects could cause a series of cascading, painful, and dangerous changes.
For young Rails apps, this level of indirection is overkill. As apps grow and multiple teams begin contributing to the same application, Bounded Contexts like these are necessary.
We’ve been slowly refactoring our payroll calculators toward this model and use it to safely process upwards of $1 billion per month.
The results have been remarkable: adding or changing payroll code is now a much safer operation. Because each change is much more isolated, a developer only needs to concern herself with the local implementation.
Although this post does not cover it, testing PFaaOs with immutable data is a breeze. We find ourselves performing less setup for each method and class. Our tests remain fast as they do not hit the database.
It’s not all sunshine and rainbows, though. This approach does result in a larger volume of code. My rough estimate would peg it at about a 1.5x to 2x increase in code volume. Some developers dislike the sprawling nature of the many PFaaOs that result. Although the total lines of code will increase, you should develop a better understanding of the data requirements of each Bounded Context. Put another way: you don’t need to pass around whole ActiveRecord objects, but just small bundles of their attributes.
Before embracing this completely, discuss with your team to set up a few ground rules. We typically shoot for about 100 lines per class, but your team might decide on something different. Make sure to get on the same page and agree that your app is at the size where it might benefit from this style of thinking.
For some teams, the extra layers of abstraction between ActiveRecord and doing interesting things with the data might seem like overkill. In many situations, it will be. Again, I encourage you to have a healthy discussion with your team to decide if the benefits of this approach outweigh the negatives.
For us, we’re employing it everywhere appropriate. Give this pattern a shot and let me know how it goes!
1. Keen readers will know that nothing is ever really private in Ruby. There is always #send.
2. At Gusto, calculating taxes is expensive! Did you know that there are more than 6,000 payroll taxes within the United States? Each one may or may not need to be applied for a given payroll, based on the different parameters of the payroll itself.
3. Today, Gusto provides payroll services in every state including D.C. with some of the lowest error rates in the industry.