Quantifying Technical Debt

In May of this year, I posed a question in a quiet channel in the Rands Leadership Slack. “How do you estimate the size of technical debt projects?”

Roy Rapoport, an engineering manager at Netflix, responded with the following video clip:

Gusto is a growing organization, with plenty of work still to do. During the early years, we accumulated a few chunks of tech debt as we went along. These pieces vary from something small, like an extra layer of abstraction, to something large, like a data model that no longer serves its original use.

In conversations and writing around technical debt, folks stretch the metaphor. We dive deep into the finance comparisons, we analyze its softer costs, or we just throw our hands up and claim it doesn’t exist.

This post will focus on quantifying technical debt projects, particularly the economic cost. The reason for focusing on economic costs is simple: to get others in your organization on board, whether it be your manager or her manager or someone in finance, you need to have a common language. This becomes especially important when discussing opportunity cost. How do you rank technical debt projects against growth projects? At a certain point, we need to quantify “slowing down development time.”

As they say, money talks.

Everything Successful has Debt

First, some context. Having a discussion about technical debt is a luxury. Being able to talk about technical debt implies success. You cannot immediately blow up the product in question because people are using it and paying for it.

You have something successful and now the hangover is setting in: “We did what to make the database survive load?” Or, “We used how many global variables in that subsystem?”

To frame it in Kent Beck’s 3X terms, technical debt is a luxury of the Extract team. You are looking to optimize for the long-term.

It is better to have a company that exists with some technical debt than to have a dead company with a pristine code base.

Where to begin?

When dealing with estimating technical debt, you need to measure two things:

The cost of the debt today and in the foreseeable future (~6 months), sometimes called the interest
The price of paying off the debt, sometimes called the principal

We are interested in quantifying both the principal and interest to come to a decision of “Should we pay off this technical debt?” and more specifically “When should we pay off this technical debt?”

When estimating the ongoing cost of technical debt, I look for something that can be counted easily. This usually maps to one of the following:

Occurrences of Debt Symptoms in Code
Person-Hours
Money

In some ways, each one of these can provide a bridge to the next. If I can count the occurrences of debt symptoms in code, I can estimate how long it takes to fix one instance. I can then draw lines to person-hours and then to monetary costs.

As we bridge from one measurement to the other, we lose confidence. Dollar costs derived from static code analysis will be the least reliable numbers we can generate. Nonetheless, the goal here is to have a rough data point to weigh against others.
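The bridge described above can be sketched in a few lines. This is a minimal illustration, not a tool we actually run: the function name, the half-hour-per-fix figure, and the hourly cost are all hypothetical inputs you would replace with your own guesses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeEstimate:
    occurrences: int      # counted directly, so the most reliable number
    person_hours: float   # derived from occurrences, less reliable
    dollars: float        # derived from person-hours, least reliable


def bridge_estimate(occurrences: int,
                    hours_per_fix: float,
                    hourly_cost: float) -> BridgeEstimate:
    """Walk the bridge: occurrences -> person-hours -> dollars."""
    person_hours = occurrences * hours_per_fix
    dollars = person_hours * hourly_cost
    return BridgeEstimate(occurrences, person_hours, dollars)


if __name__ == "__main__":
    # Hypothetical inputs: 120 flagged call sites, half an hour per fix,
    # and an assumed fully-loaded cost of $75 per engineer-hour.
    print(bridge_estimate(occurrences=120, hours_per_fix=0.5, hourly_cost=75.0))
```

Each field is one step further from something you actually counted, which is why the dollar figure deserves the least trust.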

Let’s cover each one of these in a little bit of depth.

Occurrences of Debt Symptoms in Code

We’ve been playing with this one a bit recently at Gusto. If there’s a practice that we determine can be caught with static code analysis (for us that would be a Rubocop “cop” or an eslint rule), we take the following steps:

1. Write the linter rule
2. Enable the rule
3. Whitelist all existing breakages of the rule with a linter pragma
4. Fix the breakages

This process is interesting for a few reasons. Most importantly, these linter rules are suggestions and not hard-and-fast rules. After completing steps (2) and (3), you have stopped the bleeding on the practice. Interestingly, (4) is optional. Once the bleeding has stopped, you can regroup to estimate how much (4) will take and when to tackle it. For now, you can rest easy that at least the problem will not be getting worse.
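Once the existing breakages are whitelisted, the pragmas themselves become something you can count. Here is a rough sketch of that count, assuming the inline forms `rubocop:disable` and `eslint-disable` (including the `-line` and `-next-line` variants). The function name and the sample source are made up for illustration, and a real script would read files from your repository rather than a string.

```python
import re
from collections import Counter

# Matches an inline suppression directive and captures the rest of the line,
# e.g. "# rubocop:disable Style/GlobalVars" or
# "// eslint-disable-next-line no-eval".
DIRECTIVE = re.compile(
    r"(?:rubocop:disable|eslint-disable(?:-next-line|-line)?)[ \t]+([^\n]*)"
)


def count_suppressions(source: str) -> Counter:
    """Count how many times each rule is suppressed in the given source text."""
    counts: Counter = Counter()
    for match in DIRECTIVE.finditer(source):
        rules = match.group(1).replace("*/", "")  # drop a closing block comment
        rules = rules.split(" -- ")[0]            # drop a trailing description
        for rule in rules.split(","):
            rule = rule.strip()
            if rule:
                counts[rule] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical snippet mixing Ruby and JavaScript lines.
    sample = "\n".join([
        "$total = 0  # rubocop:disable Style/GlobalVars",
        "$count = 0  # rubocop:disable Style/GlobalVars, Lint/UselessAssignment",
        "// eslint-disable-next-line no-eval",
        "eval(code);",
    ])
    for rule, n in count_suppressions(sample).most_common():
        print(f"{rule}: {n}")
```

Tracking this count over time shows whether step (4) is actually happening or the whitelist is just sitting there.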

Linting is a double-edged sword. Stylistic linters are less useful than those that guide away from dangerous or deprecated practices. A linter that flags patterns that would get in the way of upgrading to the next version of React is useful; one that enforces the number of spaces in a file is just taste.

This method of measurement is also inadequate when it comes to measuring higher-order levels of technical debt. You can’t write a linter rule to discover if your data model isn’t reflecting the customer’s needs.

Person-Hours

Measuring the person-hours for the interest payments is more difficult than counting instances of a deprecated practice. I find two different types of person-hours measurements: one larger-scale and one smaller-scale.

Measuring the larger-scale incidents of person-hours cost is painful. It most often shows up as consistently and utterly blown estimates. At Gusto, we can tell that part of our system has technical debt if senior engineers are unable to accurately estimate stories.

The smaller-scale person-hours costs are measured in seconds or minutes. This might be things like app start or compile times. If the app or test suite is started hundreds of times per day by each developer, a few seconds add up.

When we veer into measuring the cost of technical debt in person-hours, we veer into the territory of Lifehacker math. (“If I save 3 seconds every time I fold a shirt, I’ll have 4 more weeks of life!”) When aggregated, the math just doesn’t stack up. For the smaller-scale person-hours costs, I am more interested in the thresholds.

These thresholds are the limits of human attention. Specifically: how long can this thing take before the developer context switches to check Slack, Twitter, or Hacker News? My rough threshold is about 5 seconds. If something takes 5 seconds or longer, I round it up to a minute because it’s caused a context switch. So let’s say we have a developer that loves TDD and runs their test suite 100 times a day but each test run takes 15 seconds to start, passing our 5-second threshold. The math would be:

100 instances * 1 minute per instance = 100 minutes

This math is shaky at best but may suffice for back-of-the-napkin estimates. This example is also a bit extreme and doesn’t account for test run times as potential thinking time.
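For the curious, the threshold rule looks like this in code. The 5-second threshold and the one-minute penalty are my rough numbers from above, not measured values, and the function name is made up for this sketch.

```python
def daily_minutes_lost(runs_per_day: int,
                       seconds_per_run: float,
                       threshold_seconds: float = 5.0,
                       context_switch_minutes: float = 1.0) -> float:
    """Estimate minutes lost per developer per day to a repeated wait.

    Waits at or above the threshold are rounded up to a full context switch;
    shorter waits are counted at face value.
    """
    if seconds_per_run >= threshold_seconds:
        return runs_per_day * context_switch_minutes
    return runs_per_day * seconds_per_run / 60.0


if __name__ == "__main__":
    # The TDD example: 100 runs a day, 15 seconds to start each run.
    print(daily_minutes_lost(runs_per_day=100, seconds_per_run=15))  # 100.0
    # Under the threshold, the same 100 runs at 3 seconds cost far less.
    print(daily_minutes_lost(runs_per_day=100, seconds_per_run=3))   # 5.0
```

The jump from 5 minutes to 100 minutes is the point: the cost is not linear in the wait time, it is a step function around the threshold.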

Money

Sometimes technical debt will manifest directly as a monetary amount. At Gusto, we have a few bugs or design flaws that we know cost us money. These flaws are usually so small that it’s easier to front the money than to dedicate effort to correct the deeper design flaws. (Think a $1 discrepancy for a single customer in a sparsely populated state.)

The direct tie to money will not be present in most software. Instead, you’ll need to back into the cost using the bridge of person-hours. To do so, I usually do some back-of-the-napkin math using person hours and a rough estimate of salary.

So let’s say we know we have something that costs 1 week of an engineer’s time per year. Let’s assume this engineer is being paid $100,000 per year and works 48 weeks out of the year. We’ll want to multiply the salary of the engineer by 1.5x to get closer to the fully-loaded cost of the engineer in the eyes of the company (employer-side payroll taxes, equipment, space, lunches, etc.). This multiplier may be lower for organizations at scale.

From there, our math is pretty simple:

$100,000 * 1.5x * 1/48 = $3,125

The numbers here are not designed to make a decision for you, but just to be a piece of data to consider. Take these numbers with a grain of salt and only pay attention to the order of magnitude.
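The same napkin math as a small function, in case you want to plug in your own figures. The defaults mirror the assumptions above (1.5x multiplier, 48 working weeks); the function name is just for illustration.

```python
def engineer_week_cost(salary: float,
                       load_multiplier: float = 1.5,
                       working_weeks: int = 48) -> float:
    """Fully-loaded cost of one engineer-week, given an annual salary."""
    return salary * load_multiplier / working_weeks


if __name__ == "__main__":
    # $100,000 * 1.5 / 48
    print(engineer_week_cost(100_000))  # 3125.0
```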

Chances are, a given piece of technical debt will affect every engineer in the organization more or less evenly. For larger organizations with many departments, this will hold less true; the technical debt may be concentrated in certain teams. Either way, that 1 week of lost time in our above example gets multiplied by the number of engineers affected, whether that is the whole organization or a single department.

Should We Pay It Off?

Once we have quantified both the interest and the principal, we can answer the question of “Should we pay off this technical debt?”

For the purposes of this example, let’s assume that we have something that is costing us 1 EngineerWeek per engineer per year (the interest). Let’s say we have a team of 10 engineers. Thus, interest payments cost us the following assuming $100,000 engineer salaries and our 1.5x multiplier:

$150,000 * 10 * 1/48 = $31,250 per year

So let’s say this problem costs the organization about $30,000 per year. We should also note that this problem will get worse if we hire more engineers.

Now let’s say that we estimated fixing the underlying issues would take a pair of engineers a quarter to address, or 12 weeks. So the cost of paying off the principal is:

$150,000 * 12/48 * 2 = $75,000 one-time cost

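If we know this is going to continue to be an issue for the next 2 years or more, we should seriously consider paying this off. The $75,000 is a one-time cost, while the roughly $31,250 of interest recurs every year and grows with headcount, so at the current team size the fix pays for itself after about 2.4 years.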

But these figures also give us another useful tidbit: How much could we pay a vendor to fix this problem? Perhaps there’s a vendor that has a magical solution to this issue and they’ve quoted us $25,000 for a perpetual license. Assuming that the vendor is honest and integration efforts will be minimal (heh), the vendor could be a clear choice here.

Please keep in mind that this type of math plays fast and loose with The Mythical Man-Month, assumes engineering estimates are mostly accurate, and does not attempt to account for any softer benefits like increased engineer happiness.
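Putting the pieces of this section together, here is a sketch of the interest-versus-principal comparison, including how long a fix takes to pay for itself. The function names are made up, and the inputs are the same assumed numbers used above ($100,000 salary, 1.5x multiplier, 48 working weeks), with all the same caveats.

```python
FULLY_LOADED_SALARY = 100_000 * 1.5  # assumed salary and 1.5x multiplier
WORKING_WEEKS = 48                   # assumed working weeks per year


def annual_interest(engineers: int, weeks_lost_per_engineer: float) -> float:
    """Yearly cost of living with the debt."""
    return FULLY_LOADED_SALARY * engineers * weeks_lost_per_engineer / WORKING_WEEKS


def principal(engineers: int, weeks: float) -> float:
    """One-time cost of paying the debt off with in-house engineers."""
    return FULLY_LOADED_SALARY * engineers * weeks / WORKING_WEEKS


def break_even_years(one_time_cost: float, yearly_interest: float) -> float:
    """Years until a one-time cost is recovered through avoided interest."""
    return one_time_cost / yearly_interest


if __name__ == "__main__":
    interest = annual_interest(engineers=10, weeks_lost_per_engineer=1)
    in_house = principal(engineers=2, weeks=12)
    vendor = 25_000  # the quoted perpetual license from the example

    print(f"interest per year: ${interest:,.0f}")
    print(f"in-house fix:      ${in_house:,.0f}, "
          f"break-even in {break_even_years(in_house, interest):.1f} years")
    print(f"vendor license:    ${vendor:,.0f}, "
          f"break-even in {break_even_years(vendor, interest):.1f} years")
```

With these inputs the in-house fix breaks even in about 2.4 years and the vendor in under a year, which is why the vendor looks so attractive if the integration really is minimal.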

Failures of this Model

This model has at least one huge hole in it. This model fails to account for the most costly form of technical debt: design debt. This is when your domain model is insufficient for new use cases and requires an upgrade. Because the domain model is at the heart of the application and the business, changing it has far-reaching repercussions. Michael Feathers covers exactly this topic in his talk, “Escaping the Technical Debt Cycle.”

We have 2 common deficiencies that exist in parts of our payroll domain models at Gusto. Namely,

We delete or mutate data that should be immutable and
We do not effective-date information that should be effective-dated.

When first starting out, these were not constraints on our data model. As we’ve grown and needed to optimize for auditability within the domain model, it’s not enough to log this information. These things need to become first-class citizens of the domain.

In some ways, these deficiencies do not need to be estimated because they block feature work. Case in point: we were not able to allow customers to schedule address changes until we applied the Temporal Object pattern to addresses within our system.
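To make the idea concrete, here is a toy sketch of an effective-dated address. It is not our implementation: the class and field names are hypothetical, and a real payroll system would also need persistence, validation, and an audit trail. It only illustrates the two properties above: versions are immutable, and every version carries the date it takes effect.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a recorded version can never be mutated
class AddressVersion:
    effective_from: date
    street: str
    state: str


@dataclass
class AddressHistory:
    versions: List[AddressVersion] = field(default_factory=list)

    def schedule(self, version: AddressVersion) -> None:
        """Add a version, possibly dated in the future; nothing is deleted."""
        self.versions.append(version)
        self.versions.sort(key=lambda v: v.effective_from)

    def as_of(self, day: date) -> Optional[AddressVersion]:
        """Return the version in effect on the given day, if any."""
        current = None
        for version in self.versions:
            if version.effective_from <= day:
                current = version
        return current


if __name__ == "__main__":
    history = AddressHistory()
    history.schedule(AddressVersion(date(2017, 1, 1), "1 Main St", "CA"))
    # A customer schedules a move ahead of time.
    history.schedule(AddressVersion(date(2018, 3, 1), "9 Broadway", "NY"))

    print(history.as_of(date(2018, 2, 28)).state)  # CA
    print(history.as_of(date(2018, 3, 1)).state)   # NY
```

Because old versions stay in the history, a question like “which state did this employee live in on a given pay date?” can still be answered after the move.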

This technical debt is difficult to quantify in economic terms but becomes part of the estimation process for a new project. Design debt like this is usually tackled because it enables new feature work. For this reason, paying it off is a bit less happenstance than other diffuse forms of technical debt because it lies in the critical path of shipping.

This model also does not attempt to address the human costs of technical debt.

Conclusion

As we push the limits of the technical debt metaphor, we need to be able to tie it back to economic impact. Quantifying the impact of the interest payments vs. principal payments can help answer the question: Should we pay off this technical debt?

Nonetheless, this model fails to capture the most damaging type of technical debt: design debt.

Hopefully you find this post a useful addition to your toolbox.

If you have any ideas around improving this post or criticisms of it, I am all ears. Please find me on Twitter or send me an email.

Special thanks to Bryn Jackson, Oliver Hookins, Roy Rapoport, Ken Bantoft, and Eddie Kim for providing feedback on early drafts of this post.