Why Finish Technical Migrations?

“Why finish the migration?”

This is a question I’ve been getting asked more at work. Like many software companies, we have many technical migrations in various stages of completion. We have a few that used to have momentum, but have fallen into a stasis of sorts due to re-orgs, changing priorities, or lost interest.

But I don’t have a good answer to the above question. I thought it was a given that migrations should be finished, but I’m realizing that this is not a commonly held belief. I have also realized that there may be times to deliberately not finish technical migrations.

This post is designed to put my own thoughts down on paper to answer the question “Why finish?” Hopefully, you find it useful too.

Why Disagree?

Behind every technical migration lies a reason. Usually it’s to pay off technical debt. We need to migrate off of an unsupported technology, customer growth is pushing the boundaries of existing technology, or orders-of-magnitude cost savings are possible with something different.

However, there are certain yellow flags of technical migrations as well. Using technical migrations to address problems within a team’s control, like improving the developer experience, tends to raise my eyebrows. The best migrations have some quantifiable impact, and “improving the developer experience” can be more a measure of taste than anything else.

Every technical migration can be thought of in terms of benefits, costs, and total value. These graphs explore how different projects look over time. “Linear” provides constant payback, and is slightly net positive. The second column is a project with a high upfront cost and fixed ongoing costs higher than the value it delivers. “No Regrets” is a project that pays back quickly and superlinearly.

Underestimation of Cost

Most unfinished technical migrations share common properties. One of those properties is a drastic misunderestimation of cost. For example, it turns out switching from MySQL to MongoDB was much more difficult than we anticipated. We forgot how many stored procedures and triggers we had.

In the best scenario, this cost underestimation can be attributed to an “oopsie.” Tribal knowledge means we forget exactly how intertwined we were with a specific technology. Uncoupling ourselves from specific technologies through higher-level frameworks might help improve the separation of layers in our stack. That’s a nice upside.

But many fundamental changes can take years. At Gusto, we are in the 6th year of a Backbone to React migration. It took us 4 years to migrate from sprockets to webpack. What we thought would be quick affairs have taken a good deal more time than we had planned.

Technical migrations are no different from other forms of engineering work: we always seem to underestimate them.

Misalignment on Cost of Unfinished Business

But so what? We’ve got 2 ways of doing things. Developers just need to learn Way A and Way B. How bad could it be?

This is where I’ve stopped pressing the issue in the past: “How bad is it really?” The business still exists, we’re collecting paychecks, and customer growth is on the right track. But we can do better.

We might require many different vantage points to fill in the whole picture. An engineer might not realize how much effort it is to change onboarding material. A manager or VP might not understand that their engineers are moving slowly because they need to do everything twice.

The technical debt metaphor can also get some exercise here. Specifically, what happens to a borrower when the cost of servicing their debt matches their income? If I make $6 per year but have a $100 loan with 6% annual interest, I have zero take-home pay. I will be forever $100 in debt. It is unlikely I’ll ever be able to take on an additional loan.
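The arithmetic behind that metaphor can be sketched in a few lines. This is only an illustration: the function name `years_to_repay` and its parameters are made up for this post, and it assumes interest is charged once per year and that all income goes toward the loan.

```python
from typing import Optional


def years_to_repay(principal: int, rate_percent: int, annual_income: int,
                   max_years: int = 1000) -> Optional[int]:
    """Return how many years it takes to clear the loan, or None if it never clears.

    Hypothetical model: each year the interest is paid first, and whatever
    income is left over reduces the balance.
    """
    balance = float(principal)
    for year in range(1, max_years + 1):
        interest = balance * rate_percent / 100
        toward_principal = annual_income - interest
        if toward_principal <= 0:
            # Income only covers the interest (or less): the balance never shrinks.
            return None
        balance -= toward_principal
        if balance <= 0:
            return year
    return None


# The example from the text: $6 of income against a $100 loan at 6%.
print(years_to_repay(100, 6, 6))
# A borrower with more income than the interest bill does get out of debt.
print(years_to_repay(100, 6, 56))
```

With $6 of income the function returns None, because the entire income goes to the $6 interest bill. Only income above that bill ever reduces the balance.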

When the carrying costs of unfinished technical migrations stack like this, we limit our options for the future. We may effectively be saying, “These are the last technical migrations we are planning to do.”

Beyond hand-wavy financial metaphors, unfinished technical migrations have more precise costs. Each will be explored in its own section below:

Security vulnerabilities
Decreased end-user performance
Decreased developer productivity
Confusing documentation and onboarding material

Security Vulnerabilities

Counter-intuitively, in my career I’ve seen a fair share of security vulnerabilities where the exploits are through new technology paths.

This makes some sense; the organization has not yet internalized the security defaults of the new technology. In Rails’ ActiveRecord, one generally does not need to worry about SQL injection. A newer ORM might be less well known, less widely used, have different defaults, or make different tradeoffs. It takes some time to build up organizational knowledge around the sharp edges and guardrails. Until that knowledge exists, new security vulnerabilities may be introduced.
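To make “different defaults” concrete, here is a minimal sketch using Python’s built-in sqlite3 module rather than any particular ORM. The table, the helper names (`make_db`, `find_unsafe`, `find_safe`), and the payload are all hypothetical. The point is only that a data-access path that builds SQL by string interpolation behaves very differently from one that binds parameters.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])
    return conn


def find_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Builds the SQL by string interpolation: user input becomes part of the query.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_safe(conn: sqlite3.Connection, name: str) -> list:
    # Binds the value as a parameter: user input is treated as data only.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"
print(len(find_unsafe(conn, payload)))  # every row matches the injected condition
print(len(find_safe(conn, payload)))    # no user has that literal name
```

An organization used to the second behavior by default may not think to check for the first when a new data-access layer arrives.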

There might also be an implementation cost of integrating new technology with existing primitives. Much like a Big Rewrite, there’s more knowledge captured in the code than anyone might realize.

Decreased End-User Performance

Every new layer of technology we introduce creates some overhead. This overhead is part of the tradeoff calculus. We don’t write in assembly because we want our programs to be multi-platform and expressive. We choose JavaScript frameworks so we don’t need to care as much about the idiosyncrasies among browser APIs.

But each layer comes at a cost, usually in terms of performance. Most of the time we happily pay these costs and things remain “good enough.”¹ But unfinished technical migrations can harm end-user performance in ways that are difficult to predict and measure.

When there are redundant technologies occupying the same layer of the stack, the interop between them becomes complicated and usually slow. Sometimes the different technologies possess different operational semantics, and we get the worst of both worlds during the long cross-over period.

The result always trends towards surprising. We take the inherent complexity of each technology individually and then find the cross product. We might accidentally create N+1 queries that are difficult to identify and even harder to remove.
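For readers who have not run into one, here is a rough sketch of the N+1 shape, again using Python’s built-in sqlite3 module. Everything in it is hypothetical: the `users` and `orders` tables, the `CountingDB` wrapper, and the two functions. Imagine the loop version as what happens when users come from one data-access layer and orders from another, with application code stitching them together.

```python
import sqlite3


class CountingDB:
    """Hypothetical wrapper that counts how many queries are issued."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.queries = 0

    def run(self, sql: str, params: tuple = ()) -> list:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db() -> CountingDB:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total INTEGER);
        INSERT INTO users VALUES (1, 'ada'), (2, 'grace'), (3, 'linus');
        INSERT INTO orders VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7);
    """)
    return CountingDB(conn)


def totals_n_plus_one(db: CountingDB) -> dict:
    # One query for the users, then one more query per user: N+1 in total.
    totals = {}
    for user_id, name in db.run("SELECT id, name FROM users ORDER BY id"):
        rows = db.run(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        )
        totals[name] = rows[0][0]
    return totals


def totals_single_query(db: CountingDB) -> dict:
    # The same answer from a single query, possible when one layer owns both tables.
    rows = db.run("""
        SELECT u.name, COALESCE(SUM(o.total), 0)
        FROM users u LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id ORDER BY u.id
    """)
    return dict(rows)


slow, fast = make_db(), make_db()
print(totals_n_plus_one(slow), slow.queries)
print(totals_single_query(fast), fast.queries)
```

Both functions return the same totals, but the first issues one query per user on top of the initial one. With three users nobody notices; with a production-sized table it becomes the kind of slowdown that is hard to trace back to a half-finished migration.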

Exploring more interesting payoff curves, we see a few familiar shapes. Many technical migrations might have high upfront costs with payoffs over time. Other technical migrations might not reap their true benefits until the migration is completed, like the final cutover to a new database. Still others might have immediate benefit but growing costs, slowly erasing the initial value over time.

Confusing Documentation and Onboarding Material, Decreased Developer Productivity

Medium-to-large engineering organizations have some technical onboarding component: bootstrap your dev machine, learn about some of the infrastructure primitives, learn how to open Pull Requests, and so on.

Nothing is stranger than spending a day learning something only to realize that that thing is one half of a technical migration. Either it’s:

Deprecated and “on its way out” but still accounts for 98% of the way to do things, or…
It’s brand new, the organization is figuring out how it works, and using it will require an adventurous spirit.

This isn’t just the cost for new engineers. It will happen every time an engineer changes teams or shifts roles.

Unfinished migrations complect what would otherwise be a single-word answer. Rather than the following dialog:

“How does the front-end retrieve and organize state from the backend?”
“Redux.”

The conversation becomes:

“How does the front-end get state from the backend?”
“Well, Redux is deprecated. We’re moving to fetch with local state in components. But there’s not really a place where we’re using it entirely. If you’re in [legacy part of the code base that powers 80% of the business], that might be using our custom fork of Ember Data…”

Somewhere around that time, a head explodes. 🤯

The Benefits of Finishing Migrations

Finishing a technical migration is a simplifying force. Simplicity in software can be difficult to appreciate if you have not experienced it before. Not all software development needs to get slower and more difficult over time.

Entire categories of questions melt away. Org-wide, there are fewer “Where’s the documentation for that?” or “Is there a document outlining the process?”

Instead, a code base with finished migrations answers its own questions. How do we enqueue background jobs? Well, there’s only one way and that must be it. Copy-pasting in this world is safe, accepted, and often the best way of doing things.

Applications are secure by default. Mistakes are harder to make.

The application sings, even over spotty WiFi.

Handling Unfinished Migrations

Shit or get off the pot.

—A grandfather, somewhere.

Unfinished technical migrations are albatrosses around the necks of engineering organizations. They make everything a bit more complicated and engineers a bit grumpier. They are carrying costs.

We have 3 choices when it comes to dealing with unfinished migrations:

Roll forward
Roll back
Live in stasis

While living in stasis is the easiest choice to make—there are no difficult conversations, no budget reallocations, etc.—try to make a crisp decision. Staff a team to finish or unwind half-done migrations. If a migration is 2% complete, price out finishing the remaining 98%. Would the effort benefit from dog-piling, or is it something best handled by a small team over a few months? If the migration is only 2% complete, would it make sense to can it and go back to the old way of doing things? If a technical migration lingers for more than a few months and the business still exists, it’s possible the migration was never needed in the first place.

We roll all of our learnings into our next decisions. For the next technical migration, how can we ensure that we better calculate the costs and benefits of the migration?

Incentivize Finishing Migrations

To make sure you don’t find yourself in this situation again: incentivize finishing migrations. An exercise here would be to take engineers’ performance reviews and redact all mentions of specific technologies.

Thus, the performance review bullet point:

Introduced SQS as a background job processing mechanism.

Would become:

Introduced [REDACTED] as a background job processing mechanism.

That doesn’t tell me much. Instead, I need to add another sentence describing the impact of the work:

Introduced SQS as a background job processing mechanism. This change has decreased background job-related alerts by 90%, improved job throughput by 500%, decreased job latency by 123%, and grants enough headroom to 10x the customer base.

The goal here is to encourage honest self-reflection about the consequences of our actions. If the second sentence is difficult or impossible to write, we should not expect a good performance review.

For those extrinsically motivated, this should help align individual engineers with global objectives. Over time, we might become more thoughtful about which migrations we choose.

Conclusion

As companies grow, the number of ongoing migrations will grow too. Some migrations may outlast the average tenure on your engineering team. Each unfinished technical migration has carrying costs and restricts the business’s ability to make further change.

Hopefully this post equips you with the vocabulary and tools needed to navigate the complexity of unfinished technical migrations, and provides entry points into conversations around them.

References

Special thanks to Kristin Smith, Stephan Hagemann, Sam Soffes, and Matt Wilde for reading early drafts of this post and providing feedback.

¹ Boring technologies can often go much further than we think. There is a Rails monolith now powering an ecommerce experience 1/4th the size of Amazon.