While undertaking a substantial refactor of a core piece of the system at Gusto, our team came up with an interesting practice. We call them Test Vices.

A Test Vice is a an exhaustive set of Pinning Tests used to ensure confidence in changes being made, and meant to be thrown away once the refactor is done. A Pinning Test is “an automated test that locks down the behavior of existing, otherwise un-tested code.” A Test Vice can be thought of as many Pinning Tests.

A Test Vice is much like traditional Pinning Tests except it is meant to be exhaustive as a means of safety. This is best used in legacy code that is not tested or only partially tested. The goal is to bring the code up to 100% coverage before changing it. A Test Vice will not provide Design Pressure or other similar feedback that normal test-driven development might.

There will be some pieces of code where writing a Test Vice will not be practical.

Using a Test Vice in a refactoring flow is pretty simple, and we’ve gotten pretty good at it:

  1. Create your Test Vice.
  2. Assert your Test Vice works by changing the source code to ensure that any change to the program is caught by the vice. We usually will incrementing a random integer or adding an extra character onto a string. If you can change the program and no test fails, tighten the vice.
  3. Delete and replace the code within the vice, paying special attention to the Deletability and minding the Design Pressure of the new code being written. You will generate new tests in the process, separate from the vice.
  4. With the new production code, corresponding new test suite, and the existing Test Vice, deploy the new code.
  5. Finally, remove the Test Vice.

This practice is extremely conservative and may only be useful in a handful of applications. For Gusto, we deal with payroll and health benefits, so regressions must be avoided at all costs. We employ Test Vices to almost entirely eliminate the risk of the changes we make.

In Practice

Let’s take a look at an example, using a modified version of the frequent flier builder from Count the Contexts blog post.

That looks sufficiently fabricated for our purposes. For the purposes of this example, let’s assume that there are zero tests for this class.

Now let’s take a quick walk through this code to make some observations:

  1. There’s some conditional logic in here, but nothing too hairy. Thankfully, no loops.
  2. We have a few side effects, namely logging and sending an email to the company admins to signal that we have a new frequent flier sign up.
  3. We return a hash.

First, let’s start off by sketching out the things we’ll want to test. For these examples, we’ll use RSpec.

We’re starting out here by just getting our method signature dialed in. Importantly, we want to use doubles (or later instance_doubles) to model our positional parameters. Why doubles? The purpose of a Test Vice is to be strict and doubles are just that: they will throw an exception and fail the test if we do anything unexpected.1

Next up, let’s start filling out some of the things we’ll want to test by just filling in some of the context blocks:

This looks like a pretty good sketch of the tests that we will want to begin to fill in. There is one thing that I’d like to call attention to here, and that’s

it 'logs a message'

You might read this and ask, “Are we really going to test the logging of a message?” The answer: Yes. Creating a Test Vice is a brute force operation and we do not want to begin creating and maintaining a list of caveats.

Everything gets tested. Who knows of a downstream system is expecting us to log this message exactly so that it can deliver some business value. (Yes, I’ve seen that before at previous gigs.)

Now let’s draw the rest of the owl:

This isn’t going to win any awards, but it exhaustively tests the previously untested FrequentFlierCreator. We now have something work with as we try to discover a better structure to this code and achieve better modularization. These test vices can get pretty gross with dozens of lets, excessive stubbing, and so forth. Take comfort in that they are made to be deleted.

An interesting thing about a Test Vice is that you use the existing code as the backstop. While writing a Test Vice you do not change the existing code. This may require you write some wacky assertions. Here’s my favorite assertion from a Test Vice a few years ago:

expect(response.status).to eq(500)

This is a test you would never write during a normal test-driven development session. But with legacy or under-tested code, we sometimes have to do weird things. Do not be afraid to add testing of log messages or assertions of wacky behavior to your Test Vice.


Creating a Test Vice can be a pain, especially as you experiment with the practice. It does pay off quickly, though. The best feeling of using a Test Vice is deleting it.

We achieve a high level of safety by being able to operate 2 different styles of tests for the same production code. As you refactor your code, your Test Vice may become an integrated test while your new suite becomes a series of collaboration and unit tests. Once our new code is safely in production (and not kicking off exceptions), the Test Vice is safe to delete.

Using Test Vices, we have been able to make substantial changes to existing code that processes payroll for a good chunk of the United States without any regressions. We methodically follow the steps above to remove as much risk as possible from the process of changing code.

Hopefully you can find a reason to use a Test Vice on an upcoming project.

Good luck!

Do you employ Pinning Tests or Test Vices in legacy code you refactor? Let me know on Twitter how your team approaches this problem.

Special thanks to Iheanyi Ekechukwu, Justin Duke, Sihui Huang and Sam Soffes for providing feedback on early drafts of this post.

If you’re interested in receiving blog posts like this regularly to your email inbox, join hundreds of developers and subscribe to my newsletter.

  1. Test doubles are great way of making sure you only use the surface area of an object you expect to use. They really force you to keep which attributes or methods you depend on well-defined. They work well with London-school, top-down TDD.