Exercises to Deepen Your Understanding of Legacy Code

Most software engineers will work in a legacy code base for most of their careers. Unless you’re on a founding team or starting a greenfield project, you’ll be dealing with code someone else wrote. And in a year or two, you’ll have forgotten how your own code works, too.

Why try to understand legacy code at all? You want to understand it because your job, in some form, depends on it. Whether you need to blow it up and replace it with something new or tweak it slightly, there’s still plenty to learn from even the worst piece of spaghetti code. A deep misunderstanding of existing code might result in a Big Rewrite.

The following are a few strategies I use to improve my own understanding of code that I didn’t write or that I wrote long ago.

Find the Original Author

When diving into a new domain of the code base, I look for the historians. I want to find folks who can share the oral history of the project.

Hopefully you’ve got a good team and they write good commit messages. Or, they use something like GitHub’s Squash and Merge to ensure (slightly) better commit messages and clear paths to the history.

The code and its version-controlled history are only the first part of the story. Understanding the context and the purpose is the next step. Oftentimes, the class names will decay or the comments will drift. Is the complexity of the code due to tight timelines or a shallow understanding of the problem? Or maybe a little bit of both?

These oral histories help guide any changes you might want to make and will prevent you from making the same mistakes as the original authors.

It’s for this reason that I try to over-document my own process: I will likely be wondering “what was I thinking?” when git blame returns my own name.

Remove It, See What Breaks

One of my favorite things to do is to comment out some code in question and then run the tests. It’s the quickest and most surefire way to let me know if a Test Vice or a few Pinning Tests will be required.

As I open a class or a section of code for the first time, I’ll sometimes create a branch with code commented out. I push the results to CI.

While I’m waiting for the CI build to complete, I start reading the code itself. The CI failures help me build a better picture of where test coverage might be lacking. Do we have ample unit test coverage? Could we use another integration test or two?
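As a rough sketch of what a pinning test can look like once the failures show a gap, here is one in Python using unittest. The function legacy_shipping_cost and its numbers are hypothetical stand-ins invented for illustration. The point is only that the expected values are copied from what the code does today, not from what a spec says it should do.

```python
import unittest


def legacy_shipping_cost(weight_kg, express=False):
    """Hypothetical stand-in for a legacy function we do not yet understand."""
    cost = 5.0 + 1.5 * weight_kg
    if express:
        cost *= 2
    if weight_kg > 20:
        cost += 10  # undocumented surcharge that nobody remembers adding
    return round(cost, 2)


class PinShippingCost(unittest.TestCase):
    """Pins today's observed behavior; it makes no claim that the behavior is right."""

    # Each expected value was copied from an actual run, not derived from a spec.
    OBSERVED = [
        ((1, False), 6.5),
        ((1, True), 13.0),
        ((25, False), 52.5),
    ]

    def test_behavior_is_unchanged(self):
        for args, expected in self.OBSERVED:
            with self.subTest(args=args):
                self.assertEqual(legacy_shipping_cost(*args), expected)
```

With a test like this in place, commenting out the surcharge branch should turn the third case red, which is exactly the signal I’m looking for.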

Try to Write New Tests

My ability to write a good test tells me how well I understand a problem. If I find myself struggling to write a test, it’s often because I don’t quite grok what the class’s purpose is. Even writing a few simple tests to fill in any gaps helps me climb the hill of understanding.

Try to Refactor It

Much like ease of testing, you may not understand things quite as well until you try to refactor. Again, a strong test harness here will be important.

As you try to Extract Method or Extract Class, you might uncover some subtle but important retained state. Discovering it during an attempted refactor can save you a huge headache down the line.
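To make “retained state” concrete, here is a small Python sketch. The class InvoiceBuilder, the coupon code, and the helper discount_for are all hypothetical names made up for this example. The original method quietly writes to an instance attribute that another method reads later, and a naive Extract Method drops that write.

```python
class InvoiceBuilder:
    """Hypothetical legacy class; every name here is invented for illustration."""

    def __init__(self):
        self.last_discount = 0.0  # retained state, read later by receipt_line()

    def total(self, prices, coupon=None):
        subtotal = sum(prices)
        discount = 0.0
        if coupon == "TEN":
            discount = subtotal * 0.10
            self.last_discount = discount  # side effect that is easy to miss
        return subtotal - discount

    def receipt_line(self):
        return f"You saved {self.last_discount:.2f}"


def discount_for(subtotal, coupon):
    """A naive Extract Method result: pure, but it no longer records last_discount."""
    return subtotal * 0.10 if coupon == "TEN" else 0.0
```

If total() were rewritten to call discount_for() and nothing else, receipt_line() would start reporting a saving of 0.00. A pinning test on receipt_line() would likely catch that before it ships.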

There’s a long checklist¹ of “mechanical” refactorings I’ll try to run through on a piece of code. These refactorings often don’t make it to the Pull Request stage. They are for my own edification.

Only by exercising the code do you start to answer questions such as: What are its strengths? What are its weaknesses?

Toy with It in a REPL

My colleagues, Julia and Matan, are extremely good at this.

Whenever we encounter an important piece of code, they’ll put together some sanitized data pulls to feed through the subsystem in question. This often looks like a list of parameters, sometimes serialized to JSON, sometimes with expected return values.

They will then feed parameter sets through one at a time and observe them working. Doing this will often result in interesting regression tests: “Oh! Here’s an important case we don’t have a test for. Let’s write a quick regression test and land it on master before we continue.”

Through this process, we not only improve our existing tests with real-world values but we also deepen our understanding of the code at hand.
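Here is a minimal sketch of that loop in Python, something that could be pasted into a REPL. The subsystem normalize_phone, the JSON cases, and the replay helper are hypothetical and only illustrate the shape of the exercise: parameter sets with expected return values, fed through one at a time.

```python
import json


def normalize_phone(raw):
    """Hypothetical subsystem under study."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits[-10:]


# Hypothetical sanitized data pull: parameters plus the value we expect back.
CASES_JSON = """
[
  {"params": {"raw": "(555) 123-4567"}, "expected": "5551234567"},
  {"params": {"raw": "+1 555 123 4567"}, "expected": "5551234567"},
  {"params": {"raw": "123-4567"}, "expected": null}
]
"""


def replay(func, cases):
    """Call func once per case; return (params, expected, actual) for each mismatch."""
    mismatches = []
    for case in cases:
        actual = func(**case["params"])
        if actual != case["expected"]:
            mismatches.append((case["params"], case["expected"], actual))
    return mismatches


surprises = replay(normalize_phone, json.loads(CASES_JSON))
for params, expected, actual in surprises:
    print(f"{params}: expected {expected!r}, got {actual!r}")
```

In this made-up data, the short number is the interesting case: we assumed it would be rejected, but the code returns the seven digits it found. Each surprise like that is a candidate for a quick regression test.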

Conclusion

Reading code is one way to understand it. Earlier in my career, I often thought it was the only way. These days, I try to jump straight to playing with it.

By doing this, you understand what the code actually does. There might be a conditional case that is never really hit. Or maybe there’s a comment that is stale or just wrong.

Hopefully you can use some of these tactics the next time you see a piece of gnarly code.

Special thanks to Justin Duke, Matan Zruya, and Matt Lewis for providing feedback on early drafts of this post.

¹ I’ll try to publish this list in a forthcoming post.