While in college, I was lucky enough to attend a conference called the Future of Web Apps in Miami in February, 2008. The conference featured a wide range of speakers, but I most enjoyed a talk by Cal Henderson. He was between Flickr and Slack, and was giving a talk summarizing his upcoming Building Scalable Web Sites.

During the talk, Cal spoke a bit about Continuous Delivery, which was a relatively new concept at the time.

What stuck with me was a single slide in which he talked about “The Teeth.” I find myself referencing that one slide many times throughout the year. This blog post is an homage to that single slide in his talk at FOWA 2008 Miami.

I will try to summarize it for you.

Customer Value over Time

Software products, whether they are web apps, desktop applications, mobile apps, or server appliances, only survive if they deliver value to your customer. Does the customer receive more benefit than the cost?

The web changed the process of building significantly. Rather than waiting for a car to roll off the line or a CD-ROM to finish installing your latest video card drivers, you found the product was just there. All the time, every day. It was there and it was new. The idea of versioning a web site seems so quaint today.

[ TK: Empty illo of the Value vs. Time graph ]

So this is the world that we play in as software developers. Time is the most precious asset. If we don’t move fast enough, our competitors will get there first. Speed is a competitive advantage. Most aspects of software can be reduced to “How quickly can we deliver customer value?”

A Tale of Two Processes

Developing software effectively is always a careful balance between two competing ideas: “Do we incrementally change the product in place?” or “Do we blow the thing up and rewrite it?”

[ TK: 2-up graphs of Value vs. Time, (1) is Incremental, (2) is Rewrite ]

Blowing the whole thing up and rewriting it from scratch is one of the things you should never do, according to Joel Spolsky. The problem with a rewrite is that you’re often subjected to the “second 90%” of the work. A rewrite is not re-integrated into the original system until it’s “done.” Large rewrites are dangerous and risky. But what if it’s just a smaller, known component in a larger system? Is the rewrite from scratch okay?

In the other school lies incremental change. Set a vision and iterate toward it. Ship as often as possible. By shipping as often as possible, you constantly pay the price of integration in small installments. Should company politics get in the way or the project run late, you have at least delivered some value to the customer. It does not matter how slowly you go as long as you do not stop.

The Important Thing About Teeth

After laying out this argument in his original talk, here is where Cal spoke about teeth. “The problem with teeth is that the larger they are, the more likely they are to break the skin. Keep your teeth small.”

[ TK: Image of a face drawn over the graphs, making the step functions look like teeth of varying menace ]

Breaking the skin can be taking down production, shipping regressions, or something else equally catastrophic that requires a sudden unship.

Failure Modes of Menacing Teeth

There are many failure modes for a project with large teeth. Let’s enumerate a few of them:

  1. The project may never ship
  2. The project gets canceled
  3. The project destroys morale, or becomes a Death March
  4. The project becomes unsustainable due to expectations of parallel development
  5. The project may ship and then need to be rolled back
  6. The project may ship but be so foreign when compared to the original that it doesn’t get used

A modern mouth is full of many different types of teeth, each with its own distinct purpose. When we get bitten by our projects, it’s often by a combination of different teeth and not just one.

[ This area is wishy-washy and unrefined. It might need to be a blog post on its own. ]

The project may never ship

The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time. —Tom Cargill, Bell Labs

Projects with long times between development and delivery are often subject to the “second 90%” of project estimation. Once you’re done with the first 90% of development, you have the second 90% of integration.

That second 90% was not accounted for in estimates. The second 90% is not budgeted for. When the second 90% shows up, there are some difficult discussions as to whether the project should have been undertaken in the first place.

Although many developers practice parts of Continuous Delivery, we often find ways of sneaking around true integration. Yes, we deploy our code daily, but we hide the unfinished work behind feature flags.[1]
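To make that concrete, here is a minimal Python sketch of a flag being used to delay integration. Everything in it is hypothetical and invented for illustration: the `FLAGS` dictionary, the `new_checkout` flag name, and the checkout functions. The point is only that the new code path can be deployed every day and still never be exercised by a customer.

```python
# Hypothetical in-memory flag store. A real system would read this from
# configuration or a flag service; the names here are made up.
FLAGS = {"new_checkout": False}


def is_enabled(flag: str) -> bool:
    """Unknown flags default to off."""
    return FLAGS.get(flag, False)


def legacy_checkout(cart: list[float]) -> float:
    """The old path: the one customers actually use."""
    return round(sum(cart), 2)


def new_checkout(cart: list[float]) -> float:
    """The rewrite: deployed daily, integrated never."""
    raise NotImplementedError("tax calculation not integrated yet")


def checkout(cart: list[float]) -> float:
    # The flag hides the unfinished path, so nothing forces the team
    # to pay the integration cost until the flag is finally flipped.
    if is_enabled("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

With the flag off, `checkout` behaves exactly as before. The second 90% only shows up on the day someone sets the flag to `True`.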

Sometimes the second 90% is so long that the project never ships and the company goes under, like Netscape.

The project gets canceled

Because of the second 90%, the project will sometimes get outright canceled. The engineers working on the project have been scheduled to roll off onto a higher priority project. Them’s the breaks, and the team just spent the last few months building something that never makes it in front of customers.

Talk about a morale killer.

The project destroys morale and/or becomes a Death March

As the large-toothed project enters the second 90% of development, it can be tough to keep the team together. Signs of burnout among the team start to appear, and what seemed like a project that would be shipping soon now has no end in sight.

This point is a good time to take a nice vacation, because there’s still plenty of work left to do.

The project becomes unsustainable due to expectations of parallel development

Legacy systems are funny. No one likes them. The word legacy evokes the outdated, the crufty, the frustrating.

But that legacy software exists for a reason; otherwise you’d just delete it.

There’s a good chance that the legacy software is what pays your paycheck. This leads to an interesting property in many companies where the least-useful products for the company happen to be built in the newest technologies.

When attempting a Rewrite, this becomes a problem. There’s a good chance the existing legacy system has customers and has bugs. Someone has to shovel the coal into the furnace. As you’re developing the New System, maintaining feature parity becomes an impossible task. The lines between a new feature and a bugfix in legacy systems are incredibly blurry. As you maintain the legacy system, you painstakingly duplicate the work into the New.

How does this look to the customer? It looks like you’re moving at half-speed since your team’s time is split between two projects.

The project may ship and then need to be rolled back

The Rewrite requires a lot of overhead. Through parallel development tracks, forward-ported bug fixes, and long timelines, it can be difficult to make sure that there are no regressions in the eyes of the user. Did we remember everything?

Every few Rewrites, the team forgets something big. That one feature that was deprioritized for release turned out to be important to a lot of customers. So important that you need to perform a sudden rollback. Before rolling the Rewrite out again, you need to implement that thing-that-you-now-know-is-important.

When you ship a Rewrite, wait a few weeks before celebrating.

The project may ship but be so foreign when compared to the original that it doesn’t get used

The early development of Facebook was fascinating. The world was just switching gears from boxed, major-versioned software to iterative products. This crossroads led to some interesting product releases.

Once every 6-12 months, you’d log onto Facebook and everything would be totally different. What was once a right-hand nav was now on the left. The Wall as a large <textarea> would be gone. There is now a thing called a News Feed and it’s the only thing you need.

The user base would collectively revolt. “Join my Group to send a message to Facebook that we DON’T want the News Feed.” Or, “Like this page to tell Facebook that this new design SUCKS!” Large releases colliding with the web often made for sudden and extreme changes.

Over time, Facebook developed a more graceful way of rolling out changes. “Opt in to our new News Feed design to get the latest design!” a polite banner would read. The users were once again in control. We decided when we wanted to make the trip to CompUSA to pick up that box and slide the CD-ROM into our 4x drive.

These days, the process of rolling out features at Facebook is so sophisticated it likely cannot be comprehended by a single person. The teeth grow slowly.
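A rollout with small teeth can be sketched in a few lines of Python. This is not how Facebook does it; it is a minimal sketch under my own assumptions, and the function names, the `news_feed_v2` feature name, and the percentage-bucket scheme are all invented for illustration. Users who opt in always get the new design, and everyone else is admitted gradually as the rollout percentage is raised.

```python
import hashlib


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def sees_new_design(
    user_id: str,
    opted_in: set[str],
    rollout_percent: int,
    feature: str = "news_feed_v2",  # hypothetical feature name
) -> bool:
    # Users who clicked the banner are always in: they chose the change.
    if user_id in opted_in:
        return True
    # Everyone else joins as the percentage grows. The bucket is stable,
    # so a user who has the new design keeps it as the rollout widens.
    return bucket(user_id, feature) < rollout_percent
```

Raising `rollout_percent` from 1 to 5 to 25 to 100 over days or weeks turns one large step into many small ones, and setting it back to 0 is a far gentler rollback than a sudden unship.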

Conclusion

Keep your teeth small.


Special thanks to X for providing feedback on early drafts of this post.

  1. This is not a knock on feature-flagging in all cases, but just a warning for times when it’s used to delay true integration.