This blog post is a reflection on the slide titled “What’s in Your Toolkit?” from Rich Hickey’s excellent talk, Simple Made Easy. The single slide contained some of the most programming wisdom I’ve seen in my career. This is the second blog post in this series, and dives into Imperative Loops/Fold vs. Set Functions.

It’s one of the earliest powerful things we learn in school: the for loop.

You loop through a list, array, or set of items, doing something to or performing a calculation on each one. All of a sudden, you feel the power of the processor. Rather than simple logic trees, you can now operate on thousands, millions, or billions of things in seconds. Achievement unlocked.

Yet a decade into my career as a software engineer, I rarely use a traditional for loop. Why?

This blog post will dive into the difference between imperative loops and set functions as Rich Hickey described them. Another term that I’ll use in this context is “collection pipeline,” coined by Martin Fowler.

A Dangerous Pretense

In previous posts reflecting on Simple Made Easy, many of the recommendations are ways of avoiding state in your program. “State complects,” Hickey says. This was explored in the posts State vs. Values and Methods vs. Functions.

As we discuss more complicated structures in our programs, they become more difficult to reason about.

The discussion between imperative loops compared to collection pipelines has its roots again in state. Specifically, which patterns encourage or discourage the use of state?

It’s this injection or assumption of state that we’ll use to explore imperative loops and collection pipelines throughout the blog post.

Imperative Loops

Imperative loops come in many shapes and sizes. They have evolved over the years. For those learning programming from the mid-2000’s on back, you likely saw a for loop with an incrementing variable:

// Dropping into JS to make a point
for (var i = 0; i < words.length; i++) {
  console.log(`Reading "${words[i]}"`);
}

We iterate through an array (or list or set) of elements, using the index i to keep track of progress. More recently, you might see a forEach-style of iteration. This type of iteration cuts down on the bookkeeping of the index variable:

words.each do |word|
  puts "Reading \"#{word}\""
end

Neat! Less code is better, right?1

These two loop styles are decidedly imperative. They crunch through each element in the array and do something. In this case, print a word. Printing words is not that interesting, so let’s invent a more complicated example for the remainder of this post. We want to write a program that tells us all of the unique emails from an array of users.

First, we will write it imperatively:

def unique_emails(users)
  emails = []

  users.each do |user|
    if !emails.include?(user.email)
      emails << user.email
    end
  end

  emails
end

Here we iterate through our users, carefully documenting an email if we haven’t seen it before. For the purposes of this example, let’s declare the performance characteristics of Array#include? out of scope.

Now this code is perfectly fine, but let’s make a few observations:

  1. The loop only does something interesting by inflicting state into emails.
  2. There’s some bookkeeping of declaring and then returning emails.

Let’s instead take a look at how this code might look with a collection pipeline.

Set Functions and Collection Pipelines

A Set Function comes from mathematics. Its input is a set (think array, list, or a first-class set) and spits out something else.

A collection pipeline is defined by Fowler as “a programming pattern where you organize some computation as a sequence of operations which compose by taking a collection as output of one operation and feeding it into the next.” A collection pipeline is a few set functions strung together. I often think of collection pipelines as using Array#map instead of Array#each (or your language or data structure’s equivalents).

Let’s see how this changes the unique_emails function above:

def unique_emails(users)
  users.map { |user| user.email }.uniq
end

Neat! A one-liner!

Now this example is a little unfair, because Ruby is exceptionally well-suited for operations exactly like this.2 Its syntax and standard library functions make writing code like this a breeze.3 This function is so terse, that you might even wonder if it warrants a separate method at all.4

This collection pipeline approach also eliminates plenty of intermediate state. We do not need bookkeeping with an increment variable and we don’t need a temporary array to hold our results. Although in such a small example, this intermediate state is not that big of a deal. As this method grows and remains unfactored, programmers have to maintain a mental map of the intermediate states of emails. “Alright, is this before or after we’ve done a check for the uniqueness?”

Collection pipelines and set functions (like Array#uniq in this case), increases the semantic meaning of our programs without reducing clarity. Each character in the code has a higher “density” of meaning. By having blocks or functions only be defined by their return values—keeping them pure—tracking the intermediate state by debugger or by mental map becomes much easier. In the above example, we can see where we “have the emails” vs. when we “have the unique emails.

We have once again warded off the spectre of complexity.

Back to Performance

I said we would ignore performance here. Let’s just say we lazily-evaluated our performance concerns.

Now in previous posts in this series, performance was not much to discuss. In the first post, we made the point that at some level everything becomes imperative because that’s how CPUs work. But loops and set functions are where we go from one-to-many.

This is where performance needs to be a concern when dealing with large amounts of data. The imperative loops will have difference performance characteristics than their declarative cousins. For example, imperative loops may require less memory for certain operations.

Nonetheless, this is usually the last thing you should optimize for in your code. If this is a concern, be sure to measure the results. Dynamic languages in particular do some mind-bending stuff, so even your declarative loops may be more performant than you think.

Conclusion

Prefer set functions and collection pipelines because they keep your programs simple. They cut down on bookkeeping and tighten up the code. Keep in mind the performance characteristics, but only when those become a problem.

Imperative loops open up Pandora’s Box because they must change state to do something useful. Otherwise, they would be a no-op. With side effects, we often never stop at just one. It’s not long before the imperative loop is doing many things and it’s hard to keep them all straight. Set functions and collection pipelines, on the other hand, can be pure. Pure functions and components make for simpler, easier to maintain programs.

That’s it for this blog post. Happy programming!


Special thanks to Hector Virgen, Justin Duke, and Iheanyi Ekechukwu for providing feedback on early drafts of this post.

If you enjoyed this post, you may also enjoy another post in this series: State vs. Values.

If you’re interested in receiving blog posts like this regularly, join hundreds of developers and subscribe to my newsletter.

  1. As always, It Depends™. 

  2. Justin Duke, when reviewing this post, provided the feedback that the chosen example tips the scales a little too much in Ruby’s favor. I can’t come up with a better example here. If you’ve got one, I’m all ears. 

  3. TC39 noticed this elegance, likely through the rise of Coffeescript, and introduced the array operator with implicit returns in JavaScript ES2015. Great consortiums steal. 

  4. You can further condense this code down into users.map(&:email).uniq