Fragile Software Development

(Fr)Agile Software Development. Ha ha!

Detractors of Agile Software Development sometimes use the term Fragile Software Development. It’s an easy pun; you don’t have to think too hard about it. And then you can dismiss the whole idea of agile software development without a further thought.

This accusation is mostly aimed at Extreme Programming. XP has a set of interlocking practices that reinforce and balance each other. Take one away and the whole house of cards comes crumbling down.

XP is fragile. And that’s how it’s supposed to be.

Huh? Agile: good! Fragile: bad! Surely?

What if that fragility contributes to XP’s agility?

Let’s look at another fragile method and all the ways it can break: Lean. Let’s count the ways we can shut down a Lean plant:

  • Just in time delivery: if one delivery from one supplier is late, what little inventory you have runs out very quickly.
  • Everyone can stop the line: if someone discovers an error, the whole line is shut down while the cause of the error is removed (actually, the line shuts down incrementally: first one cell, then a group of cells, then the line, depending on how long it takes to fix the problem).
  • Pull: there are very small buffers of work in progress between cells, regulated by Kanban cards. If one cell fails to replenish the buffer, the other cells quickly shut down due to lack of input. Ideally, you have “one piece flow”, no buffers!
  • Small batches: each production run produces only small batches, which are consumed quickly. If the process that produces the batches breaks down, the consumers quickly break down due to lack of resources.

And yet… every metric and statistic available clearly indicates that Lean plants are more efficient and production lines break down less often than “traditional” plants with lots of safety margin.

Making the system more fragile: lowering the water level to expose the rocks

In between cells or parts of the production line, there are small buffers of unfinished work (see the “Drum-buffer-rope” entry) that protect a consumer from variations in the output of its producers. The Toyota Way describes how the size of these buffers is determined: by using the “lower the water level to expose the rocks” technique.

Whenever a process runs smoothly, the size of the buffer is reduced. If the system keeps working smoothly, the buffer is reduced again. This continues until the system breaks down. The buffer is enlarged to the previous value and the root cause of the system breakdown is researched. When that problem has been solved, the buffer is reduced again.
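The loop described above can be sketched in a few lines of Python. This is only a toy model, not how a real plant tunes its buffers: the name `lower_water_level` is made up for this illustration, and each "rock" is assumed to be a hidden problem represented by a single number, the smallest buffer size that still hides it.

```python
def lower_water_level(buffer_size, rocks):
    """Shrink a buffer step by step, fixing each problem the shrinking exposes.

    buffer_size: starting buffer size, assumed large enough to hide every rock.
    rocks: hypothetical hidden problems, each given as the smallest buffer
           size that still hides it.
    Returns the final buffer size and the rocks in the order they were fixed.
    """
    remaining = list(rocks)  # work on a copy, leave the caller's list alone
    fixed = []
    while buffer_size > 0:
        previous = buffer_size
        buffer_size -= 1  # things ran smoothly, so lower the water level
        exposed = [rock for rock in remaining if rock > buffer_size]
        if exposed:
            # The system broke down: restore the previous buffer size,
            # then find and remove the root cause before trying again.
            buffer_size = previous
            rock = max(exposed)
            remaining.remove(rock)
            fixed.append(rock)
    return buffer_size, fixed
```

Starting with a buffer of 10 and rocks at 7, 4 and 4, `lower_water_level(10, [7, 4, 4])` returns `(0, [7, 4, 4])`: every rock is found and removed, highest first. The sketch assumes every root cause can actually be fixed; in a real process some can't, and the buffer stays at the level just above the remaining rock.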

The metaphor that gave this technique its name goes like this: imagine you’re sailing a boat. Below the water surface, there are rocks, but you don’t know where they are. If the water level is high enough, you’ll never hit any rocks. If you gradually lower the water level, you will hit the highest rock. Now you can remove that rock and you can sail freely again. To hit (and find) more rocks, you lower the water level again.

Fragility keeps you on your toes

How would you act if a mistake could shut the whole plant down? You would make damn sure that everything is in place to avoid mistakes or to fix them very quickly. You’d try to find and fix root causes of problems. You would constantly look to improve and refine your processes. You would actively search out problems. That is… IF you worked in an extraordinary, learning organisation.

If you worked in the average company, you would be scared to do anything for fear of being blamed (and fired). That’s one of the reasons that the Toyota Way says “Whatever the problem, it’s never a people problem, it’s always a process problem.”

A while ago, I had a discussion about how to handle a daily build failure. I advocated letting the whole team work on fixing the build problem before they started to work on new features. Someone else proposed to let one developer fix the build, so that the others could get on with their work. He thought I was a bit excessive, extreme even. Now why would I want to waste good developers’ time? Maybe to make our process a little more fragile, to make the cost of a build break a bit higher, so that we would think that little bit harder about how to avoid the root causes of broken builds.

XP is fragile. XP is Lean

XP has even more ways to break down than a Lean plant. Let’s list a few:

  • What if the customer can’t supply stories fast enough (just in time)?
  • What if your story cards get lost/blown away/stolen? (I get this question a lot)
  • What if your tests don’t catch all the errors?
  • What if the build breaks a lot?
  • What if developers don’t sign up for stories/tasks?
  • What if… (your favourite method of breaking down here)

What keeps XP projects alive?

  • A team with a common vision and open communication.
  • Discipline to practice all the agreed practices, not just the easy ones. Even when there’s pressure. Especially then.
  • A blame-free environment, so that you can solve problems and learn.
  • Reflecting on what you do.
  • Constantly looking for ways to improve.
  • Being open-minded to accept solutions when you find them, even if that means you’re not doing XP by the book. Especially then.

And that’s just for starters. Who knows where you would end up if you kept doing that? Just imagine… How fragile can you make your process?

Tags: agile, theory of constraints, lean, Toyota Way