Dec
08

Things I didn’t learn pt.3

Backup soldiers

I just don’t seem to learn, do I?

High availability soldiers

As described in the previous entry, the CSM had to schedule soldiers to perform all sorts of tasks. What happens when someone isn’t available to perform a task? That would create chaos and we can’t have that, can we? The solution: assign two soldiers to each task: the primary and the backup. If the primary isn’t available, the backup performs the task. That should increase the availability of the soldier resources.

Of course, this made the planning job even more difficult for the CSMs: they had to schedule double the number of soldiers, with even more potential for scheduling conflicts.

Did it work?

Not really. There were lots of primary/backup scheduling conflicts, which made the plans even more slippery. But having double the number of required soldiers assigned should have made it easier to form complete work-teams each day, shouldn’t it? No, because of a very annoying dynamic.

Soldiers had a tendency to become ill or otherwise unavailable on those days that they were assigned as primary. That meant their backup had to perform the job, but who cares? They didn’t know Joe Random Soldier who’d been assigned as their backup. It’s everyone for themselves here!

Of course, when a backup had to take on an extra job, this would have to be compensated. That meant even more schedule churn. And soldiers started to get ill when they were backups too. Imagine the chaos as the officer on duty tries to form complete teams, with all the people on his duty roster calling in sick.

A simple solution: pair soldiering

We faced the same issues when we made our own schedules. We solved it by grouping the soldiers in pairs. Each soldier paired up with someone with whom they’d been in basic training. Whenever one of them was primary, the other was backup. And vice versa.

When it came to choosing tasks, the pair chose together. This way, they could avoid primary/backup schedule clashes. We formed pairs out of soldiers who had been through basic training together because:

  • they had the same seniority and thus the same priority in choosing tasks
  • we reasoned that you wouldn’t deliberately welsh on someone you knew and worked with every day. If a backup ever had to replace a primary, it was easy to compensate by switching roles on the next task (of the same value). If one of the pair defaulted on their duty, the other could easily retaliate.

How did that work?

Very well. Scheduling wasn’t difficult at all: each pair just had to make sure they didn’t introduce any conflicts for both of them. Primaries never got sick (except for one case in one year), so backups almost never had to fill in. Whenever there was a problem, we could simply “swap” tasks or roles, with minimal disruption to the schedule.

The “pair with someone from basic training” rule was useful for scheduling, but not necessary as a deterrent. With task swaps (see the previous story), you got other soldiers (from the same division) as backup anyway. That’s not a problem, as long as it’s someone you know and work with (and thus trust).

And what have we learned from this?

Back in the real world, there were no backups. Everyone worked on the modules they had been assigned. Some people got very possessive of their code: if the owner wasn’t available (or if they were in a bad mood), the module didn’t get changed, no matter how badly you needed the change. No-one else was allowed to (or could) change these modules. And at the end of the year, you’d get a performance appraisal for your work.

In one case (I’ll tell the full story later), I was leaving this company. I wanted to tell someone about the last project I’d worked on, but nobody wanted to listen. Finally, someone told me why: “If ‘they’ know I understand that module, they’ll assign it to me. And I’ll be stuck maintaining it, for the rest of my life!”

And things were back to normal: work had to wait to be completed, because the one programmer who knew some module was unavailable. No two modules worked alike; each of them reflected the eccentricities of their (sometimes many) authors. When people left, they took all their knowledge with them. Taking over someone else’s module felt like archaeology, even when the module was documented… That’s how it’s supposed to be, isn’t it?