Bugs are bad. And especially for software engaged in serious business – such as saving lives. A few reasons why:
- Quality medical devices have few known bugs, and of low severity. Thus, having bugs prevents you from shipping. You need to be able to ship at the end of every iteration to your customers (to get feedback), or more realistically, considering integration and product registration, to a system integration team.
- Bug backlogs are inventories and as such are a form of waste. Lean Software Development advocates having low inventories since a bug left open will incur in additional costs:
- Bugs in the software might provoke very complex system bugs when mixed with hardware and bioware. These bugs take an awful lot of time to investigate and are often blocking the entire project plan. You don’t want this to happen with a bug known to the software team that could have been fixed a long time ago.
- Workarounds elicitation and teaching (by documentation or face-to-face) take time.
- It is always more expensive to fix bugs in the future, when knowledge fades away in people’s heads or vanishes when they go.
- Bug backlog engineering (prioritization, endless reviews, risk analysis…) typically takes the time of several experts at the same time. A tremendous waste of energy when the bug backlog is large.
- Bug duplicates imply wasteful investigations. A closed bug has no duplicates.
- Bugs in medical devices have the potential to do harm to people. As such they must be considered with horror and dealt with accordingly. And the best way is to fix them ASAP. Don’t let them a chance to slip through your processes.
- I’ve seen projects with a huge bug count (close to 1000) a couple of times. You know what? They never recovered. They stayed at 1000 bug count forever. Maybe because of the cost of all this waste, maybe because it spread a sense of bad quality and failure in everybody’s hearts.
Morale of the story: never get high on bug count, or your feet may never touch the ground again.
Conclusion: a good bug is a bug killed. Bug count should be close to zero.
I won’t write about bug detection here, but only about what you do when you know them. Let’s assume you already have a good testing system in place.
Bug count threshold and two-phase iteration
So how do we actually manage known bug count? Simple. Set a threshold. Respect it.
- Set the threshold at the start of the project. Write it down in your Project Plan and have everybody sign it. You’ll still be able to change it, but it’s motivating to give it some official existence.
- Recommended values for the threshold:
- More than zero (or you might seriously delay shipping for minor issues)
- Inferior to a couple of dozens. The maximum threshold will depend on the size of the team and its ability to fix bugs. I suggest the max bug threshold doesn’t exceed what your team is able to fix in a few days if it’s its sole focus.
- Split limits by bug severity.
- For example, typical thresholds I use: 0 blocking bugs, 3 majors, 20 minors.
- Bug count evolution is easier to understand inside the two-phase iteration framework. During construction, you take risks, you build, you refactor: bug count gets high. During stabilization, you stop taking risks, you fix bugs: bug count gets down. There will be a delay due the lengthy manual testing processes: you will discover the real extent of the bug count some time after they are introduced in your code.
- What’s important is what you do when the threshold is not respected. My advice: don’t deliver. Hold the version back until more bugs are fixed. You can’t leave into the wild a version that will waste precious integration time or harm patients. You would be ashamed of it. Take the blame for the delay. Put bug count in your information radiators so that everybody gets used to the fact that it’s important. When you have trouble respecting the threshold, talk about it around you and in your team retrospectives. It’s serious. Find solutions.
- What’s also important is what happens to the iteration following the iteration that went wrong. Here’s where the two-phase iteration gets handy. If iteration N has too many bugs and if Stabilization phase N takes 3 more days, Construction Phase N+1 will be 3 days shorter. It means that a few user stories will have to be removed from iteration N+1. It also means that since iteration N+1 is smaller, it should be a little easier to get right, so Stabilization N+1 should run more smoothly. There is an automatic-short-term regulation effect in the two-phase iteration framework.
- On the long run, if you encounter this situation on a regular basis, consider increasing Stabilization phase proportion. That is the beauty of the two-phase iteration: it also embeds a long-term regulation system. Take an extreme example: 1 day of construction, 29 days of stabilization. Plenty of time to fix bugs and get the doc right, no? It should not be a problem. This means that there exists a good proportion between construction and stabilization phase durations that will allow you to finish iterations with bugs below the threshold and documentation in good shape. Your job is to find that proportion.
- This regulation system is vital to any project. What do you when your car engine gets hot and spits steam? You slow down. The same with a team. If a project pace is so fast that quality gets out of control, you must slow down. Remember the agile belief that quality is not negotiable? Now show your true colors. Negotiate time.
- By the way, it is quite a logical for a project to slow down after a while, as maintenance effort increases. You might expect an increase in Stabilization size over time.
I’ve used these techniques on projects of a respectable size (several years, several dozen people, several thousand bugs created and fixed) and they have proved to work well: known bug count never exceeded the threshold for a long time.