Project pace regulation

Faith Dickey in the rain with high heels
Amazing Faith Dickey demonstrating what caution is

A good project pace results from two conflicting forces: market or financial pressure to go fast (typically relayed by management) and technical pressure to do things right (typically relayed by architects and developers).

This conflict is not symmetrical, for several reasons.

  • Management has organizational power and excellent communication skills – compared to developers who tend to emerge from the ideal world of their IDE after hours lost in abstraction in a kind of semi-conscious, almost hungover state, barely able to talk to human beings 🙂 And management is always interested in having more bang for the buck and in shipping earlier to build or strengthen a market position.
  • Technical issues make your product explode in the long run, not today, and as such are easy to sacrifice. By technical issues I mean not only rotting architecture, but also documentation and regulatory issues: taking them too lightly will not cause an earthquake today, but years later. Examples of disasters: technical bankruptcy (throwing away an entire unmaintainable code base and starting again from scratch), denial of authorization to sell your medical device by the regulator of a certain market, a patient killed using your product. And I have seen managers who, either out of incompetence or sheer cynicism, are perfectly able to take decisions with catastrophic long-term consequences for the sake of short-term political advantage – some people are amazingly able to lie their way out of any situation.

This conflict is asymmetrical, but that doesn’t mean technical people are always right. A friend of mine worked in a startup without adult supervision: developers happily spent three years refactoring the code without adding any new feature. No joking. Three years of getting high on code. Gold plating can go very far. Another story from the trenches: I knew a software architect who convinced his management that building a tool to refactor the code was required (ugly Borland Delphi 6 was really unstable and unproductive) and spent two years writing this tool alone instead of taking care of the codebase he was responsible for – in particular database concurrency issues that caused much trouble at the end of the project. He was clearly serving his own pleasure rather than the interests of the company. The thing is that technical people usually don’t care much about their organization making money: they just want to enjoy coding well-done stuff and avoid becoming obsolete. If the company goes bankrupt or the project fails, they just move to another company where they shine with the new skills they honed instead of working on what was required to get the project done. Don’t get me wrong: I’m not saying that all developers are selfish and uninterested in moving projects forward, but some of them are, and there is a natural tendency to privilege thrill over duty that must be contained.

Fast vs careful

 

A simple model to help find the balance

So how do we find the appropriate balance between these conflicting forces? It’s not easy to find; it has to be managed. Over the years, I have come up with a model designed to understand and manage each force, and it has proved useful.

Real quality won’t happen by chance, or only thanks to the sacrifice of teammates spontaneously working nights and weekends, but because time has been allocated for it (assuming the right processes and mindset are already in place). At the other end of the spectrum, there has to be a clear focus on delivering story points to avoid getting lost in gold plating. My preferred approach is to set up an iteration model where I allocate time for quality-related activities (stabilization phase size, refactoring proportion during the construction phase) and set a story point goal for each iteration to keep everybody focused on delivering customer value.

  • Model for quality forces.
    • Duration of construction and stabilization phases. They might not be constant for the whole project: as the maintenance burden increases, stabilization phases may get longer. Remember, stabilization is time devoted to bug fixing and documentation. It’s quality time.
    • Proportion of construction time allocated to technical tasks. I’m not alluding to the mandatory technical tasks that deserve user stories of their own (such as sending error reports, writing an installer, having that load test pass), but to unpredictable refactorings. Be careful to think it through. Without dedicated time, refactoring won’t happen at the magnitude required for quality projects. This technical time is also the oxygen skilled technical people breathe: it helps you hire and retain them. Set that value to zero, and you will likely accumulate technical debt and scare away gifted technicians. On the other hand, set it to 100% and the project will stop moving forward. And once again, this value should not be constant: high at project start (say 50%) when frameworks and practices are not established, medium in the middle of the project (say 25%), low at project end when everybody struggles to finish that version (say 15%).
  • Model for production
    • I find it necessary to have a target of accepted story points for each iteration. This is the sum of user stories and technical stories (refactoring and anything related to inner quality that no user will see).
      • It is best measured as the average of the total story points of accepted user stories (accepted by testers, with a little allowance for a few minor bugs) over the last few iterations. This measurement is essential to feed the model with reality (total team velocity captures a great deal of variables that are impossible to model: estimate errors, organizational overhead, tooling problems, motivation, quality of the personnel, maintenance burden, architectural issues…).
      • Aligning the goal with the measurement is a delicate choice when the team produces less than expected: it is invaluable for predicting an accurate project end date, but maintaining expectations helps fight gold-plating tendencies and keeps commitments firm. In my experience, the target should be kept just a little above the average measured velocity – insufficient production must be fought by the team, not too easily accepted.
      • This target and the proportion of construction time devoted to refactoring make it easy to calculate the estimated user story points target and the budget for technical stories.

 

Simple iteration model

Here’s a sample spreadsheet to help clarify my intent: Simple iteration management model. I’m not saying it’s perfect: you should probably design yours; just consider mine a starting point.
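If you prefer code to spreadsheet cells, here is a minimal sketch of the same arithmetic in Python. It assumes, for simplicity, that the refactoring time proportion maps directly onto a points proportion; the function name and all figures are illustrative placeholders, not values from my spreadsheet.

```python
# Minimal sketch of the two-phase iteration model arithmetic.
# All figures are illustrative placeholders, not values from the real spreadsheet.

def iteration_plan(iteration_days, stabilization_ratio, refactoring_ratio,
                   target_velocity):
    """Split an iteration into construction/stabilization time and split the
    story point target between user stories and technical stories."""
    stabilization_days = iteration_days * stabilization_ratio
    construction_days = iteration_days - stabilization_days
    technical_points = target_velocity * refactoring_ratio
    user_story_points = target_velocity - technical_points
    return {
        "construction_days": construction_days,
        "stabilization_days": stabilization_days,
        "user_story_points": user_story_points,
        "technical_points": technical_points,
    }

# Early-project iteration: 20 working days, 25% stabilization, 50% of the
# points budget devoted to technical stories, 40 accepted points targeted.
print(iteration_plan(20, 0.25, 0.50, 40))
```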

 

A model for unknown team velocity

 

Measuring actual team velocity and feeding it into the model is very powerful. But sometimes it’s not practical:

  • When the team is just starting and has no history
  • When projects are long (several years in medical device development). In particular, the maintenance burden will likely get heavier and turnover will happen.
  • When there are lots of variations in team size. This is especially the case when management asks: “I want this project to be ready by date X, what do you need to make it happen?”. Tricky question: team dynamics are far more complex than a simple multiplication – doubling the headcount will not double the throughput. Choose your answers wisely. This is a very strategic issue: if you can predict very early that a project is late and add the necessary staff soon enough for it to be profitable (people should stay at least one year to offset training costs), you will save the deadline. Do it too late, and haphazard late staffing efforts will terminate the team: “adding manpower to a late software project makes it later”.
  • When projects have not started yet. To approve a project, management needs to know its scope, its duration and its cost. The best way to do this with an agile mindset would be to build a product map to get a backlog, estimate user stories to get a project size, then estimate the velocity of the team to deduce time and cost.

 

So I have designed another model to estimate team velocity when there is no empirical data. Here are the main variables:

  • Real time
    • Average daily velocity by developer. The number of ideal days in a real day. Could be 100% if you worked alone in a monastery with absolute concentration, no task switching, and perfect estimation skills. In reality, there are useful and useless meetings, coffee breaks, errors of all kinds. I have made many measurements and find that people are often around 70% in this respect.
    • Working day factor. You have to take into account that a week day is not a working day: people are sick, have holidays, get trained. In my current environment (France, where holidays are plentiful and sacred), people work around 220 days a year. That’s not 52*5=260.
  • Real workforce. Don’t just count people. Take into account:
    • Turnover. I usually count that one person out of ten leaves every year, and that it takes six months to replace them (so I lose 5% of the workforce). In other environments, you might have a higher attrition rate or shorter recruitment delays.
    • Training. Newbies are not as productive as historical team members. Beginners are often not as productive as principal engineers. There has to be some ramp-up in the workforce when someone arrives (50% productivity the first month, complete productivity after 3 to 6 months depending on experience and the complexity of the work environment).
    • Communication and management overhead. My rule of thumb: every additional person in the team eats 20% of one person’s time. One person is as productive as one. Two persons are as productive as 1.8. Six persons are as productive as 5. This factor is very important when you start computing the effect of various staffing scenarios on the deadline. For very big teams, this factor might be higher.

 

Team velocity estimation model

Here’s the sample spreadsheet: Team velocity estimation model
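And for the same model in code form, here is a minimal Python sketch; the default parameter values are the rules of thumb above and the function names are mine, so treat the whole thing as an illustration to adapt, not a reference implementation.

```python
# Minimal sketch of the velocity estimation model for a team with no history.
# Default values are the rules of thumb above; tune them to your environment.

def effective_team_size(headcount, overhead_per_extra_person=0.20):
    """Communication overhead: every additional person consumes 20% of one
    person's time (1 -> 1.0, 2 -> 1.8, 6 -> 5.0)."""
    return headcount - overhead_per_extra_person * (headcount - 1)

def ideal_days_per_year(headcount,
                        daily_velocity=0.70,        # ideal days per real day
                        working_days_per_year=220,  # holidays, sickness, training
                        turnover_loss=0.05):        # ~10% leave, 6 months to replace
    """Estimated ideal (estimation) days the team can deliver in a year."""
    workforce = effective_team_size(headcount) * (1 - turnover_loss)
    return workforce * working_days_per_year * daily_velocity

# Compare staffing scenarios before answering "what do you need for date X?"
for team in (4, 6, 8):
    print(f"{team} developers -> {ideal_days_per_year(team):.0f} ideal days/year")
```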

 

This might sound a little complicated and over-engineered (don’t complain, I spared you the spreadsheet where I mix both models with release burnup graphs and macros to generate user story cards 🙂 – you won’t need it if you have decent agile tooling, which was not my case). But having sound predictions of what will happen in the future is a prerequisite to acting upon that future. When budgets are scary and the deadline is years away, a spreadsheet with many parameters and experimental data will prove a good way to negotiate with top management. And once people realize that what you predicted one year ago proved true, they will listen very carefully to what you say will happen in two years, and maybe grant those two additional developers you need. This might also help you slow down if quality gets out of hand: the automatic adjustment feature of the two-phase iteration (automatic stabilization phase extension, which decreases team velocity in the long run) will help justify why team throughput decreases – quality is just a priority.

Putting bugs under control

What’s that bug – Angela DiTerlizzi & Brendan Wenzel

Bugs are bad. And especially for software engaged in serious business – such as saving lives. A few reasons why:

  • Quality medical devices have few known bugs, and of low severity. Thus, having bugs prevents you from shipping. You need to be able to ship at the end of every iteration to your customers (to get feedback), or more realistically, considering integration and product registration, to a system integration team.
  • Bug backlogs are inventories and as such are a form of waste. Lean Software Development advocates having low inventories, since a bug left open will incur additional costs:
    • Bugs in the software might provoke very complex system bugs when mixed with hardware and bioware. These bugs take an awful lot of time to investigate and often block the entire project plan. You don’t want this to happen with a bug known to the software team that could have been fixed a long time ago.
    • Workaround elicitation and teaching (through documentation or face-to-face) take time.
    • It is always more expensive to fix bugs in the future, when knowledge fades away in people’s heads or vanishes when they go.
    • Bug backlog engineering (prioritization, endless reviews, risk analysis…) typically takes the time of several experts at the same time. A tremendous waste of energy when the bug backlog is large.
    • Bug duplicates imply wasteful investigations. A closed bug has no duplicates.
  • Bugs in medical devices have the potential to harm people. As such they must be considered with horror and dealt with accordingly. And the best way is to fix them ASAP. Don’t give them a chance to slip through your processes.
  • I’ve seen projects with a huge bug count (close to 1000) a couple of times. You know what? They never recovered. They stayed at 1000 bug count forever. Maybe because of the cost of all this waste, maybe because it spread a sense of bad quality and failure in everybody’s hearts.
Bug count graph
Bug count graph of a real-life project. After a quick phase of exponential growth, bug count stayed in the six hundreds. In spite of two heroic campaigns of bug fixing, the end was inevitable: the flat part of the curve on the right is the clinical death of the project (brutal end, no production, millions lost).

 

Moral of the story: never get high on bug count, or your feet may never touch the ground again.

Guy swallowing bugs
What happens when you let bugs free…

Conclusion: a good bug is a bug killed. Bug count should be close to zero.

I won’t write about bug detection here, but only about what you do once you know about the bugs. Let’s assume you already have a good testing system in place.

 

Bug count threshold and two-phase iteration

So how do we actually manage known bug count? Simple. Set a threshold. Respect it.

  • Set the threshold at the start of the project. Write it down in your Project Plan and have everybody sign it. You’ll still be able to change it, but it’s motivating to give it some official existence.
  • Recommended values for the threshold:
    • More than zero (or you might seriously delay shipping for minor issues)
    • No more than a couple of dozen. The maximum threshold will depend on the size of the team and its ability to fix bugs. I suggest the maximum bug threshold doesn’t exceed what your team is able to fix in a few days if that is its sole focus.
    • Split limits by bug severity.
    • For example, typical thresholds I use: 0 blocking bugs, 3 majors, 20 minors.
  • Bug count evolution is easier to understand inside the two-phase iteration framework. During construction, you take risks, you build, you refactor: bug count goes up. During stabilization, you stop taking risks, you fix bugs: bug count goes down. There will be a delay due to the lengthy manual testing process: you will discover the real extent of the bug count some time after the bugs are introduced in your code.
Iteration 0 regulation
Total bug count (orange curve) is below the threshold at the end of iteration 0 (inside the green circle)
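To give the threshold some teeth beyond the Project Plan, a tiny check like the following can feed a delivery gate or an information radiator. The severity names and limits are the example values above; how you obtain the open bug counts (tracker query, export…) is left out, and the function is only a sketch.

```python
# Sketch of a "don't deliver above the threshold" gate.
# Severity names and limits are the example thresholds from the text.

THRESHOLDS = {"blocking": 0, "major": 3, "minor": 20}

def can_deliver(open_bugs):
    """open_bugs: dict severity -> current open bug count.
    Returns (ok, violations) so the result can feed an information radiator."""
    violations = {severity: count for severity, count in open_bugs.items()
                  if count > THRESHOLDS.get(severity, 0)}
    return not violations, violations

ok, violations = can_deliver({"blocking": 0, "major": 5, "minor": 12})
if not ok:
    print("Hold the version back, thresholds exceeded:", violations)
```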

 

  • What’s important is what you do when the threshold is not respected. My advice: don’t deliver. Hold the version back until more bugs are fixed. You can’t release into the wild a version that will waste precious integration time or harm patients. You would be ashamed of it. Take the blame for the delay. Put bug count on your information radiators so that everybody gets used to the fact that it’s important. When you have trouble respecting the threshold, talk about it around you and in your team retrospectives. It’s serious. Find solutions.

 

Iteration 1 regulation
Bug threshold is not respected at the end of stabilization of iteration 1 (red circle). An extra stabilization period is added until the quality criterion is met (new green circle).

 

  • What’s also important is what happens to the iteration following the one that went wrong. Here’s where the two-phase iteration comes in handy. If iteration N has too many bugs and stabilization phase N takes 3 more days, construction phase N+1 will be 3 days shorter. It means that a few user stories will have to be removed from iteration N+1. It also means that since iteration N+1 is smaller, it should be a little easier to get right, so stabilization N+1 should run more smoothly. There is an automatic short-term regulation effect in the two-phase iteration framework.

 

Iteration 2 regulation
Iteration 2 has a shorter construction phase, with fewer features, refactorings and bugs created than usual.
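Here is the back-of-the-envelope arithmetic behind that short-term regulation, with made-up numbers (the points-per-day figure is purely hypothetical):

```python
# Back-of-the-envelope arithmetic for the short-term regulation effect
# (all numbers are made up for illustration).
planned_construction_days = 15      # nominal construction length of an iteration
stabilization_overrun_days = 3      # extra days iteration N needed to get under the threshold
points_per_construction_day = 2.5   # hypothetical accepted story points per construction day

construction_days_next = planned_construction_days - stabilization_overrun_days
points_to_remove = stabilization_overrun_days * points_per_construction_day

print(f"Construction N+1: {construction_days_next} days, "
      f"remove about {points_to_remove:g} story points from the plan")
```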

 

 

  • In the long run, if you encounter this situation on a regular basis, consider increasing the stabilization phase proportion. That is the beauty of the two-phase iteration: it also embeds a long-term regulation system. Take an extreme example: 1 day of construction, 29 days of stabilization. Plenty of time to fix bugs and get the doc right, no? It should not be a problem. This means that there exists a proportion between construction and stabilization phase durations that will allow you to finish iterations with bugs below the threshold and documentation in good shape. Your job is to find that proportion.
  • This regulation system is vital to any project. What do you do when your car engine gets hot and spits steam? You slow down. The same with a team. If the project pace is so fast that quality gets out of control, you must slow down. Remember the agile belief that quality is not negotiable? Now show your true colors. Negotiate time.
  • By the way, it is quite logical for a project to slow down after a while, as the maintenance effort increases. You might expect an increase in stabilization size over time.

I’ve used these techniques on projects of a respectable size (several years, several dozen people, several thousand bugs created and fixed) and they have proved to work well: known bug count never exceeded the threshold for long.

Agile medical device system design

The Agile revolution has definitely transformed the way software is built, to such an extent that it has become mainstream: it just works better. There are several factors to such a success: empowerment that helps get the best out of people; automation that reduces costs, cycle time and errors. But to me, the most powerful practice in the agile toolbox is incremental product design, which reduces risks at all levels:

  • Integration risk: you integrate sooner (all the time, in fact), so the long-dreaded integration phase of the eighties (which could last for years and often ended in project failure) becomes an everyday, routine task.
  • User needs risk: by implementing the most important features first and putting them into the hands of end users ASAP, you gain field feedback on what the users really need and want. You decrease the risk of creating totally useless or only partially usable features (80% of features in software are said to be never or seldom used).
  • Project risks: by finishing the product often and measuring the team velocity, you know your real project pace and can adjust to it. Your team’s average velocity over the last three iterations is a good predictor of its pace until the end of the project. I’m a big fan of this down-to-earth wisdom of measuring what’s too complex to be predicted and changing course accordingly.

When working on medical device projects with my colleagues from the hardware, electronics, reagent or system teams, I’ve often wondered why they wouldn’t use iterative development to their advantage. The counter-arguments they usually gave me were the following:

  • Our iterations are too long. When designing hardware, the time needed to finish plans, order parts all over the world and receive them, test them and send them back for defects once in a while, assemble them, is ridiculously long – up to six months. The same with electronics if suppliers are expected to design and produce boards. Reagent teams may perform stability tests that last for years.
  • Our iterations cost too much. Big hardware prototypes can cost the price of several brand-new cars. Moulds are awfully expensive. Reagent production lines are a luxury item. Physical stuff cannot just be made and destroyed without a sizeable monetary footprint.

These hardships push specialists to optimize their own business with a typical waterfall process: long requirements elicitation, one-shot production of what they think should be made – oops, we forgot something, some supplier is late, the schedule is doomed. Local optimization is the enemy of the global optimization endeavor that is a systems project. I believe systems design must be iterative and thought of as such from the very start.

 

System iterations
Hardware V1 and electronics V2 are combined with embedded software V5 to build embedded system V3. After some integration testing, embedded system V3 is combined with non-embedded software V6 and reagents V1 to perform the first round of tests of the complete system. This will lead to new insights and subsequent changes in the next iterations of all sub-components – a long time before the end of the project.

 

Software item iterations will likely always be shorter. But that doesn’t mean that other specialties can’t plan iterations too. Some techniques that could make it possible:

  • First hardware and electronics iterations can be made with prototyping material (for example: B&R automation products) that has unrealistic production cost or size but allows fast creation of first versions. If first tests prove that the design is good, the next iterations can focus on production cost, maintainability, assembly lines, multi-sourcing of providers, while the overall system keeps on its journey.
  • Hardware stubs. First iterations can also use the technique we software developers know as stubs. For example, the first version of an automated and temperature-regulated drawer for reagent storage could be made without temperature regulation at all, and without automation (only fixed-position reagents, hard-coded in the code or loaded in the database via a script) – see the small sketch after this list.
  • Design and usability are a big concern for marketing departments and regulators as well. I would suggest meeting your end users ASAP by quickly manufacturing prototypes of all external interfaces. For example, you can use 3D printers, cardboard models or foam models. Have end-user representatives execute typical usage scenarios with them. What do they think? I remember using this technique for a device with a bar-code reader: we printed a 3D version of the casing in a matter of days, only to realize that the bar-code reader was positioned in such a way that the end user would have to almost break their wrist to use it. So we moved it to the opposite side very easily (no need to redesign all the internal parts of the device, no constraints!).
  • Reagent design is complex and slow. Help these guys by giving them a system prototype ASAP to test their stuff. They don’t care about chassis production cost, cybersecurity or electronic components triple-sourcing. They just need good biological performance.
  • Assembling subsystems is difficult. Something that has never been tested never works. So be sure to plan an integration and system debugging session every time you produce a system iteration, before downstream activities (such as biological performance tuning) can start.
  • As explained by the eXtreme Manufacturing movement, to plan for iterative, incremental system design, the priority would be to think carefully about the internal interfaces of the system and divide it into subsystems. Subsystems can evolve independently as long as they respect the interfaces – thus achieving fast-paced design.
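To make the stub idea from the reagent drawer example concrete, here is a hypothetical sketch in Python – class names, positions and the temperature value are all invented for illustration, not taken from a real device:

```python
# Hypothetical sketch of the stub idea applied to the reagent drawer example:
# the first system iteration ships a drawer driver with no automation and no
# temperature regulation, only fixed, hard-coded reagent positions.

class ReagentDrawer:
    """Interface the rest of the system codes against."""
    def position_of(self, reagent_id: str) -> int:
        raise NotImplementedError
    def current_temperature(self) -> float:
        raise NotImplementedError

class FixedPositionDrawerStub(ReagentDrawer):
    """First-iteration stub: fixed positions, nominal temperature."""
    _POSITIONS = {"REAGENT_A": 1, "REAGENT_B": 2}   # hard-coded layout

    def position_of(self, reagent_id: str) -> int:
        return self._POSITIONS[reagent_id]

    def current_temperature(self) -> float:
        return 4.0  # always report the nominal storage temperature

# Biological performance tests can start against the stub while the real
# automated, temperature-regulated drawer is still being designed.
drawer = FixedPositionDrawerStub()
print(drawer.position_of("REAGENT_A"), drawer.current_temperature())
```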

 

ScrumInc eXtreme Manufacturing car
The modules that make up an extreme manufacturing build party at ScrumInc

This is no easy task. But a necessary one to tackle the top risks of a medical device project: biological risk and registration risk.

  • You should produce a functional system as fast as you can to tackle the biological risk – living matter is so unpredictable that you are better off observing how it behaves (just like project dynamics, by the way).
  • And once you have a complete system able to perform its biological task (stripped of the bells and whistles), i.e. once you have tackled the biological risk, consider handling the registration risk by registering this minimalistic system. This will take lots of time (typically 2 years in China). The registration teams should be able to define the contour of a system that can be registered officially all over the world, but that you probably won’t sell (it’s ugly, it can’t be maintained, it has no advanced software features, but it performs its core biological mission pretty well). Meanwhile, you will prepare a second version with all the nice-to-have features, which will be registered as a simple product evolution, with lower risk and shorter delay, and which might well end up being available on the market shortly after the first version is ready.

Lean Startup thinking promotes trying your concept with a Minimum Viable Product that you put into the hands of your end users. Product registration authorities are a kind of VIP end user. Maybe you should shape your entire project plan around building them a dedicated MVP, to address the registration risk right after the biological risk is under control.