The end

The list of blog posts I intended to write on the subject of agile development for medical devices is now complete.

As my career has moved to other positions and other industries, it’s unlikely that I will write new posts here.

I’m totally convinced that agile development is the way to go to reach true quality and safety. I hope that fellow medical device software builders will find useful guidance in these pages.

Cheers

20180306_145947.jpg

Advertisements

What about SOUPs ?

Regulators of IEC 62304 have put a lot of energy into normalizing how to handle SOUPs (Software Of Unknown Provenance) for software of classes B and C (software that is in a position to potentially harm people in a non-benign way). The definition says: “Software that is already developed and generally available and that has not been developed for the purpose of being incorporated into the MEDICAL DEVICE (also known as “off the-shelf software”) or software previously developed for which adequate records of the development PROCESSES are not available”. To sum up: everything that hasn’t been built according to the norm.

What I’ve seen in the trenches indicates that this distrust in SOUPs is a bit misplaced: in my projects, carefully chosen libraries contain several dozen times less bugs than home-made code before verification. Why?

  • Released libraries are finished software.
  • They are used by many more developers than code being called in only one context and thus have a higher probability that bugs have already been found and fixed.
  • The rise of open source software along with excellent processes (automated builds, TDD, gitflow with systematic reviews of pull request…) and psychological motivators (the name of developers permanently and publicly attached to every commit incentivizes perfection in code) has dramatically increased the quality of free libraries compared to ten years ago, when 62304 was first released.

But I understand the theoretical need of regulators: if there was no SOUP policy, it would be too easy to pretend that a major part of the code is a SOUP and not apply the regulation at all. I can’t imagine a norm that doesn’t think that what’s coming from outside of its jurisdiction could be better.

Norms are norms and auditors are paid to verify compliance to a norm, not to argue about how well or bad the norm was written. I’ve heard that SOUPs are one of the top favorite areas for auditors to look for defects in your implementation of IEC 62304 (the other one being risk analysis): be warned.

eyeballsoup
Indiana Jones and the Temple of Doom. Nice SOUP.

 

So how do we handle this mandatory and not-so-useful activity? Here are a few hints to maximize productivity.

  • You need a list of dependencies and their versions. In some programming environments (nuget, bower, npm…), there is a clear list of these dependencies and their versions (package.config, package.json, bower.config…): try to generate the SOUP list from these files.

    bower-json
    Dependencies in a bower.json file
  • It’s a good idea to take advantage of this list to perform a thorough inventory of licenses and do what’s required to be clear. For example, many open source libraries require your software documentation (typically online help) to quote them. And maybe you’ll find one or two that’s not free for commercial use and that needs to be replaced – the sooner the better.
  • 62304 requires specifications for SOUPs, including performance criteria. This is a tricky business: some of your SOUPs are a lot more complex than your medical device (the base library of your favorite language, the OS): you can’t possibility retro-spec them entirely. My preferred approach is to document the requirements of the main behaviors of the library that you actually use – a projection of its features on your special use case.
  • You should always try to wrap the external dependencies of your code in, well, wrapper classes. This prevents this external namespace to creep all over your code. It helps to easily change the library with another functionally similar implementation someday. In the context of SOUPs, the public interface of the wrapper makes very clear which part of the SOUP you use, and which part you don’t. This can serve as a boundary to limit your SOUP specification effort.

    soupwrapperclass
    The SOUP wrapper acts as a Facade and helps limit the specification and unit testing effort to features that are really used.
  • 62304 requires you to test these requirements. That’s something developers spontaneously do when choosing a library: make sure the library works, test a few edge cases. But you need to do it again every time you upgrade the library. For the latter reason, I strongly suggest unit tests that you can link to the specification (so that they end up in the traceability matrix) and use to test the mandatory performance requirements (for example by using the MaxTime attribute in NUnit). These unit tests will help you make sure the next version of the library works with very little extra effort.
  • When they are available, you could run the unit tests of the library itself and use their results as a proof of quality. You will still have to deal with writing your own requirements and linking them to the tests. In practice my teams often have had problems with libraries having a few failed tests related to features we didn’t use, which triggered cumbersome justification; in this case we just skipped the library unit tests.
  • You are required to perform a risk analysis of your SOUPs and add mitigation strategies as required. This is theoretically a good idea, but I’ve often found it very difficult to put in practice with general-purpose libraries, because their impact cannot be bound to a single feature. In some cases – databases, ORMs, mappers – almost any features could potentially be compromised. As always with risk analysis, there is a temptation to assess every possible failure mode, which would lead to an overwhelming analysis that never gets finished. My advice here would be to trust your gut feeling and choose a selected handful of risks where the brainpower consumed performing risk analysis will be most valuable. There are less failure modes in SOUPs than in your code; use your time on the risks that really threaten patients. Don’t get stuck in an impossible thorough analysis of everything that could possibly go wrong in things that are more complex than what you produce.
  • You are also required to perform a list of known bugs and assess the risk your software incurs because of them. It’s a demanding endeavor: in practice my projects tend to use dozens of libraries, some of them have hundreds of bugs, and others don’t publish bugs at all; when they do, it is often difficult to tell which versions of the library have the bug without testing them. I suggest you don’t waste your time with this before the end of the project because you are likely to upgrade your libraries until then and because more known bugs are likely to be closed with newer versions. The ROI of this activity seems very low. I would be glad if this requirement was stripped out of the next version, or adapted to be more cost-effective.
  • Operating Systems are a special kind of SOUP. Of course you don’t want to retro-specify and test what it took your vendor decades of work with thousands of developers. But there is an alternative approach. These days, a lot of emphasis has been put on cybersecurity for medical devices – and this is good, patient data is sacred and hackers are on the brink of cyberwar to get it. You must harden your OS – and maybe brand it along the way. My recommendation would be for you to specify, document and test the hardened OS and not the base OS. This way, the OS spec is really useful and has a realistic scope.
  • SOUPs of SOUPs. Developers often ask how they should handle SOUPs of SOUPs – the dependencies of the libraries themselves. Of course you can’t handle all dependencies recursively, you would be overwhelmed. Treat your direct dependencies; their own dependencies are an implementation detail. The tests that verify the requirements you wrote for what you use in the SOUP will exercise the lines of code of the SOUP dependencies that you actually use. Their possible failure would be ways to trigger the failure mode of the level 1 SOUP that you already considered in your risk analysis; you don’t need to analyze them separately.
reinvent-the-wheel
Don’t reinvent the wheel. Source: John Kim, https://www.linkedin.com/pulse/how-reinvent-wheel-john-kim

 

Whatever the hardships in producing the required documentation, resist the temptation to code for yourself what others have done. Reinventing the wheel is a waste of time. Remember, your goal in agile development is customer feedback and delight, not library writing. The thrill of writing cutting-edge technical code is what I suspect entices many developers into rolling their own version of existing stuff, and not good project governance; this an area where a responsible mindset – adult self-supervision – is of particular importance. Developing with an agile mindset implies going as fast as you can by removing waste; missed opportunities of good reuse are horrendous waste. Your immature code will always more buggy and more poorly designed than a library written by people dedicated to it, maybe working on it full-time for several years, and that has lived in production for several releases in many different contexts. In this regard I think that the writers of 62304 have done a very dangerous job in discouraging people to use reliable libraries and creating an incentive to write brittle home-made code instead, which would have a very negative effect on overall medical device reliability and safety. A few month ago I stumbled upon a concrete example of this : a developer I know decided to write his own XML generation routine to avoid the lengthy, boring and absurd (according to him) process of documenting an off-the-shelf library. Don’t ever do this. SOUPs are good. Always use SOUPs when they make sense. Accept the pointless burden (automating as much as you can) and write the required doc.

Let me take advantage of this tribune to deeply thank all the open source contributors in the world.

OpenSourceTribute.PNG
Some of my favorite libraries. Thank you guys! You rock!

Medical devices software testing overview

So you wish to produce real quality medical device software? Testing is the time when you will show your true colors. Medical device imply that bugs can’t happen in production, because lives are at stake. As untested code never works, you have to invest a tremendous effort in hunting bugs using a variety of techniques.

Another reason why testing is of special importance for medical devices is that embedded systems are hard to update. I know of environments where it takes up to several years to manually update devices all around the world. Even if you have a good IoT solution to push updates, your clients might have a variety of good reasons to postpone them (such as their own internal, heavyweight validation process). Bugs will thus live for a long time in the wild. They shouldn’t escape. You might take more risks on a server that you can patch in a matter of hours for all your clients.

In addition, medical device regulations (IEC 62304 in particular) impose a test for every requirement and a traceability system to allow auditors to easily spot holes in the test system. But they let you a good degree of freedom about the testing techniques. Be smart in your choices.

The following diagram classifies the kinds of tests I believe should be used to achieve the level of quality and safety required in the field. They grey areas represent the composition of components into a more and more complete medical device system. The green, purple and blue rectangles represent the categories of tests applied to sub-components. Check the color code below.

 

test-classification-white

 

Broadly speaking, the wider the test, the costlier it will be to provide the required resources (from a virtual machine to a mini-lab or hospital setting full of medical devices) and the longer it will take to run them (from 1 ms to one day). For this reason, teams need to use each kind of test to its maximum extent to lower the need of more general tests, both for cost-effectiveness and good test coverage.

In this series of posts, my goal is to provide a return on experience and a general guidance on how to tackle these tests in the context of agile medical devices development. I’ve organized my thoughts in the same 3 categories that I’ve highlighted in the diagram, and that require different kinds of management.

  • Automated tests that run in the build system. One they’re written, apart from maintenance of broken test and build system performance, you need no effort to run them, so they run all the time.
  • Automated tests supervised by a tester. Require some planning, dedicated hardware and somebody to start them and to interpret the results, but they can be highly parallelized with other activities.
  • Manual tests. The costliest kind, requiring a human being during execution.

 

Medical devices software testing: manual tests

This article is part of a series of posts about medical devices software testing:


I deeply believe that automated tests alone cannot be sufficient for embedded systems. The key reason is that you need to test your software on its production hardware, which may include sensors and actuators. Your test runner virtual machines are not representative of this – at least because they have some integration with your CI. Continuous delivery is achievable for highly automatable and low-risk domain models –the consumer web for example. But in the project I run where we put the most effort on automated tests, with more than 3000 of them, manual testers are still able to find more than 150 bugs per iteration. Sure, test coverage could be better, one test should be added for every bug found. But getting too hardcore on this may create waste (some automated tests are just so difficult to write and maintain that it’s simply not worth it). Anyway, even if the automated test system was perfect, a human being would be required at some point. Users are real people, patients might be going through the toughest times of their lives; they should be assured that the medical device they rely upon has been manipulated by a fellow human being, and not by a dark army of cold virtual machines 🙂

test-classification-white-manual-tests

Manual tests classification. Check out the overview for the bigger picture.

 

Before I dig into the specifics of manual testing with or without hardware and bioware, a few general considerations:

  • All kinds of manual tests incur dramatic costs to be run: I’ve seen projects with ten people full-time only for manual regression testing. And it’s a total waste: you could do it again thoroughly at the next commit! So these tests should be handled with care.
  • Their point is that a human being uses the software in the same conditions as the end user. Priceless. Testers will notice so many little details, maybe not directly linked to the test plan being executed, that would annoy end users! They will sport ergonomic inconsistency. They will establish far-fetched correlations between seemingly unrelated things. Testers are smarter than test procedures, they see what lies beneath.

The very fact that manual test are required implies a delivery process for manual testers (for purely manual test execution and supervision of semi-automated tests). That leads us to the two-phase iteration.

Manual tests running on software alone

Structured manual tests are the most classical way of testing that regulators expect: test plans written and validated before test execution, tests plans rigorously executed by human beings as if they were robots. My advice here is to be careful about productivity:

  • Set up an environment that automates trivia work surrounding manual test execution. A real-world counter-example is to print manual test procedures, complete them and sign them with a pen. A better way is follow this process digitally, including the storage of the results in such a way that it can be included in the traceability matrix. Doors is pretty good at this, but I’m sure there are other good tools out there. Just don’t do it manually, it’s pure waste.
  • Measure manual test procedure execution time and maintain a database that allows you to predict the time required to execute a certain strategy (which will guide you in finding a good balance between test coverage and test effort when defining the test strategy answering to an impact analysis).
  • Define indicators related to that execution time and look for strategies to minimize it.
  • One hint for this minimization: have developers help testers by writing scripts to set up environments fast, fill databases with ready-to-use data, replace hardware and networks peers with faster stubs. Testers can also help themselves by automating test prerequisites (preparing the system to reach a certain state before starting the real test) once they master UI test automation tools.

Manuel tests are an excellent way to test for weird system conditions.

  • Suppose your medical device can be used in sites with poor power supply, hence has to resist to unexpected blackouts. Check it out by savagely removing the power cord during load test, restarting, and ensure that the app knows how to spot interrupted stuff, notify it to the end user, clean the mess, restart what has to be restarted. Is that database really ACID? Aren’t there caches somewhere (in the OS or in the disk itself) that retain mandatory data that will get lost when pulling off the cord?
  • A similar condition is operating system failure. It’s possible to provoke on purpose a Windows Blue Screen Of Death with special drivers. Your data should resist it.
  • Suppose there are separate calculation units for real-time automation and for the GUI. What happens when you pull off the cable between both and plug it again a while later? Can your system recover gracefully?
  • What happens when a separate process on your OS goes wild and eats your CPU? Your app should not break – typically queue work items and diminish the processing pace, but not break. This can be simulated with tools (BurnCPU for example).
  • Suppose you have encrypted a few sensitive columns in the database with a unique key stored on the computer. The computer burns. Users have a backup of the database somewhere but they need the recovery key for the encrypted columns. Have you stored a backup key in a safe place? Are you really able to inject it on a new PC with a fresh app install with an old backup (typically with a few schema migrations) and have everything work fine? You don’t want your users to loose their data, do you?
blue-screen-of-death
Windows Blue Screen Of Death

Where manual testers really shine is when performing free testing. They explore the app in crazy ways. They provoke havoc on purpose. They find new issues by looking for undesired behavior outside the requirements. People tend to overlook this technique because it doesn’t exist regulation-wise (no test plan, no traceability matrix) and they already feel overwhelmed with regular formal testing. But if you’ve done a good job automating and working on productivity, if you use manual testing only where it’s effective, you can certainly save some time for free testing and progress towards the greater goal: finding bugs before they kill a patient. As an added bonus, they’re good for tester morale.

Manual tests running on hardware and bioware

I know three categories of tests that imply physical stuff (overlooking clinical trials, which serve a very different purpose and have their own constraints):

  • System testing is about integrating a whole system and testing how well the different parts fit together. It usually involves pushing individual parts outside of their normal operations limits. It may also involve testing the features required to service the system in the field, and generally everything that is done behind the scenes to make sure the device has top performance.
  • Functional testing is about testing the system completely as an end-user would.
  • Analytical testing is about performing large series of tests involving real biological material to assess to biological performance of the whole system.

These tests imply that the software is tested as part of a bigger system including hardware (a device) and biological components. This is mandatory to make sure that the system as a whole works. But from a software perspective, this has a number of drawbacks:

  • Hardware drawbacks:
    • Hardware takes time to develop and produce. Hardware is harder to change, so people struggle more to get it right the first time, and may get stuck into Analysis Paralysis. Feedback loops are awfully slow – maybe a couple of months, even quarters, if components are made by partners and new contracts need to be negotiated. Consequence: there may be no hardware when you need it. And when it finally arrives, guess what? The project is late, there is no time left for bug-hunting the software.
    • Hardware prototypes are expensive. I’ve seen such prototypes exceed the price of a brand-new car. So there may be a shortage of testing stations.
    • Hardware gets broken. This is especially true with prototypes – the hardware guys also have to spot their hardware bugs. While they fix them, the testing session is over.
    • Hardware is slow. Slow to initialize, slow to reach operating temperature, slow to change state. Nothing can be executed faster in real conditions. The tests will be slow.
    • Hardware is dangerous. Engineers can get electrocuted, have fingers cut or crushed. This is especially true with prototypes: safety is a result of design and may not be guaranteed while the design phase isn’t done. So extra care must be taken with hardware.
  • Biological stuff drawbacks
    • Biological material is also dangerous. Extra care need to be taken to clean everything and make sure no tester gets contaminated with some nasty germ. This slows tests down.
    • Biological material may be scarce. Either because it’s costly, or because it’s genuinely rare (e.g. sample coming from patients with certain uncommon pathologies). So the sheer number of test runs may be limited.
medical-trials-2
Image source: http://www.castlemedical.com/blog/

These practical drawbacks have a nasty impact on test productivity, compared to software tests where anything annoying can be mocked. For this reason, tests involving physical stuff should only be used to spot complex issues that cannot be revealed otherwise. For a system to work, every part has to work fine, and all of them must work well together. Spotting the cause of issues can be challenging. For example: does the bad quality control value we measure come from the way the sample has been defrosted or warmed up in the machine, has the heater a defect or has the electronic card gone wild, does the network tamper with the message or has the temperature threshold been truncated when stored in the database? These problems are difficult to solve. They take time. And they arise when the device starts working – this means close to the end of the project, when deadlines are close, budgets are depleted, and everyone is very anxious with getting that nightmarish piece of engineering finally released. You don’t want to add to the confusion by disclosing software problems that could have been spotted earlier, in isolation.

So my main advice about system testing from a software engineering perspective is that system testing is not a time to find software bugs. Use all the techniques discussed in the preceding sections to spot them earlier. If you do your job well, you should hear about very little bugs at these stages.

Medical devices software testing: automated tests that require human supervision

This article is part of a series of posts about medical devices software testing:


Some tests are mostly automated but can’t be run in the build, for two reasons:

  • Somebody needs to start them and interpret the results
  • They need to run on the target computer and OS

They are the purple ones in the classification of the introduction:

Test classification - white - tests that require supervision.png

Haltéprophile.jpg
Source: http://invictus-physicalcoaching.com/2014/02/analyse-descriptive-des-mouvements-halterophiles-33/

Load test. Load tests are typically a special kind of integration tests. The idea is to ensure that your app works with the expected performance criteria at maximum capacity by testing it in conditions an order of magnitude more demanding. Both performance and capacity criteria should be precisely defined in specifications and related to a real user need, otherwise the temptation to lower the bar will be great. These tests are difficult to make pass and as such deserve a user story of their own. But they are an excellent bug finder: you will encounter those rare conditions that otherwise would happen only at client sites. They are a must have. While I recommend running some load tests in the build (which allows you to break the build as soon as a toxic commit kills performance, or to easily monitor the duration of these tests over time), at some point you must run them on the target hardware and OS and guarantee that the performance objectives will be met in production conditions.

Stamina tests. This time, the device operates at normal capacity, but for a long time. For example, I work on a medical device that is meant to run with one shutdown per week: it has to be tested for one week of continuous work (or more). Other kinds of interesting and surprising bugs can be spotted here, such as tiny memory leaks or hidden timeouts – what happens if the app is left open without activity for more than twenty-four hours? A very interesting way of performing stamina testing is to capture real production activity and to repeat it – in which case it also serves as regression testing.

Automated tests written by testers. Automated testing tools such as Ranorex or TestComplete allow testers to set up tests that exercise the GUI as an end user would and check the display. As such they are pretty global integration tests. They can be run in the build or on the same computer as the medical device, to be even more global tests (which is why I’ve classified them here). The easiest way for testers to write them is to record manual test sessions. But on the long run we have found more productive to write large portions of them directly with code, which helps to reuse them across projects. We have also found that such automated GUI tests could be very handy to automate the setup of manual tests (putting the SUT in a certain state before starting the test). Let’s face it: the entry cost is real. You need to study the market and choose the right tool (vendor lock-in is very likely given the cost of your tests and their adherence to the specifics of the tool), to get acquainted with the tool, to create a team responsible for these automated UI tests, to set up good communication channels between this team and the dev teams to avoid breaking tests when controls are renamed or refactored, to tweak your UI to help the tool find controls, to put everything in the software factory, to link the tests to requirement numbers and get their results into the traceability matrix, to provision the maintenance costs of these tests. But the rewards are great: imagine you could run your whole manual test plan in one night on your servers, imagine you could transform wasteful test execution time into an investment (repeatable test procedures), imagine you could thus get enough time for your testers to perform free testing – wouldn’t that be a lot more productive and interesting? Wouldn’t you find more bugs?

testcomplete
TestComplete test editor

 

Test with network peers. Modern medical devices can no longer be off the grid: medical and laboratory staff need them to receive orders and send results through the healthcare organization network (with protocols such as HL7, ASTM, DICOM…), they may interact with non-embedded software that is easier to develop and maintain (Data Managers, for example), and they might interact with an IoT solution that helps keep them in optimal operating conditions and add remote services (such as allowing patients to inspect their medical records, or apply artificial intelligence technique to provide clinical decision support). All these wonderful features relying on networks need to be carefully tested. While low-level interactions (such as protocol handling or basic conversations between servers) should be tested in the build to provide fast feedback to developers and extensive edge case testing, at some point the integration must be tested in more realistic conditions: real hardened operating systems, real network cards, real cables or wifi, real firewalls and NATs, real DNSs, real intrusion detection systems, real production server hardware… I’ve classified network peers testing in the semi-automated category since you can typically script part of the test scenario, but will still need some degree of human intervention to set them up, start them and analyze the results (until the day the healthcare industry has fully moved to continuous delivery, but we still have a long way to go when embedded systems are involved).

Security testing. Network peers lead us to cybersecurity. As anything, security doesn’t work until it has been tested. There is much more to security than intrusion testing, but the latter at least guarantees that your security measures were able to repel one professional attacker. It’s doesn’t guarantee your device can’t be hacked, but it says it’s not that easy. To perform the attack properly, the auditor will need access to the device in real conditions (real computer, real hardened OS with real vulnerabilities). Trying to hack a system involves the expert manipulation of tools (such as MetaSploit) that automate attacks or entire catalogs of attacks and gather results to show vulnerabilities to the dev teams. I wouldn’t run such tools in a build system since they can easily compromise your IT infrastructure – they should be run only in a controlled, totally disconnected environment.

metasploit

 

 

Medical devices software testing: automated tests that can run in the build

This article is part of a series of posts about medical devices software testing:


In my experience, writing and make pass the unit tests that act as guardians of a piece of code will take about twice the time required to write that piece of code alone. But that doubled time will include most of the debugging. As a developer, would you rather like to write unit tests or to debug? Would you rather fix things when they are fresh in your memory or two months later? Do you like to have your work really finished and move on to the next challenge, or be constantly caught up by your past and shameful mistakes? When somebody else breaks something in that functional area you’ve been working on some time ago, would you rather spend one day investigating their blunder to determine if it’s their responsibility to fix it, or instantly see in the CI what commit likely broke your unit test? Would you rather refactor constantly stuff you don’t like in your code or be paralyzed by the fear or breaking things in lesser-known areas of the code without any safety net? When you have to modify some code written by someone gone long ago, do you want to find it equipped with unit tests to help you get acquainted with the beast? I believe that automated tests are profitable, but on the long run. There is a quite high entry cost. Introducing automated tests late in a project is not quite so profitable. It must by a constant effort of everyone in the team with the goals of catching bugs, of inciting to decoupled design, of allowing design evolution by providing a safe environment for refactoring. But isn’t that exactly what we’re pursuing when developing medical devices? Good design, few bugs?

Anyway, the most economically interesting part with automated tests is that running them is cheap. Sure, some time will be required to take care of the production of all the servers and build agents, and to maintain the build scripts and build tools configuration in good shape. But this cannot be compared with the incredible cost of manual testing campains. Automated testing means putting effort in designing stuff, not repeating an effort which outcome is of very temporary value (until the next commit, or luckily the next version); it means climbing higher in the value chain. The very fact that it is so cheap allows the sheer number of tests run for every release to grow dramatically (compared to manual testing), which actually decreases the number of defects found in an end product – if you take advantage of the discipline.

Running tests almost for free is what makes automated testing so paramount to agile development. You can’t deliver often if you can’t test cheap and fast. And you can’t refactor if you have no safety net. A development that’s not iterative and evolutionary can’t be agile at all.

 

Going back to the schema of the introduction, here are the tests we will focus on in this section (green boxes).

test-classification-white-tests-that-can-run-in-the-build

 

Automated tests have the freedom to exercise small software components. To do so, they replace the environment of the System Under Test (SUT) with mocks and stubs, at any level. They are meant to run in virtual machines (or, someday, containers), which means they don’t run with the target embedded computer and they don’t run with the production OS (as they necessarily imply installing at least one testing dependency to run the tests and gather the results, and no dependency should be shipped in a production OS that is not required to operate the medical device, as this might increase risks regarding safety and cybersecurity). This freedom to mock collaborators of the SUT means it is possible to thoroughly test edge cases that are difficult or impossible to test by other means. This is especially important for programs that are meant to operate sensors and actuators and give life to an automaton, something that inherently has state: thanks to automated testing, you can easily test many configuration of the medical device, and simulate every possible error code of every possible sensor and actuator (at least at the level of the layers closest to the hardware, before combinatorics become overwhelming). You can also test what’s happing with a large variety of biological measurements or behaviors – even simulating phenomena that seldom happen in reality). Automated tests rock to achieve reliability.

 

Here are some guidelines on the use of the 3 main categories of tests that can easily be executed in the build:

  • Unit tests. Unit tests are written to exercise a class or a couple of classes. The idea here is to test your basic algorithms decoupled from their environment. They prove worth themselves when the algorithms are not trivial – unit-testing boilerplate code is not profitable, I’d rather leave that to module and integration testing. The point of unit tests is, since you work at so small a scale, you can test all execution path and conditions, even those that are not meant to happen in production – but that eventually will. For example, suppose your write an algorithm for counting cells in an image. You should test that algorithms with a nice quantity of images taken from production condition (say one hundred) with a carefully chosen variety of concentrations for the different cell types. And then error cases: inappropriate formats, corrupted images, pictures of the sea… Once these tests are written, you must be confident enough to say: results will be good and errors will be gracefully detected.
  • Module testing. I’ve found many definitions of a module on the web, but here’s the one I use in this blog: a module is a complete binary entity that can be started and shut down– typically an executable. Module testing hence means testing the module as it will be in production (compiled in Release mode, all relevant compiler optimizations on, signed…). You mock the collaborators of the module (typically making them have a known behavior) and test the latter through its production API. At that scale, you won’t test all execution paths, but ensure that most features of the module that will be delivered to production work.
  • Integration testing. In these tests, you exercise several modules but you still mock some of their collaborators (e.g. network peers, fake machine hardware). Here you can test for interesting things such as complete app startup and shutdown, global error handling (are errors detected, and a nice error report generated, hopefully sent to the dev team?), and maybe run them on the target production OS (along with all hardening and branding features). For example, one of my teams created integration tests where the C++ and Straton automaton code (complete except for the stubbed actuator and sensor level) was running inside a virtual machine with Linux-preemptRT OS and interacted with the business code running in a Windows 7 of the build agent (which was quite different from the hardened production OS). These tests, by running complete biological tests execution, complete startup and shutdown sequences, as well as key maintenance behaviors, guaranteed many things about the execution of the automation code in its target OS and its interaction with the upper layer.

WeNeedToGoDeeper.PNG

Probable source: http://www.quickmeme.com/meme/3rvaum

Organizational factors to make automated testing possible

Working in a regulated industry, to fully take advantage of automated testing, you have to legalize it. To be suitable for use in the traceability system, all kinds of automated tests must be legitimated. This means that management plans and procedures need to introduce them properly, and that every tool in the chain must be documented, put in configuration control and formally validated. Tests have to reference requirement numbers in such a way that they appear in the tests results, which will feed the traceability matrix (for example, write the ids of the requirements covered by each test in NUnit’s Description attribute).

Give time for automated tests. Automated tests take time. If management doesn’t provide this time, they won’t happen by magic. A technique I find useful is to constantly repeat in planning poker sessions that automated tests are a mandatory part of the job, and that they should be part of the estimate. Difficult tests (such as automated load and stamina tests) must have a user story of their own, and with high estimates (these things are always a lot harder than they appear). As with all things related to long-running quality efforts, a constant care must be taken that will affect the sort-term velocity of the team. But everyone (up and down) has to get used to that velocity if it is required to write good, reliable, safe medical device software. One more reason to start it from the beginning: automated testing must be part of the natural rhythm of the team.

Time is the soil in which automated tests can grow. The ideal of quality is the seed. But what are the fertilizers you can use to make them flourish? Apart from technical factors that we’ll delve in soon, what is required is an automated testing culture.

Writing good automated tests must be pervasive in the development culture. Every developer should expect to create them and fix them when they’re broken. In fact, should developers be deprived from this constraint, they should find it unacceptable and fight hard to have automated tests introduced. In addition to this distributed responsibility, a dedicated team is useful to take care of the more difficult work regarding test: maintaining and polishing base test classes and shared stubs, updating libraries and introducing new ones when appropriate, spotting and annihilating sources of randomness in tests results, improving test performance, deleting obsolete or not-that-profitable old tests. Having tests that pass fast, deterministically and successfully (tests that are never left broken for long) makes a long way towards creating and expanding that automated testing culture.

Good KPI visible everywhere (including sprint reviews) can help. I’ve used the raw number of tests (motivating since always increasing, but not very meaningful), the number of tests created by iteration (a good indicator of the persistence of the practice in your team), the test coverage, the status of a few key and difficult tests.

Management should constantly show its care. By ranting about broken or random tests. By praising accomplishments (such as lowering the number of broken tests, fixing randomness, achieving victory with difficult load or stamina tests). By watching the state of the build. By investing the energy in building the culture and overcoming the necessary obstacles. By allocating time and resources to the endeavor.

 

Here are my favorite psychological tools to deal with broken tests.

  • Box. I built a box with funny pictures to be put on the desk of every developed convinced to have broken the build so that they feel peer pressure to fix things fast. But be careful with this one, it has to be fun and light – otherwise it might create a weird atmosphere in the open space or even turn to moral harassment. Broken build box.jpg
  • Build guardian. When there are many tests, many developers, no gated commit and when feedback time is long, someone is required to watch for the build: the build guardian. His or her job is to analyze failed tests and assign investigations (either by determining the likely culprit by looking at the commits since the last time the test passed, or by assigning them randomly). This person must have enough authority that his or her decisions are respected.
  • Instant feedback. Having a nice continuous integration tool with appropriate alerts (by mail, system tray, Slack…) when things need attention will help maintain quality in tests. I’ve appreciated using TeamCity the last few years. We also put a LED indicator in the open space to shout for broken tests (and stand-up meeting time J). If you happen to have a certain percentage of broken tests on a regular basis, the XFD can also be used to display the number of tests passing and failing. Or display silly jokes.
Test status indicator.jpg
The XFD in my open space

xfd-ex
An XFD at a Jenkins Code Camp in Copenhagen. Source: https://jenkins.io/blog/2013/09/05/extreme-feedback-lamp-switch-gear-style/

Technical factors to make automated testing possible

 

A necessary condition for automated tests to exist is to have an architecture that makes writing them easy enough – let’s call it test-friendly architecture. This means a highly decoupled architecture: massive use of dependency injection is a key technique, mocks and stubs are required, onion architecture can also help. The interesting thing here is the reverse effect: once you seriously start your automated testing journey (hopefully at the beginning of the project), you find yourself adding interfaces to write test doubles (to mock the hardware, a remote server, the database…), injecting dependencies, using in isolation just a few classes (to keep the test lights and fast)… and the architecture gets better. Testing the software is much more demanding to the design that all these extension points you consider for the future. As Bob Martin coined it, “The act of writing a unit test is more of an act of design than of verification”.

You need a good continuous integration tool to run the builds. You need enough build agents to handle the load, ideally a cloud with dynamic provisioning. Build agents should be identical, hopefully frequently restored to a reference state to avoid drifting apart. But sometimes differences in configuration between build agents will be necessary to minimize license costs (for example, install expensive tool X used only for C++ static checking on agent A).

build-agents
TeamCity build agents configuration

 

The configuration of the build system (build scripts, build config…) should be kept in source control. Have a policy on this repository so that it is easy to restore the version of the build system to a former release branch.

 

If you want developers to pay attention to broken tests and fix them, they should never break for no reason. A couple of hints to achieve better test predictability :

  • No dependency with time. Since the performance of virtual machine build agents and developers’ machines are likely different, timing will vary in both environments. It’s very frustrating to fix a bug that breaks only in the CI but works on all dev machines. The cure here is to use events to synchronize the parts of your code that need synchronization, but never time (no sleeps). If you really need time, encapsulate it (through some kind of ITimeProvider) so that you can fake it deterministically in your tests (to return always the same known value, for example).
  • Test isolation. Tests MUST be independent from one another. This probably means that the database will have to be re-generated or restored for every module of integration tests. This will take time. But tests failures provoked by other tests are a worse evil.
  • Test predictability. Sometimes an unpredictable behavior enters the code – several consecutive executions of the tests for the same revision of the code don’t yield the same results: test X once passes, once fails. Those unbearable sources of randomness must be mercilessly hunted and annihilated: they will very fast kill the faith of developers in tests – hence the intensity of their effort in maintaining them.
  • Sometimes it’s not easy to spot these sources of randomness, especially if they don’t happen on developer machines. But there’s one thing you can do: put these tests in quarantine – a special build for ill-mannered tests marked by an infamous attribute, a kind of jail where they can be reeducated before they are reintegrated in proper society again. This way, they won’t harm the production of well-behaving tests. And fixing them will be easier if they can be run in isolation, repeatedly, in the build agent environment.
  • Crucial tests. You can identity a couple of crucial tests without which you know your app is broken (such as starting and executing the most typical feature of the app). These tests should be executed with the build and break it when they fail (assuming you have so many automated tests otherwise that they have been distributed over several build projects).

 

The best of all tools to protect tests is the gated commit. Developers submit their branches to the CI, which passes the required build scripts (including static checks and tests). If the build is good, the branch is merged. If not, the developer is notified but nobody else is disturbed (which in itself tames the natural mayhem of large teams with many commits and helps build a more relaxed and focused environment). The amount of tests to put in your gated commit criteria should be as high as your pool of build agents allows you to process in a short enough time – I think 12 hours would be my maximum here, ten minutes would be ideal but difficult to achieve with many tests. If you can put them all and your tests are truly deterministic, the build of the main branch will always be green! But be careful: sources of randomness in tests might block your entire delivery pipeline, make sure they are taken care of.

gated-commit
Gated commit example. Dev 1 and dev 2 create branches from master. Dev 1 tries makes a pull request, which is rejected because the build script detected a problem (for example, a broken architecture rule). Pull request 1’ fixes this. Pull request 1’ is successfully merged into the integration branch, where the build script runs fine: pull request 1’ makes it to the master branch, where everybody can see it. Things are different for dev 2: the build script passes successfully on the branch alone, but fails on the integration branch (for example, a broken test) because of the commits of pull request 1’, which create a nasty side effect (pull requests 1’ and 2 run fine separately, but not together) that dev 2 had no way of knowing, because pull request 1’ had not yet made it to the master branch when dev 2 performed the last rebase onto master. No problem, that’s what the integration branch is for. Pull request 2’ fixes the broken test, and makes it to the master branch.

 

To set up a gated commit, you need a DVCS enough at merging that most automatic merges succeed most of the time (git is pretty good at this) and support from your CI (we used TeamCity and GitHub together to achieve gated commit with limited effort).

Project pace regulation

Faith Dickey in the rain with high heels
Amazing Faith Dickey demonstrating was caution is

Good project pace results of two conflicting forces: market or financial pressure to go fast (typically relayed by management) and technical pressure to do things right (typically relayed by architects and developers).

This conflict is not symmetrical, for several reasons.

  • Management has organizational power and excellent communication skills – compared to developers who tend to emerge from the ideal world of their IDE after hours lost in abstraction in a kind of semi-conscious, almost hungover state, barely able to talk to human beings 🙂 And management is always interested in having more bang for the buck and in shipping earlier to build or strengthen a market position.
  • Technical issues make your product explode on the long run, not today, and as such are easy to sacrifice. By technical issues I mean not only rotting architecture, but also documentation and regulatory issues: taking them too lightly will not cause an earthquake today, but years later. Example of disasters: technical bankruptcy (throwing away that entire unmaintainable code base and starting again from scratch), denial of authorization to sell your medical device by the regulator of a certain market, a patient gets killed using your product. And I have seen managers who, either because of incompetence or of sheer cynicism, are perfectly able to take decisions that have catastrophic long-term consequences for the sake of short-term political advantage – some people are amazingly able to lie their way out of any situation.

This conflict is dissymmetrical but it doesn’t mean that technical people are always right. A friend of mine worked in a startup without adult supervision: developers happily spent three years refactoring the code without adding any new feature. No joking. Three years of getting high with code. Gold plating can go very far. Another story from the trenches: I knew a software architect who convinced his management that building a tool to refactor the code was required (ugly Borland Delphi 6 was really unstable and unproductive) and he spent two years writing this tool alone instead of taking care of the codebase he was responsible for, in particular database concurrency issues that caused much trouble at the end of the project – he clearly worked for interests that were not those of the company, but of personal pleasure. The thing is that usually technical people don’t care much about their organization making money: they just want to enjoy coding well-done stuff and avoid becoming obsolete. If the company goes bankrupt or the project fails, they just move to another company where they shine with these new skills they honed instead of working on what was required to get that project done. Don’t get me wrong: I’m not saying that all developers are selfish and not interested in moving projects forward, but that some of them are, and that there exists a natural tendency to privilege thrill over duty that must be contained.

Fast vs careful

 

A simple model to help find the balance

So how we find the appropriate balance between conflicting forces? It’s not easy to find, it has to be managed. Over the years, I have come up with a model designed to understand and manage each force, and it has proved useful.

Real quality won’t happen by chance, or only thanks to the sacrifice of your teammates workings spontaneously nights and week-ends, but because time has been allocated for it (assuming the right processes and mindset are already in place). On the other end of the spectrum, there has to be a clear focus on delivering story points to avoid getting lost in gold plating. My preferred approach is to set up an iteration model where I allocate time for quality-related activities (stabilization phase size, refactoring proportion during construction phase) and set a story points goal for each iteration to keep everybody focused on delivering customer value.

  • Model for quality forces.
    • Duration of construction and stabilization phases. They might not be constant all project long: as the maintenance burden increases, stabilization phases may get longer. Remember, stabilization is time devoted to bug fixing and documentation. It’s quality time.
    • Proportion of construction time allocated to technical tasks. I’m not alluding to the mandatory technical tasks that deserve user stories of their own (such as sending error reports, writing an installer, have that load test pass), but to unpredictable refactorings. Be careful to think it through. Without dedicated time, refactoring won’t happen at the magnitude required for quality projects. This technical time is also the oxygen skilled technical people breathe: its helps you hire and retain them. Set that value to zero, and you likely will accumulate technical debt and scare away gifted technicians. On the other hand, set it to 100% and the project will stop moving forward. And once again, this value should not be constant: high at project start (say 50%) when frameworks and practices are not established, medium in the middle of the project (say 25%), low at project end when everybody struggles to finish that version (say 15%).
  • Model for production
    • I find it necessary to have a target of accepted story points for each iteration. This is the sum of user stories and technical stories (refactoring and anything related to inner quality that no user will see).
      • It is best measured as the average of the total story points of accepted user stories (accepted by testers, with a little allowance for a few minor bugs) on the last few iterations. This measurement is essential to feed the model with reality (total team velocity captures a great deal of variables that are impossible to model: estimate errors, organizational overhead, tooling problem, motivation, quality of the personnel, maintenance burden, architectural issues…).
      • Aligning the goal to the measurement is a delicate choice if the team produces less than expected: it can be invaluable to predict an accurate project end date, but maintain expectations help fight gold plating tendencies and maintain commitments firmly. In my experience, the target should be maintained just a little above average measured velocity – insufficient production must be fought by the team, not too easily accepted.
      • This target and the proportion of construction time devoted to refactoring make it easy to calculate the estimated user story points target and the budget for technical stories.

 

Simple iteration model

Here’s a sample spreadsheet to help clarify my intent. I’m not saying it’s perfect: you should probably design yours; just consider mine a starting point. Simple iteration management model

 

A model for unknown team velocity

 

Measuring actual team velocity and feeding it into the model is very powerful. But sometimes it’s not practical:

  • When the team is just starting and has no history
  • When projects are long (several years in medical devices development). In particular, the maintenance burden will likely get heavier and turnover will happen.
  • When there are lots of variations in team size. This is especially the case when management asks: “I want this project to be ready by date X, what do you need to make it happen?”. Tricky question: team dynamics are way more complex than a simple multiplication – such as thinking that doubling the headcount will double the throughput. Choose your answers wisely. This is a very strategic issue: if you can predict very early that a project is late and add the necessary staff soon enough so that it is profitable (people should stay at least one year to offset training costs), you will save the deadline. Do it too late, and haphazard late staffing efforts will terminate the team: “adding manpower to a late software project makes it later“.
  • When projects have not started yet. To approve a project, management needs to know its scope, its duration and its cost. The best way to do this with an agile mindset would be to build a product map to have a backlog and estimate user stories to have a project size, then estimate the velocity of the team to deduce time and cost.

 

So I have designed another model to estimate team velocity when there is no empirical data. Here are the main variables:

  • Real time
    • Average daily velocity by developer. The number of ideal days in a real day. Could be 100% if you worked alone in a monastery with absolute concentration, no task switching, and perfect estimation skills. In reality, there are useful and useless meetings, coffee breaks, errors of all kinds. I have made many measurements and find that people are often around 70% in this respect.
    • Working day factor. You have to take into account that a week day is not a working day: people are sick, have holidays, get trained. In my current environment (France, where holidays are plentiful and sacred), people work around 220 days a year. That’s not 52*5=260.
  • Real workforce. Don’t just count people. Take into account:
    • Turnover. I usually count that one person out of ten goes every year, and that it takes six month to replace them (so I loose 5% of the workforce). In other environments, you might have higher attrition rate or shorter recruitment delays.
    • Training. Newbies are not as productive as historical team members. Beginners are often not as productive as principal engineers. There has to be some ramp-up in the workforce when someone arrives (50% productivity the first month, complete productivity after 3 to 6 month depending on experience and the complexity of the work environment).
    • Communication and management overhead. My rule of thumb: every new person in the team eats 20% of the time of the equivalent of a person. One person is as productive as one. Two persons are as productive as 1,8. 6 persons are as productive as 5. This factor is very important when you start computing the effect on the deadline of various staffing scenarios. For very big teams, this factor might be higher.

 

Team velocity estimation model.PNG

Here’s the sample spreadsheet: Team velocity estimation model

 

This might sound a little complicated and over-engineered (don’t complain, I spared you the spreadsheet where I mix both models with release burnup graphs and macros to generate user story cards 🙂 – you won’t need it if you have decent agile tooling, which was not my case). But having sound predictions of what will happen in the future is a prerequisite to act upon that future. When budgets are scary and the deadline is years away, a spreadsheet with many parameters and experimental data will prove a good way to negotiate with top management. And once people realize that what you predicted one year ago proved true, they will listen very carefully on what you say will happen in two years, and maybe grant those two additional developers you need. This might also help you to slow down if quality gets out of hand: the automatic adjustment feature of the two-phase iteration (automatic stabilization phase extension which leads to decreased team velocity on the long run) will help justify why team throughput decreases – quality is just a priority.

JIE 2016 & agile system engineering

Dominique Biraud from the SPECIEF (the french requirement engineering association) kindly invited me to present some of the ideas of this blog, especially those geared towards specification, at the French Day of Requirement Engineering 2016 (JIE 2016).

 

JIE 2016.png
The conference took place in an awesome amphitheater inside La Sorbonne

Here’s the slideshow of what I presented this day (in French): Presentation JIE 2016 – exigences agiles meddev

Worth noting is the fact that several speakers were experimenting with agile methodologies inside their organizations specialized in large systems design (such as turbojet engines or trucks). The driving force is always the same: the software guys went agile and there’s no turning back; the others are wondering what good they can take from these methodologies and how they can adapt to the new situation. Since software is always an important part in systems, this debate will spread and become ubiquitous.

I heard there of a promising attempt to merge agile methodologies and system design: SAFe LSE (Scaled Agile Framework for Lean System Engineering). SAFe is one method for scaling agile practices (others include LeSS and Nexus).

I couldn’t agree more with SAFe LSE manifesto:

 

 

SAFe LSE schema
SAFe LSE overview

Has someone used it in the trenches?

I bet someday this framework (or a similar one) will take over as leading methodology for system design. Definitely a growing field to monitor.

Putting bugs under control

Whats that bug - Anegela DiTerlizzi & Brendal Wenzel
Whats that bug – Anegela DiTerlizzi & Brendal Wenzel

Bugs are bad. And especially for software engaged in serious business – such as saving lives. A few reasons why:

  • Quality medical devices have few known bugs, and of low severity. Thus, having bugs prevents you from shipping. You need to be able to ship at the end of every iteration to your customers (to get feedback), or more realistically, considering integration and product registration, to a system integration team.
  • Bug backlogs are inventories and as such are a form of waste. Lean Software Development advocates having low inventories since a bug left open will incur in additional costs:
    • Bugs in the software might provoke very complex system bugs when mixed with hardware and bioware. These bugs take an awful lot of time to investigate and are often blocking the entire project plan. You don’t want this to happen with a bug known to the software team that could have been fixed a long time ago.
    • Workarounds elicitation and teaching (by documentation or face-to-face) take time.
    • It is always more expensive to fix bugs in the future, when knowledge fades away in people’s heads or vanishes when they go.
    • Bug backlog engineering (prioritization, endless reviews, risk analysis…) typically takes the time of several experts at the same time. A tremendous waste of energy when the bug backlog is large.
    • Bug duplicates imply wasteful investigations. A closed bug has no duplicates.
  • Bugs in medical devices have the potential to do harm to people. As such they must be considered with horror and dealt with accordingly. And the best way is to fix them ASAP. Don’t let them a chance to slip through your processes.
  • I’ve seen projects with a huge bug count (close to 1000) a couple of times. You know what? They never recovered. They stayed at 1000 bug count forever. Maybe because of the cost of all this waste, maybe because it spread a sense of bad quality and failure in everybody’s hearts.
Bug count graph
Bug count graph of a real-life project. After a quick phase of exponential growth, bug count stayed in the six hundreds. In spite of two heroic campaigns of bug fixing, the end was inevitable: the flat part of the curve on the right is the clinical death of the project (brutal end, no production, millions lost).

 

Morale of the story: never get high on bug count, or your feet may never touch the ground again.

Guy swllowing bugs
What happens when you let bugs free…

Conclusion: a good bug is a bug killed. Bug count should be close to zero.

I won’t write about bug detection here, but only about what you do when you know them. Let’s assume you already have a good testing system in place.

 

Bug count threshold and two-phase iteration

So how do we actually manage known bug count? Simple. Set a threshold. Respect it.

  • Set the threshold at the start of the project. Write it down in your Project Plan and have everybody sign it. You’ll still be able to change it, but it’s motivating to give it some official existence.
  • Recommended values for the threshold:
    • More than zero (or you might seriously delay shipping for minor issues)
    • Inferior to a couple of dozens. The maximum threshold will depend on the size of the team and its ability to fix bugs. I suggest the max bug threshold doesn’t exceed what your team is able to fix in a few days if it’s its sole focus.
    • Split limits by bug severity.
    • For example, typical thresholds I use: 0 blocking bugs, 3 majors, 20 minors.
  • Bug count evolution is easier to understand inside the two-phase iteration framework. During construction, you take risks, you build, you refactor: bug count gets high. During stabilization, you stop taking risks, you fix bugs: bug count gets down. There will be a delay due the lengthy manual testing processes: you will discover the real extent of the bug count some time after they are introduced in your code.
Iteration 0 regulation
Total bug count (orange curve) is below the threshold at the end of iteration 0 (inside the green circle)

 

  • What’s important is what you do when the threshold is not respected. My advice: don’t deliver. Hold the version back until more bugs are fixed. You can’t leave into the wild a version that will waste precious integration time or harm patients. You would be ashamed of it. Take the blame for the delay. Put bug count in your information radiators so that everybody gets used to the fact that it’s important. When you have trouble respecting the threshold, talk about it around you and in your team retrospectives. It’s serious. Find solutions.

 

Iteration 1 regulation
Bug threshold is not respected at the end of stabilization of iteration 1 (red circle). An extra stab is added until the quality criteria is met (new green circle).

 

  • What’s also important is what happens to the iteration following the iteration that went wrong. Here’s where the two-phase iteration gets handy. If iteration N has too many bugs and if Stabilization phase N takes 3 more days, Construction Phase N+1 will be 3 days shorter. It means that a few user stories will have to be removed from iteration N+1. It also means that since iteration N+1 is smaller, it should be a little easier to get right, so Stabilization N+1 should run more smoothly. There is an automatic-short-term regulation effect in the two-phase iteration framework.

 

Iteration 2 regulation
Iteration 2 has a shorter construction, with less features, refactorings and bug creation than usual.

 

 

  • On the long run, if you encounter this situation on a regular basis, consider increasing Stabilization phase proportion. That is the beauty of the two-phase iteration: it also embeds a long-term regulation system. Take an extreme example: 1 day of construction, 29 days of stabilization. Plenty of time to fix bugs and get the doc right, no? It should not be a problem. This means that there exists a good proportion between construction and stabilization phase durations that will allow you to finish iterations with bugs below the threshold and documentation in good shape. Your job is to find that proportion.
  • This regulation system is vital to any project. What do you when your car engine gets hot and spits steam? You slow down. The same with a team. If a project pace is so fast that quality gets out of control, you must slow down. Remember the agile belief that quality is not negotiable? Now show your true colors. Negotiate time.
  • By the way, it is quite a logical for a project to slow down after a while, as maintenance effort increases. You might expect an increase in Stabilization size over time.

I’ve used these techniques on projects of a respectable size (several years, several dozen people, several thousand bugs created and fixed) and they have proved to work well: known bug count never exceeded the threshold for a long time.

Startup energy

I attended, thanks to a series of encounters, an incredible pizza brainstorm sessions for startup founders during a CIC Venture Café in Cambridge, Massachusetts.

Thanks a lot to Erik Modahl (a charismatic human catalyst in the CIC) and Randall Bock (Chairman at GainStreet.co and facilitator of the session) for letting me in.

Pizza startup brainstorm

The idea was that every venture founder would express his hottest challenge and gain insights from his fellow entrepreneurs, tips, or even find new partners to move forward.

What struck me was the level of energy, enthusiasm, positive thinking in the room. Ideas were popping everywhere really fast. Everybody was ready to lead a 10,000 people company in ten years.

I’m afraid my teams, in a large company R&D, even though they are pretty skilled and motivated, don’t reach that level of commitment and excitement. What lessons are there to be learned, what could be made better? I see two main angles: autonomy and speed.

Autonomy

The strength of the startup founders (particularly in the early stage, before they get funding) is that they decide everything. In one second, a pivot can be decided, an entire business model can change. No project approval committee to convince. No 8-level approval chain for internal workflows. No game-of-throne-like power struggles to understand and leverage.

Of course, I don’t think that deciding everything alone is a good thing – reviews are good; if you idea can’t resist some criticism, how will it withstand the harsh bite of reality? But large corporations die on spreading decisions (and incidentally accountability ) to too large groups of people. This is especially true in regulated industries, where the norms we abide by have a constant obsession for external reviews, checks and approvals – as if nobody could ever do anything useful in autonomy. This paves the way for a culture of unpowerment (un-empowerment).

What can we do to re-empower people?

normal_camponotus_herculeanus7
Swarms consist of a large number of simple entities that have local interactions (including interactions with the environment). The result of the combination of simple behaviors (the microscopic behavior) is the emergence of complex behavior (the macroscopic behavior) and the ability to achieve significant result as a “team”. http://www.cse.psu.edu/~buu1/teaching/fall06/uuv/papers/Applications%20and%20Programming%20Support/Autonomous_and_Swarms.pdf

I encourage everybody to read again Jürgen Apello’s Management 3.0 book. Autonomy and empowerment are not a hippie thing (even though I believe happiness in the workplace is a powerful driver, but that’s not the topic today): it’s about survival. The best-run complex systems are run by autonomously controlled units. Remember “The best architectures, requirements, and designs emerge from self-organizing teams.” in the agile manifesto? It’s true. It works. The inherent problem with approvals is that people higher in the hierarchy, whatever their talent and experience, have little time; they will always introduce delays, or worse, hasty bad decisions.

As a manager, what can I do concretely to decentralize decisions? It’s very simple. Every time somebody in my teams asks me for a decision on something they should decide alone, I try to refrain my natural impulse to tell them what I think, highlight the fact that it is inside their realm, that they are empowered to decide alone. The are welcomed to ask for advice when they feel unsure (and very often people find alone a solution to their problems by the mere fact of having to explain it clearly to somebody else), but not approval (which means to transfer the decision responsibility to their boss). This is not easy to do. I can’t say I always succeed, especially when I’m tired, stressed or in a hurry. Subordinates have a tendency to act as if you’re right, even when you’re not. It’s rewarding to decide – there must be something in the human psyche that makes power addictive. This tendency must be fought against or every organization will end up hierarchical.

This leaves a delicate question for managers: what do you do with your time when you stop stealing your subordinates’ natural areas of responsibility?

  • Help them. Be a ScrumMaster. Coach. Remove impediments.
  • Think. Now you have time, add value to the whole. Spot roadblocks. Have new ideas. Start new initiatives. Be creative.
  • Reclaim to your boss the areas of responsibility you should have been accountable for while you were micro-managing others.

All I see in this list is geared towards progress. By leaving room for others to grow and provide them the support they need, you naturally are inclined to grow yourself. Jürgen Apello has an excellent metaphor on people management: you should see yourself as a gardener. You choose bulbs and seeds. You design the macro landscape. You provide fertilizer and water, you remove pests and weeds. And then you let the plants grow. Just be sure to treat yourself as a plant in the garden too.

 

Speed

Once you have restored autonomy, half of the work required to speed up things is done. At least the decision process won’t be in the way anymore. So what about the second half?

gryphon_wingsuit_front_view_croped

Automation. Computer programs are fast. They work outside business hours. They don’t catch cold, get hit by a bus or change company overnight (okay, they have bugs and production issues, but these must get fixed). Aside from their benefits in cost and repeatability, I believe their prime benefit is to speed up the enterprise. So automate everything you need. Not only what’s related to the software factory  (continuous integration, automated testing), as I expect more or less everybody these days to be be on this path, but also administrative processes and quality-related records generation.

Speed up the others. On a personal level, I think everybody has a tremendous power in speeding ups its environment and business partners by being disciplined and having a sense of service. Get organized (for example, GTD works well for me). Always make sure you unlock any person waiting for you. When the job is represents a sizeable amount of work, consider splitting it in several iterations that you put to production (and take advantage of getting customer feedback as soon as version 1 is released, by the way).

Simplicity. Have a lean approach on every task – can you make it simpler, faster, cheaper? I heard Gregg Rodne Kirkpatrick say it the other day: “There is always a faster and cheaper way to do things.”

Startups have two special motivations to go fast:

  • Make money before you go bankrupt
  • Take that unique idea to the market before someone else does.

These two drivers are closely linked to survival and trigger very powerful reactions in venture founders’ heads. Large corporations have the second driver, only it’s not as powerful since they usually have several products and will survive if competitors do it first. And what about the first driver? I wonder if it’s possible to create inside larger corporations an environment mimicking startup drivers. Create a temporary group of people around an idea with a fixed budget (including internal staff cost). Name a leader enabled to take all kinds of decisions. When the money runs out, the group is disbanded. If the product gets to the market (or maybe a successful prototype), more funding can take the initiative to the next stage. I heard 3M is more or less organized this way. Has anyone experimented with it in the medical devices industry?