Regulators of IEC 62304 have put a lot of energy into normalizing how to handle SOUPs (Software Of Unknown Provenance) for software of classes B and C (software that is in a position to potentially harm people in a non-benign way). The definition says: “Software that is already developed and generally available and that has not been developed for the purpose of being incorporated into the MEDICAL DEVICE (also known as “off-the-shelf software”) or software previously developed for which adequate records of the development PROCESSES are not available”. To sum up: everything that hasn’t been built according to the norm.
What I’ve seen in the trenches indicates that this distrust in SOUPs is a bit misplaced: in my projects, carefully chosen libraries contain several dozen times fewer bugs than home-made code before verification. Why?
Released libraries are finished software.
They are used by many more developers than code being called in only one context and thus have a higher probability that bugs have already been found and fixed.
The rise of open source software along with excellent processes (automated builds, TDD, gitflow with systematic reviews of pull requests…) and psychological motivators (developers’ names being permanently and publicly attached to every commit incentivizes perfection in code) has dramatically increased the quality of free libraries compared to ten years ago, when 62304 was first released.
But I understand the theoretical need of regulators: if there were no SOUP policy, it would be too easy to pretend that a major part of the code is a SOUP and not apply the regulation at all. And I can’t imagine a norm admitting that what comes from outside of its jurisdiction could be better.
Norms are norms and auditors are paid to verify compliance with a norm, not to argue about how well or badly the norm was written. I’ve heard that SOUPs are one of the top favorite areas for auditors to look for defects in your implementation of IEC 62304 (the other one being risk analysis): be warned.
So how do we handle this mandatory and not-so-useful activity? Here are a few hints to maximize productivity.
You need a list of dependencies and their versions. In some programming environments (nuget, bower, npm…), there is a clear list of these dependencies and their versions (packages.config, package.json, bower.json…): try to generate the SOUP list from these files.
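Here is a minimal sketch of that generation step, assuming an npm-style package.json manifest. The manifest content, the package names and the function names are hypothetical and only serve the illustration. Keep in mind that package.json usually declares version ranges; the exact installed versions live in the lock file, which is probably the better input for a real SOUP list.

```python
import json

# Hypothetical manifest, for illustration only (package names are made up).
EXAMPLE_MANIFEST = """
{
  "name": "example-device-ui",
  "dependencies": {
    "example-http-client": "1.4.2",
    "example-charts": "2.1.0"
  },
  "devDependencies": {
    "example-test-runner": "5.0.0"
  }
}
"""


def soup_list_from_manifest(manifest_text):
    """Return sorted (name, version, scope) rows from a package.json-style manifest."""
    manifest = json.loads(manifest_text)
    rows = []
    for scope in ("dependencies", "devDependencies"):
        # A manifest may omit either section, hence the default empty dict.
        for name, version in manifest.get(scope, {}).items():
            rows.append((name, version, scope))
    return sorted(rows)


def format_soup_table(rows):
    """Render the rows as a plain-text table that can be pasted into the SOUP document."""
    lines = ["name | version | scope"]
    lines += [f"{name} | {version} | {scope}" for name, version, scope in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_soup_table(soup_list_from_manifest(EXAMPLE_MANIFEST)))
```

Running such a script in the build keeps the SOUP list in sync with what is actually referenced, instead of relying on somebody remembering to update a document.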
It’s a good idea to take advantage of this list to perform a thorough inventory of licenses and do what’s required to be clear. For example, many open source libraries require your software documentation (typically online help) to quote them. And maybe you’ll find one or two that are not free for commercial use and that need to be replaced – the sooner the better.
62304 requires specifications for SOUPs, including performance criteria. This is a tricky business: some of your SOUPs are a lot more complex than your medical device (the base library of your favorite language, the OS): you can’t possibly retro-spec them entirely. My preferred approach is to document the requirements of the main behaviors of the library that you actually use – a projection of its features on your special use case.
You should always try to wrap the external dependencies of your code in, well, wrapper classes. This prevents the external namespace from creeping all over your code. It makes it easy to swap the library for another functionally similar implementation someday. In the context of SOUPs, the public interface of the wrapper makes very clear which part of the SOUP you use, and which part you don’t. This can serve as a boundary to limit your SOUP specification effort.
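To make the wrapper idea concrete, here is a small sketch using Python’s standard xml.etree.ElementTree module as a stand-in for the SOUP. The class name ResultXmlWriter, its render method and the document layout are assumptions made up for this example. The point is that the rest of the code base only sees render, so the SOUP requirements can be limited to what render relies on.

```python
import xml.etree.ElementTree as ET


class ResultXmlWriter:
    """Hypothetical wrapper: the only class that imports the XML library (the SOUP).

    Its public interface documents exactly which part of the library is used:
    building elements with attributes and text, and serializing them to a string.
    """

    def render(self, sample_id, measurements):
        """Serialize one sample and its measurements to an XML string."""
        root = ET.Element("result", {"sample": sample_id})
        for name, value in measurements.items():
            item = ET.SubElement(root, "measurement", {"name": name})
            item.text = str(value)
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(ResultXmlWriter().render("S-001", {"glucose": 5.4}))
```

If the library is replaced one day, only this class changes, and the SOUP specification and its tests are rewritten against the same small interface.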
62304 requires you to test these requirements. That’s something developers spontaneously do when choosing a library: make sure the library works, test a few edge cases. But you need to do it again every time you upgrade the library. For the latter reason, I strongly suggest unit tests that you can link to the specification (so that they end up in the traceability matrix) and use to test the mandatory performance requirements (for example by using the MaxTime attribute in NUnit). These unit tests will help you make sure the next version of the library works with very little extra effort.
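As an illustration, here is a sketch of such tests in Python’s unittest, with the standard json module playing the role of the SOUP. The requirement decorator, the requirement IDs (SOUP-JSON-001 and so on) and the 2-second budget are all hypothetical; the time check is only similar in spirit to NUnit’s MaxTime, not a port of it.

```python
import functools
import json
import time
import unittest


def requirement(req_id, max_seconds=None):
    """Tag a test with a requirement ID and, optionally, a time budget in seconds."""
    def decorate(test_func):
        @functools.wraps(test_func)
        def wrapper(self):
            started = time.perf_counter()
            test_func(self)
            elapsed = time.perf_counter() - started
            if max_seconds is not None:
                # Fails the test when the performance requirement is not met.
                self.assertLessEqual(elapsed, max_seconds, f"{req_id} exceeded its time budget")
        wrapper.requirement_id = req_id
        return wrapper
    return decorate


class JsonSoupTests(unittest.TestCase):
    @requirement("SOUP-JSON-001")
    def test_round_trip_preserves_values(self):
        payload = {"sample": "S-001", "value": 5.4}
        self.assertEqual(json.loads(json.dumps(payload)), payload)

    @requirement("SOUP-JSON-002", max_seconds=2.0)
    def test_serializes_1000_results_within_budget(self):
        results = [{"sample": f"S-{i}", "value": i * 0.1} for i in range(1000)]
        self.assertEqual(len(json.loads(json.dumps(results))), 1000)


def traceability_rows(test_case_class):
    """Return sorted (requirement ID, test name) pairs for the traceability matrix."""
    rows = []
    for name in unittest.TestLoader().getTestCaseNames(test_case_class):
        req_id = getattr(getattr(test_case_class, name), "requirement_id", None)
        if req_id:
            rows.append((req_id, name))
    return sorted(rows)
```

The test class can be run with any unittest runner, and traceability_rows(JsonSoupTests) gives the pairs to export. When the library is upgraded, rerunning the same suite is all it takes to re-verify the requirements.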
When they are available, you could run the unit tests of the library itself and use their results as a proof of quality. You will still have to deal with writing your own requirements and linking them to the tests. In practice my teams have often had problems with libraries having a few failed tests related to features we didn’t use, which triggered cumbersome justification; in this case we just skipped the library unit tests.
You are required to perform a risk analysis of your SOUPs and add mitigation strategies as required. This is theoretically a good idea, but I’ve often found it very difficult to put into practice with general-purpose libraries, because their impact cannot be bound to a single feature. In some cases – databases, ORMs, mappers – almost any feature could potentially be compromised. As always with risk analysis, there is a temptation to assess every possible failure mode, which would lead to an overwhelming analysis that never gets finished. My advice here would be to trust your gut feeling and choose a selected handful of risks where the brainpower consumed performing risk analysis will be most valuable. There are fewer failure modes in SOUPs than in your code; use your time on the risks that really threaten patients. Don’t get stuck in an impossibly thorough analysis of everything that could possibly go wrong in things that are more complex than what you produce.
You are also required to compile a list of known bugs and assess the risk your software incurs because of them. It’s a demanding endeavor: in practice my projects tend to use dozens of libraries, some of them have hundreds of bugs, and others don’t publish bugs at all; when they do, it is often difficult to tell which versions of the library have the bug without testing them. I suggest you don’t waste your time with this before the end of the project because you are likely to upgrade your libraries until then and because more known bugs are likely to be closed with newer versions. The ROI of this activity seems very low. I would be glad if this requirement were stripped out of the next version, or adapted to be more cost-effective.
Operating Systems are a special kind of SOUP. Of course you don’t want to retro-specify and test what it took your vendor decades of work with thousands of developers. But there is an alternative approach. These days, a lot of emphasis has been put on cybersecurity for medical devices – and this is good, patient data is sacred and hackers are on the brink of cyberwar to get it. You must harden your OS – and maybe brand it along the way. My recommendation would be for you to specify, document and test the hardened OS and not the base OS. This way, the OS spec is really useful and has a realistic scope.
SOUPs of SOUPs. Developers often ask how they should handle SOUPs of SOUPs – the dependencies of the libraries themselves. Of course you can’t handle all dependencies recursively, you would be overwhelmed. Treat your direct dependencies; their own dependencies are an implementation detail. The tests that verify the requirements you wrote for what you use in the SOUP will exercise the lines of code of the SOUP dependencies that you actually use. Their possible failure would be ways to trigger the failure mode of the level 1 SOUP that you already considered in your risk analysis; you don’t need to analyze them separately.
Whatever the hardships in producing the required documentation, resist the temptation to code for yourself what others have done. Reinventing the wheel is a waste of time. Remember, your goal in agile development is customer feedback and delight, not library writing. The thrill of writing cutting-edge technical code is what I suspect entices many developers into rolling their own version of existing stuff, and not good project governance; this is an area where a responsible mindset – adult self-supervision – is of particular importance. Developing with an agile mindset implies going as fast as you can by removing waste; missed opportunities of good reuse are horrendous waste. Your immature code will always be more buggy and more poorly designed than a library written by people dedicated to it, maybe working on it full-time for several years, and that has lived in production for several releases in many different contexts. In this regard I think that the writers of 62304 have done a very dangerous job in discouraging people from using reliable libraries and creating an incentive to write brittle home-made code instead, which would have a very negative effect on overall medical device reliability and safety. A few months ago I stumbled upon a concrete example of this: a developer I know decided to write his own XML generation routine to avoid the lengthy, boring and absurd (according to him) process of documenting an off-the-shelf library. Don’t ever do this. SOUPs are good. Always use SOUPs when they make sense. Accept the pointless burden (automating as much as you can) and write the required doc.
Let me take advantage of this tribune to deeply thank all the open source contributors in the world.
Good architecture is essential in medical software, where it helps, in particular, to achieve safety. But architecture, whatever its excellence in the origin, will degrade over time, just as inevitably as entropy increases, it’s a law of the universe: disorder naturally increases, and software projects are prone to disorder (people come and go, requirements change, interacting systems upgrade, products are launched and abandoned, technologies thrive and wane). So we need architecture top-notch but it constantly gets corrupted. What we need is a force to constantly fix it and make it better suited to current conditions. The following is a list of practices that managers can use to build a culture that promotes an evolution towards a better architecture.
Refactoring is the key practice that will keep the architecture afloat. It is the recurrent part of the force we need: something that comes back over and over again to fix what appears not so good now. But refactoring doesn’t happen by magic. What I believe can help:
Allocate time for refactoring in every iteration.
It creates a culture where developers know that management cares about architecture. Simple as that. If they pay for it, they care for it. One million times more effective than talk about quality.
Technical debt management: as with financial debts, it comes with interest rates; you’d better pay your loans on a regular basis or total interest will get sky-high – in the worst case leading to project bankruptcy, where you have to start from scratch again because your code base is no longer profitable given the likely project roadmap.
Risk management. Refactorings introduce bugs in areas of code that were stable before. So they add risk to your project. As always, you’re better off spreading that risk to avoid big surprises. You don’t want to refactor much right before a major release.
Do it now, otherwise you might end up not doing it at all. Don’t wait. The more you wait, the more deadlines and emergencies will convince you to postpone it again. Refactoring is a long-term endeavor. There is no immediate benefit in refactoring. It’s in the “important, non-urgent” zone. It makes the difference between good and bad in the long run. And you should do some “important, non-urgent” activities every iteration.
Refactoring cost acceptance. If you are always refactoring, senior management will get used to a business-as-usual project pace that includes refactoring. They will accept it. But ask for 3 months of refactoring only with no features, and management (especially if it has no technical background) will likely say NO. Your regular project pace must come with quality included, period – remember we are talking about medical devices?
Provide a good safety net with automated testing. Developers should be able to run a comprehensive test suite on their refactoring branch, and make sure they didn’t break anything before merging to the trunk. You don’t want them to disrupt the work of others or introduce bugs in the product. I’ve seen projects without tests and where it’s very difficult to predict impact; you know what happens? Developers don’t refactor, or very little. In this sense, automated testing is once again “more an act of design than of verification” (Bob Martin): in addition to favoring loosely-coupled design, automated testing allows design to evolve over time by enabling refactoring.
Don’t get too mad with regressions. You can’t make an omelet without breaking eggs. If a bug made it through your testing process, make the testing process better – but don’t yell at developers. They should feel safe to take a reasonable amount of risk. If they don’t, refactoring stops.
The agile manifesto states that “The best architectures […] and designs emerge from self-organizing teams.” But when team size exceeds the canonical 7±2 (typically when several scrum teams have an interaction in creating a bigger product or range of products), I find it useful to entitle architects to perform some key activities:
Settle disputes when consensus cannot be reached. Humanity has invented hierarchies to have power struggles settled once and for all – and not resumed at every design meeting.
Stimulate and validate good design before development. Having architects reject design after implementation is a tremendous waste. There should be a discussion over feature design before coding. I don’t mean formal reviews with design document approvals: a coffee break and a diagram on a napkin should be sufficient when trust is established.
Perform code reviews. They help the architects know a little bit of everything. They allow mistakes to be spotted earlier. They allow the architects to check that the actually implemented design is what was agreed upon with the developer. If better ideas emerge during review, refactor while the code is still fresh in the developer’s head. Code reviews are an excellent opportunity for mentoring and training: concepts applied to practical cases. It’s good for developers to know that what they commit will be challenged, and that crap cannot make it to the trunk – they will pay more attention. Code review is definitely an activity with an excellent return on investment: many deep things happen in little time.
Maintain the one thousand feet view to add a broader context to design decisions. This is crucial to the architect’s legitimacy (in addition to recognized technical and social skills): somebody worth talking to, to make sure local design (which may be excellent) fits well in the bigger picture. When the codebase gets big, the one thousand feet view will naturally get lost. As with code reviews, maintaining this view means, very concretely, that the architect has budgeted time to take care of the code of others.
Promote code reuse. Developers tend to reuse less than they could. And they can’t reuse something they don’t know about. Once again, the guy who knows a little bit about everything might prove useful.
On the human side of architecture
On the human side of architecture, I recommend the following considerations:
Hire near-architect developers. Make sure, during the recruitment process, that they have good design ideas, that they constantly learn, that they are open enough to understand what other designers think, that they are able to communicate their point of view in an understandable way. Having people with poor design skills and little ability to progress will destroy the architecture which must, to survive, be understood and refactored by every developer in the team. So make sure new recruits will find their way in your project’s patterns and practices. Juniors are a good asset if they have the potential to quickly get up to speed.
Architects will show developers that there is a career path for technical people. This might help fight turnover.
Good practices for spreading knowledge:
Iteration design retrospective: developers explain the design that was actually implemented to their peers, so that everybody has at least a basic knowledge of the recent changes.
TechDays: at a wider scale (scrum of scrums), teams present to others a summary of the global architecture of the component or software they are responsible for. This is also a good moment to share about new technologies that teams might use (for example, yesterday, one of us presented the new features of C# 6 that we recently migrated to, which should be used in our context, and which we should be wary of).
Hire nice architects. It’s quite common to see architects with a bad attitude. Maybe they feel technically insecure and need to show off and snap at other people to protect their realm, slowly falling into the ivory tower syndrome. But, to my mind, being an architect doesn’t mean you have to be the best developer in the house: you must be one of the good ones AND have social skills: leadership to convince people, openness to incorporate their good ideas into the architecture, enough altruism to take an interest in their work and give them a hand when they need it. If architects are not nice, people will stop asking questions, communication will dry up, and the bad-attitude architects will end up coding some kind of framework in isolation from the rest of the people and only criticize developers during nightmarish code reviews.
Technical debt backlog. Have the courage to recognize when things are bad or not so perfect, and change them. A backlog is good for memorizing refactoring ideas and prioritizing them. As with the product backlog, you will never implement everything. In fact, that would be bad: as with any human activity, some ideas are frankly inappropriate, and should be dumped. So let refactoring ideas mature for some time. The size of the backlog (ideally with estimates) will give you an excellent idea of the size of your technical debt. It should be monitored.
Whiteboards everywhere. Developers should start coding only once they are able to express their design intentions clearly on the whiteboard to an architect and their peers and reach a consensus. Whiteboards are an essential tool for communication and design inception. If you can’t draw it, it’s not clear enough.
Get external architecture reviews. It will give fresh ideas. Human beings can get used to anything. After a while inhaling a code stench, you don’t smell it anymore.
Ask new developers what they most dislike in the design, and listen to them. It means something. Especially if several of them agree on some issue.
Hire architecture consultants once in a while to tell you what they think. This will give you extra legitimacy to convince your management to finance important refactorings. I’ve had a good experience with such an audit: after an initial denial and rejection phase (it hurts to hear your baby is not perfect!), the team implemented about half of the recommendations, and they proved good in the long run. Some of them were already known to the team, but having somebody else point at them was the spark we needed to trigger action.
Use architecture verification tools. My in-house development teams successfully use NDepend and its Code Query Language to write architectural rules (for example: GUI layer cannot access DAL layer, methods and classes cannot exceed a certain size, namespace dependency cycles are forbidden, sub-domain A cannot access sub-domain B…). Once in the build, NDepend will shout when a rule is infringed. So these rules will be strictly abided by (corollary: they must be good, pragmatic rules, or they will cost a lot to enforce; be ready to drop them quickly if costs outweigh benefits). NDepend (as well as other code inspection tools) is so obtuse that developers soon learn they cannot get away with it; they will painfully internalize the rules in such a way that, in the end, the code they produce will no longer violate the rules – they will almost cease to be annoyed by them. So basic rules will be automatically enforced. This is excellent for the architects and their relations with developers: code inspection tools play the bad cop role, architects play the good cop role. Architects help developers solve rule infringements, leading to better team spirit. And architects’ bandwidth during code review is best used when repetitive stuff has already been taken care of.
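To give an idea of what such a rule looks like when automated, here is a toy sketch of the “GUI layer cannot access DAL layer” rule. It is not NDepend and does not use its query language: it is a plain Python check based on the standard ast module, and the package names gui and dal, the rule table and the function names are assumptions for the example.

```python
import ast

# Hypothetical layering rule: modules of the "gui" package must not import "dal".
FORBIDDEN_IMPORTS = {"gui": {"dal"}}


def imported_top_level_packages(source):
    """Return the top-level package names imported (absolutely) by a module's source."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports stay in the layer.
            packages.add(node.module.split(".")[0])
    return packages


def layer_violations(module_name, source, rules=FORBIDDEN_IMPORTS):
    """Return the sorted list of forbidden packages imported by the given module."""
    layer = module_name.split(".")[0]
    forbidden = rules.get(layer, set())
    return sorted(imported_top_level_packages(source) & forbidden)


if __name__ == "__main__":
    offending = "import dal.repository\nfrom services import orders\n"
    print(layer_violations("gui.patient_view", offending))
```

Run over every source file in the build and made to fail on a non-empty result, a check like this plays the bad cop role described above.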
Some architectural ideas are quite classical in the medical device software world, and should be considered in the early stages of a medical device project. I collected some:
Split the software to isolate the riskier features. For example, in a radiography device, all code surrounding the manipulation of the radioactive substances, their emission, and alerts around them should be split apart. You want to keep that part small to review it thoroughly and keep complexity low to avoid bugs. In addition, there is a huge overhead implied by class C 62304 requirements – you want to avoid that overhead in risk-free parts of the app.
On the opposite side, risk-free zones (such as the GUI, provided it takes no decisions and keeps no memory at all) should be split from the rest so they can be restarted at will in case of failures. And GUIs are doomed to mutate forever (to stick to UX fashion, and to give marketing opportunities for more product launches) – you don’t want to validate the automation’s impact on biological phenomena again for the sake of an update in color palette.
Isolate real-time automation from the rest. Real-time or near-real-time is tough to get right. It will typically require lower-level languages (C, C++, PLC…) and maybe a special OS (RTOS) or RTSS (Intime, RTX, Preempt-RT…).
Having several OS may have an impact on electronics (another PC, a dedicated board…) and production price. This is a far-reaching decision that has to be taken wisely and early.
Low-level languages typically imply a lower productivity (C vs C#). And this part of the code will more or less follow the development lifecycle of the hardware: slow to start, a nightmare to tune and fix with all edge cases and recovery mechanisms, and then nothing – once the device is out, this code will have few reasons to change. But the higher-level part will always be changing – adapting to regulations, markets, healthcare network protocols.
Added bonus: isolating what’s not directly linked to the hardware will make a good basis for reuse on another machine.
Another extra for the road: you will need to emulate the hardware (to simulate rare conditions, to minimize costly and scarce real hardware usage, to speed up tests, to avoid being blocked until the hardware is ready), so have it clearly isolated to mock it through a simple interface.
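A minimal sketch of that kind of isolation, assuming a hypothetical temperature sensor: the names TemperatureSensor, SimulatedSensor, SensorFault and incubation_is_safe, as well as the 36 to 38 °C range, are invented for the example. The simulator replays scripted readings, including a fault that would be rare and costly to reproduce on the real hardware.

```python
from typing import Protocol


class SensorFault(Exception):
    """Raised when the (real or simulated) sensor cannot deliver a reading."""


class TemperatureSensor(Protocol):
    """The simple interface the higher-level code depends on."""

    def read_celsius(self) -> float: ...


class SimulatedSensor:
    """Replays scripted readings; an Exception in the script is raised as a fault."""

    def __init__(self, script):
        self._script = iter(script)

    def read_celsius(self) -> float:
        value = next(self._script)
        if isinstance(value, Exception):
            raise value
        return value


def incubation_is_safe(sensor: TemperatureSensor, low=36.0, high=38.0) -> bool:
    """Hardware-independent logic: out-of-range readings and faults are both unsafe."""
    try:
        return low <= sensor.read_celsius() <= high
    except SensorFault:
        return False


if __name__ == "__main__":
    sensor = SimulatedSensor([37.0, 41.5, SensorFault("probe disconnected")])
    print([incubation_is_safe(sensor) for _ in range(3)])
```

The same incubation_is_safe function would run unchanged against a driver for the real device, as long as that driver offers read_celsius.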
Isolate components with cybersecurity risks. What’s in contact with networks and USB will typically be the entry point for attackers; therefore, it should have minimum rights – so a successful attacker cannot get much further.
Beware of networks. Calling third-party web services is a nice idea for, say, a climate app. But for medical devices, beware. Imagine there’s an earthquake or a war – a situation where the internet might be working very slowly, and people requiring urgent attention pouring into hospitals. Medical devices have to be working no matter what. So code that Clinical Decision Support algorithm locally.
Isolate tools. It may sound obvious once more. But don’t ship all these R&D tools (simulators, tests, low-level system testing routines…) in your production code. Medical Devices don’t need one more reason to fail. And keep in mind that these devices may be maintained in the field by versatile technicians who may have only basic knowledge of computers and may mess with the device if they can get their hands on powerful but unsafeguarded programs.
Design for testability. It has become mainstream, but I still see projects that avoid automated testing. My guess is their managers think automated testing is costly – and yes it is! In my experience, deeply automatically tested software costs about twice as much to implement. But you save so much horrible debugging time (who likes debugging? I’d rather write tests…) and you enable refactoring by providing a safety net. And are we serious about writing safe and reliable medical device software, or are we not? You can’t be if you run away every time a quality-related activity seems costly. But to be profitable (and keep costs in a reasonable zone), automated testing has to be thought out for the long run. And code has to be architected in a testable way from the very beginning. My typical advice here would be to heavily use dependency injection, mock all hardware-related components and network interfaces, run the tests on your CI server and integrate the test results into your traceability matrix (to give them legitimacy).
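As a small illustration of dependency injection combined with mocking, here is a sketch in which a hypothetical ResultPublisher receives its network client from outside, so a test can replace it with a Mock from Python’s standard unittest.mock and simulate an outage. The class, its send contract and the queue-on-failure behavior are assumptions made up for the example.

```python
from unittest.mock import Mock


class ResultPublisher:
    """Hypothetical component: sends results over the network, queues them on failure."""

    def __init__(self, network_client, local_queue):
        # Dependencies are injected, so tests can pass mocks instead of real ones.
        self._network = network_client
        self._queue = local_queue

    def publish(self, result):
        try:
            self._network.send(result)
            return "sent"
        except ConnectionError:
            # Keep the result locally so that it is not lost during an outage.
            self._queue.append(result)
            return "queued"


if __name__ == "__main__":
    # Simulate a network outage without any real network.
    network = Mock()
    network.send.side_effect = ConnectionError("network down")
    queue = []
    status = ResultPublisher(network, queue).publish({"sample": "S-001"})
    print(status, queue)
```

Because nothing in ResultPublisher creates its own network connection, the outage scenario takes a few lines to set up and runs in milliseconds on a CI server.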
Medical Devices are meant to be reliable and safe. Architecture is key to achieve this. A sound architecture driven by risk analysis will mitigate disasters and their consequences. And a good architecture (especially if that quality is maintained over time) will make for software with fewer bugs that are easier to spot and easier to fix without regressions. So how do we approach medical device architecture?
Spend time on initial design. I know, Agile scorns BDUF (Big Design Up Front). I think BDUF should be avoided in the sense of defining UML diagrams for every class in the system needed to implement those 5000 requirements. But if you really want to mitigate risks and maximize reuse, some separations are to be really considered at the very beginning – because they are a lot more difficult to achieve later.
Beware of dreamed features. I’ve been amazed, in the past few years, by how different the actual evolution of the platform I was responsible for was from what we thought at the beginning. Projects come and go, partnerships change, markets mutate. So keep YAGNI (You Ain’t Gonna Need It) in mind. But don’t fall into the trap of becoming blind and missing the chance to prepare for changes that will really happen in the future.
Change the architecture when requirements change. Change the design when developers think a particular area of code is smelly. Nothing is sacred. Everybody gets things wrong once in a while, even rock-star architects.
Prove the architecture
With prototypes. I love the Tracer Bullet project pattern: on your first iteration, implement only one simplified core feature of the app that encompasses all layers and components of the architecture (the metaphor is that this practice, just as a tracer bullet in the army, gives the team enough light to understand the landscape and correct fire in real conditions). Once you have this feature working, you know the architecture works (well, you don’t know yet how it will sustain change over time, edge cases and loads, but you know at least it’s not an impediment to development). Before that, you just hope for it.
With stress tests and load tests. They are good judges to validate an architecture. I’ve often been flabbergasted by how much harder getting those tests to pass is, compared to what the team expects. You find so many rare, hard to reproduce bugs during those tests. You don’t want them to jump up in production – always at the worst possible time, by their very nature. As those tests might reveal deep problems rooted in the architecture itself, and thus very costly to change, they should be performed as soon as possible.
With reliability tests. It’s always interesting to control software behavior when things fail. It is especially important when the software might save someone’s life – or ruin it. You should have strategies for handling every kind of failure: network failure, other software failure, device hardware failure, computer hardware failure, OS failure, power supply failure, cyberattack, and even failure in your own software. It’s not in the norms, but you have to go beyond norms as far as the real thing is concerned. And as with everything in software, it doesn’t work until it has been tested. So test it. Make sure you don’t lose data in case of blue screen of death (they can be provoked on demand thanks to special drivers). Make sure you don’t lose data when you turn off the power switch (we had to disable several caches to make it work). Make sure you don’t lose biological results when the GUI goes mad (we set up an automated test to kill the GUI during load tests).
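On the power-switch case, one common software-side technique (not necessarily what we did, and not sufficient on its own if the OS or the disk hardware keeps its own caches) is to write to a temporary file, force it to disk, and only then rename it over the final name. Here is a sketch with Python’s standard library; the function name write_durably is an assumption for the example.

```python
import os
import tempfile


def write_durably(path, data: bytes):
    """Write data so that path holds either the old content or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()               # empty Python's own buffer
            os.fsync(tmp.fileno())    # ask the OS to push its cache to the disk
        os.replace(tmp_path, path)    # atomic swap of the complete file
    except BaseException:
        # Do not leave a half-written temporary file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A function like this only narrows the window; whether data really survives has to be checked on the actual device by cutting the power during a load test, as described above.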