This article is part of a series of posts about medical devices software testing:
- Overview
- Automated tests that run in the build system.
- Automated tests supervised by a tester.
- Manual tests.
Some tests are mostly automated but can’t be run in the build, for two reasons:
- Somebody needs to start them and interpret the results
- They need to run on the target computer and OS
They are the purple ones in the classification of the introduction:

Load test. Load tests are typically a special kind of integration tests. The idea is to ensure that your app works with the expected performance criteria at maximum capacity by testing it in conditions an order of magnitude more demanding. Both performance and capacity criteria should be precisely defined in specifications and related to a real user need, otherwise the temptation to lower the bar will be great. These tests are difficult to make pass and as such deserve a user story of their own. But they are an excellent bug finder: you will encounter those rare conditions that otherwise would happen only at client sites. They are a must have. While I recommend running some load tests in the build (which allows you to break the build as soon as a toxic commit kills performance, or to easily monitor the duration of these tests over time), at some point you must run them on the target hardware and OS and guarantee that the performance objectives will be met in production conditions.
Stamina tests. This time, the device operates at normal capacity, but for a long time. For example, I work on a medical device that is meant to run with one shutdown per week: it has to be tested for one week of continuous work (or more). Other kinds of interesting and surprising bugs can be spotted here, such as tiny memory leaks or hidden timeouts – what happens if the app is left open without activity for more than twenty-four hours? A very interesting way of performing stamina testing is to capture real production activity and to repeat it – in which case it also serves as regression testing.
Automated tests written by testers. Automated testing tools such as Ranorex or TestComplete allow testers to set up tests that exercise the GUI as an end user would and check the display. As such they are pretty global integration tests. They can be run in the build or on the same computer as the medical device, to be even more global tests (which is why I’ve classified them here). The easiest way for testers to write them is to record manual test sessions. But on the long run we have found more productive to write large portions of them directly with code, which helps to reuse them across projects. We have also found that such automated GUI tests could be very handy to automate the setup of manual tests (putting the SUT in a certain state before starting the test). Let’s face it: the entry cost is real. You need to study the market and choose the right tool (vendor lock-in is very likely given the cost of your tests and their adherence to the specifics of the tool), to get acquainted with the tool, to create a team responsible for these automated UI tests, to set up good communication channels between this team and the dev teams to avoid breaking tests when controls are renamed or refactored, to tweak your UI to help the tool find controls, to put everything in the software factory, to link the tests to requirement numbers and get their results into the traceability matrix, to provision the maintenance costs of these tests. But the rewards are great: imagine you could run your whole manual test plan in one night on your servers, imagine you could transform wasteful test execution time into an investment (repeatable test procedures), imagine you could thus get enough time for your testers to perform free testing – wouldn’t that be a lot more productive and interesting? Wouldn’t you find more bugs?

Test with network peers. Modern medical devices can no longer be off the grid: medical and laboratory staff need them to receive orders and send results through the healthcare organization network (with protocols such as HL7, ASTM, DICOM…), they may interact with non-embedded software that is easier to develop and maintain (Data Managers, for example), and they might interact with an IoT solution that helps keep them in optimal operating conditions and add remote services (such as allowing patients to inspect their medical records, or apply artificial intelligence technique to provide clinical decision support). All these wonderful features relying on networks need to be carefully tested. While low-level interactions (such as protocol handling or basic conversations between servers) should be tested in the build to provide fast feedback to developers and extensive edge case testing, at some point the integration must be tested in more realistic conditions: real hardened operating systems, real network cards, real cables or wifi, real firewalls and NATs, real DNSs, real intrusion detection systems, real production server hardware… I’ve classified network peers testing in the semi-automated category since you can typically script part of the test scenario, but will still need some degree of human intervention to set them up, start them and analyze the results (until the day the healthcare industry has fully moved to continuous delivery, but we still have a long way to go when embedded systems are involved).
Security testing. Network peers lead us to cybersecurity. As anything, security doesn’t work until it has been tested. There is much more to security than intrusion testing, but the latter at least guarantees that your security measures were able to repel one professional attacker. It’s doesn’t guarantee your device can’t be hacked, but it says it’s not that easy. To perform the attack properly, the auditor will need access to the device in real conditions (real computer, real hardened OS with real vulnerabilities). Trying to hack a system involves the expert manipulation of tools (such as MetaSploit) that automate attacks or entire catalogs of attacks and gather results to show vulnerabilities to the dev teams. I wouldn’t run such tools in a build system since they can easily compromise your IT infrastructure – they should be run only in a controlled, totally disconnected environment.