At FP Complete we have experience writing Medical Device software that has to go through rigorous compliance steps and eventually be approved by a government regulatory body such as the US Food and Drug Administration (FDA).
In this post we’d like to share some of the best practices and pitfalls we have learned when working in this area.
You may find this blog post especially relevant if you are
but of course you are also invited to read and discuss this topic with us if you are none of these.
Before we get to the problems and best practices, we’ll give some context by describing what a common project setup for Medical Device software may look like.
A common team structure inside a company working on a Medical Device might be:
Together, they want to develop a software product that makes some form of medical statement (which could be a diagnosis of a disease, a forecast of how a patient will react to some treatment, or a recommendation of treatment), or one that takes medical action (e.g. control logic of a physical device performing a treatment or administering a medicine).
A common project history is:
During the productisation phase, you are obliged to operate in “regulation-safe mode”, meaning that all processes and decisions need to be well-informed and documented. They must be able to withstand a regulatory audit; otherwise you risk your regulator denying approval, in which case the product cannot be used or marketed.
The regulatory experts on your team will help you with this, telling you what certifications you’ll need to get and in which order to perform which steps. However, they are typically not experts at Software Engineering, and will rarely be able to provide concrete advice on how to do your Software Engineering to support the “regulation-safe mode” as much as possible.
Now that the project setup is clear, let’s get into some best practices to exercise and pitfalls to avoid when working on Medical Device software.
The fields of programming and medical regulation have some overlap in terminology that can result in disastrous miscommunication unless special care is taken to avoid this. For example, “unit testing” may mean two different things from the engineering and regulation perspectives. Your regulatory expert may completely misinterpret what regulatory steps you have already completed when you tell them that you’ve just finished writing some “unit tests”.
Disambiguate such terms, e.g. into “engineering unit tests” and “regulatory unit tests”, and enforce across your team that everybody uses only these explicitly qualified phrases, and never “unit tests” alone.
A list of terminology that we found ambiguous between engineers and regulatory people includes:
Consequently they may try to apply processes to software that were designed for other products, and do not apply to software.
A common example is the assumption that after the product is “done”, it will never change again. This is a sensible expectation for a drug. Of course this doesn’t work with software: continuous modifications are needed, if only for routine security updates. (You can drastically reduce the frequency of such updates being necessary by using an advanced programming language such as Haskell, which is designed for safety and reliability and thus our language of choice for medical software; however you will never be able to entirely rule out the need for post-release updates.)
While this is obvious and natural to any programmer, it may not be obvious to medical experts, and it is not well understood in many medical companies. You may meet heavy resistance to any form of agile development model, continuous deployment setup, and frequent code changes after the release of the software. You should ensure that you train the managers and medical experts of the project on this aspect of software before you start the project, define clear boundaries between “device-software updates” and “security-software updates”, and set expectations, e.g. that the software may have to be recompiled and re-deployed should a security update for an underlying software library be necessary.
Continuous Integration (CI) means merging everybody’s work together frequently and running automated tests on it. While CI is common in software teams by now, researchers and data scientists may not be used to it. They may be more familiar with the workflow of developing their own, often one-off scripts and programs on their PCs and rarely sharing the code with their team members, instead only sharing the results.
For a regulated project, you should enforce that everyone on the team checks any code ever produced for any purpose of the project into source code version control. The results produced by this code should also be generated or reproduced on the shared CI servers, as opposed to being generated only on a researcher’s own PC. This ensures that it is recorded which exact code produced which exact results in which exact environment, which helps a lot when making regulatorily relevant statements such as “our experiments have confirmed our thesis X”. It also speeds up development, because everybody on the team can see what everybody else does, or get notified by the CI server when accidentally breaking somebody else’s program or workflow. You should, where possible, refuse to accept results as certain unless you have seen them produced by your CI server, and train everybody on the team how to follow this workflow.
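As a minimal sketch of what “which exact code produced which exact results in which exact environment” could look like in practice, a CI job might store a small provenance record next to every result file. The function name `provenance_record` and the record fields are hypothetical, and the commit identifier is assumed to be supplied by your CI system.

```python
import hashlib
import platform
import sys


def provenance_record(result: bytes, commit: str) -> dict:
    """Describe one result: the code version, a digest of the output,
    and the environment it was produced in.

    `commit` is assumed to come from the CI system (hypothetical input);
    this sketch does not query version control itself.
    """
    return {
        "commit": commit,
        # A digest lets anyone check later that a result file is unchanged.
        "result_sha256": hashlib.sha256(result).hexdigest(),
        # Coarse environment description; a real setup would likely
        # record pinned dependency versions as well.
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
```

Storing such a record alongside each result makes it cheap to answer, during an audit, which commit and environment a given number came from.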
When we as programmers use advanced technical tooling like Haskell, we can easily enumerate the various features that will make the software more correct and reliable. However, these features may mean nothing to a medical expert, and thus may not be easily used by your team for advertising or explaining to a regulator why your software is especially safe. Consequently you should do research on what terms will be understood by medical experts, and map your tools and features into their terminology.
For example, if you use a compiler featuring static analysis, you might explicitly advertise this as a form of “formal software verification”, which is a term most medical experts are familiar with.
Here’s a list of cool tools we’ve used in the past that fall under “formal software verification”:
As a programmer, you should:
| | results are identical with gold-standard | results are different from gold-standard |
|---|---|---|
| commit message does not expect change | good to merge | not good to merge, investigate why results changed |
| commit message expects change | not good to merge, investigate why change didn’t have the desired effect | possibly good to merge, let medical / data science team sign off the changed results, then update the gold-standard outputs |
Note how this is different from engineering unit-testing:
In engineering unit-testing, the programmer defines and understands precisely what the output of the algorithm is for each single test case. In gold-standard testing, the idea is not to understand the output for each input, but to get notified when outputs change (independent of what exactly the outputs look like). Because of this, gold-standard tests are easier to write: They require no thinking effort from the programmer, they only require input data to run on.
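The merge decision from the gold-standard table can be sketched as a small function that a CI job might call. This is only an illustration: the names `Verdict` and `gold_standard_verdict` are hypothetical, outputs are compared byte for byte, and how you determine whether a commit message “expects change” is left to your own convention, so it is passed in as a plain boolean.

```python
from enum import Enum


class Verdict(Enum):
    MERGE = "good to merge"
    UNEXPECTED_CHANGE = "not good to merge: investigate why results changed"
    MISSING_CHANGE = "not good to merge: investigate why the change had no effect"
    NEEDS_SIGN_OFF = "possibly good to merge: get sign-off, then update gold standard"


def gold_standard_verdict(gold: bytes, new: bytes, change_expected: bool) -> Verdict:
    """Map (outputs identical?, change expected?) to one of four outcomes."""
    identical = gold == new
    if identical:
        # Same outputs: fine unless the commit claimed to change them.
        return Verdict.MISSING_CHANGE if change_expected else Verdict.MERGE
    # Different outputs: only acceptable if announced, and then only
    # after the medical / data science team has signed off.
    return Verdict.NEEDS_SIGN_OFF if change_expected else Verdict.UNEXPECTED_CHANGE
```

Note that the function never inspects what the outputs mean; it only reports whether they changed, which is exactly what distinguishes this from an engineering unit test.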
Make only controlled changes:
While software engineers love to upgrade their stack and switch tools and processes frequently, medical people tend to hate it. However, there are ways to make them more comfortable with it.
As a product manager or similar role, when you want to make a process change, stick to a predictable order such as:
Here is an example:
Let’s say it is necessary that data scientists switch their working environment operating system (OS) from Windows to Linux so that developers can more easily reproduce their results in the production software.
A lean “DevOps”-only approach usually doesn’t work with researchers.
While developers like to control machines and servers themselves and the team can be made more efficient that way, researchers like to have their heavy machinery moved by people who understand what they are doing.
Thus, as a manager, you should make sure that:
git, pushes things to the wrong branch, and so on.
Define ahead of time what role can block what activity to avoid unnecessary project slowdowns.
As a Project Manager, you should make sure that:
Enforce that all code be checked into version control. Make no exceptions here.
Arrange for personal scrap spaces in version
control, that are clearly marked as not being under the same
scrutiny as “device code”. If you do not do this, researchers and
programmers will not check their experiments into version control,
and the project will suffer. Examples for such scrap spaces are
branches prefixed with
wip/ (for work-in-progress),
In general, always clearly separate device code and non-device code. This need not mean that they should be in independent source code repositories (as that would make it impossible to ensure that experimental scripts work with the latest version of the device code). Instead, use other explicit means of separation, such as having one directory for device code, and one for non-device code.
Relatedly, separate the device from the platform needed to run the device (such as deployment infrastructure and server tools). As mentioned earlier, this is especially important for infrastructure security updates.
You should optimise version control usage for
efficiency. For example: Have branches with a
doc- prefix only run documentation builds, and skip
the big or costly stages other builds may include. People will hate
tools for structured working such as version control and CI if they
make their workflow slow. Always provide fast ways to do
If possible, use a linear development model in
version control (such as a “rebasing” workflow in
git). In an environment where reproducibility is of
utmost importance, being able to do automatic
bisections to find regressions is more important than
developers having to resolve more merge conflicts.
As a programmer or data scientist,
Don’t write:
TODO: fix this code.
Instead, write:
TODO-ENG: Future performance enhancement: While this computes the correct result and is safe to use, we should make this faster by doing XYZ.
For each project, define and document clear criteria for labels. For example, you might designate TODO-ENG as a label to mean “irrelevant for the medical device operating correctly, but engineering would like to change this”, and TODO-DEVICE as a label to mean “this must be changed before the release or next major milestone on the roadmap”. You can then ensure before the next milestone that all TODO-DEVICE labels are gone.
Ensure everybody (including regulatory people) knows which label means what. Add this information to your documentation. Also see the next point for more on that.
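A labelling rule like this is easy to enforce automatically. The following is a minimal sketch of a check that a CI job could run: it flags any TODO that is not one of the agreed labels, and, when run before a milestone, also flags remaining TODO-DEVICE entries. The function name `audit_todos` and the `release` flag are hypothetical; the two labels are the examples used above.

```python
import re

# The labels agreed on for this project (taken from the example above).
ALLOWED_LABELS = {"TODO-ENG", "TODO-DEVICE"}

# Matches a bare "TODO" as well as "TODO-<UPPERCASE SUFFIX>".
TODO_PATTERN = re.compile(r"\bTODO(?:-[A-Z]+)?\b")


def audit_todos(source: str, release: bool = False) -> list:
    """Return (line number, problem) pairs for one file's contents."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in TODO_PATTERN.finditer(line):
            label = match.group(0)
            if label not in ALLOWED_LABELS:
                problems.append((lineno, f"unqualified or unknown label {label!r}"))
            elif release and label == "TODO-DEVICE":
                problems.append((lineno, "TODO-DEVICE must be resolved before the milestone"))
    return problems
```

Running the check with `release=False` on every commit and with `release=True` on the milestone branch keeps day-to-day work unblocked while still guaranteeing the milestone criterion.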
Whenever you make a decision of how things are done in the project, write it down, ideally in version control.
Don’t propagate engineering, review, and other process rules by word of mouth. One way regulators assess you is whether you stick to your own processes; they will not be able to find evidence of you doing so if you haven’t written the processes down.
Only having documentation is not enough. It also needs to be discoverable.
Use simple and
obvious ways for people to find any documentation they might need.
An approach that works well is to place a
in each sub-project’s top level directory (of course under version
control), and link to other documents from this entry point.
Use a simple tagging scheme, such as tags in brackets (e.g.
[ALIEN-SALIVA-DENSITY-ESTIMATION]) that allows you to
place textual anchors and references to them in code and
documentation. This is because linking from documentation to
documentation (which may be easier, e.g. using hyperlinks) is not
enough; you will also need to link from code to docs and from docs
to code (and referring to file name plus line number is obviously
not a good choice given that code can move around).
Medical device software tends to have a lot of documentation, so you will have many links and references in your project. At the time of an audit, you don’t want auditors unable to follow outdated documentation links. Have your tools team write tooling to find dangling links and references, possibly also to produce simple graphs so that you can easily visualise documentation references.
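A dangling-reference checker for such a tagging scheme can be quite small. The sketch below assumes a hypothetical convention that is not prescribed by the scheme itself: the one place that defines a tag writes `ANCHOR [TAG]`, and every place that points to it writes `REF [TAG]`. The function name `find_dangling_refs` and the file names in the usage note are likewise made up for illustration.

```python
import re
from typing import Dict, List, Set

# Hypothetical convention: "ANCHOR [TAG]" defines a tag, "REF [TAG]" points to it.
ANCHOR = re.compile(r"ANCHOR \[([A-Z0-9-]+)\]")
REF = re.compile(r"REF \[([A-Z0-9-]+)\]")


def find_dangling_refs(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each file path to the tags it references that are defined nowhere.

    `files` maps a path to that file's text, so code and documentation
    can be checked together in one pass.
    """
    anchors: Set[str] = set()
    for text in files.values():
        anchors.update(ANCHOR.findall(text))

    dangling: Dict[str, List[str]] = {}
    for path, text in sorted(files.items()):
        missing = sorted({tag for tag in REF.findall(text) if tag not in anchors})
        if missing:
            dangling[path] = missing
    return dangling
```

For example, given a documentation file containing `ANCHOR [ALIEN-SALIVA-DENSITY-ESTIMATION]` and a source file containing `REF [ALIEN-SALIVA-DENSITY-ESTIMATION]` and `REF [OLD-CALIBRATION]`, only `OLD-CALIBRATION` is reported for the source file. The same anchor and reference sets could be fed into a graph tool to visualise documentation references.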
You cannot simply throw a bunch of engineers and researchers together and expect that they will work in perfect symbiosis and produce the desired results.
In many companies, R&D and Engineering are separate departments that may have developed different ways of working and communicating. This may be even more true when one of the two sides is brought in by a different company or via contracting. Bringing them together often warrants extra planning and being more explicit than usual when setting up joint workflows.
Make clear that the success of the project depends on the successful interaction between researchers and engineers.
Most importantly, be aware of the “my side is fine” problem.
Researchers like to think:
These are my preconditions, and they have to be provided by the engineers. If those are provided, we’ll be fine.
Engineers like to think:
As long as I code up these maths written by the researchers, I’ll be safe.
As a result, neither of the two sides makes sure that the critical preconditions that make the system work are actually provided.
To avoid this, you should make sure each side understands the other well, that the interface between them is understood especially well by both, and that they talk often about it. Encourage mutual training: Have Researchers train Engineers to understand their maths, and Engineers train Researchers to read their code.
This is one of the most important bits when trying to make a safe device.
Allow and encourage any question that is aimed at understanding. “Is this safe to do, and why?” should be a common thing to be heard and written in your project. Establish that asking this does not call anybody’s reputation into question. Employ blame-free evaluation and analysis techniques.
Hopefully you have found these insights useful or interesting.
If you’d like our help with delivering Medical Device software, don’t hesitate to contact us.