You’re implementing a new IT
system, an app, a DevOps deployment, a DataOps system to merge your
data feeds. And it seems to work all right. So, are you done? How
much quality assurance work (QA) should you do before calling it
finished?
Quality assurance includes
testing, but it also means using organized processes to ensure your
system works correctly and predictably, in success and in error
situations. If you don’t have a definition of your system’s correct
behavior, or a list of all the features supposed to be present; if
you haven’t kept track of changes; or if you don’t have
reproducible tests that verify your system’s actual behavior -- can
you really say it’s right? Or are you just hoping there aren’t
mistakes waiting to be discovered the hard way? Does “quality” just
mean wishing for the best?
There’s a reason we say quality
“assurance.” The opposite of QA isn’t bad quality; it’s unknown
quality. Maybe your system is great, even flawless. Maybe it’s full
of bugs -- serious or inconsequential. Who knows? And what are the
consequences of not knowing?
We don’t need to be zealots --
there is no one universal right amount of quality assurance. Many
low-impact projects don’t need much. Many teams decide correctly to
invest less in QA than they could, simply because their priorities
lie elsewhere. Tracking and verifying the expected behavior and
actual quality of a complex system is, frankly, work.
Nevertheless, many projects
deserve more QA than they are getting, and the decision isn’t being
made in a business-like or engineer-like manner. People are
improvising and hoping, doing some sketchy testing and, seeing no
flames or smoke, moving on. As with security, organizations then
learn the hard way: underinvestment leads to very costly problems.
There is no single right amount of quality assurance
In real life, the correct investment in QA
depends on the cost of encountering a problem later.
If a defect that
might exist would take down your company, you should
spend whatever’s needed to find and eliminate it (or make sure it
wasn’t there). By contrast, if a defect would have no real effect
on anyone, why spend time and money on QA at all? QA isn’t about
excessive testing and tracking -- it’s about the right amount of
testing and tracking. “Process” isn’t a dirty word, but it’s also
not a blind ritual or a cult -- it is a tool for successful IT and
successful business.
A surprise defect appearing in
production is trouble, but how much trouble, and at what likelihood and cost?
Nearly all projects live between the two extremes of “don’t
promise, don’t look” and “test everything every time you move a
finger.” If we don’t have an explicit discussion about quality,
defects, the unknown, and all their consequences, we can’t decide
what to spend on quality assurance.
Quality assurance plan: a simple scale
Here at FP Complete we’ve
described a simple quality assurance scale to ease such discussions. Projects can range
from QA level 0 (can’t make assertions about the quality) to QA
level 5 (would bet human lives on the system working as intended).
Have a look at these, and ask
whether your QA level today is in line with your real business
goals, and with the price of failure in your business.
Level 0: No quality assurance
There’s no specific attempt to
know whether the code or installation is suitable for use. Maybe it
works correctly, maybe it doesn’t. Such work should be labeled as a
“draft” or “prototype” or “unfinished preview” or similar.
Suitable for: projects where
complete failure is acceptable, such as learning exercises.
Level 1: Minimal QA
No formal quality vetting is
included in the project requirements. The work has been inspected
by its creator (the engineer or technician who implemented the
solution), the basic functionality has been tried and appears to
work correctly, and the creator believes it is suitable for use.
However, no record clearly shows that the required functionality
has been implemented and works as intended.
Most often, a list does not even
exist stating all the required functionality; or if such a list
does exist, it may not be complete and up-to-date. Lacking such a
list, it is literally not possible to show whether a quality bar
has been met or not.
Suitable for: prototypes, or a
low-budget situation tolerant of failures that can be corrected
after the fact.
Level 2: Traditional
We call this “traditional”
because it characterizes what most companies do on most projects,
not because we think it represents the usual correct decision.
The key functionality points
(such as features or commands) have been listed in a document, and
for the most important ones, a documented test (manual or
automated) for a typical success case is implemented, and is run. A
report or log of these tests and runs is kept and can be provided
to the customer, to show how well tested the work is. Source code
is maintained under version control with a clear history. Bugs and
unfinished work are tracked in an issue-tracking system.
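Keeping the feature list and the test log as data, not only as prose, lets a script answer the level 2 question “which key features have no documented test?” The following is a minimal Python sketch of that idea. The feature names, the “key”/“secondary” tags, and the test names are all hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: a level-2 feature list mapped to documented tests.
# Feature names, priorities, and test names are illustrative only.

# Each required feature, tagged "key" or "secondary".
REQUIRED_FEATURES = {
    "create_account": "key",
    "login": "key",
    "reset_password": "key",
    "export_csv": "secondary",
}

# Each documented success-case test, mapped to the feature it exercises.
SUCCESS_TESTS = {
    "test_create_account_succeeds": "create_account",
    "test_reset_password_succeeds": "reset_password",
}


def untested_key_features(features: dict[str, str], tests: dict[str, str]) -> list[str]:
    """Return key features that no documented success test covers."""
    covered = set(tests.values())
    return sorted(
        name
        for name, priority in features.items()
        if priority == "key" and name not in covered
    )


if __name__ == "__main__":
    # With the data above this reports ['login'].
    print(untested_key_features(REQUIRED_FEATURES, SUCCESS_TESTS))
```

Secondary features are deliberately skipped here, since level 2 only asks for tests of the most important functionality.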
Suitable for: production use in
a risk-tolerant situation, in which failures are undesirable but
can be tolerated and repaired. (Especially for low-volume
production use, where the limited number of runs means any given
bug may not be triggered, or may affect only a small number of
users.) Also for high-grade experimental use.
Level 3: Commercial
In addition to everything in
level 2, the key functionality points and most of the secondary
functionality points are documented and tested, each in at least
one success case, many in at least one failure case, and where
appropriate, in corner and/or fuzz (randomized) cases. Where
appropriate, load/stress/capacity testing is applied. Where
relevant, basic automated intrusion testing is applied. My previous post about
testing explains more
about these different kinds of tests you may want to run.
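The following minimal Python sketch makes the level 3 kinds of test cases concrete. It contains one success case, one failure case, and a randomized round-trip check, which is a simple form of fuzz testing. `format_quantity` and `parse_quantity` are hypothetical stand-ins for real code under test. Dedicated property-based testing tools such as Hypothesis go much further than this loop.

```python
import random


# Hypothetical function under test: formats an integer with thousands
# separators, e.g. 1234567 -> "1,234,567".
def format_quantity(n: int) -> str:
    return f"{n:,}"


# Hypothetical inverse: parses such a string back into an integer.
def parse_quantity(s: str) -> int:
    return int(s.replace(",", ""))


def test_success_case() -> None:
    # A typical success case.
    assert parse_quantity("1,234") == 1234


def test_failure_case() -> None:
    # A failure case: malformed input must raise, not return garbage.
    try:
        parse_quantity("12a4")
    except ValueError:
        return
    raise AssertionError("expected ValueError for malformed input")


def test_fuzz_round_trip(runs: int = 1000, seed: int = 42) -> None:
    # Randomized case: formatting then parsing any integer must give back
    # the original value. A fixed seed keeps failures reproducible.
    rng = random.Random(seed)
    for _ in range(runs):
        n = rng.randint(-10**12, 10**12)
        assert parse_quantity(format_quantity(n)) == n, n


if __name__ == "__main__":
    test_success_case()
    test_failure_case()
    test_fuzz_round_trip()
    print("all tests passed")
```

The fixed seed is a design choice. It trades some randomness for reproducibility, which matters once test runs have to be logged and explained.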
At this level, using more than
one pair of human eyes is also appropriate. Documentation describes
the basic design of any complex solution, and is reviewed by an
engineer who did not write it. The implementation is reviewed (such
as code review, or examining a cloud configuration) and approved by
an engineer who did not create it.
Suitable for: production use in
a routine business situation, with reduced tolerance for risk,
where failures may involve costly consequences that could easily
exceed the costs of proper QA work. For example, a production
system that will be used by the majority of your customers, such
that a missed defect could seriously impact your revenues or your
reputation.
Level 4: Sensitive
In addition to everything in
level 3, all
exposed functionality points are
documented and tested in success and (where applicable) various
failure and corner cases, or any omissions explicitly listed. A
report is provided showing when each test was last run and against
what build, and whether it passed, with any not-run or not-passed
tests having a written explanation. Version control logs and
issue-tracking logs show the linkage between checkins, tickets, and
documented specifications or requirements. Thorough design
documentation is included, suitable for review by a third-party
auditor. All code is reviewed by at least one engineer, and
suggested clean-ups and refactorings documented and (where
appropriate) implemented.
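A level 4 test report can be kept as structured data. That makes the rule “any not-run or not-passed test has a written explanation” checkable by a script. The sketch below is hypothetical: the `ReportEntry` fields, the test names, and the build label are illustrative assumptions, not a standard report format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ReportEntry:
    """One row of a hypothetical level-4 test report."""
    test_name: str
    last_run: Optional[datetime]  # None if the test has never been run
    build: Optional[str]          # build the test last ran against
    passed: Optional[bool]        # None if not run
    explanation: str = ""         # required whenever the test did not run or pass


def unexplained_entries(report: list[ReportEntry]) -> list[str]:
    """Return tests that were not run or did not pass and lack a written explanation."""
    problems = []
    for entry in report:
        ok = entry.last_run is not None and entry.passed is True
        if not ok and not entry.explanation.strip():
            problems.append(entry.test_name)
    return problems


if __name__ == "__main__":
    report = [
        ReportEntry("login_succeeds", datetime(2024, 5, 1, 9, 30), "build-1432", True),
        ReportEntry("login_rejects_bad_password", datetime(2024, 5, 1, 9, 31), "build-1432", False),
        ReportEntry("export_handles_large_file", None, None, None,
                    "Not run: large test data set not yet available."),
    ]
    # Only the failing test lacks an explanation, so this prints
    # ['login_rejects_bad_password'].
    print(unexplained_entries(report))
```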
Suitable for: production use in
a risk-intolerant situation, in which a single failure could have
severe costs greatly outweighing the cost of proper QA, such as a
major business process automation, a real-time pricing system, a
high-volume website, or a risk-management system. Also for: formal
audit or regulatory inspection, where an external regulator must be
shown that the system has met firm criteria for controls and
reproducibility, such as an FDA Class II medical device.
Ask yourself “Will we have to
interrupt major business if this system doesn’t work as intended?”
If so, you want at least this level. Ask also “Will we get in
trouble if we cannot prove we were very careful?” If so, you want
at least this level. On your way there, you can prioritize which QA
features fit your budget and your business priorities.
Level 5: Critical
In addition to everything in
level 4, every exception (such as an untested case or a case that
doesn’t pass all its tests) is reviewed by an expert committee and
either sent back for correction, or waived with a formal sign-off
regarding its known acceptable safety impact. Tests of failure
modes, and system survivability after severe load and
failure-driven stress events, are routine. Attacks, randomized
inputs, and defective data are routinely included in the automated
test suite. Where available, formal methods, source code analysis,
and execution profiling are also used. Even the requirements
themselves are subjected to scrutiny regarding their validity and
thoroughness. The toolchain is required to provide complete
traceability of any work present in the final product. An
inspection is done to ensure that no feature or case, explicit or
implicit, has gone unlisted and escaped the QA process.
Suitable for: projects in which
a failure could result in serious bodily harm, death, or another
intolerable loss. For example, real-time manned vehicle control, or
an FDA Class III medical device. Or an automated investment system
that trades hundreds of millions of dollars without human
intervention.
Deciding on a Quality Assurance Level
So quality assurance can range
from quite trivial, to a powerful structure that can heavily affect
your engineering process, culture, schedule, and budget. That’s an
amazingly large range. How to choose?
Consider these key questions:
- What is the total cost to our company of the routine defects we
tend to see? Is it more than improving our QA would cost?
- What would be the total cost to our company of a really bad
defect that was released unknowingly? Is this cost, weighted by its
probability, worse than improving our QA?
- Are we doing so much QA that it’s hampering our ability to
operate the business? That is, are we letting the quest for
perfection block the release of product that’s good enough for our
users and customers?
- Are we meeting all regulatory and contractual requirements? Would
that conclusion hold up if a regulator or a lawsuit caused our
engineering to be inspected, or our records to be reviewed?
- Do we have enough education in testing and project management
that we can improve our QA? If not, do we have access to experts
who can teach us, and/or the ability to hire staff with the needed
background?
- Do our top management and our culture support quality as much as
they support getting new features out the door, or is this an
evangelism situation?
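The probability-weighted comparison in the second question can be worked through in a few lines. This is a minimal sketch with entirely hypothetical numbers. Each risk is modeled as an expected number of occurrences per year times a cost per occurrence. A QA improvement is judged worth funding if the expected annual loss it avoids exceeds what it costs. Real estimates are much rougher than this, and the model ignores factors such as regulatory exposure.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    occurrences_per_year: float  # expected count; a 5% yearly chance is 0.05
    cost_per_occurrence: float   # in dollars


def expected_annual_loss(risks: list[Risk]) -> float:
    """Sum of probability-weighted (expected) yearly costs."""
    return sum(r.occurrences_per_year * r.cost_per_occurrence for r in risks)


def qa_pays_off(current: list[Risk], with_qa: list[Risk],
                qa_cost_per_year: float) -> tuple[bool, float]:
    """Compare the expected loss a QA improvement avoids with what it costs."""
    avoided = expected_annual_loss(current) - expected_annual_loss(with_qa)
    return avoided > qa_cost_per_year, avoided


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    current = [
        Risk("routine defects", 12, 2_000),        # 24,000 per year
        Risk("severe outage", 0.05, 2_000_000),    # 100,000 per year
    ]
    with_qa = [
        Risk("routine defects", 4, 2_000),         # 8,000 per year
        Risk("severe outage", 0.01, 2_000_000),    # 20,000 per year
    ]
    worth_it, avoided = qa_pays_off(current, with_qa, qa_cost_per_year=60_000)
    print(f"avoided expected loss: ${avoided:,.0f}; worth it: {worth_it}")
```

With these made-up inputs, the rare severe outage contributes more expected loss than all the routine defects combined. That is why the second question deserves its own discussion.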
In the real world, you will
always hear voices saying “hurry up and get done, at less cost” and
other voices saying “only the best will do for our users and our
data.” These are all important values, and there is a never-ending
balancing act to get enough of all of them. Be thoughtful and wise,
and don’t assume that “out of sight currently” means “doesn’t
exist.”
How to get the discussion started
Here at FP Complete, we have the
good fortune to serve lots of very smart companies’ engineering and
IT teams, from FinTech to medical devices to secure online
services. We see the results of QA first-hand, and we see how
necessary it is. A focused, intentional, best-practices approach to
QA is how companies get better results from their technology.
I recommend raising the issue of
quality assurance at your weekly/monthly team meetings and
management meetings. Rather than assuming one right approach, raise
questions and start the discussion. How much QA is right for this
team, and this project? Here are some ideas for starting the
conversation:
- We know we try to ship a quality product. How confident are we
that unknown defects aren’t lurking in there? What would it take to
do better?
- There have been scary stories lately about companies with
security holes and other costly bugs. How can we make sure we’re
not one of those stories next year?
- We’ve built a culture that focuses on innovation and meeting user
needs. But have we also focused enough on predictable quality? Are
we sometimes improvising where we should be engineering?
- There seem to be a lot of people out there who know much more
about quality than we do. Can we afford to leave things this way?
Is quality a competitive vulnerability, where another firm could
start winning over our users or make us look bad?
- We’ve invested heavily in quality assurance. Is it time to
institute a decision-making path, to ensure that QA isn’t too heavy
on one project, or too light on another?
- How can we turn great quality into a competitive advantage?
Quality assurance is an
evergreen topic -- always relevant, and likely to gain in
importance for a long time. Let’s all give it the attention it
deserves.