I have been working on property testing for years now. My current conclusion is that property testing still has so much uncovered potential that with just some more effort, it can become an even greater tool for software development than it already is today. The concept is relatively young and the design space relatively unexplored. This post seeks to provide an overview of the different approaches and to outline a comparison between the most commonly used libraries for propertybased testing in Haskell. These include two popular libraries: QuickCheck and HedgeHog, and a new approach called “Validitybased Testing”.
The problems with testing
There are many ways to try to make sure that code works. They each have their own tradeoffs and we should consider these tradeoffs carefully. The first step is to consider how expensive breakage would be. The next step is to use the right tools. Use the right programming language, build system, etc, so that code is safe in certain ways automatically. Is it worth it to formally verify all of your code? Maybe, probably not. When you have gone through these steps, you turn to testing, and most likely: unit testing.
By “unit test”, here, I mean a test that checks whether the output of a function, for a single example, is equal to the expected output. An example hspec
snippet in Haskell:
myFunction exampleInput `shouldBe` expectedOutput
There are two standard problems with unit tests:

One has to write many examples to cover all possible code paths. Even when we do write tests that cover all code paths, we may not notice that we have forgotten to write code to deal with edge cases. We call this problem the coverage problem.

Writing these tests is both cumbersome, and usually not considered enjoyable. Unit testing is expensive from a developertime perspective. We call this problem the developer cost problem.
Property testing as a solution to the coverage problem
Property testing is a different approach to testing. The idea is that you can make a test parametric in a certain argument, and the test should still pass for every argument that you pass in. The next question is then: Where do you get these arguments? That depends on the kind of property test:
 Exhaustive property test: Run the test for each element of the argument type.
 Randomised property test: Run the test for N random elements drawn from the entire argument type.
 Exhaustive specialised property test: Run the test for each element of a subset of the argument type.
 Randomised specialised property test: Run the test for N random elements drawn from a subset of the argument type.
An exhaustive property test would be the most comprehensive, but for argument types inhabited by a very large or infinite number of values, like Int64
or String
, this is infeasible. To make the runtime of a property test reasonable, we usually opt for a randomised property test. Randomised property testing probabilistically solves the coverage problem of tests, but it exacerbates the developer cost problem. It is generally harder to come up with a general property of your code than an inputoutput example. You could argue that one would never write as many unit tests as property tests will generate for them, but the point here is about the strict amount of thinking and typing that the programmer has to do to get started. It is a tradeoff.
In this post, we will use the following running example of a property test:
(\str > reverse (reverse str) == str) :: String > Bool
This property states that if you reverse a string twice, then you get the original string. (For the purposes of this posts, we assume that data structures are finite. This is important because testing frameworks require that data is Show
able in order to print counter examples.)
Generators
Randomised property testing requires a way to generate the random values. These generators are responsible for generating the arguments to the property tests. An appropriate generator can often be the difference between finding a bug, and not finding a bug.
For example, when only generating small lists, the following property test may seem to hold:
(\ls > length ls < 100)
It is important to note that not all types allow easy sampling of values. Generating a value of type Word8
is relatively simple: We can just choose a value between 0x00
and 0xff
uniformly at random. However, generating a value from an unbounded type like String
is tricky. Indeed, there is no way to sample a value uniformly from the type String
because there are infinitely many Strings, and there isn’t one obvious way to sample from an infinite set. To solve this problem, generators often use some way to guide how large of a value to generate. This is usually called the size parameter. This size parameter then allows the generator for String
s to generate a String
up to the given size.
Generators can usually also be adapted to the exact test at hand. For example, combinators can usually be composed to build bigger combinators, or filtered to only return the generated values that satisfy a given predicate.
Shrinking
When a property test fails, it is likely that the value that made the test fail is large. This makes it hard to reason about why the test failed. A standard solution to this problem is to try and shrink the value that makes the test fail. The test is rerun with smaller values to make sure that the test still fails with the smaller value. After shrinking, the resulting value should be of a more manageable size.
An example of shrinking
Suppose we believe that the following property holds:
After lowercasing every character in a string, all the characters in this string will be lowercase.
\str > all isLower (map toLower str)
When we run the property test without shrinking, we may get a counterexample like the following:
"As5Enu3auO4"
It is certainly not immediately obvious to me why this is a counterexample for the above property. However, when we rerun the property test with shrinking, the counterexample is shrunk to the following:
"1"
Now it becomes rather obvious why the test failed, if we start debugging as follows:
λ toLower '1'
'1'
λ isLower '1'
False
Indeed, applying toLower
to the character '1'
returns '1'
, but it is also not considered lowercase by isLower
. In this case, the property that we thought would hold just does not hold. We could try to specialise it so that we get a property that does hold, but that is beyond the scope of this example.
Note that the shrunk counterexample: "1"
is not necessarily part of the original counter example. This brings me to my next point:
Shrinking with invariants
It is important that the shrunk value does not just fail the test, it must also have been generatable by the generator that was used. For example, consider the following property test that you may mistakenly write:
For every number larger than six: the number is larger than five and odd. (pseudocode)
∀ n. (n > 6) => (n > 5 && odd n)
This property is designed to be tested with values larger than six. We can constrain the set to draw random values from to the set of integers larger than six, using randomised specialised testing. However, if the shrinking method is not adapted as well, to preserve this constraint on input set, then something unexpected might occur. At some point, the generator might generate 8
as its random integer larger than 6
, and fail the property test. Before this value is reported, it is shrunk to zero because this is the smallest integer, and shrinking might as well try this first. (Note that shrinking is for humans, which is why integers are shrunk to zero, (debatably) the simplest integer, instead of minBound :: Int
, the smallest integer.) Zero still fails the property test, because zero is not greater than five, so we report that this property test fails because zero fails to make the property evaluate to True
. This will make the output look something like the following:
failed: the following number larger than six does not pass the property test: 0
It is therefore important that the shrinking method preserves the intended invariants of the subset that one wants to test. A value should only ever be shrunk to a smaller value that is still within this subset.
Quickcheck approach and problems
The QuickCheck library was the pioneer of randomised property testing in Haskell. QuickCheck makes use of a type class called Arbitrary
.
The Arbitrary
type class contains two functions:
arbitrary :: Arbitrary a => Gen a
to generate random values.arbitrary
is the default generator to use to generate values of a given type.shrink :: Arbitrary a => a > [a]
to shrink a given value.
You can run a quickcheck by passing something Testable
to the quickCheck
function. Something Testable
is usually of the form FooBar > Bool
where FooBar
is a member of the Arbitrary
type class. Quickcheck then generates random values of type FooBar
and evaluates the property with it to check that the resulting boolean is True
.
The example property of the reverse
function is written as follows, using QuickCheck:
λ> quickCheck $ \(str :: String) > reverse (reverse str) == str
+++ OK, passed 100 tests.
The QuickCheck approach has certain tradeoffs, some of which we’ll discover and analyse below.
Property combinators for nonspecialised randomised property testing
To test whether a given binary relation <*>
is reflexive, we can write a property test as follows:
\x > x <*> x
If we want to test reflexivity of many binary relations, we may write a function that takes a binary relation and produces a property. Such a function is called a property combinator:
reflexive :: Arbitrary a => (a > a > Bool) > Property
reflexive rel = \x > x `rel` x
Now we can use this property combinator to test reflexivity of any binary relation for which the arguments are a member of Arbitrary
.
Note that there is an important limitation to this approach. This property combinator can only test reflexivity of a given binary relation if the relation is reflexive for ALL possible values that may be generated by the arbitrary
generator.
This is already a problem with Rational
(from GHC.Real
) and Eq
, for example. The Eq
instance for Ratio a
asssumes that values are normalised:
> 2 :% 2 == 1 :% 1
False
If arbitrary :: Gen Rational
ever generates unnormalised values, then we can no longer test properties which require that (==)
works as if the values are actual ratios, even if the property holds for all values of Rational
except unnormalised values. This brings me to the next point
No laws or semantics for the Arbitrary
type class
To continue with the previous example of Rational
. How should we implement the Arbitrary
instance for Rational
:
 Should we ever let it generate
0 :% 0
?  Should we ever let it generate
1 :% 0
?  Should we ever let it generate
5 :% 1
? How about5 :% 1
?  Should we take the size parameter into account?
There are some considerations here:
 If
arbitrary :: Gen Rational
never generates unnormalised values, then property tests about functions that deal with(==)
assuming that values are actual ratios will never test what happens for this edgecase. Maybe they work, maybe they do not.  If
arbitrary :: Gen Rational
generates unnormalised values, then we cannot test properties which require that(==)
works as if the values are actual ratios, even if the property holds for all values ofRational
except unnormalised values.
There is a solution for problems like this in QuickCheck
: Using newtypes to guide generation. For example, the Positive a
newtype wrapper has a specialised generator to make sure that arbitrary :: Positive Rational
only generates positive Rational
values. One could imagine a newtype wrapper like Normalised a
that could have a specialised generator to make sure that Normalised Rational
never generates unnormalised values, if arbitrary :: Gen Rational
is implemented to sometimes generate unnormalised values.
However, this newtype approach has an important limitation as well. The property test to test for reflexivity would now look something like this:
reflexive (\(Normalised nn1) (Normalised nn2) > nn1 == nn2)
(Note that you cannot just supply a custom generator if you want to use property combinators that assume an Arbitrary
constraint.) This is rather cumbersome. It is a direct result of the fact that values like 5 :% 5
are not considered valid values. Who decided that these values are invalid even though they are certainly members of their type? It is the programmer who wrote the GHC.Real
module and decided that Ratio
values should satisfy the invariant that they are normalised. These invariants are not statically checked. QuickCheck’s Arbitrary
typeclass makes it cumbersome to work with values for which not all values are considered valid values.
Expensive generators and shrinking
The authors of QuickCheck chose to not have a default implementation for Arbitrary
. This is because there are certain problems with the automatic generation of Arbitrary
instances.
The QuickCheck documentation mentions that:
There is no generic arbitrary implementation included because we don’t know how to make a highquality one. If you want one, consider using the testingfeat or genericrandom packages.
There are multiple seperate libraries to allow programmers to have GHC implement Arbitrary
instances automatically using Generic
instances. However, using one of these is not common practice.
As a result, there is no default implemetation of arbitrary
and the default implementation for shrink
is const []
. This means that by default, shrink
is never overridden with a more sensible implementation and counter examples are never shrunk.
Because programmers have to implement generators and shrinking themselves, using QuickCheck (and Arbitrary
in particular) can be considered expensive (in developertime) and frictionfull.
Orphan instances for Arbitrary
A minor problem, but an annoying one nonetheless, is the question of where to write the Arbitrary
instances.
 If we write the
Arbitrary
instances next to the data types, then the entire library that we are writing must depend onQuickCheck
.  If we write the
Arbitrary
instances in the test suite, then they are orphan instances and we have to rewrite them for every test suite.  If we write the
Arbitrary
instances in a seperate library, then we can depend on this library for theArbitrary
instances, but this adds maintenance overhead.
The most commonly chosen solution seems to be to put Arbitrary
instances in the quickcheckinstances
library, but this has another problem: now the test suite depends on extra libraries for which the instances are in quickcheckinstances
, that the test suite may not be using at all.
Hedgehog as a solution to QuickCheck’s problems
Hedgehog is a newer way to do property testing.
The example property of the reverse
function is written as follows, using Hedgehog:
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
prop_reverse :: Property
prop_reverse =
property $ do
xs < forAll $ Gen.list (Range.linear 0 100) Gen.alpha
reverse (reverse xs) === xs
Hedgehog generators
Hedgehog uses a different implementation of generators. In particular, hedgehog generators have builtin shrinking, so that every generated value can automatically be shrunk without any extra developer effort to implement a shrinking function. This builtin shrinking also automatically takes care to ensure that shrunk values obey the same invariants as the generator.
Whereas QuickCheck declares “shrink” as a function, in Hedgehog generating and shrinking are performed simultaneously. Instead of generating a single value, we can generate a rose tree of them, where the main output is at the root, and the other nodes carry shrunk versions of it, with the smallest ones at the leaves.
Generators are defined more generally, to allow for monadic effects during generation. As such, generators are not seperated from properties entirely. Indeed, as you can see in the prop_reverse
example above, the generation of data and testing of properties can be interleaved monadically.
Suboptimal shrinking
Hedgehog explicitly makes a tradeoff when it comes to shrinking. The Hedgehog authors chose to make shrinking suboptimal in order to build it into generators.
The following is an example of such a tradeoff. Here Hedgehog is generating two characters that are equal, by generating them independently and then filterig the equal ones. It is a contrived example, admittedly, but it makes my point.
λ > let gen = Gen.enum 'a' 'b' in Gen.printTree (Gen.filter (uncurry (==)) ((,) <$> gen <*> gen))
=== Outcome ===
('b','b')
=== Shrinks ===
('a','a')
('a','a')
('a','a')
As you can see, Hedgehog correctly shrinks the generated value ('b', 'b')
to tuples with equal characters ('a', 'a')
, but it shrinks to the same value many times. This problem grows as the generated datastructure grows in size, and as the amount of options to try while shrinking grows. This is the main drawback with integrating shrinking with generating: It requires some duplicated work during shrinking.
Side note: QuickCheck handles this case wrong as well, but in a different way:
λ filter (uncurry (==)) $ QC.shrink ('b','b')
[]
Spoiler (see the rest of the post): genvalidity
handles this case correctly.
λ filter (uncurry (==)) $ shrinkUnchecked ('b', 'b')
[('a','a')]
Even more expensive generators
Hedgehog does not use any type classes for generators like QuickCheck does with Arbitrary
. Instead, every generator must be a manuallywritten value.
This means that Hedgehog does not have any problems with orphan instances or with a lack of laws or semantics. However, it also means that writing a generator for a value in Hedgehog now requires even more developer effort.
More cumbersome property combinators.
It is still possible to write property combinators for Hedgehog properties, but now the programmer will have to explicitly pass in a given generator:
associative :: Gen a > (a > a > a) > Property
This makes writing property combinators more cumbersome, and tests using property combinators more verbose.
Validitybased Testing as a solution to the problems of both QuickCheck and Hedgehog
Validitybased testing attempts to use the advantages from each of both QuickCheck and Hedgehog and make a different tradeoff. It focuses on making property testing cheap (in terms of developer time) and removing as much friction as possible.
The goals are:
 Free generators
 Free shrinking
 Cheap properties
The example property of the reverse function is written as follows, using the validity
based libraries:
forAllUnchecked $ \str > reverse (reverse str) == (str :: String)
The GenUnchecked
typeclass in the genvalidity
library for free generators and free shrinking.
The GenUnchecked
type class defines both a generator: genUnchecked :: Gen a
and a shrinking function shrinkUnchecked :: a > [a]
, just like QuickCheck does with Arbitrary
.
Unlike QuickCheck’s Arbitrary
, GenUnchecked
has very clear semantics: Any value that can possibly be represented by the type should be generatable (excluding bottom values and infinite values). Because of these very clear semantics, the implementations of both the genUnchecked
and shrinkUnchecked
functions are generated via the Generic
instance by default. This means that the programmer never has to write such a generator themselves, and shrinking comes for free as well.
The validity
library to define invariants explicitly
The next step for validitybased testing is to assert that types with internal invariants are as natural as types themselves. Indeed, types exist to constrain the possible values of a given variable using compiletime checks whereas runtime invariants exist to constraint the possible values of a given variable using programmingtime cleverness. This will be important later, to further remove friction, but requires a bit of extra upfront thought.
The validity
library allows programmers to explicitly define, in code (instead of documentation), what it means for a value of a given type to be valid. Types that have an instance of the Validity
typeclass implement the validate :: a > Validation
to explain all the reasons why a value may be invalid. The semantics of this type class is that an invalid value of any type should never exist at runtime, otherwise the programmer has made a mistake.
For example, the type Map k v
has an internal invariant: it should always have Data.Map.valid :: Ord k => Map k v > Bool
evaluate to True
. Its Validity
instance defines this explicitly:
  A 'Map' of things is valid if all the keys and values are valid and the 'Map' itself
 is valid.
instance (Ord k, Validity k, Validity v) => Validity (Map k v) where
validate m =
mconcat
[ declare "The Map structure is valid." $ M.valid m
, delve "Map elements" $ M.toList m
]
This implementation checks that the internal map structure is valid: M.valid
, and that the individual elements of the map are valid values by themselves as well.
The validate
function does not just check whether a value is valid, but also reports all the reasons why a value may not be valid. This makes for very useful error messages. For example:
newtype Prime = Prime { unPrime :: Int }  INVARIANT: isPrime
instance Validity Prime where
validate (Prime p) = declare "The number is prime" $ isPrime p
λ let Left e = prettyValidate [Prime 2, Prime 3, Prime 5, Prime 6]
λ putStrLn e
The element at index 3 in the list
\ Violated: The number is prime  Aha!
The default implementation of Validity
uses a type’s Generic
instance to derive an implementation that defines a value to be valid if all of its subparts are valid. This makes it frictionless to instantiate Validity
for types of which the validity of its values depends only on its subparts.
The GenValid
typeclass in the genvalidity
library for free generators and free shrinking for valid values.
To generate valid values, genvalidity
has the GenValid
type class. This type class also has very clear semantics.
 The
genValid :: Gen a
generator must only ever generate valid values.  The
shrinkValid :: a > [a]
shrinking function should must only ever shrink valid values to valid values..  The
genValid
generator should be able to generate all possible valid values that can exist at runtime.
These semantics also allow for default implementations:
genValid :: GenValid a => Gen a
genValid = genUnchecked `suchThat` isValid
shrinkValid :: GenValid a => a > [a]
shrinkValid = filter isValid . shrinkUnchecked
This makes it relatively frictionless to generate valid values as well.
Note that if there are not many valid values of a given type, or if checking validity is expensive, the default genValid
implementation using suchThat
may be too slow, and a programmer may have to override it with a faster implementation. This is the tradeoff that genvalidity
makes to remove friction, but there are helper functions in genvalidity
to write faster generators too.
As a result, the following is all that’s necessary for a programmer to write a custom type with generators, and start writing property tests:
{# LANGUAGE DeriveGeneric #}
module Example where
data MyType = MyType
{ myBool :: Bool
, myRational :: Rational
} deriving (Show, Eq, Generic)
instance Validity MyType
instance GenUnchecked MyType
instance GenValid MyType
Overriding genValid
to make it faster
While genValid
is always correct by default, it can sometimes be too slow. In such cases we want to override genValid
to make it faster, but then it is up to us to make sure that it is still correct. When overriding genValid
, it is advisable to add a test that makes sure that genValid
still only generates valid values It is also important that genValid
can generate any value that would be considered valid. We cannot test for this, so it is up to the programmer to maintain this property of the generator.
As an example, consider a newtype Prime = Prime Int
with a Validity
instance that describes that the integer inside is prime. The genUnchecked
generator is generated using the Generic
instance, and it works exactly as it should. The default implementation of genValid
is equivalent to the following:
genValid :: Gen Prime
genValid = genUnchecked `suchThat` isValid
There are two ways to speed up a generator that uses suchThat
:

Generate values that are automatically valid, and don’t filter them using
isValid
afterward. This is particularly useful if theisValid
predicate is expensive to compute. In the case ofPrime
: primality testing is relatively expensive. 
Generate values that are more likely to be valid than the ones generated by
genUnchecked
and then filter them withisValid
. This is particularly useful if there aren’t many valid values. In the case ofPrime
: the likelihood for an arbitrary number goes down as the number grows larger.
In the case of the Prime
example, we could speed up genValid
by replacing genUnchecked
by a generator that either generates 2
, or an odd number. The result would look like this:
genValid :: Gen Prime
genValid = (oneof [pure 2, (\x > 2*x + 1) <$> genUnchecked]) `suchThat` isValid
Cheap properties using property combinators from genvalidityproperty
Once these type classes have been implemented, a programmer can use the many property combinators from genvalidityproperty
and even test suite combinators from genvalidityhspec
.
 # Property combinators
 Test whether two functions are inverses
inverseFunctions :: (Show a, Eq a, GenUnchecked a) => (a > b) > (b > a) > Property
 Test whether a binary relation is symmetric
symmetry :: (Show a, GenUnchecked a) => (a > a > Bool) > Property
 Test whether a binary operator is commutative:
commutative :: (Show a, Eq a, GenUnchecked a) => (a > a > a) > Property
 Test whether a function maintains validity
producesValidsOnValids :: (Show a, Show b, GenValid a, Validity b) => (a > b) > Property
Each property combinator in genvalidityproperty
comes in multiple variants:
symmetry
for unchecked valuessymmetryOnValid
for values generated byGenValid
symmetryOnArbitrary
for values generated byArbitrary
(for compatibility with QuickCheck)symmetryOnGens
for values generated by a given generator
Cheap properties using test suite combinators from genvalidityhspec
With the TypeApplications
extension that is new in GHC 8.0, we can now write entire test suites that are parametric in type argument (without the use of proxies). These socalled test suite combinators are of the form testSuiteCombinator :: forall a. Spec
.
The genvalidityhspec
library and its companion libraries already contain many test suite combinators.
 Test whether a custom Eq instance of a type makes sense
eqSpec :: forall a. (Show a, Eq a, Typeable a, GenUnchecked a) => Spec
 Test that a Monad doesn't violate the monad laws
monadSpec :: forall (f :: * > *). (Monad f, Eq (f Int), Show (f Int), Typeable f, GenUnchecked (f Int)) => Spec
 Test whether a lens doesn't violate the lens laws.
lensSpec :: forall s b. (Show b, Eq b, GenUnchecked b, Validity b, Show s, Eq s, GenUnchecked s, Validity s) => Lens' s b > Spec
Each property combinator in genvalidityhspec
comes in multiple variants:
eqSpec
for unchecked valueseqSpecOnValid
for values generated byGenValid
eqSpecOnArbitrary
for values generated byArbitrary
(for backward compatibility)eqSpecOnGens
for values generated by a given generator
Conclusion
Property testing can be a great tool in your quality arsenal. This point has outlined the major similarities and differences between property testing tools. You should now be more equipped to make the necessary tradeoffs between these tools, but more important than trying to optimise your usage is that you give property testing a try.
Below are some exercises to get you started with Validitybased testing. validity
and the related libraries can all be found on Stackage.
Exercise: Odd
Objective: Generate valid Odd
values using genValid
by completing the following piece of code.
newtype Odd = Odd Int  INVARIANT: The Int must be odd
Solution:
instance Validity Int where
validate (Odd i) = declare "The Int is odd" $ odd i
instance GenUnchecked Odd
instance GenValid Odd
Exercise: JSON
Objective: Test whether the JSON instance in the following piece of code works, using genvalidityhspecaeson
, by completing the following piece of code:
data MyType = MyType
{ myBool :: Bool
, myRational :: Rational
} deriving (Show, Eq, Generic)
instance FromJSON MyType where
parseJSON = withObject "MyType" $ \o > MyType <$> o .: "bool" <*> o .: "Rational"
instance ToJSON MyType where
toJSON mt = object ["bool" .= (myBool mt), "double" .= (myRational mt)]
Solution:
data MyType = MyType
{ myBool :: Bool
, myRational :: Rational
} deriving (Show, Eq, Generic)
instance FromJSON MyType where
parseJSON = withObject "MyType" $ \o > MyType <$> o .: "bool" <*> o .: "Rational"
instance ToJSON MyType where
toJSON mt = object ["bool" .= (myBool mt), "double" .= (myRational mt)]
instance Validity MyType
instance GenUnchecked MyType
instance GenValid MyType
spec :: Spec
spec =
describe "MyType" $
jsonSpecOnValid @MyType