24 Jan 2017

Haskell is an amazing language. With its extremely powerful type system and a pure functional paradigm it prevents programmers from introducing many kinds of bugs, that are notorious in other languages. Despite those powers, code is still written by humans, and bugs are inevitable, so writing quality test suites is just as important as writing an application itself.

Over the course of history buggy software has cost industry billions of dollars in damage and even lost human lives, so I cannot stress enough, how essential testing is for any project.

One of the ways to test software is through writing unit tests, but since it is not feasible to test all possible inputs exhaustively for most functions, we usually check some corner cases and occasionally test with other arbitrary values. Systematic generation of random input, that is biased towards corner cases, could be very helpful in that scenario, and that's where QuickCheck comes into play. This state of the art property testing library was originally invented in Haskell, and, because it turned out to be so powerful, it was later ported to other languages. However, the real power of random testing is unleashed when combined with purity of Haskell.

## Properties

Let's start by looking at this exemplar properties of a `reverse`

function:

reverse (reverse xs) == xs reverse (xs ++ ys) == reverse ys ++ reverse xs

We know, that they will hold for all finite lists with total values. Naturally, there are ways to prove them manually and there are even tools for Haskell, such as LiquidHaskell, that can help you automate proving some properties. Formal proof of correctness of a program is not always possible: some properties are either too hard or impossible to prove. Regardless of ability to prove a property of a function, we at least need to check that it works correctly on some finite set of inputs.

import Test.QuickCheck prop_RevRev :: Eq a => [a] -> Bool prop_RevRev xs = reverse (reverse xs) == xs prop_RevApp :: [Int] -> [Int] -> Bool prop_RevApp xs ys = reverse (xs ++ ys) == reverse ys ++ reverse xs

We can load those properties into GHCi and run `quickCheck`

on them. Here is a
quick way on how to do it from a terminal, and a detailed guide
on how to get started with `stack`

.

```
$ stack --resolver lts-7.16 ghci --package QuickCheck
Configuring GHCi with the following packages:
GHCi, version 8.0.1: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /tmp/ghci3260/ghci-script
Prelude> :load examples.hs
[1 of 1] Compiling Main ( examples.hs, interpreted )
Ok, modules loaded: Main.
*Main> quickCheck prop_RevRev
+++ OK, passed 100 tests.
*Main> quickCheck prop_RevApp
+++ OK, passed 100 tests.
*Main>
```

What just happened? QuickCheck called `prop_RevRev`

and `prop_RevApp`

100 times
each, with random lists as arguments and declared those tests as passing,
because all calls resulted in `True`

. Far beyond what a common unit test could have done.

Worth noting, that in reality, not only `prop_RevRev`

, but both of those
properties are polymorphic and `quickCheck`

will be happy to work with such
functions, even if type signatures were inferred, and it will run just fine in
GHCi. On the other hand, while writing a test suite, we have to restrict the
type signature for every property to concrete type, such as `[Int]`

or `Char`

,
otherwise type checker will get confused. For example, this program will not
compile:

import Test.QuickCheck main :: IO () main = quickCheck (const True)

For the sake of example let's write couple more self-explanatory properties:

prop_PrefixSuffix :: [Int] -> Int -> Bool prop_PrefixSuffix xs n = isPrefixOf prefix xs && isSuffixOf (reverse prefix) (reverse xs) where prefix = take n xs prop_Sqrt :: Double -> Bool prop_Sqrt x | x < 0 = isNaN sqrtX | x == 0 || x == 1 = sqrtX == x | x < 1 = sqrtX > x | x > 1 = sqrtX > 0 && sqrtX < x where sqrtX = sqrt x

Now, this is great, but how did we just pass various functions with different
number of arguments of different types to `quickCheck`

, and how did it know what
to do with them? Let's look at it's type signature:

λ> :t quickCheck quickCheck :: Testable prop => prop -> IO ()

## Testable

So, it seems, that QuickCheck can test anything that is `Testable`

:

λ> :i Testable class Testable prop where property :: prop -> Property exhaustive :: prop -> Bool instance [safe] Testable Property instance [safe] Testable prop => Testable (Gen prop) instance [safe] Testable Discard instance [safe] Testable Bool instance [safe] (Arbitrary a, Show a, Testable prop) => Testable (a -> prop)

The last instance is for a function `(a -> prop)`

, that returns a `prop`

, which,
in turn, must also be an instance of `Testable`

. This magic trick of a recursive
constraint for an instance definition allows `quickCheck`

to test a function
with any number of arguments, as long as each one of them is an instance of
`Arbitrary`

and `Show`

. So here is a check list of requirements for writing a
testable property:

- Zero or more arguments, which have an instance of
`Arbitrary`

, that is used for generating random input. More on that later. - Arguments must also be an instance of
`Show`

, so if a test fails, offending value can be displayed back to a programmer. Return value is either:

`True`

/`False`

- to indicate pass/fail of a test case.`Discard`

- to skip the test case (eg. precondition fails).`Result`

- to customize pass/fail/discard test result behavior, collect extra information about the test outcome, provide callbacks and other advanced features.`Property`

for a much finer control of test logic. Such properties can be used as combinators to construct more complex test cases.`Prop`

used to implement`Property`

- Start with
`prop_`

or`prop`

, followed by the usual`camelCase`

, but that is just a convention, not a requirement. - Has no side effects. Also not a requirement, but strongly suggested, since
referential transparency is lost with
`IO`

and test results can be inconsistent between runs. At the same time there are capabilities for testing Monadic code, which we will not go into here.

## Preconditions

Here is another very simple property of lists `xs !! n == head (drop n xs)`

,
so let's define it as is:

prop_Index_v1 :: [Integer] -> Int -> Bool prop_Index_v1 xs n = xs !! n == head (drop n xs)

Naturally, you can see a problem with that function, it cannot accept just any
random `Int`

to be used for indexing, and `quickCheck`

quickly finds that
problem for us and prints out violating input along with an error:

λ> quickCheck prop_Index_v1 *** Failed! Exception: 'Prelude.!!: index too large' (after 1 test): [] 0

Interestingly, if you try to run this example on any computer, there is a very
good chance that it will give exactly the same output, so, it seems that input to
properties is not completely random. In fact, thanks to the function `sized`

,
the first input to our property will always be an empty list and an integer `0`

,
which tend to be really good corner cases to test for. In our case, though, `!!`

and `head`

are undefined for empty lists and negative numbers. We could add some
guards, but there are facilities provided for such common cases:

prop_Index_v2 :: (NonEmptyList Integer) -> NonNegative Int -> Bool prop_Index_v2 (NonEmpty xs) (NonNegative n) = xs !! n == head (drop n xs)

This version is still not quite right, since we do have another precondition
`n < length xs`

. However, it would be a bit complicated to describe this relation
through the type system, so we will specify this precondition at a runtime using
implication operator (⇒). Note, that return type has changed too:

prop_Index_v3 :: (NonEmptyList Integer) -> NonNegative Int -> Property prop_Index_v3 (NonEmpty xs) (NonNegative n) = n < length xs ==> xs !! n == head (drop n xs)

Test cases with values, that do not satisfy the precondition, will simply get discarded, but not to worry, it will still generate the 100 tests. In fact it will generate up to a 1000 before giving up. An alternate way to achieve similar effect would be to generate a valid index within a property itself:

prop_Index_v4 :: (NonEmptyList Integer) -> Property prop_Index_v4 (NonEmpty xs) = forAll (choose (0, length xs-1)) $ \n -> xs !! n == head (drop n xs)

λ> quickCheck prop_Index_v3 >> quickCheck prop_Index_v4 +++ OK, passed 100 tests. +++ OK, passed 100 tests.

Just in case, let's quickly dissect this for all (∀) business. It takes a random
value generator, which `choose`

happens to produce, a property that operates on it's
values and returns a property, i.e. applies values from a specific generator
to the supplied property.

λ> :t forAll forAll :: (Show a, Testable prop) => Gen a -> (a -> prop) -> Property λ> sample' $ choose (0, 3) [0,2,2,3,3,3,0,1,0,1,3]

There is a very subtle difference between the last two versions, namely `_v3`

will discard tests that do not satisfy a precondition, while `_v4`

will always
generate a value for `n`

that is safe for passing to index function. This is not
important for this example, which is good, but that is not always the
case. Whenever precondition is too strict, QuickCheck might give up early while
looking for valid values for a test, but more importantly, it can give a false
sence of validity, since most of the values that it will find could be trivial
ones.

## Pitfalls

For this section we will use prime numbers in our examples, but rather than
reinventing the wheel and writing functions for prime numbers ourselves we will
use primes package.
Just for fun, let's write a property for `primeFactors`

, which is based on
Fundamental Theorem of Arithmetic:

prop_PrimeFactors :: (Positive Int) -> Bool prop_PrimeFactors (Positive n) = isPrime n || all isPrime (primeFactors n)

That was incredibly easy and is almost a direct translation of a theorem itself. Let's consider a fact that every prime number larger than 2 is odd, thus we can easily derive a property that sum of any two prime numbers greater than 2 is even. Here is a naive way to test that property:

prop_PrimeSum_v1 :: Int -> Int -> Property prop_PrimeSum_v1 p q = p > 2 && q > 2 && isPrime p && isPrime q ==> even (p + q)

As you can imagine it is not too often that a random number will be prime, this certainly will affect the quality of this test:

λ> quickCheck prop_PrimeSum_v1 *** Gave up! Passed only 26 tests.

It only found 26 satisfiable tests out of a 1000 generated, that's bad. There is even more to it, in order to convince ourselves, that we are testing functions with data that resembles what we expect in real life, we should always try to inspect the values being generated for a property. An easy way to do that is to classify them by some shared traits:

prop_PrimeSum_v1' :: Int -> Int -> Property prop_PrimeSum_v1' p q = p > 2 && q > 2 && isPrime p && isPrime q ==> classify (p < 20 && q < 20) "trivial" $ even (p + q)

λ> quickCheck prop_PrimeSum_v1' *** Gave up! Passed only 29 tests (96% trivial). λ> quickCheckWith stdArgs { maxSuccess = 500 } prop_PrimeSum_v1' *** Gave up! Passed only 94 tests (44% trivial).

Almost all values this property was tested on are in fact trivial ones. Increasing number of tests was not much of a help, because, by default, values generated for integers are pretty small. We could try to fix that with appropriate types, but this time we will also generate a histogram of unique pairs of discovered prime numbers:

prop_PrimeSum_v2 :: (Positive (Large Int)) -> (Positive (Large Int)) -> Property prop_PrimeSum_v2 (Positive (Large p)) (Positive (Large q)) = p > 2 && q > 2 && isPrime p && isPrime q ==> collect (if p < q then (p, q) else (q, p)) $ even (p + q)

λ> quickCheck prop_PrimeSum_v2 *** Gave up! Passed only 24 tests: 16% (3,3) 8% (11,41) 4% (9413,24019) 4% (93479,129917) ...

This is better, there are less trivial values, but still, number of tests is far from satisfactory. It is also extremely inefficient to look for prime values that way, and, for any really large value passed to the property, it will take forever to check its primality. Much better approach would be to choose from a list of prime values, which we have readily available for us:

prop_PrimeSum_v3 :: Property prop_PrimeSum_v3 = forAll (choose (1, 1000)) $ \ i -> forAll (choose (1, 1000)) $ \ j -> let (p, q) = (primes !! i, primes !! j) in collect (if p < q then (p, q) else (q, p)) $ even (p + q)

λ> quickCheck prop_PrimeSum_v3 +++ OK, passed 100 tests: 1% (983,6473) 1% (953,5059) 1% (911,5471) ...

## Arbitrary

There could be a scenario where we needed prime values for many tests, then it
would be a burden to generate them this way for each property. In such cases
solution is always to write an instance for `Arbitrary`

:

newtype Prime a = Prime a deriving Show instance (Integral a, Arbitrary a) => Arbitrary (Prime a) where arbitrary = do x <- frequency [ (10, choose (0, 1000)) , (5, choose (1001, 10000)) , (1, choose (10001, 50000)) ] return $ Prime (primes !! x)

Calculating large prime numbers is pretty expensive, so we could simply use
something like `choose (0, 1000)`

, similarly to how it was done in
`prop_PrimeSum_v3`

, but there is no reason why we should exclude generating
large prime numbers completely, instead, we can reduce their chance by
describing a custom distribution with `frequency`

function.

Now writing `prop_PrimeSum`

is a piece of cake:

prop_PrimeSum_v4 :: Prime Int -> Prime Int -> Property prop_PrimeSum_v4 (Prime p) (Prime q) = p > 2 && q > 2 ==> classify (p < 1000 || q < 1000) "has small prime" $ even (p + q)

λ> quickCheck prop_PrimeSum_v4 +++ OK, passed 100 tests (21% has small prime).

## CoArbitrary

There are quite a few instances of `Arbitrary`

, many common data types from
`base`

are, but the most peculiar one is a function:

λ> :i Arbitrary class Arbitrary a where arbitrary :: Gen a shrink :: a -> [a] ... instance [safe] (CoArbitrary a, Arbitrary b) => Arbitrary (a -> b) ...

That's right, QuickCheck can even generate functions for us! One of restrictions
is that an argument to the function is an instance of `CoArbitrary`

, which also
has instance for a function, consequently functions of any arity can be
generated. Another caveat is that we need an instance of `Show`

for functions,
which is not a standard practice in Haskell, and wrapping a function in a
`newtype`

would be more appropriate. For clarity we will opt out from this
suggestion and instead demonstrate this cool feature in action. One huge benefit
is that it allows us to easily write properties for higher order functions:

instance Show (Int -> Char) where show _ = "Function: (Int -> Char)" instance Show (Char -> Maybe Double) where show _ = "Function: (Char -> Maybe Double)" prop_MapMap :: (Int -> Char) -> (Char -> Maybe Double) -> [Int] -> Bool prop_MapMap f g xs = map g (map f xs) == map (g . f) xs

## HSpec

One of the first concerns, that programmers usually raise when coming from other languages to Haskell, is that there are situations when unit tests are invaluable, but QuickCheck does not provide an easy way to do that. Bear in mind, QuickCheck's random testing is not a limitation, but rather is a priceless feature of testing paradigm in Haskell. Regular style unit tests and other QA functionality (code coverage, continuous integration, etc.) can be done just as easily as they are done in any other modern language using specialized libraries. In fact, those libraries play beautifully together and complement each other in many ways.

Here is an example of how we can use hspec to create a test suite containing all properties we have discussed so far, plus few extra unit tests for completeness of the picture.

#!/usr/bin/env stack -- stack --resolver lts-7.16 runghc --package QuickCheck --package hspec --package primes module Main where import Test.Hspec import Test.QuickCheck ... main :: IO () main = hspec $ do describe "Reverse Properties" $ do it "prop_RevRev" $ property prop_RevRev it "prop_RevApp" $ property prop_RevApp it "prop_PrefixSuffix" $ property prop_PrefixSuffix describe "Number Properties" $ do it "prop_Sqrt" $ property prop_Sqrt describe "Index Properties" $ do it "prop_Index_v3" $ property prop_Index_v3 it "prop_Index_v4" $ property prop_Index_v4 it "unit_negativeIndex" $ shouldThrow (return $! ([1,2,3] !! (-1))) anyException it "unit_emptyIndex" $ shouldThrow (return $! ([] !! 0)) anyException it "unit_properIndex" $ shouldBe (([1,2,3] !! 1)) 2 describe "Prime Numbers" $ do it "prop_PrimeFactors" $ property prop_PrimeFactors it "prop_PrimeSum_v3" $ property prop_PrimeSum_v3 it "prop_PrimeSum_v4" $ property prop_PrimeSum_v4 describe "High Order" $ do it "prop_MapMap" $ property prop_MapMap

## Conclusion

Random testing can be mistakenly regarded as an inferior way of software testing, but many studies have certainly shown that it is not the case. To quote D. Hamlet:

By taking 20% more points in a random test, any advantage a partition test might have had is wiped out.

It is very easy to start using QuickCheck to test properties of pure functions. There is also a very similar toolbox included in the library for testing monadic functions, thus allowing for a straightforward way of testing properties of functions that do mutations, depend on state, run concurrently and even perform I/O. Most importantly, this library provides yet another technique for making Haskell programs even safer.

Writing tests doesn't have to be a chore, it can be fun. We certainly find it fun at FPComplete and will be happy to provide training, consulting or development work.