20 Mar 2014

A common analytic task is the monte carlo simulation. This involves sampling large numbers of random values in order to come to some kind of conclusion.

Random numbers are often thought of as a weak point in purely functional languages. In a purely functional language like Haskell, we keep a strict separation between pure and impure functions. Since by its very nature, a function generating a random value generates different values each time it's called, we're forced to move it into the impure world. At that point, it may seem like we've lost a lot of our claimed benefits from purity.

This blog post is intended to demonstrate that, with the right choice of libraries, we're able to have our cake and eat it too: write pure code for the pure parts of our program and keep side-effects contained. We'll also explore how Haskell's type system allows us to write even more concise code than dynamically typed languages, not to mention most statically typed languages, and how we can move from using gigabytes of memory to just a few tens of kilobytes.

## Calculating pi

For demonstration purposes, we want to start with a simple monte carlo simulation: estimating the value of pi. To do this, we'll focus on a 1-by-1 square, and imagine that the bottom-left corner (i.e., the origin) is the center of a unit circle (i.e., radius of 1). For any point in that square, we can determine if it falls inside the unit circle by calculating its distance from that bottom-left corner. In other words, for a point (x, y), if x*x + y*y < 1, we're inside the circle. Otherwise we're not.

Now comes the monte carlo simulation. We need to generate random points in that square, and check if they're in the circle. We determine the probability that one of these random points is in the circle, and use this to estimate the area of the unit circle which intersects with our square. Since this is one quarter of the entire circle, our probability will actually estimate pi divided by 4, since the area of the unit circle is pi.

## Let's get imperative

We want to start off with a baseline approach to the problem. A common tool in quant research is Python, together with the Numpy and Scipy libraries. Let's see how they might be used to come up with a solution:

```
from numpy import average
from scipy import rand
print average([x[0]*x[0]+x[1]*x[1]<1 for x in rand(10000000,2)])*4
```

Note: While I'm not an expert in Numpy and Scipy, this example was provided to me by someone who uses those tools regularly. If there's a problem in the code or my explanation of it, please let me know.

The real magic here is the `rand`

function. It will generate a matrix with the
given dimensions, filled with random numbers. We pass in two parameters:
10,000,000 to be the number of experiments to perform, and 2, since each
experiment needs two random values (an x and y coordinate). Our program then
uses a list comprehension to iterate over the 10 million two-length vectors, and
for each vector, determines if the distance from the origin is less than one.
Our boolean result is implicitly converted to a 0 or 1, we get the average, and
multiply by 4. When run on my system, I get the result: 3.1413928.

### The problem

The problem with this approach is memory usage: the entire 20 million member matrix is kept in memory at the same time! My system ended up using 250MB of memory to perform this calculation. But the problem is worse than that, since the memory use will grow linearly as we increase our sample size!

We can fairly easily solve our memory problem, by going to an explicit for loop:

```
import random
count = 100000000
successes = 0
for i in xrange(count):
x = random.random()
y = random.random()
if x*x + y*y < 1:
successes += 1
print 4.*successes/count
```

In comparison to the previous code, this program runs in about 3MB of memory, but has twice the run time. (Recommendations for improvement are welcome.)

While the code isn't *complicated*, it is pretty tedious. And maybe this is
just the Haskeller in me, but I got seriously tripped up by the need to use
`1.0`

instead of just `1`

when summing up `successes`

. Initially, I just used
`1`

, which resulted in my answer always being 0, since Python performed integer
division instead of floating point division. It's certainly a strange feeling
to get a type error at runtime!

So we now have two Pythonic approaches to the problem: a concise, fast, memory-inefficient version, and a longer, slower, memory-efficient version.

**UPDATE** There are some great comments on the Python code below. I'm going to
leave the original content of this post unmodified, but recommend readers look
at the comments for clarifications.

## Enter Haskell, and IAP

Let's see how Haskell, and in particular FP Complete's Integrated Analysis
Platform (IAP), would address this problem. One of the most common complaints
we hear from people about their current analysis tools is their inability to
deal well with large data sets. As such, we've designed our tool with constant
memory analyses as a primary goal. In order to do that, we use the `conduit`

library, which is one of Haskell's streaming data libraries.

The nice thing about a streaming data library like conduit is that, not only does is make it easy to keep only the bare minimum data in memory at a time, but it's also designed around separating pure and impure code. This was one of the areas of concern we raised at the beginning of our blog post. So let's see how IAP and conduit allow us to solve the problem:

(Note: If you're viewing this on fpcomplete.com, you can run the code in the page. Since the School of Haskell servers do not perform optimized builds, the run time is longer than what you can expect from a proper compiled build.)

```
{-# LANGUAGE ExtendedDefaultRules, NoMonomorphismRestriction #-}
import Conduit
main = do
let cnt = 10000000
successes <- sourceRandomN cnt $$ lengthIfC (\(x, y) -> x*x + y*y < 1)
print $ successes / cnt * 4
```

**NOTE** If you want to play with this code in the IDE, please click on the
"Open in IDE" button above, go to the settings page, and change the environment
from Stable to Unstable. We're using some bleeding-edge packages in this
example.

We start off with some language extensions. These give us the same environment
as the IAP itself works in, essentially turning off some stricter type rules to
allow for more experimentation. We then import the `Conduit`

library, and start
off our `main`

function.

The first and third lines in our `main`

function should be pretty
straight-forward: we define how many simulations we want to perform on line
one, and print out our estimate of pi on line three. All of the work happens
on line two. Since this is the first time we're looking at conduit in this
blog, we'll take it slowly.

When we use conduit, we create a data *pipeline*. In a pipeline, we need some
data *source* (also known as a producer), a data *sink* (also known as a
consumer), and some optional transformers (not used in this example). Our
source is the function:

sourceRandomN cnt

This function allocates a new random number generator, and then generates a
stream of `cnt`

random values. Unlike Scipy's `rand`

function, it does *not*
create an in-memory vector. Instead, each new random value is produced on
demand.

Our sink is a little bit more complicated:

lengthIfC (\(x, y) -> x*x + y*y < 1)

`lengthIfC`

is a function which will determine the length of its input stream
which satisfies some test. (If you're familiar with Excel, think of `countif`

.)
Its parameter is a function which returns a `Bool`

. And thanks to Haskell's
light-weight lambda syntax, we can easily define our function inline.

Once we have our source and sink, we use conduit's connect operator (`$$`

) to
plug them together and produce a final result- in this case, the number of
random points that fall inside our unit circle.

When I run this program on my system, it takes 44KB of memory. If I increase
the sample size to 100 million or higher, it *still* takes just 44KB of memory,
living up to the promise of constant memory usage. And as an added bonus, our
program also runs significantly faster: in my test, the Scipy program took
19.620s, and the Haskell program took 0.812s.

## The benefits of a type system

In our Scipy version of the program, we had to explicitly state that we wanted
to have two values generated for each simulation via the second parameter to
the `rand`

function. In the Python explicit iteration version, we called
`random`

twice in each iteration of the loop. How does the Haskell program know
to generate two values, and not just one? The answer is embedded in our lambda
function:

({-hi-}\(x, y){-/hi-} -> x*x + y*y < 1)

We declared that our lambda function takes a pair of values. That's all Haskell
needs to know: the type inferencer automatically determines from here that the
input to the data sink must be a pair of values, and if the input must be a
pair of values, the output of the data source must *also* be a pair of values.
And here's the really cool part: `sourceRandomN`

is a polymorphic function, and
can generate many different types of random values. So if we modified our
program to instead take three random, it would still work.

If you're wondering about how this happens, it's all powered by the
`Variate`

typeclass,
which has an instance for each different type of random value that can be
generated. There are instance for `Double`

, `Int`

, and a number of other
numeric types, plus instances for pairs, 3-tuples, and 4-tuples. So if you
wanted to write a program that required a random `Int`

, `Double`

, and `Bool`

,
it will work. (And if you need more than 4 values, you can use nested tuples.)

This begs another question: how does Haskell know to use `Double`

s in our
analysis? This is once again type inferencing helping us out, and determining
that we needed a floating point type. The default floating point type is
`Double`

, so that's what Haskell uses.

Compare all that to Python, where accidentally replacing `1.0`

with `1`

results
in a garbage result from the program.

### Rewrite rules

One other powerful feature that plays into our performance here is *rewrite
rules*. These are instructions we can give to the Haskell compiler to say:
"you're about to do X, but it would be faster to do it a different way." A
simple example of a rewrite rule would be to replace something like:

`sum [x..y]`

with the mathematically identical:

`(x+y)*(y-x+1)/2`

Many Haskell libraries- conduit included- use rewrite rules to get rid of intermediate data structures, and thereby let you as a user write high-level code while generating more efficient machine code.

Part of our upcoming IAP work will be researching even more ways to fuse away data structures in conduit. The great advantage of that is that you'll be able to start writing code today, and as the rewrite rules improve, your code will speed up automatically.

## A solution that scales

For a problem as simple as the one we've described, it's not really that difficult to write a solution with most any language. Even a straight C solution can easily be done in under 15 lines of code. The question is: how does it handle the large problems?

We're designing IAP to make it easy and natural to compose smaller programs into larger solutions. By basing everything on constant-memory operations, we're hoping to help our users avoid running up against a brick wall when their data sets suddenly jump in size by three orders of magnitude. In other words, we believe the first solution you write will be able to be extended to more complicated problems, and much larger data sets.

Stay tuned to this blog for more demonstrations of our upcoming IAP product. And if you have questions about how IAP could fit into your research and analysis platforms, please feel free to contact out support team.