14 Apr 2017

This is a technical post series about **pure** functional
programming. The intended audience is general programmers who are
familiar with closures and some functional programming.

We're going to be seeing how *pure* functional programming **differs**
from regular "functional programming", in a significant way.

We're going to be looking at a little language theory, type theory, and implementation and practice of pure functional programming.

We'll look at correctness, architecture, and performance of pure programs.

We're going to look at code samples in C#, JavaScript and SQL. We're going to look at Haskell code. We're going to look at assembly language.

In closing we'll conclude that typed purely functional programming
*matters*, and in a different way to what is meant by "functional
programming".

Hold onto yer butts.

## In this post

Firstly, we'll start with a pinch of theory, but not enough to bore you. We'll then look at how functional programming differs from "pure" functional programming. I'll establish what we mean by "side-effects". Finally, I'll try to motivate using pure languages from a correctness perspective.

## Types and Functions

A function is a relation between terms. Every input term has *exactly
one* output term. That's as simple as it gets.

In type theory, it's described in notation like this:

*f : A → B*

Inputs to *f* are the terms in type *A*, and outputs from *f* are the
terms in type *B*. The type might be `Integer`

and the terms might be
all integers .., `-2`

, `-1`

, `0`

, `1`

, `2`

, ...

The type might be `Character`

and the terms might be `Homer`

, `Marge`

,
`SideShowBob`

, `FrankGrimes`

, `InanimateCarbonRod`

, etc.

This is the core of functional programming, and without type theory, it's hard to even talk formally about the meaning of functional programs.

Get it? I thought so. We'll come back to this later.

## Functional programming isn't

*Functional programming* as used generally by us as an industry tends
to mean using first-class functions, closures that capture their
environment properly, immutable data structures, trying to write code
that doesn't have side-effects, and things like that.

That's perfectly reasonable, but it's distinct from *pure* functional
programming. It's called "*pure* functional", because it has **only
functions** as defined in the previous section. But why aren't there
proper functions in the popular meaning of functional programming?

It's something of a misnomer that many popular languages use this term
*function*. If you read the language specification for
Scheme,
you may notice that it uses the term *procedure*. Some procedures may
implement functions like cosine, but not all procedures are functions,
in fact most aren't. Any procedure can do cheeky arbitrary
effects. But for the rest of popular languages, it's a misnomer we
accept.

Let's look into why this is problematic in the next few sections.

## Side-effects, mutation, etc.

What actually are side-effects, really? What's purity? I'll establish
what *I am going to mean* in this series of blog posts. There are
plenty of other working definitions.

These are some things you can do in source code:

- Mutate variables.
- Make side-effects (writing to standard output, connecting to a socket, etc.)
- Get the current time.

These are all things you cannot do in a pure function, because they're not explicit inputs into or explicit outputs of the function.

In C#, the expression `DateTime.Now.TimeOfDay`

has type `TimeSpan`

,
it's not a function with inputs. If you put it in your program, you'll
get the time since midnight when evaluating it. Why?

In contrast, these are side-effects that pure code can cause:

- Allocate lots of memory.
- Take lots of time.
- Loop forever.

But none of these things can be *observed* during evaluation of a pure
functional language. There isn't a meaningful pure function that
returns the current time, or the current memory use, and there usually
isn't a function that asks if the program is currently in an infinite
loop.

So I am going to use a meaning of purity like this: an effect is something implicit that you can observe during evaluation of your program.

## Which languages are purely functional

So, a pure functional language is simply one in which a function cannot observe things besides its inputs. Given that, we can split the current crop of Pacman-complete languages up into pure and impure ones:

Pure languages | Impure languages |
---|---|

Haskell (we'll use Haskell in this series), PureScript and Idris are purely functional in this sense. You can't change variables. You can't observe time. You can't observe anything not constructed inside Haskell's language. Side effects like the machine getting hotter, swap space being used, etc. are not in Haskell's ability to care about. (But we'll explore how you can straight-forwardly interact with the real world to get this type of information safely in spite of this fact.)

We'll look at the theoretical and practical benefits of programming in a language that only has functions during this series.

## Reasoning and function composition

In a pure language, every expression is pure. That means you can move things around without worrying about implicit dependencies. Function composition is a basic property that lets you take two functions and make a third out of them.

Calling back to the theory section above, you can readily understand
what function composition is by its types. Lets say we have two
functions, `f : X → Y`

and `g : Y → Z`

:

The functions `f : X → Y`

and `g : Y → Z`

can be composed, written ```
f
∘ g
```

, to yield a function which maps `x`

in `X`

to `g(f(x))`

in `Z`

:

In JavaScript, `f ∘ g`

is:

`function(x){ return f(g(x)); }`

In Haskell, `f ∘ g`

is:

\x -> f (g x)

There's also an operator that provides this out of the box:

`f . g`

You can compare this notion of composition with shell scripting pipes:

`sed 's/../../' | grep -v ^[0-9]+ | uniq | sort | ...`

Each command takes an input and produces an output, and you compose
them together with `|`

.

We'll use this concept to study equational reasoning next.

## A really simple example

Let's look at a really simple example of composition and reasoning about it in the following JavaScript code and its equivalent in Haskell:

JavaScript | Haskell |
---|---|

```
> [1, 4, 9].map(Math.sqrt)
[1, 2, 3]
> [1, 4, 9].map(Math.sqrt).map(function(x){ return x + 1 });
[2, 3, 4]
``` | ```
> map sqrt [1,4,9]
[1.0,2.0,3.0]
> map (\x -> x + 1) (map sqrt [1,4,9])
6.0
``` |

There's a pattern forming here. It looks like this:

`xs.map(first).map(second)`

map second (map first xs)

The `f`

and `g`

variables represent any function you might use in
their place.

## Let's do equational reasoning

We know that `map id ≣ id`

, that is, applying the identity function
across a list yields the original list's value. What does that tell
us? Mapping preserves the structure of a list. In JavaScript, this
implies that, given `id`

var id = function(x){ return x; }

Then `xs.map(id) ≣ xs`

.

Map doesn't change the length or shape of the data structure.

By extension and careful reasoning, we can further observe that:

map second . map first ≣ map (second . first)

In JavaScript, that's

`xs.map(first).map(second) ≣ xs.map(function(x){ return second(first(x)); })`

For example, applying a function that adds 1 across a list, and then applying a function that takes the square root, is equivalent to applying a function across a list that does both things in one, i.e. adding one and taking the square root.

So you can refactor your code into the latter!

Actually, whether the refactor was a good idea or a bad idea depends on the code in question. It might perform better, because you put the first and the second function in one loop, instead of two separate loops (and two separate lists). On the other hand, the original code is probably easier to read. "Map this, map that ..."

You want the freedom to choose between the two, and to make transformations like this freely you need to be able to reason about your code.

## But it doesn't work in JavaScript

Turning back to the JavaScript code, is this transformation really
valid? As an exercise, construct in your head a definition of
`first`

and `second`

which would violate this assumption.

```
xs.map(first).map(second)
→
x.map(function(x){ return second(first(x)); })
```

The answer is: no. Because JavaScript's functions *are not
mathematical functions*, you can do anything in them. Imagine your
colleague writes something like this:

```
var state = 0;
function first(i){ return state += i; }
function second(i){ return i `mod` state; }
```

You come along and refactor `x.map(first).map(second)`

, only to find
the following results:

```
> var state = 0;
function first(i){ return state += i; }
function second(i){ return i % state; }
[1,2,3,4].map(first).map(second)
[1, 3, 6, 0]
> var state = 0;
function first(i){ return state += i; }
function second(i){ return i % state; }
[1,2,3,4].map(function(x){ return second(first(x)); })
[0, 0, 0, 0]
```

Looking at a really simple example, we see that basic equational
reasoning, a fundamental "functional" idea, **is not valid in
JavaScript**. Even something simple like this!

So we return back to the original section "Functional programming
isn't", and why the fact that some procedures are functions doesn't
get you the same reliable reasoning as when you can *only define
functions*.

A benefit like being able to transform and reason about your code translates to practical pay-off because most people are changing their code every day, and trying to understand what their colleagues have written. This example is both something you might do in a real codebase and a super reduced down version of what horrors can happen in the large.

## Summary

In this post I've laid down some terms and setup for follow up posts.

- We've looked at what purity is and what it isn't.
- We've looked at functions and function composition.
- We've looked at how reasoning is easier if you only have functions. We'll circle back to this in coming posts when we get to more detailed reasoning and optimizations.

In the next post, we'll explore:

- SQL as a pure language, which is a familiar language to almost everybody.
- How pure languages deal with the real-world and doing side-effects, which are obviously practical things to do.
- Evaluation vs execution, the generalization of pure vs impure languages.

After that, we'll explore the performance and declarative-code-writing benefits of pure functional languages in particular detail, looking at generated code, assembly, performance, etc.