What pure functional programming is all about: Part 2

1 May 2017 Chris Done

In the last post, we covered the following:

  • What purity is and what it isn't.
  • We looked at functions and function composition.
  • We've looked at how reasoning is easier if you only have functions.

In this post, we'll explore:

  • SQL as a pure language, which is a familiar language to almost everybody.
  • How pure languages deal with the real-world and doing side-effects, which are obviously practical things to do.
  • Evaluation vs execution, the generalization of pure vs impure languages.

In the next post, we'll explore the performance and declarative-code-writing benefits of pure functional languages in particular detail, looking at generated code, assembly, performance, etc.

In a familiar setting: SQL

In the last post, we established that pure languages (like Haskell or PureScript) are unable to make observations about the real world directly.

So, then, the question is: How do pure languages interact with the real world? Let's look at SQL first, because everyone knows SQL, and then use that as a bridge towards how it's done in Haskell and other general purpose pure languages.

Imagine a subset of SQL which is pure. It's not a big leap: SQL is mostly pure anyway. I don't like it when people write an article and ask me to imagine a syntax that's in their heads, so here is trivial a PEG grammar, which you can try online here.

Select = 'SELECT ' Fields ' FROM ' Names Where?
Names = Name (', ' Name)*
Fields = Exp (', ' Exp)*
Where = ' WHERE ' Exp
Exp = Prefix? (Funcall / Condition / Name) (Connective Exp)?
Funcall = Name '(' Exp ')'
Condition = (Funcall / Name) Op Exp
Prefix = 'NOT '
Connective = ' AND ' / ' OR '
Op = ' = ' / ' <> ' / ' < ' / ' > ' / ' * '
Name = [a-zA-Z0-9]+

The following is an example of the above grammar.

SELECT sin(foo), bar * 3 FROM table1, table2 WHERE foo > 1 AND NOT bar = floor(foo)

This language is pure. The result of evaluating one of these expressions is the same every time. You can swap the order of commutative expressions like floor(foo) = bar and it doesn't change the meaning of the program. floor(x) * sin(y) = sin(y) * floor(x).

You get a description of the work you want to be done, and then the database engine (PostgreSQL, MySQL, SQL Server, etc.), usually creating an optimized query plan based on what it knows about your query and the database schema and contents, executes the query on the database, returning a collection of results.

You didn't write instructions about how to get the database, where to look on the database, how to map indices across tables and whether to do an index scan or a sequence scan, allocating a buffer, etc.

And yet, above, we clearly are giving the SQL engine a little program (not a turing complete one), that can compute conditionals and even functions (floor and sin) that we specified freely.

Because our little program is pure, SQL is capable of basically rewriting it completely, reducing it into a more normalized form without redundancy, and executing something far more efficient.

Look at what PostgreSQL is able to do with your queries. Here I do a query which works on an indexed field on a table with 37,626,086 rows. PostgreSQL knows both the query and my data, and plans accordingly, able to estimate the cost of how much work we'll have to do.

ircbrowse=> explain select * from event where id > 123 and id < 200*2;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using event_unique_id on event  (cost=0.56..857.20 rows=309 width=87)
   Index Cond: ((id > 123) AND (id < 400))

(The 200*2 was evaluated statically.)

Simply based on a static analysis of the SQL program.

Evaluation vs execution

The SQL example is a specific case of a more general way of separating two concerns: evaluation and execution (or interpretation). On the one side you have your declarative language that you can evaluate, pretty much in terms of substition steps (and that's how you do your reasoning, in terms of the denotational semantics), and on the other you have a separate program which interprets that language in "the real world" (associated with operational semantics).

I'm sure any programmer reading this easily understands the informal denotational semantics of the SQL mini language above, and would feel comfortable implementing one or more ways of executing it.

In the same way, Haskell and other pure languages, construct declarative descriptions of work which a separate program can execute.

Evaluation

What's evaluation, as differentiated from execution? To do your language's regular evaluation steps, as you might do on paper. If you've read your Structured and Interpretation of Computer Programs by MIT, you'll know this as the substitution model. Here's an example.

If you want to count the length of a linked list, you might do:

length [1, 2, 3]

And length is implemented like this:

length [] = 0
length (x:xs) = 1 + length xs

So let's evaluate the expression, step-by-step:

length [1, 2, 3]

1 + length [2, 3]

1 + (1 + length [3])

1 + (1 + (1 + length []))

1 + (1 + (1 + 0))

1 + (1 + 1)

1 + 2

3

Even if you don't know Haskell, I think the simple substitution steps to evaluate the expression are straight-forward and make sense. That's evaluation.

A Terminal mini-language

Let's look at a data type that models a list of terminal actions to perform. I realise that if you don't know Haskell or ML, the below is mostly nonsense. I'll also bring in JavaScript as a "lingua franca", so that the idea translates nicely.

(If it looks concise in Haskell and verbose in JavaScript, it's because JavaScript was designed for imperative, OO-style programming, and Haskell was designed for pure functional programming.)

data Terminal a
  = Print String (Terminal a)
  | GetLine (String -> Terminal a)
  | Return a
function Print(string, next) {
  this.line = string;
  this.next = next;
}
function GetLine(cont) {
  this.f = cont;
}
function Return(a) {
  this.a = a;
}

It can describe printing to stdout, getting a line from stdin, or returning a result. An example might be:

Print
  "Please enter your name: "
  (GetLine
     (\name ->
        Print
           ("Hello, " ++ name ++ "!")
           (Return ())))
new Print(
  "Please enter your name: ",
  new GetLine(function(name){
    return new Print("Hello, " + name + "!",
                     new Return(null));}))

Think of this like an abstract syntax tree. It's the description of what to do. When there is a function in the AST, that's a callback. The callback is just another pure function that returns a new action to be interpreted.

The expressions above can be evaluated, but they perform no side-effects. It's a piece of purely functional code which generates a data structure. Our substitution model above still works.

Hammering the point home, consider a greeting action like this:

greeting :: String -> Terminal ()
greeting msg =
  Print
    msg
    (GetLine
       (\name ->
          if name == "Chris"
            then Print "That's my name too." (Return ())
            else Print "Hello, stranger!" (Return ())))

The greeting function itself is pure. It takes a message string, and constructs a Print using that. That's done by evaluation. Also, the sub-expression (\name -> ...) is another pure function that takes a name and conditionally produces one action or another.

Now, onto execution.

Execution

We can write a number of ways to interpret this AST. Here is one pure implementation. It takes as input (1) some action list to run and (2) lines of input, then it returns a value and some output lines:

data Result a = Success a | Failure String deriving Show
interpretPure :: Terminal a -> [String] -> Result (a,[String])
interpretPure action stdin =
  case action of
    Return a -> Success (a, [])
    Print line next ->
      case interpretPure next stdin of
        Success (a, stdout) -> Success (a, line : stdout)
        Failure e -> Failure e
    GetLine f ->
      case stdin of
        [] -> Failure "stdin closed"
        (line:stdin') -> interpretPure (f line) stdin'

In diagram form:

Note the part,

    GetLine f ->
      case stdin of
        [] -> Failure "stdin closed"
        (line:stdin') -> interpretPure (f line) stdin'
-------------------------------------- ^

Above is where we call the pure function from GetLine to produce another action for us to interpret.

In JavaScript:

function Success(x){ this.success = x; }
function Failure(e){ this.error = e; }
function interpretPure(action, stdin) {
  if (action instanceof Return) {
    return new Success({ result: action.a, stdout: [] });
  } else if (action instanceof Print) {
    var result = interpretPure(action.next, stdin);
    if (result instanceof Success) {
      return new Success({
        result: result.success.result,
        stdout: [action.line].concat(result.success.stdout)
      });
    } else return result;
  } else if (action instanceof GetLine) {
    if (stdin == "") {
      return new Failure("stdin closed");
    } else {
      var line = stdin[0];
      var stdin_ = stdin.slice(1);
      return interpretPure(action.f(line), stdin_);
    }
  }
}

Which we can run like this:

> interpretPure demo ["Dave"]
Success ((),["Please enter your name: ","Hello, Dave!"])
> interpretPure demo []
Failure "stdin closed"
> JSON.stringify(interpretPure(demo, ["Dave"]));
"{"success":{"result":null,"stdout":["Please enter your name: ","Hello, Dave!"]}}"
> JSON.stringify(interpretPure(demo, []));
"{"error":"stdin closed"}"

If we liked, we could interpret this as real I/O that talks to the world directly.

In Haskell, we translate it to a different type called IO. In JavaScript, let's do a good olde-fashioned 90's-style alert/prompt user interaction.

interpretIO :: Terminal a -> IO a
interpretIO terminal =
  case terminal of
    Return a -> return a
    Print str next -> do
      putStrLn str
      interpretIO next
    GetLine f -> do
      line <- getLine
      interpretIO (f line)
function interpretIO(action) {
  if (action instanceof Return) {
    return action.a;
  } else if (action instanceof Print) {
    alert(action.line);
    interpretIO(action.next);
  } else if (action instanceof GetLine) {
    var line = prompt();
    interpretIO(action.f(line));
  }
}

In this case, we're just translating from one model to another. The IO a type is interpreted by the Haskell runtime either in a REPL prompt, or in a compiled program. JavaScript is capable of executing things directly during evaluation, so we were able to implement something similar to what Haskell's runtime does for IO.

In SICP, they talk about programs as being spells, and the interpreter being like a spirit that does your bidding. Here is a formal description:

(In practice, Haskell's IO type is not a closed set of operations, but an open set that can be extended with e.g. bindings to C libraries and such, and it isn't interpreted at runtime but rather compiled down to bytecode or machine code. The strict evaluation/execution separation remains in place, however.)

Summary

We've covered an informal comparison between "evaluating" and "executing" code, which is a distinction often discussed in the functional programming community.

We've explored making a declarative data type that represents terminal interaction, and implemented two different ways of interpreting it: as a pure function, or executing it directly. There may be other interpretations, such as talking to a server. We could write an interpreter which logs every action performed. We could also write a test suite which feeds randomized or specific inputs and demands consistent outputs.

We looked at how SQL is such an example of an evaluation (1 + 1 is evaluated before executing the query) vs execution separation. In SELECT customer.purchases + 1 FROM customer, the customer.purchases + 1 expression is evaluated by the interpreter for each row in the table.

In the next post, we'll explore:

  • Performance of pure code.
  • Optimization of pure code, in particular: Inlining, deforestation, common sub-expression elimination, fusion, rewrite rules, etc.
  • And how these optimizations allow for low- or zero-cost abstractions.

We may further explore different types of interpreters (software transactional memory, local mutation, parsers, etc).

P.S.

There are some extra articles that may be of interest for the fearless:

comments powered by Disqus

Copyright © 2013-2017 FP Complete Corp. All rights reserved