Learning a new programming language involves learning syntax, semantics, and idioms. Syntax itself can tell you a lot about the philosophy of the language, but learning syntax without any context is not only hard but also boring. On the other hand, you'd want to get some handle on syntax as soon as possible, so you can sight-read simple programs and try your own hand at modifying them.
So here's a little introduction to Haskell syntax, without any formal definitions, that touches on the philosophy of the language and tries not to be too boring. Unless you are already a Haskell pro, you're not supposed to understand all of it on the first reading. In fact you might want to occasionally come back to it as you are progressing in your understanding of Haskell. (If you're a Haskell pro, you might cringe at some mental shortcuts I've taken for the sake of simplicity.)
The good thing is that, once you get used to it, you'll be able to see more program logic at a glance. In a wordy language you not only waste precious keystrokes but you also dilute the structure and meaning of your program. Code that fills one screenful in Haskell, would be spread over several screenfuls in Java. For a Haskell programmer, studying a Java or a C++ program is like looking at it through a microscope: you see lots of detail but miss the big picture.
One of those things that seem indispensable in other languages are special characters for separating statements (e.g., semicolons) and delimiting blocks (e.g., braces). Mind you, every programmer worth his or her salt uses formatting to make their code readable, but the compiler throws the formatting away. Haskell does (almost) the opposite. It recognizes blocks by indentation (but please use spaces, not tabs), and newlines as separators.
As part of this philosophy of frugality of expression Haskell doesn't require type signatures -- although an experienced Haskeller provides them for clarity -- so type errors in this strongly typed language are often cryptic for the uninitiated. For instance, if you define a function
f that adds two numbers and then call it with two strings, the compiler will not complain about bad arguments, it will complain about string not supporting operator plus. And it will formulate this complaint in a very non-obvious way:
No instance for (Num [Char]) arising from a use of `f' Possible fix: add an instance declaration for (Num [Char])
Also, the case is important. Anything that starts with a capital letter is either a concrete type or a data constructor. Lower-case-starting names are reserved for function names and variables, including type variables.
The bad thing about terseness is that lack of redundancy makes error reporting harder. The compiler truly doesn't know if you have forgotten a closing parenthesis or messed up indentation.
a b c d eis a call to function
e. If you use parentheses it's only to delimit individual arguments (if they are expressions); and commas are used to separate tuple elements. So if you see something like this:
f (a, b, c)you interpret it as a call to (application of) a function
fto just one argument -- a tuple of three elements. It's very important to train your eyes to look for spaces. They separate functions from their arguments and arguments from each other.
When an argument to a function is a result of another function application, you have to use parentheses for that. For instance,
a b (c d)means the function
aacting on two arguments,
(c d), which itself is an application of function
d. There is a way to omit the parentheses in the above by using the operator
a b $ c dA
$is interpreted as: What follows is the last argument to the function you're calling.
a b c = d e fIt's a definition of the function
athat takes two formal parameters,
c. The body of this function is a call to
dwith two arguments,
f. Of course, the use of short meaningless names is not encouraged in Haskell but I do it here to help you prime your internal parser.
When formal parameters are enclosed in parentheses, they contain patterns. If an argument is data with some structure, it can be pattern-matched on the spot. Types that have more than one constructor may be pattern-matched in more than one way. In that case there may be what looks like several definitions of the same function. For instance, a function that operates on a tree might be defined as:
f (Leaf x) = ... f (Node left right) = ...Here,
(Leaf x)matches a leaf node containing a value
(Node left right)matches an internal node with two children
Function definitions may also contain guards: boolean expressions constraining the arguments.
The use of the equal sign as part of function definition has a deeper meaning. The right hand side is equal to the left hand side in the sense that it might be substituted for it anywhere in the program -- with the formal parameters replaced by actual arguments. In fact using this kind of "equational reasoning" you may formally prove things about your code.
f a b c d e = ...with, say, two arguments:
f 7.1 2.35The result is a new function that takes 3 arguments (the remaining ones). This is called partial application and is possible because any function of multiple arguments can be curried, i.e., interpreted as a function of one argument returning another function. This fundamental property is reflected in the way we write type signatures for functions:
A function of one argument has the type:
a -> bwhere a and b are some types. For instance, a function
strLenthat takes a string and returns an integer would have the signature:
strLen :: String -> IntType signatures are optional but, nevertheless, they are routinely written before function definitions.
The signature of the function
prefix that takes a string and an integer and returns a string would be:
prefix :: String -> Int -> StringThe last arrow points to the return type of the function, the preceding one(s) separate argument types (here,
Int). But there is an equivalent, curried, interpretation of this signature:
prefix :: String -> (Int -> String)In this interpretation
prefixis a function of one argument of type
Stringreturning a function that takes an
Intand returns a
String. The parentheses are usually omitted because of the right associativity of the arrow.
Another common trick is to provide a signature that lists more arguments than the function definition. For instance:
f :: a -> b -> c f x = gHere a two-argument function
fis defined in terms of a one-argument function
g. This is equivalent to:
f x y = g ybut using the curried form is considered cooler because it eliminates some notational redundancy. The extreme of this approach is called point-free style, in which arguments are not used at all and functions are defined in terms of other functions.
The downside to currying is cryptic error messages. The compiler can't, for instance, guess that you have forgotten to provide an argument to some function. It will instead think that you used a curried function and may, for instance, complain that it can't convert the (curried) function to some other type that was expected at that point in your code.
return(more about which later).
It follows that the
if/then/else construct is an expression too: it returns a value from one of the branches.
You might think that a definition of a local variable would be a statement, but it isn't. Instead you introduce locals using the
let/in expression. You "let" a name (or a pattern) be equal to something "in" an expression. The scope of the new name(s) span the expression following
in. For instance:
let x = 5 in x * xis equal to 25.
Another statement-like expression is the pattern-matching
case/of construct, as in:
case tree of Leaf x -> ... Node left right -> ...The value of this expression is one of the expressions to the right of the arrows.
You might think that a loop must be a statement, but it isn't because:
Recursion is not used as often as you'd expect since a lot of loops can be replaced by bulk operations: functions that take lists or other containers. They are, under the surface, implemented using recursion, but they are often easier to read and reason about. Four such functions from the Haskell standard library, Prelude, are ubiquitous in Haskell code:
f a b c + g d eyou know that you are adding the result of two function calls rather than calling one function with one of the arguments being the sum of two terms. Other operators have precedences between 0 (the lowest) and 9 (the highest, but still lower than function application). In particular the
$operator has precedence 0; and the dot, function composition, has precedence 9.
Parentheses may always be used to clarify both precedence and associativity.
The OR, denoted by
| (vertical bar) in data definitions, could be understood as creating a tagged union -- in the simplest case it's just an enumeration. For instance, a Bool type can be described as:
data Bool = True | False
Boolis the name of the type and
Falseare called data constructors (notice the mandatory capitalization).
The AND combination could be understood as forming tuples or structures with one or more fields of different types.
Since data is immutable in Haskell, an object always remembers how it was created. It remembers which data constructor was called and what values (or expressions) were passed to it.
Data constructor can also be used as a pattern to match this information, e.g., in a function definition, a let expression, or in the case/of expression. We've seen examples of this in items 3 and 5. Here's the (recursive) data type definition that was used in those examples (see also the Appendix):
data Tree = Leaf Int | Node Tree Tree
The order of definitions within a module doesn't matter. That means you can call a function or use a data structure whose definition is further down in the file -- no no need for forward declarations. There is even a
where construct that lets you define local functions or variables after the code that calls them, as in:
len x y = sqrt (sq x + sq y) where sq a = a * a
By the way, Haskell is a lazy language, so things are not evaluated in the order you write them. Instead they are evaluated when the values are needed. Evaluation is forced, for instance, when you want to output results, or when you want to pattern-match them. There is also the bang operator and the
seq function that force (some degree of) evaluation.
One of the advantages of lazy evaluation is that you can define infinite data structures, like this list from 0 to infinity:
doand contain lines that look like statements and assignments:
do a <- giveMeAnA b <- giveMeAB return (a + b)You can even see here the
return"statement" (really, a library function). The "statements" inside the
doblocks are executed in order -- when and if they are executed. This kind of code is called monadic because there is always a monad behind it. Monads are very interesting beasts but, unfortunately, beyond the scope of the current article.
In monadic code there is usually something hidden from view; often some state is passed implicitly, or some form of flow of control is imposed. The details vary from monad to monad.
An important example of monadic code is input and output. You always do I/O inside a
do statement -- using the IO monad. This way you are guaranteed that I/O actions are executed in order.
A function that does I/O returns an IO object, often called an action. There's nothing you can do with an IO action, except to ignore it (in that case no I/O will be done), or pass it all the way to main, where you can return it to the system --
main usually has the signature:
main :: IO ()
I'd like to thank Michael Snoyman for valuable comments on the draft of this blog.
Here's the more complete version of the tree example I used in this article:
data Tree = Leaf Int | Node Tree TreeThe two functions, f and g, are equivalent.
g (Leaf x) = x g (Node left right) = g left + g right
f tree = case tree of Leaf x -> x Node left right -> f left + f right