Streaming data is a problem domain I've played with a lot in Haskell. In Haskell, the closest we come to built-in streaming data support is laziness-by-default, which doesn't fully capture streaming data. (I'm not going into those details today, but if you want to understand this better, there's plenty of information in the conduit tutorial.) Real streaming data is handled at the library level in Haskell, with many different options available.
Rust does things differently: it bakes in a concept called iterators not only with the standard library, but the language itself:
Also, Rust's approach follows a state machine design, as opposed to many Haskell libraries which use coroutines. That choice turns out to be pretty crucial to getting good performance, and applies in the Haskell world as well. In fact, I've already blogged about this concept with my aptly-named Vegito concept. For those familiar with it: you'll see some crossovers in this blog post, but prior knowledge isn't necessary.
While digging into the implementation of iterators in Rust, I found it very enlightening how the design differed from what idiomatic Haskell would look like. Trying to mirror the design from one language in the other really demonstrates some profound differences in the languages, which is what I'm going to try and dive in on today.
To motivate the examples here, we're going to try to perform the same computation many times: stream the numbers from 1 to 1,000,000, filter to just the evens, multiply every number by 2, and then sum them up. You can find all of the code in a Gist. Here's an overview of the benchmark results, with many more details below:
Also, each function takes an integer argument to tell it the highest value it should count it (which is always 1,000,000). Criterion requires this kind of argument be present to ensure that the Haskell compiler (GHC) doesn't optimize away our function call and give us bogus benchmarking results.
Baseline and cheating
The Gist includes code in Haskell, C, and Rust, with many different implementations of the same kind of function. The Haskell code foreign imports both Rust and C and uses the Criterion benchmarking library to benchmark them. To start off, I implemented a cheating version of each benchmark. Instead of actually filtering and doubling, it just increments the counter by 4 each time and adds it to a total. For example, in C this looks like:
By contrast, the non-cheating loop version in C is:
Similarly, we have cheating and loop implementations in Rust:
And in Haskell. Haskell uses recursion in place of looping, but under the surface the compiler turns it into a loop at the assembly level.
haskellCheating :: Int -> Int haskellCheating high' = loop 0 4 where loop !total !i | i <= high = loop (total + i) (i + 4) | otherwise = total high = high' * 2 recursion :: Int -> Int recursion high = loop 1 0 where loop !i !total | i > high = total | even i = loop (i + 1) (total + i * 2) | otherwise = loop (i + 1) total
These two sets of tests give us some baseline numbers to compare everything else we're going to look at. First, the cheating results:
You may be surprised that C is about twice as fast as Rust and Haskell. But look again: C is taking 87 nanoseconds, while Rust and Haskell both take about 175 microseconds. It turns out that GCC it able to optimize this into a downward-counting loop, which drastically improves the performance. We can do similar things in Rust and Haskell to get down to nanosecond-level performance, but that's not our goal today. I do have to say: well done GCC.
The non-cheating results still favor C, but not to the same extent:
EDIT Originally this article listed a performance number for the C loop as being faster than Rust. However, as pointed out on Reddit, the code in question was mistakenly using
All of the results are the same order of magnitude. C and Rust are neck and neck, with Haskell lagging by about 15%. Understanding the differences between the languages' performance would be an interesting topic in and of itself, but our goal today is to compare the higher-level APIs and see how they affect performance within each language. So for the rest of this post, we'll focus on comparing intra-language performance numbers.
Posted by Michael Snoyman - 10 July, 2017