Part III: Extending Vector

As before, if you haven't already, I'd highly recommend reading parts 1 and 2 before proceeding. That said, if you ignore the motivation for the sorting, this post largely stands alone as a practicum on how to extend the vector package with a custom Vector type and custom stream fusion combinators.

All of the No-Prizes from the previous post are still available! That said, ssyvlan on reddit was quite helpful in discussions about how to do the leading zero count more efficiently.

There is only one No-Prize opportunity in this post simply because it otherwise is a fairly simple engineering exercise, and other than that one point I can't think of open questions I have about this code.

Vector

Now that we have Morton keys, I finally want to at least be able to represent a sparse matrix.

When I think speed in Haskell, I think vector, repa, or accelerate. Today I'm going to focus on vector, because I think it is the easiest to adapt to suit my purposes.

Roman Leshchinskiy, author of vector is probably the single most dedicated disciple of speed I can think of in the Haskell community today. I pay lip service to speed, but he'll sit there and spend weeks fixing any single thing that he can't optimize away into a tight c-like loop. If we're going to try to be competitive with a more traditonal language implementation in the end, we'll want to piggyback on his efforts. Any loss of speed due to stupid misuses of his tools, on the other hand, will entirely be my fault.

One of the main things that vector gives us is the power of stream fusion. That is to say that if you build a Vector and then immediately consume it, the vector library is often smart enough to never bother to construct the intermediate Vector at all! As we multiply we'll be merging and concatenating streams of values. It is nice to know we can avoid building huge arrays in the interim.

Again, as I merely pay lip service to the speed God and don't attend services regularly, I'll be going through the motions with a couple of custom stream fusion combinators later on, but I've done rather poor software engineering in that I haven't bothered to check that I needed them. I'd be truly surprised if that turned out not to be the case though.

Sadly, the vocabulary of Vector here is somewhat overloaded, so from here out when I talk about a Vector, unless I explicitly state otherwise, I mean the concept of Vector from Data.Vector. If it isn't clear from context then, I'll try to pretend the following imports are in scope:

-- show
import Data.Vector as Boxed
import Data.Vector.Generic as Generic
import Data.Vector.Unboxed as Unboxed
import Data.Vector.Primitive as Primitive
import Data.Vector.Storable as Storable
-- /show

-- show These will also occur in code samples
import qualified Data.Vector.Generic.Mutable as GM
import qualified Data.Vector.Generic as G
import Data.Vector.Fusion.Stream as Stream

-- /show
main = putStrLn "Those modules still exist!"

... that way if I talk about an Unboxed.Vector it should make some sense.

Sorting out Vectors

The reason I want to use vector is that I want my storage to be arranged contiguously in memory whenever possible. Moreover, it enables me to use Dan Doel's excellent vector-algorithms to manipulate the the Vector of (Key,Value) pairs in our sparse matrices.

This comes up right away when we start to try to build a sparse matrix, as I want to sort my sparse matrices by Key in Morton order, and vector-algorithms provides me with a large number of sorting algorithms to experiment with, including the fastest intro sort available in Haskell today. Moreover, it also includes an american flag sort that could be used to gain asymptotically on the initial insertion by using the fact that we can not only compare but tell you the bit position at which our keys differ, even if we use variable length keys. This means we're not limited to the asymptotics of a comparison sort when constructing our initial matrices.

Dan's sorts rely very heavily on aggressive inlining to get worker-wrapper transforms to fire and to ensure that things like the comparisons done inside the sorting routine are being passed unboxed values.

Using vector and vector-algorithms leads to our first stumbling block.

What I want to do is store a Vector (Key, a) in such a way that I can have unboxed keys. However, I don't want to lose the flexibility to have unboxed values!

To that end, when I started this project about a week ago, I pushed out a hybrid-vectors package to hackage.

As it'll be the glue that holds together our matrix representation until I find something better, I felt it made sense to say a few words motivating its construction.

Unboxed Vectors

vector is already quite smart about managing storage. In particular an unboxed Vector of pairs is managed as a pair of vectors of the individual parts. Anybody who has done much GPU work will recognize this as the structure of arrays (SoA) approach that is often used there rather than array of structures (AoS) approach used more traditionally throughout the rest of the industry.

In case you've never thought about it before, it is worth mentioning that it has many benefits. One of those benefits is that you don't pay cache storage for parts of the structure you don't look at. You also don't incur unnecessary time/space trade-offs for alignment issues, etc. The cache dominates most concerns about slightly more complicated addressing logic, and if you look at things like the x86 instruction set, the addressing logic usually simplifies as well! We can calculate the \*{1,2,4,8} multiplier in the index en passant, but the odd sizes you get in "array of structures" are usually not so easy to use.

All of this is well and good, but I can't just use Data.Vector.Unboxed.

My particular motivation for working on this code in the first place requires me to be able to do matrix multiplication over certain fairly complicated ring-like structures that I can't Unbox. I say "ring-like" because right seminearrings without zero-divisors arise in various forms of chart parsing, and some are even non-associative!

Hybrid Vectors

Fortunately, in a great feat of engineering, Roman left the entire Vector framework open to extension with new Vector types, by providing us with Data.Vector.Generic and moreover, he made it so that the entire stream fusion framework he uses will just magically work with any instance of Generic.Vector.

So all we need to do is define our own Vector (and MVector) type for the kinds of "hybrid" vectors we need.

Then Dan's fast sorting algorithms can be used out of the box and we can steal and extend Roman's fusion framework.

Data.Vector.Unboxed has already shown us the way to build an SoA-style vector. We just need it to be more permissive about what kind of Generic.Vector vectors are allowed to comprise each side.

To define a Generic.Vector, first we must define Generic.MVector that'll be used for most operations involved in building or manipulating it behind the scenes.

{-# LANGUAGE CPP #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad
import Data.Monoid
import qualified Data.Vector.Generic.Mutable as GM
import qualified Data.Vector.Generic as G
import Data.Vector.Fusion.Stream as Stream
import Data.Data
import Prelude hiding ( length, null, replicate, reverse, map, read, take, drop, init, tail )
import Text.Read
-- show
data MVector :: (* -> * -> *) -> (* -> * -> *) -> * -> * -> * where
  MV :: !(u s a) -> !(v s b) -> MVector u v s (a, b)

instance (GM.MVector u a, GM.MVector v b) => GM.MVector (MVector u v) (a, b) where
  basicLength (MV ks _) = GM.basicLength ks
  basicUnsafeSlice s e (MV ks vs) = MV (GM.basicUnsafeSlice s e ks) (GM.basicUnsafeSlice s e vs)
  basicOverlaps (MV ks vs) (MV ks' vs') = GM.basicOverlaps ks ks' || GM.basicOverlaps vs vs'
  basicUnsafeNew n = liftM2 MV (GM.basicUnsafeNew n) (GM.basicUnsafeNew n)
  basicUnsafeReplicate n (k,v) = liftM2 MV (GM.basicUnsafeReplicate n k) (GM.basicUnsafeReplicate n v)
  basicUnsafeRead (MV ks vs) n = liftM2 (,) (GM.basicUnsafeRead ks n) (GM.basicUnsafeRead vs n)
  basicUnsafeWrite (MV ks vs) n (k,v) = GM.basicUnsafeWrite ks n k >> GM.basicUnsafeWrite vs n v
  basicClear (MV ks vs) = GM.basicClear ks >> GM.basicClear vs
  basicSet (MV ks vs) (k,v) = GM.basicSet ks k >> GM.basicSet vs v
  basicUnsafeCopy (MV ks vs) (MV ks' vs') = GM.basicUnsafeCopy ks ks' >> GM.basicUnsafeCopy vs vs'
  basicUnsafeMove (MV ks vs) (MV ks' vs') = GM.basicUnsafeMove ks ks' >> GM.basicUnsafeMove vs vs'
  basicUnsafeGrow (MV ks vs) n = liftM2 MV (GM.basicUnsafeGrow ks n) (GM.basicUnsafeGrow vs n)
-- /show
main = putStrLn "It typechecks, so it must be correct."

All we've done is say that our Hybrid.MVector u v s (a,b) is going to be comprised (strictly!) of two other vectors u s a and v s b so long as we have appropriate Generic.MVector instances to rely upon, and so long as they can both be manipulated in the same PrimMonad.

All we've done is borrow from similar operations from the vectors that comprise our constituent parts.

In the real code we then proceed to INLINE everything in sight, so that any inlined combinators from Data.Vector.Generic can "see through" our instance and INLINE the bodies of our methods.

We can construct a custom Generic.Vector similarly.

{-# START_FILE Hybrid.hs #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Hybrid where

import Control.Monad
import Data.Monoid
import qualified Data.Vector.Generic.Mutable as GM
import qualified Data.Vector.Generic as G
import Data.Vector.Fusion.Stream as Stream
import Data.Data
import Prelude hiding ( length, null, replicate, reverse, map, read, take, drop, init, tail )
import Text.Read

data MVector :: (* -> * -> *) -> (* -> * -> *) -> * -> * -> * where
  MV :: !(u s a) -> !(v s b) -> MVector u v s (a, b)

instance (GM.MVector u a, GM.MVector v b) => GM.MVector (MVector u v) (a, b) where
  basicLength (MV ks _) = GM.basicLength ks
  basicUnsafeSlice s e (MV ks vs) = MV (GM.basicUnsafeSlice s e ks) (GM.basicUnsafeSlice s e vs)
  basicOverlaps (MV ks vs) (MV ks' vs') = GM.basicOverlaps ks ks' || GM.basicOverlaps vs vs'
  basicUnsafeNew n = liftM2 MV (GM.basicUnsafeNew n) (GM.basicUnsafeNew n)
  basicUnsafeReplicate n (k,v) = liftM2 MV (GM.basicUnsafeReplicate n k) (GM.basicUnsafeReplicate n v)
  basicUnsafeRead (MV ks vs) n = liftM2 (,) (GM.basicUnsafeRead ks n) (GM.basicUnsafeRead vs n)
  basicUnsafeWrite (MV ks vs) n (k,v) = do
    GM.basicUnsafeWrite ks n k
    GM.basicUnsafeWrite vs n v
  basicClear (MV ks vs) = do
    GM.basicClear ks
    GM.basicClear vs
  basicSet (MV ks vs) (k,v) = do
    GM.basicSet ks k
    GM.basicSet vs v
  basicUnsafeCopy (MV ks vs) (MV ks' vs') = do
    GM.basicUnsafeCopy ks ks'
    GM.basicUnsafeCopy vs vs'
  basicUnsafeMove (MV ks vs) (MV ks' vs') = do
    GM.basicUnsafeMove ks ks'
    GM.basicUnsafeMove vs vs'
  basicUnsafeGrow (MV ks vs) n = liftM2 MV (GM.basicUnsafeGrow ks n) (GM.basicUnsafeGrow vs n)

  {-# INLINE basicLength #-}
  {-# INLINE basicUnsafeSlice #-}
  {-# INLINE basicOverlaps #-}
  {-# INLINE basicUnsafeNew #-}
  {-# INLINE basicUnsafeReplicate #-}
  {-# INLINE basicUnsafeRead #-}
  {-# INLINE basicUnsafeWrite #-}
  {-# INLINE basicClear #-}
  {-# INLINE basicSet #-}
  {-# INLINE basicUnsafeCopy #-}
  {-# INLINE basicUnsafeMove #-}
  {-# INLINE basicUnsafeGrow #-}

-- show
data Vector :: (* -> *) -> (* -> *) -> * -> * where
  V :: !(u a) -> !(v b) -> Vector u v (a, b)

type instance G.Mutable (Vector u v) = MVector (G.Mutable u) (G.Mutable v)

instance (G.Vector u a, G.Vector v b) => G.Vector (Vector u v) (a, b) where
  basicUnsafeFreeze (MV ks vs) = liftM2 V (G.basicUnsafeFreeze ks) (G.basicUnsafeFreeze vs)
  basicUnsafeThaw (V ks vs) = liftM2 MV (G.basicUnsafeThaw ks) (G.basicUnsafeThaw vs)
  basicLength (V ks _) = G.basicLength ks
  basicUnsafeSlice i j (V ks vs) = V (G.basicUnsafeSlice i j ks) (G.basicUnsafeSlice i j vs)
  basicUnsafeIndexM (V ks vs) n = liftM2 (,) (G.basicUnsafeIndexM ks n) (G.basicUnsafeIndexM vs n)
  basicUnsafeCopy (MV ks vs) (V ks' vs') = do
    G.basicUnsafeCopy ks ks'
    G.basicUnsafeCopy vs vs'
  elemseq (V ks vs) (k,v) b = G.elemseq ks k (G.elemseq vs v b)
-- /show
  {-# INLINE basicUnsafeFreeze #-}
  {-# INLINE basicUnsafeThaw #-}
  {-# INLINE basicLength #-}
  {-# INLINE basicUnsafeSlice #-}
  {-# INLINE basicUnsafeIndexM #-}
  {-# INLINE basicUnsafeCopy #-}
  {-# INLINE elemseq #-}

instance (G.Vector u a, G.Vector v b, c ~ (a, b)) => Monoid (Vector u v c) where
  mappend = (G.++)
  {-# INLINE mappend #-}
  mempty = G.empty
  {-# INLINE mempty #-}
  mconcat = G.concat
  {-# INLINE mconcat #-}

instance (G.Vector u a, G.Vector v b, Show a, Show b, c ~ (a, b)) => Show (Vector u v c) where
  showsPrec = G.showsPrec

instance (G.Vector u a, G.Vector v b, Read a, Read b, c ~ (a, b)) => Read (Vector u v c) where
  readPrec = G.readPrec
  readListPrec = readListPrecDefault
  
instance (G.Vector u a, G.Vector v b, Eq a, Eq b, c ~ (a, b)) => Eq (Vector u v c) where
  xs == ys = Stream.eq (G.stream xs) (G.stream ys)
  xs /= ys = not (Stream.eq (G.stream xs) (G.stream ys))
  {-# INLINE (==) #-}
  {-# INLINE (/=) #-}

instance (G.Vector u a, G.Vector v b, Ord a, Ord b, c ~ (a, b)) => Ord (Vector u v c) where
  compare xs ys = Stream.cmp (G.stream xs) (G.stream ys)
  {-# INLINE compare #-}

{-# START_FILE Main.hs #-}

module Main where
import Hybrid
import Data.Vector.Unboxed as Unboxed
import Data.Vector
import Data.Vector as Boxed
import Data.Vector.Generic as Generic
import Data.Vector.Unboxed as Unboxed
import Data.Vector.Primitive as Primitive
import Data.Vector.Storable as Storable
import qualified Data.Vector.Generic.Mutable as GM
import qualified Data.Vector.Generic as G
import Data.Vector.Fusion.Stream as Stream

-- show
type V = Hybrid.Vector Unboxed.Vector Boxed.Vector (Int,Integer)
main = do
  print (G.fromList [(1,2),(3,4)] :: V) 
  print $ G.slice 2 3 (G.fromList (Prelude.zip [1..10] [11..20]) :: V)
-- /show

It is worth taking a couple of minutes to talk about the instances, though.

Generic.Vector types typically provide the obvious Monoid, Show, Read, Eq, and Ord instances we've come to expect.

But if we were not careful and wrote

instance (G.Vector u a, G.Vector v b) => Monoid (Vector u v (a,b))

then the compiler would be unduly reticent to use the instance. That is, it won't realize that the moment it can see that we have a Hybrid.Vector, that the argument must be a pair as, after all, our GADT only has one constructor!

So instead we need to write the remaining instances to abuse the power of system Fc and modern Haskell to say that once the compiler figures out it needs one of these instances for a Hybrid.Vector u v c it can infer c must be of the form (a,b) by writing these instances as

instance (G.Vector u a, G.Vector v b, c ~ (a, b)) => Monoid (Vector u v c) where
  mappend = (G.++)
  mempty = G.empty
  mconcat = G.concat
instance (G.Vector u a, G.Vector v b, Show a, Show b, c ~ (a, b)) => Show (Vector u v c) where
  showsPrec = G.showsPrec
instance (G.Vector u a, G.Vector v b, Read a, Read b, c ~ (a, b)) => Read (Vector u v c) where
  readPrec = G.readPrec
  readListPrec = readListPrecDefault

With Eq and Ord we need to lean a little bit on the fusion machinery, just because no such defaults are provided for us.

instance (G.Vector u a, G.Vector v b, Eq a, Eq b, c ~ (a, b)) => Eq (Vector u v c) where
  xs == ys = Stream.eq (G.stream xs) (G.stream ys)
  xs /= ys = not (Stream.eq (G.stream xs) (G.stream ys))
instance (G.Vector u a, G.Vector v b, Ord a, Ord b, c ~ (a, b)) => Ord (Vector u v c) where
  compare xs ys = Stream.cmp (G.stream xs) (G.stream ys)

Now, we finally have everything we need for our Hybrid.Vector type to feel like a real Vector.

OK, not quite everything, for reasons that I'm sure make sense to Roman, he goes and duplicates the entire vector API individually customized to each vector subtype within their own modules. Admittedly, it does help with inference in the presence of such an all encompassing API that can both produce and consume values of the same type. To that end, hybrid-vectors includes a couple thousand lines of boilerplate as well to make a vector programmer feel at home. It also contains another, related notion, that of a Mixed.Vector that permits more operations to cooperate across vector types, as it was something that was easy to implement while I had all of the vector internals paged in mentally.

However, the details of that implementation is entirely mechanical.

Similarly, we need to go through a similar boilerplate exercise making an instance of Unbox for Key, but nothing new is learned.

Custom Stream Fusion

Before we move on, we are going to need at least one custom stream fusion combinator for adding two matrices together.

I don't want to pre-judge that all addition will fall under the purview of the Num typeclass, as for instance since we're sparse we can use Mat Unboxed.Vector () as a Boolean matrix and pay surprisingly little overhead as an Unboxed.Vector of ()s is represented simply by its size!

Consequently, let's define a generalized merge operation that can permit values to cancel.

mergeStreamsWith 
  :: (Monad m, Ord i) 
  => (a -> a -> Maybe a) 
  -> Stream m (i, a)
  -> Stream m (i, a)
  -> Stream m (i, a)

In Vector's current version of monadic stream fusion, a Stream consists of a step function, a state, and any knowledge we have about the Size of the Stream.

data Stream m a = forall s . Stream (s -> m (Step s a)) s Size   

Therefore with an appropriate new step function, we can write:

mergeStreamsWith f (Stream stepa sa0 na) (Stream stepb sb0 nb)
  = Stream step (MergeStart sa0 sb0) (toMax na + toMax nb)

Here our starting state MergeStart is described below, and we convert the bounds we know on the input streams into a conservative upper bound on our new stream size.

Stream fusion works by ensuring that each Step you take does not recurse, so the compiler is able to move all of the iteration to one outer-most loop.

At each Step, we can Yield a new answer and switch to a new state, simply switch to a new state or terminate our Stream.

data Step s a = Yield a s | Skip s | Done   

In our case we need to consume from two other streams, but we can only afford to ask for a bounded amount of work from each at each Step. So let's build a custom state type for our Stream:

data MergeState sa sb i a
  = MergeL sa sb i a
  | MergeR sa sb i a
  | MergeLeftEnded sb
  | MergeRightEnded sa
  | MergeStart sa sb

We'll want to reason through our step function by cases as it is rather tedious.

MergeStart is our initial state, consisting of the initial states of both of our input streams.

  step (MergeStart sa sb) = do
    r <- stepa sa
    return $ case r of
      Yield (i, a) sa' -> Skip (MergeL sa' sb i a)
      Skip sa'         -> Skip (MergeStart sa' sb)
      Done             -> Skip (MergeLeftEnded sb)

We try to read from the left stream. If it succeeds we know the value from the left stream (and are in state MergeL). If it skips we have to try again. If it says it has no more content, we should fast forward through the right hand stream with MergeLeftEnded.

The rest of the logic follows similarly.

Once we have a value from our left Stream, we should try to read from our right, merging values if their indices are equal, and otherwise putting them in order. Depending on which candidate was merged, we proceed to MergeR or MergeL. If during the Merge, we determine that our elements cancelled out, e.g. 5 + (-5), thn rather than recurse into start to ensure a bounded amount of work is done, we Skip back there.

  step (MergeL sa sb i a) = do
    r <- stepb sb
    return $ case r of
      Yield (j, b) sb' -> case compare i j of
        LT -> Yield (i, a)     (MergeR sa sb' j b)
        EQ -> case f a b of
           Just c  -> Yield (i, c) (MergeStart sa sb')
           Nothing -> Skip (MergeStart sa sb')
        GT -> Yield (j, b)     (MergeL sa sb' i a)
      Skip sb' -> Skip (MergeL sa sb' i a)
      Done     -> Yield (i, a) (MergeRightEnded sa)

MergeR is entirely symmetric to MergeL:

  step (MergeR sa sb j b) = do
    r <- stepa sa
    return $ case r of
      Yield (i, a) sa' -> case compare i j of
        LT -> Yield (i, a)     (MergeR sa' sb j b)
        EQ -> case f a b of
          Just c  -> Yield (i, c) (MergeStart sa' sb)
          Nothing -> Skip (MergeStart sa' sb)
        GT -> Yield (j, b)     (MergeL sa' sb i a)
      Skip sa' -> Skip (MergeR sa' sb j b)
      Done     -> Yield (j, b) (MergeLeftEnded sb)

and then we just have to deal with fast forwarding:

  step (MergeLeftEnded sb) = do
    r <- stepb sb
    return $ case r of
      Yield (j, b) sb' -> Yield (j, b) (MergeLeftEnded sb')
      Skip sb'         -> Skip (MergeLeftEnded sb')
      Done             -> Done
  step (MergeRightEnded sa) = do
    r <- stepa sa
    return $ case r of
      Yield (i, a) sa' -> Yield (i, a) (MergeRightEnded sa')
      Skip sa'         -> Skip (MergeRightEnded sa')
      Done             -> Done

Putting it all together we get:

-- show
import Data.Vector.Fusion.Stream.Monadic (Step(..), Stream(..))
import Data.Vector.Fusion.Stream.Size

data MergeState sa sb i a
  = MergeL sa sb i a
  | MergeR sa sb i a
  | MergeLeftEnded sb
  | MergeRightEnded sa
  | MergeStart sa sb

mergeStreamsWith
  :: (Monad m, Ord i) 
  => (a -> a -> Maybe a) 
  -> Stream m (i, a) 
  -> Stream m (i, a) 
  -> Stream m (i, a)
mergeStreamsWith f (Stream stepa sa0 na) (Stream stepb sb0 nb)
  = Stream step (MergeStart sa0 sb0) (toMax na + toMax nb) where
  step (MergeStart sa sb) = do
    r <- stepa sa
    return $ case r of
      Yield (i, a) sa' -> Skip (MergeL sa' sb i a)
      Skip sa'         -> Skip (MergeStart sa' sb)
      Done             -> Skip (MergeLeftEnded sb)
  step (MergeL sa sb i a) = do
    r <- stepb sb
    return $ case r of
      Yield (j, b) sb' -> case compare i j of
        LT -> Yield (i, a)     (MergeR sa sb' j b)
        EQ -> case f a b of
           Just c  -> Yield (i, c) (MergeStart sa sb')
           Nothing -> Skip (MergeStart sa sb')
        GT -> Yield (j, b)     (MergeL sa sb' i a)
      Skip sb' -> Skip (MergeL sa sb' i a)
      Done     -> Yield (i, a) (MergeRightEnded sa)
  step (MergeR sa sb j b) = do
    r <- stepa sa
    return $ case r of
      Yield (i, a) sa' -> case compare i j of
        LT -> Yield (i, a)     (MergeR sa' sb j b)
        EQ -> case f a b of
          Just c  -> Yield (i, c) (MergeStart sa' sb)
          Nothing -> Skip (MergeStart sa' sb)
        GT -> Yield (j, b)     (MergeL sa' sb i a)
      Skip sa' -> Skip (MergeR sa' sb j b)
      Done     -> Yield (j, b) (MergeLeftEnded sb)
  step (MergeLeftEnded sb) = do
    r <- stepb sb
    return $ case r of
      Yield (j, b) sb' -> Yield (j, b) (MergeLeftEnded sb')
      Skip sb'         -> Skip (MergeLeftEnded sb')
      Done             -> Done
  step (MergeRightEnded sa) = do
    r <- stepa sa
    return $ case r of
      Yield (i, a) sa' -> Yield (i, a) (MergeRightEnded sa')
      Skip sa'         -> Skip (MergeRightEnded sa')
      Done             -> Done
  {-# INLINE [0] step #-}
{-# INLINE [1] mergeStreamsWith #-}
-- /show
main = putStrLn "That compiles, too."

It is worth noting the phase-controlled INLINE pragmas in the final definition as well. Without them you really won't see much benefit from stream fusion! They ensure that the compiler is careful to hold onto the steps uninlined until the right moment.

No-Prize #6

It'd probably be much more efficient for us to us an efficient k-merge or, even better, a cache-oblivious variant that is careful to make no assumptions about our caches. The only two operations we'll be using Stream fusion for are concatenation and mergeStreamsWith for now. Is there a more efficient fusion form we should be using that can exploit this structure?

Matrices, not Vectors

Now that we have hybrid-vectors, we can finally peek at what our matrix representation can look like.

newtype Mat v a = Mat { runMat :: Hybrid.Vector Unboxed.Vector v (Key, a) }

and how we could build one:

fromList :: Generic.Vector v a => [(Key, a)] -> Mat v a
fromList xs 
  = Mat 
  $ Generic.modify (Intro.sortBy (compare `on` fst)) 
  $ Generic.fromList xs

As we'll see next time don't need type parameters for the number of dimensions and we don't need to store information on the dimensionality of the contents as well as we'll be able to turn back to the notions of "most significant most significant difference" and 2-fattest numbers a second time.

And you'd hoped the bit-twiddling was behind us. Hah!

-Edward Kmett

August 18, 2013