Explicit speculative parallelism for Haskell's Par monad
Haskell provides quite a few ways for writing parallel programs, but none of them is fully automatic. The programmer has to use some annotations or library to run computations in parallel explicitly. The most recent paper (and library) for writing parallel programs follows the latter approach. You can find more information about the library in a paper by Simon Marlow et al. A monad for deterministic parallelism [1] and it is also available on Hackage. However, I'll explain all the important bits that I'll use in this article.
The library provides an explicit way for writing parallel programs using a Par
monad. The library contains
constructs for spawning computations and sharing state using blocking variables. However, the whole programming model
is fully deterministic. I believe that it is sometimes useful to lift the determinacy requirement.
Some programs are deterministic, but the fact cannot be (easily) automatically verified. For example, say you have two functions
fib1
and fib2
. They both give the same result, but each of them is more efficient
than the other one on certain inputs. To calculate a Fibonacci number, the program could speculatively
try to evaluate both functions in parallel and return the result of the one that finishes first.
Unfortunately, this cannot be safely implemented using a fully deterministic library. In this article, I'll show some
examples where speculative parallelism can be useful and I'll talk about an extension to the Par
monad
that I implemented (available on GitHub). The extension allows
programmers to write speculative computations, provided that they manually verify that their code is
deterministic.
Introduction
In this section, I'll show a basic example that demonstrates how the Par
monad works. I'll use a
contrived example of calculating Fibonacci numbers to keep things simple. You can find better examples in
the Par
monad examples repository.
The performance measurements that I'll include in this article are just for demonstration (executed
on Intel Core 2 Duo CPU). I'll also briefly introduce the unamb
operator [3] designed by
Conal Elliott. This solves a very
important problem, originally in Functional Reactive Programming [2], which is closely related to
speculative computations.
Deterministic parallelism monad
The parallelism wraps a computation inside a monad. A parallel version of a function that returns
Int64
will have a return type Par Int64
. The type represents a computation
that can be started (using the runPar
function).
Here is a basic example:
-- Naive calculation of Fibonacci numbers
fib :: Int64 -> Int64
fib x | x < 1 = 1
fib x = fib (x-2) + fib (x-1)
-- Parallel version of the function
pfib :: Int64 -> Par Int64
pfib n | n < 25 = return (fib n)
pfib n = do
av <- spawn (pfib (n - 1))
b <- pfib (n - 2)
a <- get av
return (a + b)
The fib
function is a usual (non-parallel) function for calculating Fibonacci numbers. The
pfib
function is a parallel version (as you can see from the type signature). When the argument
is smaller than 25, it runs sequential fib
to avoid generating too many small work items.
If the argument is sufficiently large, the function spawns a computation to calculate
pfib (n-1)
in parallel. The type of the spawn
function is:
spawn :: NFData a => Par a -> Par (IVar a)
The NFData
requirement specifies that it should be possible to fully evaluate the
data structure. The rest of the signature says that the function takes some computations and returns
a variable (wrapped in Par
) that will eventually contain the result of the
computation. When writing code in the Par
monad, we can get a value from the variable
using get
function. The function blocks the computation until a value becomes
available and has the following type:
get :: IVar a -> Par a
Now it should be easy to understand the rest of the pfib
body. It spawns one recursive
computation (as a background task), then runs the second recursive call on the main thread and then
waits until the background operation completes. Finally, it returns the sum of the results.
The following simple snippet shows how to run the two functions and it shows some approximate performance measurements:
main = do
-- Naive sequential fibonacci (1.8 sec)
print $ fib 32
-- Parallel version with 25 threshold (1.0 sec)
print $ runPar $ pfib 32
When we have a value of type Par a
, we can evaluate it (to get an a
value)
using the runPar
function. When running the parallel function on dual-core machine,
it gives you a slightly less than 2x speedup. For comparison, a version implemented using the
par
and pseq
combinators (which is a lower-level parallelism mechanism
in GHC) completes in approximately 0.95 seconds.
Finally, it is worth mentioning that the programming model used by the Par
monad is
very similar to the model used by async
in F#. The spawn
function
corresponds to Async.StartChild
and the runPar
does the same thing as
Async.RunSynchronously
.
Unambiguous choice
Before discussing the support for speculative programming, I'd like to shortly discuss the unambiguous choice operator, which was created by Conal Elliott for his Functional Reactive Programming (FRP) implementation, but is also available as a separate package on Hackage.
The operator is a function of type a -> a -> a
. It takes two arguments, starts
evaluating them concurrently and returns a value that becomes available first.
When using the operator, the programmer needs to make sure that the two arguments
are compatible. This means that if they both finish evaluating, their
values are equal. Formally, the requirement looks like this:
compatible a b = (a = ⊥) | (b = ⊥) | (a = b)
The operator can be used for two purposes. First, it can be used to deal with computations
that do not terminate. For example, say you have an expression a && b
. When b
is False
, the result of the expression should be (logically) also False
.
However, when a
doesn't terminate, the overall expression will also never
yield a value, because it will start evaluating a
first. Using unamb
,
you can write:
import Data.Unamb
c = unamb (a && b) (b && a)
This use clearly matches the compatibility requirement. When both arguments terminate, the
result is same in both of the cases. However, the expression returns False
even if a
is non-terminating computation, because the second argument will
evaluate to False
.
The second possible use of unamb
is for simple speculative evaluation.
I described a typical scenario earlier. We have two functions and one is faster for some of
the inputs, but not for all of them. Using unamb
, we can write the following
helper that takes two functions and returns the value that is produced first:
tryBoth :: (t -> a) -> (t -> a) -> t -> a
tryBoth f1 f2 n = unamb (f1 n) (f2 n)
To demonstrate how the function works, I implemented a more clever function for calculating
Fibonacci numbers (ffib
), which counts from zero and keeps adding last two
generated numbers. If you want to play with it, you can find the complete source code at
the end of the article. The following example shows the results:
main = do
-- Naive recursive implementation (~1.7 sec)
print fib 32
-- Efficient implementation (~0.01 sec)
print ffib 32
-- Tries evaluating result using both functions (~0.01 sec)
tryBoth fib ffib 32
In this example, we could just call ffib
, because it will be always faster.
However, that's not always the case. You might have a very fast function that almost always
works, but sometimes fails and a backup function that is slow, but always works.
The unamb
operator doesn't break determinacy of code only when it is
used properly. The programmer needs to make sure that the arguments are compatible.
However, I believe that this kind of construct is very useful and I hope that the previous
examples demonstrated that. In the rest of the article, I'll describe my extension that
allows you to use similar concepts in the Par
monad.
Speculative parallelism for Par monad
The extension (available on GitHub) adds support for explicit cancellation of computations. When starting a computation, you can give it a cancellation token. The token can be later used (from another computation) to cancel the background task. The design of the extension is very similar to how cancellation works in F# asynchronous workflows and in Task Parallel Library (TPL) in .NET.
Parallel unambiguous choice
The support for cancellation is a low-level control structure and it is not expected that
it will be used directly very often. However, you can use it to implement your own functions
that provide higher-level of abstraction. For example, it is quite easy to implement
unamb
operator for Par a
values:
-- Unambiguous choice between two parallel computations
-- Assumes that computations p1 and p2 are compatible
punamb :: NFData a => Par a -> Par a -> Par a
punamb p1 p2 = do
res <- newBlocking
ct1 <- newCancelToken
ct2 <- newCancelToken
forkWith ct1 (do
a <- p1; put res a; cancel ct2)
forkWith ct2 (do
a <- p2; put res a; cancel ct1)
v <- get res
return v
The type of the function specifies that the value returned by the function
can be fully evaluated, but it is equally easy to define a version that evaluates the
value to a head-strict form (using put_
instead of put
).
The body of the punamb
first creates some object for implementing the synchronization.
It uses newBlocking
to create a variable (of type IVar a
) for storing the result.
A variable created using newBlocking
behaves slightly differently than standard variables created
using new
. When you attempt to write a value to a standard variable for the second time, the
writing fails and causes an exception. However, variables created using newBlocking
behave
differently - instead of causing an exception, the second write is just blocked (indefinitely) and the
operation never completes. This makes it possible to implement a race between two computations.
The next two values ct1
and ct2
are cancellation tokens that are later used
to cancel running computations.
After the initialization, the body starts two background computations using the forkWith
function.
The function takes a CancelToken
as the first argument and a computation of type Par ()
as the second argument. When called (from the Par
) monad, it starts the computation in
background and associates the cancellation token with it. This allows others to cancel the computation.
The two computations started using forkWith
are quite simple. Each of them evaluates
one of the punamb
's arguments, then attempts to cancel the other computation and
stores the result to the res
variable. In a typical case, one of the computations
completes first and canceles the other (which is still running). However, if both complete at the
same time, then one wins the race and writes value to res
. The other attempts to write
the result, but gets blocked. The race doesn't affect determinism, as long as the two computations
given as arguments to punamb
are compatible.
The following snippet uses punamb
to implement an example similar to the
earlier demonstration of unamb
. The pffib
function (not shown in this
article) is a faster implementation of Fibonacci numbers embedded in the Par
monad:
main = do
-- Naive recursion using Par monad (~1.1 sec)
print $ runPar $ pfib 32
-- Efficient recursion using Par monad (~0.01 sec)
print $ runPar $ pffib 32
-- Tries evaluating both & returns the first (~0.04 sec)
print $ runPar $ punamb (pfib 32) (pffib 32)
As you would expect, the punamb
operation behaves similarly to unamb
.
When given two computations, it finishes after the first one completes. The time is slightly larger,
because the first computation (running in parallel) also occupies some of the resources.
You may be wondering how the cancellation works - at which point is a Par
computation cancelled when another computation calls cancel
. The following
section briefly explains how is the cancellation implemented in my modification of the package.
How are computations cancelled?
The cancellation for the Par
monad is cooperative, which means that the
computations can be cancelled only when they are written in a particular way that permits
cancellation. In particular, the cancel
function will not be able to cancel a
Par
computation that is created by writing return foo
(where
foo
is some expression).
A computation can be cancelled at points when it performs some special computation in
the Par
monad. Most importantly, this includes the monadic binding (as well as
other special calls like forking). The pfib
function (demonstrated above)
contains many such points. It can be cancelled when it tries to spawn a recursive call (in
background), when it recursively calls itsef as well as when it tries to read the result
of the background computation.
You can exlpore the source code of my modification on GitHub. The commit that adds the cancellation support
(see here)
shows the difference compared with the original version. Implementing an un-cooperative
cancellation (as, for example, in unamb
) would be also possible, but more
difficult. However, I believe that the cooperative cancellation model works very well
for the Par
monad, because parallel computations generally consist of
large number of small (atomic) steps.
To finish the blog post, the last section shows a single larger example of implementing tree processing as a speculative computation explicitly using cancellation.
Speculative tree processing
In this section, I present a larger example that uses the cancellation support to implement
a speculative processing of a tree. We look how to implement a forall
function that checks whether
a specified predicate holds for all leaves of the tree. When the function returns False
for some
value, the overall result will also be false. In this case, the function can cancel all parallel tasks and return
immediately.
The example uses the following simple tree structure:
data Tree a
= Leaf a
| Node (Tree a) (Tree a)
The following snippet shows a basic parallel version of forall
. It uses the same recursive
parallel processing pattern that was used earlier in the pfib
function:
-- Test whether all elements match the given predicate (in parallel)
forallp :: (a -> Bool) -> Tree a -> Par Bool
forallp p (Leaf num) = return $ p num
forallp p (Node left right) = do
al' <- spawn $ forallp p left
ar <- forallp p right
al <- get al'
return (al && ar)
When processing a leaf, the function calls the predicate and returns the result. When processing a node,
it spawns processing of the left sub-tree as a background operation (using spawn
, which is
implemented using fork
) and then processes the right sub-tree. After both operations
complete, it returns logical conjunction of the two results.
When implementing the speculative version, we need to spawn two background computations (to start
processing the left and right sub-tree). When any of them completes returning False
, we
can immediately store the result in some variable, which unblocks the main computation (and it can
return the overall result). When the result is False
, the process needs to wait until the
other computation completes before it can set the final result.
The following snippet shows the speculative version:
-- Test whether all elements match the given predicate
-- (Speculatively - False is returned immediately)
foralls :: (a -> Bool) -> Tree a -> Par Bool
foralls p tree = do
tok <- newCancelToken
r <- forall' p tok tree
cancel tok
return r
where
-- Recursively start computations using the same cancellation token
forall' p tok (Leaf v) = return (p v)
forall' p tok (Node left right) = do
leftRes <- new
rightRes <- new
finalRes <- newBlocking
forkWith tok (forall' p tok left >>=
completed leftRes rightRes finalRes)
forkWith tok (forall' p tok right >>=
completed rightRes leftRes finalRes)
get finalRes
-- Called when one of the branches complete. Write result immediately
-- if it is False, otherwise wait for the second computation.
completed varA varB fin resA = do
put varA resA
if not resA then put fin False
else get varB >>= put fin . (&& resA)
The main body of the function creates a new CancelToken
and then calls a helper that does
the actual processing. The cancellation token is used when starting any background computations during the
recursive processing. As a result, when the helper function returns False
(meaning that some
computations may be still running), we can cancel all (still running, but unnecessary) computations
using this single token.
The function that implements the recursive processing first creates three variables. The first two
(leftRes
and rightRes
) are used by their corresponding computations (when a
left computation completes, it writes result to leftRes
). The last variable is created using
the newBlocking
function, because it can be accessed by any of the two computations. Its value
is set to the final result as soon as possible. The function then spawns two tasks to process sub-trees
and waits for the final result.
The two background computations spawned by the function both make a recursive call and then pass the
result to the completed
function. It takes all three variables (variable of the currently
completed computation, variable of the other computation and a final result variable) and the result
of the computation. If the result is False
, then it sets the final result. Otherwise, it
waits until the other computation completes (by blocking on the other variable) and then sets the
final result (in this case, the computation may race with the other call to completed
,
but that is not a problem - they both produce the same value and the second one will just block
when it attempts to write).
The complete source code contains a full example that you can run to test the function. Assuming we have
a tree value named tree
, we can write the following test. It demonstrates the performance of a
simple sequential version, parallel version and a speculative version from the last listing:
main = do
-- Binary tree contains ~270 large primes and a single
-- non-prime number. The tree is fully evaluated.
-- Single-threaded version (see source code link) (~3.7 sec)
measure "Non-prime num (Seq) " (forall isPrime tree2)
-- Simple parallel version (~1.9 sec)
measure "Non-prime num (Par) " (runPar $ forallp isPrime tree2)
-- Parallel version with shortcircuiting (~0.01 sec)
measure "Non-prime num (Shr) " (runPar $ foralls isPrime tree2)
The example generates a tree containing large prime numbers and one number that is not a prime. Then it ensures that the tree is fully evaluated and processes it using different techniques. Parallel processing is roughly two times faster than sequential version, but a speculative version is faster by orders of magnitude. This is, of course, the case only when the tree contains at least one non-prime number. The performance depends on the data, but in general, the speculative parallelism can be used to implement various useful heuristics.
Summary
In this article, I demonstrated a couple of extensions that I implemented for the Par
monad that has been recently proposed by Simon Marlow, Ryan Newton and Simon Peyton Jones. The
extension adds functions for cancellation of computations in the monad.
The support for cancellation should be viewed as a low-level mechanism. In practice, it can be
used to implement useful higher-level abstractions such as the unamb
operator designed
by Conal Elliott. The operator provides a high-level way for writing speculative computations
(or working with partially undefined functions). When using the operator, the developer is
responsible for ensuring that the arguments are compatible (will give the same result).
In this article, I showed an implementation of unamb
for the Par
monad
and also a larger tree-processing example. The tree processing example cannot be elegantly written
using unamb
, because it requires more expressive abstraction. This abstraction could be,
for example, a Haskell version of joinads [4], a language extension that I designed with Don
Syme during an internship at Microsoft Research. I'll write more about that in some future blog post...
Downloads & Source code
- Implementation of the
Par
monad with cancellation (GitHub) - Complete source code of examples from the article (GitHub)
References
- [1] A monad for deterministic parallelism (PDF) - Simon Marlow, Ryan Newton, Simon Peyton Jones
- [2] Push-pull functional reactive programming - Conal Elliott
- [3] Unamb - Haskell Wiki
- [4] Joinads: a retargetable control-flow construct for reactive, parallel and concurrent programming - Tomas Petricek, Don Syme
Published: Tuesday, 17 May 2011, 1:59 PM
Author: Tomas Petricek
Typos: Send me a pull request!
Tags: research, haskell, parallel