Accelerator and F# (I.): Introduction and calculating PI
I already wrote about two projects that I worked on during an internship at MSR back in 2007 (ASP.NET support in F# and F# WebTools). Even though this was more than 2 years ago (and I did one more internship at MSR in the meantime), I still have one more project that I never published on the web. The folks from the F# team reminded me of this project recently, so I thought I could finally publish it. The project used Microsoft Research Accelerator [1, 2], which is a C# library for developing array-based computations and executing them on a GPU. More recently, the Accelerator team at MSR published Accelerator v2 , which was a good motivation to update my original project...
In this article, we'll look at the simplest way of using Accelerator from F#. Accelerator provides a managed interface that can be naturally used from both C# and F#. We can use a mix of method calls and overloaded operators to describe a computation. In F#, we'll also define our additional custom operators to make the code a bit nicer. After we introduce Accelerator using a simple C# demo, we'll look how to calculate an approximate value of the PI number using a Monte-Carlo method.
This article is the first one from a series about using Accelerator from F#. The list of articles may change, but here is a list of articles that I'm currently planning to write:
- Accelerator and F# (I.): Introduction and calculating PI
- Accelerator and F# (II.): The Game of Life on GPU
- Accelerator and F# (III.): Data-parallel programs using F# quotations
- Accelerator and F# (IV.): Composing computations with quotations
Accelerator v2 is implemented in native code, but ships with a managed wrapper that
allows us to use it elegantly from C# and other managed languages as well. The library
exposes various types that can be used for creating data-parallel computations that
process 1D or 2D arrays of data. In this article, we'll work with
which represents a computation that returns a 1D array of floats (
float in C#).
Accelerator provides overloaded operators for point-wise operations with arrays,
so you can for example write
res = nums1 * nums2 which represents a point-wise
multiplication of two arrays (
res[i] = nums1[i] * nums2[i] for each
Aside from overloaded operators, we can use other operations with arrays that are available
as static methods in the
ParallelArrays class. For example the
method moves elements of the array to the left or to the right. Shifting elements of an array
[1; 2; 3; 4] to the right by 1 will result in
[1; 1; 2; 3] (the first
element is left unchanged).
When writing code that calculates with the
FloatParallelArray type, it is
important to realize, that we're not actually running any computations yet! The type
represents a computation and doesn't perform it. This means that when you write
a computation that adds some arrays and performs a few other operations, Accelerator only
remembers what do you want to do with the data. The computation is executed later
when we create a target that knows how to perform the operations. If you're
familiar with LINQ to SQL, you already know this idea - writing calculation with the
FloatParallelArray type is just like writing a LINQ query. Executing the
computation later corresponds to running the LINQ query as a SQL code. The two
available targets allow us to run the code on a multi-core CPU (only on 64bit) and on
a GPU using DirectX 9 (in this case, the code is translated to shader code).
Enough with theory for now and let's look at some actual example. I'll start with a very
brief code that shows how to use Accelerator in C# and then we'll look at some more interesting
F# code. Assume we have an array of
float values named
The following example builds a computation that calculates an average value of an element
and two its neighbors for each element in the array:
var input = new FloatParallelArray(nums); var sum = ParallelArrays.Shift(input, 1) + input + ParallelArrays.Shift(input, -1); var output = sum / 3.0f;
We start by creating a new
FloatParallelArray instance that stores the
input data for the program (this represents a constant computation). Then we write a
calculation that adds three arrays - the input array and the input array shifted
by 1 both to the left and to the right. Finally, on the last line, we divide the
array by 3. This is also a point-wise operation, so it actually divides each element
of the array by 3. Now that we have a
which represents the computation, we can run it using GPU:
var target = new DX9Target(); var res = target.ToArray1D(output);
The code to run the computation is quite simple. We first initialize a target.
DX9Target, which executes computations using GPU. If we wanted
to run the operation using multi-core CPU, we could use
(but note that this works only on x64). If you run the code, you'll get results like this:
7, 5, 5, 1, 5, 1, 0, 7, 6, 5 6, 6, 4, 4, 2, 2, 3, 4, 6, 5
If you know a little bit about F#, you probably wouldn't have any troubles using Accelerator from F#. Since F# is a fully-compatible managed language, we can use any .NET libraries in a very similar way as in C#. However, we can use a couple of tricks to make the F# code nicer and more readable than the corresponding C# version...
Calculating PI in F#
Before we look at the calculation itself, we'll write a few lines of code that will make the rest of program nicer. This will be shared with the upcoming articles in the series and in principle, we could encapsulate the functionality in an F# library. However, I'll make the code as simple as possible and just write it again in each project.
Using Accelerator from F#
First of all, we'll need to add reference to the
assembly. This is just a managed wrapper for a native library that also needs to
be copied to the output directory. You'll need to find an appropriate version of
Accelerator.dll in the Accelerator installation directory
(either x86 or x64 version) and copy it to
bin/Debug in your project.
Then we can start by writing a couple of helpers:
open Microsoft.ParallelArrays type PA = Microsoft.ParallelArrays.ParallelArrays type FPA = Microsoft.ParallelArrays.FloatParallelArray // Initialization of constant 2D arrays let gridSize = 4096 let shape = [| gridSize; gridSize; |] let zero, one = new FPA(0.0f, shape), new FPA(1.0f, shape)
We start by opening the namespace that contains all the Accelerator functionality.
To make the code a little bit more succinct, we also define two type aliases. The
FPA type represents a computation that returns an array of floats when
PA type represents the static class which contains
operations for working with these computations.
Next, we specify the size of the array we'll use during the computation - we'll work with 2D arrays of the size 4096x4096. Under the cover, this will be used as a size of a texture on the GPU that's used to perform the computation using shaders. Even though we could work with one-dimensional arrays as well, using 2D arrays is a good idea, because it can be easier processed on the GPU.
We'll also create two simple
FPA instances that represent
constant computations. The first one contains zeros and the second one contains
ones. The second parameter of the constructor specifies the size of the array
we're creating (using an array of integers with two elements). Since many of the
operations that Accelerator can perform are point-wise, we'll need the two
one) when encoding the algorithm.
As the last thing before looking at the actual computation, we'll also write three functions that wrap functionality that's available using static methods in C# (we could of course use static methods directly, but using functions will lead to a more idiomatic F# code):
// Calculating with 2D float arrays let sqrt (a:FPA) = PA.Sqrt(a) let sum (a:FPA) = PA.Sum(a) let select (cond:FPA) (a:FPA) (b:FPA) = PA.Select(cond,a,b)
sqrt function is a point-wise operation that calculates
square roots of all elements of the array.
sum is a little bit more
interesting - it adds all numbers from the source array and returns an array that
contains just a single element, which is the sum (for example sum of
[1; 2; 3; 4]
will be an array
The most interesting operation is probably the
Just like others, it is a point-wise operation. It takes three arrays of floats
and expects arrays of the same size. It compares the value of the first array (
with zero. If the value is greater than or equal to zero, it returns a value from
the first array (
a), otherwise it returns value from
For a one-dimensional array, this corresponds to the following C# code:
res[i] = cond[i]>=0 ? a[i] : b[i] (for each index
Now, let's finally move on to the PI number calculation...
Calculating PI using Monte Carlo
We'll implement the PI calculation using a Monte Carlo method. The idea of the algorithm is that we'll generate a large number of random points in a square and then count how many of them are inside a circle inscribed in the square. You can get a better idea by looking at the image at the beginning of the article - the image shows a circle inside a window and a large number of randomly generated points. Points inside the circle are in green and other points are in red. We'll need to count the fraction of points that are inside the circle.
To start, we'll need to generate a large number of random points. We'll generate
two 2D arrays with random values to represent X and Y coordinates of points. This can
be done using
Array2D.init function, which initializes an array
using the function that calculates an element value (in our example, the function
let rnd = new Random() let generator x y = float32(rnd.NextDouble()) let px = new FPA(Array2D.init gridSize gridSize generator) let py = new FPA(Array2D.init gridSize gridSize generator)
Once we have .NET arrays with random values, we use them to create
constant computations represented as the
from Accelerator. Next, we'll write a function that takes a target
(an object that knows how to evaluate
and calculates an approximate value of PI using our random points:
let calculatePi (target:Target) = // Calculate distances of points from the center let pxc, pyc = (px - 0.5f) * 2.0f, (py - 0.5f) * 2.0f let distances = sqrt ((pxc * pxc) + (pyc * pyc)) // Count points with distance less than one let inCircle = select (one - distances) one zero let count = sum(inCircle) // Evaluate the count and calculate PI let result = target.ToArray1D(count) result. / (float32(gridSize * gridSize) / 4.0f)
The first two blocks of code work with the
FPA type, which
means that we're just composing a computation, which we'll execute later.
We start by adjusting the random values a little bit. The numbers are
in range 0 to +1, so we subtract 0.5 and multiply the values by 2, to get
arrays with values in range -1 to +1. Next, we create a new array that
represents distances of points from the point (0, 0). Note that each
i, j coordinate in the 2D array represents a single point.
Once we have the
distances value, we want to count the
number of points that are inside the circle - this means points with
distance less than or equal to 1. This can be implemented using a trick.
one and get an array
with value larger than or equal to 0 when the point is inside the circle.
Then we use the
select function to get 1 for points inside
the circle and 0 for all other points (as a reminder - this use of
select function corresponds to the following C# code:
(one - distances)[i] >= 0 ? one[i] : zero[i]). If we
then sum all values inside the 2D array, we'll get the count of points
for which the result was 1 (meaning points inside the circle).
As the last step, we need to calculate PI from the number of points
inside the circle and the total number of points inside the square. The result
of summing is stored in an array, because Accelerator works with arrays.
However, it contains only one element, so we get the value using
Then we divide it by the total number of points and also by 4, because our
square was of size 2 by 2 (values ranging from -1 to +1). And that's our
value of PI! The last step is to actually run the computation using
let dxTarget = new DX9Target() let pi = calculatePi(dxTarget) printfn "Approximate PI value: %f" pi
As you can see, running the calculation is now pretty simple. We created an
DX9Target to run the computation on GPU and passed it
to our function that does all the work. Alternatively, we could also use
to run the computation in parallel on a multi-core CPU.
This article was a brief introduction of a very interesting Accelerator library
and a demonstration of using this library from F# in the simples possible way.
We've seen how to use the
FloatParallelArray, which is a core type
representing a computation in Accelerator and we've implemented a PI calculation
using it. In the next article of this series we'll look at one more example - a
famous Conway's Game of Life. Later, we'll also look how to run "usual" F# code
(meaning, code that doesn't use
FloatParallelArray explicitly) with
Accelerator using F# quotations.
- Download the PI calculation sample (ZIP, 5.5kB)
-  Accelerator: Using Data Parallelism to Program GPUs for General-Purpose Uses - Microsoft Research
-  Accelerator Project Homepage - Microsoft Research
-  Microsoft Research Accelerator v2 - Microsoft Connect
-  Monte Carlo method - Wikipedia.org
-  GPGPU and x64 Multicore Programming with Accelerator from F# - Satnam Singh's blog at MSDN