Accelerator and F# (I.): Introduction and calculating PI
I already wrote about two projects that I worked on during an internship at MSR back in 2007 (ASP.NET support in F# and F# WebTools). Even though this was more than 2 years ago (and I did one more internship at MSR in the meantime), I still have one more project that I never published on the web. The folks from the F# team reminded me of this project recently, so I thought I could finally publish it. The project used Microsoft Research Accelerator [1, 2], which is a C# library for developing array-based computations and executing them on a GPU. More recently, the Accelerator team at MSR published Accelerator v2 [3], which was a good motivation to update my original project...
In this article, we'll look at the simplest way of using Accelerator from F#. Accelerator provides a managed interface that can be naturally used from both C# and F#. We can use a mix of method calls and overloaded operators to describe a computation. In F#, we'll also define our additional custom operators to make the code a bit nicer. After we introduce Accelerator using a simple C# demo, we'll look how to calculate an approximate value of the PI number using a Monte-Carlo method.
This article is the first one from a series about using Accelerator from F#. The list of articles may change, but here is a list of articles that I'm currently planning to write:
- Accelerator and F# (I.): Introduction and calculating PI
- Accelerator and F# (II.): The Game of Life on GPU
- Accelerator and F# (III.): Data-parallel programs using F# quotations
- Accelerator and F# (IV.): Composing computations with quotations
Introducing Accelerator
Accelerator v2 is implemented in native code, but ships with a managed wrapper that
allows us to use it elegantly from C# and other managed languages as well. The library
exposes various types that can be used for creating data-parallel computations that
process 1D or 2D arrays of data. In this article, we'll work with FloatParallelArray
,
which represents a computation that returns a 1D array of floats (float[]
in C#).
Accelerator provides overloaded operators for point-wise operations with arrays,
so you can for example write res = nums1 * nums2
which represents a point-wise
multiplication of two arrays (res[i] = nums1[i] * nums2[i]
for each i
).
Aside from overloaded operators, we can use other operations with arrays that are available
as static methods in the ParallelArrays
class. For example the Shift
method moves elements of the array to the left or to the right. Shifting elements of an array
[1; 2; 3; 4]
to the right by 1 will result in [1; 1; 2; 3]
(the first
element is left unchanged).
When writing code that calculates with the FloatParallelArray
type, it is
important to realize, that we're not actually running any computations yet! The type
represents a computation and doesn't perform it. This means that when you write
a computation that adds some arrays and performs a few other operations, Accelerator only
remembers what do you want to do with the data. The computation is executed later
when we create a target that knows how to perform the operations. If you're
familiar with LINQ to SQL, you already know this idea - writing calculation with the
FloatParallelArray
type is just like writing a LINQ query. Executing the
computation later corresponds to running the LINQ query as a SQL code. The two
available targets allow us to run the code on a multi-core CPU (only on 64bit) and on
a GPU using DirectX 9 (in this case, the code is translated to shader code).
Enough with theory for now and let's look at some actual example. I'll start with a very
brief code that shows how to use Accelerator in C# and then we'll look at some more interesting
F# code. Assume we have an array of float
values named nums
.
The following example builds a computation that calculates an average value of an element
and two its neighbors for each element in the array:
var input = new FloatParallelArray(nums);
var sum =
ParallelArrays.Shift(input, 1) + input +
ParallelArrays.Shift(input, -1);
var output = sum / 3.0f;
We start by creating a new FloatParallelArray
instance that stores the
input data for the program (this represents a constant computation). Then we write a
calculation that adds three arrays - the input array and the input array shifted
by 1 both to the left and to the right. Finally, on the last line, we divide the
array by 3. This is also a point-wise operation, so it actually divides each element
of the array by 3. Now that we have a FloatParallelArray
named output
,
which represents the computation, we can run it using GPU:
var target = new DX9Target();
var res = target.ToArray1D(output);
The code to run the computation is quite simple. We first initialize a target.
We're using DX9Target
, which executes computations using GPU. If we wanted
to run the operation using multi-core CPU, we could use X64MulticoreTarget
(but note that this works only on x64). If you run the code, you'll get results like this:
7, 5, 5, 1, 5, 1, 0, 7, 6, 5 6, 6, 4, 4, 2, 2, 3, 4, 6, 5
If you know a little bit about F#, you probably wouldn't have any troubles using Accelerator from F#. Since F# is a fully-compatible managed language, we can use any .NET libraries in a very similar way as in C#. However, we can use a couple of tricks to make the F# code nicer and more readable than the corresponding C# version...
Calculating PI in F#
Before we look at the calculation itself, we'll write a few lines of code that will make the rest of program nicer. This will be shared with the upcoming articles in the series and in principle, we could encapsulate the functionality in an F# library. However, I'll make the code as simple as possible and just write it again in each project.
Using Accelerator from F#
First of all, we'll need to add reference to the Microsoft.Accelerator.dll
assembly. This is just a managed wrapper for a native library that also needs to
be copied to the output directory. You'll need to find an appropriate version of
native Accelerator.dll
in the Accelerator installation directory
(either x86 or x64 version) and copy it to bin/Debug
in your project.
Then we can start by writing a couple of helpers:
open Microsoft.ParallelArrays
type PA = Microsoft.ParallelArrays.ParallelArrays
type FPA = Microsoft.ParallelArrays.FloatParallelArray
// Initialization of constant 2D arrays
let gridSize = 4096
let shape = [| gridSize; gridSize; |]
let zero, one = new FPA(0.0f, shape), new FPA(1.0f, shape)
We start by opening the namespace that contains all the Accelerator functionality.
To make the code a little bit more succinct, we also define two type aliases. The
FPA
type represents a computation that returns an array of floats when
executed. The PA
type represents the static class which contains
operations for working with these computations.
Next, we specify the size of the array we'll use during the computation - we'll work with 2D arrays of the size 4096x4096. Under the cover, this will be used as a size of a texture on the GPU that's used to perform the computation using shaders. Even though we could work with one-dimensional arrays as well, using 2D arrays is a good idea, because it can be easier processed on the GPU.
We'll also create two simple FPA
instances that represent
constant computations. The first one contains zeros and the second one contains
ones. The second parameter of the constructor specifies the size of the array
we're creating (using an array of integers with two elements). Since many of the
operations that Accelerator can perform are point-wise, we'll need the two
constants (zero
and one
) when encoding the algorithm.
As the last thing before looking at the actual computation, we'll also write three functions that wrap functionality that's available using static methods in C# (we could of course use static methods directly, but using functions will lead to a more idiomatic F# code):
// Calculating with 2D float arrays
let sqrt (a:FPA) = PA.Sqrt(a)
let sum (a:FPA) = PA.Sum(a)
let select (cond:FPA) (a:FPA) (b:FPA) = PA.Select(cond,a,b)
The sqrt
function is a point-wise operation that calculates
square roots of all elements of the array. sum
is a little bit more
interesting - it adds all numbers from the source array and returns an array that
contains just a single element, which is the sum (for example sum of [1; 2; 3; 4]
will be an array [10]
).
The most interesting operation is probably the select
function.
Just like others, it is a point-wise operation. It takes three arrays of floats
and expects arrays of the same size. It compares the value of the first array (cond
)
with zero. If the value is greater than or equal to zero, it returns a value from
the first array (a
), otherwise it returns value from b
For a one-dimensional array, this corresponds to the following C# code:
res[i] = cond[i]>=0 ? a[i] : b[i]
(for each index i
).
Now, let's finally move on to the PI number calculation...
Calculating PI using Monte Carlo
We'll implement the PI calculation using a Monte Carlo method. The idea of the algorithm is that we'll generate a large number of random points in a square and then count how many of them are inside a circle inscribed in the square. You can get a better idea by looking at the image at the beginning of the article - the image shows a circle inside a window and a large number of randomly generated points. Points inside the circle are in green and other points are in red. We'll need to count the fraction of points that are inside the circle.
To start, we'll need to generate a large number of random points. We'll generate
two 2D arrays with random values to represent X and Y coordinates of points. This can
be done using Array2D.init
function, which initializes an array
using the function that calculates an element value (in our example, the function
is called generator
):
let rnd = new Random()
let generator x y = float32(rnd.NextDouble())
let px = new FPA(Array2D.init gridSize gridSize generator)
let py = new FPA(Array2D.init gridSize gridSize generator)
Once we have .NET arrays with random values, we use them to create
constant computations represented as the FloatParallelArray
type
from Accelerator. Next, we'll write a function that takes a target
(an object that knows how to evaluate FloatParallelArray
computations)
and calculates an approximate value of PI using our random points:
let calculatePi (target:Target) =
// Calculate distances of points from the center
let pxc, pyc = (px - 0.5f) * 2.0f, (py - 0.5f) * 2.0f
let distances = sqrt ((pxc * pxc) + (pyc * pyc))
// Count points with distance less than one
let inCircle = select (one - distances) one zero
let count = sum(inCircle)
// Evaluate the count and calculate PI
let result = target.ToArray1D(count)
result.[0] / (float32(gridSize * gridSize) / 4.0f)
The first two blocks of code work with the FPA
type, which
means that we're just composing a computation, which we'll execute later.
We start by adjusting the random values a little bit. The numbers are
in range 0 to +1, so we subtract 0.5 and multiply the values by 2, to get
arrays with values in range -1 to +1. Next, we create a new array that
represents distances of points from the point (0, 0). Note that each
i, j
coordinate in the 2D array represents a single point.
Once we have the distances
value, we want to count the
number of points that are inside the circle - this means points with
distance less than or equal to 1. This can be implemented using a trick.
We subtract distances
from one
and get an array
with value larger than or equal to 0 when the point is inside the circle.
Then we use the select
function to get 1 for points inside
the circle and 0 for all other points (as a reminder - this use of
the select
function corresponds to the following C# code:
(one - distances)[i] >= 0 ? one[i] : zero[i]
). If we
then sum all values inside the 2D array, we'll get the count of points
for which the result was 1 (meaning points inside the circle).
As the last step, we need to calculate PI from the number of points
inside the circle and the total number of points inside the square. The result
of summing is stored in an array, because Accelerator works with arrays.
However, it contains only one element, so we get the value using result.[0]
.
Then we divide it by the total number of points and also by 4, because our
square was of size 2 by 2 (values ranging from -1 to +1). And that's our
value of PI! The last step is to actually run the computation using
Accelerator:
let dxTarget = new DX9Target()
let pi = calculatePi(dxTarget)
printfn "Approximate PI value: %f" pi
As you can see, running the calculation is now pretty simple. We created an
instance of DX9Target
to run the computation on GPU and passed it
to our function that does all the work. Alternatively, we could also use X64MulticoreTarget
to run the computation in parallel on a multi-core CPU.
Summary
This article was a brief introduction of a very interesting Accelerator library
and a demonstration of using this library from F# in the simples possible way.
We've seen how to use the FloatParallelArray
, which is a core type
representing a computation in Accelerator and we've implemented a PI calculation
using it. In the next article of this series we'll look at one more example - a
famous Conway's Game of Life. Later, we'll also look how to run "usual" F# code
(meaning, code that doesn't use FloatParallelArray
explicitly) with
Accelerator using F# quotations.
Downloads and References
- Download the PI calculation sample (ZIP, 5.5kB)
- [1] Accelerator: Using Data Parallelism to Program GPUs for General-Purpose Uses - Microsoft Research
- [2] Accelerator Project Homepage - Microsoft Research
- [3] Microsoft Research Accelerator v2 - Microsoft Connect
- [4] Monte Carlo method - Wikipedia.org
- [5] GPGPU and x64 Multicore Programming with Accelerator from F# - Satnam Singh's blog at MSDN
Published: Monday, 21 December 2009, 3:21 AM
Author: Tomas Petricek
Typos: Send me a pull request!
Tags: functional, academic, meta-programming, accelerator, f#, math and numerics