TP

# Accelerator and F# (I.): Introduction and calculating PI I already wrote about two projects that I worked on during an internship at MSR back in 2007 (ASP.NET support in F# and F# WebTools). Even though this was more than 2 years ago (and I did one more internship at MSR in the meantime), I still have one more project that I never published on the web. The folks from the F# team reminded me of this project recently, so I thought I could finally publish it. The project used Microsoft Research Accelerator [1, 2], which is a C# library for developing array-based computations and executing them on a GPU. More recently, the Accelerator team at MSR published Accelerator v2 , which was a good motivation to update my original project...

In this article, we'll look at the simplest way of using Accelerator from F#. Accelerator provides a managed interface that can be naturally used from both C# and F#. We can use a mix of method calls and overloaded operators to describe a computation. In F#, we'll also define our additional custom operators to make the code a bit nicer. After we introduce Accelerator using a simple C# demo, we'll look how to calculate an approximate value of the PI number using a Monte-Carlo method.

## Introducing Accelerator

Accelerator v2 is implemented in native code, but ships with a managed wrapper that allows us to use it elegantly from C# and other managed languages as well. The library exposes various types that can be used for creating data-parallel computations that process 1D or 2D arrays of data. In this article, we'll work with FloatParallelArray, which represents a computation that returns a 1D array of floats (float[] in C#).

Accelerator provides overloaded operators for point-wise operations with arrays, so you can for example write res = nums1 * nums2 which represents a point-wise multiplication of two arrays (res[i] = nums1[i] * nums2[i] for each i). Aside from overloaded operators, we can use other operations with arrays that are available as static methods in the ParallelArrays class. For example the Shift method moves elements of the array to the left or to the right. Shifting elements of an array [1; 2; 3; 4] to the right by 1 will result in [1; 1; 2; 3] (the first element is left unchanged).

When writing code that calculates with the FloatParallelArray type, it is important to realize, that we're not actually running any computations yet! The type represents a computation and doesn't perform it. This means that when you write a computation that adds some arrays and performs a few other operations, Accelerator only remembers what do you want to do with the data. The computation is executed later when we create a target that knows how to perform the operations. If you're familiar with LINQ to SQL, you already know this idea - writing calculation with the FloatParallelArray type is just like writing a LINQ query. Executing the computation later corresponds to running the LINQ query as a SQL code. The two available targets allow us to run the code on a multi-core CPU (only on 64bit) and on a GPU using DirectX 9 (in this case, the code is translated to shader code).

Enough with theory for now and let's look at some actual example. I'll start with a very brief code that shows how to use Accelerator in C# and then we'll look at some more interesting F# code. Assume we have an array of float values named nums. The following example builds a computation that calculates an average value of an element and two its neighbors for each element in the array:

var input = new FloatParallelArray(nums);
var sum =
ParallelArrays.Shift(input, 1) + input +
ParallelArrays.Shift(input, -1);
var output = sum / 3.0f;


We start by creating a new FloatParallelArray instance that stores the input data for the program (this represents a constant computation). Then we write a calculation that adds three arrays - the input array and the input array shifted by 1 both to the left and to the right. Finally, on the last line, we divide the array by 3. This is also a point-wise operation, so it actually divides each element of the array by 3. Now that we have a FloatParallelArray named output, which represents the computation, we can run it using GPU:

var target = new DX9Target();
var res = target.ToArray1D(output);


The code to run the computation is quite simple. We first initialize a target. We're using DX9Target, which executes computations using GPU. If we wanted to run the operation using multi-core CPU, we could use X64MulticoreTarget (but note that this works only on x64). If you run the code, you'll get results like this:

7, 5, 5, 1, 5, 1, 0, 7, 6, 5
6, 6, 4, 4, 2, 2, 3, 4, 6, 5


If you know a little bit about F#, you probably wouldn't have any troubles using Accelerator from F#. Since F# is a fully-compatible managed language, we can use any .NET libraries in a very similar way as in C#. However, we can use a couple of tricks to make the F# code nicer and more readable than the corresponding C# version...

## Calculating PI in F#

Before we look at the calculation itself, we'll write a few lines of code that will make the rest of program nicer. This will be shared with the upcoming articles in the series and in principle, we could encapsulate the functionality in an F# library. However, I'll make the code as simple as possible and just write it again in each project.

### Using Accelerator from F#

First of all, we'll need to add reference to the Microsoft.Accelerator.dll assembly. This is just a managed wrapper for a native library that also needs to be copied to the output directory. You'll need to find an appropriate version of native Accelerator.dll in the Accelerator installation directory (either x86 or x64 version) and copy it to bin/Debug in your project. Then we can start by writing a couple of helpers:

open Microsoft.ParallelArrays

type PA = Microsoft.ParallelArrays.ParallelArrays
type FPA = Microsoft.ParallelArrays.FloatParallelArray

// Initialization of constant 2D arrays
let gridSize = 4096
let shape = [| gridSize; gridSize; |]
let zero, one = new FPA(0.0f, shape), new FPA(1.0f, shape)


We start by opening the namespace that contains all the Accelerator functionality. To make the code a little bit more succinct, we also define two type aliases. The FPA type represents a computation that returns an array of floats when executed. The PA type represents the static class which contains operations for working with these computations.

Next, we specify the size of the array we'll use during the computation - we'll work with 2D arrays of the size 4096x4096. Under the cover, this will be used as a size of a texture on the GPU that's used to perform the computation using shaders. Even though we could work with one-dimensional arrays as well, using 2D arrays is a good idea, because it can be easier processed on the GPU.

We'll also create two simple FPA instances that represent constant computations. The first one contains zeros and the second one contains ones. The second parameter of the constructor specifies the size of the array we're creating (using an array of integers with two elements). Since many of the operations that Accelerator can perform are point-wise, we'll need the two constants (zero and one) when encoding the algorithm.

As the last thing before looking at the actual computation, we'll also write three functions that wrap functionality that's available using static methods in C# (we could of course use static methods directly, but using functions will lead to a more idiomatic F# code):

// Calculating with 2D float arrays
let sqrt (a:FPA) = PA.Sqrt(a)
let sum (a:FPA) = PA.Sum(a)
let select (cond:FPA) (a:FPA) (b:FPA) = PA.Select(cond,a,b)


The sqrt function is a point-wise operation that calculates square roots of all elements of the array. sum is a little bit more interesting - it adds all numbers from the source array and returns an array that contains just a single element, which is the sum (for example sum of [1; 2; 3; 4] will be an array ).

The most interesting operation is probably the select function. Just like others, it is a point-wise operation. It takes three arrays of floats and expects arrays of the same size. It compares the value of the first array (cond) with zero. If the value is greater than or equal to zero, it returns a value from the first array (a), otherwise it returns value from b For a one-dimensional array, this corresponds to the following C# code: res[i] = cond[i]>=0 ? a[i] : b[i] (for each index i). Now, let's finally move on to the PI number calculation...

### Calculating PI using Monte Carlo

We'll implement the PI calculation using a Monte Carlo method. The idea of the algorithm is that we'll generate a large number of random points in a square and then count how many of them are inside a circle inscribed in the square. You can get a better idea by looking at the image at the beginning of the article - the image shows a circle inside a window and a large number of randomly generated points. Points inside the circle are in green and other points are in red. We'll need to count the fraction of points that are inside the circle.

To start, we'll need to generate a large number of random points. We'll generate two 2D arrays with random values to represent X and Y coordinates of points. This can be done using Array2D.init function, which initializes an array using the function that calculates an element value (in our example, the function is called generator):

let rnd = new Random()
let generator x y = float32(rnd.NextDouble())

let px = new FPA(Array2D.init gridSize gridSize generator)
let py = new FPA(Array2D.init gridSize gridSize generator)


Once we have .NET arrays with random values, we use them to create constant computations represented as the FloatParallelArray type from Accelerator. Next, we'll write a function that takes a target (an object that knows how to evaluate FloatParallelArray computations) and calculates an approximate value of PI using our random points:

let calculatePi (target:Target) =
// Calculate distances of points from the center
let pxc, pyc = (px - 0.5f) * 2.0f, (py - 0.5f) * 2.0f
let distances = sqrt ((pxc * pxc) + (pyc * pyc))

// Count points with distance less than one
let inCircle = select (one - distances) one zero
let count = sum(inCircle)

// Evaluate the count and calculate PI
let result = target.ToArray1D(count)
result. / (float32(gridSize * gridSize) / 4.0f)


The first two blocks of code work with the FPA type, which means that we're just composing a computation, which we'll execute later. We start by adjusting the random values a little bit. The numbers are in range 0 to +1, so we subtract 0.5 and multiply the values by 2, to get arrays with values in range -1 to +1. Next, we create a new array that represents distances of points from the point (0, 0). Note that each i, j coordinate in the 2D array represents a single point.

Once we have the distances value, we want to count the number of points that are inside the circle - this means points with distance less than or equal to 1. This can be implemented using a trick. We subtract distances from one and get an array with value larger than or equal to 0 when the point is inside the circle. Then we use the select function to get 1 for points inside the circle and 0 for all other points (as a reminder - this use of the select function corresponds to the following C# code: (one - distances)[i] >= 0 ? one[i] : zero[i]). If we then sum all values inside the 2D array, we'll get the count of points for which the result was 1 (meaning points inside the circle).

As the last step, we need to calculate PI from the number of points inside the circle and the total number of points inside the square. The result of summing is stored in an array, because Accelerator works with arrays. However, it contains only one element, so we get the value using result.. Then we divide it by the total number of points and also by 4, because our square was of size 2 by 2 (values ranging from -1 to +1). And that's our value of PI! The last step is to actually run the computation using Accelerator:

let dxTarget = new DX9Target()
let pi = calculatePi(dxTarget)
printfn "Approximate PI value: %f" pi


As you can see, running the calculation is now pretty simple. We created an instance of DX9Target to run the computation on GPU and passed it to our function that does all the work. Alternatively, we could also use X64MulticoreTarget to run the computation in parallel on a multi-core CPU.

## Summary

This article was a brief introduction of a very interesting Accelerator library and a demonstration of using this library from F# in the simples possible way. We've seen how to use the FloatParallelArray, which is a core type representing a computation in Accelerator and we've implemented a PI calculation using it. In the next article of this series we'll look at one more example - a famous Conway's Game of Life. Later, we'll also look how to run "usual" F# code (meaning, code that doesn't use FloatParallelArray explicitly) with Accelerator using F# quotations.