Keep your multi-core CPU busy with F#

The growth of computer CPU speed is slowly being replaced by the growth of number of CPUs (or cores) in the computer at least for the close future. This causes a revolution in the way software is written, because traditional and most widely used way of writing concurrent applications using threads is difficult and brings several serious issues. Some predictions say that within a few years, almost every computer will have about 16 cores, so there is a huge need for programming paradigms or idioms that help developers write concurrent software easily (see also The Free Lunch Is Over [^] written by Herb Sutter).

Functional programming languages (especially pure functional languages) are interesting from this point of view, because the program doesn't have side-effects which makes it very easy to parallelize it (programs in pure functional languages can't have any side-effects by design, in other functional languages like F# the side-effects can be eliminated by following functional programming style).

Finding primes example

Let's say I want to find all prime numbers between 1 million and 1.1 million (this is example of an operation that can be nicely divided between more processors). First we will need function to test whether n is a prime:

// Tests whether n is prime - expects n > 1
let is_prime n =
  // calculate how large divisors should we test..
  let max = int_of_float (Math.Sqrt( float_of_int n ))
  // try to divide n by 2..max (stops when divisor is found)
  not ({ 2 .. max } |> Seq.filter ( fun d -> n%d = 0) |> Seq.nonempty)

To find all primes in the specified range we could use the following code:

let primes = [1000000 .. 1100000] |> List.filter is_prime

This code subsequently executes the is_prime function for all numbers that we want to test, but with multiple CPU cores it would be nice if the function divided the numbers into several parts and executed every part on different thread, so application would take benefit from multiple cores available in the system. The is_prime function doesn't have any side-effects so executing it several times in parallel can't change the result of operation (if the order of primes in the returned list doesn't change).

I wrote a function that does exactly what I described in the previous paragraph. To execute the operation in parallel, you can use the following code (the only difference is that I replaced List module with my ParallelList module):

let primes = [1000000 .. 1100000] |> ParallelList.filter is_prime

On my notebook (with Intel Core Duo processor) the first code (using List.filter) takes about 2,3sec and using ParallelList.filter the operation takes only 1.3sec. The program isn't 2 times faster, because there is some overhead for creating and synchronizing threads, but in this case the speed increase is significant.

Aside from the filter function, the ParallelList module contains also the map function (which does the same thing as the There is also a function set_thread_count that you can use to configure how many threads should be used when executing parallel operations (the default value is 2).

Performance and future work

The performance of these functions is the key issue. Currently the ParallelList functions work can't be used for small number of repetitions of simple function, because the overhead is larger than the profit from parallel execution. If the operation takes less than 0.01ms and the number of repetitions is less than 1000 the List functions are usually better, but for operations taking longer time the results of ParallelList are better even with smaller number of repetitions. For operation taking about 0.1ms the ParallelList gives better results for more than 200 repetitions and for operation taking more than 1ms the number of repetitions is not very important (ParallelList is better even for 10 repetitions). These are inaccurate results that I got on my notebook, so if you're thinking of using the ParallelList, be sure to do some tests in your scenario! You can see some tests that I did in the demo project in stats.fs source file.

As I said earlier, ParallelList supports only filter and map functions, so implementing more functions would be useful. It would be also useful to provide some alternatives for functions that can't be executed in parallel (like fold_left) that could be used in some situations. I'd also like to implement functions for working with other collection types like Seq (IEnumerable) and array in the future.

I'm interested in your ideas and suggestions, so if you find something that could be improved in the code, or if you have any other idea, let me know!


Discuss on twitter, .
Send corrections via GitHub pull requests.