Keep your multi-core CPU busy with F#
The growth of computer CPU speed is slowly being replaced by the growth of number of CPUs (or cores) in the computer at least for the close future. This causes a revolution in the way software is written, because traditional and most widely used way of writing concurrent applications using threads is difficult and brings several serious issues. Some predictions say that within a few years, almost every computer will have about 16 cores, so there is a huge need for programming paradigms or idioms that help developers write concurrent software easily (see also The Free Lunch Is Over [^] written by Herb Sutter).
Functional programming languages (especially pure functional languages) are interesting from this point of view, because the program doesn't have side-effects which makes it very easy to parallelize it (programs in pure functional languages can't have any side-effects by design, in other functional languages like F# the side-effects can be eliminated by following functional programming style).
Finding primes example
Let's say I want to find all prime numbers between 1 million and 1.1 million (this is example of
an operation that can be nicely divided between more processors). First we will need function to test
whether n is a prime:
// Tests whether n is prime - expects n > 1 let is_prime n = // calculate how large divisors should we test.. let max = int_of_float (Math.Sqrt( float_of_int n )) // try to divide n by 2..max (stops when divisor is found) not ({ 2 .. max } |> Seq.filter ( fun d -> n%d = 0) |> Seq.nonempty)
To find all primes in the specified range we could use the following code:
let primes = [1000000 .. 1100000] |> List.filter is_prime
This code subsequently executes the is_prime function for all numbers that we want to test,
but with multiple CPU cores it would be nice if the function divided the numbers into
several parts and executed every part on different thread, so application would take benefit from multiple
cores available in the system. The is_prime function doesn't have any side-effects so
executing it several times in parallel can't change the result of operation (if the order of primes in the
returned list doesn't change).
I wrote a function that does exactly what I described in the previous paragraph. To execute the operation
in parallel, you can use the following code (the only difference is that I replaced List module with
my ParallelList module):
let primes = [1000000 .. 1100000] |> ParallelList.filter is_prime
On my notebook (with Intel Core Duo processor) the first code (using List.filter) takes about
2,3sec and using ParallelList.filter the operation takes only 1.3sec.
The program isn't 2 times faster, because there is some overhead for creating and synchronizing threads, but in
this case the speed increase is significant.
Aside from the filter function, the ParallelList module contains also the
map function (which does the same thing as the List.map). There is also
a function set_thread_count that you can use to configure how many threads should be used when
executing parallel operations (the default value is 2).
Performance and future work
The performance of these functions is the key issue. Currently the ParallelList functions work
can't be used for small number of repetitions of simple function, because the overhead is larger than the profit
from parallel execution. If the operation takes less than 0.01ms and the number of repetitions is less than 1000
the List functions are usually better, but for operations taking longer time the results of ParallelList
are better even with smaller number of repetitions. For operation taking about 0.1ms the ParallelList
gives better results for more than 200 repetitions and for operation taking more than 1ms the number of repetitions is
not very important (ParallelList is better even for 10 repetitions). These are inaccurate results that I got
on my notebook, so if you're thinking of using the ParallelList, be sure to do some tests in your scenario!
You can see some tests that I did in the demo project in stats.fs source file.
As I said earlier, ParallelList supports only filter and map functions, so implementing
more functions would be useful. It would be also useful to provide some alternatives for functions that can't
be executed in parallel (like fold_left) that could be used in some situations. I'd also like to implement
functions for working with other collection types like Seq (IEnumerable) and array
in the future.
I'm interested in your ideas and suggestions, so if you find something that could be improved in the code, or if you have any other idea, let me know!
Downloads
Tags: F# and MSR
(Published: March 24, 2007 23:13)
- RE: Keep your multi-core CPU busy with F# by Tom Kirby-Green (3/25/2007 8:04:35 AM)
Very interesting stuff, timely and useful. I'm looking forward to seeing where you take this. It's clearly PLINQ-like - but available now rather than 2-3 years out! Good luck with the project! :-)
- RE: Keep your multi-core CPU busy with F# by Jules (3/25/2007 1:08:43 PM)
Fold can be parallelized for associative functions. A ParallelList.fold would be cool :)
- RE: Keep your multi-core CPU busy with F# by Jonathan Allen (3/26/2007 1:27:41 AM)
Correct me if I'm wrong, but isn't it better to check only previously found primes than to try every possible divisor?
- RE: Keep your multi-core CPU busy with F# by Tomas (3/26/2007 11:35:01 PM)
Tom: Yes.. it is very similar to what PLINQ will probably do (I guess that PLINQ was inspied by functional programmin too :-))
- RE: Keep your multi-core CPU busy with F# by Tomas (3/26/2007 11:38:39 PM)
Jules: Thanks for the suggestion, I will add Fold for associative functions. It could be also useful to implement the following fold alternative:
// first 'fold' parts of the list (using first function)
// than join the results from every part
val fold_join : ('b -> 'a -> 'b) -> ('b -> 'b -> 'b) -> 'b -> 'a list - RE: Keep your multi-core CPU busy with F# by Tomas (3/26/2007 11:50:45 PM)
Jonathan: Yes.. testing only primes as a divisors would be better, but I'd have to find all primes from 1 to X (which might take some time when testing numbers around 1M). For finding primes in the interval starting from 1 the algorithm called 'Sieve of Eratosthenes' would be better.
Send comments to this article
Hyperlinks are detected automatically and you can use **bold** and __italic__ to format your comment.


