Log in

No account? Create an account
13 September 2008 @ 04:14 pm
Cilk: multithreaded programming model of the future  
Many years back, I read with great interest about the Cilk project at MIT, headed up by Charles Leiserson (that is: the L in CLR, Corman, Leiserson, and Rivest, authors of "Introduction to Algorithms"). I was a bit saddened when I looked back at that work a year or so ago and saw very little had been done recently. And it seemed that the world seemed to be standardizing on OpenMP, which to me, really seems an obviously inferior solution.

So, I was really excited when I noticed today that Cilk was not standing still, they were just working on starting a spin-off company, Cilk Arts, to further develop and commercialize the technology. Apparently it now is now based on C++ instead of C, and is implemented both for GCC and for MS VC++.

I find it quite interesting, because it could make it possible for mere mortals to write efficient and correct multithreaded algorithms. The genius is the simplicity. There's two new basic keywords added to C: cilk_spawn, and cilk_sync. Spawn spawns a new thread, and sync waits for all threads spawned by the current function to complete. That's basically the whole system. They've also added a cilk_for, which more efficiently implements spawning off each iteration of the loop.

Here's a trivial program from their examples:
int fib (int n) {
      if (n<2) return (n);
      else {
        int x,y;
        x = cilk_spawn fib(n-1);
        y = fib(n-2);
        return (x+y);

You might be thinking right now: okay, so what? I can implement those macros, there's no cleverness there. But, if you just go off and literally implement what I said, you'll quickly find that the overhead of thread spawning and waiting overshadows any possible efficiency gain you might get.

But, not so in Cilk: spawn is incredibly cheap to execute. Why? Because it doesn't actually spawn off another thread. Instead, it marks the current location as a place where another thread can start executing, if it has run out of its own work to do. And, here is the great part: when some other thread is idle, it will steal work from the outer-most stack frame of the current thread at a "spawn" location. Thus ensuring that the other thread can do the most work on its own before having to come back to get more work. They call this the work-stealing scheduler. Think about that one for a while.

It means you can start adding parallelism to almost any part of your program, even fairly small loops or recursive functions, without worrying excessively that the overhead of spawning off threads will erase any gains from parallelism. And as you manage to push your way outwards, adding parallelism to larger parts of your application, as you verify that they don't mutate global state, you don't need to remove the spawning, either. You don't need to know where is most efficient to spawn off worker threads, you can just do it everywhere.

For me, the biggest value here is that it can help solve the problem of converting an existing serial program to a parallel one. Let's say you have an giant program, that mutates global state all over the place, shares state everywhere, and you despair at the prospect of ever having it efficiently use multiple processors. Now, Cilk won't really help you clean up your program. But what does promise to do is, as you incrementally clean up small pieces, it allows you to add the parallelism right there, in that small piece, and have your program actually gain efficiency as you do so. And, if eventually you manage to clean up larger parts of the program, and later add parallelism to an outer-more loop, it won't require you to go back and remove it from the inner loop.

And, that's a Big Deal.

I encourage you to read more on their website. The docs for Cilk++ are public, and there are links to a bunch of papers written over the years. Unfortunately, it seems Cilk++ itself isn't publicly available yet (and I haven't tried it either), but the earlier research project Cilk, upon which Cilk++ is based, is available.

Now, if someone would just port it to Common Lisp. :)
(Anonymous) on September 14th, 2008 12:54 am (UTC)
Cilk++ is (almost) available
Hi James,

Thank you for the post.

In fact, Cilk++ is "sorta" available - we have been alpha testing the product with friends and design partners, and there's room available in our Early Visibility Program: http://www.cilk.com/Home/sign-up-page

Cilk++ fits best for applications that have the following characteristics:
1) Application performance is compute bound
2) Performance can be improved by accelerating serial (i.e., single-threaded) portions of the app
3) The application is written in C/C++, and can be compiled with a Visual Studio or GNU compiler

(No experience with parallel programming is needed to use Cilk++)


Jp Calderonejcalderone on September 15th, 2008 03:05 pm (UTC)
Thanks for the post. I think I've heard of Cilk before but I never bothered to really look at it. I read a bunch of the documentation last night and it seems pretty cool. Of course, I'm not *so* interested in concurrency models for C or C++, but I am trying to think of ways this technique might be applied to certain other languages/runtimes. There may be insurmountable boundaries to implementing this idea for CPython (aside from using it in extension modules written in C) but maybe they could be viable on PyPy.
(Anonymous) on August 18th, 2009 06:13 am (UTC)
Using plain C with Cilk++
Does Cilk++ work with plain C code that has been "cilkified"? I'm interested in eventually using the cilk_for utility in some legacy C code. But the fibonacci example is not working with cilk++ (using the new cilk++ underscored keywords, of course). It won't be too hard to spawn off my for loops in plain cilk, but I'm sure that cilk_for would do it better.