September 13th, 2008

Cilk: multithreaded programming model of the future

Many years back, I read with great interest about the Cilk project at MIT, headed up by Charles Leiserson (that is: the L in CLR, Corman, Leiserson, and Rivest, authors of "Introduction to Algorithms"). I was a bit saddened when I looked back at that work a year or so ago and saw very little had been done recently. And it seemed that the world seemed to be standardizing on OpenMP, which to me, really seems an obviously inferior solution.

So, I was really excited when I noticed today that Cilk was not standing still, they were just working on starting a spin-off company, Cilk Arts, to further develop and commercialize the technology. Apparently it now is now based on C++ instead of C, and is implemented both for GCC and for MS VC++.

I find it quite interesting, because it could make it possible for mere mortals to write efficient and correct multithreaded algorithms. The genius is the simplicity. There's two new basic keywords added to C: cilk_spawn, and cilk_sync. Spawn spawns a new thread, and sync waits for all threads spawned by the current function to complete. That's basically the whole system. They've also added a cilk_for, which more efficiently implements spawning off each iteration of the loop.

Here's a trivial program from their examples:
int fib (int n) {
      if (n<2) return (n);
      else {
        int x,y;
        x = cilk_spawn fib(n-1);
        y = fib(n-2);
        cilk_sync;
        return (x+y);
        }
   }



You might be thinking right now: okay, so what? I can implement those macros, there's no cleverness there. But, if you just go off and literally implement what I said, you'll quickly find that the overhead of thread spawning and waiting overshadows any possible efficiency gain you might get.

But, not so in Cilk: spawn is incredibly cheap to execute. Why? Because it doesn't actually spawn off another thread. Instead, it marks the current location as a place where another thread can start executing, if it has run out of its own work to do. And, here is the great part: when some other thread is idle, it will steal work from the outer-most stack frame of the current thread at a "spawn" location. Thus ensuring that the other thread can do the most work on its own before having to come back to get more work. They call this the work-stealing scheduler. Think about that one for a while.

It means you can start adding parallelism to almost any part of your program, even fairly small loops or recursive functions, without worrying excessively that the overhead of spawning off threads will erase any gains from parallelism. And as you manage to push your way outwards, adding parallelism to larger parts of your application, as you verify that they don't mutate global state, you don't need to remove the spawning, either. You don't need to know where is most efficient to spawn off worker threads, you can just do it everywhere.

For me, the biggest value here is that it can help solve the problem of converting an existing serial program to a parallel one. Let's say you have an giant program, that mutates global state all over the place, shares state everywhere, and you despair at the prospect of ever having it efficiently use multiple processors. Now, Cilk won't really help you clean up your program. But what does promise to do is, as you incrementally clean up small pieces, it allows you to add the parallelism right there, in that small piece, and have your program actually gain efficiency as you do so. And, if eventually you manage to clean up larger parts of the program, and later add parallelism to an outer-more loop, it won't require you to go back and remove it from the inner loop.

And, that's a Big Deal.

I encourage you to read more on their website. The docs for Cilk++ are public, and there are links to a bunch of papers written over the years. Unfortunately, it seems Cilk++ itself isn't publicly available yet (and I haven't tried it either), but the earlier research project Cilk, upon which Cilk++ is based, is available.

Now, if someone would just port it to Common Lisp. :)