Parallel map: more JRuby concurrency mischief

Like my last post this is more for my future benefit but if anybody else finds it useful then that's cool too. Unlike the last one the fruits of my tinkering yielded a nice linear speedup.

Ok, let's parallelize Array#map. We'll break down the task as follows:

  1. Split the array into chunks
  2. Execute the chunks in asynchronously, in parallel, waiting for them all to complete
  3. Merge the chunks into a new array and return it

How many chunks is optimal? There's no definitive answer; In the past I've opted for a very large number of small sub-arrays, e.g. for concurrent divide & conquer reductions where the minimal array length was some low power of the number of processors (I've played with associative reduction algorithms in the past). For our #pmap method I'm just going to split the original array up into as many chunks as there are logical cores on my machine. How do you find that out in JRuby? Java to the rescue again:

Now we need a pool of workers to assign tasks to. As parallel mapping is strictly CPU bound, a thread pool with fixed thread count but an unbounded work queue is probably most appropriate:

That's our ExecutorService up & running, we just need to do a bit of housekeeping before we can write our Array#pmap method. This is where Java's baroque-complexity boilerplate rears its ugly head (wouldn't it be nice if Executors could take lambdas as arguments for mass invocation?). Basically we implement the Callable interface- I instantiate my Task implementation with a block which the executor calls when it executes:

So now we're good to go. Array#pmap here takes an executor as a first argument because I didn't want the class to be responsible for starting / shutting down the work queue, but that's just an implementation decision.

Once the original array is mapped to a set of Task classes, they can be handed to the executor. I call ExecutorService#invokeAll because it blocks until all the submitted work is done. It returns an array of FutureTasks which can be dereferenced immediately (we want the method to return something!):

So does all that tomfoolery actually buy you any more performance? Time for a highly unscientific benchmark, incrementing an array of the first million Fixnums:

Not bad- I ran it on a Sandy Bridge i5 with 2 cores and 4 CPU threads.

More important than a cheesy parallel mapping imeplementation, I've learned (or is that re-learned?) two axioms about playing with concurrency in JVM hosted languages:

  1. As in the previous post, executors expect some kind of anonymous inner class as an argument in the absence of closures. You need to be aware of the cost of converting a ruby closure to a Java Runnable, Callable, Future etc; think of I/O-bound problems where allocating extra objects for each request/event is almost certainly not a good thing. You'll definitly save time writing in straight Java as you won't have to worry whether or not the closure you just used is going to throw all your performance gains out the window.
  2. Don't discount the JVM's ability to make a fool of your benchmarks. The one above isn't worth the screen real-estate it occupies due to its simplicity and duration relative to startup / shutdown time of the VM. Both JIT and Server modes take time to hit a 'quiescent state' where performance stabilises. Don't just look at wall-time, profile your code properly. Lots has been said about this elsewhere so I won't rehash what others have articulated better, but be aware HotSpot has its own performance quirks.

The example in its entirety is here.

Filed under  //   concurrency   jruby   ruby  

Comments [0]

Easy thread safety with JRuby

EDIT: Beauty, as they say, is pain. In exchange for liberating your code from locks, piling the work onto a queue is approximately ~10x slower than a traditional approach. A quick profile suggests that the expense of block creation is non-trivial. It's still nicer to look at though, right?

More for my benefit, but it's handy that JRuby lets you use asynchronous Java queues. Want to make something like this thread-safe?

Just wrap the assignment in a Runnable (JRuby coerces blocks/Procs/lambdas into Runnables for you) and submit it to a single-threaded Executor. All calls to inc and dec execute one at a time, in-order:

Tasks are applied FIFO, @value stays consistent without locks- sound familiar?

Filed under  //   ruby  

Comments [0]

Notable omissions in the James-Joyce CS Section

Don't get me wrong- the JJ has most of the works you'll need as a CS undergrad, especially if Java's the only thing you're exposed to (although what that says about the reader I leave as an exercise). However it's not that hard to push the boundaries and find a few glaring seminal works either omitted completely or severely lacking in quantity.

It's not sufficient to have a solitary copy of a significant text and 20 copies of a lesser one when the latter is not only objectively better but more likely to age well, a particularly acute problem in a CS section. "Underwater basket-weaving Synergies with Java" or whatever might get mediocre kids through exams in 2011, and maybe that's all the CSI department cares about, in which case I'm wasting my time convincing them that shelling out for a few more copies of SICP every so often would get a decent return on investment. But on the off-chance that they give a shit about something other than producing fodder for cube-farms I'm compiling a list of what I think is currently found wanting.

An omission qualifies as the absence or shortage of physical copy from the shelf; I don't care if they have a link to an electronic copy on the catalogue search, I'm referring to dead trees:

If When I find any more I'll update the list.

Filed under  //   diatribe  

Comments [1]

Avoiding stack overflow in Ruby with trampolines

Languages that don't support tail-call elimination out of the box can make people think twice about using recursion. Ever received a SystemStackError when doing something like this?

Ok, the example's contrived, but you've no doubt come across cases where you'd like to avoid using an iterative solution even if it's just for the sake of elegance. Luckily in Ruby this can be easily fixed by:

  1. Having your method return a thunk rather than a direct tail-call
  2. Using a trampoline to avoid growing the stack

A thunk is essentially just the suspended application of a function. Taking our example, it's just a matter of transforming the tail call like so:

The trampoline implementation is equally trivial. It takes a thunk as an argument and iteratively calls it until something other than a continuation is returned:

So now if we start the ball rolling with our trampoline call:

We can run increment as long as we like without blowing the stack. As there's no base case it'll continue indefinitely- a worst-case example. Using continuation-passing here trades a time-penalty in higher-order function calls for the advantage of not increasing the amount of space needed to perform the computation. Not too hard really, is it?

Alternatively we could use Kernel#callcc to replay the computation until we have a return value. Here we get a continuation object to use with callcc { ... }, give @k the correct execution context by reassigning the thunk as before, then call it until we're done:

If you want to get a proper grasp of CPS you could do a lot worse than get hold of Dybvig's book and go through all the call/cc examples.

Filed under  //   ruby  

Comments [2]

LiveConnect on OS X Lion

For anybody who's ever had to get java <-> javascript interaction working but was wondering why Apple's JDK couldn't find netscape.javascript.JSObject, the elusive plugin.jar file is kept in /Library/Java/Home/lib. If you install it into your local maven repository you can get yourself into all kinds of trouble writing applets in clojure.

Filed under  //   clojure   javascript  

Comments [0]

The disruptor pattern for the uninitiated

The concept of a single-threaded business-logic processor (BLP) which feeds from, and into concurrent and durable disruptors started a large discussion on HN the other day. The LMAX technical paper gives a more elaborate explanation of the architecture but if you can't face reading through it then consider this discussion on StackOverflow.

I can see its utility in a problem domain like trading platforms, but I don't think it's a good solution for embarrassingly parallel applications like signal processing, cryptography etc.

Comments [0]

Better living through fib functions

...or why posting benchmarks of fib a function in x language:

  1. is a slightly puerile approach to discussing performance and scalability
  2. ignores any number of reasons why people would choose node.js, other than throwing out received knowledge for the hell of it

But with the discussion raging (it now appears Haskell is the cure after all) I thought I'd venture a clojure solution:

And the requisite benchmark of questionable utility:

With that we've proved:

  1. A fib function can indeed be written in clojure, and its return value piped to a web browser
  2. Um, that's about it

Presumably the OP will be enraged to find that clojure has its own web server (piggybacking off aleph here). That said I didn't realise retro CGI scripting was so avant-garde in 2011; guess I'm not 'deck' enough to realise but then I don't have a fixie or an assymetric haircut either :/

On a serious note I haven't dabbled with node much because I just don't can't warm to javascript; I find the language constructs native to lisp dialects (for me that's clojure and scheme) much more elegant and alluring, but it's a subjective thing.

I only wrote this post because I was at a loose end but it is surely proof by contradiction that this dialogue achieves nothing. You need more than contrived benchmarks to qualify the shortcomings in a language. Leave them to college kids like me with nothing better to do.

Filed under  //   clojure   diatribe  

Comments [0]

Filed under  //   concurrency   stm  

Comments [0]

SI- Because rvm and rbenv are overkill

It's said that more than any other programming community Lispers tend to dislike working on problems that have already been solved. But this dogmatic avoidance of duplication doesn't seem to have permeated the rubyists' zeitgeist to the same degree.

I'm not in the habit of regularly compiling ruby interpreters for the hell of it- it's something I'd do maybe twice a year tops. I recently rebuilt rubinius and ruby 1.9 for Lion and that'll probably be it for 6 months. If you keep all your interpreters in userland and you aren't such an idiot that you can't install them yourself, how is it that the nebulous concept of a ruby version manager has come into vogue?

Switching interpreters shouldn't be any more complex than having a script to symlink the executable paths for you:

A couple of clarifications:

  • I install all my rubies into ~/.ruby/
  • The current ruby gets all its executables symlinked into ~/.ruby/current/bin
  • I added ~/.ruby/current/bin onto the end of my $PATH
  • That is all

I have it saved in /usr/local/bin/si. SI is short for 'switch interpreter'. A monkey can use it, although none have been known to do so.

The hash of interpreters is hard-coded and I can't even remember what else I did to get it working but that's not the point. It's a 15 minute job I tackled and forgot about 2 years ago. Let's use some initiative, stop writing version managers and do some bloody work.

Filed under  //   diatribe   ruby  

Comments [0]

More scripting with Clojure

The old man wanted to pay $49.99 for some p.o.s shareware app to compare and remove duplicate files in Windows. Seeing this as something of a challenge, I got a stopwatch and went to see what alternative I could come up with in half an hour or so:

Turns out in clojure, the answer is just enough to do the job.

Sometimes less is more (thanks to this post for computing an md5 on a file). And throwing in pmap performs the checksumming in parallel without any extra thought required. Job.

Filed under  //   clojure  

Comments [0]