A quick preview of the actor API

Since the actor / future API is starting to mature, I thought it'd be nice to show what concurrent programming in JavaScript actually looks like. To begin with, here's the canonical actor ping-pong example; two actors collaboratively incrementing a counter without explicitly sharing any state:

You can also safely hot-swap the message handler inside an actor. When you call upgrade, the supplied function gets pushed onto an internal stack and used as the new handler. Calling downgrade pops it off and uses the previous function (or the original one if you're at the bottom of the stack).

Because message processing is mutually exclusive (any given actor will only process one message in its mailbox at any one time) this is generally quite safe to perform on a live system- no restarts! I say generally because messages passed to actors must be immutable, and neither JavaScript nor Java can enforce that sufficiently strongly to stop the ignorant or the strong-willed from throwing caution to the wind. For now, you just have to be disciplined. But it's a promising start, anyway.

Comments [0]

Rhinode can talk to the internets!

I've held off talking about Rhinode up until now, primarily because going into detail about the actor system, STM and concurrent I/O in JavaScript is all a bit meaningless without a concrete example to flesh it out. Well that's no longer an impediment, because Rhinode now has a structured and extensible (albeit incomplete) event system and a TCP library emulating the core functions from the net module in node.js. What does this mean in practice? For starters it means running this example from the node.js documentation in no longer throws a stack trace:

More profoundly, each 'connection' event gets executed inside an actor whose work can be distributed across Rhinode instances, all transparently to the developer. Put another way, I've hit one of my short-term goals for this little toy project: to provide an almost seamless way for developers to transplant their JS to the JVM, scaling to multiple cores in the process, without having to resort to the somewhat less refined tools currently available to node.js. Happy hacking. Lots more to come...

GitHub repo is here.

Comments [0]

Installing Solaris 11 from a customized PXE server

While the Oracle documentation goes to great lengths explaining the process of setting up a netboot server in Solaris, if you're bootstrapping off a machine running another flavour of UNIX/Linux, this isn't particularly helpful. Ok, go:

The basic strategy is as follows:

  1. Setup the PXE server
  2. Drop the Solaris grub binary, kernel and boot archive into the server
  3. Set up an HTTP server to deliver the remainder of the system

First make sure you have a functional PXE server. This article covers most of the fundamental points, just ignore the bit about dropping a Debian kernel. I'm using dnsmasq for DHCP/DNS resolution and tftpd-hpa for my TFTP server. For the purposes of this exercise I'll assume the root of the TFTP server is /tftpboot.

When you've that done, grab a copy of the Solaris Automated Install .iso (referred to as AI in the Oracle doc). Extract its contents somewhere. Our PXE server's only role is to serve up the customized Oracle grub binary and kernel so the client can load a minimal Solaris subsystem. For that to happen we need to copy boot/platform, boot/grub/pxegrub and boot/grub/menu.lst to the root of the tftp server. The /tftpboot substructure should look something like this:

And the grub config will look something like this (make sure the ip and path for the last argument points to where the system will be downloaded over http):

Make sure dnsmasq knows to serve up grub image for PXE boots. Note that forcing a 150 option down the wire seems to be crucial for SunOS; maybe it was just the machines I tested it on:

Now you need to serve up the bulk of the OS over http. Use whatever web server you prefer (I'm using nginx) and drop extracted .iso folder into the webroot (I renamed it solaris). Double check you can reach the URL referred to in the grub config.

At this stage you should be good to go; the AI installs a basic headless system. If you want gnome, you'll have to run:

I find netbooting straight to grub gives you the flexibility to configure & boot any number of OSes easily without having to mess around with the tftp or dhcp config; just download a netbootable kernel of whatever operating system you need, then add a couple more lines to your grub config.

Comments [0]

DNS resolution using JNDI

This is the kind of area where standing on the shoulders of JDK-giants pays dividends: the JNDI interface for looking up DNS records is surprisingly simple:

Create a context, sling in a domain and record type, and off you go. Another reason why implementing rhinode has involved so little work on my part... I've not looked, but I suspect the original node.js implementation is a little longer than 40 LOC :)

Comments [0]

JavaScript / CSS minification for JRuby

I didn't think I'd have to do this, but after seeing that the ruby-yui-compressor gem forks a process every time, I thought that, whilst using JRuby, that's unequivocally Doing It Wrong™. So I wrapped the YUI Java compressor library in a gem and called it JMinify. If anybody needs to compress their assets and they're using JRuby, knock yourselves out.

Filed under  //   ruby  

Comments [0]

Parallel map: more JRuby concurrency mischief

Like my last post this is more for my future benefit but if anybody else finds it useful then that's cool too. Unlike the last one the fruits of my tinkering yielded a nice linear speedup.

Ok, let's parallelize Array#map. We'll break down the task as follows:

  1. Split the array into chunks
  2. Execute the chunks in asynchronously, in parallel, waiting for them all to complete
  3. Merge the chunks into a new array and return it

How many chunks is optimal? There's no definitive answer; In the past I've opted for a very large number of small sub-arrays, e.g. for concurrent divide & conquer reductions where the minimal array length was some low power of the number of processors (I've played with associative reduction algorithms in the past). For our #pmap method I'm just going to split the original array up into as many chunks as there are logical cores on my machine. How do you find that out in JRuby? Java to the rescue again:

Now we need a pool of workers to assign tasks to. As parallel mapping is strictly CPU bound, a thread pool with fixed thread count but an unbounded work queue is probably most appropriate:

That's our ExecutorService up & running, we just need to do a bit of housekeeping before we can write our Array#pmap method. This is where Java's baroque-complexity boilerplate rears its ugly head (wouldn't it be nice if Executors could take lambdas as arguments for mass invocation?). Basically we implement the Callable interface- I instantiate my Task implementation with a block which the executor calls when it executes:

So now we're good to go. Array#pmap here takes an executor as a first argument because I didn't want the class to be responsible for starting / shutting down the work queue, but that's just an implementation decision.

Once the original array is mapped to a set of Task classes, they can be handed to the executor. I call ExecutorService#invokeAll because it blocks until all the submitted work is done. It returns an array of FutureTasks which can be dereferenced immediately (we want the method to return something!):

So does all that tomfoolery actually buy you any more performance? Time for a highly unscientific benchmark, incrementing an array of the first million Fixnums:

Not bad- I ran it on a Sandy Bridge i5 with 2 cores and 4 CPU threads.

More important than a cheesy parallel mapping imeplementation, I've learned (or is that re-learned?) two axioms about playing with concurrency in JVM hosted languages:

  1. As in the previous post, executors expect some kind of anonymous inner class as an argument in the absence of closures. You need to be aware of the cost of converting a ruby closure to a Java Runnable, Callable, Future etc; think of I/O-bound problems where allocating extra objects for each request/event is almost certainly not a good thing. You'll definitly save time writing in straight Java as you won't have to worry whether or not the closure you just used is going to throw all your performance gains out the window.
  2. Don't discount the JVM's ability to make a fool of your benchmarks. The one above isn't worth the screen real-estate it occupies due to its simplicity and duration relative to startup / shutdown time of the VM. Both JIT and Server modes take time to hit a 'quiescent state' where performance stabilises. Don't just look at wall-time, profile your code properly. Lots has been said about this elsewhere so I won't rehash what others have articulated better, but be aware HotSpot has its own performance quirks.

The example in its entirety is here.

Filed under  //   concurrency   jruby   ruby  

Comments [0]

Easy thread safety with JRuby

EDIT: Beauty, as they say, is pain. In exchange for liberating your code from locks, piling the work onto a queue is approximately ~10x slower than a traditional approach. A quick profile suggests that the expense of block creation is non-trivial. It's still nicer to look at though, right?

More for my benefit, but it's handy that JRuby lets you use asynchronous Java queues. Want to make something like this thread-safe?

Just wrap the assignment in a Runnable (JRuby coerces blocks/Procs/lambdas into Runnables for you) and submit it to a single-threaded Executor. All calls to inc and dec execute one at a time, in-order:

Tasks are applied FIFO, @value stays consistent without locks- sound familiar?

Filed under  //   ruby  

Comments [0]

Notable omissions in the James-Joyce CS Section

Don't get me wrong- the JJ has most of the works you'll need as a CS undergrad, especially if Java's the only thing you're exposed to (although what that says about the reader I leave as an exercise). However it's not that hard to push the boundaries and find a few glaring seminal works either omitted completely or severely lacking in quantity.

It's not sufficient to have a solitary copy of a significant text and 20 copies of a lesser one when the latter is not only objectively better but more likely to age well, a particularly acute problem in a CS section. "Underwater basket-weaving Synergies with Java" or whatever might get mediocre kids through exams in 2011, and maybe that's all the CSI department cares about, in which case I'm wasting my time convincing them that shelling out for a few more copies of SICP every so often would get a decent return on investment. But on the off-chance that they give a shit about something other than producing fodder for cube-farms I'm compiling a list of what I think is currently found wanting.

An omission qualifies as the absence or shortage of physical copy from the shelf; I don't care if they have a link to an electronic copy on the catalogue search, I'm referring to dead trees:

If When I find any more I'll update the list.

Filed under  //   diatribe  

Comments [1]

Avoiding stack overflow in Ruby with trampolines

Languages that don't support tail-call elimination out of the box can make people think twice about using recursion. Ever received a SystemStackError when doing something like this?

Ok, the example's contrived, but you've no doubt come across cases where you'd like to avoid using an iterative solution even if it's just for the sake of elegance. Luckily in Ruby this can be easily fixed by:

  1. Having your method return a thunk rather than a direct tail-call
  2. Using a trampoline to avoid growing the stack

A thunk is essentially just the suspended application of a function. Taking our example, it's just a matter of transforming the tail call like so:

The trampoline implementation is equally trivial. It takes a thunk as an argument and iteratively calls it until something other than a continuation is returned:

So now if we start the ball rolling with our trampoline call:

We can run increment as long as we like without blowing the stack. As there's no base case it'll continue indefinitely- a worst-case example. Using continuation-passing here trades a time-penalty in higher-order function calls for the advantage of not increasing the amount of space needed to perform the computation. Not too hard really, is it?

Alternatively we could use Kernel#callcc to replay the computation until we have a return value. Here we get a continuation object to use with callcc { ... }, give @k the correct execution context by reassigning the thunk as before, then call it until we're done:

If you want to get a proper grasp of CPS you could do a lot worse than get hold of Dybvig's book and go through all the call/cc examples.

Filed under  //   ruby  

Comments [4]

LiveConnect on OS X Lion

For anybody who's ever had to get java <-> javascript interaction working but was wondering why Apple's JDK couldn't find netscape.javascript.JSObject, the elusive plugin.jar file is kept in /Library/Java/Home/lib. If you install it into your local maven repository you can get yourself into all kinds of trouble writing applets in clojure.

Filed under  //   clojure   javascript  

Comments [0]