Now it gets interesting (Pt.1)

I haven’t done any full-time web development for more than 5 years, and that’s been a conscious decision. Engineering talent-density seems to vary inversely with proximity to customer-facing code. My extremely jaded view of a web developer is someone who scraped through college courses like compiler design or a variant of SICP and subsequently internalised Steve Yegge’s tongue-in-cheek modus-operandum:

You know how to Program. You’re a Programmer. Now you just need to learn APIs and stuff.

This attitude will land you a comfortable job straight out of college, and if you’re semi-articulate and like to shout a lot, you’ll go far. Many workplaces such as consulting firms, ad agencies, e.t.c where customers hit you with predictable, repetitive and discrete workloads need this kind of manpower.  Startups don’t.

That’s unfortunate, because today’s nontrivial web applications are crying out for programmers with a strong CS interest. Things like React or Elm wouldn’t have seen the light of day without people with an interest in FP and persistent data structures. Speaking of React,  you might have noticed the notion of immutability mentioned a lot in the docs. It’s no accident; shared mutable state used to be something that would only haunt people doing heavy concurrent programming [read: not in a browser]. If you’ve ever been unable to use a site on an iPad because it was stuttering so much, you’ll know that modern UI frameworks are doing a hell of a lot of work passing data around the application. You’ve got to conclude that even with sequentially fast interpreters and a simple [read: nonexistent ;)] threading model, the free ride is over for JS developers. You see, mutating objects like this:

var shoppingCart = {
  "apples": 2,
  "oranges": 3

shoppingCart.sh_t = "bananas";

…has a few drawbacks: first up, modern JS interpreters ‘de-virtualize‘ calls to object members when they’re defined: in essence, creating classes with fast member lookup times in the background. Tacking on properties after you’ve defined an object means those generated lookups are either dirty and have to be recreated (expensive), or you have to go through some kind of slow v-table-like lookup (also expensive). Have a look at this for more info on how V8 does it.

Secondly, since you’re altering the value of the same object in-situ, you’ve lost the original state; an observer has no idea what changed and how you can deterministically arrive at this place. In a concurrent environment you’d also need some kind of coordination mechanism so different threads of execution see a correct view of shoppingCart at the right time (consistency + visibility). On top of this, broadcasting changes in mutable state to a potentially HUGE set of observers can be incredibly CPU intensive. Accumulate enough of them, working away, blocking DOM updates and you end up with Jank. Ever had a conversation like- “oh shit, the number of AngularJS watch functions grows linearly with the number of products on the page! We need to rethink our approach before this launches! And I was gonna hit the gym tonight :/”? Not cool. On top of this, passing mutable object references into impure functions will turn your nice application into spaghetti pretty quick. How often do you encounter bugs because some function is silently adding/removing/updating an object member or array value? If it’s like the majority of JS codebases I’ve seen, this probably happens alarmingly frequently.

React pulls in a fundamental tenet of functional programming languages: mutate nothing. In concurrent environments, this obviates the need for programmers to deal with locking mechanisms in their own code. In the browser, the notion of immutability totally removes the concept of subscribing to a changing reference: data objects are treated as values, and work is only done when a value changes.

Why should I care? It’s just one more API to learn, right?

Yeah, I guess you’re right: you can hit up the todo-list example on the React.js site and bullshit your way through another framework. Meanwhile I can attain catharsis by looking for javascript-tagged questions on Stackoverflow and inform the OP that web developers are all morons. Both are fun endeavours but neither are terribly good paths for self-improvement. Instead I was hoping to pique your interest in some more abstract but enduring concepts by dipping into immutable.js and one of its core structures: the Hash Array Mapped Trie.


This highly relevant video explains everything.

You see, if all your functions are deterministic and all your values are immutable, how do you effect change in a program? By creating new values based on your existing ones. People in Java-land may have come across things such as:

ImmutableMap foo = ...
// take foo, add 'new' => 'kvp'
ImmutableMap bar = ImmutableMap
    .put("new", "kvp")

That keeps foo unchanged and is totally nondestructive but incurs a full copy of foo into the new bar, all for the sake of adding a single new key-value pair :/ Do this a few million times in a loop and you’ll get something much slower than the traditional imperative version. So immutability gives us safety + correctness at the expense of performance; can we do anything about that?

This is where immutable, persistent data structures* come in. While plain immutable structures prevent explicit modification outright, persistent ones outwardly allow alteration but internally generate new values, leaving the original untouched, all with good complexity guarantees (i.e. close to a mutable equivalent).  Put another way, imagine a variant of

shoppingCart.sh_t = "bananas";

above which:

  1. Returns a new object containing all of shoppingCart plus the new member
  2. Leaves the old reference of shoppingCart untouched
  3. Has roughly the same of performance of adding a KVP to a mutable object (no O(n) deep copies)

Got it? Good.

When the Clojure programming language had to come up with an immutable but performant alternative to the classic Java HashMap, Rich Hickey based the implementation on a 2001 paper by Phil Bagwell describing a hash structure based on a trie. What’s more relevant in the context of this rant is that Facebook has decided that a JavaScript implementation is feasible to integrate with React and Flux.

So how does a HAMT work?

I’ll get the surprise out of the way: it’s based on hashing like a regular map/dictionary/whatever-you-call-it. The difference is what a HAMT does with that hash. A typical map implementation would look something like this:


Essentially we have a group of buckets and a function which the key is passed through; the output of the function dictates which bucket it lands in. If we have a crappy hash function or not enough buckets, multiple keys end up in the same bucket. We then have to resolve those collisions, commonly by some kind of ‘probing’ strategy where you stick your value in the next consecutive free bucket or have some kind of chain structure inside each bucket. Either way it degrades to an O(n) lookup in the worst case.

HAMTs split the hashed value over a trie (sorry, more data structures). How? If the hash function yields an n-bit number then the bits in it can be partitioned into various t-size chunks, each of which is a node in the trie. Let’s see an example- ECMAScript only specifies one 64-bit wide number type, so we’ll go with n=64 and t=8**:


In part 2 we look at the significance of tries, space-efficient ways of representing them and how to do an HAMT lookup.

* if you’re in an interview situation, asking a candidate to make the distinction between immutable and persistent data structures is one way of seeing if they’ve been doing more than hacking out Spring templates in Eclipse for the last 5 years.

** if you’ve been writing in a language, including JavaScript for more than a couple of years, you should know the size and layout of the fundamental types.

Leave a Reply

Your email address will not be published. Required fields are marked *