21 August 2013

Lexical Closures without Garbage Collection

The problem and the solution

When designing my Kitten programming language, I wanted to support lexical closures without the need for a tracing garbage collector. The problem of closure is simple: anonymous functions often use local variables from their enclosing scope; in a memory-safe language, those variables should continue to exist after the outer function exits, if the inner function escapes. Take this classic example of function currying in JavaScript:

// curriedAdd :: Number -> (Number -> Number)
function curriedAdd(x) {
  return function(y) {
    return x + y;
  };
}

var inc = curriedAdd(1);

Here, x must exist after curriedAdd has exited, in order for us to safely and meaningfully call the resulting anonymous function. Since Kitten has no mutable variables, there is no observable difference between capturing a variable by reference and capturing it by value. So the Kitten compiler takes a definition like this:

def curriedAdd (Int -> (Int -> Int)):
  ->x
  { x + }

def inc (Int -> Int):
  1 curriedAdd @

And rewrites it to copy the value of x into the closure of the anonymous function. In low-level pseudocode, this is what’s happening under the hood:

def curriedAdd:
  setlocal0
  $(local0){ closure0 __add_int }

Now there are no references to the local variable within the inner function—only the value of x is captured when the function object is constructed on the stack. The runtime cost is minimal, and no garbage is generated.

The rest of the story

Saying “by value” and “no garbage” is accurate, but not a complete picture. It would be inefficient to always deep-copy all closed-over values, particularly when they exceed the size of a machine word.

So while integers, booleans, and floating-point numbers are all copied, heap-allocated objects such as vectors can instead be shared using reference counting. Coming from such languages as C++, I like that this preserves deterministic memory behaviour. And in the absence of mutation and lazy evaluation, cycles are impossible, so reference counting alone suffices to eagerly reclaim all garbage.
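As a sketch of how such sharing might work, here is a minimal reference-counted box in JavaScript. The names (`RcBox`, `retain`, `release`) and the representation are mine for illustration, not Kitten’s actual runtime API:

```javascript
// A minimal reference-counted box: a sketch of how a runtime without a
// tracing GC can share a heap value between closures. Illustrative only.
function RcBox(value, onFree) {
  this.value = value;    // the shared payload, e.g. a vector
  this.count = 1;        // one reference, held by the creator
  this.onFree = onFree;  // called when the count drops to zero
}

function retain(box) {
  box.count += 1;        // a closure capturing the box bumps the count
  return box;
}

function release(box) {
  box.count -= 1;
  if (box.count === 0) {
    box.onFree(box.value); // deterministic, eager reclamation
    box.value = null;
  }
}
```

Because values are immutable and closures capture eagerly, the count only ever tracks live owners, so the payload is freed at a predictable point rather than at the whim of a collector.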

This transformation is equally valid in other languages, as long as we ignore mutation. You can easily rewrite the JavaScript example above to use explicit closure:

// curriedAdd :: Number -> (Number -> Number)
function curriedAdd(x1) {
  return function(x2, y) {
    return x2 + y;
  }.bind(this, x1);
}

It also works when we add mutation. Here’s a function encapsulating an impure counter from a starting value:

// counter :: Number -> (() -> IO Number)
function counter(x) {
  return function() {
    return x++;
  };
}

var c = counter(0);
console.log(c());    // 0
console.log(c());    // 1
console.log(c());    // 2

We can fake pointers using Array to make this work without lexical closure:

// counter :: Number -> (() -> IO Number)
function counter(x) {
  return function(px) {
    return px[0]++;      // dereference
  }.bind(this, [x]);     // reference
}

However, reference cycles are now possible, so a reference-counting collector would need an auxiliary cycle detector to reclaim all garbage.
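For instance, two of these array-backed “pointers” can easily end up referring to each other. Under pure reference counting, each cell keeps the other alive, so neither count can ever reach zero once the outer references are gone:

```javascript
// Two mutable cells referring to each other form a reference cycle.
var cellA = [null];
var cellB = [null];
cellA[0] = cellB;   // cellA -> cellB
cellB[0] = cellA;   // cellB -> cellA: following two links loops back
// Dropping the outer references (cellA = null; cellB = null;) would
// leak both cells under reference counting alone.
```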

Optimisations and future work

Production tracing collectors outperform naïve reference-counting collectors by a wide margin. We can make a reference-counting collector that narrows that gap through simple optimisations:

  • Reducing the number of bits used in the reference count—most objects have a small number of references.
  • Creating objects with a zero reference count, and only retaining when the object escapes—the vast majority of objects die in the scope they were born.
  • Eliminating redundant reference manipulation, considering only the net reference change in a basic block—most reference changes do not result in deallocation.
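To illustrate the last point, here is a toy sketch of coalescing reference operations within a basic block. The representation and the name `coalesceRefOps` are made up for this example; a real compiler would work on its intermediate representation instead:

```javascript
// Toy sketch: collapse a block's retain/release operations to the net
// change per reference, dropping balanced pairs entirely.
// Input: a list of { op: 'retain' | 'release', ref: name } records.
function coalesceRefOps(ops) {
  var net = {};
  ops.forEach(function (o) {
    var delta = o.op === 'retain' ? 1 : -1;
    net[o.ref] = (net[o.ref] || 0) + delta;
  });
  var out = [];
  Object.keys(net).forEach(function (ref) {
    var n = net[ref];
    for (var i = 0; i < Math.abs(n); i++) {
      out.push({ op: n > 0 ? 'retain' : 'release', ref: ref });
    }
  });
  return out;
}
```

A retain immediately followed by a release of the same reference cancels out, so a block that merely passes a value through performs no counter traffic at all.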

I intend to implement such optimisations in the future, as Kitten evolves. My bet is that as a statically typed, high-level language with low-level semantics, Kitten will someday be suitable for games and simulations. There, deterministic memory behaviour is a big win, since low latency is often more important than high throughput. Still, the ultimate direction of the project remains to be seen—I’m just discovering the language as I go. :)