06 September 2011

Iterators as a Model of Lazy Computation

As I often do, I'm working on a pet language project, which I'll describe in future posts. One of the key concerns for this language is laziness: computation should be deferred as long as possible. For various reasons I'm writing the implementation in C++, so I was looking for a convenient means of expressing lazy computation in an imperative language.

I'm very fond of the work of Alexander Stepanov, who is largely responsible for the popularity of generic programming, one of the driving principles of the C++ standard library. He advocated type-agnostic algorithms expressed in terms of iterators (and iterator ranges) as a means of generically implementing families of operations on any data structure that can be traversed linearly.

It occurred to me that iterators are, for eager imperative languages, an excellent model of lazy computation. An iterator models a lazy function over a list, which can accept or return any number of values it pleases at a time. Composition of these functions is modelled as iterator adaption: composable iterators are those that are constructable from an iterator range of compatible type.

Fundamentally, an iterator is an object that can be dereferenced and advanced. However, in order to be used properly by generic functions in the standard library, C++ iterators must fulfill a number of other requirements. These details are not immediately relevant to the implementation of algorithms, and readability of algorithms expressed as iterators consequently may suffer in C++.

For imperative programmers, it may require some unusual thinking in order to properly decompose algorithms in terms of iterator adaption. At present, the implementation of my language project is decomposed into the following steps, each of which is modelled by an iterator type that is then aliased for convenience.
  • Convert the input stream to a raw bytes.
  • Buffer the bytes to allow backtracking.
  • Interpret the bytes as UTF-8 and produce UTF-32 code points.
  • Partition those code points into tokens.
  • Parse those tokens into terms.
  • Establish an execution context in which to evaluate those terms.
  • Run the interpreter.
All of these operations are composed and executed by just one line of code:

run(interpreter(terms(tokens(utf32(buffered(characters(stream)))))));

Only run() is an imperative operation: it's a function that sends the result of std::copy()ing the iterator range to a no-op output iterator called ignore_iterator. Every other identifier here is an aliased name of an iterator type. In order to evaluate an expression, the minimum number of terms needed to constitute a valid expression is read, and consequently also the minimum number of tokens and characters. Waste not.

The good thing about iterators here is that they strongly encourage you to fully encapsulate all of the state that you use, at least as much as an ordinary function. Iterators must conform to very strict requirements in order to be used with the usual algorithms and adapters, and those that break those expectations will simply not work properly. (Unfortunately in C++ that usually means spectacular crashes.)


The type aliases involved in the above sample are also relatively simple:

typedef std::istreambuf_iterator  <char>       characters;
typedef buffer_iterator           <characters> buffered;
typedef utf8::unchecked::iterator <buffered>   utf32;
typedef lex_iterator              <utf32>      tokens;
typedef term_iterator             <tokens>     terms;
typedef context_iterator          <terms>      interpreter;

You'll note the use of the standard C++ std::istreambuf_iterator as well as utf8::unchecked::iterator from the excellent iterator-centric UTF8-CPP library. The Boost.Iterator library contains a number of general-purpose iterators and adapters, which may be useful depending on your particular application.

Note that each operation must refer to the type of the previous operation: this is another reason I'm not terribly fond of C++ for this. Type inference doesn't apply to constructors, so in order to have the compiler infer the type of the composed iterator, you would have to replace the type aliases with template functions that forward to the constructors of the appropriate types. Ugh. C++ requires entirely way too much machinery to accomplish what ought to be simple operations.

It's equally possible to model interactive applications in this manner. Loops could be represented as cyclical iterators adapting linear sequences to circular ones, and loops with exit conditions could be represented helically. I haven't proved it, but it seems apparent that iterators can model any general recursive algorithm, because at their heart they're just lazy functions plus some state. It also occurs to me that you could use this technique to very easily and efficiently compile a lazy functional language into an imperative one.

Anyway, I hope this gives my nonexistent readers some interesting ideas. If anyone knows of an imperative language that would be good for this sort of thing, do let me know. I grow tired of C++.