12 February 2012

Why Concatenative Programming Matters


There doesn’t seem to be a good tutorial out there for concatenative programming, so I figured I’d write one, inspired by the classic “Why Functional Programming Matters” by John Hughes. With any luck it will get more people interested in the topic, and give me a URL to hand people when they ask what the heck I’m so excited about.

Foremost, there seems to be some disagreement over what the term “concatenative” actually means. This Stack Overflow answer by Norman Ramsey even goes so far as to say:
“…it is not useful to use the word ‘concatenative’ to describe programming languages. This area appears to be a private playground for Manfred von Thun. There is no real definition of what constitutes a concatenative language, and there is no mature theory underlying the idea of a concatenative language.”
This is rather harsh, and, well, wrong. Not entirely wrong, mind, just rather wrong. But it’s not surprising that this kind of misinformation gets around, because concatenative programming isn’t all that well known. (I aim to change that.)

Concatenative programming is so called because it uses function composition instead of function application—a non-concatenative language is thus called applicative. That’s the defining difference, and it’s as “real” a definition as I need, because, well, it’s the one that people are using. Bang. Descriptivism.

One of the problems with functional programming is that the oft-touted advantages—immutability, referential transparency, mathematical purity, &c.—don’t immediately seem to apply in the real world. The reason “Why Functional Programming Matters” was necessary in the first place was that functional programming had been mischaracterised as a paradigm of negatives—no mutation, no side-effects—so everybody knew what you couldn’t do, but few people grasped what you could.
There is a similar problem with concatenative programming. Just look at the Wikipedia introduction:
A concatenative programming language is a point-free programming language in which all expressions denote functions and the juxtaposition of expressions denotes function composition. The combination of a compositional semantics with a syntax that mirrors such a semantics makes concatenative languages highly amenable to algebraic manipulation.
This is all true—and all irrelevant to your immediate problem of why you should care. So in the next sections, I will show you:
  • How concatenative programming works
  • How typing in a concatenative language works
  • How concatenative languages are efficient
  • What concatenative languages are not good for
  • Where to go for more information

Now let’s get started!

♦ ♦ ♦

The Basics

In an applicative language, functions are applied to values to get other values. λ-calculus, the basis of functional languages, formalises application as “β-reduction”, which just says that if you have a function (f x = x + 1) then you can substitute a call to that function (f y) with its result (y + 1). In λ-calculus, simple juxtaposition (f x) denotes application, but function composition must be handled with an explicit composition function:
compose := λf. λg. λx. f (g x)
This definition says “the composition (compose) of two functions (f and g) is the result of applying one (f) to the result of the other (g x)”, which is pretty much a literal definition. Note that this function can only be used to compose functions of a single argument—more on this later.

In concatenative languages, composition is implicit: “f g” is the composition of f and g. However, this does not mean that function application becomes explicit—it actually becomes unnecessary. And as it turns out, this peculiar fact makes these languages a whole lot simpler to build, use, and reason about.

The untyped λ-calculus is a relatively simple system. However, it and the many systems derived from it still need three kinds of term—variables, lambdas, and applications—as well as a number of rules about how to correctly replace variable names when applying functions. You have to deal with name binding, closures, and scope. For a supposedly low-level language, it has quite a bit of inherent complexity.

Concatenative languages have a much simpler basis—there are only functions and compositions, and evaluation is just the simplification of functions. It is never necessary to deal with named state—there are no variables. In a sense, concatenative languages are “more functional” than traditional functional languages! And yet, as we will see, they are also easy to implement efficiently.

♦ ♦ ♦


Let’s say we want to multiply a couple of numbers. In a typical concatenative language, this looks like:
2 3 ×
There are two weird things about this.
First, we use postfix notation (2 3 ×) rather than the prefix notation (× 2 3) that is common in the functional world, or the infix notation (2 × 3) that most languages provide for convenience. There is nothing stopping a concatenative language from having infix operators, but for the sake of consistency, most stick to postfix notation: “f g” means (g ∘ f), i.e., the reverse of function composition. This is actually rather nice notation, because it means that data flows in the order the functions are written in.

Second, we said all terms denote functions—so what the heck are 2 and 3 doing in there? They sure look like values to me! But if you tilt your head a little, you can see them as functions too: values take no arguments and return themselves. If we were to write down the inputs and outputs of these functions, it would look like this:
2 :: () → (int)
3 :: () → (int)
As you may guess, “x :: T” means “x is of type T”, and “T1 → T2” means “a function from type T1 to type T2”. So these functions take no input, and return one integer. We also know the type of the multiplication function, which takes two integers and returns just one:
× :: (int, int) → (int)
Now how do you compose all of these functions together? Remember we said “f g” means the (reverse) composition of f and g, but how can we compose 2 with 3 when their inputs and outputs don’t match up? You can’t pass an integer to a function that takes no arguments.

The solution lies in something called stack polymorphism. Basically, we can give a generic, polymorphic type to these functions that says they’ll take any input, followed by what they actually need. They return the arguments they don’t use, followed by an actual return value:
2 :: ∀A. (A) → (A, int)
3 :: ∀A. (A) → (A, int)
× :: ∀A. (A, int, int) → (A, int)
“∀A.” means “For all A”—in these examples, even if A has commas in it. So now the meaning of the expression “2 3” is clear: it is a function that takes no input and returns both 2 and 3. This works because when we compose two functions, we match up the output of one with the input of the other, so we start with the following definitions:
2 :: ∀A. (A) → (A, int)
3 :: ∀B. (B) → (B, int)
We match up the types:
(A, int) = (B)
By substituting, we get a new polymorphic type for 3 within the expression:
3 :: ∀C. (C, int) → (C, int, int)
This matches the non-polymorphic type:
3 :: ∀A. (A, int) → (A, int, int) = ∀B. B → (B, int)
So the final type of the expression becomes:
2 3 :: ∀A. (A) → (A, int, int)
Going through the same process for multiplication, we get:
2 3 :: ∀A. (A) → (A, int, int)
× :: ∀B. (B, int, int) → (B, int)
A = B
2 3 × :: ∀A. (A) → (A, int)
This is correct: the expression “2 3 ×” takes no input and produces one integer. Whew! As a sanity check, note also that the equivalent function “6” has the same type as “2 3 ×”:
6 :: ∀A. (A) → (A, int)
So already, concatenative languages give us something applicative functional languages generally can’t: we can actually return multiple values from a function, not just tuples. And thanks to stack polymorphism, we have a uniform way to compose functions of different types, so the flow of data in our programs doesn’t get obscured, as it were, by the plumbing.

♦ ♦ ♦

Cool Stuff

In the above example, we worked from left to right, but because composition is associative, you can actually do it in any order. In math terms, (f ∘ g) ∘ h = f ∘ (g ∘ h). Just as “2 3 ×” contains “2 3”, a function returning two integers, it also contains “3 ×”, a function that returns thrice its argument:
3 × :: (int) → (int)
(From now on I’ll omit the ∀ bit from type signatures to make them easier to read.)

So we can already trivially represent partial function application. But this is actually a huge win in another way. Applicative languages need to have a defined associativity for function application (almost always from left to right), but here we’re free from this restriction. A compiler for a statically typed concatenative language could literally:
  1. Divide the program into arbitrary segments
  2. Compile every segment in parallel
  3. Compose all the segments at the end
This is impossible to do with any other type of language. With concatenative programming, a parallel compiler is a plain old map-reduce!

Because all we have are functions and composition, a concatenative program is a single function—typically an impure one with side effects, but that’s by no means a requirement. (You can conceive of a pure, lazy concatenative language with side-effect management along the lines of Haskell.) Because a program is just a function, you can think about composing programs in the same way.

This is the basic reason Unix pipes are so powerful: they form a rudimentary string-based concatenative programming language. You can send the output of one program to another (|); send, receive, and redirect multiple I/O streams (n<, 2&>1); and more. At the end of the day, a concatenative program is all about the flow of data from start to finish. And that again is why concatenative composition is written in “reverse” order—because it’s actually forward:

| 2 |
  |   +---+
  |   | 3 |
  |   +---+
  |     |
  V     V
| *       |

♦ ♦ ♦


So far, I have deliberately stuck to high-level terms that pertain to all concatenative languages, without any details about how they’re actually implemented. One of the very cool things about concatenative languages is that while they are inherently quite functional, they also have a very straightforward and efficient imperative implementation. In fact, concatenative languages are the basis of many things you use every day:
  • The Java Virtual Machine on your PC and mobile phone
  • The CPython bytecode interpreter that powers BitTorrent, Dropbox, and YouTube
  • The PostScript page description language that runs many of the world’s printers
  • The Forth language that started it all, which still enjoys popularity on embedded systems
The type of a concatenative function is formulated so that it takes any number of inputs, uses only the topmost of these, and returns the unused input followed by actual output. These functions are essentially operating on a list-like data structure, one that allows removal and insertion only at one end. And any programmer worth his salt can tell you what that structure is called.

It’s a stack. Yep. Just a stack.

Consider the information flowing between functions in the expression “2 3 × 4 5 × +”, which, if you’re not up on your postfix, is equal to (2 × 3 + 4 × 5):


3(2, 3)
4(6, 4)
5(6, 4, 5)
×(6, 20)

Moving from left to right in the expression, whenever we encounter a “value” (remember: a nullary self-returning function), we push its result to the stack. Whenever we encounter an “operator” (a non-nullary function), we pop its arguments, perform the computation, and push its result. Another name for postfix is reverse Polish notation, which achieved great success in the calculator market on every HP calculator sold between 1968 and 1977—and many thereafter.

So a concatenative language is a functional language that is not only easy, but trivial to run efficiently, so much so that most language VMs are essentially concatenative. x86 relies heavily on a stack for local state, so even C programs have a little bit of concatenativity in ’em, even though x86 machines are register-based.

Furthermore, it’s straightforward to make some very clever optimisations, which are ultimately based on simple pattern matching and replacement. The Factor compiler uses these principles to produce very efficient code. The JVM and CPython VMs, being stack-based, are also in the business of executing and optimising concatenative languages, so the paradigm is far from unresearched. In fact, a sizable portion of all the compiler optimisation research that has ever been done has involved virtual stack machines.

♦ ♦ ♦

Point-free Expressions

It is considered good functional programming style to write functions in point-free form, omitting the unnecessary mention of variables (points) on which the function operates. For example, “f x y = x + y” can be written as “f = (+)”. It is clearer and less repetitious to say “f is the addition function” than “f returns the sum of its two arguments”.

More importantly, it’s more meaningful to write what a function is versus what it does, and point-free functions are more succinct than so-called “pointful” ones. For all these reasons, point-free style is generally considered a Good Thing™.

However, if functional programmers really believe that point-free style is ideal, they shouldn’t be using applicative languages! Let’s say you want to write a function that tells you the number of elements in a list that satisfy a predicate. In Haskell, for instance:

countWhere :: (a -> Bool) -> [a] -> Int
countWhere predicate list = length (filter predicate list)

It’s pretty simple, even if you’re not so familiar with Haskell. countWhere returns the length of the list you get when you filter out elements of a list that don’t satisfy a predicate. Now we can use it like so:

countWhere (>2) [1, 2, 3, 4, 5] == 3

We can write this a couple of ways in point-free style, omitting predicate and list:

countWhere = (length .) . filter
countWhere = (.) (.) (.) length filter

But the meaning of the weird repeated self-application of the composition operator (.) isn’t necessarily obvious. The expression (.) (.) (.)—equivalently (.) . (.) using infix syntax—represents a function that composes a unary function (length) with a binary function (filter). This type of composition is occasionally written .:, with the type you might expect:

(.:) :: (c -> d) -> (a -> b -> c) -> a -> b -> d
(.:) = (.) . (.)

countWhere = length .: filter

But what are we really doing here? In an applicative language, we have to jump through some hoops in order to get the basic concatenative operators we want, and get them to typecheck. When implementing composition in terms of application, we must explicitly implement every type of composition.

In particular, there is no uniform way to compose functions of different numbers of arguments or results. To even get close to that in Haskell, you have to use the curry and uncurry functions to explicitly wrap things up into tuples. No matter what, you need different combinators to compose functions of different types, because Haskell doesn’t have stack polymorphism for function arguments, and it inherently can’t.

Writing point-free expressions demands concatenative semantics, which just aren’t natural in an applicative language. Now contrast a concatenative example:

define countWhere [filter length]
(1, 2, 3, 4, 5) [2 >] countWhere

It’s almost painfully literal: “to count the number of elements in a list that match a predicate is to filter it and take the length”. In a concatenative language, there is no need at all for variables, because all you’re doing is building a machine for data to flow through:

| (1, 2, 3, 4, 5) |
    |               +-------+
    |               | [2 >] |
    |               +-------+
    |                 |
|   |    countWhere   |     |
|   V                 V     |
| +-----------------------+ |
| | filter                | |
| +-----------------------+ |
|   |                       |
|   V                       |
| +--------+                |
| | length |                |
| +--------+                |
|   |                       |

When you’re building a diagram like this, just follow a few simple rules:
  1. Each block is one function
  2. A block takes up as many columns as needed by its inputs or outputs, whichever are more
  3. When adding a block, put it in the rightmost column possible:
    • If it takes no inputs, add a column
    • If there aren’t enough arrows to match the block, the program is ill-typed 

♦ ♦ ♦


Notice the use of brackets for the predicate [2 >] in the preceding example? In addition to composition, the feature that completes a concatenative language is quotation, which allows deferred composition of functions. For example, “2 >” is a function that returns whether its argument is greater than 2, and [2 >] is a function that returns “2 >”.

It’s at this point that we go meta. While just composition lets us build descriptions of dataflow machines, quotation lets us build machines that operate on descriptions of other machines. Quotation eliminates the distinction between code and data, in a simple, type-safe manner.

The “filter” machine mentioned earlier takes in the blueprints for a machine that accepts list values and returns Booleans, and filters a list according to the instructions in those blueprints. Here’s the type signature for it:
filter :: (list T, T → bool) → (list T)
There are all kinds of things you can do with this. You can write a function that applies a quotation to some arguments, without knowing what those arguments are:
apply :: ∀A B. (A, AB) → (B)
You can write a function to compose two quotations into a new one:
compose :: ∀A B C. (AB, BC) → (AC)
And you can write one to convert a function to a quotation that returns it:
quote :: (T) → (() → T)

♦ ♦ ♦

The Dark Side

Say you want to convert a math expression into concatenative form:
f x y z = y2 + x2 − |y|
This has a bit of everything: it mentions one of its inputs multiple times, the order of the variables in the expression doesn’t match the order of the inputs, and one of the inputs is ignored. So we need a function that gives us an extra copy of a value:
dup :: (T) → (T, T)
A function that lets us reorder our arguments:
swap :: (T1, T2) → (T2, T1)
And a function that lets us ignore an argument:
drop :: (T) → ()
From the basic functions we’ve defined so far, we can make some other useful functions, such as one that joins two values into a quotation:
join2 :: (T1, T2) → (() → (T1, T2))
join2 = quote swap quote swap compose
Or a function that rotates its three arguments leftward:
rot3 :: (T1, T2, T3) → (T2, T3, T1)
rot3 = join2 swap quote compose apply
And hey, thanks to quotations, it’s also easy to declare your own control structures:

define true  [[drop apply]]      # Apply the first of two arguments.
define false [[swap drop apply]] # Apply the second of two arguments.
define if    [apply]             # Apply a Boolean to two quotations.

And using them is like—actually, just is—using a regular function:

["2 is still less than 3."]    # "Then" branch.
["Oops, we must be in space."] # "Else" branch.
2 3 <                          # Condition.
if print                       # Print the resulting string.

Those particular definitions for true and false will be familiar to anyone who’s used Booleans in the λ-calculus. A Boolean is a quotation, so it behaves like an ordinary value, but it contains a binary function that chooses one branch and discards another. “If-then-else” is merely the application of that quotation to the particular branches.

Anyway, getting back to the math, we already know the type of our function ((int, int, int) → (int)), we just need to deduce how to get there. If we build a diagram of how the data flows through the expression, we might get this:

   +---+  +---+  +---+
   | x |  | y |  | z |
   +---+  +---+  +---+
     x      y      z
     |      |      |
     |      |      V
     |      |    +------+
     |      |    | drop |
     |      |    +------+
     |      V
     |    +----------+
     |    | dup      |
     |    +----------+
     |      y      y
     |      |      |
     |      |      V
     |      |    +----------+
     |      |    | dup      |
     |      |    +----------+
     |      |      y      y
     |      |      |      |
     |      |      V      V
     |      |    +----------+
     |      |    | *        |
     |      |    +----------+
     |      |     y^2
     |      |      |
     |      V      V
     |    +----------+
     |    | swap     |
     |    +----------+
     |     y^2     y
     |      |      |
     |      |      V
     |      |    +-----+
     |      |    | abs |
     |      |    +-----+
     |      |     |y|
     |      |      |
     V      V      V
   | rot3            |
    y^2    |y|     x
     |      |      |
     |      |      V
     |      |    +----------+
     |      |    | dup      |
     |      |    +----------+
     |      |      x      x
     |      |      |      |
     |      |      V      V
     |      |    +----------+
     |      |    | *        |
     |      |    +----------+
     |      |     x^2
     |      |      |
     |      V      V
     |    +----------+
     |    | swap     |
     |    +----------+
     |     x^2    |y|
     |      |      |
     |      V      V
     |    +----------+
     |    | -        |
     |    +----------+
     |   x^2-|y|
     |      |
     V      V
   | +        |
So our final function can be written:
f = drop dup dup × swap abs rot3 dup × swap − +
Well…that sucked.

♦ ♦ ♦

A Lighter Note

You’ve just seen one of the major problems with concatenative programming—hey, every kind of language has its strengths and weaknesses, but most language designers will lie to you about the latter. Writing seemingly simple math expressions can be difficult and unintuitive, especially using just the functions we’ve seen so far. To do so exposes all of the underlying complexity of the expression that we’re accustomed to deferring to a compiler.

Factor gets around this by introducing a facility for lexically scoped local variables. Some things are simply more natural to write with named state rather than a bunch of stack-shuffling. However, the vast majority of programs are not predominated by mathematical expressions, so in practice this feature is not used very much:
“Out of 38,088 word and method definitions in the source code of Factor and its development environment at the time of this writing, 310 were defined with named parameters.”—Factor: A Dynamic Stack-based Programming Language
One of the great strengths of concatenative languages, however, is their ability to refactor complex expressions. Because every sequence of terms is just a function, you can directly pull out commonly used code into its own function, without even needing to rewrite anything. There are generally no variables to rename, nor state to manage.

So in practice, a lot of expressions can be refactored using a wide array of handy functions, of the sort that commonly end up in a standard library. With some refactoring, that math expression might look like this:
square = dup ×
f = drop [square] [abs] bi − [square] dip +
Which doesn’t look so bad, and actually reads pretty well: the difference between the square and absolute value of the second argument, plus the square of the first. But even that description shows that our mathematical language has evolved as inherently applicative. It’s better sometimes just to stick to tradition.

♦ ♦ ♦

Whither Hence

So you’ve got the gist, and it only took a few dozen mentions of the word “concatenative” to get there. I hope you’re not suffering from semantic satiation.

You’ve seen that concatenative programming is a paradigm like any other, with a real definition and its own pros and cons:

  • Concatenative languages are simple and consistent (“everything is a”)
  • They are amenable to dataflow programming
  • Stack languages are well studied and have good performance
  • You can easily roll your own control structures
  • Everything is written in point-free style (for better or worse)

If you’re interested in trying out a mature, practical concatenative language, check out Factor and the official blog of the creator, Slava Pestov. Also see Cat for more information on static typing in concatenative languages.

I’ve been idly working on a little concatenative language called Kitten off and on for a while now. It’s dynamically typed and compiles to C, so you can run it just about anywhere. I wanted a language I could use for a site on a shared host where installing compilers was irritating. That shows you the extent of my language geekery—I’d rather spend hours writing a language than twenty minutes figuring out how to install GHC on Bluehost.

Anyway, the implementation is just barely complete enough to play around with. Feel free to browse the source, try it out, and offer feedback. You’ll need GHC and GCC to build it, and I imagine it works best on Linux or Unix, but there shouldn’t be any particularly horrible incompatibilities.

This would also be a good time to mention that I’m working on a more serious language called Magnet, which I mentioned in my last article about how programming is borked. It’s principally concatenative, but also has applicative syntax for convenience, and relies heavily on the twin magics of pattern matching and generalised algebraic data types. Hell, half the reason I wrote this article was to provide background for Magnet. So expect more articles about that.

Edit (20 April 2013) The above information is no longer accurate. Kitten is currently being rewritten; the work that was done on Magnet has been incorporated. Kitten is now statically typed and, until the compiler is complete, interpreted.

And that about does it. As always, feel free to email me at to talk at length about anything. Happy coding!

12 November 2011

Languages versus Implementations

In the world of programming languages, there is a clear distinction between the language itself, and the implementation of the language. There is only one abstract Python, but there are many concrete implementations of Python. Only one Haskell, but several implementations thereof.

However, this is a false dichotomy. Each implementation is necessarily different from the others, mainly because implementing an identical specification in identical fashion is boring and doesn’t solve any problems.

Outside of those implementations we make as a learning process, we programmers typically create new implementations of existing languages in search of better performance or better user experience. I don’t expect anyone to agree with me that changing the performance characteristics of a language is the same as a change to the language, but that is what I believe.

More importantly, the implementors of a language will often try to make their implementation stand out as more productive, more useful to the programmer, by adding extensions. Whether or not these extensions are intended for widespread adoption by other implementations, and whether you call them extensions or pragmas or whatever else, the effect is the same. A program that is accepted by one implementation is either not accepted by another, or it runs differently under the different implementations.

The moment this happens, the two implementations are implementing different languages—with a lot of overlap, certainly, but nevertheless distinct. This divergence happens inadvertently as well: because of bugs, one implementation will likely fail or produce incorrect output for a program that another implementation handles just fine, and vice versa. These are still, by the same reasoning, implementing different languages.

So next time you go to accuse someone of conflating the theoretical and the practical, consider that they may simply be ignoring theory for the sake of practicality.

03 November 2011

What’s My Best Time of Day?

Since I find myself quite suddenly in my last year of college, I now realise that I have access to something unique and oddly exciting: statistics about my college career. I decided to take a look at my grade data for the past few years, and see if I couldn’t discern a few things:

  1. How much did my inability to get up early affect my grades?
  2. What are the hours at which I am most capable of productive work?
  3. How much does what I’m learning affect my grades?
  4. In what seasons am I most productive?
  5. What is the best amount of work per week for me?

The results were really interesting. First, I grouped classes by the time the class started, and calculated the average weighted GPA (0.0–4.0) for each group. The chart below shows that I am neither a morning nor an evening person—midday and midafternoon are when I do my best work, but right after lunch, I’m apparently just as useless as in the morning.

Next, I grouped classes by my subjective impression of how much I learned. This was less interesting: like most people, I do best when I’m learning something, but also fine when the work is easy.

Then I discovered something quite counter to my expectations. When I grouped my grades by season of the year, I had thought winter would be my worst quarter, because of the stress and lack of sunlight. It turns out that my grades tell a different story:

I don’t know why this is, but my hunch was that I work better under stress. Autumn and spring are my favourite seasons, when I feel happiest. To test this hypothesis from a different angle, I grouped quarters by number of hours per week of class time.

Apparently, when given a sane amount of work, I do fine. Past a certain point I can’t keep up, but then, beyond that threshold of stress, my performance improves significantly. Really weird stuff.

I hope to be able to apply this information to make my life easier and perhaps get more things done. If you can obtain hard data about your work habits, you can derive a surprising amount of useful information about how you can improve them. That in turn will make it easier for you to be productive. Try it—you’ll like it.

19 October 2011

Programming Languages Suck at UX

I wish I could say that programming languages are notorious for their terrible usability—unfortunately, very few people seem to have noticed, and those that have noticed don’t seem to be particularly upset about it. Frankly, that pisses me off, so I’m going to rant about it, and hopefully give the problem some of the notoriety it deserves. I’d like to point out that this is by no means an exhaustive list of the (many, many) problems with programming languages!

(If you don’t have the time or desire to read all of the ranting, at least skim the boldfaced points. There’s some good stuff in there.)

The Humanity!

When it comes to any kind of functional design, user experience is not just some tangential concern: it is of the utmost importance. The moment you make a design choice that doesn’t benefit the user, you’re doing it wrong, and your design will be bad. I’ll get more articulate in a minute, but first I want to admonish all language designers. You are wrong and bad.

We’re all guilty of it. No one can design something useful that will satisfy everybody’s expectations. But we can avoid making language design choices that benefit the computer without benefiting the programmer—such choices are nonsensical and absurd.

If you’re not designing for humans, why are you designing at all?

That doesn’t mean we shouldn’t explore new computational paradigms, new ways of reasoning about and optimising programs. What it means is that we should do so in the spirit of making programming easier and more enjoyable, so that developers can focus on the problems that they want to solve, rather than the linguistic obstacles that get in their way.

♦ ♦ ♦

Non-linguistic Languages

I find it baffling that most designers of programming languages give no concern to the language aspect. We are linguistic creatures by nature, and we have strong intuitions about how language is supposed to work.

Larry Wall is a notable (partial) exception to the norm, appropriate considering he studied linguistics back in the day. Perl is what you might call a semi-naturalistic programming language. This has nothing to do with “natural-language programming”, which is largely bunk: natural language itself is not suitable for programming computers any more than Nahuatl is suitable for talking to a Welshman.

A naturalistic language, then, is one that exhibits usability characteristics of natural languages. I’ll wager that everyone’s first languages are either spoken or signed, followed shortly by the written forms (if any) of those languages. A second language can be much easier to learn if it is more similar to your native tongue, and I think this holds for programming languages as well.

We learn new things in terms of what we already know.

One of my favourite things about Perl (well, not Perl 6) is the fact that it has noun classes, inflected with sigils (basically typed dereferencing operators). $x (“the scalar x”) is a completely different thing from @x (“the array x”), which too is a completely different thing from %x (“the hash x”). Agreement in type is analogous to grammatical agreement in number or gender, which is very common in human language.

Different things should look different; related things, related.

Another thing Perl mimics in natural language is implicit reference. $_ is like the pronoun “it”, a default thing, the current subject of discussion, which in many situations can be assumed. Programming languages use explicit reference almost exclusively. In order to perform a series of operations on a value, the programmer must explicitly name that value for every operation. Oddly enough, this is even the case in the highly English-like Inform.

In one of the object-oriented languages of which everyone seems to be so fond, this could be as simple as keeping track of the current object “under discussion” and allowing it to be assumed where an object would otherwise be expected.

Languages should let us elide repetition.

One thing I like about concatenative languages such as Forth and Factor is that there is essentially no named state—you can create variables as a convenience, but at the heart of it, everything is just dataflow.

Computer languages are (characteristically) so wholly unlike and inferior to natural language that it’s almost comical to call them languages at all. You express ideas every day in your native tongue that have no analogue in any programming language in existence. In most languages, you can apply productive rules to derive new terms, whereas basically all programming languages are purely isolating with positional syntax.

Naturalistic programming languages will never be pretty. They are not minimal, or elegant, or simple, but despite all that, they are intuitive and useful. More importantly, they meet users’ expectations about how language is supposed to work.

Humans have expectations. Do not judge them; only exploit them.

All of the “ugly” features of natural languages evolved for specific reasons, and the designs that evolution has wrought can be borrowed to create better designs for our artificial languages.

♦ ♦ ♦

Erroneous Errors

Real-world languages have loads of redundancy, which greatly improves error recovery; a programming language with judicious syntactic redundancy can issue warnings instead of errors for imperfect but understandable input, improving the user’s experience.

If a compiler can deliver detailed diagnostic messages, then it must do so; if there is not enough information to provide a meaningful diagnostic, then it shouldn’t provide any more information than is relevant to the programmer.

If I write int f(X x), where X is an undeclared type, the compiler should not do what GCC does, which is to write the following:

error: expected ‘)’ before ‘x’

This tells me nothing about what is actually wrong, but in an effort to be specific, it gives me misleading specific information. If I were to insert ) before x, I would then get:

error: expected declaration specifiers before ‘x’

Followed by further, largely nonsensical errors that depend on what follows the declaration of f(). It should say either something both specific and helpful, such as:

error: use of undeclared type ‘X’ in parameter list of function ‘f’

Or, if that is not possible, then at least something not so specific that it becomes incorrect:

error: the parameter list of function ‘f’ is invalid

Errors should be useful, or vague enough not to be misleading.

This is not just an implementation issue. Languages are often structured in such a way that it is very difficult to provide meaningful error messages, because constructs are too context-dependent or informationally sparse to obtain a meaningful message from an appropriately narrow and targeted view of the source.

♦ ♦ ♦

Terrible Typography

Most programmers don’t seem to care about legibility. If they did, they would probably be angry that we still use fixed-width fonts for programming. Our programming languages aren’t designed in such a way that they look any good in proportional-width fonts, and monospaced fonts are needed to carry out tabular formatting in editors that don’t support the sanest known way to handle tabulation.

Sigh. At least most of us agree that it’s a good idea to keep code width low. It is a bit sad, though, that we call the rule of thumb the “80-column rule”.

I yearn for the day when we measure code width in ems.

Monospaced fonts arose in the first place due to technical limitations: first, because typewriters could only move a fixed distance per character typed, and second, because it was easier to address characters on a graphical display as a regular grid of fixed-size sprites. Fixed-width fonts are a typographical oddity that survived through tradition and little else.

Programming notation is the way it is primarily because of the fallacy that programming is mathematics. In reality, writing software is also very much like, well, writing.

Programming notation should evoke mathematics and prose alike.

In prose, for example, punctuation merely explicates the structure and organisation of the text, and gives hints as to cadence and prosody. In programming languages, as in mathematics, punctuation abbreviates common operations—but it is also abused to stand in for structures that, in mathematical notation, would ordinarily be indicated with richer formatting.

The formatting problem is a side-effect of the fact that we still live in the Stone Age when it comes to our notational character set. No offense to ASCII, but it was already showing its age in the late 80s when Unicode showed up.

But thanks to the peculiar tenacity of fixed-width typefaces, most programming languages can still be comfortably written on a 50-year-old typewriter. How’s that for backward compatibility?

♦ ♦ ♦

Impossible Input Methods

On account of the limited circa-1960 palette of characters we use in our languages, we’re constantly making notational compromises, approximating glyphs with digraphs and trigraphs. Should “<=” really mean “≤” when it actually looks like “⇐”?

When I tutored computer science, I saw students write => instead of >= many a time, because they expected the conceptual reverse of <= to be the visual reverse of it, as with ≤ and ≥. Similarly, the students frequently confused the left and right sides of assignment—i.e., they would write y = x when they meant x = y—because with “=” there is no visual indication of the direction of assignment, nor of the fact that mutating assignment is not the same as mathematical equality.

Why don’t we use hyphens for hyphenation, minus signs for subtraction, and dashes for ranges, instead of the hyphen-minus for all three? Why do we use straight quotes ("") when curved quotes (“”) can be nested (or contain straight quotes) without escaping? There’s a wealth of time-tested typographical convention in both mathematics and prose that programming language designers simply discard without a second thought. Without a first thought.

Throw away traditions that don’t work; respect those that do.

These compromises would be totally unnecessary thanks to Unicode, but our editors and input methods lag so far behind that it’s still infeasible to comfortably enter many non-ASCII characters without dedicated editor support.

And no such support exists, because a language that uses “->”, which can be written in any editor, is more marketable than one that uses “→” and relies (however little!) on outside tools. On Linux I can use a compose key, which is only a mild inconvenience, but on Windows I’m stuck with Alt codes or Character Map.

We value terseness in a language because it increases the information density of the code, so we can work with more meaning at once, both per screen and per brain. Punctuation symbols are generally the most concise way to express a concept, and some symbolic notation in a language is absolutely good, insofar as it reduces cognitive load and eye movement. But the legibility of a punctuation-heavy language suffers just as greatly as that of one without any punctuation at all.

♦ ♦ ♦

The Fear

Perhaps the biggest problem with programming language design is that, because it is so bad, people are afraid to use tools that can help them. They are afraid of the whitespace sensitivity of Python, because they assume it will violate their expectations, and therefore cause them a hassle. In reality, whitespace sensitivity in Python and Haskell are totally innocuous, and actually quite helpful—because they are designed with some thought behind them.

When you only know the bad, you quietly assume all things are bad.

It’s no wonder people stick so tenaciously to a single language or language family. Their expectations were violated time and again when they were learning how to program, so they assume that learning a new language is always like that—and sadly, it often is. They don’t want their expectations to be violated again, so they stick to what they know. Even if it’s bad, at least it’s familiar.

We need to get rid of that kind of thinking—not just so that language design can move forward, but so that we can quit worrying about languages and get shit done.

15 October 2011

Tabs vs. Spaces? I’ll Do You One Better

Now, I don’t ordinarily like to get involved in holy wars, because arguing about programming serves largely to waste time that could be spent actually programming. But I’m making an exception in this case in order to make, shall we say, a modest proposal.

While doing research for an article, I rediscovered elastic tabstops, which are a fairly sane way of managing indentation and tabulation, especially if you like to use proportional-width fonts for programming. Unfortunately, it’s not as widely implemented as it ought to be, and the reference implementation (a gedit plugin) leaves something to be desired in terms of efficiency.

The search for an Emacs implementation (which, sadly, proved fruitless) led me to all kinds of juicy “discussions” concerning the merits of tabs versus spaces. There are many arguments for both, but summarising and responding to them are beyond the scope of this article.

The peculiar thing to me was that people kept stressing the semantic nature of the tab character versus the space character. In typewriter terms, spaces are for moving the cursor a fixed width, while tabs are for moving until the next tabstop.

On a typewriter and in word-processing programs, you can easily change the global positions of the tab stops, but formatting code is a subtler problem, and I think, as long as we’re sticking to tabs versus spaces, elastic tabstops are the best solution.

But who says we must stick to just tabs and spaces? There are plenty of other ASCII control characters that just sit around doing nothing these days; why don’t we repurpose them? As long as we’re thinking semantically, the ASCII control characters offer some interesting possibilities in the way of code formatting and semantic markup:

  • HT (Horizontal Tabulation) would be used only for formatting of tabular data, in the same manner as elastic tabstops—but never for indentation.
  • VT (Vertical Tabulation) would serve to terminate rows in tabular data, allowing table cells to vary in height, e.g., by containing linefeeds.
  • SI (Shift In) and SO (Shift Out) could be repurposed to mark increases and decreases, respectively, in indentation. This has the benefit of not requiring indentation characters to be repeated at the start of every line, but the redundancy of doing so would improve degradation for tools that rely heavily on indentation, such as Python.
  • BS (Backspace) would appear at the beginning of a line to “dedent” code such as a line or case label in C, which is subordinate to its parent, yet not equal to its siblings.
  • FF (Form Feed) could separate logical sections of code, and thus be used like the “region” markers available in many IDEs. You still see some old source files with this convention from time to time.
  • US, RS, and GS (Unit, Record, and Group Separators) could provide further logical division of code, or semantically mark up data types and their related operations.

And that’s just in the C0 range. Consider the possibilities with some control codes from the C1 set:

  • BPH (Break Permitted Here) would indicate where lines should be wrapped before other text-wrapping methods are applied. This would eliminate the need to manually insert linefeeds to break long lines.
  • NBH (No Break Here) like its cousin above, would provide an otherwise absent means of indicating that no line break is to be inserted at the current position.
  • SSA and ESA (Start and End of Selected Area) could be used by editors to save the current selection, preserving editor state in a central location in the file.
  • HTS and VTS (Horizontal and Vertical Tabulation Set) would cause a tabstop to be set at the current position, overriding the automatic behavior of the elastic tabstops.
  • HTJ (Horizontal Tabulation with Justification) would produce a right-aligned rather than left-aligned field in a table.
  • PLD and PLU (Partial Line Down and Up) could automatically produce rich formatting for subscripts and superscripts.

Imagine just throwing a few simple control characters in your files, and being able to interact with beautifully typeset, richly formatted code, with all styling defined by local CSS so you never have to see code formatted in a way you don’t like.

Control characters have the advantage of being immune to interpretation by a lot of tools. Existing editors and compilers would need only minimal changes to accept code with semantic markup. An appropriately designed editor could even convert code on the fly for backward compatibility with unmodified compilers.

I hope you get that I’m not really serious about this—an earnest proposal for something as broad as “inline language-agnostic semantic code markup” would need to take a lot more into consideration than some old telecom codes.

I think many would agree that the idea has some merit, but few would agree on any particular implementation of it. Then again, who knows? Literate programming has its adherents, and that’s not so very different in spirit.

(Oh, and for the love of all that is holy, don’t anybody get the idea that comments containing HTML or LaTeX or whatever would be a good solution. Just try it and see how far you get.)