19 October 2011

Programming Languages Suck at UX

I wish I could say that programming languages are notorious for their terrible usability—unfortunately, very few people seem to have noticed, and those that have noticed don’t seem to be particularly upset about it. Frankly, that pisses me off, so I’m going to rant about it, and hopefully give the problem some of the notoriety it deserves. I’d like to point out that this is by no means an exhaustive list of the (many, many) problems with programming languages!

(If you don’t have the time or desire to read all of the ranting, at least skim the boldfaced points. There’s some good stuff in there.)

The Humanity!

When it comes to any kind of functional design, user experience is not just some tangential concern: it is of the utmost importance. The moment you make a design choice that doesn’t benefit the user, you’re doing it wrong, and your design will be bad. I’ll get more articulate in a minute, but first I want to admonish all language designers. You are wrong and bad.

We’re all guilty of it. No one can design something useful that will satisfy everybody’s expectations. But we can avoid making language design choices that benefit the computer without benefiting the programmer—such choices are nonsensical and absurd.

If you’re not designing for humans, why are you designing at all?

That doesn’t mean we shouldn’t explore new computational paradigms, new ways of reasoning about and optimising programs. What it means is that we should do so in the spirit of making programming easier and more enjoyable, so that developers can focus on the problems that they want to solve, rather than the linguistic obstacles that get in their way.

♦ ♦ ♦

Non-linguistic Languages

I find it baffling that most designers of programming languages give no concern to the language aspect. We are linguistic creatures by nature, and we have strong intuitions about how language is supposed to work.

Larry Wall is a notable (partial) exception to the norm, appropriate considering he studied linguistics back in the day. Perl is what you might call a semi-naturalistic programming language. This has nothing to do with “natural-language programming”, which is largely bunk: natural language itself is not suitable for programming computers any more than Nahuatl is suitable for talking to a Welshman.

A naturalistic language, then, is one that exhibits usability characteristics of natural languages. I’ll wager that everyone’s first languages are either spoken or signed, followed shortly by the written forms (if any) of those languages. A second language can be much easier to learn if it is more similar to your native tongue, and I think this holds for programming languages as well.

We learn new things in terms of what we already know.

One of my favourite things about Perl (well, not Perl 6) is the fact that it has noun classes, inflected with sigils (basically typed dereferencing operators). $x (“the scalar x”) is a completely different thing from @x (“the array x”), which too is a completely different thing from %x (“the hash x”). Agreement in type is analogous to grammatical agreement in number or gender, which is very common in human language.

Different things should look different; related things, related.

Another thing Perl mimics in natural language is implicit reference. $_ is like the pronoun “it”, a default thing, the current subject of discussion, which in many situations can be assumed. Programming languages use explicit reference almost exclusively. In order to perform a series of operations on a value, the programmer must explicitly name that value for every operation. Oddly enough, this is even the case in the highly English-like Inform.

In one of the object-oriented languages of which everyone seems to be so fond, this could be as simple as keeping track of the current object “under discussion” and allowing it to be assumed where an object would otherwise be expected.

Languages should let us elide repetition.

One thing I like about concatenative languages such as Forth and Factor is that there is essentially no named state—you can create variables as a convenience, but at the heart of it, everything is just dataflow.

Computer languages are (characteristically) so wholly unlike and inferior to natural language that it’s almost comical to call them languages at all. You express ideas every day in your native tongue that have no analogue in any programming language in existence. In most languages, you can apply productive rules to derive new terms, whereas basically all programming languages are purely isolating with positional syntax.

Naturalistic programming languages will never be pretty. They are not minimal, or elegant, or simple, but despite all that, they are intuitive and useful. More importantly, they meet users’ expectations about how language is supposed to work.

Humans have expectations. Do not judge them; only exploit them.

All of the “ugly” features of natural languages evolved for specific reasons, and the designs that evolution has wrought can be borrowed to create better designs for our artificial languages.

♦ ♦ ♦

Erroneous Errors

Real-world languages have loads of redundancy, which greatly improves error recovery; a programming language with judicious syntactic redundancy can issue warnings instead of errors for imperfect but understandable input, improving the user’s experience.

If a compiler can deliver detailed diagnostic messages, then it must do so; if there is not enough information to provide a meaningful diagnostic, then it shouldn’t provide any more information than is relevant to the programmer.

If I write int f(X x), where X is an undeclared type, the compiler should not do what GCC does, which is to write the following:

error: expected ‘)’ before ‘x’

This tells me nothing about what is actually wrong, but in an effort to be specific, it gives me misleading specific information. If I were to insert ) before x, I would then get:

error: expected declaration specifiers before ‘x’

Followed by further, largely nonsensical errors that depend on what follows the declaration of f(). It should say either something both specific and helpful, such as:

error: use of undeclared type ‘X’ in parameter list of function ‘f’

Or, if that is not possible, then at least something not so specific that it becomes incorrect:

error: the parameter list of function ‘f’ is invalid

Errors should be useful, or vague enough not to be misleading.

This is not just an implementation issue. Languages are often structured in such a way that it is very difficult to provide meaningful error messages, because constructs are too context-dependent or informationally sparse to obtain a meaningful message from an appropriately narrow and targeted view of the source.

♦ ♦ ♦

Terrible Typography

Most programmers don’t seem to care about legibility. If they did, they would probably be angry that we still use fixed-width fonts for programming. Our programming languages aren’t designed in such a way that they look any good in proportional-width fonts, and monospaced fonts are needed to carry out tabular formatting in editors that don’t support the sanest known way to handle tabulation.

Sigh. At least most of us agree that it’s a good idea to keep code width low. It is a bit sad, though, that we call the rule of thumb the “80-column rule”.

I yearn for the day when we measure code width in ems.

Monospaced fonts arose in the first place due to technical limitations: first, because typewriters could only move a fixed distance per character typed, and second, because it was easier to address characters on a graphical display as a regular grid of fixed-size sprites. Fixed-width fonts are a typographical oddity that survived through tradition and little else.

Programming notation is the way it is primarily because of the fallacy that programming is mathematics. In reality, writing software is also very much like, well, writing.

Programming notation should evoke mathematics and prose alike.

In prose, for example, punctuation merely explicates the structure and organisation of the text, and gives hints as to cadence and prosody. In programming languages, as in mathematics, punctuation abbreviates common operations—but it is also abused to stand in for structures that, in mathematical notation, would ordinarily be indicated with richer formatting.

The formatting problem is a side-effect of the fact that we still live in the Stone Age when it comes to our notational character set. No offense to ASCII, but it was already showing its age in the late 80s when Unicode showed up.

But thanks to the peculiar tenacity of fixed-width typefaces, most programming languages can still be comfortably written on a 50-year-old typewriter. How’s that for backward compatibility?

♦ ♦ ♦

Impossible Input Methods

On account of the limited circa-1960 palette of characters we use in our languages, we’re constantly making notational compromises, approximating glyphs with digraphs and trigraphs. Should “<=” really mean “≤” when it actually looks like “⇐”?

When I tutored computer science, I saw students write => instead of >= many a time, because they expected the conceptual reverse of <= to be the visual reverse of it, as with ≤ and ≥. Similarly, the students frequently confused the left and right sides of assignment—i.e., they would write y = x when they meant x = y—because with “=” there is no visual indication of the direction of assignment, nor of the fact that mutating assignment is not the same as mathematical equality.

Why don’t we use hyphens for hyphenation, minus signs for subtraction, and dashes for ranges, instead of the hyphen-minus for all three? Why do we use straight quotes ("") when curved quotes (“”) can be nested (or contain straight quotes) without escaping? There’s a wealth of time-tested typographical convention in both mathematics and prose that programming language designers simply discard without a second thought. Without a first thought.

Throw away traditions that don’t work; respect those that do.

These compromises would be totally unnecessary thanks to Unicode, but our editors and input methods lag so far behind that it’s still infeasible to comfortably enter many non-ASCII characters without dedicated editor support.

And no such support exists, because a language that uses “->”, which can be written in any editor, is more marketable than one that uses “→” and relies (however little!) on outside tools. On Linux I can use a compose key, which is only a mild inconvenience, but on Windows I’m stuck with Alt codes or Character Map.

We value terseness in a language because it increases the information density of the code, so we can work with more meaning at once, both per screen and per brain. Punctuation symbols are generally the most concise way to express a concept, and some symbolic notation in a language is absolutely good, insofar as it reduces cognitive load and eye movement. But the legibility of a punctuation-heavy language suffers just as greatly as that of one without any punctuation at all.

♦ ♦ ♦

The Fear

Perhaps the biggest problem with programming language design is that, because it is so bad, people are afraid to use tools that can help them. They are afraid of the whitespace sensitivity of Python, because they assume it will violate their expectations, and therefore cause them a hassle. In reality, whitespace sensitivity in Python and Haskell are totally innocuous, and actually quite helpful—because they are designed with some thought behind them.

When you only know the bad, you quietly assume all things are bad.

It’s no wonder people stick so tenaciously to a single language or language family. Their expectations were violated time and again when they were learning how to program, so they assume that learning a new language is always like that—and sadly, it often is. They don’t want their expectations to be violated again, so they stick to what they know. Even if it’s bad, at least it’s familiar.

We need to get rid of that kind of thinking—not just so that language design can move forward, but so that we can quit worrying about languages and get shit done.