(If you don’t have the time or desire to read all of the ranting, at least skim the boldfaced points. There’s some good stuff in there.)
The Humanity!
When it comes to any kind of functional design, user experience is not just some tangential concern: it is of the utmost importance. The moment you make a design choice that doesn’t benefit the user, you’re doing it wrong, and your design will be bad. I’ll get more articulate in a minute, but first I want to admonish all language designers. You are wrong and bad.
We’re all guilty of it. No one can design something useful that will satisfy everybody’s expectations. But we can avoid making language design choices that benefit the computer without benefiting the programmer—such choices are nonsensical and absurd.
♦ If you’re not designing for humans, why are you designing at all? ♦
That doesn’t mean we shouldn’t explore new computational paradigms, new ways of reasoning about and optimising programs. What it means is that we should do so in the spirit of making programming easier and more enjoyable, so that developers can focus on the problems that they want to solve, rather than the linguistic obstacles that get in their way.
♦ ♦ ♦
Non-linguistic Languages
I find it baffling that most designers of programming languages give no concern to the language aspect. We are linguistic creatures by nature, and we have strong intuitions about how language is supposed to work.
Larry Wall is a notable (partial) exception to the norm, appropriate considering he studied linguistics back in the day. Perl is what you might call a semi-naturalistic programming language. This has nothing to do with “natural-language programming”, which is largely bunk: natural language itself is not suitable for programming computers any more than Nahuatl is suitable for talking to a Welshman.
A naturalistic language, then, is one that exhibits usability characteristics of natural languages. I’ll wager that everyone’s first languages are either spoken or signed, followed shortly by the written forms (if any) of those languages. A second language can be much easier to learn if it is more similar to your native tongue, and I think this holds for programming languages as well.
♦ We learn new things in terms of what we already know. ♦
One of my favourite things about Perl (well, not Perl 6) is the fact that it has noun classes, inflected with sigils (basically typed dereferencing operators). $x (“the scalar x”) is a completely different thing from @x (“the array x”), which too is a completely different thing from %x (“the hash x”). Agreement in type is analogous to grammatical agreement in number or gender, which is very common in human language.
♦ Different things should look different; related things, related. ♦
Another thing Perl mimics in natural language is implicit reference. $_ is like the pronoun “it”, a default thing, the current subject of discussion, which in many situations can be assumed. Programming languages use explicit reference almost exclusively. In order to perform a series of operations on a value, the programmer must explicitly name that value for every operation. Oddly enough, this is even the case in the highly English-like Inform.
In one of the object-oriented languages of which everyone seems to be so fond, this could be as simple as keeping track of the current object “under discussion” and allowing it to be assumed where an object would otherwise be expected.
♦ Languages should let us elide repetition. ♦
One thing I like about concatenative languages such as Forth and Factor is that there is essentially no named state—you can create variables as a convenience, but at the heart of it, everything is just dataflow.
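As a rough illustration of what “everything is just dataflow” means, here is a toy postfix evaluator in Python. It is not Forth or Factor, and the handful of words it defines are only my own small assumed subset; the point is that the program names no values at all.

```python
def run(program, stack=None):
    """Evaluate a whitespace-separated postfix program on a stack."""
    stack = [] if stack is None else stack
    words = {
        # Each word consumes and produces values on the stack.
        "dup": lambda s: s.append(s[-1]),
        "swap": lambda s: s.extend([s.pop(), s.pop()]),
        "+": lambda s: s.append(s.pop() + s.pop()),
        "*": lambda s: s.append(s.pop() * s.pop()),
    }
    for token in program.split():
        if token in words:
            words[token](stack)
        else:
            stack.append(int(token))  # anything else is an integer literal
    return stack


# 3 squared plus 4 squared, without a single variable.
print(run("3 dup * 4 dup * +"))  # → [25]
```

Each word simply picks up whatever the previous word left behind, which is about as implicit as reference can get.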
Computer languages are (characteristically) so wholly unlike and inferior to natural language that it’s almost comical to call them languages at all. You express ideas every day in your native tongue that have no analogue in any programming language in existence. In most languages, you can apply productive rules to derive new terms, whereas basically all programming languages are purely isolating with positional syntax.
Naturalistic programming languages will never be pretty. They are not minimal, or elegant, or simple, but despite all that, they are intuitive and useful. More importantly, they meet users’ expectations about how language is supposed to work.
♦ Humans have expectations. Do not judge them; only exploit them. ♦
All of the “ugly” features of natural languages evolved for specific reasons, and the designs that evolution has wrought can be borrowed to create better designs for our artificial languages.
♦ ♦ ♦
Erroneous Errors
Real-world languages have loads of redundancy, which greatly improves error recovery; a programming language with judicious syntactic redundancy can issue warnings instead of errors for imperfect but understandable input, improving the user’s experience.
If a compiler can deliver detailed diagnostic messages, then it must do so; if there is not enough information to provide a meaningful diagnostic, then it shouldn’t provide any more information than is relevant to the programmer.
If I write int f(X x), where X is an undeclared type, the compiler should not do what GCC does, which is to write the following:
error: expected ‘)’ before ‘x’
This tells me nothing about what is actually wrong, but in an effort to be specific, it gives me misleading specific information. If I were to insert ) before x, I would then get:
error: expected declaration specifiers before ‘x’
Followed by further, largely nonsensical errors that depend on what follows the declaration of f(). It should say either something both specific and helpful, such as:
error: use of undeclared type ‘X’ in parameter list of function ‘f’
Or, if that is not possible, then at least something not so specific that it becomes incorrect:
error: the parameter list of function ‘f’ is invalid
♦ Errors should be useful, or vague enough not to be misleading. ♦
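Here is a sketch of that policy in Python: be specific when the checker actually knows what is wrong, and fall back to a vague but true message when it does not. This is a hypothetical toy that only understands declarations shaped like “type name(type param, …)”; it has nothing to do with how GCC or any real compiler is built, and the function name diagnose is my own.

```python
import re

# Hypothetical toy grammar: return_type name(type param, type param, ...)
DECL = re.compile(r"^\s*(\w+)\s+(\w+)\s*\((.*)\)\s*$")


def diagnose(decl, known_types):
    """Return a list of diagnostics for one function declaration."""
    match = DECL.match(decl)
    if match is None:
        # We know nothing useful, so we claim nothing specific.
        return ["error: the declaration is invalid"]
    _return_type, name, params = match.groups()
    errors = []
    for param in filter(None, (p.strip() for p in params.split(","))):
        parts = param.split()
        if len(parts) != 2:
            # Not sure what went wrong: stay vague rather than guess.
            errors.append(f"error: the parameter list of function ‘{name}’ is invalid")
            break
        type_name, _param_name = parts
        if type_name not in known_types:
            # Here we do know the cause, so we can afford to be specific.
            errors.append(
                f"error: use of undeclared type ‘{type_name}’ "
                f"in parameter list of function ‘{name}’"
            )
    return errors


for message in diagnose("int f(X x)", {"int", "char"}):
    print(message)
```

The specific message is only emitted on the branch where the cause has been established; every other branch reports no more than it can vouch for.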
This is not just an implementation issue. Languages are often structured in such a way that it is very difficult to provide meaningful error messages, because constructs are too context-dependent or informationally sparse to obtain a meaningful message from an appropriately narrow and targeted view of the source.
♦ ♦ ♦
Terrible Typography
Most programmers don’t seem to care about legibility. If they did, they would probably be angry that we still use fixed-width fonts for programming. Our programming languages aren’t designed in such a way that they look any good in proportional-width fonts, and monospaced fonts are needed to carry out tabular formatting in editors that don’t support the sanest known way to handle tabulation.
Sigh. At least most of us agree that it’s a good idea to keep code width low. It is a bit sad, though, that we call the rule of thumb the “80-column rule”.
♦ I yearn for the day when we measure code width in ems. ♦
Monospaced fonts arose in the first place due to technical limitations: first, because typewriters could only move a fixed distance per character typed, and second, because it was easier to address characters on a graphical display as a regular grid of fixed-size sprites. Fixed-width fonts are a typographical oddity that survived through tradition and little else.
Programming notation is the way it is primarily because of the fallacy that programming is mathematics. In reality, writing software is also very much like, well, writing.
♦ Programming notation should evoke mathematics and prose alike. ♦
In prose, for example, punctuation merely explicates the structure and organisation of the text, and gives hints as to cadence and prosody. In programming languages, as in mathematics, punctuation abbreviates common operations—but it is also abused to stand in for structures that, in mathematical notation, would ordinarily be indicated with richer formatting.
The formatting problem is a side-effect of the fact that we still live in the Stone Age when it comes to our notational character set. No offense to ASCII, but it was already showing its age in the late 80s when Unicode showed up.
But thanks to the peculiar tenacity of fixed-width typefaces, most programming languages can still be comfortably written on a 50-year-old typewriter. How’s that for backward compatibility?
♦ ♦ ♦
Impossible Input Methods
On account of the limited circa-1960 palette of characters we use in our languages, we’re constantly making notational compromises, approximating glyphs with digraphs and trigraphs. Should “<=” really mean “≤” when it actually looks like “⇐”?
When I tutored computer science, I saw students write => instead of >= many a time, because they expected the conceptual reverse of <= to be the visual reverse of it, as with ≤ and ≥. Similarly, the students frequently confused the left and right sides of assignment—i.e., they would write y = x when they meant x = y—because with “=” there is no visual indication of the direction of assignment, nor of the fact that mutating assignment is not the same as mathematical equality.
Why don’t we use hyphens for hyphenation, minus signs for subtraction, and dashes for ranges, instead of the hyphen-minus for all three? Why do we use straight quotes ("") when curved quotes (“”) can be nested (or contain straight quotes) without escaping? There’s a wealth of time-tested typographical convention in both mathematics and prose that programming language designers simply discard without a second thought. Without a first thought.
♦ Throw away traditions that don’t work; respect those that do. ♦
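The claim about curved quotes is easy to check with a few lines of Python. Because the opening and closing marks are distinct characters, a reader only needs to count depth. The function read_quoted below is a hypothetical helper of my own, a sketch rather than a real string lexer.

```python
def read_quoted(text, start=0):
    """Return (contents, end) for the curved-quoted string opening at text[start]."""
    if text[start] != "“":
        raise ValueError("expected an opening curved quote")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "“":
            depth += 1  # a nested string opens
        elif text[i] == "”":
            depth -= 1  # a string closes
            if depth == 0:
                return text[start + 1 : i], i + 1
    raise ValueError("unterminated string")


source = 'print “She said “hi” and typed "x"”'
contents, end = read_quoted(source, source.index("“"))
print(contents)
```

The nested quotation and the straight quotes both survive without a single backslash. A lone, unbalanced curved quote inside a string would still need some form of escaping, so this reduces the problem rather than eliminating it.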
These compromises would be totally unnecessary thanks to Unicode, but our editors and input methods lag so far behind that it’s still infeasible to comfortably enter many non-ASCII characters without dedicated editor support.
And no such support exists, because a language that uses “->”, which can be written in any editor, is more marketable than one that uses “→” and relies (however little!) on outside tools. On Linux I can use a compose key, which is only a mild inconvenience, but on Windows I’m stuck with Alt codes or Character Map.
We value terseness in a language because it increases the information density of the code, so we can work with more meaning at once, both per screen and per brain. Punctuation symbols are generally the most concise way to express a concept, and some symbolic notation in a language is absolutely good, insofar as it reduces cognitive load and eye movement. But the legibility of a punctuation-heavy language suffers just as greatly as that of one without any punctuation at all.
♦ ♦ ♦
The Fear
Perhaps the biggest problem with programming language design is that, because it is so bad, people are afraid to use tools that can help them. They are afraid of the whitespace sensitivity of Python, because they assume it will violate their expectations, and therefore cause them a hassle. In reality, whitespace sensitivity in Python and Haskell is totally innocuous, and actually quite helpful—because it was designed with some thought behind it.
♦ When you only know the bad, you quietly assume all things are bad. ♦
It’s no wonder people stick so tenaciously to a single language or language family. Their expectations were violated time and again when they were learning how to program, so they assume that learning a new language is always like that—and sadly, it often is. They don’t want their expectations to be violated again, so they stick to what they know. Even if it’s bad, at least it’s familiar.
We need to get rid of that kind of thinking—not just so that language design can move forward, but so that we can quit worrying about languages and get shit done.
People who don't know about typography would have difficulty noticing the difference between an en-dash, an em-dash, a hyphen and a minus sign. It would be a *really* bad idea to introduce those distinctions into a language, which may be copied and pasted through dozens of different transmission media, through some mix of Windows 1252, UTF-8, ISO-8859-1; if your code isn't 7-bit clean ASCII, you're asking for problems.
The problem there isn't input (though that's also a problem - contra Compose, the fewer modifier keys we have and need to use, the better), it's heterogeneity of communication media.
On other points, you're just wrong. We try to reduce the number of concepts in languages, not arbitrarily increase them; if you want to apply multiple operations to a single thing without repeating the thing explicitly, the tool you should reach for is function composition, not implicit (global!) arguments. Mentioning Perl in a favourable light is an excellent way to discredit your argument.
But more generally, the function abstraction is how we elide repetition.
And in *all* programming languages, we "apply productive rules to derive new terms" - how exactly do you think a language grammar made up out of productions and non-terminals works? Or are you really sold on the idea of intra-identifier syntax, where you lose even greppability?
@Barry Kelly:
Your thinking about encoding is stuck in the past. We should use a modern encoding and update existing tools to handle it. There is no excuse for an editor not to support Unicode nowadays, full stop.
I’m not saying we should arbitrarily increase the number of concepts in a language—and I’m not saying implicit arguments are necessarily a good idea. I like concatenative languages a lot, and mostly because they’re pointfree, so I’m the last person who needs to be told the benefits of function composition.
I mention Perl favourably (kind of) because it’s got a few good ideas behind it. Obviously there are plenty of bad things about it as well, but I didn’t mention them because they weren’t relevant.
I was unclear in my use of the word “term”. What I meant is that every lexeme has only one form, the lemma—programming languages generally have little notion of agreement (except abstractly, in type systems) nor of tense, so there is generally no inflection or derivation.
I disagree with many of the points you've made but don't have the time to fully answer them. I will however point you towards the Fortress programming language, which you seem to have ignored.
@YHVH:
Okay. You can always email me or comment another time. I’ve been aware of Fortress since the first announcement about it. At the time I disagreed with their Unicode-heavy syntax, but in retrospect I think they did make a few good choices when it comes to separating code content from presentation.
Programming *is* math. I find it strange that you can say this while also mentioning Haskell.
Would you also say that legalese sucks at UX?
Have you considered switching your comment box to use a variable-width font?
I used to also think that perl's human-linguistic design was positive, but have been convinced otherwise. Anecdotally, if I don't write perl every day, I find it harder to go back and read weeks or months later than any other language.
There is now also research on this point; a recent paper compared learning and retention between perl, a language with "randomly" generated syntax, and a research language designed for learnability. Perl performed no better than the random language.
You may find Rich Hickey's "Simple vs Easy" talk provides some insight on these issues.
@Corbin Simpson:
I don’t think programming is strictly mathematics, because it includes a number of similarities to prose. I would say that legalese has bad UX, yes, mainly because it’s a misguided attempt to use natural language itself for formal specification.
Unfortunately I don’t seem to be able to change the font in my comment box, owing to the iframe that Blogger uses.
On "Erroneous Errors": check this out! http://clang.llvm.org/diagnostics.html
How is there zero mention of Ruby in the article when this is exactly the problem Matz was trying to solve when he created Ruby??
@Anonymous:
I’m not a big Ruby user. It is a nice language, though.
Any thoughts about HyperTalk?
(A lot of people who were new to programming adapted to it quickly, but I never used it.)
Alan Kay was not happy about using the term, "programming language". I never found out what direction he was thinking in. I consider programming languages text-based user interfaces. The service, iwantsandy, is an example of what I mean: http://lifehacker.com/321644/sandys-your-personal-assistant-via-email
(Twitter bought it and shut it down. Many people were sad about that. Same with HyperCard.)
"Agreement in type is analogous to grammatical agreement in number or gender, which is very common in human language."
Common, sure, but desirable? I've tried to learn many languages, and the ones with junk like number and gender are *by far* the hardest to learn.
Arbitrary inconsistent spelling is common in human languages, too, but that doesn't mean it's a good thing. It just means that historical precedent is an extremely strong force with human language: we make schoolchildren suffer through years of memorizing these things rather than simply fix them to be more consistent.
If I had a dollar for every time a non-native English speaker has asked me to explain the difference between "a"/"the"/"some"/...
"In one of the object-oriented languages of which everyone seems to be so fond, this could be as simple as keeping track of the current object “under discussion” and allowing it to be assumed where an object would otherwise be expected."
You mean like how 'this' can be omitted in C++/C#/Java when it is not ambiguous to do so? Or the 'with' statement in Visual Basic?
"You express ideas every day in your native tongue that have no analogue in any programming language in existence."
Humans are great at inferring context and dealing with ambiguities - computers not so much.
"Naturalistic programming languages will never be pretty. They are not minimal, or elegant, or simple, but despite all that, they are intuitive and useful."
You called Perl a naturalistic programming language, yet I find it to be far from intuitive.
"where X is an undeclared type, the compiler should not do what GCC does, which is to write the following"
clang greatly improves error messages like these.
*Should “<=” really mean “≤” when it actually looks like “⇐”?*
FWIW, I can easily type a number of mathematical symbols on my keyboard layout: ∩∪∘→↦∴∵⇒⇔π±≠∑∏÷∞Ø≝∀∈∉×Δ⊆⊂⊃⊃⊇ℕℤℙℚℝℂℍ Yes, it's not standard, but I find it most useful to be able to use them in documentation without having to look up Unicode tables or anything like that.
"because with “=” there is no visual indication of the direction of assignment, nor of the fact that mutating assignment is not the same as mathematical equality."
I've started to favour using 'x <- y' to assign y to x and 'x = y' to compare x and y in my toy language designs.
"There’s a wealth of time-tested typographical convention in both mathematics and prose"
That's a very good point actually. Besides your examples I cannot think of any uses for it yet, but then I haven't thought about it really, so I'm sure I'll warm up to it more as I do! Now, how do we get programmers to use keyboard layouts like mine where I can actually type some of these symbols? Sadly, until all this stuff is easy to type, people will avoid your language because it uses weird symbols that are hard or awkward to type. APL used a nice set of symbols, but few people own an APL keyboard anymore...
"but our editors and input methods lag so far behind that it’s still infeasible to comfortably enter many non-ASCII characters without dedicated editor support."
Exactly.
"but on Windows I’m stuck with Alt codes or Character Map."
I mapped them to Alt Gr + Shift + letter. This is easy for me to type since Alt Gr is easy to hit with my right thumb. I use unshifted Alt Gr for symbols that I use A LOT but are annoying to type, like ( and ).
"whitespace sensitivity of Python"
Until recently, I was unsure if I liked this for my own language designs. It never bothered me in Python, but I didn't know if I wanted it in a language I designed. Now I love this idea and try to use it where it makes sense. IMHO, it is because of the whitespace sensitivity that Python often looks a bit like pseudocode. Sure, stuff like 'for x in y' helps, but the whitespace is what I think has the largest impact on readability. Unfortunately, I still hear people complain about it the same way people complain about the parentheses in Lisp.
"It’s no wonder people stick so tenaciously to a single language or language family. Their expectations were violated time and again when they were learning how to program, so they assume that learning a new language is always like that"
Completely agree.
You can easily set up a compose key for Windows with WinCompose (https://github.com/SamHocevar/wincompose). I used it all the time when I was writing proofs for my CS degree but didn't want to bother with LaTeX.
One pro for the restriction to using ASCII is that ASCII symbols (!@#$%^&*()?.,:<>) form an alphabet; therefore if someone sees <= then they know how to type it on any device, and they know how to say it aloud to someone else: "less than, equals". \forall has this benefit, whereas the logographic "∀" is opaque: if you've never learned the symbol, you cannot pronounce it, and I dare say find it harder to remember.