28 September 2011

How to Be Productive

Blindingly Obvious Truths

This is how to be productive. Most of these things are absurdly, staggeringly obvious. That’s kinda my style. Anyway, since not very many of us are the kind that get shit done, we apparently need to be reminded every once in a while. Pin this to your wall—pin it to all of your walls—and look at it to give yourself a mental kick in the ass when you know you should be working.

Get out of bed.

If, like me, you have delayed sleep phase syndrome, then fix it. If it’s hard to get up when the sun comes up, then your sleep schedule is probably not healthy, and it is negatively affecting your productivity. We are a diurnal species. Deal with it.

Be healthy.

It ought to go without saying that you need good diet and exercise habits to be healthy. What you may not realise is that health is happiness, and for many of us, happiness is productivity. (I can’t say all, because surely there are some dark tormented programmers out there whose work is fueled by the darkness and torment in their dark tormented souls of tormentful darkfulness. Or something.)

For instance, when my blood sugar is low, I hate the world—then I eat a sandwich and everything’s fine. If you feel like hell all the time, your body is probably trying to tell you something. Sleep right, eat well, exercise occasionally, spend time with friends, find a mate, and mate with them often.

Work on one thing.

Multitasking is doing many things at once—badly. Innumerable studies have shown that multitaskers are just fooling themselves into believing they’re improving their productivity. You can be a true “jack of all trades” if you work on one thing at a time. Don’t distract yourself from what you’re working on with thoughts of what you’re not working on. It’s an endless cycle of self-subversion that can only be broken with focus.

If I want to write a blarticle, I just do it. If I need to do homework, I just—well, okay, first I complain a lot, but then I just do it. I didn’t post a vlog entry for months, and then I said “no, this will not do” and made something. It was shitty. It doesn’t matter. I did it.

Work till you’re done.

Have a project that needs finishing? Don’t waste your valuable time making small changes. Get your hands dirty and implement that feature you want. No one else is going to do it, and you have the time. It probably won’t even be as hard as you think. If it seems guaranteed to be difficult, you might be fixating on a difficult solution when there is a much simpler one. Reflect.

When you’re done, stop.

You must take time to relax and play. If you work hard, you earn the right to relax hard. You will feel the profound satisfaction that comes from creating something through your own focused efforts, and your sleep will be easy and deep.

Now go, and do.

27 September 2011

Simulation is Abstraction is Power

A more abstract programming language is more powerful. This is one of precious few things that most software developers agree on. C is more powerful than assembly, C++ is more powerful than C, and Lisp is more powerful than most developers realise—or will ever take advantage of. And that’s fine.

But whence cometh abstraction? Sure, you can say that all abstractions are generalisations of some category of operations or structures, but that’s shallow and basically useless. There’s more to it.

Abstraction comes from simulation. Computers run simulations of idealised computation; more abstract languages are better at simulating the ideal. Low-level languages deal with real hardware, while high-level languages simulate ideal hardware. A few examples:

  • References simulate unstructured memory.
  • Garbage collection simulates unbounded memory.
  • Dynamic typing simulates lack of typing.
  • Threads simulate parallel execution.
  • Tail recursion simulates unbounded recursion.

(Depending on how far you want to go with the analogy, recursion simulates a whole bunch of things, including self-reference, fractal structure, and induction.)

Simulation is about removing dependency. A program can depend on manual memory management, but it cannot, by definition, depend on garbage collection. With memory management fully abstracted away, memory management ought to be a nonexistent concept from the perspective of the language.

Your program probably pretends it has infinite memory anyway. Provided your program doesn’t crash, the system does have unbounded memory as far as you know. Most programs that are working properly don’t run out of memory unless they’re deliberately designed to, for lack of a better phrase, fuck shit up.

Similarly, as we all know, threads may actually run on separate cores, and thus truly in parallel—that’s sorta the point. But the thing about a language that offers powerful abstractions for parallelism is that you don’t need to care. It’s taken care of. As far as you need to know, the system can perform an infinite number of computations at once.

Abstractions are (imperfect) simulations of limitlessness. As a language designer, you can (and should) offer pragmatic limitations and safeguards for the benefit of the fallible developer. But to make a more powerful language, you must remove limitations. When you do so, you discover the nature of the underlying simulation—and simulation is abstraction is power.

22 September 2011

Tricky Programming Concepts Aren’t

What Teachers do Wrong

Teachers use subversive language. They say things like “this is a bit tricky” before introducing a concept. It’s intended to remind you to pay attention, but mostly it turns students off—it says “this is hard; I don’t expect you to understand it”.

Frankly, that’s bullshit, and it has appeared—and caused problems—in nearly every class I’ve ever taken. It’s especially visible in mathematics and the sciences, but foreign language, social studies, literature, and even drafting are not immune.

The fine arts, notably, do avoid it: it’s not conceptually difficult to improve your process and observational skills—and no art teacher I’ve ever encountered pretends it will be anything other than hard work and dedication. (Paul Graham got this one right: hacking and painting are very similar.)

I may have an irresponsibly intertwingled worldview, but it seems to me that most things are not conceptually difficult. At all. Even the most high-level concept, explained correctly, can be absorbed and applied by any person of average intelligence. The important thing is the explanation.

Learning should not be about overcoming difficult ideas: it should be about realising how simple those ideas actually are.

I think the main reason that fine art instruction is effective is that, when a student says “I can’t draw/paint/compose/&c.”, the teacher’s natural response is to say “you can—you must; just practice”.

And they do, and they get better. Most people are not going to become master painters or composers, of course, for the same reason that most people won’t become masters of anything. But perhaps more of the people who are capable of mastery would strive for it if we used encouraging, practical teaching methods like we know we ought to.

♦ ♦ ♦

What Programming is About

Programming is fundamentally about doing three things with a problem:

  • Discovery
    First, know what problem you need to solve. Finding interesting problems is half the fun of research. If you don’t mind solving uninteresting problems, you probably aren’t a very interesting programmer, or even very interested in programming.
  • Understanding
    Once you know the problem, you need to understand it deeply. You need to understand its subproblems, and their subproblems, all the way down to the units that make up your problem domain. (This is recursion.)
  • Expression
    When you understand the problem, you can express it. This is not just writing the code you designed earlier, as some “software engineering” evangelists will have you believe. Expression itself contains discovery and understanding, which leads to further expression. (This is tail-recursion. It’s also why the “waterfall” method is flawed.)

This is exactly like the process of essay writing: you discover your thesis, engage in research to understand your topic, then begin the process of expressing yourself in a logically connected essay. While figuring out your expression, you discover and understand further details and incorporate them into further expression.

An essay is ideally just a proof of a thesis; similarly, a program is ideally a solution to an intellectual problem. Both have practical effects in the real world, and at the heart of it, they really are the same thing.

♦ ♦ ♦

“Tricky” Programming Concepts

Two things are consistently taught in such a way that beginning programmers fail to properly understand them. Things that are so stunningly simple and pervasive that it’s baffling that anyone could fail to understand them.

I’m talking of course about pointers and recursion.

What baffles me is how people seem to be so inclined to analogy when the vast majority of programming concepts are stupendously literal.

A pointer points. A reference refers. These are the same thing.

(Don’t the C++ folks in the audience give me any crap about C++ references. They’re not magic, they’re non-nullable immutable pointers—just another reference type.)

The address of a thing is not the thing itself, but it is good enough (and often, in fact, better). I can steal anything in your house, break your windows, burn it down, and all I need to know is where your house is. I can use the words “burn your house down”, and even though I have not actually burnt your house down, you know exactly what I’m referring to.

Reference is a fucking pillar of language. I say “fucking” because it’s fucking important. We use finite-sized symbols to refer to concrete objects and abstract concepts alike, and it takes no more cognitive effort to say the word “apple” than it does to say the word “universe”. (Referring to something is a constant-time operation.)

Most people have no trouble understanding the concept of reference, of course—but then they fail to see why it’s important, least of all in programming. But again, you don’t even need analogies here—the explanations are comically, tautologically simple:

  • A linked list is a list of things that are linked.
  • A tree is a hierarchy of things that are linked.
  • A queue is a queue of things.
  • A stack is a stack of things.

If someone doesn’t understand these things, they’re probably trying too hard. To implement a linked list, just open your eyes and think about what the term means: it is a list, and at each place there must be a thing, and the address of the next thing if there is one.

That’s all.

Recursion is even more important to programming, even less understood by beginners, and yet even more visible in daily life. If you remember one thing, remember this:

References are for referencing, and recursion is for self-referencing.

A mind-squishingly huge number of natural things have fractal structure, from clouds to mountains to plants to coastlines to trees. A branch of a tree looks just like a whole tree, only smaller. Because of this, doing stuff to part of a tree is basically like doing stuff to a whole tree.

You can thus write instructions for doing stuff to entire trees very concisely: to snuggle a whole tree, first snuggle the base of the tree, then snuggle each branch as though it were a whole tree.

This works in less silly examples as well: to sort people by height, pick one, move all the shorter people to his left, and all the taller people to his right, then sort each side. Bam. You’ve just invented quicksort.

♦ ♦ ♦

Don’t Be Fooled

So I think Joel Spolsky got it severely, totally, irresponsibly wrong in his article The Perils of JavaSchools, wherein he basically claims that understanding pointers and recursion is an aptitude, not a skill.

Not only is it the same kind of subversive language that teachers mistakenly use, it betrays a deeply mistaken belief that the concepts of pointers and recursion are somehow inherently difficult.

They are not.

If you think all I’m saying is “programming is not as hard as people make it out to be”, then you’ve missed the point. Programming is hard, like writing and design, because it is intimately related to both. It’s just that the basics of programming are no harder than the basics of any other kind of design.

Pointers and recursion are among the most basic things in the world. The real world, the one in which we live—and that’s why they show up in computing in the first place.

♦ ♦ ♦

My sincerest thanks to all of the kind folks at Hacker News (and now /r/programming!) for reading, upvoting, commenting, and +1ing. Your many praises and criticisms are deeply appreciated, for giving me the motivation and humility to write even better content in the future.

17 September 2011

What’s Wrong with Lisp

Lisp is a brilliant language. You can do things in it that are impossible in other languages, and that’s wicked cool. But though I think there is an upper limit to language expressiveness, I don’t believe Lisp represents that limit. Here’s why.

A quote I see from time to time—and for some reason have been seeing a lot lately—comes from the author of perhaps my favourite book, Le Petit Prince, the marvelous Antoine de Saint-ExupĂ©ry:

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.”

In short, minimalism = perfection, and Lisp (well, Scheme anyway) favours it: uniform syntax, a few special forms, and then it’s turtles all the way up.

But even though I’d be the first to say that there’s a lot to take away from a thorough study of Lisp, there’s also a lot that could be taken away from Lisp itself, and for that reason I can only recommend it as an intermediate step in a programmer’s journey to linguistic nirvana.

See, this may come as a shock, but Lisp is not the most powerful language. Just as there are countless things that are ugly in most languages but mundane (if not beautiful) in Lisp, so too are there things that are ugly in Lisp and beautiful in other languages. They’re easy to see when you go looking for them.

It all boils down to the ability to concisely represent patterns. When a programmer in language X sees repetitive code, he refactors it using the tools available in X. To the programmer of some better language Y, even the most beautiful solution in X is imperfect, because he sees it as a workaround for the fact that X does not have what Y does. A Band-Aid solution, if you will.

Lisp macros are seen as the final word in abstraction and beauty. See repetitive code, make a macro, repeat. But aren’t macros the same kind of Band-Aid solution that exists in every language? Lisp just has nicer Band-Aids. (Ones with spaceships and unicorns on them.) And make no mistake, there is a language with nicer Band-Aids than Lisp.

To paraphrase Paul Graham: when you look down the spectrum of programming language power, you know you’re looking down; but when you look up, all you see are weird languages. Lisp programmers look down on all other languages—but what languages look weird to them?

I advise Lisp programmers to get off their pedestal and look through the weird, the esoteric, and the insane, to reach the power of what lies beyond. Question your assumptions. Why the oh-so-important parentheses? Simple: nesting! So why do you need nesting? To create scope. Why do you need scope? Variables, of course. So why do you need variables? To store values, obviously. What is the purpose of a value? To model state. Why do you need state? You don’t.

If you model everything in terms of data flow, for example, you no longer need state or any of the other things that come with it. You can write your program solely in terms of function composition. You don’t even need scoping (lambda abstraction) or application at all. You don’t need an absurd Haskellian type system to keep things functionally pure, either. 

The point is that lambda calculus is only the beginning. It is (for lack of a better term) the assembly language of functional programming. A language that directly models the operations of a Turing Machine is plainly rather boring, but what baffles me is that nobody seems to notice that the same is true of one that models lambda calculus. Lisp dresses it up, but only very little. We can do much better.

Let us take more away, to reveal what more is yet to be.

15 September 2011

How to Pronounce Hexadecimal Numbers

In 1999, the IEC introduced kibibyte and friends, because they were pissed off that kilobyte means 1000 bytes, but programmers wanted it to mean 1024 bytes and were insistently using it that way, correctness be damned. Unfortunately, nobody really gave a damn, and even though some things (Ubuntu for instance) show data sizes in KiB, nobody actually says “kibibyte”.

Someone asked recently on the Programmers Stack Exchange how hexadecimal numbers ought to be pronounced. In a valiant effort to pull an IEC and come up with a useful thing that nobody will use, here’s my system of pronunciation for hex numbers. The system is based on English (specifically American English) because it’s the lingua franca of programming. Enjoy.

Hexadecimal Number Name Decimal Equivalent
0x0000 0001 One 160 = 1
0x0000 000a A (ay) = ten 10
0x0000 000b B (bee) = eleven 11
0x0000 000c C (cee) = twelve 12
0x0000 0010 Tex 1 × 161 = 16
0x0000 0020 Twentex† = two tex 2 × 161 = 32
0x0000 0030 Thirtex† = three tex 3 × 161 = 48
0x0000 0040 Fortex† = four tex 4 × 161 = 64
0x0000 0050 Fiftex† = five tex 5 × 161 = 80
0x0000 0060 Sixtex = six tex 6 × 161 = 96
0x0000 0070 Seventex = seven tex 7 × 161 = 112
0x0000 0080 Eightex = eight tex 8 × 161 = 128
0x0000 0090 Ninetex = nine tex 9 × 161 = 144
0x0000 00a0 Tentex = ten tex 10 × 161 = 160
0x0000 00b0 Eleventex = eleven tex 11 × 161 = 176
0x0000 0100 One hundrex 162 = 256
0x0000 1000 One thousax 163 = 4096
0x0001 0000 Tex thousax = one thoutex‡ 164 = 65536
0x0010 0000 One hundrex thousax = tex thoutex 165 = 1048576
0x0100 0000 One milliox = one hundrex thoutex 166 = 16777216
0x1000 0000 One billiox†† = one thousax thoutex 169

† By analogy with “thirty”, “forty”, &c.

†† This system is based on the short scale—one billiox equals one thousax milliox, and one trilliox is one thousax billiox.

‡ I introduced the term “thoutex”, equal to 0x10000, because it’s more obvious that 0x0001 0000 is “one thoutex” rather than “tex thousax”: four-digit grouping makes more sense in hex, but three-digit grouping is the default in English, so I had to come up with something, and the hypothetical “thouten” sounded good in my mind to adapt.

Fractions are formed as in English, by adding -th: one texth, one hundrexth, one thousaxth, one thoutexth, &c.

Example: 0xa400f20.cd

  • Ten hundrex fourtex thousax, fifteen hundrex twentex and twelvetex thirteen hundrexths
  • Ay hundrex fourtex thousax, eff hundrex twentex and ceetex dee hundrexths
  • Ay milliox, four hundrex thousax, eff hundrex twentex and ceetex dee hundrexths

Happy pronouncing! :)

14 September 2011

Thoughts on Metaprogramming in Very

I'm currently implementing the metaprogramming system in my pet language, Very. The language is inspired by Forth and Lisp, both of which are known for their strong metaprogramming capabilities. Here's how metaprogramming works in Very.

Very is a stack-based, concatenative language: words (the concatenative analogue of functions) manipulate a stack of values to perform computation. Very macros are words that run at compile-time: while words usually manipulate values, macros usually manipulate code. The interpreter reads and parses the source, then repeatedly expands all macro invocations until none remain, before evaluating the result.

Very also has an analogue of Lisp's reader macros: in Lisp, you can specify a callback function that is executed by the source reader when it encounters some predefined character or group of characters. Unfortunately, reader macros must explicitly operate on a stream object, whereas proper macros operate solely on S-expressions.

A Lisp reader macro can function like an ordinary macro by calling read on the stream it receives, to retrieve S-expressions rather than characters, but this interface is awkward, and offers no facility for performing manipulations at the raw token level.

In Very, you can declare that a particular sequence of characters forms a token, which is then treated by the reader as an indivisible unit. Very reader macros are simply ordinary macros, whose name may have been declared as a token. Macros have the ability to manipulate the incoming queue of source at whatever level they choose—there are words for reading (and unreading) terms, raw tokens, and characters. A character is just a number; a string, a list of numbers; and a term, a number or list of terms. It's that simple.

So Very offers better abstractions than Lisp for source manipulation madness: you could implement any language entirely in Very macros, and have it run under the Very interpreter, and doing so would be a lot less painful than in Lisp. The Very web framework will make extensive use of macros to allow arbitrary HTML and CSS to function as ordinary values, and such goodies as automatic validation will come practically for free.

One thing I've been worried about as a result of all this metaprogramming insanity is how to support syntax highlighting of Very code. Languages such as Very that cannot be parsed without being evaluated aren't exactly amenable to tool support, because writing complete tools requires writing a nearly-complete implementation of the language.

I've decided to offer a compile-time facility for semantic annotation, which at the heart of it would tag regions of source with labels. An editor could run Very code through a preprocessing phase to obtain an annotated and escaped version of the source, which would contain enough information to correctly highlight the code.

Semantic annotations could also be used as compiler hints for static typing, parallelisation, and other optimisations. If possible, I'd like to implement annotations as macros—conversely, it may be possible to implement macros by tagging regular words with a compile-time-word annotation. Will update will my progress on that insanity when I come to it.

13 September 2011

This “Trello” Business

So Fog Creek Software has launched a new thingamajig called Trello. Here are my initial thoughts.

Everything needs a purpose in order to survive. (Even if that purpose is to have no purpose. How Zen.) It should be easy to say, concisely, what your story is about, what your software does, and what your product is for. Why did you make it and why do I care?

So what's Trello about? Projects, basically. That's concise enough that it holds my attention, and doesn't immediately shout “useless gimmick” or “totally misguided”. You get a pretty interface (boards) to what is essentially groups (lists) of lists (cards). I usually have an immediate negative reaction to coinage of proprietary slang, and now is no different: I don't know why I'm organising lists of “cards” on a “board”. The physical analogy just isn't doing it for me.

Anyway, by default, you get three top-level groups, presumably for tasks: future (to-do), present (doing), and past (done). It's an intuitive model that people will like, but, well, it's not unique.

And no goal is worth pursuing unless it's unique. Your product will be successful if people want it, and people usually want things that either haven't been done before, or haven't been done right. Project management solutions have been done. Scads of times, in myriad different ways. So the tacit assumption that Trello makes is that its model for project and team organisation is superior to those of its competitors. (Yes, free services have competitors.)

So how is it different? Trello makes it very easy to collect bundles of junk related to a task, and sort them chronologically. You can add comments, checklists, and files, as well as attach coloured “labels” to tasks. I would say this is where it really shines, but it just doesn't excite me.

(For one thing, the interface is still a tad buggy: when adding items to a checklist in Firefox 6.02, the most recent item replaces the previous item, so only one item displays until the card is closed and reopened. Also, I ran into some synchronisation issues when it came to creating, closing, and reopening boards, as well as archiving—because there's no such thing as “delete” in the Trello world—and unarchiving cards. This mainly consisted of duplicate items and things that were supposed to be gone having some trouble taking the hint. Most things seemed to work fine, however, and I expect the wrinkles will get ironed out soon.)

The real problem is that I don't need it. I can make my own to-do list in whatever format I please. (Emacs org-mode is excellent for this, but any document will do.) I can put that to-do list in a Dropbox folder with whatever other files I'm working on, and share it with whomever I please. I can set up any old task tracker and wiki on my server and get essentially all of the features that I'd use in Trello. And if I didn't have a server, I would use one of the many available freemium services.

To my mind, the only clear advantage to Trello is that it centralises, simplifies, and prettifies these things for, well, non-technical users. And yet I somehow doubt that many of them will need digital project-management software...

In short, I think it's a beautiful thing that will perish for lack of an audience. But I definitely hope to be proven wrong.

07 September 2011

My Current Language Project

I was thinking about redoing my website in anticipation of a game I may or may not be making. None of the languages or frameworks out there at the moment really speak to me, though. I've been reading about Factor lately, and I think it's a superb language. I thought about using it, but there were some things I wanted to add, and some things about its philosophy that I didn't agree with. So I decided to make yet another language for web development.

The code-name of the project is Very, because for some reason I've been fond of naming my projects adverbs lately. It's a dynamically typed, concatenative, lazy language with homoiconic syntax, closely related to Forth and Lisp.

Very has very few builtins: the core special forms include only a few stack combinators (dup, pop, swap, quote, apply, compose) and some operations for introducing definitions. Fewer stack combinators are of course possible, but these represent a good balance of minimalism and efficiency. It's the definition special forms that are particularly interesting.

The def special form is nothing special: it creates a word by binding a name to a quotation that is applied when the word is invoked. This gives you general recursive functions. The extern special form gives you the ability to invoke functions through the C FFI, which is how many secondary builtins are defined.

The token special form is more subtle. By default, only parentheses, quoted strings, and whitespace-delimited identifiers (which include numbers) are recognised by the tokeniser. token modifies the lexer to treat a particular sequence of characters as a single token regardless of any adjacent nonwhitespace characters. This allows a program to introduce new special tokens as it is being read.

The macro special form is where things get really interesting. Macros are words that operate on a stack of terms that have been read since the point of invocation of the macro, allowing them to perform arbitrary operations on the source. Macros have access to the full power of the language, and may do anything that ordinary words can.

In addition, macros have the ability to manipulate the queue of terms that have yet to be read, meaning they are not restricted to the usual postfix syntax. Macros can invoke other words, including macros, directly at compile-time or indirectly through macro expansion.

The macro system will be used to provide high-level, user-friendly syntactic forms that correspond directly to efficient underlying operations, so those used to a more traditional infix imperative syntax will have no trouble, but those interested in working closer to the philosophy of the language are still encouraged to do so.

At the moment, I've got most of the basics in place, and am currently working on macros. I'm sure there will be plenty of examples to show off as I get more done. Until then.

06 September 2011

Iterators as a Model of Lazy Computation

As I often do, I'm working on a pet language project, which I'll describe in future posts. One of the key concerns for this language is laziness: computation should be deferred as long as possible. For various reasons I'm writing the implementation in C++, so I was looking for a convenient means of expressing lazy computation in an imperative language.

I'm very fond of the work of Alexander Stepanov, who is largely responsible for the popularity of generic programming, one of the driving principles of the C++ standard library. He advocated type-agnostic algorithms expressed in terms of iterators (and iterator ranges) as a means of generically implementing families of operations on any data structure that can be traversed linearly.

It occurred to me that iterators are, for eager imperative languages, an excellent model of lazy computation. An iterator models a lazy function over a list, which can accept or return any number of values it pleases at a time. Composition of these functions is modelled as iterator adaption: composable iterators are those that are constructable from an iterator range of compatible type.

Fundamentally, an iterator is an object that can be dereferenced and advanced. However, in order to be used properly by generic functions in the standard library, C++ iterators must fulfill a number of other requirements. These details are not immediately relevant to the implementation of algorithms, and readability of algorithms expressed as iterators consequently may suffer in C++.

For imperative programmers, it may require some unusual thinking in order to properly decompose algorithms in terms of iterator adaption. At present, the implementation of my language project is decomposed into the following steps, each of which is modelled by an iterator type that is then aliased for convenience.
  • Convert the input stream to a raw bytes.
  • Buffer the bytes to allow backtracking.
  • Interpret the bytes as UTF-8 and produce UTF-32 code points.
  • Partition those code points into tokens.
  • Parse those tokens into terms.
  • Establish an execution context in which to evaluate those terms.
  • Run the interpreter.
All of these operations are composed and executed by just one line of code:


Only run() is an imperative operation: it's a function that sends the result of std::copy()ing the iterator range to a no-op output iterator called ignore_iterator. Every other identifier here is an aliased name of an iterator type. In order to evaluate an expression, the minimum number of terms needed to constitute a valid expression is read, and consequently also the minimum number of tokens and characters. Waste not.

The good thing about iterators here is that they strongly encourage you to fully encapsulate all of the state that you use, at least as much as an ordinary function. Iterators must conform to very strict requirements in order to be used with the usual algorithms and adapters, and those that break those expectations will simply not work properly. (Unfortunately in C++ that usually means spectacular crashes.)

The type aliases involved in the above sample are also relatively simple:

typedef std::istreambuf_iterator  <char>       characters;
typedef buffer_iterator           <characters> buffered;
typedef utf8::unchecked::iterator <buffered>   utf32;
typedef lex_iterator              <utf32>      tokens;
typedef term_iterator             <tokens>     terms;
typedef context_iterator          <terms>      interpreter;

You'll note the use of the standard C++ std::istreambuf_iterator as well as utf8::unchecked::iterator from the excellent iterator-centric UTF8-CPP library. The Boost.Iterator library contains a number of general-purpose iterators and adapters, which may be useful depending on your particular application.

Note that each operation must refer to the type of the previous operation: this is another reason I'm not terribly fond of C++ for this. Type inference doesn't apply to constructors, so in order to have the compiler infer the type of the composed iterator, you would have to replace the type aliases with template functions that forward to the constructors of the appropriate types. Ugh. C++ requires entirely way too much machinery to accomplish what ought to be simple operations.

It's equally possible to model interactive applications in this manner. Loops could be represented as cyclical iterators adapting linear sequences to circular ones, and loops with exit conditions could be represented helically. I haven't proved it, but it seems apparent that iterators can model any general recursive algorithm, because at their heart they're just lazy functions plus some state. It also occurs to me that you could use this technique to very easily and efficiently compile a lazy functional language into an imperative one.

Anyway, I hope this gives my nonexistent readers some interesting ideas. If anyone knows of an imperative language that would be good for this sort of thing, do let me know. I grow tired of C++.

How College Has Been Worth It for Me

I've just started my fourth year at RIT, which is to be my last as an undergrad. I feel this puts me in a good position to reminisce, so here are some thoughts on my experiences in college.

First, college has not really been worth it to me academically. I've been programming for a very long time for someone my age, because I like it a lot. I always assumed that when I got to college there would be vast depths of further knowledge to explore in that field. It turned out that there weren't—at least not vast ones. Pretty much everything I've learned about programming lately I've learned in the same way I always have: on my own, because not very many people know or care about the kinds of things I'm interested in.

That's not to say that I haven't gained something by being here, but my knowledge has increased in breadth, not in depth. By far the best classes for me have been Creative Writing and the many Chinese language classes I've taken for my minor. I could have studied these things at any liberal arts college and probably learned just as much, but the fact is that I like these professors quite a lot and now I'm biased.

So how have I benefited from my post-secondary adventures? Socially. I made friends, broke up with a girlfriend who wasn't good for me, started vlogging, got a boyfriend, broke up with said boyfriend, and am now in an open relationship with a girl that I like quite a lot and wouldn't mind being involved with for quite a long time. I'm a lot more comfortable in public than I've ever been before.

So if you're wondering whether college is good for you academically as a self-taught individual: it probably isn't, unless (as I have) you study something completely different than what you've taught yourself already.

But you can't neglect the importance of spending time getting to know people who share your interests, people you'll probably get along with, people you never imagined you would get along with. You'll end up with mates, roommates, flatmates, classmates, and sex mates.

And the thing about humans is that we tend to mate for life.

How to Live on Your Language

So let's say you want to make a living off writing the next popular language. Assume "making a living" means the equivalent of reasonable pay at a full-time job (any job—not necessarily a technical one). If you really love what you do, you'll probably accept a little bit less to do it, so let's say $12.50 an hour, or $2000/month.

While working a stable job, you release your first version and set up a facility for donations. You then engage in continuous development and marketing, to a degree proportional to the number of users of your language. Let's estimate conservatively that 1% of your users will donate, and each of them will donate an average of $1/year.

That means that in order to get your $2000/month salary, you need to have 2000 donating users for each of the 12 months in the year. That's 2.4 million users total. Let's now assume that it takes 10 years for a language to become this popular: you must therefore acquire an average of 240 000 users per year, or 20 000 users per month.

If you're working the equivalent of full time (160 hours/month), your promotion strategy and implementation quality must be sufficient to gain an average of 125 users per hour. And that's repeat users, of course: if 20% of the people who try your language become repeat users, you actually need a conversion rate of 625 people/hour.

Even if every one of the people you convince directly convinces four more people to try your language—and for the sake of simplicity, assuming that they don't go on to try to convince others—then you're still back down to the 125 users/hour figure.

Now, this may seem totally unreasonable, but believe it or not it can still work: say your marketing strategy produces roughly linear growth over the 10-year period during which your language is gaining ground, and then plateaus. That means at the beginning you'll be converting an average of 0 users/hour, and 10 years later you'll be gaining 250 users. (Again, hourly. Perspective, here.)

That's an average increase of 25 users per hour per year: at the end of each year, you're converting 25 more people per hour—or 4000 more people per month—than you were at the start of the year.

So let's revisit that 2.4 million users ballpark: is it feasible to gain that many users in 10 years? If we accept the (inherently flawed, but usable nonetheless) statistics offered by Langpop as accurate, we get the following information about the top 7 languages currently trending through Yahoo search. If one result is accepted as representative of one user (I know, bear with me), these numbers indicate the rounded approximate average number of users gained per year since the language first appeared.
  1. C++: 500k
  2. C: 400k
  3. Java: 700k
  4. PHP: 400k
  5. Perl: 150k
  6. C#: 300k
  7. Python: 150k
This puts things back in the realm of possibility: if you make a language that's as popular as, say, Python, then in 20 years you will have enough users to make development and support (and marketing!) of that language into your full-time job.

Make one as popular as C#, and you can do it in 10. Cool!

…Except of course that putting it that way trivialises the vastly unlikely and difficult undertaking that is making a language so popular. But hey, if you've got a good idea, and you can manage to get to the top entirely on your own, without the contributions of any other developers who would take a cut of your donation money, then you're a genius, and you deserve it.

The original version of this article was posted on Programmers.SE and is duplicated here so I remember where I put it.

05 September 2011

Turing Completeness is not Practical Completeness

As a discussion comparing the relative merits of different programming languages grows longer, the probability of an appeal to Turing completeness as a measure of linguistic equivalence approaches 1. It's practically Godwin's law of programming languages.

Appealing as it is, the notion that whatever is possible in one language is possible in all is simply untrue. Turing completeness refers merely to the ability to implement any computable function. There are plenty of esoteric languages that have no or very limited I/O, and plenty of non-esoteric languages that have little to no ability to interface with libraries and other languages.

The fact that languages do differ in their strengths and weaknesses is precisely the main reason we use different languages at all. Different tools are best suited to different situations. The next time someone commits this fallacy, point them to the concept of the Turing tarpit, and if further investigation doesn't prove to them that theoretical equivalence is not the same as practical equivalence, invite them to implement a simple algorithm such as CRC32 in Brainfuck.