29 December 2011

The Web Is Wrong

The Analogies Are Wrong

Originally, web pages were static documents, and web browsers were static document viewers; there was text, some formatting, and images—if you could pay for the bandwidth to serve them. Hyperlinks were the really big thing, because they were the main point of user interaction—but what a powerful thing they were, and still are.

Then along came CGI and Java, and the web was transformed: all of a sudden, a web browser became a way to serve interactive content to users. A web page was no longer just a document, but a portal to a living, breathing world. You yourself could make content that would appear on a page for others to see. That was wicked cool, and it’s still a huge part of what makes the web so compelling today.

The more interactive the web has become, the more it has been used to connect people and their interests—and the less the “document” analogy has applied. At this point, the vast majority of pages that I use on a daily basis are not documents in any real sense: they are applications.

Nobody refers to Twitter, Google+, Facebook, Stack Exchange, Blogger, or even forums and message boards as documents. It’s laughable! The analogy has not merely been stretched to the point of absurdity; it has been absurd for about fifteen years. But that’s not all.

Human-Readability Is Wrong

Why are we still using text-based document formats for all of our web interaction? HTML is fine for writing documents, but why do we actually transmit it as text? Why do we make software that just writes HTML for us, serving pages to the browser whose source is nigh indecipherable to a human user? What is to be gained from using text-based documents for stylesheets and scripts? Human-readability? Not at all.

Programmers may be surprised to learn that the vast majority of users never view the source of a web page. They don’t know what makes things tick, they don’t care, and even if they did, they wouldn’t be able to decipher it all without experience, especially if the source had been minified.

If you’re not yet convinced that this whole situation is outlandishly, hilariously wrong, let’s talk about minification. Minification is taking a text-based document (where the sole reason for being text-based is human-readability) and compressing it to such an extent that it is no longer human-readable, but still text-based. At that point it is text-based for absolutely no reason whatsoever. Have I made anybody facepalm yet?

Furthermore, a minified document still has to be parsed according to the rules of the unminified language. This means that minification relies on details of the parser, which for permissive HTML and subtle JavaScript could be fragile and dangerous. In addition, the browser must perform all of the parsing and internal compilation it would for unminified source, causing a totally unnecessary performance hit.
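To make the fragility concrete, here is a minimal sketch in Python. The naive_minify function is hypothetical, not a real tool: it collapses whitespace without knowing that HTML treats whitespace inside a pre element as significant, so it quietly changes what the page says. A real minifier avoids this only by knowing the parser’s rules.

```python
import re

def naive_minify(markup):
    """Collapse every run of whitespace to one space, knowing nothing of HTML."""
    return re.sub(r"\s+", " ", markup).strip()

def pre_contents(markup):
    """Return the text inside the first <pre> element (toy extraction)."""
    return re.search(r"<pre>(.*?)</pre>", markup, re.DOTALL).group(1)

# Hypothetical page: the <pre> block holds indentation-sensitive text.
page = "<p>Hello,\n   world</p>\n<pre>if x:\n    y()</pre>"
small = naive_minify(page)

print(repr(pre_contents(page)))   # line break and indentation intact
print(repr(pre_contents(small)))  # both gone: the content itself changed
```

The paragraph survives, because HTML collapses whitespace there anyway; the pre block does not.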

The Obvious Solution

At the very least, we should stop serving HTML, CSS, and JavaScript to users. Let’s instead serve things in a concise binary format: compiled documents and stylesheets, and, by far the most important…

Compiled JavaScript that runs in a standard stack-based VM, which, let me stress, could easily be targeted by other languages. Because as if the situation weren’t mad enough, nowadays we write languages that compile to JavaScript, just so that they can run in a web browser.
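As a rough illustration of what “targeted by other languages” could mean, here is a toy stack machine in Python with two front ends. Everything in it is an assumption made up for this sketch (the opcodes, the instruction layout, the two tiny source languages); it is not a proposal for the actual VM. The point is only that an infix language and a postfix language can compile to the same instructions.

```python
import ast

# Hypothetical instruction set for a toy stack machine.
PUSH, ADD, MUL = range(3)

def run(program):
    """Execute a flat list of instructions and return the final stack."""
    stack, pc = [], 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(program[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown opcode: %r" % op)
    return stack

BINOPS = {ast.Add: ADD, ast.Mult: MUL}

def compile_infix(source):
    """Front end 1: infix arithmetic, parsed with Python's own ast module."""
    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return [PUSH, node.value]
        if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
            return emit(node.left) + emit(node.right) + [BINOPS[type(node.op)]]
        raise ValueError("unsupported syntax")
    return emit(ast.parse(source, mode="eval").body)

def compile_rpn(source):
    """Front end 2: postfix arithmetic, one token at a time."""
    code = []
    for token in source.split():
        if token == "+":
            code.append(ADD)
        elif token == "*":
            code.append(MUL)
        else:
            code += [PUSH, int(token)]
    return code

infix_code = compile_infix("2 + 3 * 4")
rpn_code = compile_rpn("2 3 4 * +")
print(infix_code == rpn_code, run(infix_code))
```

Neither front end knows about the other; they only agree on the instruction set, which is the whole appeal.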

By using binary formats, browsers would exchange less data with servers in order to convey the same information, improving performance. Web page performance is an important consideration: if your page takes too long to load, no one will bother waiting for it, and you’ll lose traffic.

Serving binary data is just as reliable as serving textual data. The direct source of a page won’t be human-readable, but so what? The source of the desktop applications you use isn’t typically human-readable unless the source is open and you seek it out. What’s wrong with using the same principle for web pages? And the compiled version will contain the same data, so you will always be able to see decompiled HTML or CSS if you want it.
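Here is a small sketch of that round trip in Python. The byte layout and the fixed TAGS table are invented for this example and are not any real or proposed format: an element is a tag index plus a child count, and a text node is a length-prefixed UTF-8 string. Decoding gives back the same tree, from which HTML can be regenerated on demand.

```python
import html
import struct

# Hypothetical fixed tag table shared by encoder and decoder.
TAGS = ["html", "body", "h1", "p", "em"]

def encode(node):
    """Serialize a node: a str is text, a (tag, children) tuple is an element."""
    if isinstance(node, str):
        data = node.encode("utf-8")
        return b"\x00" + struct.pack(">H", len(data)) + data
    tag, children = node
    header = bytes([TAGS.index(tag) + 1, len(children)])
    return header + b"".join(encode(child) for child in children)

def decode(buf, pos=0):
    """Inverse of encode; returns (node, position just past the node)."""
    kind = buf[pos]
    if kind == 0:
        (length,) = struct.unpack_from(">H", buf, pos + 1)
        start = pos + 3
        return buf[start:start + length].decode("utf-8"), start + length
    count = buf[pos + 1]
    pos += 2
    children = []
    for _ in range(count):
        child, pos = decode(buf, pos)
        children.append(child)
    return (TAGS[kind - 1], children), pos

def to_html(node):
    """'Decompile' a tree back into HTML text."""
    if isinstance(node, str):
        return html.escape(node)
    tag, children = node
    return "<%s>%s</%s>" % (tag, "".join(to_html(c) for c in children), tag)

page = ("html", [("body", [("h1", ["Hi"]),
                           ("p", ["Hello, ", ("em", ["web"]), "!"])])])
binary = encode(page)
decoded, _ = decode(binary)
text = to_html(decoded)
print(len(binary), len(text.encode("utf-8")))
print(text)
```

For this tiny page the binary form is smaller than the markup, but treat that as an illustration rather than a measurement: the comparison ignores attributes, and transport compression of the text would narrow the gap.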

The Non-Obvious Solution

But there is, in my opinion, an even better solution than that. In order to make a standard-issue web application, you must deal with a minimum of four languages: HTML, CSS, JavaScript, and a server-side language such as Perl, or Haskell, or whatever you like, really, as long as you can install a toolchain on your server.

Even if you don’t write one or more of these directly, you must still contend with them, and that becomes problematic when the abstractions leak. This is too complicated. Beginners need consistency, and the majority of developers are always going to be beginners, so we have no choice but to help them.

The multitude of languages is good, in a way: each language ostensibly serves a single purpose. HTML is for structure and content; CSS, for presentation; JavaScript, frontend interaction; and the server-side language, backend interaction. The modularity that this brings is valuable. However, you can attain modularity in many sane ways, without defining complete domain-specific languages.

Moreover, you can do it without necessitating that code in one language be composed and transmitted by another language as text, which is awkward and unsafe. Injection attacks are ubiquitous for a reason.
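A minimal sketch of the problem in Python, using hypothetical render functions: when one language assembles another by pasting strings together, user input is promoted to markup unless every call site remembers to escape it.

```python
import html

def render_unsafe(comment):
    """Build markup by concatenation: the comment is trusted as HTML."""
    return "<p>" + comment + "</p>"

def render_escaped(comment):
    """Same template, but the comment is escaped so it stays plain text."""
    return "<p>" + html.escape(comment) + "</p>"

# Hypothetical hostile input submitted through a comment form.
payload = "<script>alert(1)</script>"

print(render_unsafe(payload))   # the script tag reaches the browser intact
print(render_escaped(payload))  # the same input arrives as inert text
```

The escaped version works, but only because someone remembered to call it. A format in which content is structured data rather than spliced text would not leave that choice to each call site.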

So what if, instead of serving content in three languages (plus images and videos and what have you), we were to use just one language for presentation, styling, and interaction alike? A page would be served as a compiled bundle, and the browser would just run it in a protected VM. Content that doesn’t need to be loaded asynchronously could just be embedded in the bundle.

You could then take it a step further: web applications could be treated as sources of interactive, structured information. They could be queried in a structured fashion, vastly simplifying content scrapers; or composed with one another and with desktop applications, using something analogous to pipes. We could create powerful distributed applications, or share data across our networks however we like.
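To give a feel for the idea, here is a speculative sketch in Python. The forum_posts application and the where, select, and pipe helpers are all hypothetical names invented for this example; the only claim is that once an application hands out structured records instead of markup, querying and composing it looks like an ordinary pipeline rather than a scraper.

```python
from functools import reduce

def forum_posts():
    """Hypothetical web application exposing its content as structured records."""
    yield {"author": "ann", "title": "Binary HTML", "votes": 12}
    yield {"author": "bob", "title": "Why text?", "votes": 3}
    yield {"author": "ann", "title": "One VM to rule them all", "votes": 7}

def where(predicate):
    """Pipeline stage: keep only the records matching the predicate."""
    return lambda rows: (row for row in rows if predicate(row))

def select(*keys):
    """Pipeline stage: keep only the named fields of each record."""
    return lambda rows: ({key: row[key] for key in keys} for row in rows)

def pipe(source, *stages):
    """Feed the source through each stage in order, like a shell pipeline."""
    return reduce(lambda rows, stage: stage(rows), stages, source)

popular = list(pipe(forum_posts(),
                    where(lambda row: row["votes"] > 5),
                    select("title")))
print(popular)
```

No parsing of presentation markup is involved at any step, which is what would make the scraper in this picture so much simpler.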

Final Remarks

This is, I believe, an important problem, and one I’m actively trying to solve. If enough people demonstrate interest, maybe we can make this a reality, and give the web the overhaul it needs. Because, at the moment, the web is just plain wrong.