15 February 2017

Lately in Kitten

At the turn of the new year, I resolved to work on Kitten every week of 2017, and so far it’s been going pretty well. With the success of This Week in Rust, I thought I’d make a point to continue documenting my thought processes when designing and implementing language features, rather than leave them in the dark in my notebooks. I hope it’ll be enlightening to fellow language designers, and enjoyable for anyone with an interest in programming languages.

In the News

Someone posted about the language on Hacker News, and it led to some interesting discussions and multiple offers from potential contributors. Matz even tweeted about it:

It was yet another reminder that I need to update the website, since I haven’t done so for about 2 years, and many details of the language have changed significantly in that time. I swear I’ll get around to it eventually. :P

Typechecker Fixes

“Instance checking” refers to verifying an inferred type against a type signature, by checking that the type is at least as polymorphic as the signature. We do this by skolemising the type (replacing all its type variables with fresh unique type constants) and then running a subsumption check. This is largely the same as unification, but it accounts for the fact that a function type is contravariant in its input type: if $mathjax((a \to b) \le (c \to d))$, then $mathjax(b \le d)$ (covariant), but $mathjax(c \le a)$ (contravariant). This is also known as the “generic instance” relation in System F, and is usually spelled $mathjax(\sqsubseteq)$.

Somewhere along the line, instance checking got broken; that meant that the inferred type of a function didn’t have to match its declared type, meaning a total violation of type safety. Fortunately, a “total violation” is normal when you have a typechecker bug—unsoundness can sneak in at any time. Fixing this required some simple tests and logging to track down the issue, which was a trivial case of swapping two variables: the declared and inferred types. D’oh!

Syntactic Additions and Fixes

Kitten uses \name as sugar for { name } when pushing a function to the stack. James Sully noticed that this notation can be generalised to accept any term. This resulted in a nice little notational improvement when passing a partially applied operator to a higher-order function:

// Before
[1, 2, 3] { (* 5) } map

// After
[1, 2, 3] \(* 5) map

The new as (Type, Type, …) operator lets you write the identity function specialised to a certain type, which is useful for documenting tricky code, and avoiding ambiguity in type inference:

// Documenting types
1 2.0 "three" as (Int32, Float64, List<Char>)

// Error: what type do we read the string as before showing it?
"1" read show

// Unambiguous
"1" read as (Int32) show

// Potential error: not clear what type we cast to
1.0 cast

// Unambiguous
1.0 cast as (Float32)

Finally, I’ve enabled support for Unicode versions of many common Kitten symbols; for example, you can write instead of ->. In the future, we might put this behind a feature flag, or at least warn about inconsistent mixing of Unicode symbols and their ASCII approximations.

Simplifying Internals

The compiler is essentially a database server. It takes requests in the form of program fragments from source files or interactive input, then tries to tokenise, parse, and typecheck them. Then it reports any errors, or commits the result of successful compilation to the “dictionary”, a key-value store of all the definitions in the program. Generating an executable is just one way to serialise this dictionary—it will also be used to generate documentation, TAGS files, syntax highlighting information, source maps, and so on.

However, the design of the dictionary grew organically as I needed different features, so it’s essentially just a hash table that the compiler reads and updates directly. I’ve been working on replacing it with a simpler design with only two API endpoints: query and update. In the future, this should make it easier to run the compiler as a language server, to ease editor integration.

I’ve also been trying to simplify the internal representation of program terms (e.g. #168), to make things easier to compile. Kitten’s compiler is somewhat unusual in that the same term representation is used throughout the compilation process—from parsing and desugaring, through typechecking, to code generation. This is feasible because Kitten is essentially just a high-level stack-based assembly language, with a healthy dose of static typing and syntactic sugar.

Some of these refactorings have exposed bugs that will need to be addressed before they can be landed. For example, when a closure captures a variable with a polymorphic type, it’s lifted into a generic definition; the compiler fails to generate an instantiation of it, leading to what amounts to a linking error later on.

I’m going to try to do more of this development in the open, via GitHub PRs, to give potential contributors more visibility into what I’m up to.

Code Generation

The old compiler generated C99 code; the new compiler is moving away from C as a target language, opting instead to generate assembly. This will give us more control over calling conventions and data representations. There is a half-baked backend for our first target architecture, x86-64, which I intend to flesh out in the coming weeks.

Working on the backend should give me reason to polish some of the lower-level bits of Kitten, such as +Unsafe code, copy constructors and destructors, and unboxed closures. The generated executables will still depend on libc for now, but it’s my long-standing goal to eventually require zero nontrivial runtime support.

Removing the Old Compiler

Kitten has been rewritten several times over the course of its 5-year history, as I’ve developed a vision of what exactly I’m trying to build. I’ve been referring to the latest rewrite as “the new compiler” for over a year now, and it’s high time to remove the “old” one. There are only about a dozen minor features and nice-to-haves present in the old compiler that haven’t yet been ported; soon I’ll have the pleasure of red-diffing over 7000 lines of code. :)

Looking Forward

It’s been a long journey already, and there’s so much left to do, but I think we’re on target for a release late this year. I’m always happy to welcome contributors—there’s room for all kinds of people, including compiler developers, testers, documentation writers, and UI designer/developers. If there’s something you’re interested in working on, I’ll gladly help you get up to speed.