__cat__enative programming, a somewhat overlooked paradigm of stack-based functional programming languages. My February 2012 blarticle on concatenative programming is an okay introduction for those interested. The language is very much in flux, but at the time of this writing, Kitten looks like this:

```
/* C-style comments */
// Top-level code
99 bottles_of_beer
decl bottles_of_beer (Int ->) // Separate type declarations
def bottles_of_beer: // Optional indentation-based syntax
-> x // Local variables
x verse // Postfix expressions
if x 1 > then:
x-- bottles_of_beer // Recursion instead of iteration
else {}
decl verse (Int ->)
def verse:
-> x
x wall newline
x beer newline
take newline
x-- wall newline
newline
decl beer (Int ->)
def beer:
bottles " of beer" print
decl wall (Int ->)
def wall:
beer " on the wall" print
decl take (->)
def take:
"take one down, pass it around" print
decl bottles (Int ->)
def bottles:
-> x
if x 0 = then:
"no more bottles"
else if x 1 = then:
"one bottle"
else:
x showi " bottles" cat
print
```

## Motivation

The premier concatenative programming language Factor is dynamically typed, alongside dynamically typed cousin Joy, and untyped Forth. There was a statically typed concatenative language Cat but that project is no longer maintained.

So it seems there is room in the world for a statically typed functional stack language. Kitten was originally dynamically typed, primarily because I am lazy, but the intent was always to move to static typing. It seems to me that the main reason that dynamic languages are so widely used—often in situations where they are not appropriate—is that they are so easy to implement. That can give implementors a shorter iteration time on new and useful features, as well as more time to work on documentation and tooling.

Dynamic languages are deceptive. You can absolutely write large software in a dynamic language, but at the point when dynamic types become a liability and static types become really valuable, it’s already too late! In the long run, static types ease maintainability, optimisation, and static reasoning for computers and humans alike. And that’s not to mention the great opportunities for refactoring, analysis, and visualisation tools.

But static types, without type inference, are just a bad joke. My main goal when designing Kitten’s type system was inferability. Having used Haskell for a few years, I’ve grown accustomed to the benefits of type inference, at least of the local variety. The meat of a program simply shouldn’t need type annotations for the compiler to tell you not only

*whether*it is type-correct, but also

*what*the type is of anything you care to ask about. Even simple type deduction such as C++’s

`auto`

or C#’s `var`

is much better than nothing.Concatenative languages pose some unique problems for type inference, and I’d like to share what I’ve learned while implementing static types in Kitten.

## Differences with Dynamic Languages

One of the typical features of dynamically typed concatenative languages is homoiconicity, the property that code and data have a common representation. In concatenative languages, this is the

*quotation*, an analog of Lisp’s list. This is a really powerful feature, and a valuable tool in a dynamic language. There’s a reason that, alongside Lisp, concatenative languages are among the most heavily metaprogrammed in existence. But homoiconicity and static types are basically incompatible kinds of awesome. A function constructed dynamically could have a dynamic effect on the stack, which a static type system can’t hope to make sense of.

Kitten therefore has to differentiate quotations into two categories: functions and vectors. There are separate primitives for function and vector operations, such as composition and concatenation—even though internally these can be implemented the same way.

## The Type System

Kitten’s type system is fairly simple. The base types include Booleans (

`Bool`

), characters (`Char`

), integers (`Int`

), floating-point numbers (`Float`

), homogeneous vectors (`[a]`

), and functions (`a -> b`

).All functions operate on stacks. They consume some number of parameters from their input stack and produce some number of return values, potentially with side effects such as I/O. The juxtaposition of functions denotes their composition—sending the output of one to the input of the next.

`2 +`

is a function of type `Int -> Int`

which adds two to its argument; it consists of the composition of two functions `2`

and `+`

, which have types `-> Int`

and `Int Int -> Int`

respectively.However, functions that actually

*quantify*over stacks pose significant challenges to inference. Functions may contain arbitrary code—of which you want to know the type, because they can be dynamically composed and applied, and those operations should be type-safe. So the basic concatenative

`compose`

combinator needs a type like the following:∀That is, for any stackrstu.r(s→t) (t→u) →r(s→u)

`r`with two functions on top, one of type (

`s`→

`t`) and one of type (

`t`→

`u`), it gives you back the same stack

`r`with a new function on top, of type (

`s`→

`u`). All of the variables in this type are quantifying over multiple types on the stack. This is the type that Cat uses for its

`compose`

combinator. Already this is strictly less powerful than the dynamic equivalent:- You can no longer dynamically compose functions with interesting stack effects:
`{1}`

and`{+}`

are composable, but`{drop}`

and`{drop}`

are not. - The associativity of dynamic composition is broken as a result:
`{a b} {c} compose`

does not necessarily equal`{a} {b c} compose`

.

*total*lunacy, and there’s going to be some collateral damage. So by construction, the

*type*of a dynamic composition must always be known statically.

#2 is more significant, in that the type checker is going to reject some dynamic compositions you might expect to be valid:

`{drop drop} {} compose`

is allowed, but `{drop} {drop} compose`

is not.The problem with quantifying over stacks, however, is that every ordinary function type is inherently polymorphic with respect to the part of the stack that it doesn’t care about.

`+`

in this system would have a type like ∀`r`.

`r`Int Int →

`r`Int. All higher-order functions become

*higher-rank*, and all recursion becomes polymorphic recursion. This makes typechecking significantly more complicated, and type inference undecidable in general.

In light of this, I opted for a simpler approach: model functions as multiple-input, multiple-output, and get rid of such combinators as

`compose`

and `apply`

that need to talk about stack types. These can be replaced with a handful of fixed-arity equivalents: to apply unary functions, binary functions, and so on.The process for composing two function types (

`a`→

`b`) and (

`c`→

`d`) is simple:

- Pair the values produced by
`b`and consumed by`c`, and ensure they match. - Let
`x`be those values consumed by`c`, not produced by`b`. - Let
`y`be those values produced by`b`, not consumed by`c`. - The type of the composed function is
`x + a → y + d`.

## Next Time

In a future post, I’ll talk about some of the more low-level details of Kitten, particularly the design of the upcoming VM. As the language takes shape, I’ll also begin offering tutorials and explanations of how everything comes together to write real-world software. Until then!

Thank you for this writeup! I'm working on a statically typed concatenative language for a learning project (nothing fancy, interpreted only) and haven't found much existing research on the topic. The way you sidestepped higher rank inference is elegant and I'm totally stealing that idea for my project. :)

ReplyDelete