03 September 2016

Arithmetic Types in Kitten

In order to motivate myself to work on my programming language, Kitten, I’ve decided to publish my notes on various aspects of the design as I figure them out. This is partly to solicit comments and drum up interest in the project, but mostly an exercise in ensuring my ideas are solid enough to explain clearly.

In these articles, “Kitten” will refer to the “ideal” language and the new work-in-progress compiler that approximates it.

Explicitness

In non-generic definitions, it should always be evident from the code what type of data you’re using. As such, the type of integer and floating-point literals can be specified with a type suffix: i8, i16, …, u8, …, f32, f64; the unadorned integer literal 1 is equivalent to 1i32, and the float literal 1.0 is equivalent to 1.0f64. I think signed 32-bit integers and 64-bit floats are reasonable defaults for common use cases.

In the future, we could relax this by allowing the suffix to be inferred, or by allowing the default suffix to be locally changed.

// Currently fails to typecheck; 3 could be deduced to mean 3i64.
(2i64 + 3)

// Currently not valid syntax; 2 and 3 could be defaulted to Int64.
assume (Int64)
(2 + 3)

Uniformity

Because overloading interacts rather poorly with type inference, it would be easiest for me to provide different operators for different types, such as + for integer addition and +. for floating-point addition, as in OCaml. I did this in the old compiler, and it wasn’t great for usability. I solve this through the use of traits—for each operator there is a single generic definition, with instances provided for all the common arithmetic types.

trait + <T> (T, T -> T)

instance + (Int32, Int32 -> Int32):
  _::kitten::add_int32

instance + (Float64, Float64 -> Float64):
  _::kitten::add_float64

Traits work like template specialisations in C++: if + is inferred to have the type Int32, Int32 -> Int32 at a given call site, then the compiler will emit a call to that particular instance; no generic code makes it into the final executable. And if no such instance exists, you’ll get a straightforward compilation error.

These traits will generally need to be inlined for performance reasons. Luckily, Kitten’s metadata system allows us to specify this easily enough:

about +:
  docs: """
    Adds two values of some type, producing a result of the same type.
    """
  inline: always

The about notation is a dumping ground for all metadata: documentation, examples, tests, operator precedence information, and optimization hints. The idea is that it’s better to have all of this information in a single structured format than to use some combination of magical comments, pragmas, attributes, and otherwise special syntax.

Safety

We want arithmetic operations to be safe, meaning that failure modes such as implicit overflow and trapping should be opt-in.

Therefore, we should provide an arbitrary-precision integer type (Int) as a sane default, as well as an array of signed and unsigned fixed-precision integer types in common sizes: Int8, Int16, Int32, Int64, and their UIntN counterparts.

Kitten has a permission system for reasoning about the permitted side effects of functions. Operations on fixed-precision integers (both signed and unsigned) are checked by default, requiring the +Fail permission to allow them to raise assertion failures in case of overflow.

instance + (Int32, Int32 -> Int32 +Fail) { … }

However, if most arithmetic operations can fail, then +Fail will proliferate through all the type signatures of calling code, which is a real drag, and might encourage people to use unchecked types more liberally than they should. We can solve this in two ways:

  • Make +Fail implicit in type signatures, and introduce a notation -Fail for removing it. This speaks to a more general notion of implicit permissions, which I think would be really valuable: it would let us introduce new implicit permissions in a backward-compatible way, allowing new code to opt out. For example, if we implicitly grant +Alloc to all functions, then they’re permitted to allocate memory on the heap—lists, closures, bignums, and so on. But for performance-critical code, you may want a static guarantee that your code performs no heap allocations, so you can opt out by specifying -Alloc.
  • Introduce a wrapper type, Checked<T>, which provides arithmetic operations that don’t require +Fail, but instead return an Optional<T> as the result. This would mean making the signatures of the arithmetic traits more general, or inlining the Optional into the representation of Checked.

Neither is unequivocally better, and they have different use cases, so I think it makes sense to provide both.

Modifiers

In addition to Checked<T>, Kitten provides wrapper types for different arithmetic modes:

  • Wrapped<T> is a T where overflow is defined to wrap modulo the range of T.
  • Unchecked<T> is a T where overflow is implementation-defined. I’m tempted to make this type require the +Unsafe permission, but that’s intended for operations that can violate memory safety, which Unchecked<T> cannot.
  • Saturating<T> is a T where overflow results in clamping to the range of T; this is useful in some signal-processing applications.

Platform Independence

All the basic arithmetic types should be provided on all platforms. That may require emulation in some cases, such as 64-bit arithmetic on a 32-bit processor. Implementations can provide additional platform-specific types such as Int128 or Float80.

SIMD operations are implemented on container-like types such as Vector4<Float32>; naïve-looking code such as:

vec { (* 2) } map

Should be lowered to the single instruction:

addps %xmm0, %xmm0

This is subtle, and I haven’t worked out all the details yet, but SIMD operations are important enough for performance-critical code that it seems worthwhile to consider them early on.

Licensing Problems

For arbitrary-precision types, I’m in a bit of a bind. I’m not equipped to write a native Kitten bignum library with performance on par with GMP, but because GMP is licensed under LGPL, it can’t be used freely. For instance, we can’t statically link it into generated executables, making distribution more difficult and negatively impacting performance.

TL;DR

  • Kitten has all the usual machine integer and float types that systems programmers love.
  • Literal types are explicit, with some sane defaults: signed Int32 and Float64.
  • Overflow is checked by default, with various type-system features for more control.
  • The standard library provides arbitrary-precision and SIMD operations.