26 August 2010

Macript

As I write, I'm reinstalling Linux because an upgrade fried my ability to boot. After backing up my home directory onto my Windows partition, I decided I needed something to do while downloading installation files. So I was going through some of the little one-off projects that I've done over the past few months, and decided to write about this one.

I call it Macript, a portmanteau of "macro" and "script". It's a short Perl program that reads a source file and scans it for anything that looks like a macro invocation. For each one, it checks whether a script by that name exists in the directory where Macript was invoked and, if so, replaces the invocation with the standard output of running that script with whatever arguments were given to the macro.
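
To make the expansion step concrete, here is a rough sketch of the idea in C++ rather than Perl. It is not Macript's actual code. The invocation syntax @name(arguments) is purely an assumption (the real syntax isn't described here), and the names expand_macros, ScriptExists, ScriptRunner and expand_demo are made up for illustration. Checking for a script and running it are passed in as callbacks so that the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <regex>
#include <string>

// Assumed callbacks, injected so the sketch does not touch the file system:
// exists(name) says whether a script called `name` is available, and
// run(name, args) returns what that script would print to standard output.
using ScriptExists = std::function<bool(const std::string&)>;
using ScriptRunner =
    std::function<std::string(const std::string&, const std::string&)>;

inline std::string expand_macros(const std::string& source,
                                 const ScriptExists& exists,
                                 const ScriptRunner& run) {
    // Hypothetical invocation syntax: @name(arguments), no nested parentheses.
    static const std::regex invocation(
        R"re(@([A-Za-z_][A-Za-z0-9_]*)\(([^)]*)\))re");

    std::string output;
    std::size_t copied = 0;  // how much of `source` has been emitted so far
    for (std::sregex_iterator it(source.begin(), source.end(), invocation), end;
         it != end; ++it) {
        const std::smatch& match = *it;
        const std::size_t start = static_cast<std::size_t>(match.position());
        output.append(source, copied, start - copied);  // text before the macro

        const std::string name = match[1].str();
        const std::string args = match[2].str();
        // Expand only when a matching script exists; otherwise keep the text.
        output += exists(name) ? run(name, args) : match.str();

        copied = start + static_cast<std::size_t>(match.length());
    }
    output.append(source, copied, std::string::npos);  // trailing text
    return output;
}

// Usage with stand-in callbacks: only a script named "shout" "exists".
inline void expand_demo() {
    ScriptExists exists = [](const std::string& name) { return name == "shout"; };
    ScriptRunner run = [](const std::string&, const std::string& args) {
        return "[" + args + "]";
    };
    assert(expand_macros("a @shout(x y) b @missing(1)", exists, run) ==
           "a [x y] b @missing(1)");
}
```

In a real tool the two callbacks would look in the invocation directory and spawn the script; those parts are left out because they depend on the platform.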

That's pretty much it. And let me say, as a preprocessing step, it's much too useful. I would say that it's entirely changed my build process, but that would be a lie, because I'm pretty stuck in tradition. However, I suspect that a lot of people stand to benefit from a tool like this.

Say you want to perform some text-based program translation for which the ordinary C preprocessor is insufficient, or for which you simply don't want to resort to hairy macros or x-macros. Want to produce forward declarations for all of the functions in a source file? Easy. Want to perform conditional compilation based on the results of a configuration script? Piece of cake. Want to generate compile-time warnings about uses of functions that are marked // DEPRECATED in the source? Trivial.

All kinds of code generation and static analysis tasks become a breeze by adding this one simple preprocessing step to your build. I strongly encourage you to try it out, and I suspect you'll be pleased with the results. The best part is that it's not an enormous investment. Even software that relies heavily on Macript can almost certainly be rewritten to avoid it, but it just happens to automate a few things that can make life a whole heck (or at least a fourth of a heck) of a lot easier.

I promise you'll be able to find it on Sourceforge very soon. Stay tuned!

25 August 2010

Prog, Constify, and Pointer Single Loop

Hello again.

This summer holiday, in the absence of a job, I've spent a lot of time working on Prog, and recently I actually finished a compiler of sorts. It produces an intermediate format suitable for text-based translation (via the C preprocessor, for instance) into any target language that supports, whether directly or indirectly, a high-level translation from Prog. I'm working on the C++ target, and it's currently possible to compile trivial programs such as Hello World and FizzBuzz, among others.

So I didn't make my January release last year, but I will make it long before next year, and I'm happy about that. After all, I'm nineteen years old, and the longer I wait to release a language, the less impressed people will be by my age! I'm kidding, of course. I really just want the damn thing to be done so I can use it and enjoy it, and so that others can do the same.

On to the longer, more peculiar part of the title. I've been browsing Stack Overflow a lot lately—in fact, perhaps more than I should—and a topic came up recently that I thought was interesting enough to write about here.

The question concerned the possibility of a "constify" operation in C++ that would, for the remainder of the current scope, cause a variable to be treated as though it had been declared const. The original poster wanted to be able to write something like the following:

std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
constify v;
v.push_back(4); // This is now a compile-time error.

Now, the real use of such an operation doesn't really show in std::vector, especially since C++0x introduces an initializer_list constructor that makes it trivial to initialise a const std::vector. But to be able to initialise a const object by calling mutating methods on that object? I was intrigued.

Obviously the syntax would have to change somewhat. The simplest route was to declare a local const reference bound to the original variable. My original constify macro looked like this:

#define constify(type, id) \
type const& id##_const(id), & id(id##_const)

The first declarator creates a const reference var_const bound to var, the variable being constified. The second binds another const reference, this time named var, to var_const, shadowing the original. It has to be done in two steps because type const& id(id); doesn't do what you'd want: both mentions of id refer to the new local reference, which ends up bound to itself rather than to the existing variable.

To use this version of the macro, it had to be wrapped in its own scope, such as a loop body or bare block:

std::vector<int> v;
// ...
{
    constify(std::vector<int>, v);
    // ...
}
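
For anyone who wants to try it, here is a small self-contained sketch of that first version. The function fill_then_freeze is a made-up name for illustration, and the static_assert is only there to confirm that v really does name a reference to const inside the inner block.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// First version of the macro: the caller spells out the type.
#define constify(type, id) \
    type const& id##_const(id), & id(id##_const)

// Hypothetical helper: fills a vector, then reads it through a const view.
inline std::size_t fill_then_freeze() {
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    {
        constify(std::vector<int>, v);
        // Inside this block, v is a reference to const.
        static_assert(
            std::is_const<std::remove_reference<decltype(v)>::type>::value,
            "v should be const inside the block");
        // Both names refer to the same vector.
        assert(&v == &v_const);
        // v.push_back(4);  // uncommenting this should fail to compile
        return v.size();
    }
}
```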

Using the macro in the same scope as the original variable resulted in a duplicate definition of the constified variable. I didn't like the requirement of wrapping the macro in its own scope, so I sought a more elegant solution. Nobody likes trying to mash a block into a macro invocation, so I decided to try turning the macro into something more like an inbuilt C++ construct, which could be used like so:

constify (var) {
    // var is const for the duration of this scope.
}

The first problem, how to have the compiler deduce the type of the constified variable, was easily solved. C++0x introduces the decltype keyword, which allowed me to rewrite constify:

#define constify(id) \
decltype(id) const& id##_const(id), & id(id##_const);
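
A quick way to check the decltype version is a sketch like the one below, which still needs the enclosing block at this stage. The function frozen_length is a made-up name, and std::string is used only to show that the type no longer has to be spelled out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// decltype version of the macro, including its trailing semicolon.
#define constify(id) \
    decltype(id) const& id##_const(id), & id(id##_const);

// Hypothetical helper: builds a string, then measures it through a const view.
inline std::size_t frozen_length() {
    std::string s = "abc";
    s += "de";  // still mutable here
    {
        constify(s)
        // The type was deduced from the outer s.
        static_assert(std::is_same<decltype(s), const std::string&>::value,
                      "s should be a reference to const std::string");
        assert(&s == &s_const);
        return s.size();
    }
}
```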

So far so good. (NB: In non-C++0x compilers, there is often a typeof extension with a very similar effect.) But how to allow for the clean syntax? The answer was in what I'll call, for lack of a better term and only because I've never seen it before, the Pointer Single Loop idiom.

Basically, the constify macro had to introduce a for loop, which would provide the scope for the local variables var_const and var while executing its body exactly once. The catch is that the initialiser of a for loop is a single declaration, so every variable it declares shares one base type: there is no way to slip in a loop control variable of an unrelated type, such as a bool or an int. Nor is there a way to produce a local scope around the expansion of the macro without requiring that the loop body be mashed into the invocation of constify as a parameter.

Meet the Pointer Single Loop idiom. It relies on the fact that, for any variable t of type T, the address &t is never null, so this loop executes its body exactly once:

for (T* p = &t; p; p = 0) {}
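
The claim is easy to check with a counter. In this sketch, count_passes and check_single_pass are made-up names; the loop itself is exactly the one above.

```cpp
#include <cassert>
#include <string>

// Counts how many times the body of a Pointer Single Loop runs for t.
template <typename T>
inline int count_passes(T& t) {
    int passes = 0;
    // p starts as the (non-null) address of t and is nulled after one pass.
    for (T* p = &t; p; p = 0) {
        ++passes;
    }
    return passes;
}

// The type of the variable makes no difference to the number of passes.
inline void check_single_pass() {
    int i = 0;
    std::string s;
    assert(count_passes(i) == 1);
    assert(count_passes(s) == 1);
}
```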

Further, since it's possible to construct a pointer to any type, and since we already have the name of the variable to be constified, it's trivial to construct the final implementation of constify:

#define constify(id) \
for (decltype(id) const& id##_const(id), \
    & id(id##_const), * constify_index = &id; \
    constify_index; constify_index = 0)

Like the earlier implementations of constify, this accepts an id and binds it to a local, immutable id via id_const. The difference is that it uses a Pointer Single Loop to execute the next statement once in the context of the new local variables. It requires only that the variable constify_index be reserved for the purpose of the loop idiom, and works exactly as expected:

std::vector<int> v;

v.push_back(1);
v.push_back(2);
v.push_back(3);

constify (v) {

    v.push_back(4); // This is a compile-time error.

}

This also works if v happens to be declared const already. In all, it was very easy to write, and I imagine that judicious use of C++0x features alongside an idiom of this sort could result in very sensible, readable extensions to C++, which provide real value to the language while retaining compatibility across standards and compilers. I can totally see Boost going for something like this.
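
Here is a self-contained sketch that puts the final macro through its paces, covering both the ordinary case and the already-const case. The function names sum_while_frozen and sum_already_const are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Final version of the macro, built on the Pointer Single Loop.
#define constify(id) \
    for (decltype(id) const& id##_const(id), \
             & id(id##_const), * constify_index = &id; \
         constify_index; constify_index = 0)

// Hypothetical helper: sums a vector while it is constified.
inline int sum_while_frozen() {
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);

    int sum = 0;
    int passes = 0;
    constify (v) {
        static_assert(std::is_same<decltype(v), const std::vector<int>&>::value,
                      "v should be a reference to const inside the block");
        for (int x : v) sum += x;  // reading is still fine
        ++passes;
    }
    assert(passes == 1);  // the body ran exactly once

    v.push_back(4);  // outside the block, v is mutable again
    assert(v.size() == 4);
    return sum;
}

// Same macro applied to a variable that is already const.
inline int sum_already_const() {
    const std::vector<int> c(3, 2);  // three elements, each equal to 2
    int sum = 0;
    constify (c) {
        for (int x : c) sum += x;
    }
    return sum;
}
```

One case this sketch does not cover is a variable that is itself declared as a reference. There decltype yields a reference type, so the macro would probably need std::remove_reference before it could be applied.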