Why C++: Template metaprogramming (part 1)

Hello! I'm working here at Zeit.io. I don't think there's much to be said about myself other than that I'm an avid C++ user. After having had the pleasure of programming in a variety of languages on very different projects, I can safely say that C++ is the one that impresses me the most.

I'd like to take the opportunity to share some of my thoughts and experiences on certain aspects of the language on this blog. All example code is either taken directly from or based on real-world code (mostly my own), and the posts are primarily aimed at readers with at least intermediate C++ skills.

Let's cut right to the chase. This post is about one of my favorite practical examples of the usefulness of template metaprogramming; if you've never heard of that before, I suggest reading up on the topic or watching these videos (part1, part2).

The problem

Imagine the following: you're the author of an application server, written in C++, whose behaviour is supposed to be customized by its users via scripts. You've decided on a scripting language (that could be Lua, Ruby, ...). Naturally you will want to expose a set of functions and data specific to your application to that scripting language, so your users can fully leverage the awesome stuff you've been working on for so long.

Depending on the actual scripting language, the way to do that varies. Let's focus on functions for now and assume the ScriptEngine lets you register a function with the following C interface:

// se is the instance of the script engine the user is using
// params is an array, the first member being a string that uniquely identifies the function, the remaining members being the parameters the user has passed into the call
result_type MyFunction(ScriptEngine* se, const data_type* params) noexcept;  

The call to actually register this function could look something like this:

// Initialized somewhere
ScriptEngine* se;

// Register MyFunction with the name "MyFunction" (this will be the first member in the "params" array, see above)
se->registerFunction(MyFunction, "MyFunction");  

And the actual implementation of that function:

result_type MyFunction(ScriptEngine* se, const data_type* params) noexcept  
{
  // Now you will have to transform the parameters in "params" in a way that your actual implementation can accept them
  // trivial case!
  auto result = MyFunctionImpl(params[1], params[2], params[3]);
  return to_result_type(result);
}

That doesn't look too bad for now - but now imagine there are about 300 functions you'd like to register this way. You'd need to write 300 wrapper functions (just like MyFunction), possibly list them in a static array together with their unique string identifiers, and finally register them one by one at runtime in a loop. This is not only tedious and boring but also terribly error-prone.
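
To make the manual approach concrete, here is a minimal sketch of the "one wrapper per function plus a static array" bookkeeping. Everything about the engine is an assumption made for illustration: ScriptEngine is a mock that stores wrappers in a std::map, data_type and result_type are plain int, and AddImpl and NegateImpl are hypothetical implementation functions. Because data_type is an int here, params[0] (the function name in the interface above) is only a placeholder.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins; a real engine defines its own types.
using data_type = int;
using result_type = int;

struct ScriptEngine;
using Wrapper = result_type (*)(ScriptEngine*, const data_type*);

// Mock engine: remembers which wrapper belongs to which name.
struct ScriptEngine
{
    std::map<std::string, Wrapper> registry;

    void registerFunction(Wrapper fn, const char* name) { registry[name] = fn; }
};

// The actual implementations we want to expose (hypothetical).
int AddImpl(int a, int b) { return a + b; }
int NegateImpl(int a) { return -a; }

// One hand-written wrapper per implementation: unpack, call, convert.
result_type Add(ScriptEngine*, const data_type* params) noexcept
{
    return AddImpl(params[1], params[2]);
}

result_type Negate(ScriptEngine*, const data_type* params) noexcept
{
    return NegateImpl(params[1]);
}

// The static array pairing each wrapper with its identifier.
struct Entry { const char* name; Wrapper fn; };
static const Entry entries[] = {
    {"Add", Add},
    {"Negate", Negate},
};

// Registers every entry at runtime, one by one.
void RegisterAll(ScriptEngine& se)
{
    for (const Entry& e : entries)
        se.registerFunction(e.fn, e.name);
}

// Usage: register everything, then call "Add" the way the engine would.
inline int DemoCall()
{
    ScriptEngine se;
    RegisterAll(se);
    assert(se.registry.size() == 2);
    const data_type params[] = {0, 2, 3}; // params[0] is a placeholder here
    return se.registry["Add"](&se, params);
}
```

Even in this tiny sketch, every new function costs a wrapper and a table entry that must be kept in sync by hand, which is exactly the part that does not scale to 300 functions.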

Possible approaches

Here's a number of possible solutions you may come up with.

switch-statement

Instead of writing one wrapper function for each function you'd like to expose, you register a single wrapper function that contains a massive switch-statement, like this:

result_type MyFunction(ScriptEngine* se, const data_type* params) noexcept  
{
  const char* str = params[0]; // taking the name of the function
  unsigned int hash_val = hash(str); // for easy switching
  switch (hash_val)
  {
    case hash("MyFunction"): // hash is a constexpr function
    {
      // implementation like above, ending in a return so that control does not fall through to the next case
    }
    case hash("OtherFunction"):
      // ...and so on
  }
}

This doesn't solve the problem; you just moved all the annoying wrapper logic somewhere else and added some overhead with the switch-logic along the way.
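
For reference, the switch approach can be sketched as follows. The hash function is an assumption: its implementation is not shown here, so hash_name below is a hypothetical 32-bit FNV-1a variant, written recursively so that it stays a valid constexpr function even under C++11 rules. The implementation functions and the int-only argument array are placeholders as well.

```cpp
#include <cassert>

// Hypothetical compile-time string hash (32-bit FNV-1a).
constexpr unsigned int hash_name(const char* str, unsigned int h = 2166136261u)
{
    return *str == '\0'
        ? h
        : hash_name(str + 1, (h ^ static_cast<unsigned char>(*str)) * 16777619u);
}

// Placeholder implementation functions.
int MyFunctionImpl(int a, int b) { return a + b; }
int OtherFunctionImpl(int a) { return -a; }

// Single wrapper: the name picks the branch, each branch unpacks its own
// arguments. Unknown names yield 0 in this sketch.
int Dispatch(const char* name, const int* args)
{
    switch (hash_name(name))
    {
        case hash_name("MyFunction"):
            return MyFunctionImpl(args[0], args[1]);
        case hash_name("OtherFunction"):
            return OtherFunctionImpl(args[0]);
        default:
            return 0;
    }
}

// Usage: both names end up in their own branch.
inline void DemoDispatch()
{
    const int args[] = {2, 3};
    assert(Dispatch("MyFunction", args) == 5);
    assert(Dispatch("OtherFunction", args) == -2);
}
```

Two registered names that hash to the same value would show up as a duplicate case label, i.e. as a compile error. A name passed in at runtime that merely collides with a registered hash would silently land in the wrong branch, so a production version would probably compare the string as well.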

The "hacker" way

Again, you only write a single wrapper function. In it, you push the parameters onto the stack one by one (according to the calling convention in place), look up the corresponding function address (picked from a table using the string identifier as a key) and call it. Obviously this only works if the data types passed through params are so trivial that they need no conversion logic; in that case the wrapper does not even need to know the implementation function's signature. It also requires that the params array is null-terminated, since the wrapper has no other way of knowing how many parameters were passed.

This is plain bad (but still fun to implement), and because it is so platform-dependent there's not much point in providing a code example. You'll probably have to fiddle with assembly, and you will lose all of the safety a static function call provides as well.

Solution using TMP

And here's how to do it the most elegant way (in my opinion): by creating a set of generic function templates that the compiler will use to instantiate a wrapper function for each script function you'd like to expose.

The entire solution is, due to the nature of templates in C++, rather verbose, so I will try to break it down into its most essential parts (without explaining all of the intricate details).

The preparation

First off, we will need some way to reflect on our code - more specifically, on the script functions we'd like to expose. To instantiate a wrapper function, we need the following information about it: its function address, its parameter types and its return type. Here are the structures and helper templates that will be used to capture this information:

template<typename T> struct sizeof_void { enum { value = sizeof(T) }; };  
template<> struct sizeof_void<void> { enum { value = 0 }; };

template<typename T, size_t t> struct TypeChar { static_assert(!t, "Unsupported type in variadic type list"); };  
template<typename T> struct TypeChar<T, sizeof(int)> { enum { value = 'i' }; };  
template<> struct TypeChar<double, sizeof(double)> { enum { value = 'f' }; };  
template<> struct TypeChar<char*, sizeof(char*)> { enum { value = 's' }; };  
template<> struct TypeChar<void, sizeof_void<void>::value> { enum { value = 'v' }; };

template<typename... Types>  
struct TypeString {  
    static constexpr char value[sizeof...(Types) + 1] = {
        TypeChar<Types, sizeof(Types)>::value...
    };
};

template<typename R, typename... Types>  
using Function = R(*)(Types...);

struct ScriptIdentity  
{
    Function<void> addr;
    const char* types;
    const char ret;
    const unsigned int numargs;

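    // Note: the standard does not permit reinterpret_cast in a constant expression,
    // so a strict compiler may refuse to evaluate this constructor at compile time.
    template<typename R, typename... Types>
    constexpr ScriptIdentity(Function<R, Types...> addr)
        : addr(reinterpret_cast<Function<void>>(addr)),
          types(TypeString<Types...>::value),
          ret(TypeChar<R, sizeof_void<R>::value>::value),
          numargs(sizeof(TypeString<Types...>::value) - 1) {}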
};

struct ScriptFunction  
{
    const char* name;
    const ScriptIdentity func;

    constexpr ScriptFunction(const char* name, ScriptIdentity func) : name(name), func(func) {}
};

Note that all of this data is meant to be consumed at compile time; it is instantiated so the compiler can use it in the wrapper function template later on.

There may be other or even better ways to do it, but I decided to encode the above-mentioned information as simple character values. TypeChar is the template that maps a type to a character that we can use later on. Every type that you use in your script functions should have a TypeChar mapping. Provided above are mappings for int (more precisely, for any type the size of an int that has no more specific mapping), double, char* and void; anything else runs into the static_assert.

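TypeString holds an array of TypeChar values - given a parameter pack Types, it determines the character for each of the types provided. The array has one element more than there are types; that last element is zero-initialized, so value is a null-terminated string.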
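
As a quick sanity check of the two templates, the sketch below repeats TypeChar and TypeString from the listing above (so that it compiles on its own) and verifies what they produce. Two things are my own additions and should be read as assumptions: the out-of-class definition of value, which compilers in pre-C++17 mode need once the array is used as a pointer, and the ExampleTypes alias together with the CheckTypeString helper, which are purely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

template<typename T> struct sizeof_void { enum { value = sizeof(T) }; };
template<> struct sizeof_void<void> { enum { value = 0 }; };

template<typename T, std::size_t t> struct TypeChar { static_assert(!t, "Unsupported type in variadic type list"); };
template<typename T> struct TypeChar<T, sizeof(int)> { enum { value = 'i' }; };
template<> struct TypeChar<double, sizeof(double)> { enum { value = 'f' }; };
template<> struct TypeChar<char*, sizeof(char*)> { enum { value = 's' }; };
template<> struct TypeChar<void, sizeof_void<void>::value> { enum { value = 'v' }; };

template<typename... Types>
struct TypeString {
    static constexpr char value[sizeof...(Types) + 1] = {
        TypeChar<Types, sizeof(Types)>::value...
    };
};

// Assumption: out-of-class definition for pre-C++17 builds (redundant since C++17).
template<typename... Types>
constexpr char TypeString<Types...>::value[sizeof...(Types) + 1];

// Any int-sized type without a mapping of its own ends up as 'i'.
static_assert(TypeChar<unsigned int, sizeof(unsigned int)>::value == 'i', "int-sized fallback");
static_assert(TypeChar<void, sizeof_void<void>::value>::value == 'v', "void is mapped via size 0");

// Illustrative alias: a parameter list of (int, double, char*).
using ExampleTypes = TypeString<int, double, char*>;
static_assert(sizeof(ExampleTypes::value) == 4, "three characters plus the terminating zero");

// Runtime check, since strcmp cannot be used in a constant expression.
inline void CheckTypeString()
{
    assert(std::strcmp(ExampleTypes::value, "ifs") == 0);
}
```

A type that has no mapping and is not the size of an int (const char* on a typical 64-bit platform, for instance) selects the primary template and stops the build with the "Unsupported type" message, which is the intended safety net.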

ScriptIdentity holds all the information we need: types will contain the characters corresponding to the parameter types of the function, ret is a single character indicating the return type, and numargs is simply the number of parameters. The constructor of ScriptIdentity is a template so that the compiler can deduce the parameter types and the return type of the function passed to it as an argument.
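
The deduction trick in that constructor can be shown in isolation. The sketch below is a simplified, hypothetical stand-in for ScriptIdentity called Signature: it keeps only the number of parameters and whether the return type is void, and it deliberately does not store the function address, so no cast is involved. Add and Log are made-up script functions.

```cpp
#include <cassert>
#include <type_traits>

template<typename R, typename... Types>
using Function = R(*)(Types...);

// Simplified, hypothetical stand-in for ScriptIdentity: it records only what
// can be read off the function type.
struct Signature
{
    unsigned int numargs;
    bool returns_void;

    // R and Types... are deduced from whatever function pointer is passed in.
    template<typename R, typename... Types>
    constexpr Signature(Function<R, Types...>)
        : numargs(sizeof...(Types)), returns_void(std::is_void<R>::value) {}
};

// Made-up script functions.
int Add(int a, int b) { return a + b; }
void Log(const char*) {}

// All of this is known to the compiler, so it can be checked statically.
static_assert(Signature(&Add).numargs == 2, "two parameters deduced");
static_assert(!Signature(&Add).returns_void, "R deduced as int");
static_assert(Signature(&Log).returns_void, "R deduced as void");

// The constructor is not explicit, so a function pointer converts implicitly,
// which is what a braced table entry relies on.
inline void DemoSignature()
{
    const Signature s = &Log;
    assert(s.numargs == 1);
}
```

Because the caller never spells out R or Types, adding a function with a new signature requires no extra code at the point where it is captured.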

And lastly, ScriptFunction attaches meta information that we provide ourselves - for now, this is just a descriptive name.

Given this implementation, we can declare an array referencing all our script functions:

static constexpr ScriptFunction functions[] {  
    {"MyFunction", MyFunction},
    {"OtherFunction", OtherFunction},
// ...
};

In the next part of my blog post series, I will outline the implementation of the wrapper function template that makes use of this array. The glorious prospect: the only thing that will be necessary to create wrapper functions for any (or at least, the large majority) of your script functions will be to add them to the array above (and recompile). Isn't that exciting?

(Continue reading here)