A Tale of Two Binding Mechanisms: Some C++ Love

Updated on 2023-10-03

Exploit some of the power of C++ and optimize the way your code calls your code.

Introduction

C++, as usual, gives you several ways to do a thing, with different ramifications and impact on the generated code.

Here, we're going to explore a couple of different ways to get your code to "bind" to functions.

What do I mean by "bind" in this case? I mean coaxing your compiler to resolve a function and invoke it.

I've never seen this topic covered, but I thought it should be.

Understanding This Mess

The typical C++ compiler is an incredibly advanced beast. In order to optimize your code, it can slice and dice it until the generated assembly no longer seems to reflect the original source at all.

With this power comes the responsibility of understanding - at least in broad strokes - how the code you write influences the output.

On one extreme, C++ calls your function through a pointer, or through a vtable on a class, which is essentially a list of function pointers to your virtual functions. On the other extreme, the function is removed and inlined in your code - never called at all (the code in the function body is copied into the caller's routine).

The compiler will try to inline a function if it is small enough - particularly when the code generated for the function body would be smaller than the code needed to call it.

When your compiler has direct access to the function and can resolve it at compile time - potentially eliminating the function pointer (but not required to) - I call this source level binding.

C++ - with the exception of the latest C++ compilers on the latest standards under special circumstances - cannot inline vtable calls or other calls through function pointers. You must eat the extra overhead of creating the stack frame and then calling the function.

This would be runtime binding. When I refer to runtime binding, I'm referring to binding through a function pointer of some sort.
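To make the two extremes concrete, here is a minimal sketch. The names add_values, call_direct, call_indirect and check_bindings are hypothetical and exist only for this illustration. What the compiler actually emits depends on the compiler and its optimization settings, so treat the comments as the general case rather than a guarantee.

```cpp
#include <cassert>

// Hypothetical helper, used only for this illustration.
inline int add_values(int lhs, int rhs) { return lhs + rhs; }

// Source level binding: the callee is named directly, so the compiler
// can see its body and is free to inline or constant-fold the call.
inline int call_direct() { return add_values(2, 2); }

// Runtime binding: the callee is known only as a pointer value, so in the
// general case the compiler has to emit an indirect call.
using binary_fn = int (*)(int, int);
inline int call_indirect(binary_fn fn) { return fn(2, 2); }

// Both paths compute the same result; only the binding differs.
inline void check_bindings() {
    assert(call_direct() == 4);
    assert(call_indirect(&add_values) == 4);
}
```

The source for both calls looks almost identical. The difference is how much the compiler knows about the callee at the call site.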

Coding This Mess

Runtime Binding

Consider the following code:

typedef class runtime_interface {
public:
    virtual int add(int lhs,int rhs)=0;
    virtual int subtract(int lhs,int rhs)=0;
    virtual int multiply(int lhs,int rhs)=0;
    virtual int divide(int lhs,int rhs)=0;
} runtime_interface_t;

typedef class runtime_binding final :
        public runtime_interface_t {
public:
    virtual int add(int lhs,int rhs) override {
        return lhs+rhs;
    }
    virtual int subtract(int lhs,int rhs) override {
        return lhs-rhs;
    }
    virtual int multiply(int lhs,int rhs) override {
        return lhs*rhs;
    }
    virtual int divide(int lhs,int rhs) override {
        return lhs/rhs;
    }
} runtime_binding_t;

Here, we've created a pure virtual class in runtime_interface_t which serves as our polymorphic runtime interface. This essentially crafts our final vtable - which again is a series of function pointers - or in this case, slots for function pointers since we haven't filled them in yet (pure virtual).

runtime_binding_t implements this interface, essentially filling in the vtable with function pointers to our function implementations.
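If the "list of function pointers" idea feels abstract, the following hand-written model may help. It is a conceptual sketch only: the real vtable layout is decided by your compiler and platform ABI, and every name here (manual_vtable, manual_object, and so on) is hypothetical.

```cpp
#include <cassert>

// Hypothetical hand-written stand-in for a vtable: one function pointer per slot.
struct manual_vtable {
    int (*add)(void* self, int lhs, int rhs);
    int (*subtract)(void* self, int lhs, int rhs);
};

// Each object carries a pointer to its table, much like the hidden vtable pointer.
struct manual_object {
    const manual_vtable* vptr;
};

inline int manual_add(void*, int lhs, int rhs) { return lhs + rhs; }
inline int manual_subtract(void*, int lhs, int rhs) { return lhs - rhs; }

// "Implementing the interface" amounts to filling in the slots.
static const manual_vtable manual_binding_table = { &manual_add, &manual_subtract };

// A call loads the table pointer, then the slot, then calls indirectly.
inline int manual_call_add(manual_object& obj, int lhs, int rhs) {
    return obj.vptr->add(&obj, lhs, rhs);
}

inline void check_manual_vtable() {
    manual_object obj{ &manual_binding_table };
    assert(manual_call_add(obj, 2, 2) == 4);
    assert(obj.vptr->subtract(&obj, 5, 2) == 3);
}
```

With virtual functions the compiler builds and maintains this plumbing for you, but the call still takes roughly this shape.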

When we call through runtime_interface_t, we're forcing the compiler to generate code to bind to the instance that implements it at runtime. It will create a stack frame*, and then call into the function.

* Assumes a typical calling convention where the caller creates the stack frame.

We can then create a function that uses it, like so:

#include <stdio.h> // declares printf

void runtime_bind(runtime_interface_t& obj) {
    printf("2 + 2 = %d\r\n",obj.add(2,2));
    printf("5 - 2 = %d\r\n",obj.subtract(5,2));
    printf("2 * 3 = %d\r\n",obj.multiply(2,3));
    printf("4 / 2 = %d\r\n",obj.divide(4,2));
}

Note that I'm using printf instead of cout here. The iostream constructs tend to clutter the generated assembly with noise we don't need, which would distract from the point.

More importantly, I'm calling each of the functions in that runtime_interface_t vtable.
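What runtime binding buys you is that a single call site can reach different implementations depending on the object it is handed. Here is a small sketch of that, using a trimmed-down interface. The names op_interface, add_op, multiply_op and run_op are hypothetical and not part of the article's code.

```cpp
#include <cassert>

// Hypothetical, trimmed-down interface in the style of runtime_interface_t.
class op_interface {
public:
    virtual int apply(int lhs, int rhs) = 0;
protected:
    // Objects are never deleted through the interface in this sketch.
    ~op_interface() = default;
};

class add_op final : public op_interface {
public:
    int apply(int lhs, int rhs) override { return lhs + rhs; }
};

class multiply_op final : public op_interface {
public:
    int apply(int lhs, int rhs) override { return lhs * rhs; }
};

// One call site; which body runs depends on the object passed in at run time.
inline int run_op(op_interface& op) { return op.apply(2, 3); }

inline void check_run_op() {
    add_op add;
    multiply_op mul;
    assert(run_op(add) == 5);
    assert(run_op(mul) == 6);
}
```

Inside run_op() the compiler only knows about op_interface, so in the general case it has to go through the vtable to find the right apply().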

Let's take a look at the generated assembly. I'm using godbolt.org, set to x64 GCC with optimization turned on (-O). In many cases GCC tends to generate code that is easier to follow than MSVC's, and that's true here as well.

.LC0:
        .string "2 + 2 = %d\r\n"
.LC1:
        .string "5 - 2 = %d\r\n"
.LC2:
        .string "2 * 3 = %d\r\n"
.LC3:
        .string "4 / 2 = %d\r\n"
runtime_bind(runtime_interface&):
        push    rbx
        mov     rbx, rdi
        mov     rax, QWORD PTR [rdi]
        mov     edx, 2
        mov     esi, 2
        call    [QWORD PTR [rax]]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     rax, QWORD PTR [rbx]
        mov     edx, 2
        mov     esi, 5
        mov     rdi, rbx
        call    [QWORD PTR [rax+8]]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC1
        mov     eax, 0
        call    printf
        mov     rax, QWORD PTR [rbx]
        mov     edx, 3
        mov     esi, 2
        mov     rdi, rbx
        call    [QWORD PTR [rax+16]]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC2
        mov     eax, 0
        call    printf
        mov     rax, QWORD PTR [rbx]
        mov     edx, 2
        mov     esi, 4
        mov     rdi, rbx
        call    [QWORD PTR [rax+24]]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC3
        mov     eax, 0
        call    printf
        pop     rbx
        ret

It's pretty straightforward. For each operation, it loads the vtable pointer from the object, loads the arguments into registers, calls through the appropriate vtable slot (offsets 0, 8, 16 and 24) and then calls printf().

Source Binding

Source binding gives the compiler more options as to how to implement your function call, but runtime polymorphism isn't typically one of them (excepting the cases where C++ can optimize virtual function overrides without a vtable). Ergo, you cannot do a source level bind at runtime - it doesn't even make sense. If you're thinking about exporting functions from a runtime library like a DLL, forget source level binding. If you're thinking of exposing something like a COM object, again, forget source level binding.

When you can use it, however, the compiler is free to rearrange your code to optimize the call, in some cases eliminating it entirely!

To enable source level binding with a kind of pseudo-polymorphism, you can use a template. You take the type of the object you're going to pass as a template argument, which gives the compiler everything it needs to resolve your functions at compile time instead of at runtime.

First, let's take a step back and consider the following:

typedef class source_only_binding final {
public:
    int add(int lhs,int rhs) {
        return lhs+rhs;
    }
    int subtract(int lhs,int rhs) {
        return lhs-rhs;
    }
    int multiply(int lhs,int rhs) {
        return lhs*rhs;
    }
    int divide(int lhs,int rhs) {
        return lhs/rhs;
    }
} source_only_binding_t;

This is very similar to runtime_binding_t except it doesn't implement an interface. By itself, this isn't very illustrative. The magic happens when we call it:

template<typename T>
void source_bind(T& obj) {
    printf("2 + 2 = %d\r\n",obj.add(2,2));
    printf("5 - 2 = %d\r\n",obj.subtract(5,2));
    printf("2 * 3 = %d\r\n",obj.multiply(2,3));
    printf("4 / 2 = %d\r\n",obj.divide(4,2));
}

You can see this is very similar to our runtime_bind() except it's a template function and takes the type of the target object as a template argument.

You call it the same way as runtime_bind(), passing an instance of source_only_binding_t. The compiler will implicitly fill in the template argument for you.

source_only_binding_t src;
source_bind(src);

Now let's examine the assembly output. There is no source_bind() function at all. It has been inlined into main():

main:
        sub     rsp, 8
        mov     edi, OFFSET FLAT:.LC4
        call    puts
        mov     esi, 4
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     esi, 3
        mov     edi, OFFSET FLAT:.LC1
        mov     eax, 0
        call    printf
        mov     esi, 6
        mov     edi, OFFSET FLAT:.LC2
        mov     eax, 0
        call    printf
        mov     esi, 2
        mov     edi, OFFSET FLAT:.LC3
        mov     eax, 0
        call    printf
        mov     eax, 0
        add     rsp, 8
        ret

You'll have to squint at it a little because the compiler has completely changed things around. There are no function calls except to printf(). Not only have the functions all been inlined, but the arguments and the call have been entirely folded into simple literal values. For example, instead of add(2,2) the compiler simply puts 4 in the output! It already knew the answer because of the magic of source level binding. This is, of course, an extreme case for illustrative purposes, but with more compile time information, your compiler can do a lot of magic to make your code tighter - sometimes much tighter.
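The folding above is something the optimizer chose to do, so it isn't guaranteed. If you want the compiler to prove it can work out the answer at compile time, one related option is constexpr together with static_assert. That is a different mechanism from the optimizer's inlining, and the sketch below is my own illustration rather than the code that produced the listing above. The names constexpr_binding and sum_of_results are hypothetical, and it assumes a C++11 or later compiler.

```cpp
// Hypothetical constexpr variant of source_only_binding_t.
class constexpr_binding final {
public:
    constexpr int add(int lhs, int rhs) const { return lhs + rhs; }
    constexpr int subtract(int lhs, int rhs) const { return lhs - rhs; }
    constexpr int multiply(int lhs, int rhs) const { return lhs * rhs; }
    constexpr int divide(int lhs, int rhs) const { return lhs / rhs; }
};

// Same source level binding idea: T is resolved at compile time.
template<typename T>
constexpr int sum_of_results(const T& obj) {
    return obj.add(2, 2) + obj.subtract(5, 2) + obj.multiply(2, 3) + obj.divide(4, 2);
}

// These only compile if the compiler can evaluate the calls itself.
static_assert(constexpr_binding{}.add(2, 2) == 4, "add is evaluated at compile time");
static_assert(sum_of_results(constexpr_binding{}) == 15, "4 + 3 + 6 + 2");
```

None of this is possible through a function pointer or vtable whose target is only known at runtime, which is the point of the comparison.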

Why Not Both?

There's absolutely no reason you can't pass an instance of runtime_binding_t to source_bind(). This is because in a source bind, the only things that matter are the member function names and signatures, not the type of the class. In fact, doing so will give you the best of both worlds: the same class can be used with source level binding or with runtime binding.
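Here is a rough, self-contained sketch of that idea. To keep it short it uses a hypothetical one-function interface (adder_interface, adder, source_add and runtime_add are illustrative names) rather than the full classes above. Whether the compiler really calls directly or inlines depends on the compiler and optimization level.

```cpp
#include <cassert>

// Hypothetical minimal interface and implementation, mirroring the article's pair.
class adder_interface {
public:
    virtual int add(int lhs, int rhs) = 0;
protected:
    // Objects are never deleted through the interface in this sketch.
    ~adder_interface() = default;
};

class adder final : public adder_interface {
public:
    int add(int lhs, int rhs) override { return lhs + rhs; }
};

// Source level binding: T is deduced at each call site.
template<typename T>
int source_add(T& obj) { return obj.add(2, 2); }

// Runtime binding: only the interface is known here.
inline int runtime_add(adder_interface& obj) { return obj.add(2, 2); }

inline void check_both() {
    adder a;
    // T = adder: the class is final, so the compiler knows the exact callee
    // and may call it directly or inline it.
    assert(source_add(a) == 4);
    // T = adder_interface: this compiles too, but the call goes back
    // through the vtable in the general case.
    adder_interface& as_interface = a;
    assert(source_add(as_interface) == 4);
    // The same object still works with the runtime-bound function.
    assert(runtime_add(a) == 4);
}
```

Note that you only get the source level benefit when the template is instantiated with the concrete type. Hand it a reference to the interface and you are back to runtime binding.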

History

  • 3rd October, 2023: Initial version