Devirtualization and Static Polymorphism

Published: (February 25, 2026 at 01:41 PM EST)
Source: Hacker News

Ever wondered why your “clean” polymorphic design underperforms in benchmarks?
Virtual dispatch enables polymorphism, but it comes with hidden overhead: pointer indirection, larger object layouts, and fewer inlining opportunities. Compilers do their best to devirtualize these calls, but it isn’t always possible. On latency‑sensitive paths, it’s beneficial to manually replace dynamic dispatch with static polymorphism, so calls are resolved at compile time and the abstraction has effectively zero runtime cost.


Virtual dispatch

Runtime polymorphism occurs when a base interface exposes a virtual method that derived classes override. Calls made through a Base& are then dispatched to the appropriate override at runtime. Under the hood:

  • a virtual table (vtable) is created for each class, and
  • a pointer (vptr) to the vtable is added to each instance.

Figure 1: The method foo is declared virtual in Base and overridden in Derived. Both classes get a vtable, and each object gets a vptr pointing to the corresponding vtable.
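
The size cost of the vptr can be observed directly. The following is a minimal sketch, not taken from the original post: Plain, Polymorphic, vptr_overhead, and example_usage are hypothetical names, and the exact byte counts depend on the ABI, so the code only checks that the polymorphic type is larger.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only.
struct Plain {
    int value = 0;
    auto get() const -> int { return value; }           // non-virtual
};

struct Polymorphic {
    int value = 0;
    virtual auto get() const -> int { return value; }   // virtual: each object carries a vptr
    virtual ~Polymorphic() = default;
};

// Extra bytes occupied by the polymorphic version.
// The exact number is ABI-specific (pointer size plus any padding).
auto vptr_overhead() -> std::size_t {
    return sizeof(Polymorphic) - sizeof(Plain);
}

auto example_usage() -> void {
    // Holds on mainstream ABIs, where the vptr is stored inside the object.
    assert(sizeof(Polymorphic) > sizeof(Plain));
}
```

Both types hold the same int; any size difference comes from the hidden pointer and the alignment padding it forces.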

For each virtual call, the compiler emits code that:

  1. loads the vptr,
  2. selects the right slot in the vtable, and
  3. performs an indirect call through that function pointer.

The extra vptr increases object size, which lowers cache efficiency, and an indirect call is harder for the CPU to predict than a direct one, which can increase branch mispredictions. Because the target is not known at compile time, the compiler also cannot inline the call.
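
The three steps above can be spelled out by emulating virtual dispatch by hand. This is only a sketch under assumptions: Object, VTable, call_foo, and the other names are hypothetical, and real compilers use an ABI-defined layout rather than these structs.

```cpp
#include <cassert>

// Hand-rolled emulation of virtual dispatch (hypothetical names throughout).
struct Object;

struct VTable {
    int (*foo)(Object*);    // slot 0: address of the foo override
};

struct Object {
    const VTable* vptr;     // the hidden pointer a polymorphic object carries
    int payload;
};

auto base_foo(Object* self) -> int    { return self->payload; }
auto derived_foo(Object* self) -> int { return self->payload * 2; }

// One table per "class", shared by all of its instances.
const VTable base_vtable{&base_foo};
const VTable derived_vtable{&derived_foo};

auto call_foo(Object* obj) -> int {
    const VTable* table = obj->vptr;   // 1. load the vptr
    auto slot = table->foo;            // 2. select the slot in the vtable
    return slot(obj);                  // 3. indirect call through the function pointer
}

auto example_usage() -> void {
    Object base{&base_vtable, 21};
    Object derived{&derived_vtable, 21};
    assert(call_foo(&base) == 21);      // dispatches to base_foo
    assert(call_foo(&derived) == 42);   // dispatches to derived_foo
}
```

call_foo cannot know which function it will reach until it reads the pointer at runtime, which is why the optimizer has nothing to inline.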

Example: assembly of a non‑virtual call

class Base {
public:
    auto foo() -> int;
};

auto bar(Base* base) -> int {
    return base->foo() + 77;
}

Compiled with gcc -O3 on x86‑64 (similar results with clang):

# non‑virtual foo()
bar(Base*):
        sub     rsp, 8
        call    Base::foo()          # Direct call
        add     rsp, 8
        add     eax, 77
        ret

Example: assembly of a virtual call

class Base {
public:
    virtual auto foo() -> int;
};

auto bar(Base* base) -> int {
    return base->foo() + 77;
}

# virtual foo()
bar(Base*):
        sub     rsp, 8
        mov     rax, QWORD PTR [rdi]    # vptr (pointer to vtable)
        call    [QWORD PTR [rax]]      # Virtual call
        add     rsp, 8
        add     eax, 77
        ret

Devirtualization

Sometimes the compiler can statically deduce which override a virtual call will hit. In those cases it devirtualizes the call and emits a direct call (skipping the vtable).

Straightforward devirtualization

struct Base {
    virtual auto foo() -> int = 0;
};

struct Derived : Base {
    auto foo() -> int override { return 77; }
};

auto bar() -> int {
    Derived derived;
    return derived.foo();          // compiler knows this is Derived::foo
}

Because the runtime type is clearly fixed, the compiler emits a direct call (or even inlines the function).

How the compiler achieves this

  • Whole‑program analysis (-fwhole-program): tells the compiler that the current translation unit (TU) is the entire program. If no class derives from Base in this TU, the compiler may assume none ever will and devirtualize calls on Base.
  • Link‑time optimization (-flto): keeps an intermediate representation in the object files and optimizes across all of them at link time, effectively treating multiple source files as a single TU.
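
These flags matter when the dynamic type is not visible at the call site. The sketch below shows such a call; Shape, Triangle, count_sides, and the file names in the comments are hypothetical, and whether a given compiler actually devirtualizes the call depends on its version and heuristics.

```cpp
#include <cassert>

struct Shape {
    virtual auto sides() const -> int = 0;
    virtual ~Shape() = default;
};

struct Triangle : Shape {
    auto sides() const -> int override { return 3; }
};

// Seen in isolation, this call cannot be devirtualized: any class derived
// from Shape, possibly defined in another TU, could be behind the reference.
auto count_sides(const Shape& shape) -> int {
    return shape.sides();
}

// Possible GCC build lines, assuming this file is named shapes.cpp:
//   g++ -O3 -fwhole-program shapes.cpp    # this TU is the whole program
//   g++ -O3 -flto shapes.cpp other.cpp    # optimize across TUs at link time

auto example_usage() -> void {
    Triangle triangle;
    assert(count_sides(triangle) == 3);
}
```

With either flag the optimizer can see that Triangle is the only class overriding sides, which gives it the information it needs to consider a direct call.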

Using final to aid devirtualization

class Base {
public:
    virtual auto foo() -> int;
    virtual auto bar() -> int;
};

class Derived : public Base {
public:
    auto foo() -> int override;   // still virtual
    auto bar() -> int final;      // cannot be overridden further
};

auto test(Derived* derived) -> int {
    return derived->foo() + derived->bar();
}

  • foo() can still be overridden, so derived->foo() remains a virtual call.
  • bar() is marked final; the compiler emits a direct call even though it’s declared virtual in the base.

test(Derived*):
        push    rbx
        sub     rsp, 16
        mov     rax, QWORD PTR [rdi]          # load vptr
        mov     QWORD PTR [rsp+8], rdi
        call    [QWORD PTR [rax]]              # virtual call (foo)
        mov     rdi, QWORD PTR [rsp+8]
        mov     ebx, eax
        call    Derived::bar()                  # direct call (bar)
        add     rsp, 16
        add     eax, ebx
        pop     rbx
        ret
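
The final specifier can also be applied to a whole class. A minimal sketch follows, with function bodies and the test_base helper added as assumptions so the example is self-contained: once nothing can derive from Derived, every virtual member called through a Derived* has a statically known target.

```cpp
#include <cassert>

class Base {
public:
    virtual auto foo() -> int { return 1; }
    virtual auto bar() -> int { return 2; }
    virtual ~Base() = default;
};

// `final` on the class: no type can derive from Derived, so through a
// Derived* or Derived& both foo and bar have a known target.
class Derived final : public Base {
public:
    auto foo() -> int override { return 10; }
    auto bar() -> int override { return 20; }
};

auto test(Derived* derived) -> int {
    return derived->foo() + derived->bar();   // both calls can be direct
}

// Through a Base*, the dynamic type is still unknown, so dispatch stays
// virtual unless the optimizer can prove the type some other way.
auto test_base(Base* base) -> int {
    return base->foo() + base->bar();
}

auto example_usage() -> void {
    Derived derived;
    Base base;
    assert(test(&derived) == 30);
    assert(test_base(&derived) == 30);
    assert(test_base(&base) == 3);
}
```

Marking the class final is the coarser tool; marking individual members final leaves room for further derivation while still pinning down selected calls.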

Static polymorphism

When the compiler can’t devirtualize, one option is to use static polymorphism. The canonical tool for this is the Curiously Recurring Template Pattern (CRTP).

What is CRTP?

CRTP is an idiom where a class X derives from a class template instantiated with X itself as a template argument. More generally, this is known as F‑bounded polymorphism (or F‑bounded quantification).

CRTP example

template <typename Derived>
class Base {
public:
    auto foo() -> int {
        return 77 + static_cast<Derived*>(this)->bar();
    }
};

class Derived : public Base<Derived> {
public:
    auto bar() -> int { return 88; }
};

auto test() -> int {
    Derived derived;
    return derived.foo();
}

  • No virtual keyword is used.
  • The base class calls derived‑class functionality via static_cast<Derived*>(this).

With -O3, the compiler inlines everything and produces code equivalent to a hand‑written, non‑virtual implementation, so the polymorphic abstraction adds no runtime overhead.


Take‑away

  • Virtual dispatch is flexible but incurs indirection and prevents many optimizations.
  • Devirtualization (via whole‑program analysis, LTO, or final) can recover performance when the concrete type is known.
  • Static polymorphism (CRTP) gives you compile‑time dispatch with no runtime cost, at the price of more template‑heavy code.

Choose the technique that best matches your performance goals and code‑base constraints.

With CRTP there is no vtable, no vptr, and no indirection, so the call can be fully optimized.

The trade‑off is that each Base<Derived> instantiation is a distinct, unrelated type, so there’s no common runtime base to up‑cast to. Any shared functionality that operates across different derived types must itself be templated.
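
As a sketch of that trade-off, the code below adds two derived types and a shared helper. First, Second, twice, and example_usage are hypothetical names introduced for illustration; the Base template mirrors the CRTP example.

```cpp
#include <cassert>

template <typename Derived>
class Base {
public:
    auto foo() -> int {
        return 77 + static_cast<Derived*>(this)->bar();
    }
};

class First : public Base<First> {
public:
    auto bar() -> int { return 88; }
};

class Second : public Base<Second> {
public:
    auto bar() -> int { return 1; }
};

// Base<First> and Base<Second> are unrelated types, so a function that
// accepts either must itself be a template. One copy is instantiated per
// derived type, and each copy resolves foo and bar statically.
template <typename Derived>
auto twice(Base<Derived>& object) -> int {
    return 2 * object.foo();
}

auto example_usage() -> void {
    First first;
    Second second;
    assert(twice(first) == 330);    // 2 * (77 + 88)
    assert(twice(second) == 156);   // 2 * (77 + 1)
}
```

There is no single type that could hold both a First and a Second, so a heterogeneous container of them would need another mechanism such as std::variant or a virtual interface.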

Example Assembly

test():
        mov     eax, 165                # 77 + 88
        ret

Deducing this

C++23’s deducing this keeps the same static‑dispatch model but makes it easier to write. Instead of templating the entire class (and writing Base<Derived> everywhere), you template only the member function that needs access to the derived type, and let the compiler deduce the type of self from the object the function is called on.

class Base {
public:
    auto foo(this auto&& self) -> int { return 77 + self.bar(); }
};

class Derived : public Base {
public:
    auto bar() -> int { return 88; }
};

This yields identical optimized code: when foo is called on a Derived object, self is deduced as a reference to Derived, so the call to bar is resolved statically and inlined.

David Álvarez Rosa

