C++ and the Zero-Overhead Principle

RAII, templates, the STL, and move semantics all chase one promise - you don't pay for what you don't use - and quietly run up a steep complexity bill along the way.

Bjarne Stroustrup distilled C++'s entire philosophy into a single sentence: "What you don't use, you don't pay for. And further: what you do use, you couldn't hand-code any better." This is the zero-overhead principle, and it is the lens through which every C++ memory feature should be read. RAII, templates, the STL, move semantics - none of them is "free" in the sense of being magic. Each is a bet that the abstraction compiles down to exactly the machine code a careful C programmer would have written by hand, so you get the ergonomics for free and keep the control.

The bet usually pays off. But it pays off at a price the slogan doesn't mention: complexity. The reason a std::unique_ptr can be the same size as a raw pointer and free its target with no runtime tax is a tower of compile-time machinery - templates, overload resolution, special member functions, value categories - that the programmer must understand to use safely. This article walks through the four pillars of C++'s approach to memory, shows what each compiles to, and is honest about the bill.

"You don't pay for what you don't use"

The principle has two halves, and the second is the strict one.

The first half - don't pay for unused features - is why C++ has no mandatory garbage collector, no boxed everything, no runtime type tags on plain structs. A struct Point { float x, y; }; is eight bytes on mainstream platforms, the same as in C. If you never throw an exception, the happy path carries no cost for the machinery that would unwind one (the "zero-cost exceptions" model: cost is paid only when a throw actually happens, in exchange for a fatter binary holding unwind tables). If a class declares no virtual functions, its objects carry no vtable pointer.

struct Plain   { int a, b; };                        // 8 bytes with 4-byte int. No hidden fields.
struct Virtual { virtual ~Virtual() {} int a, b; };  // typically 16 bytes: + vptr

static_assert(sizeof(Plain) == 8);     // pay nothing for polymorphism you skip
// sizeof(Virtual) is 16 on a typical 64-bit target: you opted into a vtable pointer

The second half - what you do use is as good as hand-coding - is the harder promise, and it is what justifies the abstractions below. A std::sort should be faster than C's qsort, not slower, because the comparator is inlined at compile time instead of called through a function pointer. A unique_ptr should generate the identical free call you'd write by hand. When the abstraction breaks this promise (and shared_ptr does, as we'll see), that's a deliberate trade you should be able to see and decline.

The contrast with C++'s newer cousins is instructive. Zig states a related rule - "no hidden allocations, no hidden control flow" - but reaches it by removing machinery rather than making it free: no destructors, no exceptions, no operator overloading. Both languages distrust hidden cost; C++ hides the mechanism and exposes the cost as zero, while Zig refuses to hide the mechanism at all.

RAII: lifetime is ownership

RAII - Resource Acquisition Is Initialization - is C++'s central idea and its best one. A resource's lifetime is bound to an object's lifetime. The constructor acquires; the destructor releases. When control leaves the scope where the object lives - by return, by falling off the end, or by an exception unwinding the stack - the compiler runs the destructor automatically, in reverse construction order.

#include <cstddef>   // size_t
#include <cstdlib>   // std::malloc, std::free
#include <new>       // std::bad_alloc

// A minimal owning buffer: it allocates in its constructor and frees in its
// destructor. No GC, no defer - the lifetime of the heap block IS the lifetime
// of this object.
class Buffer {
    char  *data_;
    size_t size_;
public:
    explicit Buffer(size_t n) : data_(static_cast<char*>(std::malloc(n))), size_(n) {
        if (!data_) throw std::bad_alloc{};
    }
    ~Buffer() { std::free(data_); }       // runs at scope exit, always

    Buffer(const Buffer&)            = delete;   // (copying would double-free)
    Buffer& operator=(const Buffer&) = delete;

    char  *data() { return data_; }
    size_t size() const { return size_; }
};

void use() {
    Buffer b(4096);          // malloc happens here
    b.data()[0] = 'x';
    if (b.size() < 8) return;   // <-- early return: ~Buffer still runs, frees
    // ... real work ...
}                            // <-- ~Buffer runs here on the normal path

The defining property is exception safety with no try/catch. There is no cleanup code written above, yet if any line throws, the stack unwinds and every fully constructed local's destructor runs on the way out. The defer-based languages (Zig, Odin, Hare) cover the scope-exit half of this, but differently: cleanup is a statement written at each acquisition site rather than a property of the type, and errors are modeled as return values rather than as unwinding, so there are no destructors for an unwinder to run.
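The unwinding behavior described above can be observed directly. The following is a minimal sketch, not part of the Buffer class: Tracer, work, and run are hypothetical names, and the log string exists only to record the order of events.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: records its construction and destruction in a log.
struct Tracer {
    std::string& log;
    char tag;
    Tracer(std::string& l, char t) : log(l), tag(t) { log += '+'; log += tag; }
    ~Tracer() { log += '-'; log += tag; }
    Tracer(const Tracer&) = delete;
    Tracer& operator=(const Tracer&) = delete;
};

// Two locals, one possible throw, and no cleanup code anywhere.
void work(std::string& log, bool fail) {
    Tracer a(log, 'A');
    Tracer b(log, 'B');
    if (fail) throw std::runtime_error("boom");
    log += '.';   // reached only on the normal path
}

// Runs work() and returns the recorded event order.
std::string run(bool fail) {
    std::string log;
    try {
        work(log, fail);
    } catch (const std::runtime_error&) {
        log += '!';   // by now both destructors have already run
    }
    return log;
}
```

On both paths the destructors run in reverse construction order (B before A); on the throwing path they run before the catch block is entered.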

RAII is what makes the memory-bug landscape in idiomatic C++ smaller than in C: a leak typically requires you to avoid RAII (raw new with no owner), and a use-after-free requires a pointer or reference to outlive the object it refers to. The cost - and there always is one - is that the free is invisible at the call site. use() allocates 4 KiB and frees it, and neither operation appears as a line of code. That invisibility is exactly what arena-oriented designs and Zig's explicit-allocator rule push back against: when allocation is a side effect of declaring a variable, you can lose track of how much you're allocating.

Move semantics: transfer without copy

RAII creates a new problem the moment you want to return an owning object or put it in a container. If Buffer above can't be copied (copying would double-free), how do you get one out of a factory function? The pre-2011 answer was ugly. The C++11 answer is move semantics.

A move transfers ownership of a resource from one object to another, leaving the source in a valid but empty state - typically holding nullptr, whose destructor is a harmless no-op. No bytes are copied; only the pointer (and bookkeeping) is handed over. This is the mechanism behind "keep on success": the problem Zig spells with the errdefer keyword, C++ folds into the type system.

#include <utility>   // std::exchange
#include <cstddef>   // size_t
#include <cstdlib>   // std::free

class Buffer {
    char  *data_ = nullptr;
    size_t size_ = 0;
public:
    explicit Buffer(size_t n);   // allocates (as before)
    ~Buffer() { std::free(data_); }

    // MOVE constructor: steal the pointer, null out the source so its
    // destructor frees nothing. This is O(1) and allocates nothing.
    Buffer(Buffer&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            std::free(data_);                          // release what we hold
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }
};

Buffer make(size_t n) {
    Buffer b(n);
    return b;                 // moved (or elided) out - no deep copy, no double free
}

Two subtleties carry real weight:

Move semantics are the reason unique_ptr can be a true zero-overhead replacement for owning raw pointers: it is move-only, so passing ownership around costs a pointer copy and a null-out, exactly what you'd hand-write.
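A short sketch of that ownership hand-off, under the assumption of a mainstream standard library (the size equality below is not guaranteed by the standard, though it holds for the default deleter on common implementations). consume and transfer_demo are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical sink: taking the unique_ptr by value forces callers to
// std::move ownership in. The int is freed when p goes out of scope.
int consume(std::unique_ptr<int> p) { return *p; }

bool transfer_demo() {
    // Assumption: true for the default deleter on mainstream implementations.
    static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*),
                  "unique_ptr with the default deleter is pointer-sized here");

    auto p = std::make_unique<int>(42);
    int* raw = p.get();

    std::unique_ptr<int> q = std::move(p);   // pointer copied, source nulled
    bool ok = (p == nullptr) && (q.get() == raw);

    ok = ok && (consume(std::move(q)) == 42); // ownership passes into consume
    return ok && (q == nullptr);              // q no longer owns anything
}
```

A moved-from unique_ptr is guaranteed to be null, which is why inspecting p and q after the moves is well defined here; for most other types the moved-from state is only "valid but unspecified".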

Templates and the STL: monomorphization

C++'s answer to generic code is templates, and the STL (Standard Template Library - generic containers and algorithms, designed by Alexander Stepanov and adopted into the draft C++ standard in 1994) is built on them. The relevant property for the zero-overhead principle is monomorphization: a vector<int> and a vector<double> are two separate concrete types, each compiled to code specialized for its element type. There is no boxing, no void*, no runtime dispatch.

template <typename T>
T max_of(T a, T b) { return a < b ? b : a; }   // a stamp, not a function

int    i = max_of(3, 9);       // compiler stamps out max_of<int>
double d = max_of(2.5, 1.5);   // ... and a separate max_of<double>
// Each instantiation inlines to a single compare+select. Zero call overhead.

The STL turns this into a performance argument C can't match without giving up genericity. The canonical example is sorting:

#include <algorithm>
#include <vector>

void sort_ascending(std::vector<int>& v) {
    std::sort(v.begin(), v.end());    // comparator inlined; no indirect calls
}

C's qsort takes a comparator as a function pointer, called once per comparison through an indirection the optimizer usually cannot inline. std::sort takes the comparator as a type (defaulting to std::less<int>), so the comparison is inlined into the sort loop. The generic version is typically the faster one - the second half of the zero-overhead promise, delivered.
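The two call shapes can be put side by side. This is an illustrative sketch only: compare_ints, sorted_with_qsort, and sorted_with_std_sort are hypothetical names, and the code shows the difference in how the comparator is passed, not a benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C-style comparator: qsort reaches it through a function pointer and
// hands it untyped pointers to the two elements.
int compare_ints(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);   // negative, zero, or positive; no overflow
}

// Takes the vector by value so the caller's data is left untouched.
std::vector<int> sorted_with_qsort(std::vector<int> v) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    return v;
}

std::vector<int> sorted_with_std_sort(std::vector<int> v) {
    // The lambda's type is a template argument of std::sort, so the
    // comparison is visible to the optimizer at the point of use.
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    return v;
}
```

Both functions produce the same ordering; whether the std::sort version is actually faster depends on the compiler and optimization level, so it is worth measuring rather than assuming.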

Where the STL touches memory directly, it does so through allocators, a template parameter most code never names:

template <class T, class Allocator = std::allocator<T>>
class vector;        // the allocator is the second, defaulted, type parameter

A std::vector<int> owns a single contiguous heap block and grows it geometrically (commonly ~2x or 1.5x) so that n push_backs cost O(n) total - amortized O(1) each - with the elements' destructors run automatically when the vector dies. For tighter control, C++17's polymorphic memory resources (std::pmr) let you hand a container an arena at runtime without needing a different container type for every allocator:

#include <cstddef>           // std::byte
#include <memory_resource>
#include <vector>

void handle_request() {
    std::byte buf[1 << 16];                         // 64 KiB on the stack
    std::pmr::monotonic_buffer_resource arena{buf, sizeof buf};
    std::pmr::vector<int> v{&arena};                // allocates FROM the arena

    for (int i = 0; i < 1000; ++i) v.push_back(i);  // no global heap touched
}   // arena and its storage vanish with the stack frame - bulk free, like Odin's
    // swappable context.allocator or Zig's ArenaAllocator

This is C++ reaching for the same arena idea the newer languages make idiomatic - but it took until 2017 to standardize, and std::pmr remains a corner of the language most programmers never visit.
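The amortized-growth claim for std::vector earlier in this section can be checked with a small experiment. The sketch below uses a hypothetical helper, count_reallocations, and infers a reallocation from a change in capacity(); the exact count depends on the standard library's growth factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n ints to an empty vector and counts how often the capacity
// changes, which is when the vector moved to a larger heap block.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With geometric growth the count rises logarithmically with n: a hundred thousand push_backs should trigger only a few dozen reallocations at most, not one per element.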

The complexity cost

Now the honest part. Zero-overhead is real, but it is purchased with complexity that the abstractions push onto the programmer and the toolchain instead of onto the running program. The cost didn't disappear; it changed venue.

The Rule of Five. The moment a class manages a resource by hand, you may owe five special member functions - destructor, copy constructor, copy assignment, move constructor, move assignment - and getting any of them subtly wrong reintroduces the exact double-free or leak RAII promised to prevent. The Buffer above had to delete copies and write moves carefully. (Modern advice is the "Rule of Zero": hold resources in members that already manage themselves, like unique_ptr or vector, so you write none of the five - but knowing when that applies is itself expertise.)
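As a hedged illustration of the Rule of Zero, here is a variant of the buffer idea that writes none of the five. ZeroBuffer and rule_of_zero_demo are hypothetical names, and this is a sketch rather than a drop-in replacement for the Buffer class above (it uses new[] via make_unique instead of malloc, and zero-initializes the bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// No destructor, no copy or move members written by hand: the unique_ptr
// member supplies the right behavior for all of them.
class ZeroBuffer {
    std::unique_ptr<char[]> data_;
    std::size_t size_ = 0;
public:
    explicit ZeroBuffer(std::size_t n)
        : data_(std::make_unique<char[]>(n)), size_(n) {}

    char* data() { return data_.get(); }
    std::size_t size() const { return size_; }
};

// Copying is disabled and moving is noexcept, both inherited from the member.
static_assert(!std::is_copy_constructible<ZeroBuffer>::value, "move-only");
static_assert(std::is_nothrow_move_constructible<ZeroBuffer>::value, "noexcept move");

bool rule_of_zero_demo() {
    ZeroBuffer a(16);
    a.data()[0] = 'x';
    ZeroBuffer b = std::move(a);   // implicit move: the pointer is handed over
    // Caveat: the implicit move copies size_, so a.size() is now stale.
    return a.data() == nullptr && b.size() == 16 && b.data()[0] == 'x';
}
```

The stale size in the moved-from object is a small instance of the expertise the paragraph mentions: the compiler-generated members are memory-safe here, but they do not reset plain fields for you.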

shared_ptr is the abstraction that breaks the slogan - visibly. Where unique_ptr is genuinely zero-overhead, shared_ptr is not: it carries a heap-allocated control block with atomic reference counts, so every copy and destruction is a synchronized increment/decrement across cores, and the pointer is twice the size of a raw one.

#include <memory>

auto a = std::make_unique<int>(7);   // sizeof(a) == sizeof(int*). Free is one free().
auto b = std::make_shared<int>(7);   // sizeof(b) == 2 pointers; atomic refcount;
                                     // a control block; cycles leak unless you
                                     // break them with std::weak_ptr.

This is the principle working as designed: you only pay the atomic-refcount tax if you reach for shared ownership. But it is also a trap, because shared_ptr is the smart pointer that looks like a garbage collector and tempts you to use it everywhere, paying for sharing you don't need.
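The cycle hazard and the weak_ptr remedy mentioned above can be sketched with a two-node list. Node and destroyed_after_scope are hypothetical names, and the destruction counter is only there to make the outcome observable.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> next;   // owning forward link
    std::weak_ptr<Node>   prev;   // non-owning back link: breaks the cycle
    int* destroyed;               // hypothetical counter, bumped in ~Node
    explicit Node(int* counter) : destroyed(counter) {}
    ~Node() { ++*destroyed; }
};

// Builds a <-> b, lets both handles go out of scope, and reports how many
// nodes were actually destroyed.
int destroyed_after_scope() {
    int destroyed = 0;
    {
        auto a = std::make_shared<Node>(&destroyed);
        auto b = std::make_shared<Node>(&destroyed);
        a->next = b;                   // strong: b now has two owners
        b->prev = a;                   // weak: a still has one owner
        assert(a.use_count() == 1);
        assert(b.use_count() == 2);
    }   // both strong counts reach zero here, so both nodes are freed
    return destroyed;
}
```

If prev were a shared_ptr as well, each node would keep the other's count above zero and neither destructor would ever run; the weak back link is what lets the counts reach zero.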

Templates move the cost to compile time and error messages. Monomorphization means every instantiation is real code: heavy template use produces code bloat (many near-identical functions) and famously long compile times. And because templates are duck-typed until instantiated, a small mistake can yield a page of diagnostics pointing deep inside STL internals. C++20 concepts exist largely to move those errors back to the call site - a feature added to manage complexity the templates themselves created.

The language is enormous. Compare the surface area honestly. HolyC was Terry A. Davis's deliberately small C-with-a-few-C++-conveniences, native to TempleOS, where memory is MAlloc/Free against a per-task heap and the whole system fits in one person's head - the antithesis of C++'s "expert friendly" sprawl. Hare and Forth go further toward minimalism. The same job - own a buffer, free it on every path - looks like this across the family:

// HolyC (TempleOS): manual, per-task heap, no destructors, no RAII.
U8 *buf = MAlloc(4096);   // from this task's heap
// ... use buf ...
Free(buf);                // you free it. Free(NULL) is allowed.

// Zig: explicit allocator, defer for cleanup. No destructors, cost is visible.
const buf = try allocator.alloc(u8, 4096);
defer allocator.free(buf);   // runs at scope exit; the free is on the page

// Odin: defer + implicit context.allocator. Still no RAII.
buf := make([]u8, 4096)      // uses context.allocator
defer delete(buf)            // explicit, scope-exit cleanup

// Hare: manual alloc/free with defer; minimal language, no errdefer, no RAII.
let buf: []u8 = alloc([0u8...], 4096);
defer free(buf);             // cleanup written next to the acquisition

\ Forth: the optional ALLOCATE/FREE word set. No scope, no defer; pair by hand.
4096 allocate throw   ( buf )    \ request 4096 bytes; throw on failure
\ ... use buf ...
free throw                       \ you free it, on the one exit path

// C++: RAII hides the free entirely - and that is both the feature and the cost.
auto buf = std::make_unique<char[]>(4096);
// ... use buf.get() ...
// no free line: ~unique_ptr frees at scope exit, even if an exception unwinds

Every other language here makes the free visible; C++ makes it automatic. That single difference is the whole trade. The C++ line is the safest of the lot (it can't leak on an early return or an exception) and the most opaque (you can't see the allocation or the free, only the declaration).

The verdict, on memory terms

The zero-overhead principle is not marketing - it is a verifiable property, and C++ largely delivers it. unique_ptr with the default deleter compiles down to a raw pointer. std::sort is typically faster than qsort. vector is a hand-rolled growable array with automatic element destruction. Move semantics do eliminate the copies they target. For the features you use, the machine code is hard to beat.

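What the slogan omits is where the displaced cost lands: on the programmer, who must get special member functions, value categories, and ownership choices right, and on the toolchain, which pays in compile time, code size, and diagnostics.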

The newer systems languages read as a series of arguments with exactly this trade. Zig keeps the control and rejects the hidden machinery (no destructors, no exceptions, explicit allocators). Odin and Hare keep defer but drop RAII. HolyC and Forth stay small enough to hold in your head and free everything by hand. C++'s wager is the opposite: accept a language of staggering complexity in exchange for abstractions that, when you understand them, cost nothing at runtime. Whether that wager is worth it is the oldest argument in systems programming - and the reason every language on this site is, in part, a response to it.