C++ and the Zero-Overhead Principle
RAII, templates, the STL, and move semantics all chase one promise - you don't pay for what you don't use - and that promise quietly runs up a steep complexity bill.
Bjarne Stroustrup distilled C++'s entire philosophy into a single sentence: "What you don't use, you don't pay for. And further: what you do use, you couldn't hand-code any better." This is the zero-overhead principle, and it is the lens through which every C++ memory feature should be read. RAII, templates, the STL, move semantics - none of them is "free" in the sense of being magic. Each is a bet that the abstraction compiles down to exactly the machine code a careful C programmer would have written by hand, so you get the ergonomics for free and keep the control.
The bet usually pays off. But it pays off at a price the slogan doesn't mention:
complexity. The reason a std::unique_ptr can be the same size as a raw
pointer and free its target with no runtime tax is a tower of compile-time
machinery - templates, overload resolution, special member functions, value
categories - that the programmer must understand to use safely. This article walks
the four pillars of C++'s memory model, shows what each compiles to, and is honest
about the bill.
"You don't pay for what you don't use"
The principle has two halves, and the second is the strict one.
The first half - don't pay for unused features - is why C++ has no mandatory
garbage collector, no boxed everything, no runtime type tags on plain structs. A
struct Point { float x, y; }; is eight bytes, the same as in C. If you never
throw an exception, the happy path carries no cost for the machinery that would
unwind one (the "zero-cost exceptions" model: cost is paid only when a throw
actually happens, in exchange for a fatter binary holding unwind tables). If you
never declare a virtual function, your objects carry no vtable pointer.
struct Plain { int a, b; }; // 8 bytes. No hidden fields.
struct Virtual { virtual ~Virtual(){}; int a, b; }; // 16 bytes: + vptr
static_assert(sizeof(Plain) == 8); // pay nothing for polymorphism you skip
// sizeof(Virtual) is 16 on a 64-bit target: you opted into a vtable pointer
The second half - what you do use is as good as hand-coding - is the harder
promise, and it is what justifies the abstractions below. A std::sort should be
faster than C's qsort, not slower, because the comparator is inlined at compile
time instead of called through a function pointer. A unique_ptr should generate
the identical free call you'd write by hand. When the abstraction breaks this
promise (and shared_ptr does, as we'll see), that's a deliberate trade you should
be able to see and decline.
The contrast with C++'s newer cousins is instructive. Zig states a related rule - "no hidden allocations, no hidden control flow" - but reaches it by removing machinery rather than making it free: no destructors, no exceptions, no operator overloading. Both languages distrust hidden cost; C++ hides the mechanism and exposes the cost as zero, while Zig refuses to hide the mechanism at all.
RAII: lifetime is ownership
RAII - Resource Acquisition Is Initialization - is C++'s central idea and its
best one. A resource's lifetime is bound to an object's lifetime. The constructor
acquires; the destructor releases. When control leaves the scope where the object
lives - by return, by falling off the end, or by an exception unwinding the
stack - the compiler runs the destructor automatically, in reverse construction
order.
#include <cstdlib> // std::malloc, std::free
#include <new> // std::bad_alloc
// A minimal owning buffer: it allocates in its constructor and frees in its
// destructor. No GC, no defer - the lifetime of the heap block IS the lifetime
// of this object.
class Buffer {
char *data_;
size_t size_;
public:
explicit Buffer(size_t n) : data_(static_cast<char*>(std::malloc(n))), size_(n) {
if (!data_) throw std::bad_alloc{};
}
~Buffer() { std::free(data_); } // runs at scope exit, always
Buffer(const Buffer&) = delete; // (copying would double-free)
Buffer& operator=(const Buffer&) = delete;
char *data() { return data_; }
size_t size() const { return size_; }
};
void use() {
Buffer b(4096); // malloc happens here
b.data()[0] = 'x';
if (b.size() < 8) return; // <-- early return: ~Buffer still runs, frees
// ... real work ...
} // <-- ~Buffer runs here on the normal path
The defining property is exception safety with no try/catch. There is no
cleanup code written above, yet if any line throws, the stack unwinds and every
fully-constructed local's destructor runs on the way out. This is the one thing
the defer-based languages (Zig, Odin, Hare) cannot do automatically - their
cleanup runs at scope exit too, but they model errors as return values, not as
unwinding, because they have no destructors to unwind through.
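The unwinding behavior described above can be observed directly. What follows is a minimal sketch, not code from the Buffer example: Tracer, work, and unwind_order are hypothetical names introduced only for illustration. Each Tracer appends its tag to a log when destroyed, so the log records the order in which destructors ran while the exception propagated.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: appends its tag to a shared log when destroyed.
struct Tracer {
    std::string& log;
    char tag;
    Tracer(std::string& l, char t) : log(l), tag(t) {}
    ~Tracer() { log += tag; }
};

// Throws midway. Note that no cleanup code is written in this function.
void work(std::string& log) {
    Tracer a(log, 'a');
    Tracer b(log, 'b');
    throw std::runtime_error("boom");
}

// Returns the order in which destructors ran during unwinding.
std::string unwind_order() {
    std::string log;
    try {
        work(log);
    } catch (const std::runtime_error&) {
        // By the time control arrives here, b and then a have been destroyed.
    }
    return log;
}
```

Calling unwind_order() yields "ba": both locals were destroyed, in reverse construction order, without a single line of cleanup in work.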
RAII is what makes the memory-bug landscape in idiomatic C++ smaller than in C:
a leak requires you to bypass RAII (a raw new with no owner), and a use-after-free
requires a pointer or reference that outlives the object's destructor. The cost - and there always is one - is that
the free is invisible at the call site. use() allocates 4 KiB and frees it,
and neither operation appears as a line of code. That invisibility is exactly what
arena-oriented designs and Zig's explicit-allocator rule push back against: when
allocation is a side effect of declaring a variable, you can lose track of how
much you're allocating.
Move semantics: transfer without copy
RAII creates a new problem the moment you want to return an owning object or put
it in a container. If Buffer above can't be copied (copying would double-free),
how do you get one out of a factory function? The pre-2011 answer was ugly. The
C++11 answer is move semantics.
A move transfers ownership of a resource from one object to another, leaving the
source in a valid but empty state - typically holding nullptr, whose destructor
is a harmless no-op. No bytes are copied; only the pointer (and bookkeeping) is
handed over. This is the mechanism behind "keep on success": the problem Zig
solves with the errdefer keyword, C++ folds into the type system.
#include <utility> // std::exchange
#include <cstddef> // size_t
#include <cstdlib> // std::free
class Buffer {
char *data_ = nullptr;
size_t size_ = 0;
public:
explicit Buffer(size_t n); // allocates (as before)
~Buffer() { std::free(data_); }
// MOVE constructor: steal the pointer, null out the source so its
// destructor frees nothing. This is O(1) and allocates nothing.
Buffer(Buffer&& other) noexcept
: data_(std::exchange(other.data_, nullptr)),
size_(std::exchange(other.size_, 0)) {}
Buffer& operator=(Buffer&& other) noexcept {
if (this != &other) {
std::free(data_); // release what we hold
data_ = std::exchange(other.data_, nullptr);
size_ = std::exchange(other.size_, 0);
}
return *this;
}
};
Buffer make(size_t n) {
Buffer b(n);
return b; // moved (or elided) out - no deep copy, no double free
}
Two subtleties carry real weight:
- noexcept is not decoration. std::vector will only move its elements when reallocating its backing store if their move constructor is noexcept; otherwise, if the elements are copyable, it falls back to copying them to preserve its strong exception guarantee. A move constructor that forgets noexcept can silently turn an O(1) pointer handoff per element into a deep copy per element. This is a place where the zero-overhead promise depends on a one-word annotation.
- The moved-from object still gets destroyed. Move doesn't end a lifetime; it empties an object. That's why the source must be left valid (here, holding nullptr), so its eventual destructor is safe. Forgetting to null the source is the classic way to reintroduce the double-free that RAII was supposed to abolish.
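The noexcept point can be checked with a small counting experiment. This is a sketch under assumptions: Counts, Elem, and relocate_three are hypothetical names, and the exact counts reflect how mainstream standard libraries relocate elements rather than a number the standard spells out. The only difference between the two element types is whether the move constructor is declared noexcept.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tally of how elements were relocated.
struct Counts { int copies = 0; int moves = 0; };

// Hypothetical element type; NoexceptMove toggles the one-word annotation.
template <bool NoexceptMove>
struct Elem {
    static inline Counts counts{};
    Elem() = default;
    Elem(const Elem&) { ++counts.copies; }
    Elem(Elem&&) noexcept(NoexceptMove) { ++counts.moves; }
};

// Builds a 3-element vector, forces one reallocation, and reports
// how the three existing elements were carried to the new block.
template <bool NoexceptMove>
Counts relocate_three() {
    std::vector<Elem<NoexceptMove>> v(3);   // default-constructed in place
    Elem<NoexceptMove>::counts = {};        // start counting from here
    v.reserve(v.capacity() + 1);            // must allocate a larger block
    return Elem<NoexceptMove>::counts;
}
```

With the annotation, the three elements are moved; without it, the vector copies them instead, even though a move constructor exists.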
Move semantics are the reason unique_ptr can be a true zero-overhead replacement
for owning raw pointers: it is move-only, so passing ownership around costs a
pointer copy and a null-out, exactly what you'd hand-write.
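A short sketch of that handoff, assuming a hypothetical consume function that takes ownership by value (neither consume nor source_is_null_after_move comes from the original text). The standard guarantees that a moved-from unique_ptr holds nullptr, which is what makes the check at the end legitimate.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// unique_ptr cannot be copied, only moved: ownership is always unique.
static_assert(!std::is_copy_constructible_v<std::unique_ptr<int>>);

// Hypothetical sink: takes ownership by value, so callers must std::move into it.
int consume(std::unique_ptr<int> p) {
    return *p;   // p is destroyed, and the int freed, when consume returns
}

// Returns true if the value arrived and the source is empty afterwards.
bool source_is_null_after_move() {
    auto owner = std::make_unique<int>(7);
    int value = consume(std::move(owner));   // ownership transferred here
    return value == 7 && owner == nullptr;   // moved-from unique_ptr is null
}
```

The std::move at the call site is the only visible trace of the transfer, and it is mandatory: passing owner without it does not compile.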
Templates and the STL: monomorphization
C++'s answer to generic code is templates, and the STL (Standard Template
Library - generic containers and algorithms, brought into C++ by Alexander
Stepanov in 1994) is built on them. The relevant property for the zero-overhead
principle is monomorphization: a vector<int> and a vector<double> are two
separate concrete types, each compiled to code specialized for its element type.
There is no boxing, no void*, no runtime dispatch.
template <typename T>
T max_of(T a, T b) { return a < b ? b : a; } // a stamp, not a function
int i = max_of(3, 9); // compiler stamps out max_of<int>
double d = max_of(2.5, 1.5); // ... and a separate max_of<double>
// Each instantiation inlines to a single compare+select. Zero call overhead.
The STL turns this into a performance argument C can't match without giving up genericity. The canonical example is sorting:
#include <algorithm>
#include <vector>
std::vector<int> v = /* ... */;
std::sort(v.begin(), v.end()); // comparator inlined; no indirect calls
C's qsort takes a comparator as a function pointer, called once per comparison
through an indirection the optimizer usually cannot inline. std::sort takes the
comparator as a type (defaulting to std::less<int>), so the comparison is
inlined into the sort loop. The generic version is typically the faster one - the
second half of the zero-overhead promise, delivered.
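For a side-by-side view, here is a sketch of the same sort written both ways. The function names are hypothetical, and the example shows only the difference in how the comparator is passed; it makes no timing claim, since actual speed depends on the compiler, flags, and data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C-style comparator: qsort reaches it through a function pointer.
int compare_ints(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);   // avoids the overflow that x - y can cause
}

std::vector<int> sort_with_qsort(std::vector<int> v) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    return v;
}

std::vector<int> sort_with_std_sort(std::vector<int> v) {
    // The lambda's type is part of this std::sort instantiation,
    // so the compiler is free to inline the comparison into the loop.
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    return v;
}
```

Both functions return the same sorted vector. The qsort version also gives up type checking: compare_ints receives void pointers and has to cast them back by hand.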
Where the STL touches memory directly, it does so through allocators, a template parameter most code never names:
template <class T, class Allocator = std::allocator<T>>
class vector; // the allocator is the second, defaulted, type parameter
A std::vector<int> owns a single contiguous heap block and grows it
geometrically (commonly ~2x or 1.5x) so that n push_backs cost O(n)
total - amortized O(1) each - with the elements' destructors run automatically
when the vector dies. For tighter control, C++17's polymorphic memory resources
(std::pmr) let you hand a container an arena at runtime without giving every
arena-backed container a different type:
#include <cstddef> // std::byte
#include <memory_resource>
#include <vector>
void handle_request() {
std::byte buf[1 << 16]; // 64 KiB on the stack
std::pmr::monotonic_buffer_resource arena{buf, sizeof buf};
std::pmr::vector<int> v{&arena}; // allocates FROM the arena
for (int i = 0; i < 1000; ++i) v.push_back(i); // no global heap touched
} // arena and its storage vanish with the stack frame - bulk free, like Odin's
// swappable context.allocator or Zig's ArenaAllocator
This is C++ reaching for the same arena idea the newer languages make idiomatic -
but it took until 2017 to standardize, and std::pmr remains a corner of the
language most programmers never visit.
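The amortized-growth claim made earlier in this section can also be observed rather than taken on faith. A rough sketch, assuming a hypothetical count_reallocations helper: it counts how often capacity changes across n push_backs. The exact count depends on the library's growth factor, so only its order of magnitude is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times the vector's capacity changes over n push_backs.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {   // the backing block was replaced
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With geometric growth, 1000 push_backs trigger a few dozen reallocations at most, not 1000, which is why the total copying work stays linear in n.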
The complexity cost
Now the honest part. Zero-overhead is real, but it is purchased with complexity that the abstractions push onto the programmer and the toolchain instead of onto the running program. The cost didn't disappear; it changed venue.
The Rule of Five. The moment a class manages a resource by hand, you may owe
five special member functions - destructor, copy constructor, copy assignment, move
constructor, move assignment - and getting any of them subtly wrong reintroduces
the exact double-free or leak RAII promised to prevent. The Buffer above had to
delete copies and write moves carefully. (Modern advice is the "Rule of Zero":
hold resources in members that already manage themselves, like unique_ptr or
vector, so you write none of the five - but knowing when that applies is itself
expertise.)
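As an illustration of the Rule of Zero, here is a sketch of how the Buffer idea might look with a unique_ptr member doing the ownership work. ZeroBuffer and make_zero_buffer are hypothetical names, not a rewrite the original prescribes. No destructor, copy, or move member is written, yet the class ends up move-only and nothrow-movable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical Rule-of-Zero variant of Buffer: unique_ptr owns the block,
// so none of the five special member functions is written by hand.
class ZeroBuffer {
    std::unique_ptr<char[]> data_;
    std::size_t size_ = 0;
public:
    explicit ZeroBuffer(std::size_t n)
        : data_(std::make_unique<char[]>(n)), size_(n) {}
    char* data() { return data_.get(); }
    std::size_t size() const { return size_; }
};

// The compiler derives the right behavior from the member's type.
static_assert(!std::is_copy_constructible_v<ZeroBuffer>);
static_assert(std::is_nothrow_move_constructible_v<ZeroBuffer>);
static_assert(std::is_nothrow_move_assignable_v<ZeroBuffer>);

// Returning by value moves (or elides); no deep copy is possible.
ZeroBuffer make_zero_buffer(std::size_t n) {
    ZeroBuffer b(n);
    b.data()[0] = 'x';
    return b;
}
```

One wart remains even here: the implicitly generated move nulls data_ in the source but leaves size_ unchanged, so a moved-from ZeroBuffer still reports its old size. Whether that matters is exactly the kind of judgment the Rule of Zero does not make for you.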
shared_ptr is the abstraction that breaks the slogan - visibly. Where
unique_ptr is genuinely zero-overhead, shared_ptr is not: it carries a
heap-allocated control block with atomic reference counts, so every copy and
destruction is a synchronized increment/decrement across cores, and the pointer is
twice the size of a raw one.
#include <memory>
auto a = std::make_unique<int>(7); // sizeof(a) == sizeof(int*). Cleanup is one delete.
auto b = std::make_shared<int>(7); // sizeof(b) == 2 pointers; atomic refcount;
// a control block; cycles leak unless you
// break them with std::weak_ptr.
This is the principle working as designed: you only pay the atomic-refcount tax
if you reach for shared ownership. But it is also a trap, because shared_ptr is
the smart pointer that looks like a garbage collector and tempts you to use it
everywhere, paying for sharing you don't need.
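The cycle hazard mentioned in the comment above is worth a concrete sketch. Node and nodes_alive_after_scope are hypothetical names; the live-object counter exists only so the example can report whether anything leaked. The forward link owns its target, while the back link merely observes it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical doubly linked node. next owns; prev only observes.
struct Node {
    static inline int alive = 0;   // live-object counter, for illustration only
    std::shared_ptr<Node> next;
    std::weak_ptr<Node> prev;      // a shared_ptr here would form a cycle and leak
    Node() { ++alive; }
    ~Node() { --alive; }
};

// Builds a two-node list, lets it go out of scope, and reports survivors.
int nodes_alive_after_scope() {
    {
        auto head = std::make_shared<Node>();
        auto tail = std::make_shared<Node>();
        head->next = tail;   // tail now has two owners: the local and head->next
        tail->prev = head;   // weak link: head's owner count is unaffected
        assert(head.use_count() == 1);
        assert(tail.use_count() == 2);
    }
    return Node::alive;      // 0 when both nodes were destroyed
}
```

If prev were a shared_ptr, each node would keep the other's count above zero and neither destructor would ever run. No garbage collector is waiting in the background to notice.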
Templates move the cost to compile time and error messages. Monomorphization means every instantiation is real code: heavy template use produces code bloat (many near-identical functions) and famously long compile times. And because templates are duck-typed until instantiated, a small mistake can yield a page of diagnostics pointing deep inside STL internals. C++20 concepts exist largely to move those errors back to the call site - a feature added to manage complexity the templates themselves created.
The language is enormous. Compare the surface area honestly. HolyC was Terry
A. Davis's deliberately small C-with-a-few-C++-conveniences, native to TempleOS,
where memory is MAlloc/Free against a per-task heap and the whole system fits
in one person's head - the antithesis of C++'s "expert friendly" sprawl. Hare and
Forth go further toward minimalism. The same job - own a buffer, free it on every
path - looks like this across the family:
// HolyC (TempleOS): manual, per-task heap, no destructors, no RAII.
U8 *buf = MAlloc(4096); // from this task's heap
// ... use buf ...
Free(buf); // you free it. Free(NULL) is allowed.
// Zig: explicit allocator, defer for cleanup. No destructors, cost is visible.
const buf = try allocator.alloc(u8, 4096);
defer allocator.free(buf); // runs at scope exit; the free is on the page
// Odin: defer + implicit context.allocator. Still no RAII.
buf := make([]u8, 4096) // uses context.allocator
defer delete(buf) // explicit, scope-exit cleanup
// Hare: manual alloc/free with defer; minimal language, no errdefer, no RAII.
let buf: []u8 = alloc([0u8...], 4096);
defer free(buf); // cleanup written next to the acquisition
\ Forth: the optional ALLOCATE/FREE word set. No scope, no defer; pair by hand.
4096 allocate throw ( buf ) \ request 4096 bytes; throw on failure
\ ... use buf ...
free throw \ you free it, on the one exit path
// C++: RAII hides the free entirely - and that is both the feature and the cost.
auto buf = std::make_unique<char[]>(4096);
// ... use buf.get() ...
// no free line: ~unique_ptr frees at scope exit, even if an exception unwinds
Every other language here makes the free visible; C++ makes it automatic. That single difference is the whole trade. The C++ line is the safest of the lot (it can't leak on an early return or an exception) and the most opaque (you can't see the allocation or the free, only the declaration).
The verdict, on memory terms
The zero-overhead principle is not marketing - it is a verifiable property, and
C++ largely delivers it. unique_ptr compiles down to a raw pointer. std::sort is
typically faster than qsort. vector is a hand-rolled growable array with automatic element
destruction. Move semantics do eliminate the copies they target. For the
features you use, the machine code is hard to beat.
What the slogan omits is where the displaced cost lands:
- Runtime cost is genuinely near-zero for the unique-ownership, value-semantic, monomorphized core - and explicitly non-zero for the features (shared_ptr, exceptions when thrown, virtual dispatch) you opt into.
- Cognitive cost is high and growing: the Rule of Five/Zero, value categories, noexcept correctness, template instantiation rules, and a standard library that spans thousands of pages. The safety RAII buys is real, but using it correctly is expert work.
- Toolchain cost - compile times, code bloat, error-message archaeology - is the bill templates run up to keep runtime at zero.
The newer systems languages read as a series of arguments with exactly this
trade. Zig keeps the control and rejects the hidden machinery (no destructors, no
exceptions, explicit allocators). Odin and Hare keep defer but drop RAII. HolyC
and Forth stay small enough to hold in your head and free everything by hand.
C++'s wager is the opposite: accept a language of staggering complexity in exchange
for abstractions that, when you understand them, cost nothing at runtime. Whether
that wager is worth it is the oldest argument in systems programming - and the
reason every language on this site is, in part, a response to it.