
Metaprogramming: macros, templates, comptime, words

Four radically different ways to write code that writes code - C's textual preprocessor, C++'s template/constexpr machinery, Zig's unified comptime, and Forth's self-extending immediate words - read through the one question this site keeps asking: where do the bytes come from, and when?

C · C++ · Zig · Forth

"Metaprogramming" is a grand word for a humble idea: code that produces other code. Every language in this collection has to answer the same question - how do I write a thing once and have it specialized, expanded, or generated for me? - and the answers are wildly different. C answers with a separate, typeless text substitution language. C++ answers with templates plus constexpr: a whole second language that runs inside the compiler. Zig answers with one keyword, comptime, that just runs ordinary Zig at build time. Forth answers by being homoiconic - the compiler is itself made of Forth words, and a program can extend that compiler mid-definition.

This site cares about memory, so that is the lens here. Metaprogramming is, at bottom, a memory-management technique. The work a macro or a template or a comptime block does at compile time is work that does not happen at run time.

The four headline mechanisms - C preprocessor, C++ templates/constexpr, Zig comptime, Forth immediate words - span the entire design space, and the other three languages here (HolyC, Hare, Odin) stake out instructive positions in between. We will walk the spectrum from "no metaprogramming at all" to "the program is the compiler."

The C preprocessor: a typeless language stapled on top

C's only metaprogramming facility is the preprocessor, and it is worth being precise about what it is: a separate language that runs before the C compiler ever sees your code, operating purely on text (technically, on token sequences). It does not know about types, scopes, or expressions. It knows about text and substitution. That is its power and its curse.

// Object-like and function-like macros: pure textual substitution.
#define PI 3.14159
#define SQUARE(x) ((x) * (x))     // parens matter - see below

double area(double r) {
    return PI * SQUARE(r);        // expands to: 3.14159 * ((r) * (r))
}

Because substitution is textual, the famous footguns are not bugs in your code - they are inherent to the model. The two classics every C programmer learns by scar tissue:

// 1) Missing parens - precedence is destroyed by naive substitution.
#define BAD_SQ(x) x * x
int y = BAD_SQ(1 + 2);   // expands to 1 + 2 * 1 + 2  ==  5, not 9

// 2) Double evaluation - the argument's text is pasted twice, so its
//    SIDE EFFECTS happen twice. This is a real memory/correctness hazard.
#define MAX(a, b) ((a) > (b) ? (a) : (b))
int z = MAX(i++, j);     // when i > j, i++ runs TWICE and i advances by 2

The preprocessor has no idea i++ has a side effect, because it never sees an expression - only the characters i++, which it pastes wherever a appears. A real (inline) function would evaluate its argument exactly once; a macro cannot, because it is not a function.
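The difference is easy to observe. The sketch below is written as C++ so the macro and a function template can sit side by side; max_fn and the two helper functions are illustrative names, not from any library, and the starting values are arbitrary.

```cpp
// Macro: the argument text is pasted at every use of `a`.
#define MAX(a, b) ((a) > (b) ? (a) : (b))

// Function template: each argument is evaluated once, then passed by value.
template <typename T>
constexpr T max_fn(T a, T b) { return a > b ? a : b; }

// Final value of i after MAX(i++, j), starting from i = 5, j = 3.
inline int i_after_macro() {
    int i = 5, j = 3;
    int z = MAX(i++, j);      // i++ runs in the comparison AND in the chosen arm
    (void)z;
    return i;
}

// Same call through the function: i++ runs exactly once.
inline int i_after_function() {
    int i = 5, j = 3;
    int z = max_fn(i++, j);
    (void)z;
    return i;
}
```

With these inputs the macro version leaves i at 7 and the function version leaves it at 6. Had i started below j, the macro would have evaluated i++ only once, which is part of what makes the bug hard to spot.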

Where the preprocessor genuinely earns its keep on a memory-focused reading is code generation that lands in static storage. The canonical idiom is the X-macro: a single list, defined once, expanded into several different forms - an enum, a string table, a dispatch - so the lists can never drift out of sync.

// One source of truth. Define the list once...
#define COLORS  \
    X(RED,   0xFF0000) \
    X(GREEN, 0x00FF00) \
    X(BLUE,  0x0000FF)

// ...expand it into an enum:
enum Color {
#define X(name, rgb) COLOR_##name,     // ## pastes tokens: COLOR_RED, ...
    COLORS
#undef X
    COLOR_COUNT
};

// ...and expand the SAME list into a parallel const table in .rodata.
static const struct { const char *name; unsigned rgb; } color_info[] = {
#define X(name, rgb) { #name, rgb },   // # stringizes: "RED", "GREEN", ...
    COLORS
#undef X
};

The memory payoff is real: color_info[] is a static const array, so the compiler places it in read-only data mapped straight from the executable. There is no heap allocation, no init loop at startup, and nothing to free - the table simply exists in the binary's image. The X-macro generated it for you, and kept the enum and the table in lockstep, but it did so by gluing text together, with no type checking of the result until the C compiler runs afterward.
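The same list can feed further expansions. As a sketch (compiled here as C++, with color_name a hypothetical helper and the list repeated so the block stands alone), a third expansion generates the "dispatch" form, and a fourth counts the entries so the compiler can confirm the enum has not drifted:

```cpp
#include <string_view>

// The list from the example above, repeated so this sketch compiles alone.
#define COLORS  \
    X(RED,   0xFF0000) \
    X(GREEN, 0x00FF00) \
    X(BLUE,  0x0000FF)

enum Color {
#define X(name, rgb) COLOR_##name,
    COLORS
#undef X
    COLOR_COUNT
};

// Third expansion: one `case` per list entry.
inline std::string_view color_name(Color c) {
    switch (c) {
#define X(name, rgb) case COLOR_##name: return #name;
        COLORS
#undef X
        default: return "?";
    }
}

// Fourth expansion: each entry becomes "+1", so (0 COLORS) is the entry count.
#define X(name, rgb) +1
static_assert(static_cast<int>(COLOR_COUNT) == (0 COLORS),
              "enum and list out of sync");
#undef X
```

Adding a fourth color to COLORS updates the enum, the switch, and the count in one edit, which is the whole appeal of the idiom.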

The hard limit: the preprocessor cannot run a loop, cannot inspect a type, and cannot compute with anything except the integer constant expressions that #if evaluates. It can paste text and choose between branches (#if). Everything beyond that is the C compiler's constant folder, not metaprogramming. This is exactly the wall the other languages set out to climb.
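One nuance is worth a sketch: #if does evaluate integer expressions built from macros, but nothing that involves a type or an object. In the example below BUF_SIZE, ALIGN, Header, and kChunks are hypothetical names chosen for illustration, and the block is compiled as C++ so the type-level check can be written with static_assert.

```cpp
#include <cstdint>

#define BUF_SIZE 4096            // hypothetical configuration constants
#define ALIGN    64

// The preprocessor can do integer arithmetic on macro constants in #if...
#if (BUF_SIZE % ALIGN) != 0
#error "BUF_SIZE must be a multiple of ALIGN"
#endif

struct Header { std::uint32_t len; std::uint32_t flags; };

// ...but it cannot see types. A sizeof test in #if is rejected, because
// sizeof means nothing until the compiler proper runs. That check belongs
// to the compiler's constant evaluator instead.
static_assert(sizeof(Header) == 8, "Header layout changed");

// Branch selection: exactly one definition survives preprocessing.
#if BUF_SIZE >= 4096
constexpr int kChunks = BUF_SIZE / 4096;
#else
constexpr int kChunks = 1;
#endif
```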

HolyC: no function-like #define - #exe{} runs a program instead

Terry A. Davis's HolyC, the native language of his single-developer TempleOS, makes a deliberate and surprising choice here: it drops #define function macros entirely. Davis was blunt about it - in the language documentation he simply notes he is "not a fan." Given how much grief macro footguns cause in C, removing them is a coherent design decision, of a piece with HolyC's general project of stripping C's ceremony down to one programmer's taste.

But TempleOS is JIT-compiled at the prompt, and that compile-as-you-go nature gives HolyC a different and genuinely interesting answer: the #exe{} block. Inside #exe{ ... } you write a small HolyC program that runs during compilation, and whatever it emits with StreamPrint() is injected as source text into the stream of code being compiled. It is code that runs at compile time to produce code - the same shape as a Lisp macro or a Zig comptime block, but expressed as "run a program now and splice its output here."

// HolyC: no function macros. Instead, #exe{} runs HolyC at compile time and the
// text it StreamPrint()s becomes part of the program being compiled.
// Here we bake the build timestamp directly into a string constant.
U8 *build_stamp = #exe{ StreamPrint("\"%D %T\"", Now, Now); };

// A more "codegen" use: emit a const table by looping AT COMPILE TIME and
// printing source text. The loop runs in the compiler; the array it writes
// lands in the data segment, with no runtime init loop and nothing to free.
I64 squares[8] = {
#exe{
  I64 i;
  for (i = 0; i < 8; i++)
    StreamPrint("%d,", i * i);   // emits: 0,1,4,9,16,25,36,49,
}
};

Notice the contrast with C's X-macro. C's preprocessor cannot run that for loop; you would have to write out X(0) X(1) ... yourself. HolyC's #exe{} runs a real loop in the compiler because the compiler is right there, JIT-ing your line. The result is the same kind of static table in the data segment - but generated by execution, not textual pasting. (On a memory reading, this fits TempleOS perfectly: top-level code runs once at load into the task's heap or data segment, and a task's memory is reclaimed wholesale when the task dies, so a precomputed table is just data that lives for the task's life.)

The respectful, accurate summary: HolyC is less macro-capable than C in the textual sense (no function-like #define) but has a compile-time execution facility C lacks (#exe{}), because its JIT-at-the-prompt design made that natural. It is a thoughtful trade, not an omission.

C++ templates: a second, accidentally-Turing-complete language

C++ took C's preprocessor problem - "I want generic code, but #define is typeless and dangerous" - and answered it with templates: a real, type-aware substitution engine built into the compiler. A template is a pattern for code; the compiler stamps out a concrete version (an instantiation) for each distinct set of type arguments. This process is monomorphization, and it is the same thing Zig and Rust do.

// A function template: one pattern, monomorphized per type used.
template <typename T>
T max_of(T a, T b) {
    return a > b ? a : b;
}
// max_of<int>(3, 9)   -> a concrete int version is generated
// max_of<double>(...) -> a separate double version is generated

// A class template - this is how std::vector<T> works. Each T yields a
// distinct concrete type with no boxing and no runtime type dispatch.
template <typename T>
class Box {
    T value;
public:
    explicit Box(T v) : value(v) {}
    const T& get() const { return value; }
};
// Box<int> and Box<std::string> are two completely separate types.

The memory consequences are the heart of the matter. Monomorphization means generic code costs nothing at run time relative to hand-written code: no boxing, no void*, no vtable, no indirection. std::vector<int> stores ints directly, contiguously. That is the "zero-overhead abstraction" promise. The bill instead lands at compile time and in the binary. Every instantiation is real, separate machine code, so heavy template use produces code bloat - many nearly identical functions - and notoriously long compile times. And because a template body is duck-typed until it is instantiated (unless it is constrained with concepts), a mistake surfaces deep inside the instantiation, which is where the page-spanning error messages come from.
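The layout claim can be checked by the compiler itself. A minimal sketch, reusing the Box template from above (repeated so the block stands alone); the size equalities hold on mainstream ABIs for a single-member class and are asserted here rather than guaranteed by the standard in general:

```cpp
#include <type_traits>

template <typename T>
class Box {
    T value;
public:
    explicit Box(T v) : value(v) {}
    const T& get() const { return value; }
};

// Each instantiation is its own type...
static_assert(!std::is_same_v<Box<int>, Box<double>>);

// ...laid out like the value it wraps: no header, no pointer, no type tag.
static_assert(sizeof(Box<int>) == sizeof(int));
static_assert(sizeof(Box<double>) == sizeof(double));

// So an array of boxes is as dense as an array of the raw values.
static_assert(sizeof(Box<int>[8]) == 8 * sizeof(int));
```

If any of these failed, the build would stop; nothing here runs or allocates at run time.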

Templates turned out to be accidentally Turing-complete - you can compute arbitrary values at compile time by abusing recursive instantiation and partial specialization. The classic factorial-in-the-type-system:

// Template metaprogramming, the old way: recursion via specialization.
// The "value" is computed entirely by the compiler instantiating types.
template <unsigned N> struct Fact { static constexpr unsigned value = N * Fact<N-1>::value; };
template <>           struct Fact<0> { static constexpr unsigned value = 1; };

static_assert(Fact<5>::value == 120);   // proven at compile time

This works, but it is a different sublanguage - recursion-as-instantiation, specialization-as-base-case - that reads nothing like ordinary C++. Modern C++ made the obvious move: let you write ordinary C++ and run it at compile time.

constexpr, consteval, if constexpr

constexpr on a function means it may be evaluated at compile time; on a variable it means the initializer must be a constant expression. If you call a constexpr function in a context that demands a constant (an array size, a static_assert, a constexpr variable), it runs in the compiler; otherwise it can run at run time like a normal function.

// constexpr: the SAME function, evaluated at compile time when the context
// requires a constant, otherwise at run time. The factorial, sanely.
constexpr unsigned fact(unsigned n) {
    unsigned r = 1;
    for (unsigned i = 2; i <= n; ++i) r *= i;   // a real loop, at compile time
    return r;
}

constexpr unsigned f5 = fact(5);   // computed by the compiler -> 120
static_assert(f5 == 120);

// Build a lookup table at compile time (needs #include <array> and C++17
// for the constexpr lambda). It lands in .rodata: no heap, no startup
// loop, nothing to free - exactly the C X-macro payoff, but written as a
// normal loop with full type checking.
constexpr auto squares = [] {
    std::array<int, 16> t{};
    for (int i = 0; i < 16; ++i) t[i] = i * i;
    return t;
}();

consteval (C++20) is stricter: a consteval function is an immediate function - it is guaranteed to run at compile time, and it is an error to call it in a way that would defer to run time. Use it when "this must never cost anything at run time" is a contract you want enforced.

// consteval: MUST evaluate at compile time. Calling with a runtime value
// is a compile error - the guarantee is enforced, not just allowed.
consteval int square_now(int x) { return x * x; }
constexpr int nine = square_now(3);   // OK, compile time
// int bad = square_now(runtime_x);   // ERROR: not a constant expression

if constexpr (C++17) is compile-time branch selection. Inside a template, the branch not taken is discarded: it must still parse, but it is never instantiated, so it may contain code that would not compile for that particular T. That lets one template body do different things per type without SFINAE gymnastics.

// if constexpr: the dead branch is dropped at compile time. This replaces
// a pile of SFINAE/overload tricks with a readable compile-time if.
// (Needs #include <cstdio> and <type_traits>.)
template <typename T>
void describe(const T& x) {
    if constexpr (std::is_integral_v<T>)
        std::puts("an integer");        // only this branch exists for ints
    else
        std::puts("something else");    // only this branch exists otherwise
}
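The describe example would compile even with a plain if, since both branches are valid for every T. The discarding only becomes visible when one branch is invalid for some types. A small sketch, with length_of as an illustrative helper that assumes any non-integer argument has a .size() member:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Integers count as length 1; anything else is assumed to offer .size().
template <typename T>
std::size_t length_of(const T& x) {
    if constexpr (std::is_integral_v<T>) {
        return 1;           // for T = int, the else-arm is never instantiated
    } else {
        return x.size();    // ill-formed for int, so a plain `if` would not compile
    }
}
```

Replacing if constexpr with if makes length_of(7) a compile error, because x.size() would then have to be valid for int.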

So C++ ended up with three overlapping metaprogramming systems: the preprocessor (inherited from C, still there, still textual), templates (a type-aware pattern language with its own baroque rules - SFINAE, partial specialization, fold expressions, concepts), and constexpr/consteval compile-time evaluation of ordinary C++. Each is powerful; together they are a language of staggering surface area. That accumulated complexity is precisely the target Zig took aim at.

Zig comptime: one mechanism for all of it

Zig's bet is that you do not need a second language at all. The compiler can simply run ordinary Zig during compilation, and that single idea - comptime - is made to cover what C++ spreads across the preprocessor, templates, and constexpr.

const std = @import("std");

// A comptime-evaluated block: an ordinary `for` loop runs IN the compiler,
// and the const result lives in the binary's read-only data (.rodata).
const squares: [16]u32 = blk: {
    var t: [16]u32 = undefined;
    for (&t, 0..) |*slot, i| slot.* = @intCast(i * i);
    break :blk t;
};
// No allocator, no startup loop, nothing to free. The work happened once,
// in the compiler - the same .rodata payoff as the C X-macro and the C++
// constexpr table, but written as plain Zig.

The keystone is that a type is just a value of type type, known at compile time. So generics are not a template system - they are ordinary functions that take a comptime type parameter, and generic containers are functions that return a type:

// Generics with no template language: a normal function whose parameter
// happens to be a type. The compiler monomorphizes per distinct T.
fn maxOf(comptime T: type, a: T, b: T) T {
    return if (a > b) a else b;
}

// A generic container is a function returning a struct type. This is how
// std's ArrayList etc. are written - and the allocator stays explicit.
fn Box(comptime T: type) type {
    return struct { value: T };
}
// Box(i32) and Box([]const u8) are two distinct concrete types.

Because compile-time Zig is just Zig, you get reflection for free: @typeInfo / std.meta.fields let you inspect a type's fields at compile time and generate code from them - serializers, hashers, ORMs - with no separate codegen step and no runtime type information.

const std = @import("std");

// Compile-time reflection: sum every field of any struct, generated per
// type at compile time. `inline for` UNROLLS the loop in the compiler, so
// the emitted code has no loop and no per-iteration cost at run time.
fn sumFields(comptime T: type, v: T) i64 {
    var total: i64 = 0;
    inline for (std.meta.fields(T)) |field| {
        total += @field(v, field.name);
    }
    return total;
}
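For contrast, C++ through C++23 has no built-in way to enumerate a struct's fields, so the nearest counterpart needs the type to list its fields by hand. The sketch below assumes exactly that: Point and as_tuple are hypothetical, and sum_fields only works for types that supply an as_tuple overload.

```cpp
#include <cstdint>
#include <tuple>

struct Point { std::int32_t x; std::int32_t y; std::int32_t z; };

// Hand-written "reflection": the type names its own fields.
constexpr auto as_tuple(const Point& p) { return std::tie(p.x, p.y, p.z); }

// std::apply unpacks the tuple; the fold expression expands to one addition
// per field at compile time, much as `inline for` unrolls in the Zig version.
template <typename T>
constexpr std::int64_t sum_fields(const T& v) {
    return std::apply(
        [](const auto&... f) { return (std::int64_t{0} + ... + f); },
        as_tuple(v));
}

static_assert(sum_fields(Point{1, 2, 3}) == 6);
```

The emitted code is comparable, but the field list lives in two places (the struct and as_tuple) and can drift, which is the problem compile-time reflection removes.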

The memory story is the cleanest of the family. comptime moves work out of run time entirely (tables, dispatch decisions, specialized code). Generics built on it monomorphize, so no boxing and no hidden indirection. And it all happens in one ordinary-looking language - the same Zig you write for run-time code - which means generic type errors are normal Zig errors at the use site, not template instantiation backtraces. Zig offers the full power of C++'s compile-time world through a single unified mechanism, and keeps its promise that nothing happens unless you wrote it.

Forth: the program is the compiler

Forth sits at the opposite philosophical pole from C++. Where C++ bolts on a heavyweight separate compile-time language, Forth makes compile-time and run-time the same language by being homoiconic: the Forth compiler is itself just a collection of Forth words, and any word you mark IMMEDIATE executes during compilation. A program can therefore extend its own compiler as it goes.

To see it, you need Forth's two states. Normally the outer interpreter executes words as it reads them (interpret state). Inside a colon definition : NAME ... ; the compiler is compiling words into the new definition (compile state). An IMMEDIATE word breaks the rule: it runs even in compile state. That is the hook on which all Forth metaprogramming hangs - control-flow words like IF, DO, BEGIN are themselves immediate words that run at compile time to lay down branch instructions.
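The two-state rule is small enough to model directly. The following C++ sketch is a toy, not a Forth: Vm, Word, make_vm, and the NOTE word are invented for illustration, number parsing assumes well-formed input, and real systems store threaded or native code rather than std::function objects. It only shows the decision the outer interpreter makes for each token: execute now, or compile for later.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Vm;
using Action = std::function<void(Vm&)>;

struct Word {
    Action action;
    bool immediate = false;
};

struct Vm {
    std::vector<long> stack;
    std::map<std::string, Word> dict;
    bool compiling = false;          // Forth's STATE
    std::string new_name;            // definition in progress
    std::vector<Action> body;        // actions compiled so far
    std::vector<std::string> log;    // traces left by words that ran at compile time

    void run(const std::string& src) {
        std::istringstream in(src);
        std::string tok;
        while (in >> tok) {
            if (tok == ":") {                              // start a definition
                in >> new_name;
                body.clear();
                compiling = true;
            } else if (auto it = dict.find(tok); it != dict.end()) {
                if (compiling && !it->second.immediate)
                    body.push_back(it->second.action);     // compile it
                else
                    it->second.action(*this);              // execute it now
            } else {                                       // assume a number
                long n = std::stol(tok);
                if (compiling) body.push_back([n](Vm& m) { m.stack.push_back(n); });
                else stack.push_back(n);
            }
        }
    }
};

inline Vm make_vm() {
    Vm vm;
    vm.dict["DUP"] = {[](Vm& v) { v.stack.push_back(v.stack.back()); }};
    vm.dict["*"] = {[](Vm& v) {
        long b = v.stack.back();
        v.stack.pop_back();
        v.stack.back() *= b;
    }};
    // ";" is immediate: it has to run during compilation to end the definition.
    vm.dict[";"] = {[](Vm& v) {
        auto code = v.body;
        v.dict[v.new_name] = {[code](Vm& m) { for (auto& a : code) a(m); }};
        v.compiling = false;
    }, true};
    // A user-style immediate word: it acts at compile time and compiles nothing.
    vm.dict["NOTE"] = {[](Vm& v) { v.log.push_back("compiling " + v.new_name); }, true};
    return vm;
}
```

Feeding it ": SQR NOTE DUP * ;" runs NOTE and ; immediately while DUP and * are only recorded; a later "7 SQR" then executes the recorded body and leaves 49 on the stack.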

\ Compute a value with a real loop, then freeze the result into a word.
\ SUM-1-TO is an ordinary word; running it NOW (at the interpreter, while we
\ define the constant) sums 1..n, and CONSTANT bakes that result into a word.
: SUM-1-TO ( n -- sum )  0 SWAP 1+ 1 ?DO  I +  LOOP ;
10 SUM-1-TO CONSTANT FIFTY-FIVE   \ runs the loop -> 55, then bakes it into a word

\ Generate a const table as the source is loaded. CREATE makes a word; ','
\ appends each cell to data space. Standard DO ... LOOP is compile-only, so
\ the loop lives in a helper word that we run right after CREATE. The table
\ is static program data - no heap, no runtime init, nothing to free.
: SQUARES,  ( -- )  16 0 DO  I I *  ,  LOOP ;   \ ',' (comma) appends a cell of data
CREATE SQUARES  SQUARES,
\ SQUARES now returns the address of [0,1,4,9,...,225] in the dictionary.

The deepest piece is CREATE ... DOES>, Forth's defining-word mechanism: a word that, when run, defines new words and attaches custom run-time behavior to them. This is metaprogramming in the truest sense - you are extending the language with new "keywords."

\ A defining word. CONSTANT itself can be written in Forth like this:
\   CREATE  lays down a new word that pushes its data-field address;
\   ,       stores the value into that data field at definition time;
\   DOES>   replaces the run-time behavior: fetch (@) the stored value.
: CONSTANT ( n "name" -- )  CREATE ,  DOES>  @ ;

7 CONSTANT SEVEN     \ defines a brand-new word SEVEN...
SEVEN .              \ ...which, when run, pushes 7. Prints: 7

And POSTPONE is how immediate words compose: it parses the next word and arranges for it to be compiled into the current definition (deferring an immediate word, or compiling a normal one), which is how you build new compile-time control structures out of existing ones.

\ Build your own compiler control word. ENDIF becomes a synonym for THEN,
\ compiled in via POSTPONE; IMMEDIATE makes ENDIF run at compile time.
: ENDIF  POSTPONE THEN ;  IMMEDIATE

On a memory reading, Forth's metaprogramming and its allocation model are the same fabric. The dictionary is a bump-allocated region: CREATE and , and ALLOT advance a pointer (HERE) to carve out new words and their data at compile time. So generating a table is allocating it - into static program space, reclaimed (if at all) by rewinding the dictionary with a MARKER, not by freeing individual objects. There is no type system to check any of it; Forth's stack is untyped, which is exactly why a word like : SQR ( n -- n*n ) DUP * ; is implicitly "generic" - it works on any cell the multiply applies to. Forth buys its astonishing flexibility by giving up every guardrail.
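The bump-allocation picture can be sketched in a few lines of C++. DataSpace and build_squares are illustrative names, addresses are modelled as cell indices rather than bytes, and the fixed capacity is an assumption of the toy, not a property of any particular Forth.

```cpp
#include <cstddef>
#include <vector>

// A toy data space: one flat buffer and a single "next free cell" index.
class DataSpace {
    std::vector<long> cells_;
    std::size_t here_ = 0;
public:
    explicit DataSpace(std::size_t capacity) : cells_(capacity) {}

    std::size_t here() const { return here_; }                      // HERE
    void comma(long x) { cells_.at(here_) = x; ++here_; }           // ,
    std::size_t allot(std::size_t n) {                              // ALLOT, in cells
        std::size_t start = here_;
        here_ += n;
        return start;
    }
    long fetch(std::size_t addr) const { return cells_.at(addr); }  // @
    void rewind_to(std::size_t mark) { here_ = mark; }              // what running a MARKER word does
};

// CREATE SQUARES followed by sixteen commas: building the table IS allocating it.
inline std::size_t build_squares(DataSpace& d) {
    std::size_t squares = d.here();           // CREATE remembers the current HERE
    for (long i = 0; i < 16; ++i) d.comma(i * i);
    return squares;                           // the "address" SQUARES would push
}
```

Nothing is ever freed individually: rewind_to discards everything allocated after the mark in one step, which is the dictionary's only form of reclamation.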

The ones that say "no": Hare and Odin

Two languages here deliberately limit metaprogramming, and their reasons are as instructive as the maximalists'.

Hare goes furthest: it has no macros, no generics, and no metaprogramming whatsoever, by design. The entire language fits in a short specification, and the project's stated goal is to "never surprise you." Where C reaches for an X-macro and Zig reaches for comptime, Hare's answer is: write the values, write the function. Its reasoning is that the one or two data structures central to a program are worth implementing by hand so you understand them and can tune them.

// Hare: no metaprogramming. A "generated table" is just a const global the
// compiler places in read-only data. Reuse is plain, typed functions.
def N: size = 16;
const squares: [N]int = [0, 1, 4, 9, 16, 25, 36, 49,
                         64, 81, 100, 121, 144, 169, 196, 225];

fn sqr(x: int) int = x * x;   // no generics; write the function you need

You get the same .rodata table as everyone else - you just type the numbers. The memory model is maximally legible precisely because there is no metaprogramming layer to reason through.

Odin takes a middle, pragmatic stance: no Zig-style arbitrary compile-time function execution, but it does have compile-time constants, compile-time conditionals (when), build-time configuration (#config), and template-style parametric polymorphism via $T. This covers generics and conditional compilation while keeping the language smaller and more predictable than full comptime.

// Odin: parametric polymorphism with $T - monomorphized per type at the
// call site, like a C++ template or a Zig generic, but more constrained.
sqr :: proc(x: $T) -> T { return x * x }

// Compile-time conditionals via `when`, driven by build configuration.
// The false branch is not compiled - a clean conditional-compilation tool.
// (`fmt` below assumes `import "core:fmt"` at the top of the file.)
ENABLE_LOGGING :: #config(ENABLE_LOGGING, false)

log :: proc(msg: string) {
    when ENABLE_LOGGING {
        fmt.println(msg)      // exists in the binary only if configured on
    }
}

// A compile-time constant table lives in read-only data.
SQUARES :: [16]int{0,1,4,9,16,25,36,49,64,81,100,121,144,169,196,225}

when is the spiritual cousin of C's #if and C++'s if constexpr - a compile-time branch where the dead arm is dropped - but it operates on real typed constants, not text. $T gives the monomorphization payoff (no boxing, no dispatch) without exposing a full template language.
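To make the family resemblance concrete, here is a rough C++ rendering of the logging example, offered as a sketch rather than a translation. ENABLE_LOGGING as a -D build flag, kLogging, log_sink, and log_line are all hypothetical names. The flag is promoted from preprocessor text to a typed constant once, and because C++ only skips instantiating a branch inside a template, the switch is passed as a template parameter.

```cpp
#include <string>
#include <vector>

// Hypothetical build flag: pass -DENABLE_LOGGING=1 to turn logging on.
#ifndef ENABLE_LOGGING
#define ENABLE_LOGGING 0
#endif

// Promote the macro to a typed constant once, then branch on the constant.
constexpr bool kLogging = (ENABLE_LOGGING != 0);

inline std::vector<std::string>& log_sink() {
    static std::vector<std::string> lines;   // stand-in for a real log target
    return lines;
}

template <bool Enabled = kLogging>
void log_line(const std::string& msg) {
    if constexpr (Enabled) {
        log_sink().push_back(msg);   // instantiated only when Enabled is true
    } else {
        (void)msg;                   // disabled: the body is empty
    }
}
```

Ordinary callers write log_line("...") and get whatever the build selected; the explicit log_line<true> and log_line<false> forms exist mainly so both arms can be exercised in one build.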

The spectrum, and why it is about memory

Line the seven up and a clean gradient appears, from "no code-generation at all" to "the code generates itself."

Read every one of these as a memory technique and the unifying thread is moving work from run time to compile time, and being able to see where the resulting bytes live. A static const X-macro table, a C++ constexpr std::array, a Zig comptime block, a Forth CREATE , loop, an Odin constant array, a HolyC #exe{}-emitted table - all six produce the same kind of thing: data fixed before the program's own logic starts running, costing no heap allocation, no startup initialization loop, and nothing to free. For the C, C++, Zig, and Odin examples that typically means the binary's read-only segment; for HolyC it is the task's data, and for Forth it is the dictionary. The difference is only in how legibly you can write it and how much the tool checks: from Hare's "type it by hand" through C's untyped text-pasting, to Zig's fully type-checked ordinary code, to Forth's untyped-but-total self-modification.

Generics tell the same story at the level of layout. C++ templates, Zig generics, and Odin's $T all monomorphize, so vector<int>, Box(i32), and [dynamic]int store their elements inline and contiguously - no boxing, no void*, no vtable, no indirection. That is a memory-layout guarantee delivered by metaprogramming. The cost is paid in code size (each instantiation is real machine code) and compile time, not in run-time bytes.

The oldest trade in systems programming runs straight through this topic. C++ accepts enormous compile-time complexity to keep run-time cost at zero. Zig keeps the same compile-time power but folds it into one honest mechanism. Forth gives you unlimited power and no guardrails. Hare refuses the whole game in the name of legibility, and Odin and HolyC each draw a careful line in between. There is no winner - only seven different answers to the same question this site keeps asking: I can make the compiler do this work, so where, exactly, do the bytes end up, and what does it cost when the program actually runs?


Sources: The C preprocessor - GCC docs · X Macros - Wikipedia · constexpr / consteval - cppreference · if constexpr - cppreference · Templates - cppreference · Zig comptime - Zig language reference · HolyC reference (TempleOS docs) · TempleOS - Wikipedia · Hare FAQ - why no macros/generics · Odin overview - parametric polymorphism & when · IMMEDIATE - Forth Standard · Defining words in Forth - Starting Forth, ch. 11