Forth: building a language from a stack
Two stacks, a dictionary, and raw addresses - how Chuck Moore's RPN minimalism turns memory into a bump pointer you move by hand and a language you extend word by word.
Every other language on this site is, in some sense, an argument about how to hide memory: C++ wraps it in destructors, Zig drags the allocator into the signature, Rust tracks ownership in the type system. Forth makes a different bet, and an older one. It hides almost nothing. There is no garbage collector, no type system, and - in the classic core - not even a heap. There is a stack of numbers, a second stack for return addresses, and a dictionary of named code. Memory is a bump pointer you push forward by hand, and addresses are just numbers you keep on the stack like any other.
Forth predates C. Chuck Moore named it FORTH around 1968 at Mohasco Industries, on an IBM 1130, after refining a personal programming system since the late 1950s; the name was meant to be "Fourth" (a fourth-generation language) but the host OS clipped filenames to five characters. By 1971 he had built a complete standalone Forth at the U.S. National Radio Astronomy Observatory to run the 36-foot radio telescope at Kitt Peak - multitasking instrument control on tiny hardware. That origin is the whole personality of the language: a system small enough that one person can hold the compiler, the editor, and the application in their head at once, and close enough to the metal that controlling a telescope and poking a memory-mapped register are the same kind of act.
This article is about Forth as a memory story. Not its syntax for its own sake, but its picture of memory: the two stacks, the dictionary as an allocator, raw fetch-and-store, and the way you build the language up - including its memory model - out of words you define yourself.
RPN: the data stack is the calling convention
Forth is concatenative and uses Reverse Polish Notation (postfix). There are no parentheses and no precedence because there is nothing to parenthesize: a program is a stream of words, read left to right, and almost every word does its work by popping operands off a shared data stack (also called the parameter stack) and pushing results back. 2 3 + means push 2, push 3, then + pops two and pushes 5. ( 2 + 3 ) * 4 is written 2 3 + 4 *.
\ A word is defined with : name ... ; - the stack comment ( in -- out )
\ is just a comment, but it is the closest thing Forth has to a type signature.
: square ( n -- n^2 ) dup * ; \ duplicate top, multiply: n n -> n*n
: average ( a b -- avg ) + 2/ ; \ add, then halve
5 square . \ prints 25 ( . pops and prints the top of the stack )
10 20 average . \ prints 15
The stack is the calling convention. There are no named parameters and no local variables in classic Forth; arguments arrive on the stack and results leave on it. That makes a small vocabulary of stack shufflers load-bearing - dup (copy top), drop (discard top), swap (exchange top two), over (copy second to top), rot (rotate top three). Moore credited the names of DUP, DROP, and SWAP to the Burroughs B5500's DUPL, DLET, and EXCH instructions; Forth inherits the B5500's stack-machine discipline directly.
The contrast with the C family is exact. Where C passes arguments in registers or on a frame the compiler manages for you, Forth hands you the operand stack as a visible, mutable object. Compare a trivial swap:
/* C: the stack frame is the compiler's business; you name your operands. */
void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }
\ Forth: SWAP is a primitive that reorders the top two cells in place.
\ No names, no frame you can see - just the stack itself.
: 2dup ( a b -- a b a b ) over over ;
This is the source of both Forth's density and its reputation for write-only code: a long word with poor stack hygiene becomes a juggling act. The discipline that makes it readable is short definitions - Moore's rule of thumb was a word per line, a screen per concept.
Two stacks: the return stack is yours to abuse
Most languages have one stack and treat it as sacred. Forth has two, and lets you reach into the second. The return stack holds the return addresses that thread one word's call into the next - the machinery that makes : average + 2/ ; come back to its caller. Crucially, Forth exposes it: >R moves a cell from the data stack to the return stack, R> moves it back, and R@ copies the top of the return stack without removing it (with double-cell variants 2>R, 2R>, 2R@).
\ Use the return stack as scratch space to get an awkward value out of the
\ way, then bring it back. The rule is strict: anything you >R inside a word
\ MUST be R>'d (or DROPped from it) before the word's ; runs, or you corrupt
\ the return address and crash.
: stash-example ( a b c -- b a c )
>R \ stash c: data( a b ) return( ... c )
swap \ data( b a )
R> ; \ restore c: data( b a c )
This is genuinely dangerous and genuinely powerful. The return stack is where loop indices live (DO ... LOOP keeps its counter there, which I reads via the return stack), and it is how you implement coroutines, custom control flow, and tail-call-like tricks by directly editing where a word will return to. No mainstream systems language gives you this. The closest analogues elsewhere are all read-only or off-limits:
/* C: you can READ the return address (a GCC/Clang builtin), but writing it
is undefined behavior - the call stack is the compiler's invariant. */
void *ret = __builtin_return_address(0);
// Zig: the call stack is likewise managed; the safe escape hatch is
// @returnAddress(), useful for stack traces - not for rewriting control flow.
const ra = @returnAddress();
Forth treats the return stack as just another piece of memory you are trusted with. That trust is the recurring theme: the language gives you the mechanism and assumes you own the policy.
The dictionary: defining words is allocating memory
Here is where Forth's memory model becomes unusual. There is no separate notion of "the program" and "the data." Both live in a single linear region called data space, and the structure that organizes it is the dictionary - a linked list of headers, each pairing a name with the code or data that follows it. When you define a new word, you are literally appending bytes to data space and bumping a pointer.
That pointer has a name: HERE. It returns the address of the next free cell of data space. Allocation in classic Forth is nothing more than moving HERE forward.
\ HERE is the bump pointer. ALLOT moves it by n address units (reserving
\ space). , (comma) appends one cell and bumps HERE by one cell. C, appends
\ one byte. CELLS converts a cell count to address units (a cell is usually
\ 8 bytes on a 64-bit Forth, 4 on a 32-bit one - never assume).
create grades \ make a named word; its data field starts at HERE
90 , 85 , 72 , 100 , \ append four cells of initial data
10 cells allot \ reserve 10 more uninitialized cells after them
\ Executing `grades` now pushes the ADDRESS of that block. It is up to you
\ to remember it holds 14 cells. Nothing records the length or the type.
grades . \ prints the base address as a plain number
create makes a dictionary header whose runtime behavior is "push my data-field address." , and allot then grow that data field by walking HERE forward. This is a monotonic bump allocator built into the language's most basic act. There is no free for data space - you can MARKER a point and roll the dictionary back to it wholesale, but you do not individually deallocate. It is, in spirit, an arena: exactly the pattern Odin, Zig, and Hare reach for deliberately with ArenaAllocator and friends, except in Forth it is not a library choice - it is how the compiler itself allocates.
// Odin: an arena is an opt-in allocator you install; bump a pointer, free in bulk.
arena: virtual.Arena
_ = virtual.arena_init_growing(&arena) // must init before use
context.allocator = virtual.arena_allocator(&arena)
data := make([]int, 14) // bump-allocated from the arena
// ... no per-object free; virtual.arena_destroy(&arena) frees it all at once
\ Forth: the dictionary IS the arena. HERE is the bump pointer, and the
\ "reset" is rolling the dictionary back to a saved point.
marker -work \ define -work; executing it later forgets everything after
create scratch 1000 cells allot
\ ... use scratch ...
-work \ reclaim scratch AND the marker itself in one stroke
variable and constant are just convenience defining words built on this. variable x is essentially create x 1 cells allot; constant bakes a value into the word's behavior. The dictionary is the through-line that unifies code, named data, and the allocator into one mechanism - and it is why a complete classic Forth fit in a few kilobytes.
Raw memory: @ and ! and no one to stop you
Forth's access to memory is as direct as it gets: two words. @ ("fetch") reads a cell from an address; ! ("store") writes a cell to an address. Byte versions are C@ and C!; +! adds to the cell at an address in place. An address is an ordinary number on the stack, so arithmetic on addresses is just arithmetic.
variable counter \ reserve one cell, named counter
0 counter ! \ store 0 into it: ( value addr -- )
counter @ . \ fetch and print: 0
5 counter +! \ add 5 in place
counter @ . \ 5
\ Index into the grades block from earlier. No bounds check exists.
grades 2 cells + @ . \ fetch the 3rd cell (index 2): 72
grades 99 cells + @ . \ reads far past the block. Forth will not warn you.
There are no bounds checks, no null checks, and no types. A cell is an untyped machine word; whether it holds a number, a character, a boolean flag, or an address is entirely in your head. grades 99 cells + @ happily reads whatever bytes lie there. This is the same flat-array-of-bytes model C installed in everyone's head - *p and p[i] - but stripped of even C's thin type decoration:
/* C: still raw, but the compiler at least tracks that p is an int*. */
int *p = grades;
int third = p[2]; /* 72; the type int scales the index for you */
int wild = p[99]; /* also no check - but the type is recorded */
\ Forth: YOU scale the index (2 cells +), and nothing records that the
\ address "is" an int pointer. The address is just a number.
grades 2 cells + @
HolyC sits in between and is worth naming precisely, because Terry A. Davis built it as the native language of TempleOS - a 64-bit, ring-0, single-address-space system with no memory protection by design, intended as a modern Commodore 64 the hobbyist could fully understand. Like Forth, HolyC trusts you completely with raw memory; unlike Forth, it keeps C's static (if weak, unchecked) types:
// HolyC (TempleOS): static fixed-width types (U0,I64,U8,F64,...), manual heap,
// and zero protection - a wild store can corrupt the whole machine. The spirit
// of total trust is the same as Forth's; the type system is not.
I64 *grades = MAlloc(14 * sizeof(I64)); // from this task's heap
grades[2] = 72;
Free(grades); // you free it; Free(NULL) is allowed
Both languages share Forth's bargain - you get the hardware, you accept the consequences - but Forth alone discards the type system entirely, leaving the cell as a bare machine word.
The optional heap: ALLOCATE / FREE, and THROW to check it
Bump allocation and arenas cover an enormous amount of ground, but sometimes you need true dynamic memory with individual frees. The Forth 2012 standard provides this as the optional Memory-Allocation word set - ALLOCATE, FREE, and RESIZE - the C malloc/free/realloc trio, opt-in and GC-free.
The stack effects encode Forth's idiom for errors. ALLOCATE ( u -- a-addr ior ) takes a byte count and returns both an address and an ior (an I/O result code), zero on success. FREE ( a-addr -- ior ) and RESIZE ( a-addr u -- a-addr ior ) likewise return an ior. You are expected to check it - idiomatically by passing it to THROW, which raises an exception (caught by an enclosing CATCH) when the code is nonzero.
\ Request 4096 bytes. ALLOCATE leaves ( addr ior ); THROW turns a nonzero
\ ior into an exception and otherwise drops the 0, leaving just ( addr ).
4096 allocate throw ( buf )
\ ... use buf with @ ! c@ c! ...
buf free throw \ hand it back; THROW catches a bad ior
The discipline is exactly C's, and exactly the manual defer-free pairing the other systems languages also live with - you match every ALLOCATE with a FREE by hand, on every exit path. Forth has no scope-based cleanup at all: no destructors (C++), no defer (Zig, Odin, Hare), no errdefer. The pairing is your responsibility, full stop. Lined up against the family:
// Zig: explicit allocator + defer makes the free automatic at scope exit.
const buf = try allocator.alloc(u8, 4096);
defer allocator.free(buf);
// Hare: manual alloc/free, cleanup parked next to acquisition with defer.
let buf: []u8 = alloc([0u8...], 4096);
defer free(buf);
// C++: RAII hides the free entirely - the destructor runs even on a throw.
auto buf = std::make_unique<char[]>(4096);
\ Forth: no defer, no destructor. The free is a line you must not forget,
\ and CATCH is how you make it run even when the work in between throws.
: with-buffer ( -- )
4096 allocate throw ( buf )
['] do-work catch \ run do-work; CATCH returns its throw-code (0 if ok)
swap free throw \ free regardless, THEN re-raise any caught code
throw ;
That last word - using CATCH to guarantee the FREE runs whether the work succeeds or throws - is Forth reconstructing, by hand, the safety that RAII and defer give the other languages for free. There is no keyword for it; there is a pattern you build from CATCH, THROW, and swap.
Immediate words and metaprogramming: extending the compiler
The last piece is what makes Forth more than a small assembler with stacks: you can extend the compiler itself, and the memory model is part of what you extend. Forth has two states - interpreting (run words now) and compiling (append words to the definition being built). A normal word, when encountered during : compilation, is compiled into the new word. But a word flagged IMMEDIATE runs at compile time instead. That single mechanism is the whole metaprogramming story.
\ IF, ELSE, THEN, DO, LOOP, BEGIN, WHILE - Forth's control flow - are not
\ syntax. They are IMMEDIATE words that, at compile time, emit branch
\ instructions into the word being defined. You can write your own.
: my-constant ( n "name" -- ) \ a defining word: makes new constants
create , \ header + store n into its data field
does> @ ; \ runtime behavior of the children: fetch & push
42 my-constant answer
answer . \ prints 42
CREATE ... DOES> is the jewel. create makes a child word's header and data field at definition time; does> replaces the child's runtime behavior with the code after it. So my-constant is a factory that stamps out new words, each carrying its own data and a shared behavior - a runtime code generator written in three lines. This is how Forth grows new data structures and new allocators as first-class language extensions:
\ Define ARRAY: a defining word that builds fixed-size cell arrays whose
\ children take an index and return the address of that element.
: array ( n "name" -- )
create cells allot \ reserve n cells of data space at definition
does> ( i -- addr ) swap cells + ;
10 array scores \ allocate 10 cells, named scores
7 0 scores ! \ scores[0] = 7
5 scores @ . \ read scores[5]
Here metaprogramming and memory management are the same act: array extends the language with a new kind of declared, bump-allocated object, and the does> part defines how you address into it. For full compile-time control there is POSTPONE (compile a word's compilation behavior, deferring it), [ and ] (drop into and out of interpret state mid-definition), and LITERAL (bake a stack value into the code being compiled). The compiler is open all the way down.
The comparison to the other languages is sharp. C++ templates and constexpr, Zig's comptime, and Odin's compile-time execution are all powerful - but they are separate sublanguages layered onto a fixed grammar. Forth has no fixed grammar to layer onto; the compiler is just more words, and IMMEDIATE lets you add to it.
// Zig: comptime is a genuine compile-time evaluator - but it runs WITHIN a
// fixed language. You cannot add new statement keywords to Zig itself.
fn Array(comptime n: usize) type {
return struct { data: [n]i64 }; // generate a type at compile time
}
// C++: constexpr/templates compute at compile time, also within fixed syntax.
template <std::size_t N> struct Array { long data[N]; };
\ Forth: ARRAY above genuinely added a new "keyword" to the language, and
\ control flow itself (IF/LOOP) is built the same way. The grammar is open.
10 array scores
Chuck Moore's minimalism: less is the whole point
Tie it together and you get a coherent philosophy, not a pile of features. Forth's smallness is not an accident of 1970s hardware; it is Chuck Moore's deliberate aesthetic, pursued ever more radically over fifty years. He moved from software to silicon, designing stack-machine CPUs - Novix, the MuP21, the 144-core GA144 - that execute Forth-like instructions directly, so the two-stack model is the hardware model, not an emulation on top of a register machine. He wrote colorForth, which encodes a word's interpret/compile/comment role in the color of its text and discards almost all punctuation. His standing argument is that most software is bloated because programmers solve problems more general than the one in front of them.
That ethos is why Forth's memory model looks the way it does. No GC, because GC is policy the language imposes on you. No type system, because a cell is a machine word and you know what is in it. A dictionary that is simultaneously the namespace, the program, and a bump allocator, because three mechanisms that can be one mechanism should be one. An optional heap, because not every program needs malloc, and you should not pay for what you do not use. Open Firmware - the boot environment on Sun, Apple, and OLPC machines - is Forth precisely because a bootloader must be tiny, self-contained, and able to poke hardware addresses directly: Forth's whole nature.
Read against the rest of the site, Forth is the limit case. C froze the flat-byte-array machine model into a portable language; Forth had that model first and refused to decorate it. C++ adds destructors to tame manual memory; Forth adds nothing and trusts you. Zig, Hare, and Odin drag the allocator into the open as a named, swappable thing; Forth's allocator is HERE, a single pointer you move yourself, with arenas as the default rather than an upgrade. HolyC shares Forth's total trust and its no-protection world, but keeps C's types and syntax. Where almost every modern language asks "how do we make memory safe?", Forth asks the older question - "how little do we need to give you the machine?" - and answers it with two stacks, a dictionary, and @ and !. The minimalism is not a constraint it works around. It is the language.