Concurrency & Threads

The same job across all seven systems languages: spawn a thread (or two), have it compute a sum, then join and read the result. With no garbage collector and a shared address space, the real question is how data crosses the thread boundary - almost always as an explicit pointer into memory that outlives the spawn, freed only after join. Compare C pthreads, C++ std::thread (RAII + join-or-terminate), Zig std.Thread (no hidden allocation), Odin core:thread, and Hare's bare-bones clone(2) (its stdlib still ships no thread module), then note that HolyC is cooperatively multitasked (Spawn/Yield, no preemption) and Forth has no threads at all without an extension.

C

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

/* The thread function takes/returns void*; to hand data in or results
 * out you pass POINTERS, so the heap (or stack of a still-live frame)
 * is the shared channel. Here each worker gets its own heap-allocated Work. */
typedef struct { long n, sum; } Work;

static void *worker(void *arg) {
    Work *w = arg;                    /* the void* we passed to pthread_create */
    w->sum = 0;
    for (long i = 1; i <= w->n; i++)  /* sum 1..n */
        w->sum += i;
    return NULL;                      /* could also return a heap pointer */
}

int main(void) {
    pthread_t t1, t2;
    Work *a = malloc(sizeof *a);      /* heap so both threads see live storage */
    Work *b = malloc(sizeof *b);
    if (!a || !b) { free(a); free(b); return 1; }
    a->n = 1000; b->n = 2000;

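    /* spawn; 'a' is shared with t1. On failure, release what main owns. */
    if (pthread_create(&t1, NULL, worker, a) != 0) {
        free(a); free(b); return 1;
    }
    if (pthread_create(&t2, NULL, worker, b) != 0) {
        pthread_join(t1, NULL);       /* t1 still writes to 'a': join first */
        free(a); free(b); return 1;
    }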

    pthread_join(t1, NULL);          /* block until t1 finishes... */
    pthread_join(t2, NULL);          /* ...then t2; now results are visible */

    printf("%ld %ld\n", a->sum, b->sum);  /* 500500 2001000 */
    free(a);                          /* main owns the buffers; free after join */
    free(b);
    return 0;
}

POSIX threads share the process address space, so data crosses the boundary through void* pointers: here each worker gets a heap Work* that stays alive across the spawn. pthread_join blocks until the thread exits and establishes the happens-before edge that makes the writes visible, so main only frees the buffers after joining (link with -pthread).
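
The listing's comment notes that a worker "could also return a heap pointer". A minimal sketch of that variant follows, assuming the worker allocates its own result and the joiner frees it; the names sum_worker and sum_in_thread are hypothetical and not part of the listing above.

```c
#include <pthread.h>
#include <stdlib.h>

/* Worker allocates its own result and returns it as the thread's exit value. */
static void *sum_worker(void *arg) {
    long n = *(const long *)arg;       /* input is read through the pointer */
    long *out = malloc(sizeof *out);   /* must outlive this thread's stack */
    if (!out) return NULL;
    *out = 0;
    for (long i = 1; i <= n; i++)
        *out += i;
    return out;                        /* handed to whoever joins */
}

/* Hypothetical helper: spawn, join, copy the result out, free it.
 * Returns 0 on success, -1 on any failure. */
int sum_in_thread(long n, long *result) {
    pthread_t t;
    void *ret = NULL;
    /* &n points into this frame, which stays live until the join below */
    if (pthread_create(&t, NULL, sum_worker, &n) != 0)
        return -1;
    if (pthread_join(t, &ret) != 0)    /* ret receives the worker's return value */
        return -1;
    if (!ret)
        return -1;                     /* worker could not allocate */
    *result = *(long *)ret;
    free(ret);                         /* the joiner now owns the allocation */
    return 0;
}
```

Calling sum_in_thread(1000, &r) should leave 500500 in r. The trade-off against the listing above is ownership: here the allocation is created in one thread and freed in another, so the join is the only point where the pointer changes hands.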

C++

#include <iostream>
#include <thread>
#include <memory>

// std::thread runs any callable; arguments are COPIED into the thread
// unless you opt into std::ref. Each lambda below captures a raw pointer
// (by value) to a long that a unique_ptr in main owns, and writes the
// result through it -- ownership stays in main.
long sum_to(long n) {                  // pure compute, returned by value
    long s = 0;
    for (long i = 1; i <= n; ++i) s += i;
    return s;
}

int main() {
    auto r1 = std::make_unique<long>(0);   // results live on the heap, owned by main
    auto r2 = std::make_unique<long>(0);

    // Spawn two threads; each lambda init-captures a copy of the raw pointer.
    std::thread t1([p = r1.get()] { *p = sum_to(1000); });
    std::thread t2([p = r2.get()] { *p = sum_to(2000); });

    t1.join();                          // must join (or detach) before ~thread,
    t2.join();                          // else terminate(); join syncs the writes

    std::cout << *r1 << ' ' << *r2 << '\n';  // 500500 2001000
    return 0;                           // unique_ptrs free the longs (RAII)
}

std::thread takes any callable and copies its arguments by default, so to share storage you pass a pointer/std::ref; here the lambdas write through raw pointers borrowed from unique_ptrs that main still owns (RAII frees them). A std::thread must be join()ed or detach()ed before its destructor runs or the program calls std::terminate(); join() also synchronizes the worker's writes. (C++20's std::jthread joins automatically in its destructor.)

HolyC

// HolyC/TempleOS is cooperatively multitasked: you spawn a "task" with
// Spawn(fp, data), and it runs until it yields or exits. There is no
// preemption and no memory protection, so all tasks share ring-0 memory
// directly -- the data pointer IS the shared channel, no copy is made.

class Work { I64 n, sum; };

U0 Worker(Work *w)            // a task entry point: takes the data pointer
{
  I64 i, s = 0;
  for (i = 1; i <= w->n; i++) // sum 1..n
    s += i;
  w->sum = s;                 // write the result straight into shared memory
}

Work *a = MAlloc(sizeof(Work)); // heap shared with the spawned task
a->n = 1000; a->sum = 0;

CTask *t = Spawn(&Worker, a);   // create + start a task running Worker(a)
while (TaskValidate(t))         // no real join: poll until the task dies
  Yield;                        // cooperatively give it CPU time

Print("%d\n", a->sum);          // 500500
Free(a);

TempleOS has no pthread-style preemptive threads: it is cooperatively scheduled, so you Spawn a task and it must Yield for others to run. With no memory protection every task shares ring-0 memory, so the MAlloc'd Work pointer is passed straight through with no copy and written in place. There is no true join; the idiom is to poll TaskValidate (or use a shared flag) until the task exits, then Free the buffer.
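
For readers who know pthreads better than TempleOS, the poll-and-yield idiom can be approximated in C. This is only a sketch under stated assumptions: it uses a preemptive POSIX thread, a C11 atomic done-flag in place of TaskValidate, and sched_yield() in place of Yield. The names PolledWork, polled_worker and sum_by_polling are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

typedef struct {
    long n, sum;
    atomic_int done;    /* 0 while running, 1 once sum is final */
} PolledWork;

static void *polled_worker(void *arg) {
    PolledWork *w = arg;
    long s = 0;
    for (long i = 1; i <= w->n; i++)
        s += i;
    w->sum = s;
    /* release store: publishes w->sum before the flag becomes visible */
    atomic_store_explicit(&w->done, 1, memory_order_release);
    return NULL;
}

/* Hypothetical helper: start the worker, then poll instead of blocking.
 * Returns 0 on success, -1 if the thread could not be created. */
int sum_by_polling(long n, long *result) {
    PolledWork w;
    w.n = n;
    w.sum = 0;
    atomic_init(&w.done, 0);

    pthread_t t;
    if (pthread_create(&t, NULL, polled_worker, &w) != 0)
        return -1;

    /* acquire load pairs with the release store in the worker */
    while (!atomic_load_explicit(&w.done, memory_order_acquire))
        sched_yield();            /* give up the CPU while waiting */

    *result = w.sum;
    pthread_join(t, NULL);        /* still needed to reclaim the thread */
    return 0;
}
```

Unlike the HolyC version, the flag here has to be atomic because the two threads really can run at the same time, and the final pthread_join is still required so that w does not go out of scope while the thread exists.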

Zig

const std = @import("std");

// std.Thread.spawn takes a config, a function, and a tuple of args that
// are passed by value -- to mutate shared state you pass a pointer. No
// allocator is passed here: the thread's stack size comes from SpawnConfig.
fn worker(n: u64, out: *u64) void {
    var s: u64 = 0;
    var i: u64 = 1;
    while (i <= n) : (i += 1) s += i; // sum 1..n
    out.* = s;                        // write result through the pointer
}

pub fn main() !void {
    var r1: u64 = 0;
    var r2: u64 = 0;

    // spawn returns !std.Thread; args is a tuple matching worker's params.
    const t1 = try std.Thread.spawn(.{}, worker, .{ 1000, &r1 });
    const t2 = try std.Thread.spawn(.{}, worker, .{ 2000, &r2 });

    t1.join(); // blocks until the thread returns; no error to handle
    t2.join();

    std.debug.print("{d} {d}\n", .{ r1, r2 }); // 500500 2001000
}

std.Thread.spawn(config, func, args_tuple) returns !std.Thread; the args tuple is passed by value, so shared mutable state crosses as an explicit pointer (&r1) into main's stack storage, which stays live because main joins before it returns. No allocator is passed at the call site -- the worker's stack size comes from SpawnConfig -- and join() blocks until the thread returns and makes its writes visible. (std.Thread.Mutex guards data that is touched concurrently.)
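
The closing remark about a mutex applies once two threads write the same location, which none of the listings do. A minimal sketch of that case, written in C with pthread_mutex_t standing in for std.Thread.Mutex; Shared, Range, range_worker and sum_split are hypothetical names chosen for illustration.

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    long total;          /* shared by every worker; guarded by lock */
} Shared;

typedef struct {
    Shared *shared;
    long lo, hi;         /* this worker adds lo..hi inclusive */
} Range;

static void *range_worker(void *arg) {
    Range *r = arg;
    long local = 0;
    for (long i = r->lo; i <= r->hi; i++)
        local += i;                       /* private work needs no lock */
    pthread_mutex_lock(&r->shared->lock);
    r->shared->total += local;            /* one short critical section */
    pthread_mutex_unlock(&r->shared->lock);
    return NULL;
}

/* Hypothetical helper: split 1..n across two threads that add into one
 * shared total. Returns the sum, or -1 if setup fails. */
long sum_split(long n) {
    Shared sh;
    sh.total = 0;
    if (pthread_mutex_init(&sh.lock, NULL) != 0)
        return -1;

    Range a = { &sh, 1, n / 2 };
    Range b = { &sh, n / 2 + 1, n };
    pthread_t t1, t2;
    long result = -1;

    if (pthread_create(&t1, NULL, range_worker, &a) == 0) {
        if (pthread_create(&t2, NULL, range_worker, &b) == 0) {
            pthread_join(t2, NULL);
            pthread_join(t1, NULL);
            result = sh.total;            /* safe to read: both joined */
        } else {
            pthread_join(t1, NULL);       /* t1 still uses sh: join first */
        }
    }
    pthread_mutex_destroy(&sh.lock);
    return result;
}
```

Each worker accumulates privately and takes the lock once, so the critical section is a single addition. sum_split(1000) should return 500500, the same total the single-writer listings produce.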

Hare

use fmt;

// Hare's standard library has NO threading module yet -- there is no
// pthread-style create/join wrapper. The only primitive is the raw Linux
// clone(2) syscall, exposed by the runtime module as rt::clone (sketch
// only, not compiled here; stack_top is the high end of the buffer):
//
//   use rt;
//   // share the address space (CLONE_VM) so memory crosses the boundary;
//   // the child continues inline from the call (fork-style, not an fp entry):
//   let stack = alloc([0u8...], 64 * 1024)!;  // manual child stack you own
//   match (rt::clone(stack_top, flags, null, &ctid, 0)) {
//   case let pid: int     => /* parent: futex-wait on ctid to "join" */;
//   case void             => /* child: do work, then rt::exit(0) */;
//   case let e: rt::errno => /* handle */;
//   };
//
// That is verbose and Linux-only, so the honest minimal answer is the
// single-task computation -- the shared-memory idiom is still a pointer
// into storage main owns.
type work = struct { n: u64, sum: u64 };

fn worker(w: *work) void = {
	let s = 0u64;
	for (let i = 1u64; i <= w.n; i += 1) {
		s += i;            // sum 1..n
	};
	w.sum = s;                 // write result through the pointer
};

export fn main() void = {
	let a = alloc(work { n = 1000, sum = 0 })!; // heap, owned by main
	defer free(a);                              // reclaimed after use

	worker(a);                                  // (would be the cloned child)
	fmt::printfln("{}", a.sum)!;                // 500500
};

Unlike the others, Hare's standard library currently ships no thread module -- there is no unix::thread, no create/join, not even a mutex. The only mechanism is the raw Linux clone(2) syscall, exposed by the runtime module as rt::clone(stack, flags, parent_tid, child_tid, tls), which (like fork) returns the new tid to the parent and void to the child that continues inline; you share state by passing CLONE_VM and "join" by futex-waiting on the tid the kernel clears via CLONE_CHILD_CLEARTID. Because that is verbose and Linux-only, the honest minimal snippet runs the computation as a single task: the cross-boundary idiom is still an alloc'd pointer that main owns and frees with defer free(a). Note that the ! in alloc(...)! is Hare's error assertion: it aborts the program on nomem rather than propagating it, which is what ? would do.
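
To make the clone(2) route concrete, here is a hedged C sketch of the same idea. It differs from the Hare description in two labeled ways: glibc's clone() wrapper takes a function pointer instead of returning fork-style into the child, and the parent "joins" with waitpid (by asking for SIGCHLD) rather than futex-waiting on CLONE_CHILD_CLEARTID. It is Linux-only, and CloneWork, clone_child and sum_via_clone are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* needed for clone() in <sched.h> */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

typedef struct { long n, sum; } CloneWork;

/* Runs in the child, on the stack the parent supplied. */
static int clone_child(void *arg) {
    CloneWork *w = arg;
    long s = 0;
    for (long i = 1; i <= w->n; i++)
        s += i;
    w->sum = s;                  /* visible to the parent because of CLONE_VM */
    return 0;                    /* becomes the child's exit status */
}

/* Hypothetical helper: run the sum in a clone()d child that shares our
 * address space. Returns 0 on success, -1 on failure. */
int sum_via_clone(long n, long *result) {
    enum { STACK_SIZE = 64 * 1024 };
    CloneWork w = { n, 0 };

    char *stack = malloc(STACK_SIZE);    /* manual child stack we own */
    if (!stack)
        return -1;

    /* Stacks grow downward on the usual Linux targets, so pass the top. */
    pid_t pid = clone(clone_child, stack + STACK_SIZE,
                      CLONE_VM | SIGCHLD, &w);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);   /* the "join" */
    free(stack);                               /* only after the child exited */
    if (waited != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    *result = w.sum;
    return 0;
}
```

The ownership rule is the same one the section keeps returning to: both the stack buffer and the Work struct belong to the parent, and neither is released until the child is known to have exited.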

Odin

package main

import "core:fmt"
import "core:thread"

// core:thread wraps OS threads. A ^thread.Thread carries user_index/data
// fields, but the idiomatic way to share is a pointer in a struct you own
// (memory routes through the implicit context.allocator).
Work :: struct { n: int, sum: int }

worker :: proc(t: ^thread.Thread) {
	w := cast(^Work)t.data   // the pointer we stashed before starting
	s := 0
	for i in 1..=w.n do s += i  // sum 1..n
	w.sum = s                // write result into shared memory
}

main :: proc() {
	a := new(Work); a.n = 1000  // heap via context.allocator
	b := new(Work); b.n = 2000
	defer free(a)               // main owns the buffers
	defer free(b)

	t1 := thread.create(worker); t1.data = a  // create paused...
	t2 := thread.create(worker); t2.data = b
	defer thread.destroy(t1)
	defer thread.destroy(t2)

	thread.start(t1); thread.start(t2)  // ...then start
	thread.join(t1);  thread.join(t2)   // block until both finish

	fmt.println(a.sum, b.sum)   // 500500 2001000
}

Odin's core:thread creates a ^thread.Thread (initially paused), into which you stash a data pointer to a struct allocated via the implicit context.allocator; start runs it and join blocks until it exits. Sharing is explicit pointers -- new(Work) is owned by main, freed with defer free, and the worker writes through cast(^Work)t.data. thread.destroy reclaims the thread object itself after the join.

Forth

\ ANS/ISO Standard Forth has NO portable threading model -- the language
\ is deliberately tiny and single-task. Concurrency is an environmental
\ extension: cooperative MULTITASKER words (PAUSE/ACTIVATE) on systems
\ like Gforth/SwiftForth, or raw OS threads via an FFI. Below is the
\ portable half of that idiom: a word that leaves its result in one
\ shared cell, which a cooperative task could run.

VARIABLE result            \ a single shared cell (the "channel")

: sum-to ( n -- )          \ sum 1..n into 'result'
  0 swap                   ( 0 n )
  1+ 1 ?DO  I +  LOOP      \ accumulate 1..n
  result ! ;               \ store the total into the shared variable

\ On a multitasker you would: build a task, ACTIVATE it on sum-to,
\ then PAUSE in a loop until it signals done -- cooperative, not
\ preemptive, so the worker must PAUSE to let others run. With no such
\ extension this simply runs inline (one task), which is the honest answer:
1000 sum-to
result @ .                 \ 500500

Standard Forth has no built-in threads -- it is single-task by design -- so concurrency is always an extension: a cooperative MULTITASKER (PAUSE/ACTIVATE/STOP, as in Gforth or SwiftForth) or OS threads bolted on via an FFI. The closest portable idiom is a shared VARIABLE used as the channel plus a cooperative task that must PAUSE to yield; with no extension present the word simply runs inline as the single task, which is shown here. There is no join -- you spin on a shared done-flag instead.
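
The PAUSE-style scheduling described above can be modeled without any threads at all. The sketch below is an illustration in C, not Forth multitasker code: each task is a resumable state machine whose step function does one slice of work and returns (the analogue of PAUSE), and a round-robin loop spins until every done-flag is set. CoopTask, SLICE, step, run_all and sum_two are all hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* A task is a resumable state machine. */
typedef struct {
    long next, limit;   /* next term to add, and the last term */
    long acc;           /* running total, private to the task */
    long *cell;         /* shared cell that receives the final total */
    bool done;          /* done-flag the scheduler loop checks */
} CoopTask;

enum { SLICE = 100 };   /* terms added per turn; arbitrary */

/* Do one slice of work, then return control (the "PAUSE"). */
static void step(CoopTask *t) {
    for (int k = 0; k < SLICE && t->next <= t->limit; k++)
        t->acc += t->next++;
    if (t->next > t->limit) {
        *t->cell = t->acc;   /* publish the result, then raise the flag */
        t->done = true;
    }
}

/* Round-robin over the tasks until every one reports done. */
void run_all(CoopTask *tasks, size_t count) {
    bool all_done;
    do {
        all_done = true;
        for (size_t i = 0; i < count; i++) {
            if (!tasks[i].done) {
                step(&tasks[i]);
                all_done = all_done && tasks[i].done;
            }
        }
    } while (!all_done);
}

/* Convenience wrapper: the same two sums used in the other sections. */
void sum_two(long n1, long n2, long *r1, long *r2) {
    CoopTask tasks[2] = {
        { .next = 1, .limit = n1, .cell = r1 },
        { .next = 1, .limit = n2, .cell = r2 },
    };
    run_all(tasks, 2);
}
```

Because only one task ever runs at a time, the shared cells need no locks or atomics; the cost is that a task which never returns from step starves everything else, which is the same constraint a PAUSE-based multitasker imposes.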