The smallest build system
Industrial programming languages like C++ or Rust tend to have language-specific build systems designed for scale; Ninja, for example, was built for projects with tens of thousands of source files.
Meanwhile, at the other extreme, small software projects often have miscellaneous build-like needs that span language boundaries, such as running a command to generate some source files or rebuilding the docs. At the industrial scale, tools like Bazel are designed to support builds that span toolchains, but in most projects these kinds of tasks end up in the source tree as a random shell script, Makefile, or "task runner" config.
In my experience those approaches fall short. Some aren't aware of what's
already up to date and do unneeded work. Or you start with Makefiles, but
realize you want more than the basics and end up trying to write programs in the
Makefile $(foreach ...) language. Or you use some customized tool, but now
need your users to install another program just to build yours.
So here's today's idea: why not include your own build system in the source language itself?
A toy build system
"Real" build systems tend to express the build as a declarative graph of interdependent steps, which is the principled approach for scaling and parallelization. Zig, which is some nice prior art for writing the build files in the source language, takes this approach. The downside is now you are not writing build steps, you are writing programs that describe graphs of build steps; take a peek at the doit examples to see what that looks like.
What we're trying to do here instead is be more appealing than the other extreme, which is a shell script that maybe has a conditional or two.
What would be some properties of such a thing?
- An imperative approach to writing the build steps, in a full programming language.
- Avoid redoing work that doesn't need to be done.
- Performance isn't critical, but it'd be nice to use all the CPU cores.
- Maybe a bit of UI polish.
And all in some code that is simple enough that it doesn't feel like you're using a chainsaw to trim a flower.
Motivation
I'll use some tasks from retrowin32 as motivation just to make the examples more concrete. For complex project-specific reasons retrowin32 parses its own source to generate some win32 DLL files, which means when you modify those sources you need to run the generation step again.
The commands we want to run look like the following:
# for each DLL, e.g. "kernel32", "user32", etc:
$ cargo run -p win32-derive user32 # generates user32.s, the input to next step
$ clang-cl ...many flags here... user32.s /def:user32.def /out:user32.dll
In pseudo-Rust you might rewrite the above as follows:
fn build_dll(name: &str) {
run_command(&["cargo", "run", "-p", "win32-derive", name]);
let asm = format!("{name}.s");
let def = format!("{name}.def");
let dll = format!("{name}.dll");
run_command(&["clang-cl", asm, format!("/def:{def}"), format!("/out:{dll}")]);
}
fn build_dlls() {
for dll in ["kernel32", "user32", ...] {
build_dll(dll);
}
}
(In this post I'll use Rust, but the main point is that the whole framework is small enough that for your project you could just as well implement it in your own code.)
So far we've just translated what would be a pretty simple shell script into some uglier Rust, which is pretty much a loss, but we can build from here.
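The `run_command` helper is left undefined above; a minimal self-contained sketch over `std::process::Command` might look like the following. (The panic-on-failure behavior is an assumption for this sketch, though it matches the error-handling stance discussed later in the post.)

```rust
use std::process::Command;

/// Run a command, panicking if it can't be started or exits nonzero.
/// Panic-on-failure is an assumption for this sketch.
fn run_command(argv: &[&str]) {
    let status = Command::new(argv[0])
        .args(&argv[1..])
        .status()
        .unwrap_or_else(|e| panic!("failed to run {:?}: {e}", argv[0]));
    if !status.success() {
        panic!("command {argv:?} failed: {status}");
    }
}

fn main() {
    // Prints "hello" via echo.
    run_command(&["echo", "hello"]);
}
```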
Avoiding work
Add a function for checking whether some files are up to date:
/// Return true if all output paths in outs are newer than all of the paths in ins.
fn up_to_date(outs: &[&str], ins: &[&str]) -> bool { ... }
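For reference, here is one minimal sketch of such a function using file modification times. The missing-file policy is an assumption: a missing output forces a rebuild, while missing inputs are simply ignored.

```rust
use std::fs;
use std::time::SystemTime;

/// Return true if every path in `outs` exists and is at least as new
/// as every path in `ins`.  Missing-file policy is an assumption:
/// a missing output forces a rebuild; missing inputs are ignored.
fn up_to_date(outs: &[&str], ins: &[&str]) -> bool {
    let mtime = |path: &str| -> Option<SystemTime> {
        fs::metadata(path).ok().and_then(|m| m.modified().ok())
    };
    // Newest input mtime, if any inputs exist.
    let newest_in = ins.iter().filter_map(|p| mtime(p)).max();
    // Oldest output mtime; a missing output means "must rebuild".
    let mut oldest_out: Option<SystemTime> = None;
    for out in outs {
        match mtime(out) {
            None => return false,
            Some(t) => oldest_out = Some(oldest_out.map_or(t, |o| o.min(t))),
        }
    }
    match (oldest_out, newest_in) {
        (Some(out), Some(inp)) => out >= inp,
        _ => true, // no inputs to be older than (or no outputs at all)
    }
}

fn main() {
    // A missing output is never up to date.
    assert!(!up_to_date(&["no-such-file.out"], &[]));
}
```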
We can then only run commands if they are needed:
fn build_dll(name: &str) {
let inputs_that_generate_asm = ...;
let asm = format!("{name}.s");
if !up_to_date(&[asm], inputs_that_generate_asm) {
run_command(&["cargo", "run", "-p", "win32-derive", name]);
}
let def = format!("{name}.def");
let dll = format!("{name}.dll");
if !up_to_date(&[dll], &[asm, def]) {
run_command(&["clang-cl", asm, format!("/def:{def}"), format!("/out:{dll}")]);
}
}
With my Ninja hat on, my first reaction to this is to worry "wait, this might be doing more disk lookups than needed!" But the nice thing about intentionally working at a small scale is that this just doesn't matter much.
Progress
We could sprinkle some print statements to show what's going on. But you'll note the work is kind of hierarchical, matching the control flow: the "build dlls" step runs one step per dll and those steps themselves run two commands. We can pass around a context object that lets us name these.
struct Task {
desc: String,
}
impl Task {
/// Make a new subtask name and immediately run the given function with it.
fn task(&self, desc: &str, f: impl FnOnce(Task)) {
let desc = format!("{} > {}", self.desc, desc);
println!("{desc}");
f(Task { desc });
}
}
fn build_dll(t: Task, name: &str) {
t.task("generate source", |t| {
if !up_to_date(...) {
run_command(&["cargo", ...]);
}
});
t.task("compile+link", |t| {
if !up_to_date(...) {
run_command(&["clang-cl", ...]);
}
});
}
fn build_dlls(t: Task) {
for dll in ["kernel32", "user32", ...] {
t.task(dll, |t| build_dll(t, dll));
}
}
Now when we run, we print a nice trace of output progress like:
dlls > advapi32.dll > generate source
dlls > advapi32.dll > compile+link
dlls > comctl32.dll > generate source
dlls > comctl32.dll > compile+link
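Putting the pieces together, a self-contained sketch with stubbed-out build steps produces a trace like the one above. (The empty-root handling is an addition for this sketch, so the root task doesn't print a leading separator.)

```rust
struct Task {
    desc: String,
}

impl Task {
    /// Name a subtask and immediately run it, printing the
    /// hierarchical description, parent first.
    fn task(&self, desc: &str, f: impl FnOnce(Task)) {
        // Empty-root handling is an addition for this sketch.
        let desc = if self.desc.is_empty() {
            desc.to_string()
        } else {
            format!("{} > {}", self.desc, desc)
        };
        println!("{desc}");
        f(Task { desc });
    }
}

fn main() {
    let root = Task { desc: String::new() };
    root.task("dlls", |t| {
        for dll in ["advapi32.dll", "comctl32.dll"] {
            t.task(dll, |t| {
                t.task("generate source", |_| { /* cargo run ... */ });
                t.task("compile+link", |_| { /* clang-cl ... */ });
            });
        }
    });
}
```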
If you'll allow a bit of terminal trickery, you can replace the println! with
something like:
use std::io::Write;

print!("\r\x1b[K{}", msg); // '\r' returns to column 0; ESC[K erases the rest of the line
std::io::stdout().flush().unwrap();
which causes each line to overprint the previous one, keeping the output to a single line showing what is currently being worked on.
Parallelization
The above executes the build steps serially. Conceptually, when we have a loop like:
for dll in ["kernel32", "user32", ...] {
t.task(dll, |t| build_dll(t, dll));
}
we could instead run each of those task calls in parallel, then wait for them all at the end of the loop.
At the small scale we're worried about, we might as well do this by just spawning a bunch of threads! Threads aren't free but they are pretty cheap, so as long as we don't have thousands of tasks we don't need to worry about running too many. (If we did care, adding in a semaphore isn't too bad.)
std::thread::scope(|scope| {
for dll in ["kernel32", "user32", ...] {
t.spawn(scope, |t| build_dll(t, dll));
}
});
// std::thread::scope implicitly waits for all spawned tasks
From the production build system perspective, this "wastes" a thread that blocks in std::thread::scope waiting for all its tasks to finish, but again, at a small scale this doesn't cost much.
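The `spawn` method used above isn't shown; one plausible sketch wraps `Scope::spawn`. (The signature, including the extra `desc` parameter, is an assumption, not the post's actual code.)

```rust
use std::thread::Scope;

struct Task {
    desc: String,
}

impl Task {
    /// Hypothetical sketch: like `task`, but run the subtask on its
    /// own thread inside `scope`.  The signature, including the
    /// `desc` parameter, is an assumption.
    fn spawn<'scope, F>(&self, scope: &'scope Scope<'scope, '_>, desc: &str, f: F)
    where
        F: FnOnce(Task) + Send + 'scope,
    {
        let desc = format!("{} > {}", self.desc, desc);
        scope.spawn(move || {
            println!("{desc}");
            f(Task { desc });
        });
    }
}

fn main() {
    let root = Task { desc: "dlls".to_string() };
    std::thread::scope(|scope| {
        for name in ["kernel32", "user32"] {
            // Each subtask runs on its own thread.
            root.spawn(scope, name, |t| {
                assert!(t.desc.starts_with("dlls > "));
            });
        }
    });
    // std::thread::scope returns only after every spawned thread finishes.
}
```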
Invocation
Using the approach of cargo xtask, we
can integrate the above into an easy-to-run command by putting the code in
its own crate and creating a project-local .cargo/config.toml:
[alias]
minibuild = "run -q -p minibuild --"
Now, invoking cargo minibuild from the shell will first (using Rust's build
system) rebuild this build system, then invoke it. (On a platform like Node you
would comparably use the scripts block of package.json.)
(By the way,
Make your own make
from 2018 had similar goals to this post, and was the motivating post for
cargo xtask as well. While I was drafting this post, its author wrote
another post that goes
further! Relative to that post I think my best ideas are conditionally executing
commands and the hierarchical task status.)
A note about Rust
Readers who know Rust may notice the above fudged language correctness like proper borrows and error handling. For the purposes of this post these details are relatively uninteresting and in a different high-level language things would be different.
In fact, in writing this post I realized that the careful error handling I had
written using anyhow::Result everywhere only served to make the code clunkier.
For our purposes, panicking on any unhandled error makes for simpler code, and
showing a stack trace is a more useful user experience anyway. (It also
integrates nicely with std::thread::scope, which forwards panics.)
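That forwarding is easy to see in a tiny standalone example, using nothing beyond the standard library:

```rust
fn main() {
    // std::thread::scope joins every spawned thread before returning;
    // if any of them panicked, the scope call itself panics.
    let result = std::panic::catch_unwind(|| {
        std::thread::scope(|scope| {
            scope.spawn(|| panic!("a build step failed"));
        });
    });
    assert!(result.is_err());
    println!("the panic was forwarded out of the scope");
}
```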
Similarly, one way to implement task parallelization is to make t.task()
return a Future. I tried implementing this and it worked, but async Rust means
all the functions become async, which then leads to lifetime complexity, awaits
all over the place, needing to box the closures, and so on. It's definitely
possible but the result felt pretty ugly.
Worked code
The full code is
here. lib.rs is
the build framework, under 150 lines of code. It includes a few features not
mentioned in this post, such as an "explain" mode where it prints why it
believes a given target is out of date before executing it, and buffering
command output so parallel commands don't interleave their output.
main.rs is the retrowin32 project's particular build steps, the sort of thing
you would write as a user of the framework. But the whole idea is that this is
not a crate you ought to pull in, but rather some simple code you could write
yourself.
Is this a build system, or a glorified shell script? I think the distinction is better thought of as points along a spectrum, starting at "run these commands from the README" to "run this shell script" to the idea from this post to Makefiles to meta-Makefile systems, with the big guns like Bazel at the other extreme.
And I think it's a pretty useful point in that space. In this code you can see some advantages of using a full programming language, including static types, vectors, and path manipulation. Could I have written this as a shell script or Makefile? Surely yes, but also surely I would get something wrong.