<feed xmlns="http://www.w3.org/2005/Atom">
	<title>Tech Notes</title>
	<id>tag:neugierig.org,2010:tech-notes</id>
	<link href="https://neugierig.org/software/blog/"></link>
	<link rel="self" href="https://neugierig.org/software/blog/atom.xml"></link>
	<updated>2026-04-24T00:00:00Z</updated>
	<author>
		<name>Evan Martin</name>
		<email>evan.martin@gmail.com</email>
	</author>
	<entry>
		<id>tag:neugierig.org,2010:tech-notes/2026-04-24/theseus-unpack</id>
		<updated>2026-04-24T00:00:00Z</updated>
		<title>Theseus unpacking</title>
		<link href="https://neugierig.org/software/blog/2026/04/theseus-unpack.html"></link>
		<content type="html">&lt;p&gt;&lt;a href=&#34;theseus.html&#34;&gt;Theseus&lt;/a&gt;, my new Windows binary translator, must see all the code&#xA;it might run ahead of time to translate it. In that post I highlighted how this&#xA;means it doesn&#39;t support programs that have a JIT. Someone emailed to ask about&#xA;the packed executables found in the demoscene, which also unpack code at&#xA;runtime. This absolutely nerd sniped me.&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;I &lt;a href=&#34;/software/blog/2025/04/unpacking.html&#34;&gt;wrote before&lt;/a&gt; about packed&#xA;executables, so if you&#39;re not familiar with the term this is useful background&#xA;context.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;At a high level unpacking is simple. You run the packed program up to the point&#xA;where it finishes decompressing and is about to jump to the original &lt;code&gt;main()&lt;/code&gt;&#xA;function, at which point you grab all the decompressed state and write it out to&#xA;a new unpacked program.&lt;/p&gt;&#xA;&lt;p&gt;I could use Theseus to do the same, just as two passes: run Theseus once to&#xA;translate the packed program, amend it to make it write an executable file to&#xA;disk when run, run it, and now you have a normal executable to run Theseus a&#xA;second time on. When I implemented unpacking in retrowin32 I added some flags to&#xA;support that &#34;write out an executable&#34; mode. I could do the same here.&lt;/p&gt;&#xA;&lt;p&gt;But the big picture idea I have with Theseus is this framing that the translated&#xA;program is this mutable thing you can reach into and monkey with. 
Here&#39;s a kind&#xA;of tidier approach to unpacking that uses that.&lt;/p&gt;&#xA;&lt;p&gt;First, I run Theseus on the packed program, and in its output it prints:&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code&gt;WARN  tc/src/traverse.rs:40 omitting 004085dd: block appears zero-filled&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;This is saying &#34;I saw a jump to this address, but on the other side there is no&#xA;code&#34;. This immediately reveals the address of the original &lt;code&gt;main()&lt;/code&gt; function:&#xA;once unpacked, that&#39;s where the real program starts.&lt;/p&gt;&#xA;&lt;p&gt;Next, I manually implement that function&#xA;&lt;a href=&#34;https://github.com/evmar/theseus/blob/e941128a3b249efc18f93e26d41c677024622fe7/exe/chillin-unpack/src/externs.rs#L5&#34;&gt;within the Theseus output&lt;/a&gt;&#xA;to call my own &lt;code&gt;do_unpack&lt;/code&gt; function. When control reaches there, I know the&#xA;program has now unpacked itself into memory, and I can invoke Theseus itself on&#xA;that memory to have it generate a program from the unpacked code.&lt;/p&gt;&#xA;&lt;p&gt;In other words, I don&#39;t need to write out an intermediate &lt;code&gt;.exe&lt;/code&gt; file and invoke&#xA;Theseus again &amp;mdash; I can modify the generated unpacker program to directly link&#xA;and call back to Theseus itself! This is weird because Theseus-generated&#xA;programs don&#39;t generally need to link the Theseus translator itself. But there&#39;s&#xA;no reason it can&#39;t, the code is right there.&lt;/p&gt;&#xA;&lt;p&gt;The&#xA;&lt;a href=&#34;https://github.com/evmar/theseus/blob/e941128a3b249efc18f93e26d41c677024622fe7/exe/chillin-unpack/src/main.rs&#34;&gt;total implementation&lt;/a&gt;&#xA;ends up extremely simple, because I don&#39;t need to go through generating a proper&#xA;PE file, I just gather the data Theseus needs. 
(Generic unpackers are thousands&#xA;of lines of code; UPX&#39;s own functionality for unpacking is significantly more&#xA;code than this; even retrowin32&#39;s implementation is twice as long.) And Theseus&#xA;doesn&#39;t grow an unpacker mode; instead it just supports programs with manual&#xA;modifications in general.&lt;/p&gt;&#xA;&lt;p&gt;Unfortunately, I was so consumed by the nerd snipe here that I forgot the other&#xA;reason you typically have to unpack a packed executable: to load it into a&#xA;debugger. For that I would need to generate an exe. Oh well. It&#39;s still neat.&lt;/p&gt;</content>
	</entry>
	<entry>
		<id>tag:neugierig.org,2010:tech-notes/2026-04-19/theseus</id>
		<updated>2026-04-19T00:00:00Z</updated>
		<title>Theseus, a static Windows emulator</title>
		<link href="https://neugierig.org/software/blog/2026/04/theseus.html"></link>
		<content type="html">&lt;p&gt;&lt;em&gt;This post is likely the end of my&#xA;&lt;a href=&#34;/software/blog/2023/09/retrowin32.html&#34;&gt;series on retrowin32&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;I bring you: &lt;a href=&#34;https://github.com/evmar/theseus&#34;&gt;Theseus&lt;/a&gt;, a new Windows/x86&#xA;emulator that translates programs &lt;em&gt;statically&lt;/em&gt;, solving a bunch of emulation&#xA;problems while surely introducing new ones.&lt;/p&gt;&#xA;&lt;h2&gt;What happened to retrowin32?&lt;/h2&gt;&#xA;&lt;p&gt;I haven&#39;t been working on&#xA;&lt;a href=&#34;https://github.com/evmar/retrowin32&#34;&gt;retrowin32, my win32 emulator&lt;/a&gt;, in part&#xA;due to life stuff and in part because I haven&#39;t been sure where I wanted to go&#xA;with it. And then someone who had contributed to it in the past posted&#xA;&lt;a href=&#34;https://github.com/lqs/retrotick&#34;&gt;retrotick&lt;/a&gt;, their own web-based Windows&#xA;emulator that looks better than my years of work, and commented on HN that it&#xA;took them an hour with Claude.&lt;/p&gt;&#xA;&lt;p&gt;This is not a post about AI, both because there are too many of those already&#xA;and because I&#39;m not yet sure of my own feelings on it. But one small thing I&#xA;have been thinking about is that (1) AI has been slowly but surely climbing the&#xA;junior to senior engineer ladder; and (2) one of the main pieces of being a&#xA;senior engineer is better understanding what you &lt;em&gt;ought&lt;/em&gt; to be building, as&#xA;distinct from how to build it.&lt;/p&gt;&#xA;&lt;p&gt;(Is that just the Innovator&#39;s Dilemma&#39;s concept of &#34;retreating upmarket&#34;,&#xA;applied to my own utility as a human? Not even sure. I am grateful I do this&#xA;work for the journey, to satisfy my own curiosity, because that means I am not&#xA;existentially threatened like a business would be in this situation. 
As&#xA;&lt;a href=&#34;https://www.youtube.com/watch?v=2rAMOlXu_Eg&#34;&gt;Benny Feldman says&lt;/a&gt;: &#34;I cheat at&#xA;the casino by secretly not having an attachment to material wealth!&#34;)&lt;/p&gt;&#xA;&lt;p&gt;So, Mr. Senior Engineer, what ought we build? What problem are we even solving&#xA;with emulators, and how do our approaches meet that? I came to a kind of&#xA;unorthodox solution that I&#39;d like to tell you about!&lt;/p&gt;&#xA;&lt;h2&gt;Emulators and JITs&lt;/h2&gt;&#xA;&lt;p&gt;The simplest CPU emulator is very similar to an interpreter. An input program,&#xA;after parsing, becomes x86 instructions like:&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code&gt;mov eax, 3&#xA;add eax, 4&#xA;call ...  ; some Windows system API&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;An interpreting emulator is a big loop that steps through the instructions. It&#xA;looks like:&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;loop&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#a90d91&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;instr&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;next_instruction&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#a90d91&#34;&gt;match&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;instr&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#177500&#34;&gt;// e.g. 
`mov eax, 3`&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#177500&#34;&gt;&lt;/span&gt;      &lt;span style=&#34;color:#000&#34;&gt;Mov&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&amp;gt;&lt;/span&gt; { &lt;span style=&#34;color:#000&#34;&gt;set&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;argument_1&lt;/span&gt;(), &lt;span style=&#34;color:#000&#34;&gt;argument_2&lt;/span&gt;()); }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#177500&#34;&gt;// e.g. `add eax, 4`&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#177500&#34;&gt;&lt;/span&gt;      &lt;span style=&#34;color:#000&#34;&gt;Add&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&amp;gt;&lt;/span&gt; { &lt;span style=&#34;color:#000&#34;&gt;set&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;argument_1&lt;/span&gt;(), &lt;span style=&#34;color:#000&#34;&gt;argument_1&lt;/span&gt;() &lt;span style=&#34;color:#000&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;argument_2&lt;/span&gt;()); }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;Like an interpreter, this approach is slow.&lt;/p&gt;&#xA;&lt;p&gt;At a high level interpreters are slow because they are doing a bunch of dynamic&#xA;work for each instruction. 
Imagine emulating a program that runs the same &lt;code&gt;add&lt;/code&gt;&#xA;instruction in a loop; the above emulator loop has all these function calls to&#xA;repeatedly ask &#34;what instruction am I running now?&#34; and inspect the arguments,&#xA;only to eventually do the same &lt;code&gt;add&lt;/code&gt; on each iteration. x86 memory references&#xA;are extra painful because they are&#xA;&lt;a href=&#34;https://blog.yossarian.net/2020/06/13/How-x86_64-addresses-memory&#34;&gt;very flexible&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;Further, on x86 the &lt;code&gt;add&lt;/code&gt; instruction not only adds the numbers but also&#xA;computes six derived values, including things like the parity flag: whether the&#xA;low byte of the result contains an even number of 1 bits(!). A correct emulator needs to either&#xA;compute all of these as well, or perform some sort of side analysis of the code&#xA;to decide how to run it efficiently.&lt;/p&gt;&#xA;&lt;p&gt;There are&#xA;&lt;a href=&#34;http://www.emulators.com/docs/nx25_nostradamus.htm&#34;&gt;various fun techniques to improve emulators&lt;/a&gt;.&#xA;But if you want to go fast what you really need is some combination of analyzing&#xA;the code and generating native machine code from it: a JIT. JITs are famously&#xA;hard to write! They are effectively optimizing compilers, which means all the&#xA;complexity of optimization and generating machine code, with the added constraint that the&#xA;time spent compiling is itself on the critical performance path. I liked&#xA;&lt;a href=&#34;https://tratt.net/laurie/blog/2026/retrofitting_jit_compilers_into_c_interpreters.html&#34;&gt;this post&#39;s discussion&#xA;of why JITs are hard&lt;/a&gt;,&#xA;which mentions there have been more than 15 attempts at a Python JIT.&lt;/p&gt;&#xA;&lt;h2&gt;Static binary translation&lt;/h2&gt;&#xA;&lt;p&gt;So suppose you want to generate efficient machine code, but you don&#39;t want to&#xA;write a JIT. 
You know what&#39;s really good at analyzing code and generating&#xA;efficient machine code from it? A compiler!&lt;/p&gt;&#xA;&lt;p&gt;So here&#39;s the main idea. Given code like the above input x86 snippet, we can&#xA;process it into &lt;em&gt;source code&lt;/em&gt; that looks like:&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000&#34;&gt;regs&lt;/span&gt;.&lt;span style=&#34;color:#000&#34;&gt;eax&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1c01ce&#34;&gt;3&lt;/span&gt;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000&#34;&gt;regs&lt;/span&gt;.&lt;span style=&#34;color:#000&#34;&gt;eax&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;add&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;regs&lt;/span&gt;.&lt;span style=&#34;color:#000&#34;&gt;eax&lt;/span&gt;, &lt;span style=&#34;color:#1c01ce&#34;&gt;4&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000&#34;&gt;windows_api&lt;/span&gt;();  &lt;span style=&#34;color:#177500&#34;&gt;// some native implementation of the API that was called&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;We then feed this code back in to an optimizing compiler to get a program native&#xA;to your current architecture, x86 no longer needed.&lt;/p&gt;&#xA;&lt;p&gt;In other words, instead of handing an &lt;code&gt;.exe&lt;/code&gt; file directly to an emulator that&#xA;might JIT code out, we instead have a sort of compiler that statically&#xA;translates the &lt;code&gt;.exe&lt;/code&gt; (via a second compiler in the middle) directly into a&#xA;&#34;native&#34; executable.&lt;/p&gt;&#xA;&lt;p&gt;(I 
write native in scare quotes because while the resulting executable is a&#xA;native binary, it is a binary that is carrying around a sort of inner virtual&#xA;machine representing the x86 state, like the &lt;code&gt;regs&lt;/code&gt; struct in the above code.&#xA;More on this in a bit.)&lt;/p&gt;&#xA;&lt;p&gt;I think I came up with this basic idea on my own just by thinking hard about&#xA;what I was trying to achieve, but it turns out this approach is known as static&#xA;binary translation and is well studied. It has some nice properties, and also&#xA;some big problems.&lt;/p&gt;&#xA;&lt;h2&gt;Decompilation&lt;/h2&gt;&#xA;&lt;p&gt;I&#39;ll go into those, but first, a minor detour about how I ended up here.&lt;/p&gt;&#xA;&lt;p&gt;Have you heard of &lt;a href=&#34;https://decomp.wiki/&#34;&gt;decompilation&lt;/a&gt;? These madmen&#xA;(madpeople?) are manually recreating the source code to old video games, one&#xA;function at a time. They take the game binary, extract the machine code of one&#xA;function, then use a &lt;a href=&#34;https://decomp.me/&#34;&gt;fancy UI&lt;/a&gt; (click one of the entries&#xA;under &#34;Recent activity&#34;) to iteratively tinker on reproducing the higher-level&#xA;code that generates the exact same machine code. It&#39;s kind of amazing.&lt;/p&gt;&#xA;&lt;p&gt;(To do this, they even need to run the same original compiler that was used to&#xA;compile the target game. Those compilers are often Windows programs, which means&#xA;implementing the above fancy UI involves running old Windows binaries on their&#xA;Linux servers. This is how I first learned about them: they need a Windows&#xA;emulator!)&lt;/p&gt;&#xA;&lt;p&gt;Decompilation is not just a weird and fascinating (and likely tedious?)&#xA;human endeavor. 
It also highlighted something important for me: I don&#39;t so much&#xA;care about having an emulator that can run any random program, I care about&#xA;running a few very specific programs and I&#39;m willing to go to even some manual&#xA;lengths to help out.&lt;/p&gt;&#xA;&lt;p&gt;In practice, if you look at a person building a Windows emulator, they end up as&#xA;surgeons needing to kind of manually reach in and pump the heart of the target&#xA;program themselves anyway, including debugging the target program&#xA;&lt;a href=&#34;https://corteximplant.com/@aaronsgiles/116044322660755992&#34;&gt;and working around its individual bugs&lt;/a&gt;.&#xA;It&#39;s common for emulators to even manually curate a list of programs that are&#xA;known to work or fail.&lt;/p&gt;&#xA;&lt;h2&gt;An old idea&lt;/h2&gt;&#xA;&lt;p&gt;Statically translating machine code is not a new idea. Why isn&#39;t it more&#xA;popular? My impression in trying to read about it is that it is often dismissed&#xA;because it can&#39;t work, but at least so far it&#39;s worked well. Maybe I haven&#39;t yet&#xA;encountered some impossible problem that I&#39;ve so far overlooked?&lt;/p&gt;&#xA;&lt;p&gt;(When trying to look up related work for this blog post, I saw&#xA;&lt;a href=&#34;https://andrewkelley.me/post/jamulator.html&#34;&gt;this attempt&#xA;at statically translating NES&lt;/a&gt; that&#xA;concluded it can&#39;t be done, but then also&#xA;&lt;a href=&#34;https://1379.tech/nesrecomp-from-faxanadu-to-4-supported-commercial-titles/&#34;&gt;these people seem to be&#xA;succeeding at it&lt;/a&gt;&#xA;so it&#39;s hard to say.)&lt;/p&gt;&#xA;&lt;p&gt;I think there are two main problems, a technical one and a more cultural one.&lt;/p&gt;&#xA;&lt;p&gt;The technical part is that the simple idea has complex details. To start with,&#xA;any program that generates code at runtime (e.g. itself containing a JIT) won&#39;t&#xA;work, but it&#39;s easy for me to just dismiss those programs as out of scope. 
There&#xA;are also challenges around things like how control flow works, but those are&#xA;small and interesting and I might go into them in future posts.&lt;/p&gt;&#xA;&lt;p&gt;A common topic of research is that it&#39;s in the limit impossible to statically&#xA;find all of the code that might be executed even in a program that doesn&#39;t&#xA;generate code at runtime, because of&#xA;&lt;a href=&#34;https://scholar.google.com/scholar?q=jump+target+identification&#34;&gt;dynamic control flow from&#xA;vtables or jump tables&lt;/a&gt;.&#xA;In particular, while there are techniques to find &lt;em&gt;most&lt;/em&gt; of the code, no&#xA;approach is guaranteed to work perfectly. This is where decompilation changed my&#xA;view: if I&#39;m willing to manually help out a bit on a specific program, then this&#xA;problem might be fine?&lt;/p&gt;&#xA;&lt;p&gt;The main cultural reason I think binary translation isn&#39;t more common is that&#xA;it&#39;s not as convenient as a generic emulator that handles most programs already.&#xA;Users aren&#39;t likely to want to run a compiler toolchain, though I have seen&#xA;projects embed the compiler (e.g. LLVM) directly to avoid this.&lt;/p&gt;&#xA;&lt;p&gt;The other cultural problem is there are legal ramifications if you intend to&#xA;distribute translated programs. Every video game emulator relies on the legal&#xA;fiction of &#34;first, copy the game data from the physical copy you already own and&#xA;pass that in as an input&#34;, so they get to plausibly remain non-derivative works.&lt;/p&gt;&#xA;&lt;p&gt;But I&#39;m not solving for users, I&#39;m solving for my own interest. These cultural&#xA;problems don&#39;t matter to me.&lt;/p&gt;&#xA;&lt;h2&gt;Benefits&lt;/h2&gt;&#xA;&lt;p&gt;Again consider the snippet above, which is adding 3 and 4. 
In a static&#xA;translator world we parse the instruction stream ahead of time, so the compiler&#xA;gets to see that we want to put a 3 in eax, and we don&#39;t (as an interpreter would)&#xA;spend runtime considering what values we are reading and writing where.&lt;/p&gt;&#xA;&lt;p&gt;A compiler will not only generate the correct machine code for the target&#xA;architecture, it will even optimize code like the above to&#xA;&lt;a href=&#34;https://godbolt.org/z/xK84Tv1s3&#34;&gt;just store the resulting value 7&lt;/a&gt;. And a&#xA;compiler is capable of eliminating unneeded code like parity computations if you&#xA;frame things right. Because the Theseus code generation happens &#34;offline&#34;,&#xA;separately from the execution of the program, I can worry less than a JIT might&#xA;about the time spent analyzing the code to try to help.&lt;/p&gt;&#xA;&lt;p&gt;When I started this I had thought that performance would be the whole benefit of&#xA;this approach, but it turns out to be easier to develop as well because it&#xA;brings in all of the other developer tools:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;The translated instructions appear as regular code in the output program,&#xA;which means the native debugger can step through them as regular source&#xA;code.&lt;/li&gt;&#xA;&lt;li&gt;If the program crashes, the native stack trace traces back into the&#xA;(translated assembly of the) original program.&lt;/li&gt;&#xA;&lt;li&gt;I haven&#39;t tried it yet, but CPU profiling ought to have the same benefit.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;In retrowin32 I ended up building a whole debugger UI to help track down&#xA;problems, but in Theseus I&#39;ve just used my system debugger so far and it&#39;s been&#xA;fine.&lt;/p&gt;&#xA;&lt;p&gt;In retrowin32 I also spent&#xA;&lt;a href=&#34;/software/blog/2024/09/retrowin32-syscalls.html&#34;&gt;a lot of time fiddling with the bridge between the emulator and&#xA;native code&lt;/a&gt;. 
This boundary&#xA;still exists in Theseus but it is so much smaller, because the translated code&#xA;can directly call my native win32 system API implementation (with a bit of glue&#xA;code to move data in and out of the inner machine&#39;s representation).&lt;/p&gt;&#xA;&lt;p&gt;On macOS retrowin32 could run under Rosetta but it meant the entire executable&#xA;needed to be an x86-64 binary, which meant it required a cross-compiled SDL. A&#xA;Theseus binary is native code that just calls the native SDL.&lt;/p&gt;&#xA;&lt;p&gt;All told it is just much simpler. Going from the start of this idea to getting&#xA;&lt;a href=&#34;https://neugierig.org/software/blog/2022/10/retrowin32.html&#34;&gt;the test program I&#39;ve tinkered with all this&#xA;while&lt;/a&gt; running its&#xA;first scene, including DirectX, FPU, and MMX, took me only a couple of weeks.&lt;/p&gt;&#xA;&lt;h2&gt;Partial evaluation&lt;/h2&gt;&#xA;&lt;p&gt;You can think of the different approaches, from interpreter to JIT to static binary&#xA;translation, as a spectrum of how much work you do ahead of time versus at runtime. Theseus&#xA;takes the dynamic question of &#34;what kind of mov is this&#34; and moves it to the&#xA;ahead-of-time compilation step, partially evaluating the generic instruction handler&#xA;into a specific instruction with nailed-down arguments. (I&#39;ll link again to&#xA;&lt;a href=&#34;https://tratt.net/laurie/blog/2026/retrofitting_jit_compilers_into_c_interpreters.html&#34;&gt;the excellent blog post about meta-tracing C code&lt;/a&gt;.&#xA;Read about&#xA;&lt;a href=&#34;http://blog.sigfpe.com/2009/05/three-projections-of-doctor-futamura.html&#34;&gt;Futamura projections&lt;/a&gt;&#xA;for this idea taken to its extreme conclusion!)&lt;/p&gt;&#xA;&lt;p&gt;For another example, a typical Windows emulator must parse and load the PE&#xA;executable on startup, but Theseus does that at compile time and writes out just&#xA;the data structures needed to execute it. 
The PE-parsing code isn&#39;t needed in&#xA;the output.&lt;/p&gt;&#xA;&lt;p&gt;Similarly, executable startup involves linking and loading any referenced DLLs&#xA;including those from the system, but Theseus must see all the code it will run,&#xA;so it does this linking ahead of time. Here&#39;s some output near a call to a&#xA;Windows API, where at compile time it resolved an IAT reference (the &lt;code&gt;ds:[...]&lt;/code&gt;&#xA;address) directly to the Rust implementation I wrote:&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#177500&#34;&gt;// 004012a0 push 4070A4h&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#177500&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;push&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;ctx&lt;/span&gt;, &lt;span style=&#34;color:#1c01ce&#34;&gt;0x4070a4&lt;/span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;u32&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#177500&#34;&gt;// 004012a5 push 8&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#177500&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;push&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;ctx&lt;/span&gt;, &lt;span style=&#34;color:#1c01ce&#34;&gt;0x8&lt;/span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;u32&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#177500&#34;&gt;// 004012a7 call dword ptr ds:[4060E8h]&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#177500&#34;&gt;&lt;/span&gt;&lt;span 
style=&#34;color:#000&#34;&gt;call&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;ctx&lt;/span&gt;, &lt;span style=&#34;color:#1c01ce&#34;&gt;0x4012ad&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;Cont&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;user32&lt;/span&gt;::&lt;span style=&#34;color:#000&#34;&gt;CreateWindowExA_stdcall&lt;/span&gt;))&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;In some sense it&#39;s as if Theseus at compile time is partially running the system&#xA;binary loader and the output source code is a snapshot of the ready state. It&#xA;reminds me a bit of the problem of&#xA;&lt;a href=&#34;https://neugierig.org/software/blog/2025/04/unpacking.html&#34;&gt;unpacking executables&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;h2&gt;WebAssembly&lt;/h2&gt;&#xA;&lt;p&gt;Theseus should easily extend to running on the web under WebAssembly; most of it&#xA;is just compiling the generated program with wasm as the target architecture. (I&#xA;initially had this working then decided I don&#39;t need the additional complexity&#xA;for now, so it isn&#39;t implemented.)&lt;/p&gt;&#xA;&lt;p&gt;Separately, the output program from Theseus is inspired by how WebAssembly is&#xA;executed. In both there is an outer host program that carries within it a&#xA;&#34;machine&#34; with its own idea of code and memory. The code within that machine can&#xA;only read/write to its own memory and must call provided hooks to bridge out to&#xA;the host. Like WebAssembly, the Theseus output executable code is isolated from&#xA;the data, with the nice property that no amount of unintentional/malicious&#xA;memory writes can create new code.&lt;/p&gt;&#xA;&lt;p&gt;A wasm Theseus would be a turducken of machines:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;the native host machine&#39;s WebAssembly implementation (e.g. 
the Chrome&#xA;runtime), with its notion of memory, runs a&lt;/li&gt;&#xA;&lt;li&gt;WebAssembly virtual machine with the Theseus wasm blob, with its own idea&#xA;about memory (e.g. where my Rust implementation of the Windows API puts&#xA;allocations), and within that there is&lt;/li&gt;&#xA;&lt;li&gt;the x86 virtual machine and Windows program&#39;s notion of memory (which e.g.&#xA;might say &#34;read from the static data table at memory offset $x&#34;).&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;In thinking about it, it&#39;s tempting to try to blend some layers of machines&#xA;here, and make the WebAssembly program&#39;s memory 1:1 with the input Windows&#xA;program&#39;s idea of memory. That is, if the input program writes to some address&#xA;$x, you could translate that to exactly writing to WebAssembly memory address&#xA;$x. (You&#39;d need to adjust the middle layer to hide its data structures in places&#xA;the x86 program doesn&#39;t use.) I had to do something like this to make&#xA;&lt;a href=&#34;/software/blog/2023/08/x86-x64-aarch64.html&#34;&gt;retrowin32 work under an x86 emulator&lt;/a&gt;.&#xA;WebAssembly even would let me&#xA;&lt;a href=&#34;https://webassembly.github.io/spec/core/syntax/modules.html#syntax-data&#34;&gt;lay out the memory directly from the&#xA;binary&lt;/a&gt;.&#xA;I don&#39;t think this really buys you much, it would just be kind of cute.&lt;/p&gt;&#xA;&lt;p&gt;On the topic of WebAssembly and static binary translation, check out&#xA;&lt;a href=&#34;https://wingolog.org/archives/2025/10/30/wastrel-a-profligate-implementation-of-webassembly&#34;&gt;wastrel&lt;/a&gt;&#xA;which is static binary translation applied to the problem of executing&#xA;WebAssembly. 
Reading about it surely gave me the seeds of this idea.&lt;/p&gt;&#xA;&lt;h2&gt;Theseus&lt;/h2&gt;&#xA;&lt;p&gt;I named this project Theseus, as in&#xA;&lt;a href=&#34;https://en.wikipedia.org/wiki/Ship_of_Theseus&#34;&gt;the ship&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;Consider again the x86 assembly at the top of the post. What does it do?&#xA;Depending on how you look at it, one correct answer is &#34;adds three and four&#34; or&#xA;even just &#34;computes 7&#34;. Or you could say it puts 3 in the eax register, adds 4&#xA;to the eax register, consumes some CPU clocks, and sets various CPU flags.&lt;/p&gt;&#xA;&lt;p&gt;If I or my compiler replaces one of these interpretations with another, is it&#xA;still the same program? Depending on which context you care about &amp;mdash; my&#xA;impression is that emulating systems like the NES requires getting the clocks&#xA;exactly right &amp;mdash; these details either matter or don&#39;t. In the case of Theseus I&#xA;am explicitly throwing away the input program because I have replaced all its&#xA;parts, one by one.&lt;/p&gt;&#xA;&lt;p&gt;I have one farther off idea, again along the lines of the ship of Theseus.&#xA;Implementing the Windows API is an endless stream of working around four decades&#xA;of &lt;a href=&#34;https://www.hyrumslaw.com/&#34;&gt;Hyrum&#39;s Law&lt;/a&gt;. Consider&#xA;&lt;a href=&#34;https://corteximplant.com/@aaronsgiles/116044322660755992&#34;&gt;that random bug workaround again&lt;/a&gt;:&#xA;if you were documenting the API of &lt;code&gt;DirectPlayEnumerateA&lt;/code&gt; would you write that&#xA;it calls the callback, or would it be more correct to say that it calls the&#xA;callback and also restores a preserved stack pointer? 
If you look at the code of&#xA;a Windows emulator like Wine today it is full of things like this.&lt;/p&gt;&#xA;&lt;p&gt;One idea I&#39;ve been thinking about is that for problems like these, rather than&#xA;making the emulator more complicated, you could take a page from the&#xA;decompilation playbook and provide an easy way to manage replacing parts of the&#xA;program itself.&lt;/p&gt;&#xA;&lt;p&gt;Once you&#39;re willing to replace pieces of a program there are more interesting&#xA;possibilities. If a program has some bit of code that doesn&#39;t perform well,&#xA;instead of making a JIT fancier, you could just manually replace the code with&#xA;your own implementation. (It&#39;s plausible you wouldn&#39;t even need to change&#xA;algorithms, it might be enough to just write the same algorithm in native code&#xA;and let your modern compiler apply its autovectorization logic to it.) With&#xA;enough machinery, you could even replace parts to add features, as one&#xA;contributor to retrowin32&#xA;&lt;a href=&#34;https://github.com/LinusU/retrowin32/tree/deimos-rising/deimos-rising&#34;&gt;investigated here&lt;/a&gt;&#xA;and even&#xA;&lt;a href=&#34;https://github.com/LinusU/rustic-yellow&#34;&gt;implemented for some GameBoy games&lt;/a&gt;.&lt;/p&gt;</content>
	</entry>
	<entry>
		<id>tag:neugierig.org,2010:tech-notes/2026-01-10/smallest-build-system</id>
		<updated>2026-01-10T00:00:00Z</updated>
		<title>The smallest build system</title>
		<link href="https://neugierig.org/software/blog/2026/01/smallest-build-system.html"></link>
		<content type="html">&lt;p&gt;Industrial programming languages like C++ or Rust tend to have language-specific&#xA;industrial build systems, designed for scale; Ninja&#xA;&lt;a href=&#34;https://neugierig.org/software/chromium/notes/2011/02/ninja.html&#34;&gt;was&lt;/a&gt; for&#xA;projects with tens of thousands of source files.&lt;/p&gt;&#xA;&lt;p&gt;Meanwhile, at the other extreme, small software projects often have some&#xA;miscellaneous smaller build-like needs that span language boundaries, such as&#xA;running a command to generate some source files or rebuilding the docs. At the&#xA;industrial scale, tools like Bazel are designed to support builds that span&#xA;toolchains. But in most projects these kinds of tasks often end up in the source&#xA;tree as a random shell script, Makefile, or &#34;task runner&#34; config.&lt;/p&gt;&#xA;&lt;p&gt;In my experience those approaches fall short. Some aren&#39;t aware of what&#39;s&#xA;already up to date and do unneeded work. Or you start with Makefiles, but&#xA;realize you want more than the basics and end up trying to write programs in the&#xA;Makefile &lt;code&gt;$(foreach ...)&lt;/code&gt; language. Or you use some customized tool, but now&#xA;need your users to install another program just to build yours.&lt;/p&gt;&#xA;&lt;p&gt;So here&#39;s today&#39;s idea: why not include your own build system in the source&#xA;language itself?&lt;/p&gt;&#xA;&lt;h2&gt;A toy build system&lt;/h2&gt;&#xA;&lt;p&gt;&#34;Real&#34; build systems tend to express the build as a declarative graph of&#xA;interdependent steps, which is the principled approach for scaling and&#xA;parallelization. Zig, which is some nice prior art for writing the build files&#xA;in the source language,&#xA;&lt;a href=&#34;https://ziglang.org/learn/build-system/&#34;&gt;takes this approach&lt;/a&gt;. 
The downside is&#xA;now you are not writing build steps, you are writing programs that describe&#xA;graphs of build steps; take a peek at the &lt;a href=&#34;https://pydoit.org/&#34;&gt;doit&lt;/a&gt; examples&#xA;to see what that looks like.&lt;/p&gt;&#xA;&lt;p&gt;What we&#39;re trying to do here instead is be more appealing than the other&#xA;extreme, which is a shell script that maybe has a conditional or two.&lt;/p&gt;&#xA;&lt;p&gt;What would be some properties of such a thing?&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;An imperative approach to writing the build steps, in a full programming&#xA;language.&lt;/li&gt;&#xA;&lt;li&gt;Avoid redoing work that doesn&#39;t need to be done.&lt;/li&gt;&#xA;&lt;li&gt;Performance isn&#39;t critical, but it&#39;d be nice to use all the CPU cores.&lt;/li&gt;&#xA;&lt;li&gt;Maybe a bit of UI polish.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;And all in some code that is simple enough that it doesn&#39;t feel like you&#39;re&#xA;using a chainsaw to trim a flower.&lt;/p&gt;&#xA;&lt;h2&gt;Motivation&lt;/h2&gt;&#xA;&lt;p&gt;I&#39;ll use some tasks from retrowin32 as motivation just to make the examples more&#xA;concrete. For complex project-specific reasons retrowin32 parses its own source&#xA;to generate some win32 DLL files, which means when you modify those sources you&#xA;need to run the generation step again.&lt;/p&gt;&#xA;&lt;p&gt;The commands we want to run look like the following:&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#177500&#34;&gt;# for each DLL, e.g. 
&amp;#34;kernel32&amp;#34;, &amp;#34;user32&amp;#34;, etc:&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ cargo run -p win32-derive user32   &lt;span style=&#34;color:#177500&#34;&gt;# generates user32.s, the input to next step&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ clang-cl ...many flags here... user32.s /def:user32.def /out:user32.dll&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;In pseudo-Rust you might rewrite the above as follows:&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;fn&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;build_dll&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#a90d91&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;str&lt;/span&gt;) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#000&#34;&gt;run_command&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;cargo&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;run&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;-p&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;win32-derive&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;name&lt;/span&gt;]);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;asm&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span 
style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{name}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;.s&amp;#34;&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{name}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;.def&amp;#34;&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{name}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;.dll&amp;#34;&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#000&#34;&gt;run_command&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;clang-cl&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;asm&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;/def:&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{def}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;), &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;/out:&lt;/span&gt;&lt;span 
style=&#34;color:#c41a16&#34;&gt;{dll}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;)]);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;fn&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;build_dlls&lt;/span&gt;() {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt; &lt;span style=&#34;color:#a90d91&#34;&gt;in&lt;/span&gt; [&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;kernel32&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;user32&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;.] {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#000&#34;&gt;build_dll&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;(In this post I&#39;ll use Rust, but the main point is that the whole framework is&#xA;small enough that for your project you could just as well implement it in your&#xA;own code.)&lt;/p&gt;&#xA;&lt;p&gt;So far we&#39;ve just translated what would be a pretty simple shell script into&#xA;some uglier Rust, which is pretty much a loss, but we can build from here.&lt;/p&gt;&#xA;&lt;h2&gt;Avoiding work&lt;/h2&gt;&#xA;&lt;p&gt;Add a function for checking whether some files are up to date:&lt;/p&gt;&#xA;&lt;pre 
style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;/// Return true if all output paths in outs are newer than all of the paths in ins.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;fn&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;up_to_date&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;outs&lt;/span&gt;: &lt;span style=&#34;color:#a90d91&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;str&lt;/span&gt;], &lt;span style=&#34;color:#000&#34;&gt;ins&lt;/span&gt;: &lt;span style=&#34;color:#a90d91&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;str&lt;/span&gt;]) -&amp;gt; &lt;span style=&#34;color:#a90d91&#34;&gt;bool&lt;/span&gt; { &lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;. 
}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;We can then only run commands if they are needed:&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;fn&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;build_dll&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#a90d91&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;str&lt;/span&gt;) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;inputs_that_generate_asm&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;.;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;asm&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{name}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;.s&amp;#34;&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;up_to_date&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#000&#34;&gt;asm&lt;/span&gt;], &lt;span style=&#34;color:#000&#34;&gt;inputs_that_generate_asm&lt;/span&gt;) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;span style=&#34;color:#000&#34;&gt;run_command&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;cargo&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;run&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;-p&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;win32-derive&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;name&lt;/span&gt;]);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{name}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;.def&amp;#34;&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{name}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;.dll&amp;#34;&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;up_to_date&lt;/span&gt;(&lt;span 
style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt;], &lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#000&#34;&gt;asm&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;def&lt;/span&gt;]) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#000&#34;&gt;run_command&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;clang-cl&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;asm&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;/def:&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{def}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;), &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;/out:&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{dll}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;)]);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;With my Ninja hat on, my first reaction to this is to worry &#34;wait, this might be&#xA;doing more disk lookups than needed!&#34; But the nice thing about&#xA;working at a small scale is that this just doesn&#39;t matter much.&lt;/p&gt;&#xA;&lt;h2&gt;Progress&lt;/h2&gt;&#xA;&lt;p&gt;We could sprinkle in some print statements to show what&#39;s going on. But you&#39;ll note&#xA;the work is kind of hierarchical, matching the control flow: the &#34;build dlls&#34;&#xA;step runs one step per dll and those steps themselves run two commands. 
We can&#xA;pass around a context object that lets us name these.&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;struct&lt;/span&gt; &lt;span style=&#34;color:#3f6e75&#34;&gt;Task&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#000&#34;&gt;desc&lt;/span&gt;: &lt;span style=&#34;color:#a90d91&#34;&gt;String&lt;/span&gt;,&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;impl&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;Task&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#c41a16&#34;&gt;/// Make a new subtask name and immediately run the given function with it.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;fn&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;task&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span style=&#34;color:#5b269a&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;desc&lt;/span&gt;: &lt;span style=&#34;color:#a90d91&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;f&lt;/span&gt;: &lt;span style=&#34;color:#3f6e75&#34;&gt;impl&lt;/span&gt; &lt;span style=&#34;color:#a90d91&#34;&gt;FnOnce&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;Task&lt;/span&gt;)) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span 
style=&#34;color:#a90d91&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;desc&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;format!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt; &amp;gt; &lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#5b269a&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#000&#34;&gt;desc&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;desc&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#000&#34;&gt;println!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{desc}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#000&#34;&gt;f&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;Task&lt;/span&gt; { &lt;span style=&#34;color:#000&#34;&gt;desc&lt;/span&gt; });&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;fn&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;build_dll&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;: &lt;span style=&#34;color:#3f6e75&#34;&gt;Task&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;name&lt;/span&gt;: &lt;span 
style=&#34;color:#a90d91&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;str&lt;/span&gt;) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;.&lt;span style=&#34;color:#000&#34;&gt;task&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;generate source&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a90d91&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;up_to_date&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;.) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#000&#34;&gt;run_command&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;cargo&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;.]);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    });&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;.&lt;span style=&#34;color:#000&#34;&gt;task&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;compile+link&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span 
style=&#34;color:#a90d91&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;up_to_date&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;.) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#000&#34;&gt;run_command&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;&amp;amp;&lt;/span&gt;[&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;clang-cl&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;.]);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    });&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;fn&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;build_dlls&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;: &lt;span style=&#34;color:#3f6e75&#34;&gt;Task&lt;/span&gt;) {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt; &lt;span style=&#34;color:#a90d91&#34;&gt;in&lt;/span&gt; [&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;kernel32&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;user32&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;.] 
{&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;.&lt;span style=&#34;color:#000&#34;&gt;task&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;build_dll&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt;));&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;Now when we run, we print a nice trace of output progress like:&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code&gt;dlls &amp;gt; advapi32.dll &amp;gt; generate source&#xA;dlls &amp;gt; advapi32.dll &amp;gt; compile+link&#xA;dlls &amp;gt; comctl32.dll &amp;gt; generate source&#xA;dlls &amp;gt; comctl32.dll &amp;gt; compile+link&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;If you&#39;ll allow a bit of terminal trickery, you can replace the &lt;code&gt;println!&lt;/code&gt; with&#xA;something like:&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000&#34;&gt;print!&lt;/span&gt;(&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;\r\x1b&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;[K&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;{}&lt;/span&gt;&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;msg&lt;/span&gt;);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000&#34;&gt;std&lt;/span&gt;::&lt;span style=&#34;color:#000&#34;&gt;io&lt;/span&gt;::&lt;span style=&#34;color:#000&#34;&gt;stdout&lt;/span&gt;().&lt;span style=&#34;color:#000&#34;&gt;flush&lt;/span&gt;().&lt;span style=&#34;color:#000&#34;&gt;unwrap&lt;/span&gt;();&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;which causes each line to overprint the previous one, keeping the output to just&#xA;showing one line of what is currently being worked on.&lt;/p&gt;&#xA;&lt;h2&gt;Parallelization&lt;/h2&gt;&#xA;&lt;p&gt;The above executes the build steps serially. Conceptually, when we have a loop&#xA;like:&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a90d91&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt; &lt;span style=&#34;color:#a90d91&#34;&gt;in&lt;/span&gt; [&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;kernel32&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;user32&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;.] 
{&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;.&lt;span style=&#34;color:#000&#34;&gt;task&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;build_dll&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt;));&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;we potentially instead could run each of those &lt;code&gt;task&lt;/code&gt; calls in parallel, then&#xA;wait for them all at the completion of the loop.&lt;/p&gt;&#xA;&lt;p&gt;At the small scale we&#39;re worried about, we might as well do this by just&#xA;spawning a bunch of threads! Threads aren&#39;t free but they are pretty cheap, so&#xA;as long as we don&#39;t have thousands of tasks we don&#39;t need to worry about running&#xA;too many. 
(If we did care, adding in a semaphore isn&#39;t too bad.)&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000&#34;&gt;std&lt;/span&gt;::&lt;span style=&#34;color:#000&#34;&gt;thread&lt;/span&gt;::&lt;span style=&#34;color:#000&#34;&gt;scope&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;scope&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a90d91&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt; &lt;span style=&#34;color:#a90d91&#34;&gt;in&lt;/span&gt; [&lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;kernel32&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;user32&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;..&lt;/span&gt;.] 
{&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;.&lt;span style=&#34;color:#000&#34;&gt;spawn&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;scope&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;build_dll&lt;/span&gt;(&lt;span style=&#34;color:#000&#34;&gt;t&lt;/span&gt;, &lt;span style=&#34;color:#000&#34;&gt;dll&lt;/span&gt;));&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;});&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#177500&#34;&gt;// std::thread::scope implicitly waits for all spawned tasks&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;Again, from the production build system perspective, this &#34;wastes&#34; a thread to&#xA;block on &lt;code&gt;std::thread::scope&lt;/code&gt; waiting for all its tasks to finish, but again at&#xA;a small scale this doesn&#39;t cost much.&lt;/p&gt;&#xA;&lt;h2&gt;Invocation&lt;/h2&gt;&#xA;&lt;p&gt;Using the approach of &lt;a href=&#34;https://github.com/matklad/cargo-xtask/&#34;&gt;cargo xtask&lt;/a&gt;, we&#xA;can integrate the above into an easy-to-execute command by putting the code in&#xA;its own crate and creating a project-local &lt;code&gt;.cargo/config.toml&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;pre style=&#34;background-color:#fff;-moz-tab-size:2;-o-tab-size:2;tab-size:2;&#34;&gt;&lt;code&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[&lt;span style=&#34;color:#000&#34;&gt;alias&lt;/span&gt;]&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span 
style=&#34;color:#000&#34;&gt;minibuild&lt;/span&gt; = &lt;span style=&#34;color:#c41a16&#34;&gt;&amp;#34;run -q -p minibuild --&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;Now, invoking &lt;code&gt;cargo minibuild&lt;/code&gt; from the shell will first (using Rust&#39;s build&#xA;system) rebuild this build system, then invoke it. (On a platform like Node you&#xA;would comparably use the &lt;code&gt;scripts&lt;/code&gt; block of &lt;code&gt;package.json&lt;/code&gt;.)&lt;/p&gt;&#xA;&lt;p&gt;(By the way,&#xA;&lt;a href=&#34;https://matklad.github.io/2018/01/03/make-your-own-make.html&#34;&gt;Make your own make&lt;/a&gt;&#xA;from 2018 had similar goals to this post, and was the motivating post for&#xA;&lt;code&gt;cargo xtask&lt;/code&gt; as well. While I was drafting this post its author additionally wrote&#xA;&lt;a href=&#34;https://matklad.github.io/2026/01/27/make-ts.html&#34;&gt;another post&lt;/a&gt; that goes&#xA;further! Relative to that post I think my best ideas are conditionally executing&#xA;commands and the hierarchical task status.)&lt;/p&gt;&#xA;&lt;h2&gt;A note about Rust&lt;/h2&gt;&#xA;&lt;p&gt;Readers who know Rust may notice the above fudged language correctness like&#xA;proper borrows and error handling. For the purposes of this post these details&#xA;are relatively uninteresting and in a different high-level language things would&#xA;be different.&lt;/p&gt;&#xA;&lt;p&gt;In fact, in writing this post I realized that the careful error handling I had&#xA;written using &lt;code&gt;anyhow::Result&lt;/code&gt; everywhere only served to make the code clunkier.&#xA;For our purposes, panicking on any unhandled error makes for simpler code, and&#xA;showing a stack trace is a more useful user experience anyway. 
(It also&#xA;integrates nicely with &lt;code&gt;std::thread::scope&lt;/code&gt;, which forwards panics.)&lt;/p&gt;&#xA;&lt;p&gt;Similarly, one way to implement task parallelization is to make &lt;code&gt;t.task()&lt;/code&gt;&#xA;return a Future. I tried implementing this and it worked, but async Rust means&#xA;all the functions become async, which then leads to lifetime complexity, awaits&#xA;all over the place, needing to box the closures, and so on. It&#39;s definitely&#xA;possible but the result felt pretty ugly.&lt;/p&gt;&#xA;&lt;h2&gt;Worked code&lt;/h2&gt;&#xA;&lt;p&gt;The full code is&#xA;&lt;a href=&#34;https://github.com/evmar/retrowin32/tree/main/minibuild/src&#34;&gt;here&lt;/a&gt;. &lt;code&gt;lib.rs&lt;/code&gt; is&#xA;the build framework, under 150 lines of code. It includes a few features not&#xA;mentioned in this post, such as an &#34;explain&#34; mode where it prints why it&#xA;believes a given target is out of date before executing it, and buffering&#xA;command output so parallel commands don&#39;t interleave their output.&lt;/p&gt;&#xA;&lt;p&gt;&lt;code&gt;main.rs&lt;/code&gt; is the retrowin32 project&#39;s particular build steps, the sort of thing&#xA;you might write as a user of it. But the whole idea is that this is not a crate&#xA;you ought to pull in, but rather some simple code you could write yourself.&lt;/p&gt;&#xA;&lt;p&gt;Is this a build system, or a glorified shell script? I think the distinction is&#xA;better thought of as points along a spectrum, running from &#34;run these commands&#xA;from the README&#34; to &#34;run this shell script&#34; to &lt;em&gt;the idea from this post&lt;/em&gt; to&#xA;Makefiles to meta-Makefile systems, with the big guns like Bazel at the other&#xA;extreme.&lt;/p&gt;&#xA;&lt;p&gt;And I think it&#39;s a pretty useful point in that space. In this code you can see&#xA;some advantages of using a full programming language, including static types,&#xA;vectors, and path manipulation. 
Could I have written this as a shell script or&#xA;Makefile? Surely yes, but also surely I would get something wrong.&lt;/p&gt;</content>
	</entry>
</feed>