retrowin32, split into pieces
This post is part of a series on retrowin32.
The Rust compiler compiles code in parallel. But the unit of caching is the crate, a concept larger than a module, which corresponds maybe to a library in the C world or a package in the JS world. A typical program is a single crate. This means every time you run the compiler, it compiles all the code from scratch. To improve build performance, you can split a program into multiple crates, in the hope that with each compile you can reuse the crates you didn't modify.
retrowin32 was already arranged as a few crates along some obvious boundaries. The x86 emulator, the win32 implementation, the native and web targets were each separate. But the win32 implementation and the underlying system were necessarily pretty tangled, because (among other things) x86 code calls win32 functions which might need to call back into x86 code.
This meant any change to the win32 implementation recompiled a significant quantity of code. This post is about how I managed to split things up further, with one crate per Windows library. retrowin32 now has crates like builtin-gdi32 and builtin-ddraw that implement those pieces of Windows, and they can now compile and cache in parallel (mostly).
The big cycle
Going in, there was a god object Machine that held both the CPU emulator (e.g. the state of the registers) as well as the rest of the system (e.g. memory and kernel state). When the Machine emulated its way to a win32 function call (as described in the syscalls post), it passed itself to the target function, which allowed that function to poke at system state and potentially call back into further emulation.
For example, the Windows CreateWindow API creates a window and as part of that process it synchronously "sends" the WM_CREATE message, which concretely means within CreateWindow we invoke the window procedure and hand control back to the emulated code.
You cannot have cycles between crates, so this cycle meant that we had to put Machine and the whole win32 implementation in one single crate. The fix, like with most computer science problems, is adding a layer of abstraction.
A new shared crate defines a System trait, which is the interface expressing "things from the underlying system that a win32 function implementation might need to call". This is then passed to win32 APIs and implemented by Machine, allowing us to compile win32 functions as separate crates, each depending only on the definition of System.
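As a rough sketch of the shape this takes (not the actual retrowin32 code): the names call_guest and create_window, the argument layout, and the stub Machine below are all made up for illustration. Only the idea of a System trait implemented by Machine and passed to win32 functions comes from the description above.

```rust
// Hypothetical shared crate: only the interface lives here.
pub trait System {
    /// Invoke emulated (guest) code at `addr` with `args` and return its result.
    /// The name and signature are assumptions for this sketch.
    fn call_guest(&mut self, addr: u32, args: &[u32]) -> u32;
}

// Hypothetical library crate (think user32): it depends only on `System`,
// never on `Machine`.
pub const WM_CREATE: u32 = 0x0001;

pub fn create_window(sys: &mut dyn System, wndproc: u32) -> u32 {
    let hwnd = 1; // placeholder handle; a real implementation would allocate one
    // Synchronously "send" WM_CREATE by calling back into guest code.
    sys.call_guest(wndproc, &[hwnd, WM_CREATE, 0, 0]);
    hwnd
}

// Hypothetical top-level crate: `Machine` depends on every library crate
// and implements the interface they all share.
pub struct Machine {
    /// Record of guest calls, standing in for actual emulation.
    pub calls: Vec<(u32, Vec<u32>)>,
}

impl System for Machine {
    fn call_guest(&mut self, addr: u32, args: &[u32]) -> u32 {
        // A real implementation would run the emulator here;
        // this stub only records the call.
        self.calls.push((addr, args.to_vec()));
        0
    }
}
```

The dependency arrows now all point at the shared crate: the library crate never names Machine, so the cycle is gone.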
One interesting consequence of this layout is that the win32 implementation no longer directly depends on any emulator at all, as long as the System interface exposes some way to invoke user code. You could hypothetically imagine a retrowin32 that runs on native 32-bit x86, or alternatively one that, like winelib, lets you port a Windows program that you have source for to a non-x86 platform.
System state
I mentioned above that Machine also holds system state. For example, gdi32 implements the drawing API, which provides functions that vend handles to device contexts. The new gdi32 library enabled by the System interface can declare what state it needs, but we must store that state somewhere.
Further, there are interdependencies between these various Windows libraries. user32, which handles windows and messaging, needs to use code from gdi32 to implement drawing upon windows. But the winmm crate, which implements audio, is independent from those.
One obvious way — the way I imagine it might work in real Windows — is for this state to be held in per-library static globals. I came up with a different solution that is a little strange so I thought I would write it down and see if any reader has a name for it or a better way.
To restate the problem, there's a core Machine type that depends on all the libraries and which holds all the program state. But we want to be able to build each library independently, possibly with interdependencies between them, without them holding a dependency on Machine itself.
The answer is for the ouroboros-breaking System trait to expose a dynamically-typed "get my state by its type" function:
fn state(&self, id: &std::any::TypeId) -> &dyn std::any::Any;
Each library, e.g. gdi32, can register its state (a gdi32::State, perhaps) and fetch it when needed from the system. This way a library like user32 can call gdi32 and both of them can access their own internal state off of the shared state object.
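Here is a minimal sketch of how that lookup could fit together, assuming a HashMap keyed by TypeId inside Machine and Cell for mutation through the shared reference. Only the state signature comes from above; get_state, register, GdiState, UserState, and the two functions are hypothetical names for illustration, and the real code may well differ.

```rust
use std::any::{Any, TypeId};
use std::cell::Cell;
use std::collections::HashMap;

// Shared crate: the trait method from above, plus a typed helper (hypothetical).
pub trait System {
    fn state(&self, id: &TypeId) -> &dyn Any;
}

pub fn get_state<T: Any>(sys: &dyn System) -> &T {
    sys.state(&TypeId::of::<T>())
        .downcast_ref::<T>()
        .expect("state stored under a mismatched type")
}

// gdi32-like crate: declares its own state and never names Machine.
#[derive(Default)]
pub struct GdiState {
    next_dc: Cell<u32>, // Cell because `state` only hands out a shared reference
}

pub fn gdi_create_dc(sys: &dyn System) -> u32 {
    let state = get_state::<GdiState>(sys);
    let handle = state.next_dc.get() + 1;
    state.next_dc.set(handle);
    handle
}

// user32-like crate: keeps its own state and calls into the gdi32-like crate.
#[derive(Default)]
pub struct UserState {
    next_hwnd: Cell<u32>,
}

pub fn user_create_window(sys: &dyn System) -> (u32, u32) {
    let state = get_state::<UserState>(sys);
    let hwnd = state.next_hwnd.get() + 1;
    state.next_hwnd.set(hwnd);
    // The callee fetches its own state through the same `sys`.
    let dc = gdi_create_dc(sys);
    (hwnd, dc)
}

// Top-level crate: Machine owns every library's state, keyed by its type.
#[derive(Default)]
pub struct Machine {
    states: HashMap<TypeId, Box<dyn Any>>,
}

impl Machine {
    pub fn register<T: Any>(&mut self, state: T) {
        self.states.insert(TypeId::of::<T>(), Box::new(state));
    }
}

impl System for Machine {
    fn state(&self, id: &TypeId) -> &dyn Any {
        match self.states.get(id) {
            // Deref through the Box explicitly so we return the stored value,
            // not the Box itself coerced to `dyn Any`.
            Some(boxed) => &**boxed,
            None => panic!("no state registered for {:?}", id),
        }
    }
}
```

One tradeoff visible in the sketch: a missing registration only shows up as a panic at runtime, where a static or a plain struct field would be checked at compile time.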
It's maybe just a static with extra steps. I'm not sure yet if I like it.
Result
Most of the win32 API is now in separate crates. (The remaining piece is kernel32, which is the lowest-level piece and will need some more work to pull apart.)
Here's a waterfall of the part of the build that involves these separate crates:
Per xkcd this probably won't save me time overall, but at least I don't have to wait as long when I'm repeatedly cycling.
At the bottom you see the final win32 crate that ties everything together. This one is still too slow (possibly due to kernel32), but it's better than before!