This post is part of a series on retrowin32.
One demo I've been attempting to get running under retrowin32, my win32 emulator, loads an external DLL to play its music and, as best as I can tell from the disassembly, uses progress counters from playing the music to advance the graphical state. So while I didn't especially want to tackle DLL loading or sound, here I am!
In Windows, DLLs are PE files much like executables, so loading one was
relatively easy. The main wrinkle is that DLLs have their own DllMain() that
must be invoked when the DLL is loaded for initialization purposes. In
retrowin32's architecture we previously started the CPU pointed directly at the
executable's WinMain() equivalent, so it took me some effort to puzzle through
how to layer things to invoke code in the proper order.
Async
Semantically, I wanted to write the equivalent of
for dll in dlls {
    emulate(dll.DllMain);
}
emulate(exe.WinMain);
but I have no single emulate function like that; there is just an x86 CPU that
knows how to emulate basic blocks. In other words, the code I write in the
emulator is ordinary synchronous Rust code, while emulated x86 code is run one
basic block at a time in a loop. But I needed to fundamentally solve this
problem, because there are plenty of Windows APIs I'm implementing, like
DispatchMessage(), that need to synchronously call back into the executable's
code.
Ultimately what I wanted was coroutines, and I managed to cobble something
together with Rust's async support; the result feels pretty decent. Observe
how I define EnumDisplayModes as an async function and it manages to await a
call to a callback.
I'm pretty sure I didn't do it quite right but it seems to at least work for my limited purposes. It feels like it might nicely generalize to cases where emulated code wants to synchronously perform some operation that ends up async in the web platform (like reading files) but I haven't explored it too much yet.
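To make the shape of this concrete, here is a minimal sketch of the idea, not retrowin32's actual code. It assumes a toy `Cpu` where "running a basic block" just decrements a counter; `call_x86`, `EmulatedCall`, `startup`, and `run_startup` are all hypothetical names. The async function reads like the synchronous loop I wanted, while the outer loop alternates between polling it and stepping the CPU.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Toy stand-in for the emulated CPU (an assumption for illustration).
struct Cpu {
    blocks_left: u32, // basic blocks remaining in the current emulated call
    log: Vec<String>, // order in which emulated entry points were entered
}

type SharedCpu = Rc<RefCell<Cpu>>;

/// Future that becomes ready once the emulated call has "returned".
struct EmulatedCall {
    cpu: SharedCpu,
}

impl Future for EmulatedCall {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.cpu.borrow().blocks_left == 0 {
            Poll::Ready(())
        } else {
            Poll::Pending // the outer loop must run more x86 code first
        }
    }
}

/// Point the toy CPU at an entry point and return a future for its completion.
fn call_x86(cpu: &SharedCpu, name: &str, blocks: u32) -> EmulatedCall {
    {
        let mut c = cpu.borrow_mut();
        c.blocks_left = blocks;
        c.log.push(format!("enter {name}"));
    }
    EmulatedCall { cpu: cpu.clone() }
}

/// Reads like the synchronous pseudocode: each DllMain, then WinMain.
async fn startup(cpu: SharedCpu) {
    for dll in ["a.dll", "b.dll"] {
        call_x86(&cpu, &format!("{dll}!DllMain"), 2).await;
    }
    call_x86(&cpu, "exe!WinMain", 3).await;
}

/// Waker that does nothing; the loop below simply polls again after each block.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Drive `startup` by hand, emulating one "basic block" between polls.
fn run_startup() -> (Vec<String>, u32) {
    let cpu = Rc::new(RefCell::new(Cpu { blocks_left: 0, log: Vec::new() }));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut task = Box::pin(startup(cpu.clone()));
    let mut blocks_run = 0;
    while task.as_mut().poll(&mut cx).is_pending() {
        cpu.borrow_mut().blocks_left -= 1; // "emulate one basic block"
        blocks_run += 1;
    }
    let log = cpu.borrow().log.clone();
    (log, blocks_run)
}
```

Calling `run_startup()` returns the order in which the entry points were entered along with how many blocks the loop executed. No async runtime is involved; the emulator's main loop is the executor.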
Failed self-check
In any case, once I had DLL loading roughly in place, I found this particular
library's initialization code performs some sort of self-check and fails with a
MessageBox(): "This file has been tampered with and MAY BE INFECTED BY A
VIRUS!"
What this actually points to is some bug in the emulator; running the program on a native Windows machine is fine. But how can I isolate the bug? Via a disassembler I can see that the initialization code runs a bunch of loops over memory and at some point things go wrong ... but where?
I had been putting it off, but a friend earlier gave me the idea that I could write a program that traces execution on a native Windows machine. That program can dump state at various points, I can make the emulator dump state at those same points, and then I can compare the two traces to find where they diverge. So I wrote that program.
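The comparison step itself is simple. As a rough sketch, assuming each trace point records a few registers (the `TraceEntry` fields and the `first_divergence` helper are hypothetical, not the format my tools actually emit), finding the first mismatch could look like this:

```rust
/// One snapshot of CPU state captured at a trace point (fields are illustrative).
#[derive(Debug, Clone, PartialEq, Eq)]
struct TraceEntry {
    eip: u32,
    eax: u32,
    ecx: u32,
}

/// Index of the first entry where the traces disagree. If one trace is a
/// strict prefix of the other, the divergence is where the shorter one ends.
fn first_divergence(native: &[TraceEntry], emulated: &[TraceEntry]) -> Option<usize> {
    let common = native.len().min(emulated.len());
    (0..common)
        .find(|&i| native[i] != emulated[i])
        .or_else(|| (native.len() != emulated.len()).then_some(common))
}
```

A `None` result means the two traces agree at every recorded point, which only says the bug is not visible in the state that was captured.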
Windows debugger
This is a native Windows program that spawns another program (the debuggee) and uses the Windows debugger API to introspect its behavior.
For extra fun credit I wrote it in Zig, which has a really pleasant cross-compilation story and which generates a tiny executable (23kb so far!). It's been pretty interesting how writing code in an environment where allocating memory is just a little bit less convenient naturally leads you down paths where you don't wantonly allocate. One last highlight I appreciated is that Zig models Sentinel-Terminated Pointers in its type system which was particularly important to get right when shoveling parsed command-line argument slices into calls into the Windows library that wants NUL-terminated strings.
There are downsides too. Zig is basically a statically typed language with a
large compile-time component that is effectively dynamically typed. I think this
is a really fascinating tradeoff to explore, but so far it has some pretty rough
edges that I am not sure can be solved, particularly around build-time tooling;
much like writing Python, a lot of the errors you get are of the form "I ran a
bunch of code and something went wrong" and there's not much better it can do.
(This blog post is worth your time.) For similar reasons it also feels
relatively easy to create references to un-nameable types (like my stdout
variable in the above). There are also plenty of smaller clunky bits (such as
incomplete docs), but I appreciate the language is in development and I'm not as
concerned about those as I am about the fundamental model.
I hadn't before explored how a debugger works, so let me tell you the
interesting part. My program wants to say "when the CPU hits the instruction at
address X, stop execution and let me do something". To do this you just
overwrite address X with an int3 instruction, which is helpfully just the
single byte 0xcc (which also now possibly explains my old friend's choice of
domain name). Once the program hits that address it suspends and hands control
back to the debugger.
To resume execution, the debugger needs to repair the program by (1) writing the
previous code back over the int3, (2) backing up the instruction pointer by
one to resume execution starting at address X, and (3) setting a CPU flag so
the CPU single-steps. This last part allows the debugger to single-step execute
the repaired instruction and then immediately repatch it back to the int3 so
that it can intercept the next execution.
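Here is that dance as a toy model in Rust rather than the real Zig debugger. It assumes a debuggee that is just a byte vector, an instruction pointer, and a boolean standing in for the CPU's single-step flag; `Debuggee`, `Breakpoint`, and the three functions are hypothetical names, and a real debugger would do each of these writes through the Windows debugger API on the live process instead.

```rust
const INT3: u8 = 0xcc;

/// Toy debuggee: flat memory, an instruction pointer, and a single-step flag.
struct Debuggee {
    mem: Vec<u8>,
    eip: usize,
    single_step: bool,
}

/// What the debugger remembers about one breakpoint.
struct Breakpoint {
    addr: usize,
    saved: u8, // original byte that int3 replaced
}

/// Overwrite the byte at `addr` with int3, remembering what was there.
fn set_breakpoint(d: &mut Debuggee, addr: usize) -> Breakpoint {
    let saved = d.mem[addr];
    d.mem[addr] = INT3;
    Breakpoint { addr, saved }
}

/// Called once the debuggee has trapped on the int3; at that point the
/// instruction pointer is one byte past the breakpoint address.
fn prepare_resume(d: &mut Debuggee, bp: &Breakpoint) {
    d.mem[bp.addr] = bp.saved; // (1) write the previous code back
    d.eip -= 1;                // (2) back up to address X
    d.single_step = true;      // (3) ask the CPU to stop after one instruction
}

/// Called after the single-step stop: re-arm the breakpoint for next time.
fn after_single_step(d: &mut Debuggee, bp: &Breakpoint) {
    d.mem[bp.addr] = INT3;
    d.single_step = false;
}
```

In use, the debugger calls `set_breakpoint` once, then alternates `prepare_resume` and `after_single_step` each time the debuggee stops at that address.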
Unfortunately, at least so far, my emulator trace matches my native Windows one, so the ultimate fix here will be something I still need to track down.
A bit of progress
As a reward for reading this far, you can see the demo in question render some graphics (click 'run' once it loads). This relies on the above async support for invoking some callbacks (from DirectDraw back into the executable), but with the DLL bits stubbed out because they don't work yet.