retrowin32: async, DLL loading, tracing execution, and Zig

May 22, 2023

This post is part of a series on retrowin32.

One demo I've been attempting to get running under retrowin32, my win32 emulator, loads an external DLL to play its music and, as best as I can tell from the disassembly, uses progress counters from playing the music to advance the graphical state. So while I didn't especially want to tackle DLL loading or sound, here I am!

In Windows, DLLs are PE files much like executables, so loading one was relatively easy. The main wrinkle is that each DLL has its own DllMain() that must be invoked for initialization when the DLL is loaded. In retrowin32's architecture we previously started the CPU pointed directly at the executable's WinMain() equivalent, so it took me some effort to puzzle through how to layer things to invoke code in the proper order.


Semantically, I wanted to write the equivalent of

for dll in dlls {
    emulate(dll.DllMain);
}
emulate(exe.WinMain);

but I have no single emulate function like that; there is just an x86 CPU that knows how to emulate basic blocks. In other words, the code I write in the emulator is ordinary synchronous Rust code, while emulated x86 code is invoked a subset at a time in a loop. But I needed to fundamentally solve this problem, because there are plenty of Windows APIs I'm implementing like DispatchMessage() that need to synchronously call back into the executable's code.

Ultimately what I wanted was coroutines and I managed to cobble something together with Rust's async support and the result feels pretty decent. Observe how I define EnumDisplayModes as an async function and it manages to await a call to a callback.

I'm pretty sure I didn't do it quite right but it seems to at least work for my limited purposes. It feels like it might nicely generalize to cases where emulated code wants to synchronously perform some operation that ends up async in the web platform (like reading files) but I haven't explored it too much yet.
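To make the coroutine idea concrete, here is a minimal sketch of the general shape. It is not retrowin32's actual code: `Cpu`, `GuestCall`, `call_guest`, `startup`, and `run_startup` are all hypothetical names, and the "CPU" just counts down basic blocks instead of emulating x86. The point is only that the startup sequence can be written as straight-line async code while the outer loop keeps stepping the CPU one block at a time.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Toy stand-in for the emulated CPU (hypothetical, not retrowin32's type).
struct Cpu {
    blocks_left: u32, // basic blocks remaining before the guest call returns
    log: Vec<String>,
}
type Shared = Rc<RefCell<Cpu>>;

/// Future that becomes ready once the guest function has returned.
struct GuestCall {
    cpu: Shared,
    name: &'static str,
    started: bool,
}

fn call_guest(cpu: &Shared, name: &'static str) -> GuestCall {
    GuestCall { cpu: cpu.clone(), name, started: false }
}

impl Future for GuestCall {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        let this = &mut *self; // GuestCall is Unpin, so a plain &mut is fine
        let mut cpu = this.cpu.borrow_mut();
        if !this.started {
            // First poll: point the CPU at the guest code, then yield.
            this.started = true;
            cpu.blocks_left = 2;
            cpu.log.push(format!("enter {}", this.name));
            return Poll::Pending;
        }
        if cpu.blocks_left > 0 {
            return Poll::Pending; // guest code is still running
        }
        cpu.log.push(format!("leave {}", this.name));
        Poll::Ready(())
    }
}

/// The startup order, written as if a synchronous emulate() existed.
async fn startup(cpu: Shared) {
    let dlls = ["a.dll!DllMain", "b.dll!DllMain"];
    for &name in &dlls {
        call_guest(&cpu, name).await;
    }
    call_guest(&cpu, "exe!WinMain").await;
}

/// Waker that does nothing; the loop below polls unconditionally.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Main loop: alternate between polling the async task and stepping the CPU.
fn run_startup() -> Vec<String> {
    let cpu: Shared = Rc::new(RefCell::new(Cpu { blocks_left: 0, log: Vec::new() }));
    let mut task = Box::pin(startup(cpu.clone()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    while task.as_mut().poll(&mut cx).is_pending() {
        // "Emulate" one basic block of guest code.
        let mut c = cpu.borrow_mut();
        if c.blocks_left > 0 {
            c.blocks_left -= 1;
        }
    }
    let log = cpu.borrow().log.clone();
    log
}
```

Calling `run_startup()` returns the log of enter/leave events, with each guest call finishing before the next begins even though the loop only ever runs one block at a time. A real implementation would track the return address rather than a counter, and would need to handle nested calls.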

Failed self-check

In any case, once I had DLL loading roughly in place, I found this particular library's initialization code performs some sort of self-check and fails with a MessageBox(): "This file has been tampered with and MAY BE INFECTED BY A VIRUS!"

What this actually points to is some bug in the emulator; running the program on a native Windows machine is fine. But how can I isolate the bug? Via a disassembler I can see that the initialization code runs a bunch of loops over memory and at some point things go wrong ... but where?

I had been putting it off, but a friend earlier gave me the idea that I could write a program that traces execution on a native Windows machine. That program can dump state at various points; I can then make the emulator dump state at those same points and compare the two traces to find where they diverge. So I wrote that program.
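The comparison step itself is simple. As a rough sketch, assuming each trace entry is a small snapshot of registers (the `Snapshot` fields and the `first_divergence` helper are made up for illustration; a real trace might record more state):

```rust
/// One snapshot of CPU state taken at a trace point (illustrative fields).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Snapshot {
    eip: u32,
    eax: u32,
    ecx: u32,
}

/// Index of the first entry where the two traces disagree.
/// If one trace is a strict prefix of the other, the divergence is
/// reported at the end of the shorter one. Returns None if they match.
fn first_divergence(native: &[Snapshot], emulated: &[Snapshot]) -> Option<usize> {
    let common = native.len().min(emulated.len());
    (0..common)
        .find(|&i| native[i] != emulated[i])
        .or(if native.len() != emulated.len() { Some(common) } else { None })
}
```

The entry just before the reported index is the last point where both sides agreed, which narrows the search to the code executed between those two trace points.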

Windows debugger

This is a native Windows program that spawns another program (the debuggee) and uses the Windows debugger API to introspect its behavior.

For extra fun credit I wrote it in Zig, which has a really pleasant cross-compilation story and which generates a tiny executable (23kb so far!). It's been pretty interesting how writing code in an environment where allocating memory is just a little bit less convenient naturally leads you down paths where you don't wantonly allocate. One last highlight I appreciated is that Zig models Sentinel-Terminated Pointers in its type system which was particularly important to get right when shoveling parsed command-line argument slices into calls into the Windows library that wants NUL-terminated strings.

There are downsides too. Zig is basically a statically typed language with a large compile-time component that is effectively dynamically typed. I think this is a really fascinating tradeoff to explore, but so far it has some pretty rough edges that I'm not sure can be solved, particularly around build-time tooling; much like writing Python, a lot of the errors you get are of the form "I ran a bunch of code and something went wrong" and there's not much better it can do. (This blog post is worth your time.) For similar reasons it also feels relatively easy to create references to un-nameable types (like my stdout variable in the above). There are also plenty of smaller clunky bits (such as incomplete docs), but I appreciate that the language is still in development, and I'm not as concerned about those as I am about the fundamental model.

I hadn't before explored how a debugger works, so let me tell you the interesting part. My program wants to say "when the CPU hits the instruction at address X, stop execution and let me do something". To do this you just overwrite address X with an int3 instruction, which is helpfully just the single byte 0xcc (which also now possibly explains my old friend's choice of domain name). Once the program hits that address it suspends and hands control back to the debugger.

To resume execution, the debugger needs to repair the program by (1) writing the previous code back over the int3, (2) backing up the instruction pointer by one to resume execution starting at address X, and (3) setting a CPU flag so the CPU single-steps. This last part allows the debugger to single-step execute the repaired instruction and then immediately repatch it back to the int3 so that it can intercept the next execution.
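The patch, repair, single-step, repatch cycle can be modeled without touching a real process. The sketch below is in Rust rather than the Zig of my actual tool, and `Debuggee` and `Breakpoint` are hypothetical stand-ins: memory is a plain byte vector and the registers are two fields. In a real Windows debugger the same steps would go through calls like WriteProcessMemory and GetThreadContext/SetThreadContext, driven by the breakpoint and single-step exception events.

```rust
const INT3: u8 = 0xCC; // one-byte x86 breakpoint instruction
const TRAP_FLAG: u32 = 0x100; // TF bit in EFLAGS: trap after one instruction

/// Toy model of the debugged process: its memory and two registers.
struct Debuggee {
    mem: Vec<u8>,
    eip: usize,
    eflags: u32,
}

/// A patched address plus the byte that int3 replaced.
struct Breakpoint {
    addr: usize,
    saved: u8,
}

impl Breakpoint {
    /// Overwrite the instruction at `addr` with int3, remembering the old byte.
    fn install(d: &mut Debuggee, addr: usize) -> Breakpoint {
        let saved = d.mem[addr];
        d.mem[addr] = INT3;
        Breakpoint { addr, saved }
    }

    /// Breakpoint event: eip now points just past the int3.
    fn on_hit(&self, d: &mut Debuggee) {
        d.mem[self.addr] = self.saved; // (1) restore the original code
        d.eip -= 1; // (2) back up so the real instruction runs
        d.eflags |= TRAP_FLAG; // (3) ask for a single step
    }

    /// Single-step event: the original instruction has executed,
    /// so re-arm the breakpoint for the next time through.
    fn on_single_step(&self, d: &mut Debuggee) {
        d.mem[self.addr] = INT3;
        d.eflags &= !TRAP_FLAG; // toy model: clear the flag ourselves
    }
}
```

A typical sequence is `Breakpoint::install`, then `on_hit` when the breakpoint fires (this is where a tracer would dump state), then `on_single_step` once the repaired instruction has run.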

Unfortunately, at least so far, my emulator trace matches my native Windows one, so the ultimate fix here will be something I still need to track down.

A bit of progress

As a reward for reading this far, you can see the demo in question render some graphics (click 'run' once it loads). This relies on the above async support for invoking some callbacks (from DirectDraw back into the executable), but with the DLL bits stubbed out because they don't work yet.