retrowin32, a win32 emulator

October 13, 2022

My first post-Figma hobby project is a win32 emulator I've called retrowin32. It is now barely capable of executing a few unmodified Windows exe files in a browser (see the site for some links).

I think I started thinking about this idea when I read the remark "win32 is the stable Linux userland ABI". That is, Linux churns so much that a given program written today won't work in a year; meanwhile, programs written in the Windows 95 32-bit era are guaranteed to never change again. You can kinda look at win32 (that is, 32-bit Windows) today as similar to a NES — an old platform that you can emulate by "just" emulating all the chips.

There are other projects to run old Windows programs. WoW64 is the name of the system within 64-bit Windows that makes old 32-bit Windows programs run. Wine shims the Windows API onto your host system — see the great How Wine works for a deep dive on what that means. And system emulator projects like qemu emulate a full x86 machine such that you can install Windows onto them. But Wow64 requires running 64-bit Windows, Wine requires x86 hardware, and qemu requires installing the full Windows OS into the emulator to run a Windows program.

In contrast, my toy emulates an x86 and enough of the Windows API to take a plain exe file and run it directly in my browser. It's definitely still at the "toy" phase of things but I've only been tinkering on it a month and it's been fun — exactly the kind of "not very useful but brain-tickling" kind of work I would love to be able to retire and spend my days on.

How it works

There's two major pieces to emulating a Windows program, the x86 part and the Windows part.

Fundamentally a Windows program includes x86 instructions, so you need to emulate an x86. I made the dumbest slowest thing that iterates the instruction stream and runs the operations as it sees them. In contrast, qemu has a neat technique for an architecture-independent(!) JIT for x86 code. (I really recommend the paper, it's an easy read and pretty rad approach — pretty much what you'd expect from a genius like Bellard...). I think the CheerpX people have done a similar thing targeting Wasm, though I can't find much about it. I'm interested in looking into this area in general but I haven't yet.

The other piece to running a Windows program is all the things that make Windows distinct from other x86 operating systems. A Windows exe file encodes a bunch of information about how the file is to be loaded into memory and in particular which calls it makes into the massive Windows API. retrowin32 loads such a file into its simulated x86 memory and provides implementations of the API (more on that in a sec).

File formats

It's kind of amazing to look at the archaeology of how Windows works because it's the accumulation of cruft over literally decades. For example every exe file first begins with a DOS program that prints "This program cannot be run in DOS mode" followed by more headers that are then parsed by Windows. All over the various formats there are places where things feel like they were retrofitted in later — for example in the BMP format if the encoded image height is negative that means the encoded pixel rows are top-down (vs BMP's default bottom-up). Resources (static structured data like menus and icons) are modeled using a generic nested directory structure of blocks pointing to child blocks, only to only ever be used to have exactly three levels of nesting. And so on.

It's easy to say from today's world of JSON and protobufs that it feels much more obvious that a file format benefits from having a unified structure that has evolution built in. A lot of parsing PE ends up with one-offy like "if the high bit is set that means this is an integer, otherwise it's an offset relative to some other field that points at a string" kinds of encoding. In contrast, for example Figma files are (mostly) in the kiwi format (basically the same idea as protobuf) so the parser for them is a simple codegen problem.

Hooking the Windows API

At a high level, DLLs work like this. The program's code at various points will say "call the function at memory address X" for some specific X. Then there's a table available at load time that says "on startup, put the address of user32.dll's LoadIconA function at address X". What retrowin32 does instead is poke a special otherwise-inaccessible address at those locations. Then if the CPU ever attempts to jump to one of these addresses, it instead calls out to my custom implementations of these functions. (You can click 'imports' in the UI to see these.)

It's a curious echo of how Wasm vs the host system works. My functions are passed parameters such as addresses, but those addresses refer to data within the emulator's memory. And similarly to return data they must poke that data back into the emulator's memory.

There's a further layer of indirection because the ultimate target of many of these calls the TypeScript running the web page. So for example a call to WriteFile() that passes the stdout handle will first jump to my WriteFile() implementation, which then decodes the arguments and forwards them onward (via a Wasm bridge) to a TypeScript interface.

COM and DirectDraw

One of the finickiest pieces of this so far has been COM, which is (in part) roughly the Windows mechanism for doing dynamic probing of API. In particular DirectDraw (the "fast" graphics API) uses it.

Let me take a deep breath and try to describe the DirectDrawCreate function:

And those functions themselves may return further DirectDraw objects! DirectDraw surfaces themselves get plumbed up to eventually map onto HTML <canvas> elements. In all there are a lot of pointers, as well as multiple different memory spaces, to keep straight.

Old goop

In getting the basic DirectDraw demo (see link on the project site) going I was surprised by how much API I had to implement. It turns out that between the kernel loading the executable and C reaching the main() function, there's a ton of stuff getting computed — all this parsing of the command line and the environment and so on. It's interesting how even in a low-level language like C complexity has accumulated.

(If you load the demo, click "imports" in the lower left to see a dump of all the Windows functions this program uses. All that to show a spinny car.)


To get everything working (to the weak extent it does) I definitely spent some quality time with a debugger on the Windows side of things. In particular it's been fun learning Ghidra and mapping out my best guesses of what executables are attempting.

Special shout out here to Dean, one of the gnarliest hackers I know, for helping me talk through some of this.

Meanwhile, I also needed to debug what was going on within the emulator, so there's a web UI on top of the whole thing that lets me single step and examine the state. (I know the UI isn't great, sorry!) So on top of the emulator I guess I've built the beginnings of an x86/win32 debugger.

I think having a web frontend to it ends up a pretty powerful approach, though maybe this is biased by my experience in web things. For one small example, if you click the memory tab in the UI and then click an address in the instruction stream it will jump the view to that address, and there are further effects on mouse hover. Implementing that kind of UI is trivial with tools like React.

On the other hand, I'm also pretty deep in a stack of technology, between x86 and Rust and Wasm and TypeScript, so it's possible a more straightforward simple thing would've been easier to put together.

(The way the code is structured it ought to be relatively easy to write a native frontend for the emulator, one that doesn't involve any web stuff at all. There's a "Host" abstraction that is implemented by a Wasm bridge but it could just as well have been SDL.)

Where I'm headed

The thing that started me down this path was looking at one of my favorite demos and being sad that I basically can't run this anymore. I tinkered a lot with trying to get it to run under qemu running an old Windows but it would just close on startup, maybe something about DirectDraw init failing? I ended up debugging it on a native Windows computer, but that computer is getting pretty old.

It feels like there was kind of a sweet spot before GPUs took over everything where there's software like demos and games that likely don't require a lot of Windows API and are possibly old enough to be possible to emulate on one of these fast new Macs. It remains to be seen whether I am ever able to actually successfully execute chillin.exe — in particular I haven't thought much yet about sound and also it possibly uses threads, eek, and maybe it will still be too slow — but in all I feel like even if I fail, the general idea here is one whose time has come.