retrowin32: Minesweeper and the four month bug
This post is part of a series on retrowin32.
retrowin32 now runs enough of Minesweeper to let you sort of play it in your browser.
It is likely to crash if you explore it too far (for example, if you win, it tries to bring up a "you win" dialog that hits an unimplemented codepath), but still! It kind of works!
Getting this working involved fleshing out a lot more of the Windows API. The demoscene executables I had been focusing on up to this point mostly brought up a window and sent pixels to it, while Minesweeper's startup pokes at the registry and ini files, and the game itself has a bunch of drawing code. If you click "view in debugger" in the above UI and then "imports", you can see a list of all the Windows API functions it pulls in, which I have (partially) implemented.
For example, the red numbers in the UI come from bottom-up bitmaps stored at 4 bits per pixel and 13 pixels wide, so each row holds 6.5 bytes of pixel data. I briefly went down a rabbit hole of reasoning about how to decode these generically before I realized that the BMP format pads each row to a multiple of 4 bytes, so those rows actually occupy 8 bytes.
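The padding rule is easy to state in code. Here is a minimal Rust sketch (not retrowin32's actual decoder; the function names are hypothetical) of the standard BMP row-stride formula plus a lookup of one pixel in a bottom-up 4bpp bitmap, assuming uncompressed data with a positive height (bottom-up rows) and the high nibble of each byte holding the left pixel:

```rust
/// Bytes occupied by one row of a BMP pixel array. Each row is padded
/// up to a multiple of 4 bytes (32 bits), whatever the pixel width.
fn bmp_stride(width_px: u32, bits_per_pixel: u32) -> u32 {
    ((width_px * bits_per_pixel + 31) / 32) * 4
}

/// Reads the palette index of pixel (x, y) from a bottom-up 4bpp bitmap,
/// where y = 0 is the top row of the image as displayed.
fn pixel_4bpp(data: &[u8], width: u32, height: u32, x: u32, y: u32) -> u8 {
    let stride = bmp_stride(width, 4);
    // Bottom-up: the first row stored in memory is the bottom row on screen.
    let row = height - 1 - y;
    let byte = data[(row * stride + x / 2) as usize];
    // Two pixels per byte; the high nibble holds the leftmost one.
    if x % 2 == 0 {
        byte >> 4
    } else {
        byte & 0x0F
    }
}
```

With this, a 13-pixel-wide 4bpp row is bmp_stride(13, 4) = 8 bytes, so every row starts on a byte boundary and no half-byte bookkeeping is needed between rows.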
Getting Minesweeper to render definitely makes this project feel more "real", and I think it is a cool demo, but I am also not sure I ultimately want to reimplement a bunch of old Windows APIs. For example, I looked a bit into SkiFree, which uses 1bpp bitmaps and various raster ops, and I am just not sure that work is interesting.
Introspecting, I think I looked at Minesweeper because I was curious to see how hard it would be, but also because I was just avoiding The Big Scary Bug.
The Big Scary Bug
Last November I posted about an emulator CPU bug. To summarize again: a demo worked under Apple's CPU emulator (Rosetta) but not under mine, and the failure manifested as the demo just doing the wrong thing rather than as any smoking-gun crash.
Here are some of the approaches I have tried to isolate this bug over the last few months:
- Tracing the executable on native Windows and in my emulator and comparing the execution traces; failed because native execution differs too much from my emulator
- Integrating a third CPU emulator (Unicorn) and comparing execution traces; failed because I couldn't get Unicorn to report CPU state reliably, possibly due to bugs in it or in how I interacted with it
- Figuring out enough of the LLDB API to get it to dump an execution trace of the program running under Rosetta; failed because, although the traces got closer, they still diverged at some point, and also the LLDB API is very frustrating (how do I print an 80-bit x86 float? I am still not sure!)
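Whatever LLDB wants, the value itself is not hard to decode by hand if you can get at its raw 10 bytes (for example, from a memory dump). A minimal Rust sketch, with a hypothetical function name, that converts an x87 80-bit extended-precision value to the nearest f64 for printing, assuming little-endian byte order as stored in memory. It flushes results below f64's normal range to zero rather than producing f64 subnormals:

```rust
/// Decodes a raw x87 80-bit extended-precision value (10 bytes, little-endian)
/// into an f64 so it can be printed.
///
/// Layout: bits 0..64 are the significand (with an explicit integer bit at
/// bit 63), bits 64..79 are the biased exponent (bias 16383), bit 79 is the sign.
fn f80_to_f64(bytes: [u8; 10]) -> f64 {
    let mut sig_bytes = [0u8; 8];
    sig_bytes.copy_from_slice(&bytes[0..8]);
    let significand = u64::from_le_bytes(sig_bytes);
    let sign_exp = u16::from_le_bytes([bytes[8], bytes[9]]);
    let sign = if sign_exp & 0x8000 != 0 { -1.0 } else { 1.0 };
    let exp = (sign_exp & 0x7FFF) as i32;

    match exp {
        // Zero or 80-bit denormal: all of these are far below f64's range.
        0 => sign * 0.0,
        // Infinity if the fraction bits (below the integer bit) are zero, else NaN.
        0x7FFF => {
            if (significand << 1) == 0 {
                sign * f64::INFINITY
            } else {
                f64::NAN
            }
        }
        _ => {
            let e = exp - 16383; // unbiased exponent
            if e > 1023 {
                return sign * f64::INFINITY; // too large for f64
            }
            if e < -1022 {
                return sign * 0.0; // too small: flush to zero
            }
            // significand / 2^63 is in [1, 2) for normalized values; the u64 -> f64
            // conversion rounds away the low 11 bits of precision.
            let frac = significand as f64 / (1u64 << 63) as f64;
            // Build 2^e exactly from its IEEE 754 bit pattern.
            let scale = f64::from_bits(((e + 1023) as u64) << 52);
            sign * frac * scale
        }
    }
}
```

For example, the 80-bit encoding of pi (significand 0xC90FDAA22168C235, exponent field 0x4000) comes out as exactly std::f64::consts::PI.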
In any case, I put it all on the back burner while I fiddled with Minesweeper for a bit.
Then today I was looking over my notes on the demos I had tried in the emulator, and for one of them my note on why it didn't work was "uses lots of windows apis, CreateDIBSection bitmap flags, SetTimer, etc". And I thought: huh, I have recently implemented that kind of thing, so I should try it again...
...and it gets much farther along, before of course running into other problems, including that it somehow underflows the FPU stack. As I glanced through the FPU code around its stack handling, I happened to notice that I had typo-misimplemented the fild instruction. It is supposed to take a 64-bit integer from memory and push it onto the FPU stack (converting it to a float), but I had made it take a 64-bit float from memory and push that instead.
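To make the difference concrete, here is a minimal Rust sketch of the two behaviors. It is not retrowin32's code: f64 stands in for the x87's 80-bit registers, and fild_m64, fild_m64_buggy, and the toy Fpu stack are hypothetical names. The only difference between the two loads is whether the 8 bytes are read as an integer or as float bits:

```rust
/// Correct `fild m64int`: the 8 bytes in memory hold a signed 64-bit integer,
/// which is converted to a floating-point value before being pushed.
fn fild_m64(mem: [u8; 8]) -> f64 {
    i64::from_le_bytes(mem) as f64
}

/// The typo'd version: it reinterprets the same 8 bytes as an already-encoded
/// 64-bit float, which is what `fld m64fp` does instead.
fn fild_m64_buggy(mem: [u8; 8]) -> f64 {
    f64::from_le_bytes(mem)
}

/// A toy FPU register stack, just enough to push values and multiply them.
struct Fpu {
    stack: Vec<f64>,
}

impl Fpu {
    fn push(&mut self, v: f64) {
        self.stack.push(v);
    }

    /// Like `fmulp`: pops the top two values and pushes their product.
    fn fmulp(&mut self) {
        let a = self.stack.pop().expect("FPU stack underflow");
        let b = self.stack.pop().expect("FPU stack underflow");
        self.stack.push(a * b);
    }
}
```

A trivial "does 2 * 3 produce 6?" check exposes the bug at once: the buggy loads turn the integers 2 and 3 into tiny subnormal floats (2 * 2^-1074 and 3 * 2^-1074), and their product underflows to 0.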
This did not fix the new demo, but apparently it did fix the four-month-long bug. Argh.
Test suite
I think this is the third dumb typo bug I've found in my FPU implementation, and all three would have been caught by even the most trivial tests (for multiplication, say, "does 2 * 3 produce 6?"). I noticed that when I pasted some of the relevant code into an LLM it was able to spot one of them, but when I pasted the whole file into the same LLM it couldn't find those bugs or the new one. Someday soon, though, maybe?
I have been circling around writing an x86 CPU test suite because I haven't been sure exactly how I want to test my implementation. There are of course the "does it work at all" tests, which would have caught these bugs, but there are also lots of cases that are only exercised by particular inputs, like whether the overflow flag gets set for particular combinations of operands. I have a basic attempt at this written in C, but updating it is pretty annoying; I even went as far as building out GitHub CI goop to compile this C via the MSVC toolchain. My most recent retrowin32 update, about cross-compiling Rust, was prompted exactly by the thought of writing this test suite in Rust.
One recent idea I had for those is that I could exhaustively run every combination of inputs through my implementations of the operations on 8-bit integers (there are only 2**16 of them) and compare the results against a native CPU. The downside is that this wouldn't flush out bugs in the 16- or 32-bit implementations, but I have written many of those as generics over the operand size (example), so maybe it would be enough.
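Here is a minimal Rust sketch of that idea for ADD. Actually executing native x86 instructions would need inline assembly on an x86 host, so in this sketch Rust's own u8/i8 overflowing_add stands in as the reference model. The Operand trait, Flags struct, and function names are hypothetical, not retrowin32's, and only CF, OF, ZF, and SF are modeled:

```rust
/// Hypothetical trait letting one implementation serve 8-, 16-, and 32-bit operands.
trait Operand: Copy {
    const BITS: u32;
    fn to_u64(self) -> u64;
    fn from_u64(v: u64) -> Self;
}

macro_rules! impl_operand {
    ($($t:ty),*) => {$(
        impl Operand for $t {
            const BITS: u32 = <$t>::BITS;
            fn to_u64(self) -> u64 { self as u64 }
            fn from_u64(v: u64) -> Self { v as $t }
        }
    )*};
}
impl_operand!(u8, u16, u32);

#[derive(Debug, PartialEq)]
struct Flags {
    cf: bool,
    of: bool,
    zf: bool,
    sf: bool,
}

/// Generic ADD: returns the truncated sum and the flags x86 would set.
fn add<T: Operand>(a: T, b: T) -> (T, Flags) {
    let mask = (1u64 << T::BITS) - 1;
    let sign_bit = 1u64 << (T::BITS - 1);
    let (x, y) = (a.to_u64(), b.to_u64());
    let wide = x + y; // at most 33 bits, so no u64 overflow
    let r = wide & mask;
    let flags = Flags {
        cf: wide > mask, // carry out of the top bit
        // Signed overflow: the result's sign differs from both operands' signs.
        of: ((x ^ r) & (y ^ r) & sign_bit) != 0,
        zf: r == 0,
        sf: (r & sign_bit) != 0,
    };
    (T::from_u64(r), flags)
}

/// Reference model standing in for the native CPU: Rust's wrapping arithmetic
/// on u8 and i8 reports carry and signed overflow independently.
fn reference_add8(a: u8, b: u8) -> (u8, Flags) {
    let (r, cf) = a.overflowing_add(b);
    let (_, of) = (a as i8).overflowing_add(b as i8);
    (r, Flags { cf, of, zf: r == 0, sf: (r as i8) < 0 })
}

/// Exhaustively checks all 2^16 operand pairs; returns the number of mismatches.
fn check_all_add8() -> u32 {
    let mut mismatches = 0;
    for a in 0..=u8::MAX {
        for b in 0..=u8::MAX {
            if add(a, b) != reference_add8(a, b) {
                mismatches += 1;
            }
        }
    }
    mismatches
}
```

Because the same generic add also serves u16 and u32, the exhaustive 8-bit run exercises the shared flag logic, while the wider sizes would still want spot checks at boundaries such as 0x7FFF_FFFF + 1.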
Wine on Mac on Rosetta
By the way, if you actually want to run Minesweeper on your Mac, recent releases of Wine do the 32-bit x86 on 64-bit Rosetta thing (after all, I learned about it from Wine), and Wine is a much more competent implementation than mine. At this point you can just run brew install wine-stable and then wine some.exe, and it will work, even on Apple silicon.
With that in hand, what is even the point of my own project? Of course, the ultimate goal is just my own interest, but does Wine subsume its functionality? It turns out that the one demo that got me started on this whole thing still crashes under Rosetta due to an illegal instruction. I'm just guessing, but it appears Rosetta doesn't implement the "nested pointers" variants of the enter instruction (those with a nonzero nesting-level operand) that chillin uses. So at least until they fix that, I still have a (totally made-up) purpose.
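For reference, here is what the nested form does, as a minimal Rust sketch transcribed from the ENTER pseudocode in Intel's Software Developer's Manual (32-bit operand and stack size only). The Cpu struct, its dword-granular HashMap memory, and the method names are hypothetical simplifications, not retrowin32's or Rosetta's implementation:

```rust
use std::collections::HashMap;

/// Minimal 32-bit machine state for illustrating ENTER. Memory is a sparse map
/// from addresses to dwords, a simplification of real byte-addressed memory.
struct Cpu {
    esp: u32,
    ebp: u32,
    mem: HashMap<u32, u32>,
}

impl Cpu {
    fn push(&mut self, v: u32) {
        self.esp = self.esp.wrapping_sub(4);
        self.mem.insert(self.esp, v);
    }

    fn read(&self, addr: u32) -> u32 {
        *self.mem.get(&addr).unwrap_or(&0)
    }

    /// `enter size, level`. A nonzero `level` is the nested form: it copies
    /// level - 1 frame pointers from the caller's frame, then pushes the new one.
    fn enter(&mut self, size: u16, level: u8) {
        let level = level % 32; // the CPU only uses the low 5 bits
        self.push(self.ebp);
        let frame_temp = self.esp;
        if level > 0 {
            for _ in 1..level {
                // Walk down the caller's frame, copying each saved frame pointer.
                self.ebp = self.ebp.wrapping_sub(4);
                let v = self.read(self.ebp);
                self.push(v);
            }
            self.push(frame_temp);
        }
        self.ebp = frame_temp;
        self.esp = self.esp.wrapping_sub(size as u32);
    }
}
```

With level 0, enter size, 0 is equivalent to push ebp; mov ebp, esp; sub esp, size. Higher levels additionally copy a chain of frame pointers, a feature meant for languages with nested procedures, which is presumably why it is rarely used and easy for an emulator to skip.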