<h1>retrowin32: Minesweeper and the four month bug</h1>
<p>Tech Notes, Evan Martin (evan.martin@gmail.com), 2024-03-16</p>
<p><em>This post is part of a
<a href="/software/blog/2023/09/retrowin32.html">series on retrowin32</a>.</em></p>
<p><a href="https://github.com/evmar/retrowin32">retrowin32</a> now runs enough of Minesweeper
to let you sort of play it in your browser:</p>
<figure>
<img src=minesweeper.png width=192 height=258 alt='screenshot of minesweeper in browser'>
<figcaption>Minesweeper; <a href='https://evmar.github.io/retrowin32/run.html?file=msvcrt.dll&exe=winmine.exe'>try it yourself</a></figcaption>
</figure>
<p>It is likely to crash if you explore it too far — for example, if you win it
attempts to bring up a "you win" dialog that triggers an unimplemented codepath
— but still! It kind of works!</p>
<p>Getting this working involved fleshing out a lot more of the Windows API. The
demoscene executables I had been focusing on up to this point mostly brought up
a window and sent pixels to it, while Minesweeper's startup pokes at the
registry, ini files, and in particular has a bunch of drawing code. If you click
"view in debugger" in the above UI and then "imports" you can see a list of all
of the various Windows APIs that this pulls in and which I have (partially)
implemented.</p>
<p>For example, the red numbers in the UI come from bottom-up bitmaps that are
stored as 4 bits per pixel and which are 13 pixels wide, so each row uses 6.5
bytes. I briefly went down a rabbithole of reasoning about generically decoding
these before I realized the BMP format uses padding.</p>
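<p>As a rough sketch of what that padding means (not retrowin32's actual decoder;
the function names here are made up for illustration): uncompressed BMP rows are
padded to a multiple of 4 bytes, so a 13-pixel-wide 4bpp row occupies 8 bytes
rather than 6.5, and within each byte the high nibble is the left pixel.</p>

```rust
/// Bytes per row of an uncompressed BMP: rows are padded up to a multiple of 4 bytes.
fn bmp_row_stride(width_px: usize, bits_per_pixel: usize) -> usize {
    (width_px * bits_per_pixel + 31) / 32 * 4
}

/// Read the palette index at (x, y) from a bottom-up 4bpp bitmap.
/// `y` counts from the top of the image; the data stores the bottom row first.
fn pixel_4bpp(data: &[u8], width_px: usize, height_px: usize, x: usize, y: usize) -> u8 {
    let stride = bmp_row_stride(width_px, 4);
    let row = height_px - 1 - y; // flip: bottom-up storage
    let byte = data[row * stride + x / 2];
    // The high nibble holds the left pixel of each pair.
    if x % 2 == 0 {
        byte >> 4
    } else {
        byte & 0x0f
    }
}
```

<p>With this, <code>bmp_row_stride(13, 4)</code> comes out to 8, which is the number that
makes the "6.5 bytes per row" problem go away.</p>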
<p>Getting Minesweeper to render definitely makes this project feel more "real" and
I think is a cool demo, but I also am not really sure I ultimately want to
reimplement a bunch of old Windows APIs. For example, I looked a bit into
SkiFree. It has 1bpp bitmaps and various raster ops and I am just not too sure
it is interesting.</p>
<p>Introspecting, I think I looked at Minesweeper because I was curious to see how
hard it would be, but also because I was just avoiding The Big Scary Bug.</p>
<h2>The Big Scary Bug</h2>
<p><a href="/software/blog/2023/11/retrowin32-unicorn.html">Last November I posted about an emulator CPU bug</a>:
to resummarize, a demo worked when using Apple's CPU emulator but not mine, but
it manifested as just the demo doing the wrong thing and not as any smoking gun
crash.</p>
<p>Here are some of the approaches I have tried to isolate this bug over the last
few months:</p>
<ul>
<li><a href="/software/blog/2023/05/retrowin32-async-dll-tracing-zig.html">Tracing the executable on native Windows</a>
and my emulator and comparing execution traces; failed because native
execution is too different from my emulator</li>
<li><a href="/software/blog/2023/11/retrowin32-unicorn.html">Integrating a 3rd CPU emulator (Unicorn)</a>
and comparing execution traces; failed because I couldn't get Unicorn to
reliably report CPU state, possibly due to either bugs in it or how I
interacted with it</li>
<li>Figuring out
<a href="https://github.com/evmar/retrowin32/blob/main/lldb-trace.py">enough of the LLDB API</a>
to attempt to get it to dump an execution trace of running under Rosetta;
failed because I got the traces closer but they still just diverged at some
point, and also the LLDB API is very frustrating — how can I print an 80-bit
x86 float, still not even sure!</li>
</ul>
<p>In any case I put it all on the back burner while I fiddled with Minesweeper for
a bit.</p>
<p>Then today I was looking over my notes on different demos I had tried in the
emulator and for one my note on why it didn't work was "uses lots of windows
apis, CreateDIBSection bitmap flags, SetTimer, etc". And I thought, huh, I
recently have implemented that kind of thing, I should try it again...</p>
<p>...and it gets much farther along, before of course encountering some other
problems, including that it somehow is underflowing the FPU stack. As I glanced
through some of the FPU code around its stack handling, I randomly noticed that
I had typo-misimplemented the <code>fild</code> instruction. It is supposed to take a
64-bit <em>integer</em> from memory and put it on the FPU stack (converting it to a
float), but I had made it take a 64-bit <em>float</em> from memory and put it on the
stack.</p>
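<p>To make the difference concrete, here is a simplified sketch of the two
behaviors. It is not the emulator's real code: the function names are invented,
and it uses <code>f64</code> where the real FPU stack holds 80-bit values (so unlike real
hardware, this conversion can round for very large integers).</p>

```rust
/// What `fild m64` should do: read a 64-bit signed integer and convert it to a float.
fn fild64(mem: [u8; 8]) -> f64 {
    i64::from_le_bytes(mem) as f64
}

/// The typo'd version: reads the same 8 bytes as if they already were a float,
/// which is the behavior of `fld m64`, not `fild m64`.
fn fild64_buggy(mem: [u8; 8]) -> f64 {
    f64::from_le_bytes(mem)
}
```

<p>Feeding both the bytes of the integer 3 gives 3.0 from the first and a tiny
denormal from the second, which is the kind of quietly wrong value that never
crashes anything.</p>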
<p>This did not fix this new demo, but apparently it did fix the four month long
bug. Argh.</p>
<h2>Test suite</h2>
<p>I think this is the third dumb typo bug I've discovered in my FPU
implementation, where all three would have been caught if I had had even the
most trivial of tests (e.g. to test multiplication, "does 2 * 3 produce 6?"). I
noticed if I pasted some of the relevant code into an LLM
<a href="https://inuh.net/@evmar/112038779696641333">it was able to spot one</a>, but also
when I pasted the whole file into the same LLM it couldn't find those bugs nor
the new one. Someday soon though, maybe?</p>
<p>I have been circling around writing an x86 CPU test suite because I haven't been
sure exactly how I want to test my implementation. There's of
course the "does it work at all" tests which would have caught these bugs, but
there are also lots of cases that would only be exercised by particular inputs,
like whether the overflow flag gets set based on particular combinations of
inputs. I have a basic approach at this that I have written in C but updating it
is pretty annoying; I even went as far as building out
<a href="https://github.com/evmar/retrowin32/blob/b90b05de0a64800da2e63b8ee7b9a78f6047c805/.github/workflows/exe.yml#L38">GitHub CI goop to compile this C
via the MSVC toolchain</a>.
My most recent retrowin32 update,
<a href="/software/blog/2024/02/cross-compile.html">about cross-compiling Rust</a>,
was prompted exactly by the thought of writing this test suite in Rust.</p>
<p>One recent idea I had for those is that I could just exhaustively run all
combinations of inputs for the implementations of operations that involve 8-bit
integers — there are only 2**16 of them — and compare these against a native
CPU implementation. It has the downside that it wouldn't flush out bugs in the 16
or 32-bit implementations, but for many of these I have written them as generics
over the operand size
(<a href="https://github.com/evmar/retrowin32/blob/b90b05de0a64800da2e63b8ee7b9a78f6047c805/x86/src/ops/math.rs#L43">example</a>)
so maybe it would be enough.</p>
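<p>A minimal sketch of the exhaustive idea, for 8-bit add only. Everything here is
hypothetical: the <code>Add8</code> struct and function names are made up, and the
"reference" is computed with wider arithmetic as a stand-in for running the
instruction on a native CPU.</p>

```rust
/// Result of an 8-bit add: the value plus the carry and overflow flags.
#[derive(Debug, PartialEq)]
struct Add8 {
    value: u8,
    carry: bool,
    overflow: bool,
}

/// Implementation under test, written with bit tricks the way an emulator might.
fn add8_emulated(x: u8, y: u8) -> Add8 {
    let value = x.wrapping_add(y);
    Add8 {
        value,
        // An unsigned wraparound always produces a result smaller than x.
        carry: value < x,
        // Overflow: both inputs share a sign that differs from the result's sign.
        overflow: ((x ^ value) & (y ^ value) & 0x80) != 0,
    }
}

/// Reference, computed with wider arithmetic (stand-in for a native CPU).
fn add8_reference(x: u8, y: u8) -> Add8 {
    let unsigned = x as u16 + y as u16;
    let signed = (x as i8) as i16 + (y as i8) as i16;
    Add8 {
        value: unsigned as u8,
        carry: unsigned > 0xff,
        overflow: signed < -128 || signed > 127,
    }
}

/// Try all 2**16 input pairs and return the first mismatch, if any.
fn first_mismatch() -> Option<(u8, u8)> {
    for x in 0..=255u8 {
        for y in 0..=255u8 {
            if add8_emulated(x, y) != add8_reference(x, y) {
                return Some((x, y));
            }
        }
    }
    None
}
```

<p>The appeal is that there are no test cases to pick: <code>first_mismatch()</code> either
returns <code>None</code> or hands back the exact pair of inputs that disagree.</p>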
<h2>Wine on Mac on Rosetta</h2>
<p>By the way, if you actually wanted to run Minesweeper on your Mac, recent
releases of Wine do
<a href="/software/blog/2023/08/x86-x64-aarch64.html">the 32-bit x86 on 64-bit Rosetta thing</a>
— after all, I learned about it from Wine — and Wine is a much more competent
implementation than mine. At this point you can just <code>brew install wine-stable</code>
and then <code>wine some.exe</code> and it will work, even on Apple silicon.</p>
<p>With that in hand, what is even the point of my own project? Of course, the
ultimate goal is just my own interest, but does Wine subsume its functionality?
It turns out the
<a href="https://www.pouet.net/prod.php?which=567">one demo that got me started on this whole thing</a>
still crashes under Rosetta due to an illegal instruction. Just guessing, but it
appears Rosetta doesn't implement the "nested pointers" variants of the
<a href="https://www.felixcloutier.com/x86/enter"><code>enter</code> instruction</a> that chillin
uses. So at least until they fix that, I still have a (totally made-up) purpose.</p>
<h1>Cross compiling Rust to win32</h1>
<p>2024-02-04</p>
<p><em>This post is part of a
<a href="/software/blog/2023/09/retrowin32.html">series on retrowin32</a>.</em></p>
<p>In <a href="https://github.com/evmar/retrowin32">retrowin32</a> I sometimes want to make my
own win32 programs that I can run either natively or through the emulator to
compare behaviors.</p>
<p>The <a href="/software/blog/2023/02/retrowin32-progress.html">first time</a> I wrote about
this I tried Zig, found it promising, but gave up and used C. The
<a href="/software/blog/2023/05/retrowin32-async-dll-tracing-zig.html">second time</a> I
tried Zig again and had a good time but also wrote about downsides. Recently I
tried a third time, this time trying to build an app that actually brought up a
window, and again got frustrated by Zig's (lack of) documentation and language
churn.</p>
<p>The bulk of retrowin32 is written in Rust, so an obvious question is why I
haven't used Rust for this purpose. The answer is that I hadn't figured out how
to make it work... until recently!</p>
<p>Naively you might expect all the cross compilation pieces to be in place already
and wonder why this was even hard. The answer is they kind of are, but the
details are finicky, in part due to my requirements. This post walks through the
details.</p>
<p>To start, Rust itself supports cross compilation via the <code>--target</code> flag. For
win32 there are two targets: <code>i686-pc-windows-gnu</code> and <code>i686-pc-windows-msvc</code>.
The differing last component here specifies the ABI but for our purposes that
mostly means which toolchain you end up using for all the pieces after
compilation (linker, libraries).</p>
<p>The <code>-gnu</code> target uses mingw, which is the world of gcc. On a Mac it's an easy
install with <code>brew</code>. Using some of the tricks discussed here I was able to
produce a binary, but due to this unconventional (for Windows) toolchain it ends
up being fairly different from the binaries retrowin32 is intending to emulate. I
could surely make them work but it's a bit of a yak shave.</p>
<p>The <code>-msvc</code> target instead relies on the Visual Studio toolchain, producing
binaries much closer to what we want. This is our goal, but at this point there
are a number of hurdles and churn between Rust versions that meant my previous
attempts at getting this working failed.</p>
<p>I don't have all the details straight but here are hopefully enough keywords for
future searchers to find this post.</p>
<p><strong>Linker</strong>: Running the MSVC toolchain includes running <code>link.exe</code> to link. This
of course thwarts cross compiling from a Mac, but LLVM has its own linker that
Rust can use. It's all confusing and churning (see e.g.
<a href="https://github.com/rust-lang/rust/issues/71520">this bug</a>) but I found creating
a <code>.cargo/config</code> with <code>linker = "rust-lld"</code> worked.</p>
<p><strong>Libraries</strong>: Windows programs make system calls into system libraries like
<code>kernel32.dll</code>. To link against these there is a corresponding <code>.lib</code> file that
the linker uses, and without them you get errors like</p>
<pre><code>rust-lld: error: could not open 'kernel32.lib': No such file or directory
</code></pre>
<p>Getting these files is kind of a mess. Someone wrote a
<a href="https://github.com/Jake-Shadle/xwin">nice installer</a> that attempts to download
these from Microsoft (note that if you want to run this on Mac you have to
<a href="https://github.com/Jake-Shadle/xwin/issues/95">pass a weird flag</a>), but I
eventually twiddled enough that I didn't need them.</p>
<p>(Update: after writing this post, I dug into why the <code>windows-sys</code> crate can
build code without these. The answer is that (1) it actually bundles its own
<code>.lib</code> files via some
<a href="https://crates.io/crates/windows-targets/0.52.0/dependencies">magic crates</a>,
and there is
<a href="https://kennykerr.ca/rust-getting-started/understanding-windows-targets.html">more documentation about why</a>;
and also (2) there is some work on a "<code>raw-dylib</code>" feature in Rust to avoid
needing <code>.lib</code> files at all. Unfortunately with a small amount of poking I
wasn't able to convince Rust to use the <code>.lib</code> files from <code>windows-targets</code> to
satisfy the above failing link.)</p>
<p><strong>no_std</strong>: I believe the reason you need all of the above <code>.lib</code> files is
because Rust's standard library uses them. For my emulator purposes I'd rather
have less code in my executables in general — I'm often tracing through the
assembly — so next we descend into disabling Rust's standard library with a</p>
<pre><code>#![no_std]
</code></pre>
<p>at the top of the file.</p>
<p><strong>eh_personality</strong>: With that you might see an error about
<code>language item required, but not found: eh_personality</code>. There is Rust syntax
for defining this (if you look it up) but it requires unstable Rust. Instead you
can make panics abort rather than unwind via <code>-C panic=abort</code>.</p>
<p><strong>panic_handler</strong>: You need to provide an implementation of what to do on
panics. Here's some code with some comments:</p>
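<pre><code>use windows_sys::Win32::Storage::FileSystem::WriteFile;
use windows_sys::Win32::System::Console::{GetStdHandle, STD_OUTPUT_HANDLE};

/// Write raw bytes to stdout using the Win32 API directly (no std available).
fn print(buf: &[u8]) {
    unsafe {
        let stdout = GetStdHandle(STD_OUTPUT_HANDLE);
        WriteFile(
            stdout,
            buf.as_ptr(),
            buf.len() as u32,
            core::ptr::null_mut(), // bytes-written out pointer, unused
            core::ptr::null_mut(), // no OVERLAPPED
        );
    }
}

// rust-analyzer gets confused about this function, so we hide it from rust-analyzer
// following https://github.com/phil-opp/blog_os/issues/1022
#[cfg(not(test))]
#[panic_handler]
unsafe fn handle_panic(_: &core::panic::PanicInfo) -> ! {
    print(b"panicked");
    windows_sys::Win32::System::Threading::ExitProcess(1);
}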
</code></pre>
<p><strong>The Windows API</strong>: In the above you see a call to <code>ExitProcess</code> via the
<code>windows-sys</code> crate, which is autogenerated to cover the whole Windows API and
is compatible with <code>no_std</code>.</p>
<p><strong>main</strong>: As best I understand it, the normal Windows startup call stack is that
the executable entry point goes to the C runtime
<code>mainCRTStartup</code>/<code>WinMainCRTStartup</code> which then invokes <code>main</code> or <code>WinMain</code>.
Rust makes it always <code>mainCRTStartup</code>. So the entry point is:</p>
<pre><code>#![no_main]
...
#[no_mangle]
pub unsafe extern "C" fn mainCRTStartup() {
</code></pre>
<p><strong>Compiler intrinsics</strong>: And, for my final trick! Once you start writing some
code you might encounter errors like
<code>rust-lld: error: undefined symbol: _memcmp</code>. These are some functions that the
compiler emits references to with the expectation that some other layer provides
them. There's <a href="https://github.com/rust-lang/compiler-builtins">some crate</a> that
is supposed to help with this but I couldn't get it to work. I am not really
clear on why it works (possibly inlining?) but I found that building with
<code>--release</code> was enough for my program to work.</p>
<p>(Update: after writing this post and writing more code, I eventually still ran
into failures due to these missing symbols. I ended up defining my own
implementations manually.)</p>
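<p>For a sense of what "defining my own implementations" can look like, here is a
hedged sketch of two of these functions. The <code>my_</code> names are hypothetical and
chosen so the code can be compiled next to a normal standard library; in a
<code>no_std</code> executable the idea would be to export them as
<code>#[no_mangle] pub unsafe extern "C" fn memcmp</code> and so on (the leading
underscore in <code>_memcmp</code> is just 32-bit Windows C name decoration). This is not
the code retrowin32 uses.</p>

```rust
/// Byte-wise comparison with the same contract as C's `memcmp`.
/// Safety: both pointers must be valid for reads of `n` bytes.
unsafe fn my_memcmp(a: *const u8, b: *const u8, n: usize) -> i32 {
    for i in 0..n {
        let (x, y) = unsafe { (*a.add(i), *b.add(i)) };
        if x != y {
            return x as i32 - y as i32;
        }
    }
    0
}

/// Fill `n` bytes at `dst` with the low byte of `c`, like C's `memset`.
/// Safety: `dst` must be valid for writes of `n` bytes.
unsafe fn my_memset(dst: *mut u8, c: i32, n: usize) -> *mut u8 {
    for i in 0..n {
        unsafe { *dst.add(i) = c as u8 };
    }
    dst
}
```

<p>One caveat worth knowing about: an optimizer may recognize a loop like the one in
<code>my_memset</code> and turn it back into a call to <code>memset</code>, so real
implementations may need extra care to avoid calling themselves.</p>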
<p><strong>Putting it all together</strong>: Finally, here's a
<a href="https://github.com/evmar/retrowin32/tree/main/exe/gdi">fully worked example program</a>
(along with the
<a href="https://github.com/evmar/retrowin32/blob/main/.cargo/config"><code>.cargo/config</code></a>)
that brings up a window. The resulting exe is 3kb.</p>
<h1>retrowin32's third x86 emulator</h1>
<p>2023-11-18</p>
<p><em>This post is part of a
<a href="/software/blog/2023/09/retrowin32.html">series on retrowin32</a>.</em></p>
<p>Since my last update, I refined my <a href="/software/blog/2023/08/x86-x64-aarch64.html">emulate-via-Rosetta</a> code to the
point where a demo ran with some cool graphics!</p>
<figure>
<img src=mofo.jpg width=661 height=524 alt='screenshot of a MacOS window showing the demo'>
<figcaption><a href='https://www.pouet.net/prod.php?which=519'>mofo by Psikorp</a>, Dreamhack 1999 64k winner</figcaption>
</figure>
<p>But I only have a screenshot to share with you and not a URL, because to run on
the web retrowin32 uses its own x86 emulator and this demo fails to work under
that. It's clearly a bug, but where?</p>
<p>Unfortunately this bug is significantly difficult to track down because it
manifests as the logic in the demo quietly just not going quite right, without a
crash or any similar smoking gun to point at the flaw. I spent more time trying
to get my Windows-native tracing debugger (described
<a href="/software/blog/2023/05/retrowin32-async-dll-tracing-zig.html">in a previous post</a>)
to run the program under a similar environment but I couldn't quite get the
execution traces to align — the emulator and native Windows were too different.</p>
<p>What I really needed, I thought, was a second emulator that I could run in
lockstep with mine such that I could find exactly the point where the two
emulators diverge in behavior.</p>
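<p>The lockstep idea itself is simple enough to sketch. The following is an
illustration under made-up assumptions, not retrowin32 code: the <code>Cpu</code> trait,
the two-register <code>Regs</code> struct, and the <code>ToyCpu</code> with an injected bug are
all hypothetical stand-ins for two real emulators.</p>

```rust
/// The slice of CPU state we compare after every instruction.
#[derive(Debug, Clone, PartialEq)]
struct Regs {
    eip: u32,
    eax: u32,
}

/// Minimal interface both emulators would need to expose (hypothetical).
trait Cpu {
    fn step(&mut self);
    fn regs(&self) -> Regs;
}

/// Step both CPUs together; report the first step whose resulting state differs.
fn first_divergence(
    a: &mut dyn Cpu,
    b: &mut dyn Cpu,
    max_steps: usize,
) -> Option<(usize, Regs, Regs)> {
    for step in 0..max_steps {
        a.step();
        b.step();
        let (ra, rb) = (a.regs(), b.regs());
        if ra != rb {
            return Some((step, ra, rb));
        }
    }
    None
}

/// Toy "emulator" for demonstration: optionally miscomputes at one address.
struct ToyCpu {
    regs: Regs,
    bug_at: Option<u32>,
}

impl Cpu for ToyCpu {
    fn step(&mut self) {
        self.regs.eip += 1;
        self.regs.eax += self.regs.eip;
        if Some(self.regs.eip) == self.bug_at {
            self.regs.eax += 1; // the injected bug
        }
    }

    fn regs(&self) -> Regs {
        self.regs.clone()
    }
}
```

<p>The hard part in practice is not this loop but getting both emulators to start
from identical state and to report their registers reliably at every step.</p>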
<h2>Unicorn emulator</h2>
<p><a href="https://www.unicorn-engine.org/">Unicorn</a> is a CPU emulator that is basically
QEMU wrapped up as a library. Just what I needed! I "just" needed to retarget
retrowin32 to work with Unicorn.</p>
<p>It's never quite so easy, of course. One piece of Windows emulation is that the
FS register must point at a particular per-thread Windows data structure. In
retrowin32's own emulator I just specially handle memory accesses that involve
FS. <a href="/software/blog/2023/08/x86-x64-aarch64.html">Under Rosetta I added an entry to the LDT</a> for this. But Unicorn
more fully emulates the CPU which meant I needed to
<a href="https://github.com/evmar/retrowin32/blob/6f4da1871208c48353d569ab844fd609d64133b5/win32/src/winapi/kernel32/mod.rs#L226">set up a GDT</a>,
not only for FS but all the other segments.</p>
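<p>For readers who haven't built one by hand, each GDT entry is an 8-byte
descriptor with the base and limit scattered across it. Here is a sketch of the
packing; the <code>gdt_entry</code> helper is hypothetical, and is neither retrowin32's
code nor part of Unicorn's API.</p>

```rust
/// Pack an x86 segment descriptor (one 8-byte GDT entry).
/// `limit` is 20 bits wide; `flags` is the 4-bit field holding granularity and size.
fn gdt_entry(base: u32, limit: u32, access: u8, flags: u8) -> u64 {
    let base = base as u64;
    let limit = limit as u64;
    (limit & 0xffff)                      // limit bits 0..15
        | (base & 0xff_ffff) << 16        // base bits 0..23
        | (access as u64) << 40           // present bit, privilege level, type
        | ((limit >> 16) & 0xf) << 48     // limit bits 16..19
        | (flags as u64 & 0xf) << 52      // granularity and 16/32-bit size
        | (base >> 24) << 56              // base bits 24..31
}
```

<p>For example, a flat 4GB ring-0 code segment is
<code>gdt_entry(0, 0xfffff, 0x9a, 0xc)</code>, while a segment for FS would instead use
the address of the per-thread structure as its base with a small limit.</p>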
<p>The other biggest hurdle was handling calls between the emulator and emulated
code. The emulator hands off control to the emulated executable's <code>main()</code>,
which then may call back into the emulator via Windows API calls, which then may
need to call back into the emulated executable — for example, the Windows
<code>DispatchMessage()</code> API calls the window's registered wndproc. Getting the hooks
in exactly the right places to make this work was more challenging than I
expected, in part because Unicorn is not exactly well-documented.</p>
<p>In all, Unicorn is a useful tool to be aware of. While we're on the subject, the
<a href="https://qiling.io/">Qiling Framework</a> wraps Unicorn with OS-specific loaders to
let you load executables from multiple different operating systems and poke at
them. Digging through their code, I noticed
<a href="https://github.com/qilingframework/qiling/blob/9a78d186c97d6ff42d7df31155dda2cd9e1a7fe3/qiling/os/windows/fiber.py#L44">they too struggled</a>
with making calls work in both directions.</p>
<h2>Bringing up a new emulator</h2>
<p>But in retrowin32 today Unicorn seems to work. And since I've now brought
up Windows emulation via three different x86 emulators, I have a better picture
of how to do it. For future reference (or if you too somehow decide writing a
Windows emulator is a good idea!) here is a recipe:</p>
<ol>
<li>Get my <a href="https://github.com/evmar/retrowin32/tree/main/exe/winapi"><code>winapi</code></a>
executable working. This is a trivial Windows executable that just calls a
few Windows APIs; I could have comfortably written it in assembly even.
Getting this working means you have the PE loading bits in place and can
handle calls from emulated code to the emulator.</li>
<li>Get my
<a href="https://github.com/evmar/retrowin32/tree/main/exe/zig_hello"><code>zig_hello</code></a>
executable working. This program is superficially even simpler than the
previous one but Zig uses the FS register to look up where stdout goes, so it
requires figuring all that out.</li>
<li>Get my
<a href="https://github.com/evmar/retrowin32/tree/main/exe/callback"><code>callback</code></a>
executable working. This program calls into the emulator and passes a
callback, exercising the full call stack of calls in both directions.</li>
<li>Finally, pick up any other Windows EXE and
<a href="https://knowyourmeme.com/memes/how-to-draw-an-owl">draw the rest of the owl</a>.</li>
</ol>
<p>With this in place, I next hope to set things up such that I can finally compare
my x86 emulation against the QEMU emulation to track down exactly which of the
many corners I cut ended up mattering.</p>