Cross compiling C/Rust to win32, again
This post is part of a series on retrowin32.
Earlier I wrote about cross compiling Rust to win32. I ended up not following through on that approach due to missing compiler intrinsics. Instead here are two closely related dives.
Cross compiling C to win32
Clang supports cross compilation, including targeting Windows, but I kept getting it wrong. After a few false starts, I reached out to Nico, who actually knows things, and he set me straight, so thank him for anything you learn from this post!
Clang has a clang-cl binary that is intended to be a drop-in substitute for Visual Studio's cl.exe, which means not only matching the command-line interface in terms of how flags are spelled (which I don't care much about) but, importantly, also preconfiguring the compiler settings to properly produce Windows output (which ends up being critical). For example, if your program #includes a Windows header file, you need the compiler to be configured to understand all the minor language variations found in Windows-style source code.
This means cross compiling C++ code to Windows ends up being as simple as something like this. (I'm still not exactly clear on when flags are hyphenated or slashed, but I do know that to pass linker args to cl.exe they must follow the /link switch...)
$ clang-cl -fuse-ld=lld -target i686-pc-windows-msvc \
-vctoolsdir $xwin_path/crt -winsdkdir $xwin_path/sdk \
foo.cc \
/link /subsystem:console
xwin
The above invocation needs Windows headers and libraries in a particular file system layout. They are available from Microsoft but only in the form of installer .exe files.
winetricks is a script that downloads and unpacks those (and many other redistributable packages) by invoking the executables through Wine. (I briefly thought about running this through retrowin32 but the exes target modern Windows, not the old Windows that retrowin32 targets.)
But as I mentioned in the previous post, there is a tool called xwin that downloads and unpacks the .exes directly. This is a subtle process (the files can be VSIX or CAB files which contain XML blobs indexing other files, and there's some manual shuffling of files around when unpacking happens), which doesn't give me a lot of confidence it will keep working into the future, but for now, an invocation like this:
$ xwin --accept-license --arch x86 splat --output redist --disable-symlinks
(where --disable-symlinks is only needed on a case insensitive file system) produces a directory layout that the above clang-cl invocation accepts.
Back to Rust, and calling conventions
Looking back, many of the problems I encountered in my previous post were due to using Rust's no_std. I was avoiding the Rust standard library for a few reasons but primarily because it used SSE instructions, which retrowin32 doesn't (yet?) support. But it turns out that switching the Rust target from i686-pc-windows-msvc to i586-pc-windows-msvc was sufficient to avoid these.
I could then run a Rust "hello world" with-std app in retrowin32 with a few more function stubs. The only tricky one was that it needs memcpy from vcruntime140.dll. Implementing memcpy is pretty easy; the tricky part was realizing it uses the cdecl calling convention. To understand why this matters you need a bit of background.
First, on Windows there are two main calling conventions, called stdcall and cdecl. In both, the caller pushes its arguments onto the stack right to left. In stdcall, the callee is then responsible for popping those arguments, while in cdecl it's the caller. (There's a lot more to it than just this; I found this reference especially helpful.)
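A toy simulation can make the difference concrete. This is only an illustrative sketch: the Cpu type, the simulate_call function, the starting esp of 0x1000, and the fake return address are all hypothetical, and it models nothing beyond where the stack pointer ends up.

```rust
/// Which side removes the arguments from the stack.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Convention {
    Stdcall,
    Cdecl,
}

/// A toy CPU: just a stack pointer and the values pushed so far.
struct Cpu {
    esp: u32,
    stack: Vec<u32>,
}

impl Cpu {
    fn push(&mut self, value: u32) {
        self.esp -= 4; // the x86 stack grows downward
        self.stack.push(value);
    }

    fn pop(&mut self) -> u32 {
        self.esp += 4;
        self.stack.pop().expect("stack underflow")
    }
}

/// Simulates one call and returns
/// (esp right after the callee returns, esp after any caller cleanup).
pub fn simulate_call(conv: Convention, args: &[u32]) -> (u32, u32) {
    let mut cpu = Cpu { esp: 0x1000, stack: Vec::new() };

    // Caller: push arguments right to left, so args[0] ends up on top.
    for &arg in args.iter().rev() {
        cpu.push(arg);
    }
    // `call` pushes the return address (a made-up value here).
    cpu.push(0xdead_beef);

    // Callee: returning always pops the return address...
    let _return_addr = cpu.pop();
    // ...and under stdcall, `ret N` also pops the N bytes of arguments.
    if conv == Convention::Stdcall {
        for _ in args {
            cpu.pop();
        }
    }
    let esp_after_ret = cpu.esp;

    // Caller: under cdecl, the caller removes the arguments itself
    // (typically with `add esp, N`).
    if conv == Convention::Cdecl {
        for _ in args {
            cpu.pop();
        }
    }
    (esp_after_ret, cpu.esp)
}
```

With two arguments, simulate_call(Convention::Stdcall, &[1, 2]) comes back with esp already restored when the callee returns, while the Cdecl variant is still 8 bytes short at that point and only catches up after the caller's cleanup.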
I'm not clear on why both exist. A benefit of stdcall is you only need the stack-popping code in one location (the callee) and not once per caller, and in fact the ret instruction takes an integer argument of how much to pop so it only costs two extra bytes in the binary. A benefit of cdecl is that it's arguably necessary for varargs functions (or at least that is what sources online say, but it seems to me you could make it work either way so I'm not really sure). I guess it would be more fragile in the case where the callee doesn't consume all its varargs inputs? Maybe it'd be more complex with all the modern stack corruption mitigations?
In any case, up to this point retrowin32 implements hundreds of Windows functions but they were all stdcall. It implements these functions in two layers.
First, I implement a given Windows API function via an annotated Rust function using Rust types, like the following:
#[win32_derive::dllexport]
pub fn SetThreadDescription(
    machine: &mut Machine,
    hThread: HTHREAD,
    lpThreadDescription: Option<&Str16>,
) -> bool { ... }
Second, a code generator collects up all the functions that were annotated as dllexport and generates code for each that translates emulator state into a call to these functions. For example, for the above it knows to pull hThread and lpThreadDescription off the emulated stack, and the latter is an optional pointer to a NUL-terminated WTF-16 string that needs to be read from emulated memory. The bool return value becomes an integer that goes in the emulator's eax register. Finally, the stack is popped by 8 bytes because that's what the arguments used.
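To give a feel for what such generated glue might look like, here is a hand-written sketch. It is not retrowin32's actual generated code: the Machine layout, the helper names, the last_description field, and the assumption that the return address sits at [esp] with the arguments just above it are all invented for illustration.

```rust
/// Hypothetical emulator state: flat little-endian memory plus two registers.
pub struct Machine {
    pub mem: Vec<u8>,
    pub esp: u32,
    pub eax: u32,
    /// Stand-in side effect so the call is observable in this sketch.
    pub last_description: Option<String>,
}

/// Reads a little-endian u32 from emulated memory.
fn read_u32(mem: &[u8], addr: u32) -> u32 {
    let a = addr as usize;
    u32::from_le_bytes([mem[a], mem[a + 1], mem[a + 2], mem[a + 3]])
}

/// Reads a NUL-terminated string of 16-bit units; a null pointer means "absent".
fn read_wide_str(mem: &[u8], addr: u32) -> Option<Vec<u16>> {
    if addr == 0 {
        return None;
    }
    let mut units = Vec::new();
    let mut a = addr as usize;
    loop {
        let unit = u16::from_le_bytes([mem[a], mem[a + 1]]);
        if unit == 0 {
            break;
        }
        units.push(unit);
        a += 2;
    }
    Some(units)
}

/// The "real" implementation, written against ordinary Rust types.
fn set_thread_description(
    machine: &mut Machine,
    _h_thread: u32,
    description: Option<&[u16]>,
) -> bool {
    machine.last_description = description.map(String::from_utf16_lossy);
    true
}

/// The glue: emulated stack in, Rust call, eax and esp out.
pub fn set_thread_description_shim(machine: &mut Machine) {
    let esp = machine.esp;
    // Assumed layout: [esp] return address, [esp+4] first arg, [esp+8] second arg.
    let h_thread = read_u32(&machine.mem, esp + 4);
    let lp_description = read_u32(&machine.mem, esp + 8);
    let description = read_wide_str(&machine.mem, lp_description);

    let ok = set_thread_description(machine, h_thread, description.as_deref());

    machine.eax = ok as u32; // bool becomes 0 or 1 in eax
    machine.esp = esp + 4 + 8; // return address, then 8 bytes of stdcall arguments
}
```

The point of the split is that the inner function never touches the emulated stack; everything about stack layout lives in the shim.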
To support cdecl, I only needed to adjust the dllexport code generation to control how much stack was popped. With that in place retrowin32 can now at least run a simple Rust program with println!("hello, world");.
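As a rough illustration of that adjustment, here is a hypothetical memcpy shim where the only convention-specific part is the final stack fixup. The Machine struct, CallConv enum, and finish_call helper are made up for this sketch and assume the return address is at [esp] on entry; retrowin32's real code may be organized quite differently.

```rust
/// Hypothetical emulator state: flat memory plus two registers.
pub struct Machine {
    pub mem: Vec<u8>,
    pub esp: u32,
    pub eax: u32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CallConv {
    Stdcall,
    Cdecl,
}

/// Reads a little-endian u32 from emulated memory.
fn read_u32(mem: &[u8], addr: u32) -> u32 {
    let a = addr as usize;
    u32::from_le_bytes([mem[a], mem[a + 1], mem[a + 2], mem[a + 3]])
}

/// The one place where the calling convention matters.
pub fn finish_call(machine: &mut Machine, conv: CallConv, arg_bytes: u32) {
    machine.esp += 4; // the return address is popped either way
    if conv == CallConv::Stdcall {
        machine.esp += arg_bytes; // callee cleans up its own arguments
    }
    // Under cdecl the emulated caller adjusts esp itself afterwards.
}

/// memcpy(dst, src, n) over emulated memory, exported as cdecl.
pub fn memcpy_shim(machine: &mut Machine) {
    let esp = machine.esp;
    let dst = read_u32(&machine.mem, esp + 4) as usize;
    let src = read_u32(&machine.mem, esp + 8) as usize;
    let n = read_u32(&machine.mem, esp + 12) as usize;

    // copy_within tolerates overlap, which is more than memcpy promises.
    machine.mem.copy_within(src..src + n, dst);

    machine.eax = dst as u32; // memcpy returns its destination pointer
    finish_call(machine, CallConv::Cdecl, 12);
}
```

If this shim popped 12 extra bytes as a stdcall function would, the caller's own cleanup would then move esp a second time and the emulated stack would be off by 12 bytes after every call.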
Inline assembly
The reason I have been poking at all of this is because I want to write a test suite over my implementations of x86 opcodes, and I want to run that suite on a native x86 to verify my emulator behavior matches the real thing.
Because I want to test specifically the invocations of opcodes I write inline assembly to do it. This has allowed me to contrast the way inline assembly works in C and in Rust.
Clang follows gcc syntax for inline assembly, as documented in the gcc docs (and I believe nowhere in the Clang docs?). It is syntactically surprisingly clunky. For example:
- to write multiple instructions of assembly you must embed literal '\n's into the string;
- it treats % as an escape metacharacter, but meanwhile (in AT&T syntax) registers are prefixed with %, which means to refer to a register you must write it doubled as %%;
- to specify inputs/outputs/clobbers, you write them after : in the blocks, which means you end up with awkward empty blocks (see e.g. the /* No outputs */ comment in the example here);
- the syntax within these blocks is itself a kind of printf-like format string with docs describing things like "g" means "Any register, memory or immediate integer operand is allowed, except for registers that are not general registers."
I have no context here so I imagine a lot of the above probably organically grew from how assemblers worked over gcc's history. It doesn't feel like the thing you would design if you were inventing it today.
Meanwhile, clang-cl configures clang to work like a Windows compiler, which means it also supports Microsoft's inline assembly syntax. I don't have much experience with this except that it looks a lot simpler than the gcc syntax. It seems the compiler must infer a lot more of the context that is explicitly stated in the gcc format to get things like clobbers right; that is at least what LLVM does, where you can see the ~{flags} bits on the right.
Finally, Rust has its own syntax for inline assembly that feels very sensible. (Rust supports many fewer architectures than gcc, which possibly makes the problem space a lot easier?) Since it backs onto LLVM anyway it feels semantically close to Clang, but it avoids the syntactic problems: things like how operands and options are specified use a simple syntax with keywords.
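For a sense of what that looks like, here is a small sketch (the mul_add function is a made-up example, not something from retrowin32). Each instruction is its own string, so there are no embedded newlines; operands are named and tagged with in/out keywords; and registers need no doubled escape character. The non-x86 fallback exists only so the sketch builds on other hosts.

```rust
/// Computes a * b + c with wrapping 32-bit arithmetic, via inline assembly.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn mul_add(a: u32, b: u32, c: u32) -> u32 {
    use std::arch::asm;
    let result: u32;
    unsafe {
        asm!(
            // Rust defaults to Intel syntax: destination first.
            "mov {tmp:e}, {a:e}",
            "imul {tmp:e}, {b:e}",
            "add {tmp:e}, {c:e}",
            a = in(reg) a,
            b = in(reg) b,
            c = in(reg) c,
            // A plain `out` register is never shared with an `in` register,
            // so writing tmp before reading b and c is safe.
            tmp = out(reg) result,
            // No memory access, no stack use, output depends only on inputs.
            // Flags count as clobbered unless `preserves_flags` is given.
            options(pure, nomem, nostack),
        );
    }
    result
}

/// Plain-Rust fallback for non-x86 hosts, with the same results.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn mul_add(a: u32, b: u32, c: u32) -> u32 {
    a.wrapping_mul(b).wrapping_add(c)
}
```

Calling mul_add(3, 4, 5) gives 17; the register allocator picks the concrete registers, which is why none are named in the template.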
One bit of cute syntax I enjoyed: in assembly in various places you end up using a letter suffix to specify the size of a given operation; for example in AT&T assembly you write addb to specify a byte-sized add but addw for a 16-bit ("word") add. Unfortunately I keep getting these letter suffixes confused, where AT&T uses e.g. l for "long" when Intel uses d for "dword", both to specify a 32-bit operation. (The worst is that q is for "quad word", aka (2-byte word) * 4 = 8 bytes.) LLVM has a zoo of single-letter codes which is surely necessary for modeling all the necessary complexity.
Meanwhile, it's not the same context so it's a little unfair, but in Rust asm templates the formatting codes just correspond to the register names: e.g. you use e (as in eax, ebx) to refer to 32 bits, or l (as in al, bl) for the low 8 bits.
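A minimal sketch of those two modifiers side by side follows. The function names are made up, and the use of the reg_abcd register class for the byte-sized case is my choice here because those registers have a low-byte name on both 32-bit and 64-bit x86; the fallbacks only exist so the sketch builds on non-x86 hosts.

```rust
/// 32-bit add: `{...:e}` formats the operand as eax, ebx, and so on.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn add_32(a: u32, b: u32) -> u32 {
    use std::arch::asm;
    let mut acc = a;
    unsafe {
        asm!(
            "add {acc:e}, {b:e}",
            acc = inout(reg) acc,
            b = in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    acc
}

/// 8-bit add: `{...:l}` formats the operand as al, bl, and so on,
/// so only the low byte of `a` changes and the carry is discarded.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn add_low8(a: u32, b: u32) -> u32 {
    use std::arch::asm;
    let mut acc = a;
    unsafe {
        asm!(
            "add {acc:l}, {b:l}",
            acc = inout(reg_abcd) acc,
            b = in(reg_abcd) b,
            options(pure, nomem, nostack),
        );
    }
    acc
}

/// Plain-Rust fallbacks for non-x86 hosts, with the same results.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn add_32(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn add_low8(a: u32, b: u32) -> u32 {
    (a & !0xff) | ((a as u8).wrapping_add(b as u8) as u32)
}
```

For example, add_low8(0x1234_56ff, 1) wraps just the low byte and yields 0x1234_5600, whereas add_32 on the same inputs yields 0x1234_5700.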
Between these options and given how the rest of retrowin32 is in Rust already I am leaning towards using Rust (if anything, my main complaint is that the Rust autoformatter skips my asm blocks, possibly because it's a macro?). The binaries are pretty large, so maybe that is what I will look into next...