Theseus unpacking
Theseus, my new Windows binary translator, must see all the code it might run ahead of time to translate it. In that post I highlighted how this means it doesn't support programs that have a JIT. Someone emailed to ask about the packed executables found in the demoscene, which also unpack code at runtime. This absolutely nerd sniped me.
I wrote before about packed executables; if you're not familiar with the term, that post is useful background.
At a high level unpacking is simple. You run the packed program up to the point
where it finishes decompressing and is about to jump to the original main()
function, at which point you grab all the decompressed state and write it out to
a new unpacked program.
I could use Theseus to do the same, just as two passes: run Theseus once to translate the packed program, amend it to make it write an executable file to disk when run, run it, and now you have a normal executable to run Theseus a second time on. When I implemented unpacking in retrowin32 I added some flags to support that "write out an executable" mode. I could do the same here.
But the big-picture idea I have with Theseus is that the translated program is a mutable thing you can reach into and monkey with. Here's a tidier approach to unpacking that uses that idea.
First, I run Theseus on the packed program, and in its output it prints:
WARN tc/src/traverse.rs:40 omitting 004085dd: block appears zero-filled
This is saying "I saw a jump to this address, but on the other side there is no
code". This immediately reveals the address of the original main() function:
once unpacked, that's where the real program starts.
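To make that warning concrete, here is a minimal sketch of the kind of check that could produce it. This is not Theseus's actual traversal code: the `Image` type, the `appears_zero_filled` and `entry_point_candidates` functions, and the `probe` length are all hypothetical names I made up for illustration. The idea is only that a jump target whose bytes are all zero in the on-disk image is a good candidate for "code that will be written here later".

```rust
/// Hypothetical flat view of a loaded image: `bytes` is mapped starting at
/// virtual address `base`. (Not a Theseus type; an assumption for this sketch.)
struct Image {
    base: u32,
    bytes: Vec<u8>,
}

impl Image {
    /// Returns up to `len` bytes starting at virtual address `addr`,
    /// or None if `addr` falls outside the mapped range.
    fn slice_at(&self, addr: u32, len: usize) -> Option<&[u8]> {
        let start = addr.checked_sub(self.base)? as usize;
        if start >= self.bytes.len() {
            return None;
        }
        // Clamp the end so a probe near the end of the image doesn't overrun.
        let end = start.checked_add(len)?.min(self.bytes.len());
        Some(&self.bytes[start..end])
    }
}

/// True when the first `probe` bytes at `addr` are mapped and all zero.
fn appears_zero_filled(image: &Image, addr: u32, probe: usize) -> bool {
    match image.slice_at(addr, probe) {
        Some(bytes) => bytes.iter().all(|&b| b == 0),
        None => false,
    }
}

/// Filters a list of jump targets down to those that land on zero-filled
/// memory: candidates for the original entry point of a packed program.
fn entry_point_candidates(image: &Image, jump_targets: &[u32], probe: usize) -> Vec<u32> {
    jump_targets
        .iter()
        .copied()
        .filter(|&target| appears_zero_filled(image, target, probe))
        .collect()
}
```

In a packed executable the unpacking stub is real code, so it fails this check, while the region it will later decompress into passes it.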
Next, I manually implement that function within the Theseus output to call my
own do_unpack function. When control reaches there, I know the
program has now unpacked itself into memory, and I can invoke Theseus itself on
that memory to have it generate a program from the unpacked code.
In other words, I don't need to write out an intermediate .exe file and invoke
Theseus again: I can modify the generated unpacker program to directly link and
call back to Theseus itself! This is weird because Theseus-generated programs
don't generally need to link the Theseus translator itself. But there's no
reason they can't; the code is right there.
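The shape of that hook might look roughly like the sketch below. Everything here is an assumption for illustration: `Machine`, `UnpackedImage`, the `Translator` trait, and `RecordingTranslator` are hypothetical stand-ins, not Theseus's real types or API. Only the name do_unpack and the address 004085dd come from the text above. The point is just that the hand-written block snapshots memory and passes it, along with the entry address, straight to the translator, with no PE file in between.

```rust
/// Hypothetical machine state of the running translated program.
struct Machine {
    base: u32,
    memory: Vec<u8>,
}

/// The data handed to the translator: a memory snapshot plus where to start.
struct UnpackedImage {
    base: u32,
    entry: u32,
    bytes: Vec<u8>,
}

/// Stand-in for the translator's entry point. A real build would link the
/// translator itself; this trait is an assumption so the sketch can run alone.
trait Translator {
    fn translate(&mut self, image: &UnpackedImage);
}

/// Toy translator that only records (base, entry, byte count) per call.
#[derive(Default)]
struct RecordingTranslator {
    calls: Vec<(u32, u32, usize)>,
}

impl Translator for RecordingTranslator {
    fn translate(&mut self, image: &UnpackedImage) {
        self.calls.push((image.base, image.entry, image.bytes.len()));
    }
}

/// Called once the packed program has finished unpacking itself into memory:
/// snapshot that memory and hand it to the translator.
fn do_unpack(machine: &Machine, entry: u32, translator: &mut dyn Translator) {
    let image = UnpackedImage {
        base: machine.base,
        entry,
        bytes: machine.memory.clone(),
    };
    translator.translate(&image);
}

/// Hand-written replacement for the block the translator omitted as
/// zero-filled. Reaching it means unpacking is done.
fn block_004085dd(machine: &Machine, translator: &mut dyn Translator) {
    do_unpack(machine, 0x0040_85dd, translator);
}
```

Calling `block_004085dd` with a `RecordingTranslator` records exactly one translation request, whose entry is the address from the warning.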
The total implementation ends up extremely simple, because I don't need to go through generating a proper PE file; I just gather the data Theseus needs. (Generic unpackers are thousands of lines of code; UPX's own functionality for unpacking is significantly more code than this; even retrowin32's implementation is twice as long.) And Theseus doesn't grow an unpacker mode; instead it just supports programs with manual modifications in general.
Unfortunately, I was so consumed by the nerd snipe here that I forgot the other reason you typically have to unpack a packed executable: to load it into a debugger. For that I would need to generate an exe. Oh well. It's still neat.