Theseus unpacking

April 24, 2026

Theseus, my new Windows binary translator, works ahead of time, so it must see all the code a program might run in order to translate it. In the post introducing it, I noted that this means it can't support programs that use a JIT. Someone emailed to ask about the packed executables found in the demoscene, which also unpack code at runtime. This absolutely nerd sniped me.

I've written about packed executables before; if you're not familiar with the term, that post is useful background.

At a high level, unpacking is simple. You run the packed program up to the point where it has finished decompressing and is about to jump to the original main() function. At that point you grab all the decompressed state and write it out as a new, unpacked program.
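To make that concrete, here is a toy model in Rust. It is not how any real packer works: the hypothetical `Image` type stands in for process memory, a trivial run-length encoding stands in for real compression, and `run_stub` and `unpack` are illustrative names. The "unpacker" runs the stub up to the point where it would jump to the original entry point, then captures memory plus that entry point instead of jumping.

```rust
/// Toy model of a loaded program: flat memory plus an entry offset.
/// (Hypothetical: real PE images also have sections, imports, relocations, ...)
#[derive(Clone, Debug, PartialEq)]
struct Image {
    memory: Vec<u8>,
    entry: usize,
}

/// Toy "compression": a sequence of (count, byte) pairs.
fn rle_decompress(packed: &[u8], out: &mut [u8]) {
    let mut pos = 0;
    for pair in packed.chunks_exact(2) {
        let (count, byte) = (pair[0] as usize, pair[1]);
        out[pos..pos + count].fill(byte);
        pos += count;
    }
}

/// The packer's stub: decompress the payload into the (initially zero-filled)
/// code region, then return the original entry point it would jump to.
fn run_stub(img: &mut Image, payload: &[u8], code_start: usize, original_entry: usize) -> usize {
    rle_decompress(payload, &mut img.memory[code_start..]);
    original_entry
}

/// Unpacking: run the stub up to the jump to the original entry point,
/// then capture the decompressed state as a new, unpacked image.
fn unpack(packed: &Image, payload: &[u8], code_start: usize, original_entry: usize) -> Image {
    let mut running = packed.clone();
    let oep = run_stub(&mut running, payload, code_start, original_entry);
    Image { memory: running.memory, entry: oep }
}

fn main() {
    // 16 bytes of memory; the "code" region at 8..16 starts out zero-filled.
    let packed = Image { memory: vec![0; 16], entry: 0 };
    let payload: [u8; 4] = [3, 0x90, 5, 0xC3]; // 3x NOP, then 5x RET
    let unpacked = unpack(&packed, &payload, 8, 8);
    println!("entry = {}, code = {:02x?}", unpacked.entry, &unpacked.memory[8..]);
}
```

A real unpacker also has to recover imports, relocations and section layout, which is much of why writing out a proper executable takes so much more code than this.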

I could do the same with Theseus in two passes: translate the packed program with Theseus, amend the output so that when run it writes an executable file to disk, run it, and then run Theseus a second time on the resulting normal executable. When I implemented unpacking in retrowin32, I added some flags to support that "write out an executable" mode, and I could do the same here.

But the big-picture idea behind Theseus is that the translated program is a mutable thing you can reach into and monkey with. Here's a tidier approach to unpacking that takes advantage of that.

First, I run Theseus on the packed program, and its output includes this warning:

WARN  tc/src/traverse.rs:40 omitting 004085dd: block appears zero-filled

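This is saying "I saw a jump to this address, but there is no code there." The target bytes are all zero in the packed image because the unpacker only fills in that memory at runtime, so this immediately reveals the address of the original main() function: once the program is unpacked, that's where the real program starts.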
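Roughly, the check behind that warning might look like the following sketch. This is not Theseus's actual traverse.rs code; `Memory`, `find_oep_candidates` and the 16-byte `BLOCK_PROBE` are assumptions made for illustration. While walking the packed program's control flow, any jump target whose bytes are all zero can't contain real code yet, which makes it a strong candidate for the original entry point.

```rust
use std::collections::BTreeSet;

/// How many bytes to inspect at a jump target (an arbitrary choice for this sketch).
const BLOCK_PROBE: usize = 16;

/// Hypothetical view of the packed image: a base address plus its bytes.
struct Memory {
    base: u32,
    bytes: Vec<u8>,
}

impl Memory {
    /// Returns the bytes at `addr..addr + len`, or None if out of range.
    fn read(&self, addr: u32, len: usize) -> Option<&[u8]> {
        let start = addr.checked_sub(self.base)? as usize;
        self.bytes.get(start..start.checked_add(len)?)
    }
}

/// Given the jump targets found while traversing the packed program,
/// return those whose bytes are entirely zero: code that doesn't exist yet.
fn find_oep_candidates(mem: &Memory, jump_targets: &BTreeSet<u32>) -> Vec<u32> {
    jump_targets
        .iter()
        .copied()
        .filter(|&addr| match mem.read(addr, BLOCK_PROBE) {
            Some(block) => block.iter().all(|&b| b == 0),
            None => false, // out of range: not something we can judge
        })
        .collect()
}

fn main() {
    let base = 0x0040_0000;
    let mut bytes = vec![0u8; 0x9000];
    bytes[0x1000] = 0x60; // pretend the unpacker stub starts with PUSHAD
    let mem = Memory { base, bytes };

    let targets = BTreeSet::from([0x0040_1000u32, 0x0040_85dd]);
    for addr in find_oep_candidates(&mem, &targets) {
        println!("omitting {:08x}: block appears zero-filled", addr);
    }
}
```

A zero-filled target is a heuristic rather than proof, but for a packer whose final jump lands in not-yet-written memory it's exactly the signal you want.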

Next, within the Theseus output, I manually implement that missing function so that it calls my own do_unpack function. When control reaches it, I know the program has finished unpacking itself into memory, so I can invoke Theseus on that memory and have it generate a program from the unpacked code.

In other words, I don't need to write out an intermediate .exe file and invoke Theseus again: I can modify the generated unpacker program to link and call back into Theseus directly! This is weird, because Theseus-generated programs don't normally need to link the Theseus translator itself. But there's no reason they can't; the code is right there.
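Here's a rough model of that override. It assumes, purely for illustration, that the translated program dispatches between blocks through a table mapping guest addresses to host functions; I'm not claiming Theseus's generated code is structured this way, and `Machine`, `Block`, `stub`, `run` and this `do_unpack` are hypothetical. The key move is registering a hand-written function for the address the traversal skipped, so the unpacker stub's final jump lands in our code with memory fully unpacked.

```rust
use std::collections::HashMap;

/// Guest image base: `memory[i]` holds guest address `IMAGE_BASE + i`.
const IMAGE_BASE: u32 = 0x0040_0000;
/// The original entry point reported by the traversal warning.
const OEP: u32 = 0x0040_85dd;

/// Hypothetical guest state.
struct Machine {
    memory: Vec<u8>,
    /// Filled in by `do_unpack`: (entry point, snapshot of memory).
    unpacked: Option<(u32, Vec<u8>)>,
}

/// A translated block: runs, then returns the next guest address (None = stop).
type Block = fn(&mut Machine) -> Option<u32>;

/// Stand-in for the translated unpacker stub: it writes the real code
/// into memory, then jumps to the original entry point.
fn stub(m: &mut Machine) -> Option<u32> {
    let at = (OEP - IMAGE_BASE) as usize;
    m.memory[at..at + 3].copy_from_slice(&[0x55, 0x8b, 0xec]); // push ebp; mov ebp, esp
    Some(OEP)
}

/// The hand-written replacement for the block that was zero-filled at
/// translation time. Memory is now unpacked, so snapshot it for the translator.
fn do_unpack(m: &mut Machine) -> Option<u32> {
    m.unpacked = Some((OEP, m.memory.clone()));
    None // the real version would hand the snapshot to the linked-in translator
}

/// Dispatch loop: look up the block for each guest address and run it.
fn run(blocks: &HashMap<u32, Block>, m: &mut Machine, mut pc: u32) {
    while let Some(&block) = blocks.get(&pc) {
        match block(m) {
            Some(next) => pc = next,
            None => return,
        }
    }
    panic!("no translated block for {pc:08x}");
}

fn main() {
    let mut blocks: HashMap<u32, Block> = HashMap::new();
    blocks.insert(0x0040_1000, stub);
    blocks.insert(OEP, do_unpack); // the manual override for the skipped block
    let mut m = Machine { memory: vec![0; 0x9000], unpacked: None };
    run(&blocks, &mut m, 0x0040_1000);
    let (entry, snapshot) = m.unpacked.expect("the OEP hook should have run");
    let at = (entry - IMAGE_BASE) as usize;
    println!("entry {entry:08x}: {:02x?}", &snapshot[at..at + 3]);
}
```

In the real setup, `do_unpack` is where the translator, linked into the program, would be handed the snapshot and entry point; this sketch just records them and stops.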

The total implementation ends up extremely simple, because I don't need to generate a proper PE file; I just gather the data Theseus needs. (Generic unpackers are thousands of lines of code; UPX's own unpacking functionality is significantly more code than this; even retrowin32's implementation is twice as long.) And Theseus doesn't need to grow a dedicated unpacker mode; it just supports manually modified programs in general.

Unfortunately, I was so consumed by the nerd snipe that I forgot the other reason you typically unpack a packed executable: to load it into a debugger. For that I'd still need to generate an exe. Oh well, it's still neat.