Unpacking packed executables

April 26, 2025

This post is part of a series on retrowin32.

Demoscene programs are sometimes packed with an executable compressor like UPX. An EXE packer ingests an EXE and emits a smaller one that does the same thing. This is useful for meeting demo competition file size limits. (Here are a few 4kb demos from last week. 4kb, including the music! Just the text of this blog post is nearly twice that!)

At a high level, a packed executable is a tiny program that, when run, uncompresses the real program into memory and then runs it. From the perspective of an emulator, packed executables are the same as regular executables; they are all just programs that eventually make some system calls.

But when debugging what has gone wrong in a packed executable, all of the interesting executable state, including even which system functions it calls, only shows up after the executable has unpacked itself. For this reason it is useful to be able to unpack a packed executable, reconstructing the original program, which you can then feed into a program analyzer like Ghidra. (Tools like UPX even provide unpacking themselves, much like a zip utility also provides unzip.)

But what if your program isn't packed with UPX? Or (what I have encountered more) what if it's packed with an old enough version of UPX that current UPX will no longer unpack it?

An emulator can treat the unpacking logic as a black box and just execute it. If you pause the program right after the unpacking finishes, you can grab the in-memory unpacked state and dump it back into a new exe. This exe then looks like an ordinary program, ready for analysis.

I recently implemented just this idea in retrowin32 and successfully dumped at least one packed exe. This post talks about how it works.

Problem one: finding main

The first step is figuring out the execution point to dump from. Ideally you'd grab the memory state after the executable has unpacked but before any of its main code runs, but where exactly is that?

A packed exe looks something like:

entry_point:
  ; ...some loop to unpack things here
  jmp some_address  ; jmp to the unpacked code

If you load a packed exe into a tool like Ghidra, the jmp looks like it's jumping to a garbage address, because the code on the other side of that jump only exists after the unpacking logic runs. I think you might be able to automatically spot it because of that.

For now, since I'm deep into analyzing these executables in the first place, suppose I just manually identify the address (some_address above) that we want to break on.

The next problem is that you can't just set an ordinary breakpoint at that point. A software breakpoint works by patching over the target with an int3 instruction, but if we set one of those at startup it gets overwritten by the unpacking loop. So instead, I can set a breakpoint at the last instruction of the unpacking code (the jmp in the above example) and then single step once (to follow the jmp).
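To make the overwrite problem concrete, here is a toy model in Python. It is only an illustration, not retrowin32's code: memory is a bytearray, and set_breakpoint, unpack, and the target address are hypothetical names invented for this sketch.

```python
INT3 = 0xCC  # the x86 one-byte breakpoint opcode


def set_breakpoint(mem: bytearray, addr: int) -> int:
    """Patch int3 over addr and return the byte that was there."""
    original = mem[addr]
    mem[addr] = INT3
    return original


def unpack(mem: bytearray, dest: int, payload: bytes) -> None:
    """Stand-in for the unpacking loop: writes decompressed code to dest."""
    mem[dest:dest + len(payload)] = payload


mem = bytearray(64)
target = 0x20  # hypothetical some_address, not yet unpacked at startup

set_breakpoint(mem, target)
patched_at_startup = mem[target] == INT3

# The unpacker later writes the real code (push ebp; mov ebp, esp)
# over the same bytes, silently erasing the int3.
unpack(mem, target, b"\x55\x8b\xec")
breakpoint_survived = mem[target] == INT3
```

After the unpack step the byte at target is the first byte of the real code, so a breakpoint placed there at startup never fires. The jmp itself is part of the unpacker stub and is not rewritten, which is why breaking there and stepping once is reliable.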

This will also work with another type of packer I've seen, which generates code like:

entry_point:
  ; ...some loop to unpack things here,
  ; including:
  push some_address
  ; ...more asm
  ret  ; jmps to some_address

It's easier to set a breakpoint on the ret than it is to find which address it happens to jump to.

From that state I can then create a .exe file by dumping the current memory state, with the exe's entry_point properly set, and everything works... except for one important piece.
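As a rough sketch of the "entry point properly set" part, the following Python patches the AddressOfEntryPoint field of a PE image held in a bytearray. The field offsets come from the PE format (e_lfanew at 0x3C, then the 4-byte signature, the 20-byte COFF header, and 16 bytes into the optional header). The function name and the example values are made up for illustration, and this is not how retrowin32 itself is written.

```python
import struct

E_LFANEW_OFFSET = 0x3C          # DOS header field pointing at the PE header
ENTRY_POINT_OFFSET = 4 + 20 + 16  # signature + COFF header + optional header prefix


def set_entry_point(image: bytearray, entry_rva: int) -> None:
    """Overwrite AddressOfEntryPoint; entry_rva is relative to the image base."""
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    struct.pack_into("<I", image, e_lfanew + ENTRY_POINT_OFFSET, entry_rva)


# A fake, mostly empty header just large enough to exercise the function.
image = bytearray(0x200)
image[0:2] = b"MZ"
struct.pack_into("<I", image, E_LFANEW_OFFSET, 0x80)
image[0x80:0x84] = b"PE\0\0"

set_entry_point(image, 0x28C00)  # hypothetical RVA of some_address
(entry,) = struct.unpack_from("<I", image, 0x80 + ENTRY_POINT_OFFSET)
```

Note that the field holds an RVA, so an absolute address like some_address has to have the image base subtracted first.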

Problem two: reconstructing imports

To minimize the file size, a packed executable only declares dependencies on a minimal set of system functions. The unpacking process decompresses a list of all the real dependencies of the underlying executable, and it dynamically loads those dependencies so that they can be called once the underlying executable starts.

For the purposes of offline static analysis, we need these dynamic dependencies resolved. We want to load our executable into Ghidra and have it understand which system functions it calls.

To see how to manage this, you need to understand a little about how ordinary programs resolve imports. I wrote about this earlier, including some pictures. I'll summarize again the parts that are relevant here.

A regular program that calls system functions like GetStartupInfoA() will call via the Import Address Table ("IAT"). In assembly this looks like:

call [imp_GetStartupInfoA]  ; where imp_GetStartupInfoA is a fixed address

That is, each call site says "call the address found at this fixed address, a point within the IAT". The executable loader reads a list of named imports (called the Import Directory Table ("IDT")) and fills in the IAT with the resolved addresses.
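Here is a small Python model of that loader step, as a sketch only. The EXPORTS table and its addresses are invented, and a real IDT groups imports per DLL with a separate null-terminated slot array for each, which this flattens into one list.

```python
import struct

# Hypothetical resolved addresses; a real loader gets these from the DLLs.
EXPORTS = {
    ("kernel32.dll", "GetStartupInfoA"): 0x70001000,
    ("user32.dll", "ShowWindow"): 0x71002000,
}


def fill_iat(mem: bytearray, iat_addr: int, imports: list) -> None:
    """Loader step: write one resolved 32-bit address per named import."""
    for i, key in enumerate(imports):
        struct.pack_into("<I", mem, iat_addr + 4 * i, EXPORTS[key])


def call_target(mem: bytearray, slot_addr: int) -> int:
    """What `call [slot_addr]` does: read the slot, then jump to its value."""
    (target,) = struct.unpack_from("<I", mem, slot_addr)
    return target


mem = bytearray(0x100)
IAT = 0x40  # fixed address baked into every call site
fill_iat(mem, IAT, [("kernel32.dll", "GetStartupInfoA"),
                    ("user32.dll", "ShowWindow")])
imp_GetStartupInfoA = IAT  # the slot the example call site refers to
```

The call sites never change; only the slot contents do, which is what lets the same code run regardless of where the system DLLs end up.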

A packed executable's IDT is nearly empty because all the interesting work happens at unpack time. But the underlying executable that the packer started with had its own IAT; the packer fills it in as part of its unpacking. To reverse this, in our unpacked executable we need to reconstruct our own IDT that points at the IAT such that Ghidra believes it.

Where is that IAT? We don't know. From the emulator's perspective, the unpacking code does a bunch of grinding, eventually with a series of calls like:

hmodule = LoadLibrary("ddraw.dll")
func = GetProcAddress(hmodule, "DirectDrawCreate")
... stash func in the IAT of the in-memory unpacked executable

Unfortunately it's not obvious how to follow that "stash" operation. (In writing this post, I had the idea that maybe we could observe all memory writes?) But we can observe these calls to GetProcAddress and record which addresses we vended out, and then at dumping time we can scan the program's memory to find where those values ended up. (This is similar to how Scylla, an exe dumping tool, attempts to solve this problem. But Scylla doesn't get to cheat by using an emulator.)
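The record-then-scan idea can be sketched in a few lines of Python. This assumes 32-bit little-endian pointers, and the vended table, the addresses, and the function names are hypothetical stand-ins for whatever the emulator actually records.

```python
import struct


def find_all(mem: bytes, needle: bytes) -> list:
    """Return every offset at which needle occurs in mem."""
    hits = []
    pos = mem.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = mem.find(needle, pos + 1)
    return hits


def locate_imports(mem: bytes, vended: dict) -> dict:
    """Map each recorded import name to the offsets holding its address."""
    return {
        name: find_all(mem, struct.pack("<I", addr))
        for addr, name in vended.items()
    }


# Addresses the emulator handed out from GetProcAddress (made-up values).
vended = {
    0xF1000010: "ddraw.dll!DirectDrawCreate",
    0xF1000020: "user32.dll!ShowWindow",
}

# Pretend the unpacker stashed them at 0x40 and 0x44.
mem = bytearray(0x100)
struct.pack_into("<II", mem, 0x40, 0xF1000010, 0xF1000020)

hits = locate_imports(bytes(mem), vended)
```

Scanning at every byte offset rather than only aligned ones is the conservative choice here, since nothing guarantees where a packer places the table.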

We expect each value to show up in memory exactly once, in the unpacked executable's IAT. (What if a value happens to be found in memory more than once because it accidentally collides with some unrelated sequence of bytes? This hasn't come up yet, but one idea I had is to emulate the unpack sequence twice with different memory layouts, so that the GetProcAddress calls return different values, and then compare the two runs for overlap.)
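The two-run comparison might look like the following sketch. It assumes the unpacked image lands at the same offsets in both runs, so only the vended addresses differ; the function name and the sample hit lists are hypothetical.

```python
def stable_slots(hits_a: dict, hits_b: dict) -> dict:
    """Keep only the offsets where an import was found in both runs."""
    return {
        name: sorted(set(offsets) & set(hits_b.get(name, [])))
        for name, offsets in hits_a.items()
    }


# Run A found ShowWindow's address at two places; 0x90 is a chance collision.
hits_a = {"user32.dll!ShowWindow": [0x44, 0x90]}
# Run B vended a different address, so the collision does not repeat.
hits_b = {"user32.dll!ShowWindow": [0x44]}

slots = stable_slots(hits_a, hits_b)
```

A coincidental byte match depends on the specific address value, so it is unlikely to appear at the same offset in both runs, while the real IAT slot always does.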

Once we have the IAT address, we can construct an IDT that tells the loader which functions correspond to which addresses, and stuff that new IDT into our executable as additional data. With a few minor adjustments to the exe headers, the unpacker now produces an exe that llvm-objdump can understand, and which Ghidra renders complete with system function calls like:

00428c37 6a 05         PUSH     0x5
00428c39 50            PUSH     EAX
00428c3a ff 15 b0      CALL     dword ptr [->USER32.DLL::ShowWindow]
         20 43 00
00428c40 ff 76 10      PUSH     dword ptr [ESI + 0x10]
00428c43 ff 15 ac      CALL     dword ptr [->USER32.DLL::UpdateWindow]
         20 43 00
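For reference, the IDT construction described above can be sketched as packing an array of 20-byte import descriptors, one per DLL, ended by an all-zero descriptor. The field layout is the PE format's import descriptor; the build_idt name and the RVAs are made up, and this is an illustration rather than retrowin32's actual code. Because the dumped IAT already holds resolved addresses instead of name pointers, this sketch assumes a separate lookup table of name RVAs is written alongside it and referenced from the first field.

```python
import struct

# OriginalFirstThunk (lookup table RVA), TimeDateStamp, ForwarderChain,
# Name (RVA of the DLL name string), FirstThunk (RVA of the IAT slots).
DESCRIPTOR = struct.Struct("<5I")


def build_idt(entries: list) -> bytes:
    """entries: (lookup_table_rva, dll_name_rva, iat_rva) per DLL."""
    out = bytearray()
    for lookup_rva, name_rva, iat_rva in entries:
        out += DESCRIPTOR.pack(lookup_rva, 0, 0, name_rva, iat_rva)
    out += DESCRIPTOR.pack(0, 0, 0, 0, 0)  # terminating null descriptor
    return bytes(out)


# Hypothetical RVAs: new data appended at 0x50000, IAT found by the scan.
idt = build_idt([(0x50000, 0x50100, 0x320AC)])
first = DESCRIPTOR.unpack_from(idt, 0)
```

The import data directory entry in the exe headers then has to point at wherever these bytes end up in the file.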