Tech Notes: Polymorphic shellcode

Polymorphic shellcode

March 08, 2023

As part of my emulator explorations I went down a curiosity rabbit hole about polymorphic shellcode and learned about something fun I'd like to share with you!

The world of exploits is pretty sophisticated and if you're not into it it can be hard to follow. (I recently saw some Twitter poll go by that was like "does anyone actually understand any posts from the Project Zero blog?") I personally love reading about it — my earliest computer wizard friends came from the hacking scene — but in this post I aim to be high level enough that an interested reader who isn't as into it could follow.

Shellcode is roughly "the code the attacker wants to inject into the attacked system". It is typically a small snippet just used as the initial stage of an attack that bootstraps into remote access or downloading more code. And all I knew about polymorphic shellcode is that it refers to shellcode that somehow changes its form to be more difficult to identify. So here's what I've learned!

Why polymorphism? One defense against exploits is to create a database of known shellcode signatures (like, literally the bytes of the shellcode), and then somehow respond if you spot those bytes, by e.g. having a router examine packets and drop the network connection before relaying them. If the attack doesn't have a recognizable signature it's harder to stop. Another reason is that sometimes the mechanism of attack means you can't use some particular bytes — for example, an attack via an HTTP header might be unable to use the byte values of \r or \n in the shellcode.

One idea for producing polymorphism is encoding the shellcode in some encrypted form and decoding that payload on the fly when it runs. To avoid particular forbidden bytes, you can just randomly try different encryption keys until the output is the form you need. But it seems to have an obvious bootstrapping problem: you need the attacked system to execute the decoder to get the whole attack started, which means the decoder itself must be sent unencrypted. The trick that makes this work is that the decoder can be smaller than the full shellcode. So the full attack sequence then becomes:

Decoder decodes shellcode
Shellcode establishes some sort of foothold for remote access, e.g. downloading more code
Full attack is now deployed

Metaspoit's "shikata ga nai" (SGN), which is described in this presentation, uses this technique. It produces code that looks something like this:

int key = an initial fixed number;
int* code = address of the encoded payload below;
int len = the encoded payload length;
loop:
  *code = *code ^ key;  // xor-decode some payload code
  code++;  // advance to next block
  key = key + *code;  // evolve key
if (--len > 0) goto loop;
...shellcode encoded payload here...

In words and not code, the decoder loop runs xors over the encoded payload. And that encoded payload is in memory directly after the loop's code, so once the loop finishes decoding, the next instruction executed will be the first instruction of the now decoded payload.

This decoder can be really small (~nine x86 instructions). The SGN generator for it represents it in an abstract form such that it can generate it in many different but equivalent forms to evade detection or forbidden bytes.

For some examples, the SGN encoder can use different registers for the various variables, it can write addition as subtracting a negative, and it can do the code++ earlier and adjust the math in other expressions to accommodate for it. And further, it represents the steps of the code (the initialization logic, which is more complex than sketched out here) in a dependency graph such that it can reorder parts that aren't interdependent. For example, you can imagine swapping the first and second lines of the above code.

(One more advanced question I had was: how do you get the initial value of code, given that it needs the address of this code's own instruction stream wherever it ends up on the system it's attacking? It is not a constant, but must instead be computed. It turns out this is a thing that is mostly solved in exploit development; hilariously there's a floating-point instruction that you can use to get there...)

SGN has one final cute trick. The decode loop is fixed enough that it might be possible to detect, even with the above obfuscation — it's some sort of xor followed by a loop. But the encoder actually hides the tail of the loop itself within the encoded payload! Look at the above code: the very first iteration of the decoding loop runs (the xor) before the loop actually loops. As long as that first iteration decodes the end of the loop instruction itself, the rest of the decoding process works. So the actual unencrypted part of the payload is the code only up to roughly where the if statement appears in the above pseudocode. Pretty clever!

That is it for my high-level summary. You can see some sample assembly output in slides linked above from 63 onward. The code itself is pretty short but has more subtlety to it. It turns out there is a decent quantity of research investigating ways to detect it.