34C3 CTF nope writeup - diluted shellcodes

A quick writeup for nope from 34C3.

Overview

The challenge was an x64 Linux binary with a simple premise, it reads in and executes our shellcode. But there’s a twist:

Available opcodes

First I generated all the available opcodes with disasm from pwntools, filtered out the bad ones, created a python dict and manually removed some that are privileged or otherwise obviously useless for us. The result can be found here.

dis = [disasm(p8(i) + '\x90' * 15).splitlines()[0] for i in range(256)]
nobad = filter(lambda x: '(bad)' not in x, dis)
inst_lst = map(lambda x: re.findall(r'''.*:\s+([0-9a-f]{2})[\d\s]+(.*)''', x), nobad)
inst = {re.sub(r'''\s+''', ' ', x[0][1]): p8(int(x[0][0],16)) for x in inst_lst}

What can immediately be seen is that there’s no way we can make a syscall with just these opcodes as there’s no syscall, sysenter or int 0x80.

Execution environment

When our code starts running, we have rsp set to an RW page and rcx is set to the address of the region containing our code, which also happens to be the only thing executable in the address space (well, also vsyscall, but that’s not going to help here). The shellcode is prepended with the unmapping/mprotecting stub, then an exit_group syscall invocation is added at the end, followed by nops until the end of the vm region. Both of these are separated by a random number of nops from our code, and the final mprotect call of the stub makes the stub itself read-only. So we are left with our diluted input and the final exit_group syscall. This, combined with the fact that we cannot inject syscall instructions means we have to solve the challenge by hijacking the syscall instruction meant to execute exit_group and invoke an execve with the right arguments.

Building blocks

To do that, we need to build more useful primitives from the limited instruction set we can use.

Emulating mov

There are single-byte push/pop instructions for the 8 registers extended from x86, these can be used to emulate mov.

 'pop rax': 'X',
 'pop rbp': ']',
 'pop rbx': '[',
 'pop rcx': 'Y',
 'pop rdi': '_',
 'pop rdx': 'Z',
 'pop rsi': '^',
 'pop rsp': '\\',
 'push rax': 'P',
 'push rbp': 'U',
 'push rbx': 'S',
 'push rcx': 'Q',
 'push rdi': 'W',
 'push rdx': 'R',
 'push rsi': 'V',
 'push rsp': 'T',

Arithmetic

We have a limited set of arithmetic operations (see below) but they are enough to build any value in eax/al.

 'adc al,0x90': '\x14',
 'adc eax,0x90909090': '\x15',
 'add al,0x90': '\x04',
 'add eax,0x90909090': '\x05',
 'clc ': '\xf8',
 'sbb al,0x90': '\x1c',
 'sbb eax,0x90909090': '\x1d',
 'stc ': '\xf9',
 'sub al,0x90': ',',
 'sub eax,0x90909090': '-',

Setting the carry flag, then adding 0x90 to al with carry, then subtracting 0x90 without carry will add 1 to al. The same can be done in eax but this will zero the higher double word of rax.

Writing strings

There are convenient stos instructions to use that can be combined with the previous arithmetic primitive to buld arbitrary strings on the stack.

 'stos BYTE PTR es:[rdi],al': '\xaa',
 'stos DWORD PTR es:[rdi],eax': '\xab',

Building arbitrary 64-bit values

As mentioned previously, 32-bit arithmetic will zero out the higher dw, so creating arbitrary 64-bit values is a bit more tricky. I started working on this primitive by using 32-bit arithmetic, writing them to the stack in two part with stos DWORD PTR es:[rdi],eax and then getting them via pop but realized that it’s not necessary for a functioning exploit.

Finding the executable syscall inst

There’s a random amount of nop padding to make finding the syscall instruction harder. We can locate it using an available scas instruction.

 'scas al,BYTE PTR es:[rdi]': '\xae',

The exploit

The final, commented exploit puts these building blocks together to invoke an execve syscall with ['/bin/sh\0', '-c\0', cmd + '\0'])) as the argv array.