Now let’s remove everything unnecessary. First of all, the section headers are not needed, so we can simply remove them. With the section headers gone, we can also remove the string tables, since they were only referenced from the section headers.
We can also remove most of the segments and only leave a single PT_LOAD segment which holds the code and the DYNAMIC data. And we can remove most of the Elf64_Dyn entries, since they are not needed.
Now we only have two Program Headers:
PT_LOAD which loads the whole file to address 0
PT_DYNAMIC which points to the
We still have a few
DT_INIT for our initializer
But this is still too large.
We can start to overlap data. The last Program Header doesn’t need a value in p_memsz since it is not a PT_LOAD segment. We could start our Elf64_Dyn list in this field, such that it overlaps with the Program Header.
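To see how much this overlap can save, here is a small sketch that lays out the Elf64_Phdr fields and measures what lies from p_memsz to the end of the header. The helper `field_offsets` is our own illustration and not part of golf.c; the field sizes are the standard ELF64 ones.

```python
import struct

# Elf64_Phdr fields in file order, with struct codes (I = 4 bytes, Q = 8 bytes).
PHDR_FIELDS = [
    ("p_type", "I"), ("p_flags", "I"), ("p_offset", "Q"), ("p_vaddr", "Q"),
    ("p_paddr", "Q"), ("p_filesz", "Q"), ("p_memsz", "Q"), ("p_align", "Q"),
]

def field_offsets(fields):
    """Hypothetical helper: return ({name: offset}, total_size) for a packed layout."""
    offsets, pos = {}, 0
    for name, code in fields:
        offsets[name] = pos
        pos += struct.calcsize("<" + code)
    return offsets, pos

phdr_off, phdr_size = field_offsets(PHDR_FIELDS)
dyn_size = struct.calcsize("<qQ")            # Elf64_Dyn: d_tag followed by d_val/d_ptr
overlap = phdr_size - phdr_off["p_memsz"]    # bytes from p_memsz to the end of the header

print(phdr_size, phdr_off["p_memsz"], dyn_size, overlap)
```

Under this layout, p_memsz and p_align together are exactly as large as one Elf64_Dyn record, so starting the list in p_memsz lets the first record sit entirely inside the last Program Header.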
We can save 6 more bytes by overlapping the first Program Header with the end of the ELF header, since the last 3 fields in the ELF header are not used.
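The 6 bytes can be checked with a quick calculation. This sketch assumes the standard Elf64_Ehdr layout, where the last three fields are e_shentsize, e_shnum and e_shstrndx (two bytes each).

```python
import struct

# Elf64_Ehdr: e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx
EHDR_FMT = "<16sHHIQQQIHHHHHH"

ehdr_size = struct.calcsize(EHDR_FMT)
unused_tail = struct.calcsize("<HHH")   # e_shentsize, e_shnum, e_shstrndx
phoff = ehdr_size - unused_tail         # where the first Program Header can start

print(ehdr_size, unused_tail, hex(phoff))
```

The resulting offset is the 0x3A that shows up again further down, when the Program Header offset is reused as a jump distance.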
We cannot remove any more Elf64_Dyn records, since removing any of them makes the linker crash.
Second Attempt: golf.c
We now have 229 bytes, but it is still too large.
Preloading golf.so in a debugger reveals that our initializer is called via call rax, so rax holds a pointer to the shellcode. Now it’s time to write some custom shellcode which is smaller. Since we have the address of our shellcode, we can store the /bin/sh string in some unused ELF header field and reference it directly. Since the offset to the Section Headers, e_shoff, is not used, we can just store the string in there.
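A short sketch of why e_shoff is a convenient place: the NUL-terminated string is exactly as long as the field. The offset is derived from the standard Elf64_Ehdr layout; the variable names are ours.

```python
import struct

BINSH = b"/bin/sh\x00"                          # NUL-terminated path for execve

# Size of everything in front of e_shoff:
# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff
E_SHOFF_OFFSET = struct.calcsize("<16sHHIQQ")
E_SHOFF_SIZE = struct.calcsize("<Q")

# The value a tool would report for e_shoff once the string is stored there.
(e_shoff_value,) = struct.unpack("<Q", BINSH)

print(len(BINSH), E_SHOFF_SIZE, hex(E_SHOFF_OFFSET), hex(e_shoff_value))
```

Since the file is loaded to address 0, the string then lives at address 0x28, which is what the shellcode has to reach relative to rax.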
The shellcode now looks like this:
401000:  48 8d b8 5e ff ff ff    lea    rdi,[rax-0xa2]
401007:  31 c0                   xor    eax,eax
401009:  50                      push   rax
40100a:  57                      push   rdi
40100b:  48 89 e6                mov    rsi,rsp
40100e:  50                      push   rax
40100f:  48 89 e2                mov    rdx,rsp
401012:  b0 3b                   mov    al,0x3b
401014:  0f 05                   syscall
Third Attempt: golf.c
This generates a 224-byte golf.so, but that’s still too big. We have to reach 192 bytes!
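The -0xa2 in the lea can be sanity-checked against these numbers. The sketch below assumes that the shellcode sits at the very end of the 224-byte file and that the file is loaded to address 0, so file offsets and addresses coincide; both are our reading of the text, not something taken from golf.c.

```python
import struct

# Shellcode bytes copied from the listing above.
SHELLCODE = bytes.fromhex(
    "488db85effffff"  # lea rdi,[rax-0xa2]
    "31c0"            # xor eax,eax
    "50"              # push rax
    "57"              # push rdi
    "4889e6"          # mov rsi,rsp
    "50"              # push rax
    "4889e2"          # mov rdx,rsp
    "b03b"            # mov al,0x3b
    "0f05"            # syscall
)

FILE_SIZE = 224          # size stated in the text
E_SHOFF_OFFSET = 0x28    # offset of e_shoff in Elf64_Ehdr

# Assumption: the shellcode is the last thing in the file.
shellcode_offset = FILE_SIZE - len(SHELLCODE)

# The lea carries a signed 32-bit displacement after its 3 opcode/ModRM bytes.
(disp,) = struct.unpack("<i", SHELLCODE[3:7])

print(len(SHELLCODE), hex(shellcode_offset), disp, hex(shellcode_offset + disp))
```

Under these assumptions rax plus the displacement lands exactly on e_shoff, where the /bin/sh string is stored.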
Shrinking the file even further is no longer trivial. We now have to overlap everything, and completely get rid of the shellcode section.
There are a few pointers within the ELF header as well as within Program Headers that are never used. We can overwrite them with arbitrary values and the linker won’t care. We can use this to split the shellcode and embed it in those pointers.
Unfortunately, the first instruction (lea rdi,[rax-0xa2]) not only depends on the address of the initializer function, it is also 7 bytes long, so a 2-byte jump does not fit after it in an 8-byte field. Luckily there are some pointers where the exact value of the next field doesn’t matter that much, so we can enter something slightly too big. More specifically, if we put this instruction into the p_paddr field of the DYNAMIC segment, the jmp opcode takes the last byte of that field and the offset for the jmp goes into the p_filesz field. DT_INIT has to point to the p_paddr field. The lea is executed, and then the jump is taken.
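As a rough illustration of what ends up in p_filesz, the sketch below computes the offset byte of such a short jump. Everything about the layout here is an assumption on our side: Program Headers starting at 0x3A with 56 bytes each, PT_DYNAMIC being the second one, the jmp opcode occupying the last byte of its p_paddr field (the field directly before p_filesz), and the jump going to the start of e_entry at 0x18. The helper `rel8` is hypothetical and not taken from golf.c.

```python
def rel8(jmp_addr, target):
    """Offset byte of a 2-byte short jump (EB xx) whose opcode sits at jmp_addr."""
    disp = target - (jmp_addr + 2)       # relative to the instruction after the jump
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for a short jump")
    return disp & 0xFF                   # two's complement, one byte

PHOFF, PHENTSIZE = 0x3A, 56              # assumed Program Header table layout
P_FILESZ = 32                            # offset of p_filesz inside Elf64_Phdr
E_ENTRY = 0x18                           # offset of e_entry inside Elf64_Ehdr

dyn_phdr = PHOFF + PHENTSIZE             # assumed start of the PT_DYNAMIC header
jmp_addr = dyn_phdr + P_FILESZ - 1       # last byte of p_paddr holds the EB opcode
p_filesz_low = rel8(jmp_addr, E_ENTRY)   # first byte of p_filesz

print(hex(dyn_phdr), hex(jmp_addr), hex(p_filesz_low))
```

With these assumed offsets the field would read as a p_filesz of 0x85, which is not a meaningful size but a value that merely has to be tolerated by the loader.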
The next shellcode fragment can be placed in the e_entry field of the ELF header. We can fit 4 shellcode instructions in there, but then we only have one byte left for the next jump. This time we cannot change the next byte, but since the next field is the pointer to the Program Headers (e_phoff) and the Program Headers start at offset 0x3A, we conveniently jump to address 0x5A. This is right in the p_filesz field of the PT_LOAD header. We put another jump there, back to the p_paddr field of the same Program Header. In this field we can finally put the remaining instructions of the shellcode.
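To check that the pieces really fit, here is a sketch that splits the shellcode from the earlier listing into the three fragments described above and measures each one. The field labels are ours, and the lea displacement is simply copied from the earlier listing; it would have to be recomputed for the new location of the instruction.

```python
JMP_SHORT = b"\xeb"   # opcode of a short jump; its offset byte lives in the following field

fragments = {
    # lea rdi,[rax-0xa2] followed by the jump opcode
    "PT_DYNAMIC p_paddr": bytes.fromhex("488db85effffff") + JMP_SHORT,
    # xor eax,eax / push rax / push rdi / mov rsi,rsp, then the jump opcode
    "e_entry": bytes.fromhex("31c0" "50" "57" "4889e6") + JMP_SHORT,
    # push rax / mov rdx,rsp / mov al,0x3b / syscall
    "PT_LOAD p_paddr": bytes.fromhex("50" "4889e2" "b03b" "0f05"),
}

FIELD_SIZE = 8        # each of these ELF64 fields is 8 bytes wide

for field, code in fragments.items():
    print(f"{field:20s} {len(code)} bytes  {code.hex(' ')}")
```

Each fragment comes out at exactly 8 bytes, which is why the first two can only hold the jump opcode and have to borrow the offset byte from the neighbouring field.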
Since we don’t need the shellcode at the end of the file anymore, the last Elf64_Dyn entry ends with a NULL value. We don’t have to store that in our ELF file, since everything past the file end is implicitly zero. We don’t even have to store the whole d_tag of the last Elf64_Dyn record, since only the first byte is nonzero.
With those optimizations, the final ELF file is only 187 bytes large.
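The 187 bytes can be reproduced with a bit of arithmetic. Note that the number of remaining Elf64_Dyn records is an assumption here: three is the count that is consistent with the stated file size, but the text itself does not give it.

```python
EHDR_SIZE, PHDR_SIZE, DYN_SIZE = 64, 56, 16   # standard ELF64 structure sizes
NUM_PHDR = 2                                  # PT_LOAD and PT_DYNAMIC
NUM_DYN = 3                                   # assumption, not stated in the text

# First Program Header overlaps the last 6 bytes of the ELF header.
phoff = EHDR_SIZE - 6

# The Elf64_Dyn list starts in p_memsz of the last Program Header,
# i.e. 16 bytes before that header ends.
dyn_start = phoff + NUM_PHDR * PHDR_SIZE - DYN_SIZE

# All records but the last are stored in full; of the last one only the
# first byte of d_tag is kept, the rest is implicitly zero.
file_size = dyn_start + (NUM_DYN - 1) * DYN_SIZE + 1

print(hex(phoff), hex(dyn_start), file_size)
```

With three records this gives 187 bytes, matching the size of the final file.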
Final Solution: golf.c
Interestingly enough, IDA Pro refuses to load this ELF file.
golf.so file, we finally get our flags: