Write-Up: Misc - Appnote.txt from Google CTF 2022

Challenge

You are given a ZIP archive that hides the flag somewhere, but unpacking it the conventional way does not yield anything useful.

Looking at the file directly with a hex editor reveals:

the file hello.txt, containing “There’s more to it than meets the eye…”
the file hi.txt, containing “Find a needle in the haystack…”
a collection of files named flagXX, where XX is a number from 00 to 18, each occurring 36 times. Extracting all the “text” from the copies of any one flagXX file gives “abcdefghijklmnopqrstuvwxyz{CTF0137}_”, once per flagXX file.

This on its own does not bring us closer to the flag. The clue is at the end of the file, where there is a whole series of ZIP End of Central Directory (EOCD) records.

Here is a hexdump of the EOCDs:

0000eecc: 504b 0506 0000 0000 0100 0100 00ee 0000  PK..............
0000eedc: cc00 0000 b801 504b 0506 0000 0000 0100  ......PK........
0000eeec: 0100 5ae4 0000 880a 0000 a201 504b 0506  ..Z.........PK..
0000eefc: 0000 0000 0100 0100 93d7 0000 6517 0000  ............e...
0000ef0c: 8c01 504b 0506 0000 0000 0100 0100 ccca  ..PK............
0000ef1c: 0000 4224 0000 7601 504b 0506 0000 0000  ..B$..v.PK......
0000ef2c: 0100 0100 69bf 0000 bb2f 0000 6001 504b  ....i..../..`.PK
0000ef3c: 0506 0000 0000 0100 0100 ceb6 0000 6c38  ..............l8
0000ef4c: 0000 4a01 504b 0506 0000 0000 0100 0100  ..J.PK..........
0000ef5c: 29a5 0000 274a 0000 3401 504b 0506 0000  )...'J..4.PK....
0000ef6c: 0000 0100 0100 e79c 0000 7f52 0000 1e01  ...........R....
0000ef7c: 504b 0506 0000 0000 0100 0100 428b 0000  PK..........B...
0000ef8c: 3a64 0000 0801 504b 0506 0000 0000 0100  :d....PK........
0000ef9c: 0100 2186 0000 7169 0000 f200 504b 0506  ..!...qi....PK..
0000efac: 0000 0000 0100 0100 7173 0000 377c 0000  ........qs..7|..
0000efbc: dc00 504b 0506 0000 0000 0100 0100 6670  ..PK..........fp
0000efcc: 0000 587f 0000 c600 504b 0506 0000 0000  ..X.....PK......
0000efdc: 0100 0100 e359 0000 f195 0000 b000 504b  .....Y........PK
0000efec: 0506 0000 0000 0100 0100 ac52 0000 3e9d  ...........R..>.
0000effc: 0000 9a00 504b 0506 0000 0000 0100 0100  ....PK..........
0000f00c: a247 0000 5ea8 0000 8400 504b 0506 0000  .G..^.....PK....
0000f01c: 0000 0100 0100 8e33 0000 88bc 0000 6e00  .......3......n.
0000f02c: 504b 0506 0000 0000 0100 0100 9a2a 0000  PK...........*..
0000f03c: 92c5 0000 5800 504b 0506 0000 0000 0100  ....X.PK........
0000f04c: 0100 161c 0000 2cd4 0000 4200 504b 0506  ......,...B.PK..
0000f05c: 0000 0000 0100 0100 3815 0000 20db 0000  ........8... ...
0000f06c: 2c00 504b 0506 0000 0000 0100 0100 2f02  ,.PK........../.
0000f07c: 0000 3fee 0000 1600 504b 0506 0000 0000  ..?.....PK......
0000f08c: 0100 0100 34f0 0000 5000 0000 0000       ....4...P.....
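
A dump like this can also be located programmatically instead of by eye. The following is a minimal sketch, assuming it is enough to search for the raw EOCD signature; the helper name find_eocd_offsets is made up for illustration, and the sample buffer is synthetic rather than taken from the challenge file.

```python
import re

EOCD_SIGNATURE = b"PK\x05\x06"

def find_eocd_offsets(data: bytes) -> list[int]:
    """Return the offset of every EOCD signature found in data."""
    return [m.start() for m in re.finditer(re.escape(EOCD_SIGNATURE), data)]

# Synthetic buffer (not the challenge file): 4 filler bytes, then two
# back-to-back 22-byte records consisting of the signature plus zero padding.
record = EOCD_SIGNATURE + bytes(18)
sample = b"\xaa" * 4 + record + record
print([hex(offset) for offset in find_eocd_offsets(sample)])
```

Run against the real archive, this should report offsets starting at 0xeecc and spaced 22 bytes apart, matching the hexdump.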

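Each record begins with the signature “PK\x05\x06”, followed by 8 bytes of disk and entry counts that we don’t care about in this case, then 4 bytes giving the “size” and 4 bytes giving the “offset” of the central directory it points to (both little-endian), and finally a 2-byte comment length.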
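
As a minimal sketch of that layout, the fixed 22-byte part of a record can be unpacked with Python’s struct module. The helper name parse_eocd and the dictionary keys are illustrative choices; the sample bytes are the first record from the hexdump above.

```python
import struct

# Fixed part of an EOCD record, little-endian: 4-byte signature, four 2-byte
# counters, 4-byte central directory size, 4-byte offset, 2-byte comment length.
EOCD = struct.Struct("<4s4HIIH")

def parse_eocd(record: bytes) -> dict:
    """Unpack one EOCD record (hypothetical helper, for illustration only)."""
    (sig, _disk, _cd_disk, _entries_here, _entries_total,
     size, offset, comment_len) = EOCD.unpack(record[:EOCD.size])
    if sig != b"PK\x05\x06":
        raise ValueError("not an EOCD record")
    return {"size": size, "offset": offset, "comment_len": comment_len}

# First EOCD from the hexdump, located at 0xeecc.
first = bytes.fromhex("504b0506" "00000000" "01000100" "00ee0000" "cc000000" "b801")
fields = parse_eocd(first)
print({name: hex(value) for name, value in fields.items()})
```

For this record the size is 0xee00 and the offset is 0xcc, so the region it describes ends at 0xcc + 0xee00 = 0xeecc, which is exactly where the record itself starts.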

The file still counts as “legally encoded”, that is, as a proper ZIP file, because each EOCD declares a comment field long enough to contain all the EOCDs that follow it.
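
This nesting can be checked against the comment lengths in the hexdump. The sketch below assumes every record is exactly 22 bytes and that the comments hold nothing but further records; the helper name is hypothetical, and the values are the comment lengths of the first three records and the last one as read from the dump.

```python
EOCD_SIZE = 22  # fixed EOCD length, not counting the comment itself

def records_in_comment(comment_len: int) -> int:
    """Number of whole EOCD records that fit into a comment of this length."""
    if comment_len % EOCD_SIZE:
        raise ValueError("comment is not a whole number of EOCD records")
    return comment_len // EOCD_SIZE

# Little-endian comment lengths from the hexdump: b801, a201, 8c01, ..., 0000
for comment_len in (0x01B8, 0x01A2, 0x018C, 0x0000):
    print(hex(comment_len), "->", records_in_comment(comment_len), "records follow")
```

Each comment length is 22 smaller than the previous one, which is what one would expect if every record’s comment swallows all the records after it.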

However, if we use each record’s offset and size to carve the corresponding “subsection” out of the file and extract the text from it, we recover this series of strings:

abcdefghijklmnopqrstuvwxyz{CTF0137}_
TF0137}_
F0137}_
0137}_
abcdefghijklmnopqrstuvwxyz{CTF0137}_
CTF0137}_
qrstuvwxyz{CTF0137}_
137}_
tuvwxyz{CTF0137}_
}_
nopqrstuvwxyz{CTF0137}_
137}_
efghijklmnopqrstuvwxyz{CTF0137}_
7}_
stuvwxyz{CTF0137}_
opqrstuvwxyz{CTF0137}_

{CTF0137}_
37}_
qrstuvwxyz{CTF0137}_
_

This is already almost our flag. If we assume the flag format CTF{something}, we can infer two things:

the first line is a sort of “dictionary”
each further line represents one character of the flag, namely the character that comes directly before that line’s first character in the “dictionary” (for example, “TF0137}_” starts at T, so it encodes C)
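
The second rule can be sketched as follows. The sketch relies on every carved string being a suffix of the dictionary, so its length alone tells us where it starts; the function name preceding_char is illustrative, and only a handful of the strings listed above are used.

```python
DICTIONARY = "abcdefghijklmnopqrstuvwxyz{CTF0137}_"

def preceding_char(suffix: str, dictionary: str = DICTIONARY) -> str:
    """Return the dictionary character that sits directly before suffix."""
    if not dictionary.endswith(suffix):
        raise ValueError("carved string is not a suffix of the dictionary")
    start = len(dictionary) - len(suffix)  # index where the suffix begins
    if start == 0:
        raise ValueError("nothing precedes the full dictionary")
    return dictionary[start - 1]

# A few of the carved strings from the list above; "" stands for the empty line.
for line in ["TF0137}_", "F0137}_", "0137}_", "CTF0137}_", "", "_"]:
    print(repr(line), "->", preceding_char(line))
```

Note that the empty line is not a gap in the data: nothing follows the last dictionary character, so an empty string encodes “_”.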

So, I created a script with that knowledge to extract the flag from the file:

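import re

# Name of a "flagXX" entry, its one-byte content, then the next central directory header
rex = b"flag[0-9][0-9](.)PK\x01\x02"
# EOCD signature, 8 bytes we skip, then the 4-byte size and the 4-byte offset
rexf = b"PK\x05\x06[\x00-\xFF]{8}([\x00-\xFF]{4})([\x00-\xFF]{4})"
patt = b""
first = True

def collect(data):
    global first, patt
    if first:
        # Dictionary pass: collect the content byte of every flagXX entry in the first region
        for datum in re.findall(rex, data):
            patt += datum
        # Put the last byte in front so that the first dictionary byte also has a predecessor
        patt = patt[-1:] + patt
        first = False
        return ""
    # First content byte found in this region; the flag character is the one before it
    datum = re.findall(rex, data)[0]
    pos = patt.rfind(datum)
    return patt[pos-1:pos].decode("ASCII")

with open("dump.zip", "rb") as dump:
    data = dump.read()
    for size_b, offset_b in re.findall(rexf, data):
        # Both fields are stored little-endian
        size = int.from_bytes(size_b, byteorder="little")
        offset = int.from_bytes(offset_b, byteorder="little")
        c = collect(data[offset:offset+size])
        print(c, end="")
        # The closing brace marks the end of the flag
        if c == "}":
            break
    print()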

This yields the flag CTF{p0s7m0d3rn_z1p}.