team-logo
Published on

DownUnderCTF 2025 - Hungry Hungry Caterpillar

Authors

Description

Just like how the author confused chrysalides for cocoons, I always get the title of this book confused.

NOTE: The flag format is DUCTF{[a-z_]*}

Challenge overview

Challenge contains two files:

Step 1: Analyse the files

In output.txt, we see 7 lines of hex — each one is the result of XOR-ing part of the flag with a random keystream.

Looking at challenge.py, we find the part:

flag = open("flag.txt", "rb").read()
assert flag[1] == ord("U")
flag += os.urandom(len(flag) * 6)
keystream = os.urandom(len(flag))

Observations:

  • The original flag is padded: final length = 7× original length.
  • XOR is applied with a keystream of equal (padded) length.
  • Assertion reveals the second character is U.

The output lines were generated like this:

xor(flag[::i], keystream).hex()  # i = 1 to 7

Each line uses a different step: 1, 2, ..., 7 — skipping letters accordingly.

This means:

  • Line 1 reveals flag[0], flag[1], flag[2]...
  • Line 2 reveals flag[0], flag[2], flag[4]...
  • etc.

Base knowledge on how XOR works:

C = A ^ B  =>  A = B ^ C

So if we have any two of the three values, we can find the third.

Reversing XOR with Known Prefix

We know the flag starts with DUCTF{. That gives us:

keystream[0] = ord('D') ^ int('f3', 16)
keystream[1] = ord('U') ^ int('c9', 16)
...

I wrote a quick helper function to extract the prefix of the keystream:

def get_keystream_prefix(prefix_flag: list[int], first_line: list[int]) -> list[int]:
    return list(xor(prefix_flag, first_line))

Step 2: Stitching the Flag Together

To track progress, I created a placeholder-filled list equal to the full padded flag length.

full_flag = [ MISSING_VAL for _ in range(flag_length) ]

Each output line reveals flag characters at intervals:

  • Line 2 reveals f\[0], f\[2], f\[4], ...
  • Line 3 reveals f\[0], f\[3], f\[6], ...

We use this to fill in as many characters as possible:

for val in result:
  temp_flag += val
  if idx > 0:
    temp_flag += ''.join([MISSING_VAL for _ in range(idx)])

This resulted in output like:

D?C?F?t?e?h?
D??T??t??_??n??y??
D???F???e???n???_???t???
...

Bringing us to a partial flag:

DUCTF{the_h?n?ry_?i?tl??p_??o?t????t???????????????????????????????????

Step 3: Prime numbers

Since each output line is based on a specific step size from 1 to 7, characters at indices divisible by these values can be reliably recovered. However, positions corresponding to prime numbers (e.g., 11, 13, 17...) are only sparsely covered or missed entirely.

To resolve this, I supplemented the decoding process with manual guessing. I focused on inserting two new characters at a time into the known parts of the flag, then re-ran the analysis to validate the structure.

To reduce the search space, I used the following assumptions:

  • Uppercase characters appear only in the prefix DUCTF
  • Allowed characters in the flag are limited to lowercase letters, {, }, and underscores

Below is the core loop I used to test candidate character pairs:

for letterA in alphabet:
  for letterB in alphabet:
    starting_flag = "DUCTF{the_h" + letterA + "n" + letterB
    full_flag = [ MISSING_VAL for _ in range(flag_length) ]

    for idx, val in enumerate(starting_flag):
      full_flag[idx] = val
    
    for _ in range(MAX_ROUNDS):
      search_flag(full_flag, int_lines)

After refining guesses and cross-validating against all XOR outputs...

Recovered flag is: DUCTF{the_hungry_little_p_smooth_caterpillar_wow_an_allegory_for_life}

Attachement

My full script:

import os
import string
alphabet = list(string.ascii_lowercase)
alphabet.append("_")
MISSING_VAL = "*"

MAX_ROUNDS = 1

# flag[1] is ord("U") -> so this is the prefix DUCTF
def xor(a, b):
    return bytes(left ^ right for left, right in zip(a, b))

def prefix_to_int_list(flag: str) -> list[int]:
    return [ ord(letter) for letter in flag ]

def get_keystream_prefix(prefix_flag: list[int], first_line: list[int]) -> list[int]:
    return list(xor(prefix_flag, first_line))

def convert_to_text(num_arr: list[int]) -> str:
    return "".join([ chr(val) for val in num_arr ])

def is_valid(character: chr) -> bool:
    return character != MISSING_VAL

def get_input_lines(filename: str) -> list[list[int]]:
    int_lines = []
    with open("input.txt", "r") as file:
        for line in file:
            bytes_arr = bytes.fromhex(line)
            int_lines.append(list(bytes_arr))
    return int_lines
 
def is_valid_flag(flag: str) -> bool:
    for letter in flag[6:-1]:
        if (letter.isalpha() and letter.isupper()):
            return False
        if (not letter.isalpha()) and (letter not in ["_", "*"]):
            return False
    return True

def search_flag(full_flag: str, int_lines: list[list[int]]) -> None:
    flag_string = "".join(full_flag)
    prefix_flag = flag_string.split(MISSING_VAL)[0]
    prefix_int_arr = prefix_to_int_list(prefix_flag)
    
    first_line = int_lines[0]
    keystream_prefix = get_keystream_prefix(first_line, prefix_int_arr)
    lines = []

    for idx, line in enumerate(int_lines):
        result = list(xor(line, keystream_prefix))
        text = convert_to_text(result)
        
        temp_flag = ""
        for val in text:
            temp_flag += val
            if idx > 0:
                temp_flag += "".join([MISSING_VAL for _ in range(idx)])
        
        for idx, val in enumerate(temp_flag):
            if (idx < len(prefix_flag)) or (idx >= len(full_flag)):
                continue
             
            if full_flag[idx] == MISSING_VAL:
                full_flag[idx] = val
            elif is_valid(val) and (full_flag[idx] != val):
                print(f"Collision at idx: { idx } of { full_flag[idx] } with val: { val }")
    part_flag = ("".join(full_flag)).strip()
    if is_valid_flag(part_flag):
        print(f"Part flag: { part_flag }")

def main():
    int_lines = get_input_lines("input.txt")

    flag_length = int(len(int_lines[0]) / 7) # It is the flag + len(flag) * 6 random characters
    print(f"Flag length is: { flag_length }")

    for letterA in alphabet:
        for letterB in alphabet:
            starting_flag = "DUCTF{the_h" + letterA + "n" + letterB
            full_flag = [ MISSING_VAL for _ in range(flag_length) ]

            for idx, val in enumerate(starting_flag):
                full_flag[idx] = val
            
            for _ in range(MAX_ROUNDS):
                search_flag(full_flag, int_lines)

if __name__ == "__main__":
    main()