- Published on
DownUnderCTF 2025 - Hungry Hungry Caterpillar
- Authors
- Name
- agnik
Description
Just like how the author confused chrysalides for cocoons, I always get the title of this book confused.
NOTE: The flag format is
DUCTF{[a-z_]*}
Challenge overview
Challenge contains two files:
Step 1: Analyse the files
In output.txt
, we see 7 lines of hex — each one is the result of XOR-ing part of the flag with a random keystream.
Looking at challenge.py
, we find the part:
flag = open("flag.txt", "rb").read()
assert flag[1] == ord("U")
flag += os.urandom(len(flag) * 6)
keystream = os.urandom(len(flag))
Observations:
- The original flag is padded: final length = 7× original length.
- XOR is applied with a keystream of equal (padded) length.
- Assertion reveals the second character is
U
.
The output lines were generated like this:
xor(flag[::i], keystream).hex() # i = 1 to 7
Each line uses a different step: 1, 2, ..., 7 — skipping letters accordingly.
This means:
- Line 1 reveals flag[0], flag[1], flag[2]...
- Line 2 reveals flag[0], flag[2], flag[4]...
- etc.
Base knowledge on how XOR works:
C = A ^ B => A = B ^ C
So if we have any two of the three values, we can find the third.
Reversing XOR with Known Prefix
We know the flag starts with DUCTF{
. That gives us:
keystream[0] = ord('D') ^ int('f3', 16)
keystream[1] = ord('U') ^ int('c9', 16)
...
I wrote a quick helper function to extract the prefix of the keystream:
def get_keystream_prefix(prefix_flag: list[int], first_line: list[int]) -> list[int]:
return list(xor(prefix_flag, first_line))
Step 2: Stitching the Flag Together
To track progress, I created a placeholder-filled list equal to the full padded flag length.
full_flag = [ MISSING_VAL for _ in range(flag_length) ]
Each output line reveals flag characters at intervals:
- Line 2 reveals
f\[0], f\[2], f\[4], ...
- Line 3 reveals
f\[0], f\[3], f\[6], ...
We use this to fill in as many characters as possible:
for val in result:
temp_flag += val
if idx > 0:
temp_flag += ''.join([MISSING_VAL for _ in range(idx)])
This resulted in output like:
D?C?F?t?e?h?
D??T??t??_??n??y??
D???F???e???n???_???t???
...
Bringing us to a partial flag:
DUCTF{the_h?n?ry_?i?tl??p_??o?t????t???????????????????????????????????
Step 3: Prime numbers
Since each output line is based on a specific step size from 1 to 7, characters at indices divisible by these values can be reliably recovered. However, positions corresponding to prime numbers (e.g., 11, 13, 17...) are only sparsely covered or missed entirely.
To resolve this, I supplemented the decoding process with manual guessing. I focused on inserting two new characters at a time into the known parts of the flag, then re-ran the analysis to validate the structure.
To reduce the search space, I used the following assumptions:
- Uppercase characters appear only in the prefix
DUCTF
- Allowed characters in the flag are limited to lowercase letters,
{
,}
, and underscores
Below is the core loop I used to test candidate character pairs:
for letterA in alphabet:
for letterB in alphabet:
starting_flag = "DUCTF{the_h" + letterA + "n" + letterB
full_flag = [ MISSING_VAL for _ in range(flag_length) ]
for idx, val in enumerate(starting_flag):
full_flag[idx] = val
for _ in range(MAX_ROUNDS):
search_flag(full_flag, int_lines)
After refining guesses and cross-validating against all XOR outputs...
Recovered flag is: DUCTF{the_hungry_little_p_smooth_caterpillar_wow_an_allegory_for_life}
Attachement
My full script:
import os
import string
alphabet = list(string.ascii_lowercase)
alphabet.append("_")
MISSING_VAL = "*"
MAX_ROUNDS = 1
# flag[1] is ord("U") -> so this is the prefix DUCTF
def xor(a, b):
return bytes(left ^ right for left, right in zip(a, b))
def prefix_to_int_list(flag: str) -> list[int]:
return [ ord(letter) for letter in flag ]
def get_keystream_prefix(prefix_flag: list[int], first_line: list[int]) -> list[int]:
return list(xor(prefix_flag, first_line))
def convert_to_text(num_arr: list[int]) -> str:
return "".join([ chr(val) for val in num_arr ])
def is_valid(character: chr) -> bool:
return character != MISSING_VAL
def get_input_lines(filename: str) -> list[list[int]]:
int_lines = []
with open("input.txt", "r") as file:
for line in file:
bytes_arr = bytes.fromhex(line)
int_lines.append(list(bytes_arr))
return int_lines
def is_valid_flag(flag: str) -> bool:
for letter in flag[6:-1]:
if (letter.isalpha() and letter.isupper()):
return False
if (not letter.isalpha()) and (letter not in ["_", "*"]):
return False
return True
def search_flag(full_flag: str, int_lines: list[list[int]]) -> None:
flag_string = "".join(full_flag)
prefix_flag = flag_string.split(MISSING_VAL)[0]
prefix_int_arr = prefix_to_int_list(prefix_flag)
first_line = int_lines[0]
keystream_prefix = get_keystream_prefix(first_line, prefix_int_arr)
lines = []
for idx, line in enumerate(int_lines):
result = list(xor(line, keystream_prefix))
text = convert_to_text(result)
temp_flag = ""
for val in text:
temp_flag += val
if idx > 0:
temp_flag += "".join([MISSING_VAL for _ in range(idx)])
for idx, val in enumerate(temp_flag):
if (idx < len(prefix_flag)) or (idx >= len(full_flag)):
continue
if full_flag[idx] == MISSING_VAL:
full_flag[idx] = val
elif is_valid(val) and (full_flag[idx] != val):
print(f"Collision at idx: { idx } of { full_flag[idx] } with val: { val }")
part_flag = ("".join(full_flag)).strip()
if is_valid_flag(part_flag):
print(f"Part flag: { part_flag }")
def main():
int_lines = get_input_lines("input.txt")
flag_length = int(len(int_lines[0]) / 7) # It is the flag + len(flag) * 6 random characters
print(f"Flag length is: { flag_length }")
for letterA in alphabet:
for letterB in alphabet:
starting_flag = "DUCTF{the_h" + letterA + "n" + letterB
full_flag = [ MISSING_VAL for _ in range(flag_length) ]
for idx, val in enumerate(starting_flag):
full_flag[idx] = val
for _ in range(MAX_ROUNDS):
search_flag(full_flag, int_lines)
if __name__ == "__main__":
main()