DownUnderCTF 2022 – ezpz-rev

This was a fun little break from all the school work piling up. School is tiring, just like something else...

Challenge Description

CTFing is tiring. Take a break with this easy puzzle!
Note: The binary and server in this challenge are the same as "ezpz-pwn".

The description is rather terse. But it seems like the same binary is used for two challenges.

The binary is available on the DownUnderCTF repository. If you want to have a stab at the challenge, go for it!

Writeup

As with all reverse challenges, let’s first throw the executable into a decompiler and see what crops up. I'll be using Ghidra, a free open-source decompiler perfect for ezpz challenges.

Navigating to main, we see a function call, a gets (for reading user input), and a chain of four if-statements barring our way to some statements which print out the contents of flag-rev.txt (presumably the flag for this reverse challenge).

The most important functions to crack are the four guarding the file read. But we’ll take a look at the initialisation procedures first since they provide some helpful context towards understanding the program. Along the way, we’ll incrementally build a z3 logic solver to emulate/reverse each function.

Initialisation

The first initialisation function modifies some options for the stdin/stdout buffers. Pretty standard C-style CTF stuff. The second initialisation function is much more interesting. Here’s a screenshot of the decompiled program:

Decompilers are great and all, but the lack of useful variable names, comments, and readability can be a pain. Here’s a cleaned up version:

char* init_run_length_encoded(char* initcode) {
    char* data = malloc(0xc4);
    int ind = 0;    // Index used to insert into data. Serves the dual purpose of tracking the size of the data.
    for (int i = 0; initcode[i]; i += 2) {
        int length = initcode[i] - '0'; // Convert digit (char) to int.
        char c = initcode[i+1]; // Character to fill.
        
        // Unpack run-length atom into `data`.
        while (length--) {
            data[ind++] = c;
        }
    }
    return data;
}

In effect, the function initialises an array of chars by decoding a run-length encoded string. For example, the string 5a4b3c2d gets expanded to aaaaabbbbcccdd. Of course, the encoded string used is much longer, and it turns out that it fills all 196 (0xc4) chars allocated. There are a few more things I’d like to point out:

196 (0xc4) is a number that pops up a lot in this binary.
Although typical run-length encoded strings may have more than 9 digits packed together (e.g. 12c13e42f), the implementation above assumes a run-length of at most 9 (i.e. one digit).
Interestingly, the characters encoded only range from a to n. We’ll see how these come into play later on.

We can begin writing our solve script! Let’s perform this initialisation and decode the string into a chars variable.

encoded = '5a4b3c2d5a4b4c1d2a1e3a3b4c1d2a1e1f2a3b3c1g1d3e5f1b2c2g1d1f2e4f6g1d7f3h3g1d4f6h3g1d2f1i4j1h2k2l1m1d3i2j5k2l2m2i3j4k3l2m2i3j4k3l2m1i5j2k2n2l2m1i4j5n4l'
chars = ''
for i in range(len(encoded) // 2):
    skip, char = encoded[2*i], encoded[2*i+1]   # Select pair of characters.
    chars += int(skip) * char                   # Unpack run-length atom.

assert len(chars) == 0xc4

The First and Second Guards: Move Along Now, Nothing to See Here

If the initialisation procedures are the appetiser, then the four guards are the main course.

I’ll be using the term guards and tests interchangeably. Guards are a term I’m borrowing from CS and functional programming. It’s basically an elevated if-statement.

Each guard checks if the input satisfies some condition. If the condition passes, the guard function returns 1, else 0. To get to the delicious dessert (the flag :P), we want all four functions to return 1.

Easier said than done. Let’s start with the first function and see what it decompiles into.

Here’s a more readable version:

int test1(char* input) {
    for (int start = 0; start < 196; start += 14) {
        int count = 0;
        for (int i = 0; i < 14; i++) {
            if (input[start + i] == '1')
                count++;
        }
        if (count != 3) {
            return 0;
        }
    }
    return 1;
}

Essentially, it groups input into blocks of 14 chars; then it checks if the number of '1's in each block is exactly three. If any check is false, then the function returns 0, meaning we have failed miserably.

We can easily model this symbolically in z3. We’ll first create 196 symbols, each representing one character in input. Then, we'll create a z3 solver object: this is our interface to z3’s automagical solver. Through the solver, we can add constraints, remove constraints, generate concrete values, etc.

# Create a bunch of symbols.
inputs = [z3.Int(f'g_{i}') for i in range(196)]

# Create a constraint solver object.
s = z3.Solver()

# Constrain possible input. We just care about the 1s, really.
for sym in inputs:
    s.add(z3.Or(sym == 0, sym == 1))

size = 14

# Constraint 1: row-wise constraint.
for i in range(size):
    s.add(z3.Sum(inputs[size*i:size*(i+1)]) == 3)

Our simplest constraint is our input constraint. For this program, we’ll limit our symbols to take on only 0s and 1s. Why? Because it is easy for z3 to sum integer symbols. This way, if we want to count the number of 1s in a list, we can just call z3.Sum on it.

We then add the constraints of the first guard by selecting groups of 14 elements using Python’s slice notation inputs[size*i:size*(i+1)] and constrain it to 3.

Nifty! Let’s see what the second function looks like.

This looks surprisingly similar to the first test. In fact, the only thing that really changed is the order of the loops. Observe how the first function moves along the grid in a row-major fashion, whereas the second function moves in a column-major fashion.

It may begin to dawn on you that the nested loops are hinting at a grid-like structure, a 14-by-14 2D array of characters. This realisation is pretty crucial, as it’ll help us reverse our last two functions much faster.

Let’s add these constraints to our solve script:

# Constraint 2: col-wise constraint.
for i in range(size):
    s.add(z3.Sum(inputs[i::size]) == 3)

inputs[i::size] is Python slice notation to iterate beginning from i and picking every size elements, e.g. [*range(10)][1::3] == [1, 4, 7].

Onto the next function!

The Third Guard: Please Wear Your Mask

Using our vectorised 1D representation, we can simplify the above to:

int test3(char* chars, char* input) {
    int counter[14] = {0};
    // Mask and count characters.
    for (int i = 0; i < 196; i += 1) {
        if (input[i] == '1') {
            counter[chars[i] - 'a']++;  // Increment the count of `chars[i]`.
        }
    }
    // Bad bad if a letter doesn't appear exactly thrice.
    for (int i = 0; i < 14; i++) {
        if (counter[i] != 3) {
            return 0;
        }
    }
    return 1;
}

This initialises an array of 14 integers that serve as counters. This is followed by two loops. In the first loop, we mask the input and initialised characters, then count the number occurrences of each character. In the second loop, we simply check that all counters equal 3, meaning each letter appears exactly three times in the masked string.

By “mask”, I mean something like bitwise-and, but for characters. For example, abcdefg masked with 1011001 would be a cd g, as only those four letters have 1 associated with them. Alternatively, you can think of the 0s as masks hiding letters from sight, just like how face masks hide mouths.

This is the first of the four functions which actually combines both the user input and the character grid used in initialisation. The previous two functions performed checks only on the user input, but now we begin to see how the two relate.

Also recall that the letters initialised range from a to n, and in fact n is the 14th letter of the alphabet. 14 rows, 14 columns, 14 letters, checking if everything equals 3… seems a little sus. Maybe the challenge authors are hinting at the number 42, the answer to life and everything in the universe? Is this the true reverse behind this challenge?

Anyway, let’s try to add these constraints to z3. In proper reverse fashion, instead of first checking the user input like the original function, we’ll start with the letters. We’ll collect the indices containing a letter (say 'a'), then collect all the input symbols associated with those indices, sum them, and constrain them to three.

# Constraint 3: mask constraint.
letters = 'abcdefghijklmn'
for l in letters:
    # Find all indices in chars which have letter `l`, then sum the corresponding Int symbols from `inputs`.
    indices = [i for i, c in enumerate(chars) if c == l]
    s.add(z3.Sum([inputs[i] for i in indices]) == 3)

Great! Three down, one to go.

The Fourth Guard: Wonky Jumps and Guesswork

Here's a snippet of the fourth guard:

The first half is understandable... But the rest... Hoh! What an eye sore! What’s the jump doin— I can’t even— ugnahfpwifhlqufjcjwalfhwowjrvkaufjwp.

As far as my monkey brain understands, the function iterates through every grid cell, and if it equals 1, performs some complicated check in an inner loop. At this point, I tried to understand what the inner loop was doing. Not much success. Looking at the decompiled output, keep in mind that 0xfffffff1 is only the unsigned interpretation. There is also the signed interpretation, which in this case is -15.

I then trained my eyes on the other "magic constants" in the inner loop and focused all my reverse powers by staring the numbers down. -1, -15, -14, -13, 1, 15, 14, 13. Hmm…

Ooh! Perhaps it’s checking adjacent cells orthodiagonally (i.e. orthogonal + diagonal)? If you’ve implemented a Minesweeper game before, you’ll know where I’m going :P. Indexing i - 14 would pick the cell above cell i, since each row is 14 cells wide. Similarly, i + 14 would pick the cell below. i + 15 would pick the bottom-right cell, and i - 1 would pick the left cell. Interesting!

The other complicated checks, I presume, are out-of-bound checks. For example, if we’re at i == 0 (the top-left corner), then the only adjacent cells are i + 1, i + 14, and i + 15. Anything else would go outside the array. After guessing this, I didn’t bother figuring out whether the decompiled loops actually follow this logic. ~~Terrible software engineering practice, but eh, we’re playing reverse, so might as well try it.~~

To the solver! For each symbol, we’ll add constraints saying, “If this cell is 1, its surrounding cells can’t be 1.” We’ll use z3.Implies to encode this logic. We only consider cells orthodiagonal to the current cell and within grid bounds.

# Constraint 4: check orthodiagonal adjacents.
for index, sym in enumerate(inputs):
    x = index % size
    y = index // size

    # Build list of adjacent indices. Bounds checking mania.
    orthodiag = []
    if y > 0:
        orthodiag += [-14]
        if x > 0:
            orthodiag += [-15]
        if x < size - 1:
            orthodiag += [-13]
    if y < size - 1:
        orthodiag += [14]
        if x > 0:
            orthodiag += [13]
        if x < size - 1:
            orthodiag += [15]
    if x > 0:
        orthodiag += [-1]
    if x < size - 1:
        orthodiag += [1]
    
    # If this cell is 1 --> surrounding cells can't be one.
    for od in orthodiag:
        s.add(z3.Implies(sym == 1, inputs[index + od] != 1))

And with that, we’ve defeated the four guards! Huzzah!

The Final Touch

All that’s left is to let z3 solve our constraints and generate concrete values for our symbols. We’ll push these concrete values into a payload variable and print it out.

# Construct payload.
payload = ''
m = s.model()
for sym in inputs:
    payload += str(m.eval(sym).as_long())

print(payload)
# 0101000000001000000101010000101000000001000000000101000110100000000100000010101000000100000000100100001010100000001000000010101000101000000000000000101010010101000000000000000001010100010101000000

We obtained a z3 model from the solver by calling s.model() and evaluated our symbols to produce concrete integer 1s and 0s by calling m.eval(sym).as_long().

After feeding this to the server, we obtain the flag. Yippee!

For an “easy” challenge, I definitely spent too much time on this. I wasted some time switching between a pure z3 logic solver and a full-blown angr solver, finding out angr was taking too long, and spamming Ctrl+Z. I could’ve spent this time sleeping instead. 😕

One of the difficulties of reverse is trying to understand both the high-level aspects and low-level aspects. Sometimes skimming through the program structure helps. Other times, getting deep into the nitty-gritty aspects may help us recognise some familiar elements. This challenge was a fun little exercise to practice these two approaches.

Solve Script

Flag

Payload

0101000000001000000101010000101000000001000000000101000110100000000100000010101000000100000000100100001010100000001000000010101000101000000000000000101010010101000000000000000001010100010101000000

Flag