Overview
pact is a two-part challenge based around reversing a bytecode-based virtual machine architecture called “pactvm”. The first challenge is to reverse a predefined flag-checker program written for the Pact VM. The second challenge is to automatically reverse engineer a series of randomly-generated Pact programs.
Part 1 - Bubble Wrap
Problem Description
- Solves: 9
- Score: 423
- Category: rev
You look stressed, here’s some bubble wrap to play with.
Author: jrm0xE
Difficulty: Hard
Attachments:
bubblewrap.pactb
7.67 KB MD5: b418fffa6f292bc1c8cd3884bdab4066pactvm
46.26 KB MD5: 578c21735cd31e8537b9063268eef5f4
Introduction
We’re given a stripped Linux 64-bit binary called pactvm
. When run without arguments, it provides this usage message:
Usage ./pactvm input.pactb
Giving it the bubblewrap.pactb
file produces this prompt:
Flag Validator:
If we type in some text, we get Try again :(
. So, we have to give this program the correct flag as input.
Browsing bubblewrap.pactb
in a hex editor, we can spot the strings Flag Validator:
, Congrats!
and Try again :(
, along with several random 10-character
strings like cdarzowkky
, bldbefsarc
, xpklorelln
, nwlrbbmqbh
, etc.
Reversing pactvm
The main task is to reverse pactvm
and figure out the file format of the pactb
files, as well as how the virtual machine works. I used Ghidra for reversing. All function names given below, aside from standard libc
functions, are guesses based on functionality.
The program is not obfuscated, merely stripped, so reversing proceeds relatively smoothly. There is
quite a lot of functionality to reverse; for example, the program allocates and deallocates memory almost exclusively through the function at 401bf0
(35 cross-references), which takes a memory address, old size, and new size, and performs the functions of free
(if the new size is zero), malloc
(if the address and old size are zero), or realloc
(if the address, old size, and new size are all nonzero). This function also does a bunch of other custom-allocation stuff that I did not fully reverse or understand.
The main VM loop is the function at 404bc0
, which contains a 45-way switch statement for each of the opcodes in the VM architecture. The VM is structured as a stack machine: most opcodes pop values off of the stack, perform some operations on those values, and then (optionally) push new values onto the stack, using the functions stack_push
(407990
) and stack_pop
(408380
).
Stack values are 16-byte tagged union structures, with a 64-bit type tag and a 64-bit value. Handily, one of the VM opcodes, 0x09, calls print_value
(403790
) which prints a human-readable representation of a stack value, which made it easy to figure out the VM’s type system.
There are two types of tagged-union structures in the VM: “objects” and “values”. Values live on the stack, and have the following tags:
- 0:
bool
, LSB of value indicatestrue
orfalse
- 1:
nil
- 2:
char
, LSB of value is a signed character - 3:
int
, value is a 64-bit signed int - 4:
float
, value is adouble
- 5:
object
, value is a pointer to anobject
structure
Objects are heap-allocated via 401bf0
, and are variable-length with a common header { int type; object *next; }
. Type tags are as follows, per print_object
(402c90
) and reversing the places where the object type is used:
- 0:
function
, structure contains argument count, bytecode, constants, function name, etc. - 1:
closure
, structure a reference to the code, and an array of “upvals”, which are references to values in the closure’s context - 2:
native_fn
, structure contains a function pointer to a VM-callable function implemented in C - 3:
string
, structure contains a length,char *
pointer, and a 32-bit hash. This structure is used extensively throughout the VM, e.g. as hash table keys - 4:
upvalue
. Did not bother reversing this structure because no VM programs use upvalues - 5:
class
, structure contains a name and a hash table of class variables - 6:
instance
, structure contains a reference to the class and a hash table of instance variables - 7:
array
, structure contains a length, capacity, and a pointer to avalue
array - 8:
method
, structure contains a value (presumably the class instance on which a method is called?), and a pointer to a closure.
With the basic data types of the VM reversed, we can identify the basic design of the VM. It’s a strict stack machine, with no registers (apart from a program counter). There are two global hash tables: one hash table which stores interned strings (presumably to reduce allocation of string objects), and one hash table which stores global variables. There’s also a separate stack which holds call frames.
The pactb
file format starts with an array of 14 strings which are treated as “imports” of the 14 built-in native functions (403a80
, 403ac0
, …). Effectively, this gives these 14 functions a unique (obfuscated) name, which helps to obfuscate the pactb
(these are the cdarzowkky
, bldbefsarc
, etc. strings seen in the file). These 14 functions are actually the following:
- 0:
clock
, gets the currentclock()
value divided by 1000000 - 1:
append
, appends a value to an array - 2:
remove
, removes a value from an array - 3:
input
, prints an optional string prompt and then reads a line of text from the user - 4:
len
, obtains the length of a string or array - 5:
range
, works like the Python 2.x range function, producing an array of integers in a given range - 6:
typeof
, produces a string representing the type of the value (bool
,nil
,char
, etc.) - 7:
chr
, obtains achar
for the givenint
- 8:
ord
, obtains anint
from the givenchar
or first character of the givenstring
- 9:
int
, obtains anint
from the givenchar
orfloat
- 10:
bytes
, turns an array ofchar
s into astring
- 11:
bytearray
, turns a string into an array ofchar
s - 12:
zeros
, creates an array of zeroint
s of the given length - 13:
exit
, calls the Cexit
function.
After the assignment of native functions to names, the pactb
contains a serialized function
object representing the script. This contains the number of argument, number of upvals, bytecode array, line number table (with one line number for each byte of the bytecode), an array of constants (serialized values), and a function name. The line numbers are used by print_error
(407a00
), which prints a backtrace by walking the call frame stack and printing out line numbers and function names.
The constants will be values of different types, and can also include nested function
objects corresponding to functions defined within the script.
So, with all of the file format understood, and the general structure of the VM reversed, the only thing left to do is to reverse each of the instructions. There are quite a lot, and some are somewhat hard to understand, so instead of reversing all of them, I just reversed the ones that are used by bubblewrap
and the programs from Search and Rescue. The final disassembler looks like this:
import struct
from ctypes import c_char
from dataclasses import dataclass
from typing import Any, List, Optional
@dataclass
class Code:
nargs: int
nupvals: int
bytecode: bytes
linenos: List[int]
consts: List[Any]
name: Optional[bytes]
@classmethod
def from_file(cls, f: "FileReader") -> "Code":
nargs = f.read_u32()
nupvals = f.read_u32()
codelen = f.read_u32()
bytecode = f.read_bytes(codelen)
linenos = [f.read_u32() for _ in range(codelen)]
nconsts = f.read_u32()
consts = []
for _ in range(nconsts):
val = f.read_val()
consts.append(val)
named = f.read_u8()
if named:
name = f.read_obj()
else:
name = None
return Code(nargs, nupvals, bytecode, linenos, consts, name)
class FileReader:
def __init__(self, f):
self.f = f
def read_fmt(self, fmt):
return struct.unpack(fmt, self.f.read(struct.calcsize(fmt)))
def read_u32(self):
return self.read_fmt("<I")[0]
def read_u8(self):
return self.read_fmt("<B")[0]
def read_u64(self):
return self.read_fmt("<Q")[0]
def read_double(self):
return self.read_fmt("<d")[0]
def read_bytes(self, n):
start = self.f.tell()
res = self.f.read(n)
assert len(res) == n, "Failed to read %d bytes at %d" % (n, start)
return res
def read_str(self):
sz = self.read_u32()
return self.read_bytes(sz).decode("latin1")
def read_obj(self) -> Any:
type = self.read_u32()
if type == 3:
return self.read_str()
elif type == 0:
return Code.from_file(self)
else:
raise Exception("Invalid object type %d" % type)
def read_val(self) -> Any:
type = self.read_u32()
if type == 0:
return bool(self.read_u8())
elif type == 1:
self.read_u64()
return None
elif type == 2:
return c_char(self.read_u8())
elif type == 3:
return self.read_u64()
elif type == 4:
return self.read_double()
elif type == 5:
return self.read_obj()
else:
raise Exception("Invalid value type %d" % type)
def disas(natives, code: Code):
if code.name:
print(f"{code.name}:")
print(f"[{code.nargs} args, {code.nupvals} upvals]")
pc = 0
while pc < len(code.bytecode):
print(f" 0x{pc:04x}:", end=' ')
opcode = code.bytecode[pc]
pc += 1
if opcode == 0:
print("return")
elif opcode == 1:
disp, = struct.unpack(">H", code.bytecode[pc:pc+2])
pc += 2
dest = pc + disp
print(f"if not(top()): goto(0x{dest:04x})")
elif opcode == 2:
disp, = struct.unpack(">H", code.bytecode[pc:pc+2])
pc += 2
dest = pc + disp
print(f"goto(0x{dest:04x})")
elif opcode == 3:
disp, = struct.unpack(">H", code.bytecode[pc:pc+2])
pc += 2
dest = pc - disp
print(f"goto(0x{dest:04x})")
elif opcode == 4:
count = code.bytecode[pc]
pc += 1
print(f"args = popn({count}); pop()(*args)")
elif opcode == 7:
idx = code.bytecode[pc]
pc += 1
subcode = code.consts[idx]
assert isinstance(subcode, Code)
assert subcode.nupvals == 0, "upvals not supported"
print(f"push(new_closure(<Code {subcode.name or id(subcode)}>))")
elif opcode == 9:
print(f"print(pop())")
elif opcode == 10:
print(f"push(not pop())")
elif opcode == 11:
print(f"push(-pop())")
elif opcode == 12:
print(f"push(pop() + pop())")
elif opcode == 13:
print(f"a, b = popn(2); push(a - b)")
elif opcode == 16:
idx = code.bytecode[pc]
pc += 1
obj = code.consts[idx]
print(f"push({obj!r})")
elif opcode == 17:
print(f"push(None)")
elif opcode == 20:
print(f"pop()")
elif opcode == 21:
idx = code.bytecode[pc]
pc += 1
print(f"push(frame[{idx}])")
elif opcode == 22:
idx = code.bytecode[pc]
pc += 1
print(f"frame[{idx}] = top()")
elif opcode == 25:
idx = code.bytecode[pc]
pc += 1
name = code.consts[idx]
if name in natives:
name = natives[name]
print(f"push({name})")
elif opcode == 26:
idx = code.bytecode[pc]
pc += 1
name = code.consts[idx]
print(f"{name} = top()")
elif opcode == 27:
idx = code.bytecode[pc]
pc += 1
name = code.consts[idx]
print(f"{name} = pop()")
elif opcode == 28:
idx = code.bytecode[pc]
pc += 1
name = code.consts[idx]
print(f"tbl, obj = popn(2); tbl.{name} = obj; push(obj)")
elif opcode == 29:
idx = code.bytecode[pc]
pc += 1
name = code.consts[idx]
print(f"obj = pop(); push(obj.{name}))")
elif opcode == 31:
print(f"push(pop() == pop())")
elif opcode == 32:
print(f"push(pop() < pop())")
elif opcode == 33:
print(f"push(pop() > pop())")
elif opcode == 34:
idx = code.bytecode[pc]
pc += 1
name = code.consts[idx]
print(f"push(new_class({name!r}))")
elif opcode == 36:
idx = code.bytecode[pc]
pc += 1
name = code.consts[idx]
print(f"tbl, obj = popn(2); tbl.{name} = obj; push(obj)")
elif opcode == 37:
count = code.bytecode[pc]
pc += 1
print(f"push(new_array(popn({count})))")
elif opcode == 38:
print(f"arr, idx = popn(2); push(arr[idx])")
elif opcode == 39:
print(f"arr, idx, obj = popn(3); arr[idx] = obj; push(obj)")
elif opcode == 40:
print(f"push(pop() ^ pop())")
elif opcode == 42:
print(f"push(pop() & pop())")
elif opcode == 44:
print(f"a, b = popn(2); push(a >> b )")
else:
raise Exception("unknown opcode %d" % opcode)
real_natives = [
"clock",
"append",
"remove",
"input",
"len",
"range",
"typeof",
"chr",
"ord",
"int",
"bytes",
"bytearray",
"zeros",
"exit",
]
def read_pactb(fn):
f = FileReader(open(fn, "rb"))
# natives table
num_natives = f.read_u32()
assert num_natives == 14
natives = [f.read_obj() for _ in range(num_natives)]
natives = dict(zip(natives, real_natives))
program = f.read_val()
disas(natives, program)
for const in program.consts:
if isinstance(const, Code):
disas(natives, const)
import sys
read_pactb(sys.argv[1])
I deliberately chose to use a Pythonic syntax for the VM instructions, instead of the usual mnemonics, in order to make the semantics crystal-clear and make reversing easier.
Reversing bubblewrap
The disassembled bubblewrap.pactb
looks like this:
[0 args, 0 upvals]
0x0000: push(new_closure(<Code nwlrbbmqbh>))
0x0002: nwlrbbmqbh = pop()
0x0004: push(nwlrbbmqbh)
0x0006: args = popn(0); pop()(*args)
0x0008: push(0)
0x000a: push(pop() == pop())
0x000b: if not(top()): goto(0x001c)
0x000e: pop()
0x000f: push('Congrats!')
0x0011: print(pop())
0x0012: push(exit)
0x0014: push(0)
0x0016: args = popn(1); pop()(*args)
0x0018: pop()
0x0019: goto(0x0027)
0x001c: pop()
0x001d: push('Try again :(')
0x001f: print(pop())
0x0020: push(exit)
0x0022: push(1)
0x0024: args = popn(1); pop()(*args)
0x0026: pop()
0x0027: push(None)
0x0028: return
nwlrbbmqbh:
[0 args, 0 upvals]
0x0000: push(bytearray)
0x0002: push(input)
0x0004: push('Flag Validator: ')
0x0006: args = popn(1); pop()(*args)
0x0008: args = popn(1); pop()(*args)
0x000a: push(len)
0x000c: push(frame[1])
0x000e: args = popn(1); pop()(*args)
0x0010: push(36)
0x0012: push(pop() == pop())
0x0013: push(not pop())
0x0014: if not(top()): goto(0x001e)
0x0017: pop()
0x0018: push(1)
0x001a: return
0x001b: goto(0x001f)
0x001e: pop()
0x001f: push(range)
0x0021: push(len)
0x0023: push(frame[1])
0x0025: args = popn(1); pop()(*args)
0x0027: args = popn(1); pop()(*args)
0x0029: push(0)
0x002b: push(frame[3])
0x002d: push(len)
0x002f: push(frame[1])
0x0031: args = popn(1); pop()(*args)
0x0033: push(pop() > pop())
0x0034: if not(top()): goto(0x0058)
0x0037: pop()
0x0038: goto(0x0046)
0x003b: push(frame[3])
0x003d: push(1)
0x003f: push(pop() + pop())
0x0040: frame[3] = top()
0x0042: pop()
0x0043: goto(0x002b)
0x0046: push(frame[2])
0x0048: push(frame[3])
0x004a: push(int)
0x004c: push(frame[1])
0x004e: push(frame[3])
0x0050: arr, idx = popn(2); push(arr[idx])
0x0051: args = popn(1); pop()(*args)
0x0053: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
0x0054: pop()
0x0055: goto(0x003b)
0x0058: pop()
0x0059: pop()
0x005a: push(frame[2])
0x005c: push(1)
0x005e: arr, idx = popn(2); push(arr[idx])
0x005f: push(frame[2])
0x0061: push(15)
0x0063: arr, idx = popn(2); push(arr[idx])
0x0064: a, b = popn(2); push(a - b)
0x0065: push(17)
0x0067: push(-pop())
0x0068: push(pop() == pop())
0x0069: push(not pop())
0x006a: if not(top()): goto(0x0074)
0x006d: pop()
0x006e: push(1)
0x0070: return
0x0071: goto(0x0075)
0x0074: pop()
0x0075: push(frame[2])
0x0077: push(25)
0x0079: arr, idx = popn(2); push(arr[idx])
0x007a: push(frame[2])
0x007c: push(22)
0x007e: arr, idx = popn(2); push(arr[idx])
0x007f: push(pop() ^ pop())
0x0080: push(43)
0x0082: push(pop() == pop())
0x0083: push(not pop())
0x0084: if not(top()): goto(0x008e)
0x0087: pop()
0x0088: push(1)
0x008a: return
...
0x03ee: push(frame[2])
0x03f0: push(0)
0x03f2: arr, idx = popn(2); push(arr[idx])
0x03f3: push(frame[2])
0x03f5: push(21)
0x03f7: arr, idx = popn(2); push(arr[idx])
0x03f8: push(pop() + pop())
0x03f9: push(198)
0x03fb: push(pop() == pop())
0x03fc: push(not pop())
0x03fd: if not(top()): goto(0x0407)
0x0400: pop()
0x0401: push(1)
0x0403: return
0x0404: goto(0x0408)
0x0407: pop()
0x0408: push(0)
0x040a: return
0x040b: push(None)
0x040c: return
The script itself is simply the following code:
def nwlrbbmqbh():
...
if nwlrbbmqbh() == 0:
print("Congrats!")
exit(0)
else:
print("Try again :(")
exit(1)
The main nwlrbbmqbh
validation function looks like this:
def nwlrbbmqbh():
frame1 = bytearray(input("Flag Validator: "))
if len(frame1) != 36:
return 1
frame2 = range(len(frame1))
for frame3 in range(len(frame1)):
frame2[frame3] = int(frame1[frame3])
if frame2[1] - frame2[15] != -17:
return 1
if frame2[25] ^ frame2[22] != 43:
return 1
...
if frame2[0] + frame2[21] != 198:
return 1
return 0
Aha, it’s just a very long sequence of constraints on the flag. We can just solve this with Z3. I used a bunch of ugly regexes to turn the disassembly into the correct expressions:
\w+: push\(frame\[2\]\)\n \w+: push\((\d+)\)\n \w+: arr, idx = popn\(2\); push\(arr\[idx\]\)
=>
input[\1]
\w+: push\(pop\(\) == pop\(\)\)\n \w+: push\(not pop\(\)\)\n \w+: if not\(top\(\)\): goto\(\w+\)\n \w+: pop\(\)\n \w+: push\(1\)\n \w+: return\n \w+: goto\(\w+\)\n \w+: pop\(\)\n
=>
\n
(\w+): push\((\d+)\)\n \w+: push\(-pop\(\)\)
=>
\1: push(-\2)
input\[(\d+)\]\n input\[(\d+)\]\n \w+: a, b = popn\(2\); push\(a - b\)\n \w+: push\((-?\d+)\)\n
=>
s.add(input[\1] - input[\2] == \3)
input\[(\d+)\]\n input\[(\d+)\]\n \w+: push\(pop\(\) ([+^]) pop\(\)\)\n \w+: push\((-?\d+)\)\n
=>
s.add(input[\1] \3 input[\2] == \4)
This produces a nice list of Z3-compatible expressions, which we can wrap into a quick script:
from z3 import BitVec, Solver
s = Solver()
input = [BitVec("b%d" % i, 8) for i in range(36)]
# regex-extracted expressions:
s.add(input[1] - input[15] == -17)
s.add(input[25] ^ input[22] == 43)
s.add(input[19] - input[14] == 8)
s.add(input[29] - input[34] == -5)
s.add(input[23] + input[21] == 219)
s.add(input[24] + input[12] == 200)
s.add(input[35] ^ input[25] == 9)
s.add(input[14] ^ input[27] == 62)
s.add(input[22] + input[8] == 190)
s.add(input[3] + input[26] == 206)
s.add(input[32] ^ input[34] == 50)
s.add(input[21] ^ input[23] == 19)
s.add(input[7] + input[10] == 212)
s.add(input[2] + input[10] == 227)
s.add(input[17] - input[35] == -10)
s.add(input[5] + input[18] == 199)
s.add(input[15] ^ input[1] == 23)
s.add(input[30] ^ input[31] == 26)
s.add(input[18] - input[10] == -9)
s.add(input[9] - input[19] == 16)
s.add(input[31] + input[8] == 210)
s.add(input[4] - input[26] == 19)
s.add(input[10] - input[9] == -10)
s.add(input[13] + input[5] == 212)
s.add(input[6] ^ input[13] == 1)
s.add(input[28] ^ input[20] == 17)
s.add(input[34] - input[30] == 4)
s.add(input[11] ^ input[2] == 1)
s.add(input[16] + input[11] == 222)
s.add(input[8] ^ input[18] == 57)
s.add(input[20] ^ input[0] == 7)
s.add(input[27] ^ input[28] == 43)
s.add(input[26] - input[17] == -11)
s.add(input[12] ^ input[31] == 44)
s.add(input[33] - input[8] == 23)
s.add(input[0] + input[21] == 198)
# extra flag format constraints
s.add(input[0] == ord('b'))
s.add(input[1] == ord('c'))
s.check()
m = s.model()
print(bytes([m[p].as_long() for p in input]))
Note that we needed to constrain the first two bytes using the flag format to get a meaningful result. When we run this program, we get the flag:
bctf{are_you_satisfied_with_this_vm}
Part 2 - Search and Rescue
Problem Description
- Solves: 1
- Score: 500
- Category: rev
Search for the flag through the forest. The server provides hex dumps of files runnable with the vm from bubble_wrap.
nc ctf.b01lers.com 9304
Author: jrm0xE
Difficulty: Hard
Attachments:
pactvm
46.26 KB MD5: 578c21735cd31e8537b9063268eef5f4
Note that this is the same pactvm
binary as the bubblewrap
challenge.
Introduction
When we connect to the given server, we’re given an xxd
hexdump followed by a prompt:
00000000: 0e00 0000 0300 0000 0a00 0000 6279 6e65 ............byne
00000010: 6364 7967 6778 0300 0000 0a00 0000 7870 cdyggx........xp
00000020: 6b6c 6f72 656c 6c6e 0300 0000 0a00 0000 klorelln........
00000030: 6d70 6170 7166 776b 686f 0300 0000 0a00 mpapqfwkho......
00000040: 0000 706b 6d63 6f71 686e 776e 0300 0000 ..pkmcoqhnwn....
00000050: 0a00 0000 6b75 6577 6873 716d 6762 0300 ....kuewhsqmgb..
00000060: 0000 0a00 0000 6275 7163 6c6a 6a69 7673 ......buqcljjivs
...
00001fb0: 0000 0300 0000 0a00 0000 6d6c 6e6f 7a6a ..........mlnozj
00001fc0: 6b70 7170 0300 0000 0100 0000 0000 0000 kpqp............
00001fd0: 0500 0000 0300 0000 0a00 0000 6d6c 6e6f ............mlno
00001fe0: 7a6a 6b70 7170 0300 0000 0000 0000 0000 zjkpqp..........
00001ff0: 0000 00 ...
>
We can enter something, after which the connection just closes. Connecting again gives us a slightly different binary. Saving the hexdump, using xxd -r
to reverse it back into a binary, and running it with pactvm
produces the same >
prompt.
Reversing the Program
The disassembly of a sample program is quite long:
[0 args, 0 upvals]
0x0000: push(new_class('nwlrbbmqbh'))
0x0002: nwlrbbmqbh = pop()
0x0004: push(nwlrbbmqbh)
0x0006: push(new_closure(<Code init>))
0x0008: tbl, obj = popn(2); tbl.init = obj; push(obj)
0x000a: pop()
0x000b: push(new_closure(<Code hiddqscdxr>))
0x000d: hiddqscdxr = pop()
0x000f: push(new_closure(<Code jmowfrxsjy>))
0x0011: jmowfrxsjy = pop()
0x0013: push(new_closure(<Code bldbefsarc>))
0x0015: bldbefsarc = pop()
0x0017: push(bldbefsarc)
0x0019: args = popn(0); pop()(*args)
0x001b: q = pop()
0x001d: push('MJZt0FWGkaqpSGqMphfCpDqRgW')
0x001f: k = pop()
0x0021: push(input)
0x0023: push('> ')
0x0025: args = popn(1); pop()(*args)
0x0027: path = pop()
0x0029: push(jmowfrxsjy)
0x002b: push(q)
0x002d: args = popn(1); pop()(*args)
0x002f: root = pop()
0x0031: push(root)
0x0033: cur = pop()
0x0035: push(new_array(popn(0)))
0x0037: c = pop()
0x0039: push(0)
0x003b: push(frame[1])
0x003d: push(len)
0x003f: push(path)
0x0041: args = popn(1); pop()(*args)
0x0043: push(pop() > pop())
0x0044: if not(top()): goto(0x00cf)
0x0047: pop()
0x0048: goto(0x0056)
0x004b: push(frame[1])
0x004d: push(1)
0x004f: push(pop() + pop())
0x0050: frame[1] = top()
0x0052: pop()
0x0053: goto(0x003b)
0x0056: push(ord)
0x0058: push(path)
0x005a: push(frame[1])
0x005c: arr, idx = popn(2); push(arr[idx])
0x005d: args = popn(1); pop()(*args)
0x005f: push(0)
0x0061: push(frame[3])
0x0063: push(8)
0x0065: push(pop() > pop())
0x0066: if not(top()): goto(0x00c9)
0x0069: pop()
0x006a: goto(0x0078)
0x006d: push(frame[3])
0x006f: push(1)
0x0071: push(pop() + pop())
0x0072: frame[3] = top()
0x0074: pop()
0x0075: goto(0x0061)
0x0078: push(cur)
0x007a: obj = pop(); push(obj.l))
0x007c: push(None)
0x007d: push(pop() == pop())
0x007e: if not(top()): goto(0x0088)
0x0081: pop()
0x0082: push(cur)
0x0084: obj = pop(); push(obj.r))
0x0086: push(None)
0x0087: push(pop() == pop())
0x0088: if not(top()): goto(0x009f)
0x008b: pop()
0x008c: push(append)
0x008e: push(c)
0x0090: push(cur)
0x0092: obj = pop(); push(obj.v))
0x0094: args = popn(2); pop()(*args)
0x0096: pop()
0x0097: push(root)
0x0099: cur = top()
0x009b: pop()
0x009c: goto(0x00a0)
0x009f: pop()
0x00a0: push(frame[2])
0x00a2: push(1)
0x00a4: push(pop() & pop())
0x00a5: push(1)
0x00a7: push(pop() == pop())
0x00a8: if not(top()): goto(0x00b6)
0x00ab: pop()
0x00ac: push(cur)
0x00ae: obj = pop(); push(obj.r))
0x00b0: cur = top()
0x00b2: pop()
0x00b3: goto(0x00be)
0x00b6: pop()
0x00b7: push(cur)
0x00b9: obj = pop(); push(obj.l))
0x00bb: cur = top()
0x00bd: pop()
0x00be: push(frame[2])
0x00c0: push(1)
0x00c2: a, b = popn(2); push(a >> b )
0x00c3: frame[2] = top()
0x00c5: pop()
0x00c6: goto(0x006d)
0x00c9: pop()
0x00ca: pop()
0x00cb: pop()
0x00cc: goto(0x004b)
0x00cf: pop()
0x00d0: pop()
0x00d1: push(len)
0x00d3: push(c)
0x00d5: args = popn(1); pop()(*args)
0x00d7: push(len)
0x00d9: push(k)
0x00db: args = popn(1); pop()(*args)
0x00dd: push(pop() == pop())
0x00de: push(not pop())
0x00df: if not(top()): goto(0x00ed)
0x00e2: pop()
0x00e3: push(exit)
0x00e5: push(1)
0x00e7: args = popn(1); pop()(*args)
0x00e9: pop()
0x00ea: goto(0x00ee)
0x00ed: pop()
0x00ee: push(0)
0x00f0: push(frame[1])
0x00f2: push(len)
0x00f4: push(k)
0x00f6: args = popn(1); pop()(*args)
0x00f8: push(pop() > pop())
0x00f9: if not(top()): goto(0x0135)
0x00fc: pop()
0x00fd: goto(0x010b)
0x0100: push(frame[1])
0x0102: push(1)
0x0104: push(pop() + pop())
0x0105: frame[1] = top()
0x0107: pop()
0x0108: goto(0x00f0)
0x010b: push(c)
0x010d: push(frame[1])
0x010f: arr, idx = popn(2); push(arr[idx])
0x0110: push(k)
0x0112: push(frame[1])
0x0114: arr, idx = popn(2); push(arr[idx])
0x0115: push(pop() == pop())
0x0116: push(not pop())
0x0117: if not(top()): goto(0x0131)
0x011a: pop()
0x011b: push(c)
0x011d: push(frame[1])
0x011f: arr, idx = popn(2); push(arr[idx])
0x0120: print(pop())
0x0121: push(k)
0x0123: push(frame[1])
0x0125: arr, idx = popn(2); push(arr[idx])
0x0126: print(pop())
0x0127: push(exit)
0x0129: push(1)
0x012b: args = popn(1); pop()(*args)
0x012d: pop()
0x012e: goto(0x0132)
0x0131: pop()
0x0132: goto(0x0100)
0x0135: pop()
0x0136: pop()
0x0137: push(exit)
0x0139: push(0)
0x013b: args = popn(1); pop()(*args)
0x013d: pop()
0x013e: push(None)
0x013f: return
init:
[1 args, 0 upvals]
0x0000: push(frame[0])
0x0002: push(None)
0x0003: tbl, obj = popn(2); tbl.l = obj; push(obj)
0x0005: pop()
0x0006: push(frame[0])
0x0008: push(None)
0x0009: tbl, obj = popn(2); tbl.r = obj; push(obj)
0x000b: pop()
0x000c: push(frame[0])
0x000e: push(frame[1])
0x0010: tbl, obj = popn(2); tbl.v = obj; push(obj)
0x0012: pop()
0x0013: push(frame[0])
0x0015: push(0)
0x0017: tbl, obj = popn(2); tbl.f = obj; push(obj)
0x0019: pop()
0x001a: push(frame[0])
0x001c: return
hiddqscdxr:
[2 args, 0 upvals]
0x0000: push(nwlrbbmqbh)
0x0002: push(frame[2])
0x0004: args = popn(1); pop()(*args)
0x0006: push(frame[3])
0x0008: push(frame[2])
0x000a: obj = pop(); push(obj.f))
0x000c: tbl, obj = popn(2); tbl.f = obj; push(obj)
0x000e: pop()
0x000f: push(frame[1])
0x0011: push(None)
0x0012: push(pop() == pop())
0x0013: if not(top()): goto(0x001d)
0x0016: pop()
0x0017: push(frame[3])
0x0019: return
0x001a: goto(0x001e)
0x001d: pop()
0x001e: push(frame[1])
0x0020: push(frame[4])
0x0022: obj = pop(); push(obj.r))
0x0024: push(None)
0x0025: push(pop() == pop())
0x0026: push(not pop())
0x0027: if not(top()): goto(0x0036)
0x002a: pop()
0x002b: push(frame[4])
0x002d: obj = pop(); push(obj.r))
0x002f: obj = pop(); push(obj.f))
0x0031: push(frame[3])
0x0033: obj = pop(); push(obj.f))
0x0035: push(pop() > pop())
0x0036: if not(top()): goto(0x0044)
0x0039: pop()
0x003a: push(frame[4])
0x003c: obj = pop(); push(obj.r))
0x003e: frame[4] = top()
0x0040: pop()
0x0041: goto(0x0020)
0x0044: pop()
0x0045: push(frame[3])
0x0047: push(frame[4])
0x0049: obj = pop(); push(obj.r))
0x004b: tbl, obj = popn(2); tbl.r = obj; push(obj)
0x004d: pop()
0x004e: push(frame[4])
0x0050: push(frame[3])
0x0052: tbl, obj = popn(2); tbl.r = obj; push(obj)
0x0054: pop()
0x0055: push(frame[1])
0x0057: return
0x0058: push(None)
0x0059: return
jmowfrxsjy:
[1 args, 0 upvals]
0x0000: push(None)
0x0001: push(0)
0x0003: push(frame[3])
0x0005: push(len)
0x0007: push(frame[1])
0x0009: args = popn(1); pop()(*args)
0x000b: push(pop() > pop())
0x000c: if not(top()): goto(0x0052)
0x000f: pop()
0x0010: goto(0x001e)
0x0013: push(frame[3])
0x0015: push(1)
0x0017: push(pop() + pop())
0x0018: frame[3] = top()
0x001a: pop()
0x001b: goto(0x0003)
0x001e: push(frame[1])
0x0020: push(frame[3])
0x0022: arr, idx = popn(2); push(arr[idx])
0x0023: push(0)
0x0025: push(pop() == pop())
0x0026: push(not pop())
0x0027: if not(top()): goto(0x004e)
0x002a: pop()
0x002b: push(nwlrbbmqbh)
0x002d: push(chr)
0x002f: push(frame[3])
0x0031: args = popn(1); pop()(*args)
0x0033: args = popn(1); pop()(*args)
0x0035: push(frame[4])
0x0037: push(frame[1])
0x0039: push(frame[3])
0x003b: arr, idx = popn(2); push(arr[idx])
0x003c: tbl, obj = popn(2); tbl.f = obj; push(obj)
0x003e: pop()
0x003f: push(hiddqscdxr)
0x0041: push(frame[2])
0x0043: push(frame[4])
0x0045: args = popn(2); pop()(*args)
0x0047: frame[2] = top()
0x0049: pop()
0x004a: pop()
0x004b: goto(0x004f)
0x004e: pop()
0x004f: goto(0x0013)
0x0052: pop()
0x0053: pop()
0x0054: push(frame[2])
0x0056: push(None)
0x0057: push(pop() == pop())
0x0058: push(not pop())
0x0059: if not(top()): goto(0x0064)
0x005c: pop()
0x005d: push(frame[2])
0x005f: obj = pop(); push(obj.r))
0x0061: push(None)
0x0062: push(pop() == pop())
0x0063: push(not pop())
0x0064: if not(top()): goto(0x00b1)
0x0067: pop()
0x0068: push(frame[2])
0x006a: obj = pop(); push(obj.v))
0x006c: push(frame[2])
0x006e: obj = pop(); push(obj.r))
0x0070: frame[2] = top()
0x0072: pop()
0x0073: push(frame[2])
0x0075: obj = pop(); push(obj.v))
0x0077: push(frame[2])
0x0079: obj = pop(); push(obj.r))
0x007b: frame[2] = top()
0x007d: pop()
0x007e: push(nwlrbbmqbh)
0x0080: push(0)
0x0082: args = popn(1); pop()(*args)
0x0084: push(frame[5])
0x0086: push(frame[3])
0x0088: obj = pop(); push(obj.f))
0x008a: push(frame[4])
0x008c: obj = pop(); push(obj.f))
0x008e: push(pop() + pop())
0x008f: tbl, obj = popn(2); tbl.f = obj; push(obj)
0x0091: pop()
0x0092: push(frame[5])
0x0094: push(frame[3])
0x0096: tbl, obj = popn(2); tbl.l = obj; push(obj)
0x0098: pop()
0x0099: push(frame[5])
0x009b: push(frame[4])
0x009d: tbl, obj = popn(2); tbl.r = obj; push(obj)
0x009f: pop()
0x00a0: push(hiddqscdxr)
0x00a2: push(frame[2])
0x00a4: push(frame[5])
0x00a6: args = popn(2); pop()(*args)
0x00a8: frame[2] = top()
0x00aa: pop()
0x00ab: pop()
0x00ac: pop()
0x00ad: pop()
0x00ae: goto(0x0054)
0x00b1: pop()
0x00b2: push(frame[2])
0x00b4: obj = pop(); push(obj.v))
0x00b6: return
0x00b7: push(None)
0x00b8: return
bldbefsarc:
[0 args, 0 upvals]
0x0000: push(zeros)
0x0002: push(256)
0x0004: args = popn(1); pop()(*args)
0x0006: push(18)
0x0008: push(19)
0x000a: push(20)
0x000c: push(14)
0x000e: push(11)
0x0010: push(21)
0x0012: push(17)
0x0014: push(13)
0x0016: push(18)
0x0018: push(15)
0x001a: push(new_array(popn(10)))
0x001c: push(14)
0x001e: push(10)
0x0020: push(16)
0x0022: push(14)
0x0024: push(17)
0x0026: push(17)
0x0028: push(15)
0x002a: push(14)
0x002c: push(19)
0x002e: push(17)
0x0030: push(24)
0x0032: push(15)
0x0034: push(8)
0x0036: push(19)
0x0038: push(16)
0x003a: push(14)
0x003c: push(14)
0x003e: push(25)
0x0040: push(9)
0x0042: push(12)
0x0044: push(17)
0x0046: push(10)
0x0048: push(8)
0x004a: push(15)
0x004c: push(11)
0x004e: push(21)
0x0050: push(new_array(popn(26)))
0x0052: push(11)
0x0054: push(13)
0x0056: push(11)
0x0058: push(17)
0x005a: push(16)
0x005c: push(16)
0x005e: push(14)
0x0060: push(22)
0x0062: push(14)
0x0064: push(18)
0x0066: push(19)
0x0068: push(15)
0x006a: push(18)
0x006c: push(17)
0x006e: push(16)
0x0070: push(18)
0x0072: push(8)
0x0074: push(13)
0x0076: push(12)
0x0078: push(19)
0x007a: push(17)
0x007c: push(19)
0x007e: push(18)
0x0080: push(12)
0x0082: push(12)
0x0084: push(17)
0x0086: push(new_array(popn(26)))
0x0088: push(ord)
0x008a: push('0')
0x008c: args = popn(1); pop()(*args)
0x008e: push(frame[5])
0x0090: push(ord)
0x0092: push('9')
0x0094: args = popn(1); pop()(*args)
0x0096: push(pop() < pop())
0x0097: push(not pop())
0x0098: if not(top()): goto(0x00bf)
0x009b: pop()
0x009c: goto(0x00aa)
0x009f: push(frame[5])
0x00a1: push(1)
0x00a3: push(pop() + pop())
0x00a4: frame[5] = top()
0x00a6: pop()
0x00a7: goto(0x008e)
0x00aa: push(frame[1])
0x00ac: push(frame[5])
0x00ae: push(frame[2])
0x00b0: push(frame[5])
0x00b2: push(ord)
0x00b4: push('0')
0x00b6: args = popn(1); pop()(*args)
0x00b8: a, b = popn(2); push(a - b)
0x00b9: arr, idx = popn(2); push(arr[idx])
0x00ba: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
0x00bb: pop()
0x00bc: goto(0x009f)
0x00bf: pop()
0x00c0: pop()
0x00c1: push(ord)
0x00c3: push('a')
0x00c5: args = popn(1); pop()(*args)
0x00c7: push(frame[5])
0x00c9: push(ord)
0x00cb: push('z')
0x00cd: args = popn(1); pop()(*args)
0x00cf: push(pop() < pop())
0x00d0: push(not pop())
0x00d1: if not(top()): goto(0x00f8)
0x00d4: pop()
0x00d5: goto(0x00e3)
0x00d8: push(frame[5])
0x00da: push(1)
0x00dc: push(pop() + pop())
0x00dd: frame[5] = top()
0x00df: pop()
0x00e0: goto(0x00c7)
0x00e3: push(frame[1])
0x00e5: push(frame[5])
0x00e7: push(frame[3])
0x00e9: push(frame[5])
0x00eb: push(ord)
0x00ed: push('a')
0x00ef: args = popn(1); pop()(*args)
0x00f1: a, b = popn(2); push(a - b)
0x00f2: arr, idx = popn(2); push(arr[idx])
0x00f3: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
0x00f4: pop()
0x00f5: goto(0x00d8)
0x00f8: pop()
0x00f9: pop()
0x00fa: push(ord)
0x00fc: push('A')
0x00fe: args = popn(1); pop()(*args)
0x0100: push(frame[5])
0x0102: push(ord)
0x0104: push('Z')
0x0106: args = popn(1); pop()(*args)
0x0108: push(pop() < pop())
0x0109: push(not pop())
0x010a: if not(top()): goto(0x0131)
0x010d: pop()
0x010e: goto(0x011c)
0x0111: push(frame[5])
0x0113: push(1)
0x0115: push(pop() + pop())
0x0116: frame[5] = top()
0x0118: pop()
0x0119: goto(0x0100)
0x011c: push(frame[1])
0x011e: push(frame[5])
0x0120: push(frame[4])
0x0122: push(frame[5])
0x0124: push(ord)
0x0126: push('A')
0x0128: args = popn(1); pop()(*args)
0x012a: a, b = popn(2); push(a - b)
0x012b: arr, idx = popn(2); push(arr[idx])
0x012c: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
0x012d: pop()
0x012e: goto(0x0111)
0x0131: pop()
0x0132: pop()
0x0133: push(frame[1])
0x0135: push(ord)
0x0137: push('{')
0x0139: args = popn(1); pop()(*args)
0x013b: push(13)
0x013d: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
0x013e: pop()
0x013f: push(frame[1])
0x0141: push(ord)
0x0143: push('}')
0x0145: args = popn(1); pop()(*args)
0x0147: push(19)
0x0149: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
0x014a: pop()
0x014b: push(frame[1])
0x014d: push(ord)
0x014f: push('_')
0x0151: args = popn(1); pop()(*args)
0x0153: push(9)
0x0155: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
0x0156: pop()
0x0157: push(frame[1])
0x0159: return
0x015a: push(None)
0x015b: return
It’s a lot of code. Luckily, after reversing bubblewrap, we have a general feel for the structure of the code. This program uses classes, specifically a class called nwlrbbmqbh
, along with a constructor function called init
which creates instance variables l
, r
, v
, and f
. Combined with the problem description (“Search for the flag through the forest.”), I made a guess that this class represented a binary tree node with children l
and r
, and value v
.
Reversing this program took some time, but eventually resulted in the following Python implementation:
append = list.append
class nwlrbbmqbh:
def __init__(self, arg):
self.l = None
self.r = None
self.v = arg
self.f = 0
def main():
q = bldbefsarc()
k = 'MJZt0FWGkaqpSGqMphfCpDqRgW'
path = input('> ')
root = jmowfrxsjy(q)
cur = root
c = []
for f1 in range(len(path)):
val = ord(path[f1])
for f3 in range(8):
if cur.l is None and cur.r is None:
append(c, cur.v)
cur = root
if (val & 1) == 1:
cur = cur.r
else:
cur = cur.l
val >>= 1
if len(c) != len(k):
exit(1)
for f1 in range(len(k)):
if c[f1] != k[f1]:
print(c[f1])
print(k[f1])
exit(1)
exit(0)
def hiddqscdxr(f1, f2):
f3 = nwlrbbmqbh(f2)
f3.f = f2.f
if f1 is None:
return f3
f4 = f1
while f4.r is not None and f3.f > f4.r.f:
f4 = f4.r
f3.r = f4.r
f4.r = f3
return f1
def jmowfrxsjy(f1):
f2 = None
for f3 in range(len(f1)):
if f1[f3] != 0:
f4 = nwlrbbmqbh(chr(f3))
f4.f = f1[f3]
f2 = hiddqscdxr(f2, f4)
while f2 is not None and f2.r is not None:
f3 = f2.v
f2 = f2.r
f4 = f2.v
f2 = f2.r
f5 = nwlrbbmqbh(0)
f5.f = f3.f + f4.f
f5.l = f3
f5.r = f4
f2 = hiddqscdxr(f2, f5)
return f2.v
def bldbefsarc():
f1 = [0] * 256
f2 = [18, 19, 20, 14, 11, 21, 17, 13, 18, 15]
f3 = [14, 10, 16, 14, 17, 17, 15, 14, 19, 17, 24, 15, 8, 19, 16, 14, 14, 25, 9, 12, 17, 10, 8, 15, 11, 21]
f4 = [11, 13, 11, 17, 16, 16, 14, 22, 14, 18, 19, 15, 18, 17, 16, 18, 8, 13, 12, 19, 17, 19, 18, 12, 12, 17]
f5 = ord('0')
while f5 <= ord('9'):
f1[f5] = f2[f5 - ord('0')]
f5 += 1
f5 = ord('a')
while f5 <= ord('z'):
f1[f5] = f3[f5 - ord('a')]
f5 += 1
f5 = ord('A')
while f5 <= ord('Z'):
f1[f5] = f4[f5 - ord('A')]
f5 += 1
f1[ord('{')] = 13
f1[ord('}')] = 19
f1[ord('_')] = 9
return f1
The program, in essence, does is the following:
bldbefsarc
initializes an array of 256 ints, one per byte, in which each int is a “frequency count” (or zero if the int is not present)jmowfrxsjy
converts the frequency count array into a Huffman code binary tree. It does this by usinghiddqscdxr
to insert all of the non-zero frequency counts into a sorted linked list (formatted as a tree with only right children), then repeatedly taking the lowest two counts and joining them into a single tree node. The result is a binary tree which efficiently encodes strings with the provided frequency distribution. (There’s a bug in the sorting algorithm which results in the first element being out of order, so we have to use this specific Huffman tree implementation to attain bug compatibility).main
(artificial name for the script “function”) then takes your input, treats it as a bit string, and repeatedly reads Huffman codes from it. It then compares the resulting decoded characters with the expected output.
The server’s randomly-generated binaries only differ in the frequency table in bldbefsarc
and the expected output in main
.
Armed with the knowledge of this implementation, reversing it is easy: we just have to take the desired output and encode it using the same Huffman code tree into a bitstring. As it turns out, once you get the input correct, the server sends another random program and expects another input. You have to successfully pass five rounds of this to get the flag.
One extra detail was that the bitstring decoded by main
must be a multiple of 8 bits in length. However, the Huffman code for the final expected character might not end at a byte boundary, so we have to pad the final byte out to 8 bits. If we choose the wrong padding bits, however, we might end up accidentally adding a valid Huffman code and producing an extra character. Lazily, I decided to just bruteforce the correct final character by calling the pactvm
program with every possibility, although this did require patching the program to fix a bug in the native input
function(): by default, it reads input from fd 1 (stdout), and I had to patch it to read from fd 0 (stdin) instead.
Aside - Line Numbers
The bytecode comes with line numbers, which made me curious to know if you could use line numbers to infer the syntax of the programming language that compiles down to pactb
. In fact, it turns out you can, to some extent! The line numbers tell us that there’s a for-range syntax, and that there are likely braces (or at least explicit “end” markers) at the end of blocks.
The following disassembly, presented in very mangled C/Python-ish syntax, presents each line of code on the line indicated in the bytecode (some line breaks were added/removed to make line numbers correspond). Some bytecode corresponds to things like automatic nil
returns and loop jumps, and corresponds to close-braces in the code below.
class nwlrbbmqbh {
def init(arg) {
this.l = nil
this.r = nil
this.v = arg
this.f = 0
return this
}
}
def hiddqscdxr(f1, f2) {
f3 = nwlrbbmqbh(f2)
f3.f = f2.f
if f1 == nil {
return f3
}
f4 = f1
while f4.r != nil and f3.f > f4.r.f {
f4 = f4.r
}
f3.r = f4.r
f4.r = f3
return f1
}
def jmowfrxsjy(f1) {
f2 = nil
for f3 = 0; f3 < len(f1); f3++ {
if f1[f3] != 0 {
f4 = nwlrbbmqbh(chr(f3))
f4.f = f1[f3]
f2 = hiddqscdxr(f2, f4)
}
}
while f2 != nil and f2.r != nil {
f3 = f2.v
f2 = f2.r
f4 = f2.v
f2 = f2.r
f5 = nwlrbbmqbh(0)
f5.f = f3.f + f4.f
f5.l = f3
f5.r = f4
f2 = hiddqscdxr(f2, f5)
}
return f2.v
}
def bldbefsarc() {
f1 = zeros(256)
f2 = [18, 19, 20, 14, 11, 21, 17, 13, 18, 15]
f3 = [14, 10, 16, 14, 17, 17, 15, 14, 19, 17, 24, 15, 8, 19, 16, 14, 14, 25, 9, 12, 17, 10, 8, 15, 11, 21]
f4 = [11, 13, 11, 17, 16, 16, 14, 22, 14, 18, 19, 15, 18, 17, 16, 18, 8, 13, 12, 19, 17, 19, 18, 12, 12, 17]
for f5 = ord('0'); f5 <= ord('9'); f5++ {
f1[f5] = f2[f5 - ord('0')]
}
for f5 = ord('a'); f5 <= ord('z'); f5++ {
f1[f5] = f3[f5 - ord('a')]
}
for f5 = ord('A'); f5 <= ord('Z'); f5++ {
f1[f5] = f4[f5 - ord('A')]
}
f1[ord('{')] = 13
f1[ord('}')] = 19
f1[ord('_')] = 9
return f1
}
q = bldbefsarc()
k = 'MJZt0FWGkaqpSGqMphfCpDqRgW'
path = input('> ')
root = jmowfrxsjy(q)
cur = root
c = []
for f1 = 0; f1 < len(path); f1++ {
val = ord(path[f1])
for f3 = 0; f3 < 8; f3++ {
if cur.l == nil and cur.r == nil {
append(c, cur.v)
cur = root
}
if (val & 1) == 1 {
cur = cur.r
} else {
cur = cur.l
}
val >>= 1
}
}
if len(c) != len(k) {
exit(1)
}
for f1 = 0; f1 < len(k); f1++ {
if c[f1] != k[f1] {
print(c[f1])
print(k[f1])
exit(1)
}
}
exit(0)
Solution
All we need to do is extract out the new frequency table and expected output in each iteration and build the expected input. Here’s the script to do that; attach this to the Python code above:
def get_input(q, k):
root = jmowfrxsjy(q)
lookup = {}
def visit(cur, path):
if cur.l:
visit(cur.l, path + "0")
if cur.r:
visit(cur.r, path + "1")
if cur.l is None and cur.r is None:
lookup[cur.v] = path
visit(root, "")
fullpath = "".join(lookup[c] for c in k)
out = bytearray()
for i in range(0, len(fullpath) - 8, 8):
ch = 0
for j in range(8):
ch |= int(fullpath[i+j]) << j
out.append(ch)
return out
from pwn import *
import subprocess
import re
import sys
import ast
# context.update(log_level='debug')
s = remote('ctf.b01lers.com', 9304)
for i in range(5):
hexdump = s.recvuntil(b"\n> ", drop=True)
with open('test.xxd', 'wb') as f:
f.write(hexdump)
subprocess.check_call(["xxd", "-r", "test.xxd", "test.bin"])
disas = subprocess.check_output([sys.executable, "disas.py", "test.bin"])
with open('test.txt', 'wb') as f:
f.write(disas)
disas = disas.decode()
k = re.findall(r"push\(([^)]+)\)\n.+k = pop\(\)", disas)[0]
print(k)
k = ast.literal_eval(k)
_, x = disas.split("bldbefsarc:\n", 1)
pushes = [int(c) for c in re.findall(r"push\((\d+)\)", x)]
assert pushes[0] == 256
digits = pushes[1:11]
lcase = pushes[11:37]
ucase = pushes[37:63]
assert pushes[63] == 1
assert pushes[64] == 1
assert pushes[65] == 1
lbrace = pushes[66]
rbrace = pushes[67]
uscore = pushes[68]
assert len(pushes) == 69
q = [0] * 256
f5 = ord('0')
while f5 <= ord('9'):
q[f5] = digits[f5 - ord('0')]
f5 += 1
f5 = ord('a')
while f5 <= ord('z'):
q[f5] = lcase[f5 - ord('a')]
f5 += 1
f5 = ord('A')
while f5 <= ord('Z'):
q[f5] = ucase[f5 - ord('A')]
f5 += 1
q[ord('{')] = lbrace
q[ord('}')] = rbrace
q[ord('_')] = uscore
poss = get_input(q, k)
log.info("Prefix: %s", poss)
# The requirement for the trailing bits is that they cannot produce a valid
# path; otherwise, an extra character would be appended. We could work out
# what paths would work, but it'll be easier to just bruteforce it.
for last in range(32, 128):
proc = subprocess.Popen(["../pactvm.patched", "test.bin"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = proc.communicate(poss + bytes([last, 10]))
if proc.returncode == 0:
password = poss + bytes([last])
log.info("Valid password: %s", password)
break
s.sendline(password)
s.interactive()
This quickly chews through all five iterations and spits out our flag:
bctf{search_the_forest_through_the_trees}