AI LOAD INSTRUCTION: Expert symbolic execution techniques using angr, Z3, and Unicorn Engine. Covers CTF challenge automation, constraint solving patterns, function hooking, SimProcedure replacement, and emulation-based unpacking. Base models often produce broken angr scripts due to incorrect state initialization or missing hooks for libc functions.
python
# Bitvector (fixed-size integer)
sym_input = claripy.BVS("input", 64) # 64-bit symbolic
sym_byte = claripy.BVS("byte", 8) # 8-bit symbolic
sym_buf = claripy.BVS("buffer", 8 * 32) # 32-byte buffer
# Concrete bitvector
concrete = claripy.BVV(0x41, 8) # concrete value 0x41
# Constraints
state.solver.add(sym_input > 0)
state.solver.add(sym_input < 100)
state.solver.add(sym_byte >= 0x20) # printable ASCII
state.solver.add(sym_byte <= 0x7e)
# Evaluate
value = state.solver.eval(sym_input)
all_values = state.solver.eval_upto(sym_input, 10) # up to 10 solutions
python
# Hook by address (skip N bytes of original code)
@proj.hook(0x401100, length=5)
def skip_check(state):
state.regs.eax = 1 # force success
# SimProcedure: replace library function
class MyStrcmp(angr.SimProcedure):
def run(self, s1, s2):
return claripy.If(
self.state.memory.load(s1, 32) == self.state.memory.load(s2, 32),
claripy.BVV(0, 32),
claripy.BVV(1, 32)
)
proj.hook_symbol('strcmp', MyStrcmp())
# Hook common problematic functions
proj.hook_symbol('printf', angr.SIM_PROCEDURES['libc']['printf']())
proj.hook_symbol('scanf', angr.SIM_PROCEDURES['libc']['scanf']())
proj.hook_symbol('puts', angr.SIM_PROCEDURES['libc']['puts']())
python
# Read memory (symbolic-aware)
data = state.memory.load(addr, size) # returns BV
data_concrete = state.solver.eval(data, cast_to=bytes)
# Write memory
state.memory.store(addr, claripy.BVV(0x41, 8))
state.memory.store(addr, sym_buf)
# Read/write registers
rax = state.regs.rax
state.regs.rdi = claripy.BVV(0x1000, 64)
python
# Serial key validation: each char satisfies constraints
key = [BitVec(f'k{i}', 8) for i in range(16)]
s = Solver()
for k in key:
s.add(k >= 0x30, k <= 0x7a) # alphanumeric-ish
# XOR key recovery
plaintext = b"known_plaintext"
ciphertext = b"\x12\x34..."
key_byte = BitVec('key', 8)
s = Solver()
for p, c in zip(plaintext, ciphertext):
s.add(p ^ key_byte == c)
# System of linear equations (modular)
a, b, c = BitVecs('a b c', 32)
s = Solver()
s.add(3*a + 5*b + 7*c == 0x12345678)
s.add(2*a + 4*b + 6*c == 0xDEADBEEF)
s.add(a ^ b ^ c == 0xCAFEBABE)
python
from unicorn import *
from unicorn.x86_const import *
from capstone import Cs, CS_ARCH_X86, CS_MODE_64
mu = Uc(UC_ARCH_X86, UC_MODE_64)
CODE_ADDR = 0x400000
STACK_ADDR = 0x7fff0000
STACK_SIZE = 0x10000
mu.mem_map(CODE_ADDR, 0x10000)
mu.mem_map(STACK_ADDR, STACK_SIZE)
mu.mem_write(CODE_ADDR, code_bytes)
mu.reg_write(UC_X86_REG_RSP, STACK_ADDR + STACK_SIZE - 0x1000)
mu.reg_write(UC_X86_REG_RBP, STACK_ADDR + STACK_SIZE - 0x1000)
mu.emu_start(CODE_ADDR, CODE_ADDR + len(code_bytes))
result = mu.reg_read(UC_X86_REG_RAX)
python
# Hook memory access
def hook_mem(uc, access, address, size, value, user_data):
if access == UC_MEM_WRITE:
print(f"Write {value:#x} to {address:#x}")
elif access == UC_MEM_READ:
print(f"Read from {address:#x}")
mu.hook_add(UC_HOOK_MEM_READ | UC_HOOK_MEM_WRITE, hook_mem)
# Hook specific instruction (for tracing)
def hook_code(uc, address, size, user_data):
code = uc.mem_read(address, size)
md = Cs(CS_ARCH_X86, CS_MODE_64)
for insn in md.disasm(bytes(code), address):
print(f" {insn.address:#x}: {insn.mnemonic} {insn.op_str}")
mu.hook_add(UC_HOOK_CODE, hook_code)
This dramatically speeds up execution: concrete code runs natively via Unicorn, switching to symbolic only when symbolic variables are involved.
Need to solve a reversing challenge?
│
├─ Is the challenge pure math / equations?
│ └─ Yes → Z3
│ ├─ Linear equations → BitVec + Solver
│ ├─ Modular arithmetic → BitVec (natural mod 2^n)
│ ├─ Boolean logic → Bool + Solver
│ └─ Optimization → Optimize + minimize/maximize
│
├─ Is it a compiled binary with clear success/fail?
│ └─ Yes → angr
│ ├─ Input via stdin → symbolic stdin
│ ├─ Input via argv → full_init_state with symbolic args
│ ├─ Input via file → SimFile
│ ├─ Path explosion → add constraints, avoid paths, hook loops
│ └─ Complex library calls → hook with SimProcedure
│
├─ Need to emulate a specific function/region?
│ └─ Yes → Unicorn Engine
│ ├─ Decryption routine → map code + data, emulate, read result
│ ├─ Shellcode analysis → map shellcode, hook syscalls
│ └─ Key schedule → emulate with different inputs
│
├─ Need to analyze firmware / exotic arch?
│ └─ Yes → Qiling (full system emulation with OS support)
│
├─ Binary has VM protection?
│ └─ angr for handler analysis + Z3 for bytecode constraints
│
└─ None of the above working?
├─ Combine: Unicorn for concrete regions + Z3 for constraints
├─ Manual reverse engineering with debugger
└─ Side-channel approach (timing, power analysis for hardware)