Loading...
Loading...
Code obfuscation analysis and deobfuscation playbook. Use when reversing binaries protected by junk code, opaque predicates, self-modifying code, control flow flattening, VM protection, or string encryption.
npx skill4agent add yaklang/hack-skills code-obfuscation-deobfuscationAI LOAD INSTRUCTION: Expert techniques for identifying, classifying, and defeating code obfuscation in native binaries. Covers junk code, opaque predicates, SMC, control flow flattening, movfuscator, VM protectors (VMProtect/Themida/Code Virtualizer), string encryption, import hiding, and anti-disassembly tricks. Base models often conflate packing with obfuscation and miss the distinction between static and dynamic deobfuscation strategies.
| Symptom in IDA/Ghidra | Likely Obfuscation | Start With |
|---|---|---|
| Flat CFG, single giant switch | Control flow flattening | Symbolic execution to recover CFG |
Only | movfuscator | demovfuscation / trace-based lifting |
| pushad/pushfd → VM entry | VM protector | Handler table extraction |
| XOR loop before code execution | SMC / string encryption | Dynamic analysis, breakpoint after decode |
| Impossible conditions (opaque predicates) | Junk code insertion | Pattern-based removal |
| All strings unreadable | String encryption | Hook decryption routine, or emulate |
| No imports in IAT | Import hiding | Trace GetProcAddress / hash resolution |
| Type | Example | Always Evaluates To |
|---|---|---|
| Arithmetic | | True |
| Number theory | | True (product of consecutive ints) |
| Pointer-based | | True |
| Hash-based | | True |
∀x: predicate(x) = Trueimport z3
x = z3.BitVec('x', 32)
s = z3.Solver()
s.add(x * (x + 1) % 2 != 0)
print(s.check()) # unsat → always truelea esi, [encrypted_code]
mov ecx, code_length
mov al, xor_key
decrypt_loop:
xor byte [esi], al
inc esi
loop decrypt_loop
jmp encrypted_code ; now decrypted1. Identify the decryption routine (look for XOR/ADD/SUB in loops writing to .text)
2. Set breakpoint AFTER the loop completes
3. At breakpoint: dump the decrypted memory region
4. Re-analyze the dumped code in IDA/Ghidra
5. For multi-layer: repeat for each decryption stagefrom unicorn import *
from unicorn.x86_const import *
mu = Uc(UC_ARCH_X86, UC_MODE_32)
mu.mem_map(0x400000, 0x10000)
mu.mem_write(0x400000, binary_code)
mu.emu_start(decrypt_entry, decrypt_end)
decrypted = mu.mem_read(code_start, code_length)Original: A → B → C → D
Flattened: ┌──────────────────┐
│ dispatcher │
│ switch(state) │◄─────┐
├──────────────────┤ │
│ case 1: block A │──────┤
│ case 2: block B │──────┤
│ case 3: block C │──────┤
│ case 4: block D │──────┘
└──────────────────┘state = next_state| Technique | Tool | Effectiveness |
|---|---|---|
| Symbolic execution | angr, Triton, miasm | High — traces all state transitions |
| Trace-based recovery | Pin/DynamoRIO trace → reconstruct CFG | Medium — covers executed paths only |
| Pattern matching | Custom IDA/Ghidra script | Medium — works for known flatteners |
| D-810 (IDA plugin) | IDA Pro | High — specifically designed for CFF |
import angr, claripy
proj = angr.Project('./obfuscated')
cfg = proj.analyses.CFGFast()
# Find dispatcher block (highest in-degree basic block)
dispatcher = max(cfg.graph.nodes(), key=lambda n: cfg.graph.in_degree(n))
# For each case block, symbolically determine successor
for block in case_blocks:
state = proj.factory.blank_state(addr=block.addr)
# ... solve state variable to find real successormovmov| Approach | Description |
|---|---|
| demovfuscator (tool) | Static analysis, recovers original operations from mov patterns |
| Trace + taint analysis | Run with Pin/DynamoRIO, taint inputs, observe computation |
| Symbolic execution | Treat entire function as constraint system |
Protected code → bytecode compiler → custom bytecode
Runtime: VM entry (pushad/pushfd) → fetch → decode → execute → VM exit (popad/popfd); Typical VMProtect entry
pushad ; save all registers
pushfd ; save flags
mov ebp, esp ; VM stack frame
sub esp, VM_LOCALS_SIZE ; allocate VM context
mov esi, bytecode_addr ; bytecode instruction pointer
jmp vm_dispatcher ; enter VM loop1. Find dispatcher (large switch or indirect jump via table)
2. Each case/entry = one VM handler (implements one VM opcode)
3. Map handler addresses to operations by analyzing each handler:
- Handler reads operand from bytecode stream (esi)
- Performs operation on VM registers/stack
- Advances bytecode pointer
- Returns to dispatcher| Method | Description | Tool |
|---|---|---|
| Manual handler mapping | Reverse each handler, build ISA spec | IDA + scripting |
| Trace recording | Record all handler executions, reconstruct program | REVEN, Pin |
| Symbolic lifting | Symbolically execute handlers, lift to IR | Triton, miasm |
| Pattern matching | Match handler patterns to known VM families | Custom scripts |
| Pattern | Example | Recovery |
|---|---|---|
| XOR loop | | Hook or emulate XOR function |
| Stack strings | | IDA FLIRT / Ghidra script to reassemble |
| RC4 encrypted | Encrypted blob + RC4 key in binary | Extract key, decrypt offline |
| AES encrypted | Encrypted blob + AES key derived at runtime | Hook after decryption |
| Custom encoding | Base64 + XOR + reverse | Trace the decode function, replicate |
# Ghidra script: find XOR decryption calls, emulate them
from ghidra.program.model.symbol import SourceType
decrypt_func = getFunction("decrypt_string")
refs = getReferencesTo(decrypt_func.getEntryPoint())
for ref in refs:
call_addr = ref.getFromAddress()
# extract arguments (encrypted buffer ptr, key, length)
# emulate decryption, add comment with plaintextFARPROC resolve(DWORD hash) {
// Walk PEB → LDR → InMemoryOrderModuleList
// For each DLL, walk export table
// Hash each export name, compare with target hash
// Return matching function pointer
}| Name | Algorithm | Used By |
|---|---|---|
| ROR13 | | Metasploit shellcode |
| djb2 | | Various malware |
| CRC32 | Standard CRC32 of function name | Sophisticated packers |
| FNV-1a | | Modern malware |
| Trick | Mechanism | Fix |
|---|---|---|
| Overlapping instructions | | Manual re-analysis from correct offset |
| Misaligned jumps | Jump into middle of multi-byte instruction | Force IDA to re-analyze at target |
| Conditional jump pair | | Convert to unconditional jmp |
| Return address manipulation | | Recognize push+ret as jump |
| Exception-based flow | Trigger exception, real code in handler | Analyze exception handler chain |
| Call + add [esp] | | Calculate actual target |
Right-click → Undefine (U)
Right-click → Code (C) at correct offset
Edit → Patch → Assemble (for permanent fix)Obfuscated binary — how to approach?
│
├─ Can you run it?
│ ├─ Yes → Dynamic analysis first
│ │ ├─ Set BP on interesting APIs (file, network, crypto)
│ │ ├─ Trace execution to understand real behavior
│ │ └─ Dump decrypted code/strings at runtime
│ │
│ └─ No (embedded/firmware/exotic arch) → Static only
│ └─ Identify obfuscation type from patterns below
│
├─ What does the code look like?
│ │
│ ├─ Giant flat switch/dispatcher loop?
│ │ ├─ State variable drives control flow → CFF
│ │ │ └─ Use D-810 or symbolic deflattening
│ │ └─ Bytecode fetch-decode-execute → VM protection
│ │ └─ Extract handlers, build disassembler
│ │
│ ├─ Only mov instructions?
│ │ └─ movfuscator → demovfuscator tool
│ │
│ ├─ XOR/ADD loop writing to .text section?
│ │ └─ SMC → breakpoint after decode, dump
│ │
│ ├─ Impossible conditions in branches?
│ │ └─ Opaque predicates → Z3 proving or pattern removal
│ │
│ ├─ Disassembly looks wrong / functions overlap?
│ │ └─ Anti-disassembly → manual re-analysis at correct offsets
│ │
│ ├─ No readable strings?
│ │ └─ String encryption → hook decrypt function or emulate
│ │
│ ├─ No imports in IAT?
│ │ └─ Import hiding → identify hash, build lookup table
│ │
│ └─ pushad/pushfd → complex code → popad/popfd?
│ └─ VM protector entry/exit → full VM analysis
│
└─ What tool to use?
├─ Known protector (VMProtect/Themida) → specific deprotection guide
├─ Custom obfuscation → combine: IDA scripting + Triton + manual
├─ CTF challenge → angr symbolic execution often fastest
└─ Malware analysis → dynamic (debugger + API monitor) first| Tool | Purpose | Best For |
|---|---|---|
| IDA Pro + Hex-Rays | Disassembly, decompilation, scripting | All-around analysis |
| Ghidra | Free alternative with scripting (Java/Python) | Budget-friendly RE |
| D-810 (IDA plugin) | Automated CFF deflattening | OLLVM-style obfuscation |
| miasm | IR-based analysis framework | Symbolic deobfuscation |
| Triton | Dynamic symbolic execution | Opaque predicate solving, CFF |
| REVEN | Full-system trace recording and replay | VM protector analysis |
| demovfuscator | movfuscator reversal | mov-only binaries |
| x64dbg + plugins | Dynamic analysis with scripting | Windows RE |
| Unicorn Engine | CPU emulation | SMC unpacking, shellcode |
| Capstone | Disassembly library | Custom tooling |
| IDA FLIRT | Function signature matching | Identify library code in stripped binaries |
| Binary Ninja | Alternative disassembler with MLIL/HLIL | Automated analysis |