rev-unicorn-debug


rev-unicorn-debug - Unicorn Emulation Debugger

Debug and emulate specific code fragments or functions using the Unicorn engine. Analyze context dependencies (JNI, syscalls, library functions) and simulate them through hook mechanisms to complete the user's debugging goal.

Core Principles

  1. Load file raw first — do NOT parse ELF/PE/Mach-O headers. Read the file as raw bytes and map directly into Unicorn memory. We only need to emulate specific functions, not the entire binary. If raw loading fails (code references segments at specific addresses), then parse minimally — only map the segments needed.
  2. Identify context dependencies — analyze the target code for external calls (JNI, syscalls, libc, imports) and hook them to provide simulated responses.
  3. Use callbacks extensively — leverage Unicorn's hook system for debugging, tracing, error recovery, and environment simulation.
  4. Iterative fix — when emulation crashes, use the callback info to diagnose and fix (map missing memory, hook unhandled calls, fix register state).
  5. Minimal trace output — prefer block-level tracing over instruction-level. Only enable instruction trace on small targeted ranges. Use counters and summaries instead of per-step logging.
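
Principle 1 can be sketched as follows. This is a minimal illustration, not the skill's own loader: the helper names `align_up` and `load_raw` and the default base address `0x10000` are assumptions chosen for the example. It relies only on Unicorn's `mem_map` and `mem_write`, which require 4 KiB-aligned addresses and sizes.

```python
from pathlib import Path

PAGE = 0x1000  # Unicorn maps memory in 4 KiB pages


def align_up(n: int, page: int = PAGE) -> int:
    """Round n up to the next multiple of the page size."""
    return (n + page - 1) & ~(page - 1)


def load_raw(uc, path: str, base: int = 0x10000) -> int:
    """Map the file's raw bytes at `base` (no header parsing); return the mapped size.

    `uc` is a unicorn.Uc instance. `base` is a hypothetical load address,
    so file offset N ends up at emulated address base + N.
    """
    if base % PAGE:
        raise ValueError("base must be page-aligned")
    blob = Path(path).read_bytes()
    size = align_up(max(len(blob), 1))
    uc.mem_map(base, size)     # default permissions: read, write, execute
    uc.mem_write(base, blob)   # copy the raw file image into emulated memory
    return size


# Usage sketch (assumes the `unicorn` package and an ARM64 target;
# "libtarget.so" is a placeholder file name):
#   from unicorn import Uc, UC_ARCH_ARM64, UC_MODE_LITTLE_ENDIAN
#   uc = Uc(UC_ARCH_ARM64, UC_MODE_LITTLE_ENDIAN)
#   load_raw(uc, "libtarget.so")
```

With this layout a function at file offset `0x1234` would be emulated at `base + 0x1234`. If the code expects its segments at specific virtual addresses, this flat mapping is where the fallback to minimal segment parsing becomes necessary.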

Environment Simulation Strategy

Before emulating, read the target function and identify what it calls. Hook external dependencies by address and simulate in Python:

| Category | Examples | Simulation Strategy |
|---|---|---|
| libc | malloc, free, memcpy, strlen, printf | Hook address, implement logic in Python (bump allocator for malloc) |
| JNI | GetStringUTFChars, FindClass, GetMethodID | Build fake JNIEnv function table in UC memory, write RET stubs at each entry, hook stub addresses |
| Syscalls | read, write, mmap, ioctl | Hook UC_HOOK_INTR, dispatch by syscall number |
| C++ runtime | operator new, __cxa_throw | Hook and simulate |
| Library calls | pthread_mutex_lock, dlopen | Hook and return success/stub |

Hook pattern: Register a UC_HOOK_CODE callback. When PC hits a known import address, execute the Python simulation, then set PC = LR to skip the original function.
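
A minimal sketch of the hook pattern, combined with the bump allocator suggested for malloc. The names `BumpAllocator` and `make_import_hook` are illustrative assumptions, not part of Unicorn. Register IDs are passed in as parameters so the same dispatcher can serve any architecture that has a link register.

```python
class BumpAllocator:
    """Simulated malloc: hand out 16-byte-aligned chunks from a fixed arena."""

    def __init__(self, base: int, size: int):
        self.next = base
        self.end = base + size

    def malloc(self, n: int) -> int:
        step = (max(n, 1) + 15) & ~15   # round the request up to 16 bytes
        if self.next + step > self.end:
            return 0                    # arena exhausted: act like a failed malloc
        addr, self.next = self.next, self.next + step
        return addr


def make_import_hook(handlers, reg_pc, reg_lr):
    """Build a UC_HOOK_CODE callback that dispatches on the current PC.

    handlers maps an import address to a Python function taking `uc`.
    reg_pc and reg_lr are Unicorn register IDs for the target architecture.
    """
    def on_code(uc, address, size, user_data):
        handler = handlers.get(address)
        if handler is None:
            return                                  # ordinary instruction
        handler(uc)                                 # simulate the import in Python
        uc.reg_write(reg_pc, uc.reg_read(reg_lr))   # PC = LR: return to the caller
    return on_code


# Usage sketch for ARM64 (MALLOC_ADDR and the arena range are hypothetical):
#   from unicorn import UC_HOOK_CODE
#   from unicorn.arm64_const import (UC_ARM64_REG_PC, UC_ARM64_REG_LR,
#                                    UC_ARM64_REG_X0)
#   heap = BumpAllocator(0x70000000, 0x100000)   # mem_map this range first
#   def fake_malloc(uc):
#       size = uc.reg_read(UC_ARM64_REG_X0)
#       uc.reg_write(UC_ARM64_REG_X0, heap.malloc(size))
#   hook = make_import_hook({MALLOC_ADDR: fake_malloc},
#                           UC_ARM64_REG_PC, UC_ARM64_REG_LR)
#   uc.hook_add(UC_HOOK_CODE, hook)
```

The hooked address most likely needs to sit in mapped memory, since an unmapped fetch would fault before the code hook has a chance to run. On x86, where the return address lives on the stack rather than in a link register, the handler would have to pop it from the stack instead of copying LR.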

Callback Types to Use

| Callback | Purpose |
|---|---|
| UC_HOOK_CODE | Intercept import calls by address; instruction-level trace (use sparingly, narrow range only) |
| UC_HOOK_BLOCK | Block-level trace (preferred over instruction trace) |
| UC_HOOK_MEM_UNMAPPED | Auto-map missing pages to recover from unmapped access errors |
| UC_HOOK_MEM_READ \| UC_HOOK_MEM_WRITE | Trace memory access on targeted data ranges only |
| UC_HOOK_INTR | Intercept SVC/INT for syscall simulation |
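
As an illustration of the UC_HOOK_MEM_UNMAPPED entry, the sketch below maps the faulting page on demand and records what it mapped. The helper names `page_span` and `on_unmapped` are assumptions. The callback signature and the "return True to retry" convention follow Unicorn's Python memory-hook interface.

```python
PAGE = 0x1000  # Unicorn maps memory in 4 KiB pages


def page_span(address: int, size: int, page: int = PAGE):
    """Return (start, length) of the page-aligned range covering the access."""
    start = address & ~(page - 1)
    end = (address + max(size, 1) + page - 1) & ~(page - 1)
    return start, end - start


def on_unmapped(uc, access, address, size, value, user_data):
    """UC_HOOK_MEM_UNMAPPED callback: map the missing page(s) and resume.

    user_data is a list used as a log of (access, address, size) tuples.
    Caveat: if the span partly overlaps an existing mapping, mem_map raises,
    so a sturdier version would map page by page.
    """
    start, length = page_span(address, size)
    uc.mem_map(start, length)                   # fresh pages are zero-filled
    user_data.append((access, address, size))   # keep a record for the summary
    return True                                 # True: retry the access


# Usage sketch (assumes an existing unicorn.Uc instance `uc`):
#   from unicorn import UC_HOOK_MEM_UNMAPPED
#   auto_mapped = []
#   uc.hook_add(UC_HOOK_MEM_UNMAPPED, on_unmapped, auto_mapped)
```

Auto-mapping suits data reads and writes. For a fetch fault it only hides the problem, because the new page contains zeros rather than the missing code, so the log is worth reviewing after each run.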

Iterative Debugging Workflow

When emulation fails, follow this loop:
  1. Run — start emulation, let it crash
  2. Read callback output — which address faulted? What type (read/write/fetch)?
  3. Diagnose:
    • Unmapped memory fetch → missing code page, map it
    • Unmapped memory read/write → missing data section or uninitialized pointer, map or hook
    • Hitting an import stub → identify the function, add a simulation hook
    • Infinite loop → add a code hook with execution counter, stop after threshold
  4. Fix — add the hook / map the memory / adjust registers
  5. Re-run — repeat until the target function completes
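
The infinite-loop case in step 3 can be handled with an execution counter. The sketch below counts at block granularity, in line with the preference for block-level tracing. `make_block_counter` and `hottest` are hypothetical helper names, and the threshold is an arbitrary example value.

```python
from collections import Counter


def make_block_counter(limit: int):
    """Return (hits, callback) for use with UC_HOOK_BLOCK.

    hits counts how often each basic block was entered. Emulation is
    stopped once any single block has been entered more than `limit` times.
    """
    hits = Counter()

    def on_block(uc, address, size, user_data):
        hits[address] += 1
        if hits[address] > limit:
            uc.emu_stop()   # probable infinite loop: stop and inspect state

    return hits, on_block


def hottest(hits: Counter, top: int = 5):
    """Summarize the most frequently entered blocks as (hex address, count)."""
    return [(hex(addr), n) for addr, n in hits.most_common(top)]


# Usage sketch (assumes an existing unicorn.Uc instance `uc`; START and END
# are hypothetical addresses of the target function):
#   from unicorn import UC_HOOK_BLOCK
#   hits, on_block = make_block_counter(limit=100000)
#   uc.hook_add(UC_HOOK_BLOCK, on_block)
#   uc.emu_start(START, END)
#   print(hottest(hits))
```

Printing only the summary after the run keeps output small, and the hottest blocks point directly at the loop that needs a hook or a register fix.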

Architecture Quick Reference

| Arch | Uc Const | Mode | SP | LR | Args | Return | Syscall |
|---|---|---|---|---|---|---|---|
| ARM64 | UC_ARCH_ARM64 | UC_MODE_LITTLE_ENDIAN | SP | X30 | X0-X7 | X0 | X8 + SVC #0 |
| ARM32 | UC_ARCH_ARM | UC_MODE_THUMB / UC_MODE_ARM | SP | LR | R0-R3 | R0 | R7 + SVC #0 |
| x86-64 | UC_ARCH_X86 | UC_MODE_64 | RSP | (stack) | RDI, RSI, RDX, RCX, R8, R9 | RAX | RAX + syscall |
| x86-32 | UC_ARCH_X86 | UC_MODE_32 | ESP | (stack) | (stack) | EAX | EAX + int 0x80 |
| MIPS32 | UC_ARCH_MIPS | UC_MODE_MIPS32 + UC_MODE_BIG_ENDIAN | $sp | $ra | $a0-$a3 | $v0 | $v0 + syscall |
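
The Syscall column translates into a dispatcher like the sketch below, which reads the syscall number and arguments from the registers listed for the chosen architecture. `make_syscall_hook` and `make_write_handler` are hypothetical names. The ARM64 Linux number 64 for write and the errno value 38 for ENOSYS are assumptions about the emulated target's ABI.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
ENOSYS = 38  # Linux errno for "function not implemented"


def make_syscall_hook(table, reg_num, reg_args, reg_ret, mask=MASK64):
    """Build a UC_HOOK_INTR callback that dispatches by syscall number.

    table maps syscall numbers to handlers called as handler(uc, *args).
    reg_num, reg_args and reg_ret are Unicorn register IDs taken from the
    architecture table (for ARM64: X8, X0..X5, X0).
    """
    def on_intr(uc, intno, user_data):
        num = uc.reg_read(reg_num)
        args = [uc.reg_read(r) for r in reg_args]
        handler = table.get(num)
        result = handler(uc, *args) if handler else -ENOSYS
        uc.reg_write(reg_ret, result & mask)   # negative errno wraps to unsigned
    return on_intr


def make_write_handler(sink):
    """Simulated write(fd, buf, count): capture the bytes instead of printing."""
    def sys_write(uc, fd, buf, count, *_):
        sink.append((fd, bytes(uc.mem_read(buf, count))))
        return count
    return sys_write


# Usage sketch for ARM64 Linux (assumes an existing unicorn.Uc instance `uc`):
#   from unicorn import UC_HOOK_INTR
#   from unicorn.arm64_const import (
#       UC_ARM64_REG_X0, UC_ARM64_REG_X1, UC_ARM64_REG_X2,
#       UC_ARM64_REG_X3, UC_ARM64_REG_X4, UC_ARM64_REG_X5, UC_ARM64_REG_X8)
#   output = []
#   hook = make_syscall_hook(
#       {64: make_write_handler(output)},
#       UC_ARM64_REG_X8,
#       [UC_ARM64_REG_X0, UC_ARM64_REG_X1, UC_ARM64_REG_X2,
#        UC_ARM64_REG_X3, UC_ARM64_REG_X4, UC_ARM64_REG_X5],
#       UC_ARM64_REG_X0)
#   uc.hook_add(UC_HOOK_INTR, hook)
```

For a 32-bit target, pass `mask=0xFFFFFFFF`. On x86-64 the `syscall` instruction is, as far as I know, not delivered through UC_HOOK_INTR and is instead intercepted with UC_HOOK_INSN and UC_X86_INS_SYSCALL, so the registration call would differ there even though the dispatch logic stays the same.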