rev-unicorn-debug - Unicorn Emulation Debugger
Debug and emulate specific code fragments or functions using the Unicorn engine. Analyze context dependencies (JNI, syscalls, library functions) and simulate them through hook mechanisms to complete the user's debugging goal.
Core Principles
- Load file raw first — do NOT parse ELF/PE/Mach-O headers. Read the file as raw bytes and map directly into Unicorn memory. We only need to emulate specific functions, not the entire binary. If raw loading fails (code references segments at specific addresses), then parse minimally — only map the segments needed.
- Identify context dependencies — analyze the target code for external calls (JNI, syscalls, libc, imports) and hook them to provide simulated responses.
- Use callbacks extensively — leverage Unicorn's hook system for debugging, tracing, error recovery, and environment simulation.
- Iterative fix — when emulation crashes, use the callback info to diagnose and fix (map missing memory, hook unhandled calls, fix register state).
- Minimal trace output — prefer block-level tracing over instruction-level. Only enable instruction trace on small targeted ranges. Use counters and summaries instead of per-step logging.
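The raw-loading principle above can be sketched as follows. This is a minimal sketch, not the skill's own loader: the helper names (`align_up`, `raw_mapping`, `load_raw`) and the default base address are hypothetical, and `mu` is assumed to be a Unicorn `Uc` instance. The one firm constraint is that Unicorn expects mapped regions to be 4 KiB aligned in both address and size.

```python
PAGE = 0x1000  # Unicorn requires 4 KiB aligned addresses and sizes for mem_map


def align_up(n, page=PAGE):
    """Round n up to the next multiple of the page size."""
    return (n + page - 1) & ~(page - 1)


def raw_mapping(blob, base=0x10000000):
    """Return (base, size) for mapping the whole file as one flat region."""
    if base % PAGE:
        raise ValueError("base must be page aligned")
    return base, align_up(max(len(blob), 1))


def load_raw(mu, path, base=0x10000000):
    """Read the file as raw bytes and copy it into emulator memory.

    No header parsing: file offset N simply ends up at address base + N.
    `mu` is assumed to be a unicorn.Uc instance (hypothetical helper).
    """
    with open(path, "rb") as f:
        blob = f.read()
    addr, size = raw_mapping(blob, base)
    mu.mem_map(addr, size)    # default permissions are read/write/execute
    mu.mem_write(addr, blob)
    return addr, len(blob)
```

With this layout a function at file offset `0x1234` is emulated at `base + 0x1234`. If the code uses absolute addresses that only make sense at the segment's intended load address, that is the point to fall back to mapping just the needed segments.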
Environment Simulation Strategy
Before emulating, read the target function and identify what it calls. Hook external dependencies by address and simulate in Python:
| Category | Examples | Simulation Strategy |
|---|---|---|
| libc | | Hook address, implement logic in Python (bump allocator for malloc) |
| JNI | | Build fake JNIEnv function table in UC memory, write RET stubs at each entry, hook stub addresses |
| Syscalls | | Hook |
| C++ runtime | | Hook and simulate |
| Library calls | | Hook and return success/stub |
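For the JNI row, one possible memory layout is sketched below. It assumes a 64-bit little-endian ARM64 target, where `RET` encodes as `0xD65F03C0`; the function name `build_fake_jnienv` and the slot count passed to it are hypothetical, and real code would pick the slot count and per-slot meaning from the JNI function table it needs to imitate.

```python
import struct

ARM64_RET = bytes.fromhex("c0035fd6")  # RET (0xD65F03C0), little-endian
PTR_SIZE = 8                           # assumption: 64-bit target


def build_fake_jnienv(base, n_slots):
    """Build a fake JNIEnv image to be written at `base`.

    Layout (all hypothetical):
      base              -> pointer to the function table (what JNIEnv* points at)
      base + 8          -> n_slots pointers, one per JNI function slot
      after the table   -> n_slots 4-byte RET stubs
    Returns (image_bytes, env_pointer, stub_addresses).
    """
    table_addr = base + PTR_SIZE
    stubs_addr = table_addr + n_slots * PTR_SIZE
    stub_addrs = [stubs_addr + i * 4 for i in range(n_slots)]

    image = struct.pack("<Q", table_addr)
    image += b"".join(struct.pack("<Q", a) for a in stub_addrs)
    image += ARM64_RET * n_slots
    return image, base, stub_addrs
```

After writing `image` at `base`, pass `env_pointer` as the first argument of the target function. A dictionary such as `{addr: slot for slot, addr in enumerate(stub_addrs)}` then lets an address hook tell which JNI slot was called.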
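Hook pattern: Register a `UC_HOOK_CODE` callback. When PC hits a known import address, run the Python simulation, then set PC = LR to skip the original function. On architectures without a link register (x86), read the return address from the stack instead.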
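The hook pattern can be sketched for ARM64 as follows. This is an illustration under assumptions, not the skill's implementation: the import addresses in `IMPORTS`, the heap range, and the names `BumpAllocator`, `sim_malloc`, `sim_free`, and `on_code` are all hypothetical. The register constants come from `unicorn.arm64_const`; the fallback strings exist only so the sketch also runs where Unicorn is not installed.

```python
try:
    from unicorn.arm64_const import (
        UC_ARM64_REG_X0, UC_ARM64_REG_X30, UC_ARM64_REG_PC)
except ImportError:  # placeholders for environments without Unicorn
    UC_ARM64_REG_X0, UC_ARM64_REG_X30, UC_ARM64_REG_PC = "x0", "x30", "pc"


class BumpAllocator:
    """Simplest possible malloc: hand out increasing addresses, never free."""

    def __init__(self, base, size):
        self.next, self.end = base, base + size

    def malloc(self, n):
        n = max(16, (n + 15) & ~15)      # keep 16-byte alignment
        if self.next + n > self.end:
            return 0                      # out of fake heap: return NULL
        addr, self.next = self.next, self.next + n
        return addr


HEAP = BumpAllocator(0x20000000, 0x100000)  # hypothetical heap region


def sim_malloc(uc):
    size = uc.reg_read(UC_ARM64_REG_X0)
    uc.reg_write(UC_ARM64_REG_X0, HEAP.malloc(size))


def sim_free(uc):
    pass  # a bump allocator never reclaims memory


# Hypothetical import addresses, found by reading the target's call sites.
IMPORTS = {0x1000: sim_malloc, 0x1010: sim_free}


def on_code(uc, address, size, user_data):
    """UC_HOOK_CODE callback: simulate known imports, then return to caller."""
    handler = IMPORTS.get(address)
    if handler is None:
        return
    handler(uc)
    uc.reg_write(UC_ARM64_REG_PC, uc.reg_read(UC_ARM64_REG_X30))  # PC = LR
```

The callback would be registered with `mu.hook_add(UC_HOOK_CODE, on_code)`, and the heap range has to be mapped with `mu.mem_map` before the target writes to returned pointers.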
Callback Types to Use
| Callback | Purpose |
|---|---|
| | Intercept import calls by address; instruction-level trace (use sparingly, narrow range only) |
| | Block-level trace (preferred over instruction trace) |
| | Auto-map missing pages to recover from unmapped access errors |
| | Trace memory access on targeted data ranges only |
| | Intercept SVC/INT for syscall simulation |
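The auto-map row could look roughly like the sketch below. It assumes the callback is registered for Unicorn's `UC_HOOK_MEM_UNMAPPED` hook type, whose callbacks receive `(uc, access, address, size, value, user_data)` and return `True` to retry the faulting access. The name `on_unmapped` and the use of `user_data` as a log list are hypothetical choices.

```python
PAGE = 0x1000


def on_unmapped(uc, access, address, size, value, user_data):
    """Map the page(s) covering a faulting access, record it, and retry.

    user_data is assumed to be a list that collects (start, length) tuples,
    so the run ends with a summary of everything that had to be mapped.
    """
    start = address & ~(PAGE - 1)
    end = (address + max(size, 1) + PAGE - 1) & ~(PAGE - 1)
    uc.mem_map(start, end - start)           # new pages are zero-filled
    user_data.append((start, end - start))
    return True                              # tell Unicorn to retry the access
```

Registration would be along the lines of `mu.hook_add(UC_HOOK_MEM_UNMAPPED, on_unmapped, user_data=mapped_log)`. The sketch does not handle an access that straddles an already-mapped page, where `mem_map` would raise. Zero-filled pages only hide the fault, so the log is worth reviewing: an auto-mapped fetch usually means a code page is missing rather than that the target wanted zeros.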
Iterative Debugging Workflow
When emulation fails, follow this loop:
- Run — start emulation, let it crash
- Read callback output — which address faulted? What type (read/write/fetch)?
- Diagnose:
  - Unmapped memory fetch → missing code page, map it
  - Unmapped memory read/write → missing data section or uninitialized pointer, map or hook
  - Hitting an import stub → identify the function, add a simulation hook
  - Infinite loop → add a code hook with execution counter, stop after threshold
- Fix — add the hook / map the memory / adjust registers
- Re-run — repeat until the target function completes
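For the infinite-loop case, a counter-based guard might look like the sketch below. It counts at block granularity rather than per instruction, in line with the minimal-trace principle; code and block callbacks share the `(uc, address, size, user_data)` signature, so the same method works for either hook type. The class name `BlockBudget` and the limit value are hypothetical.

```python
from collections import Counter


class BlockBudget:
    """Count executed blocks and stop emulation once a budget is exhausted."""

    def __init__(self, limit):
        self.limit = limit
        self.total = 0
        self.hits = Counter()   # block address -> number of executions

    def on_block(self, uc, address, size, user_data):
        self.hits[address] += 1
        self.total += 1
        if self.total >= self.limit:
            uc.emu_stop()       # ends emu_start() cleanly instead of hanging

    def summary(self, top=5):
        """Hottest blocks as (hex address, count), for a short post-run report."""
        return [(hex(a), n) for a, n in self.hits.most_common(top)]
```

A typical use would be `budget = BlockBudget(1_000_000)` followed by `mu.hook_add(UC_HOOK_BLOCK, budget.on_block)`. After the run, `budget.summary()` points at the loop body without any per-step logging.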
Architecture Quick Reference
| Arch | Uc Const | Mode | SP | LR | Args | Return | Syscall |
|---|---|---|---|---|---|---|---|
| ARM64 | | | SP | X30 | X0-X7 | X0 | X8 + SVC #0 |
| ARM32 | | | SP | LR | R0-R3 | R0 | R7 + SVC #0 |
| x86-64 | | | RSP | (stack) | RDI,RSI,RDX,RCX,R8,R9 | RAX | RAX + syscall |
| x86-32 | | | ESP | (stack) | (stack) | EAX | EAX + int 0x80 |
| MIPS32 | | | $sp | $ra | $a0-$a3 | $v0 | $v0 + syscall |
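As an illustration of the ARM64 row (syscall number in X8, arguments in X0 onward, result in X0), the following sketch dispatches syscalls from an interrupt callback. It assumes a Linux target, where `write` is syscall 64 and `getpid` is 172 on ARM64, and that the callback is registered for Unicorn's `UC_HOOK_INTR` hook type with the `(uc, intno, user_data)` signature. The handler names, the fake PID, and the `OUTPUT` list are hypothetical; the fallback strings only keep the sketch runnable without Unicorn installed.

```python
try:
    from unicorn.arm64_const import (
        UC_ARM64_REG_X0, UC_ARM64_REG_X1, UC_ARM64_REG_X2, UC_ARM64_REG_X8)
except ImportError:  # placeholders for environments without Unicorn
    UC_ARM64_REG_X0, UC_ARM64_REG_X1 = "x0", "x1"
    UC_ARM64_REG_X2, UC_ARM64_REG_X8 = "x2", "x8"

OUTPUT = []                      # captured (fd, bytes) pairs from write()
ENOSYS = 38                      # Linux errno for "function not implemented"
MASK64 = 0xFFFFFFFFFFFFFFFF


def sys_write(uc, fd, buf, count):
    OUTPUT.append((fd, bytes(uc.mem_read(buf, count))))
    return count                 # pretend everything was written


def sys_getpid(uc, *_args):
    return 1234                  # arbitrary fake PID


SYSCALLS = {64: sys_write, 172: sys_getpid}   # ARM64 Linux numbers


def on_intr(uc, intno, user_data):
    """Interrupt callback: read X8, simulate the syscall, put result in X0."""
    nr = uc.reg_read(UC_ARM64_REG_X8)
    args = [uc.reg_read(r) for r in
            (UC_ARM64_REG_X0, UC_ARM64_REG_X1, UC_ARM64_REG_X2)]
    handler = SYSCALLS.get(nr)
    result = handler(uc, *args) if handler else (-ENOSYS) & MASK64
    uc.reg_write(UC_ARM64_REG_X0, result)
```

For the other rows only the register names change: the syscall number and return registers listed in the table replace X8 and X0. Unhandled numbers return `-ENOSYS` here so that missing simulations show up as error paths in the target instead of silent success.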