rev-unicorn-debug - Unicorn Emulation Debugger

Debug and emulate specific code fragments or functions using the Unicorn engine. Analyze context dependencies (JNI, syscalls, library functions) and simulate them through hook mechanisms to complete the user's debugging goal.

Core Principles

Load file raw first — do NOT parse ELF/PE/Mach-O headers. Read the file as raw bytes and map directly into Unicorn memory. We only need to emulate specific functions, not the entire binary. If raw loading fails (code references segments at specific addresses), then parse minimally — only map the segments needed.
Identify context dependencies — analyze the target code for external calls (JNI, syscalls, libc, imports) and hook them to provide simulated responses.
Use callbacks extensively — leverage Unicorn's hook system for debugging, tracing, error recovery, and environment simulation.
Iterative fix — when emulation crashes, use the callback info to diagnose and fix (map missing memory, hook unhandled calls, fix register state).
Minimal trace output — prefer block-level tracing over instruction-level. Only enable instruction trace on small targeted ranges. Use counters and summaries instead of per-step logging.
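
The raw-loading principle can be sketched as follows. This is a minimal illustration, not a definitive loader: `load_raw` and `align_up` are hypothetical helper names, and `uc` is assumed to be a `unicorn.Uc` instance (only its `mem_map` and `mem_write` methods are used). With a raw load at `base`, a function at file offset `off` sits at `base + off`.

```python
PAGE = 0x1000  # Unicorn requires 4 KiB-aligned mapping addresses and sizes


def align_up(value, alignment=PAGE):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


def load_raw(uc, path, base=0x0):
    """Map the whole file at `base` as raw bytes, without parsing any headers.

    `uc` is any object exposing Unicorn's mem_map/mem_write methods.
    Returns (base, mapped_size).
    """
    if base % PAGE:
        raise ValueError("base must be page-aligned")
    with open(path, "rb") as f:
        blob = f.read()
    size = align_up(max(len(blob), 1))
    uc.mem_map(base, size)      # default permissions: read/write/execute
    uc.mem_write(base, blob)
    return base, size

# Wiring against a real engine (assumed ARM64 target, not executed here):
#   from unicorn import Uc, UC_ARCH_ARM64, UC_MODE_ARM
#   uc = Uc(UC_ARCH_ARM64, UC_MODE_ARM)
#   base, size = load_raw(uc, "libtarget.so", 0x10000)   # hypothetical file name
```

If the code reads data through absolute addresses that do not match `base + offset`, this is the point where minimal segment mapping becomes necessary.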

Environment Simulation Strategy

Before emulating, read the target function and identify what it calls. Hook external dependencies by address and simulate in Python:

Category	Examples	Simulation Strategy
libc	`malloc`, `free`, `memcpy`, `strlen`, `printf`	Hook address, implement logic in Python (bump allocator for malloc)
JNI	`GetStringUTFChars`, `FindClass`, `GetMethodID`	Build fake JNIEnv function table in UC memory, write RET stubs at each entry, hook stub addresses
Syscalls	`read`, `write`, `mmap`, `ioctl`	Hook `UC_HOOK_INTR` (SVC, `int 0x80`) or `UC_HOOK_INSN` with `UC_X86_INS_SYSCALL` (x86-64 `syscall`), dispatch by syscall number
C++ runtime	`operator new`, `__cxa_throw`	Hook and simulate
Library calls	`pthread_mutex_lock`, `dlopen`	Hook and return success/stub
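
The JNI row can be illustrated with a small layout builder. This is a sketch under stated assumptions: a 64-bit little-endian AArch64 target, a table of 233 slots (assumed to cover JNI 1.6), and the hypothetical helper name `build_fake_jnienv`. The slot indices 6, 33, and 169 follow the function table order in the JNI specification. The function only builds bytes; writing them with `uc.mem_write(base, image)` into a mapped region is left to the caller.

```python
import struct

ARM64_RET = bytes.fromhex("c0035fd6")  # AArch64 `ret`, little-endian encoding
PTR = 8                                 # pointer size on a 64-bit target

# Indices into JNINativeInterface, per the JNI specification's function table.
JNI_INDEX = {"FindClass": 6, "GetMethodID": 33, "GetStringUTFChars": 169}
JNI_SLOTS = 233                         # assumption: enough slots for JNI 1.6


def build_fake_jnienv(base):
    """Lay out JNIEnv, the function table, and one RET stub per slot.

    Returns (env_ptr, image, stub_to_index): `image` is the blob to write at
    `base`, and `stub_to_index` maps each stub address to its table slot.
    """
    table_addr = base + PTR                      # function table follows JNIEnv
    stubs_addr = table_addr + JNI_SLOTS * PTR    # stubs follow the table
    stub_addrs = [stubs_addr + i * len(ARM64_RET) for i in range(JNI_SLOTS)]

    image = struct.pack("<Q", table_addr)        # *env points at the table
    image += struct.pack(f"<{JNI_SLOTS}Q", *stub_addrs)
    image += ARM64_RET * JNI_SLOTS
    return base, image, {addr: i for i, addr in enumerate(stub_addrs)}
```

Pass `env_ptr` as the first argument (X0) of the JNI function under test. A code hook can then look up the current address in `stub_to_index` to learn which JNI function was called.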

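Hook pattern: Register a `UC_HOOK_CODE` callback. When PC hits a known import address, run the Python simulation, write the result to the return register, then set PC = LR to skip the original function. On x86, which has no link register, pop the return address from the stack into the instruction pointer instead.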
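
A minimal sketch of this pattern for a link-register architecture, including the bump allocator mentioned for `malloc`. The names `BumpAllocator`, `make_import_hook`, and `MALLOC_ADDR` are hypothetical. Register ids are passed in as parameters so the same callback factory works for any architecture that has an LR.

```python
class BumpAllocator:
    """malloc stand-in: hands out increasing addresses and never frees."""

    def __init__(self, base, size):
        self.next, self.end = base, base + size

    def alloc(self, n):
        addr = self.next
        aligned = (n + 15) & ~15          # keep 16-byte alignment
        if addr + aligned > self.end:
            return 0                      # NULL when the arena is exhausted
        self.next += aligned
        return addr


def make_import_hook(handlers, reg_pc, reg_lr):
    """Build a UC_HOOK_CODE callback that simulates imports by address.

    handlers: {import_address: fn(uc)}; each fn reads its arguments and
    writes the return value itself. reg_pc / reg_lr are Unicorn register ids.
    """
    def on_code(uc, address, size, user_data):
        handler = handlers.get(address)
        if handler is None:
            return                        # ordinary instruction, keep going
        handler(uc)
        uc.reg_write(reg_pc, uc.reg_read(reg_lr))   # return to the caller
    return on_code

# Wiring on ARM64 (not executed here; MALLOC_ADDR is a placeholder):
#   from unicorn import UC_HOOK_CODE
#   from unicorn.arm64_const import UC_ARM64_REG_X0, UC_ARM64_REG_PC, UC_ARM64_REG_LR
#   heap = BumpAllocator(0x20000000, 0x100000)   # arena must be mem_map'ed first
#   handlers = {MALLOC_ADDR: lambda uc: uc.reg_write(
#       UC_ARM64_REG_X0, heap.alloc(uc.reg_read(UC_ARM64_REG_X0)))}
#   uc.hook_add(UC_HOOK_CODE, make_import_hook(
#       handlers, UC_ARM64_REG_PC, UC_ARM64_REG_LR))
```

The import address itself must be mapped (any bytes will do), otherwise the fetch faults before the code hook fires.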

Callback Types to Use

Callback	Purpose
`UC_HOOK_CODE`	Intercept import calls by address; instruction-level trace (use sparingly, narrow range only)
`UC_HOOK_BLOCK`	Block-level trace (preferred over instruction trace)
`UC_HOOK_MEM_UNMAPPED`	Auto-map missing pages to recover from unmapped access errors
`UC_HOOK_MEM_READ | UC_HOOK_MEM_WRITE`	Trace memory access on targeted data ranges only
`UC_HOOK_INTR`	Intercept SVC/INT for syscall simulation (the x86-64 `syscall` instruction needs `UC_HOOK_INSN` with `UC_X86_INS_SYSCALL` instead)
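
Two of these callbacks, sketched with hypothetical function names: an unmapped-access handler that maps the missing page and resumes, and a block hook that counts entries rather than printing them. The callback signatures follow Unicorn's Python binding. The sketch assumes the faulting range does not overlap an existing mapping, and that `user_data` is a list supplied by the caller.

```python
from collections import Counter

PAGE = 0x1000


def on_unmapped(uc, access, address, size, value, user_data):
    """UC_HOOK_MEM_UNMAPPED callback: map the faulting page(s) and resume."""
    start = address & ~(PAGE - 1)
    end = (address + max(size, 1) + PAGE - 1) & ~(PAGE - 1)
    uc.mem_map(start, end - start)              # fresh pages read as zeros
    user_data.append((access, address, size))   # keep a record for diagnosis
    return True                                 # True asks Unicorn to retry


def make_block_counter():
    """UC_HOOK_BLOCK callback that counts block entries instead of printing."""
    hits = Counter()

    def on_block(uc, address, size, user_data):
        hits[address] += 1
    return on_block, hits

# Wiring (not executed here):
#   from unicorn import UC_HOOK_MEM_UNMAPPED, UC_HOOK_BLOCK
#   faults = []
#   uc.hook_add(UC_HOOK_MEM_UNMAPPED, on_unmapped, user_data=faults)
#   on_block, hits = make_block_counter()
#   uc.hook_add(UC_HOOK_BLOCK, on_block)
#   ... after emu_start: print(hits.most_common(10)), print(faults)
```

Auto-mapping keeps emulation alive but can hide real bugs such as a NULL pointer, so the `faults` log is worth reading after every run.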

Iterative Debugging Workflow

When emulation fails, follow this loop:

Run — start emulation, let it crash
Read callback output — which address faulted? What type (read/write/fetch)?
Diagnose:
- Unmapped memory fetch → missing code page, map it
- Unmapped memory read/write → missing data section or uninitialized pointer, map or hook
- Hitting an import stub → identify the function, add a simulation hook
- Infinite loop → add a code hook with execution counter, stop after threshold
Fix — add the hook / map the memory / adjust registers
Re-run — repeat until the target function completes
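
One turn of this loop might look like the sketch below. `make_loop_guard` and `run_once` are hypothetical names. The guard is the execution counter from the "Infinite loop" case, and `run_once` turns a crash or a tripped guard into a short status line to diagnose from. A real run raises `unicorn.UcError`; the sketch catches `Exception` so it stays independent of the import.

```python
def make_loop_guard(limit):
    """UC_HOOK_CODE callback that stops emulation after `limit` instructions."""
    state = {"count": 0, "last_pc": None, "tripped": False}

    def on_code(uc, address, size, user_data):
        state["count"] += 1
        state["last_pc"] = address
        if state["count"] >= limit:
            state["tripped"] = True
            uc.emu_stop()
    return on_code, state


def run_once(uc, start, until, state):
    """Run the target once and summarize how the run ended."""
    pc = lambda: "start" if state["last_pc"] is None else hex(state["last_pc"])
    try:
        uc.emu_start(start, until)
    except Exception as exc:              # unicorn.UcError in practice
        return f"crashed near {pc()}: {exc}"
    if state["tripped"]:
        return f"stopped after {state['count']} instructions at {pc()}"
    return "completed"

# Wiring (not executed here):
#   from unicorn import UC_HOOK_CODE
#   guard, state = make_loop_guard(1_000_000)
#   uc.hook_add(UC_HOOK_CODE, guard)
#   print(run_once(uc, func_start, func_end, state))   # placeholder addresses
```

Because a whole-range `UC_HOOK_CODE` hook slows emulation, it is reasonable to attach the guard only while hunting a suspected loop.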

Architecture Quick Reference

架构速查表

Arch	Unicorn Arch	Mode	SP	Return Addr	Args	Return	Syscall
ARM64	`UC_ARCH_ARM64`	`UC_MODE_LITTLE_ENDIAN`	SP	X30	X0-X7	X0	X8 + SVC #0
ARM32	`UC_ARCH_ARM`	`UC_MODE_THUMB` / `UC_MODE_ARM`	SP	LR	R0-R3	R0	R7 + SVC #0
x86-64	`UC_ARCH_X86`	`UC_MODE_64`	RSP	(stack)	RDI, RSI, RDX, RCX, R8, R9 (System V)	RAX	RAX + syscall
x86-32	`UC_ARCH_X86`	`UC_MODE_32`	ESP	(stack)	(stack)	EAX	EAX + int 0x80
MIPS32	`UC_ARCH_MIPS`	`UC_MODE_MIPS32 + UC_MODE_BIG_ENDIAN`	$sp	$ra	$a0-$a3	$v0	$v0 + syscall
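
The Syscall column can be applied with a dispatcher such as the one below, written for a 64-bit target. It is a sketch: `make_syscall_hook`, `sys_write`, and `ARM64_SYSCALLS` are hypothetical names, and the number and argument registers are passed in so the row for the chosen architecture decides them. The callback signature `(uc, intno, user_data)` is the one Unicorn uses for `UC_HOOK_INTR`.

```python
ENOSYS = 38   # Linux errno for "function not implemented"
MASK64 = 0xFFFFFFFFFFFFFFFF


def make_syscall_hook(table, nr_reg, arg_regs, ret_reg):
    """Build a UC_HOOK_INTR callback that dispatches on the syscall number.

    table: {syscall_number: fn(uc, *args) -> int}. Unknown numbers get
    -ENOSYS, as a Linux kernel returns for an unimplemented syscall.
    """
    seen = []                                   # numbers observed, for summaries

    def on_intr(uc, intno, user_data):
        nr = uc.reg_read(nr_reg)
        args = [uc.reg_read(r) for r in arg_regs]
        seen.append(nr)
        handler = table.get(nr)
        result = handler(uc, *args) if handler else -ENOSYS
        uc.reg_write(ret_reg, result & MASK64)  # 64-bit two's complement
    return on_intr, seen


def sys_write(uc, fd, buf, count, *_):
    """Simulated write(2): fetch the buffer from emulated memory, report success."""
    data = bytes(uc.mem_read(buf, count))
    print(f"write(fd={fd}): {data!r}")
    return count


# Linux AArch64 uses the generic syscall table, where write is number 64.
ARM64_SYSCALLS = {64: sys_write}

# Wiring on ARM64 (not executed here):
#   from unicorn import UC_HOOK_INTR
#   from unicorn.arm64_const import (UC_ARM64_REG_X0, UC_ARM64_REG_X1,
#                                    UC_ARM64_REG_X2, UC_ARM64_REG_X8)
#   hook, seen = make_syscall_hook(
#       ARM64_SYSCALLS, UC_ARM64_REG_X8,
#       [UC_ARM64_REG_X0, UC_ARM64_REG_X1, UC_ARM64_REG_X2], UC_ARM64_REG_X0)
#   uc.hook_add(UC_HOOK_INTR, hook)
```

For other rows, swap in that architecture's number register and syscall numbers. They differ per architecture, so a table built for ARM64 cannot be reused for x86-64.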
