binary-lifting

Binary Lifting Skill

This skill covers techniques and tools for lifting binary executables to LLVM IR, enabling advanced analysis, transformation, and recompilation of existing binaries.

Core Concepts

What is Binary Lifting?

Binary lifting is the process of translating low-level machine code (x86, ARM, etc.) into a higher-level intermediate representation (LLVM IR), enabling:
  • Static and dynamic analysis
  • Deobfuscation and vulnerability research
  • Code recompilation and optimization
  • Cross-architecture translation

Lifting Pipeline

Binary → Disassembly → IR Generation → Optimization → Analysis/Recompilation

Major Lifting Frameworks

Production-Grade Tools

  • RetDec (Avast): Full decompiler with C output, multi-architecture support
  • McSema (Trail of Bits): x86/x64 to LLVM IR, function recovery
  • revng: Based on QEMU, supports multiple architectures
  • reopt (Galois): Focus on correctness and formal methods

Research/Specialized Tools

  • Rellume: Fast x86-64 to LLVM lifting for JIT scenarios
  • fcd: Pattern-based decompiler with optimization passes
  • bin2llvm: QEMU-based binary to LLVM translator
  • llvm-mctoll: Microsoft's machine code to LLVM lifter

Language-Specific Lifters

  • llvm2c/IR->C: Convert LLVM IR back to C code
  • llvm2cranelift: LLVM IR to Cranelift IR
  • Leaven: LLVM IR to Go language
  • masxinlingvonta: JVM bytecode to LLVM IR

Implementation Techniques

Instruction Semantics Translation

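```cpp
#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// Flag helpers are assumed to be defined elsewhere in the lifter; each one
// emits IR that writes the new flag value into the lifted CPU state.
void updateCarryFlag(IRBuilder<> &builder, Value *op1, Value *op2, Value *result);
void updateOverflowFlag(IRBuilder<> &builder, Value *op1, Value *op2, Value *result);
void updateSignFlag(IRBuilder<> &builder, Value *result);
void updateZeroFlag(IRBuilder<> &builder, Value *result);

// Example: translating x86 ADD to LLVM IR
Value *translateADD(IRBuilder<> &builder, Value *op1, Value *op2) {
    Value *result = builder.CreateAdd(op1, op2, "add_result");

    // ADD also writes the status flags (CF, OF, SF, ZF, etc.)
    updateCarryFlag(builder, op1, op2, result);
    updateOverflowFlag(builder, op1, op2, result);
    updateSignFlag(builder, result);
    updateZeroFlag(builder, result);

    return result;
}
```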
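
The flag helpers are where most of the semantic detail of ADD lives. As a reference model for what they have to compute, here is a minimal Python sketch of a 64-bit ADD with CF, OF, SF and ZF. The function name `add64_with_flags` is illustrative and not taken from any lifter; AF and PF are left out for brevity.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def add64_with_flags(a: int, b: int):
    """Model a 64-bit x86 ADD: return (wrapped result, flags dict)."""
    a &= MASK64
    b &= MASK64
    full = a + b                 # unbounded Python integer sum
    result = full & MASK64       # what the register actually holds
    flags = {
        # CF: unsigned carry out of bit 63
        "CF": int(full > MASK64),
        # OF: both operands differ in sign from the result
        "OF": int(((a ^ result) & (b ^ result) & SIGN64) != 0),
        # SF: copy of the result's top bit
        "SF": int((result & SIGN64) != 0),
        # ZF: result is zero
        "ZF": int(result == 0),
    }
    return result, flags
```

For example, `add64_with_flags(MASK64, 1)` wraps to 0 with CF and ZF set, while `add64_with_flags(0x7FFFFFFFFFFFFFFF, 1)` sets OF and SF but not CF. A model like this can serve as an oracle when testing the IR that the C++ helpers emit.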

Control Flow Recovery

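  1. Linear Sweep: Simple, but decodes data embedded in code as instructions and can lose instruction alignment
  2. Recursive Descent: Follows control flow from known entry points; better coverage of real code, but misses code reached only through unresolved indirect branches
  3. Speculative Disassembly: Tentatively decodes candidate regions to recover code behind indirect jumps/calls
  4. Machine Learning: Uses learned models to identify function boundaries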
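
The difference between the first two strategies is easiest to see on a toy example. The sketch below uses a made-up four-opcode ISA (`nop`, `jmp`, `jz`, `ret`), not real x86, and places one data byte in the middle of the code. All names and encodings are assumptions chosen for illustration.

```python
# Toy ISA (hypothetical): opcode -> (mnemonic, size in bytes)
ISA = {0x01: ("nop", 1), 0x02: ("jmp", 2), 0x03: ("jz", 2), 0x04: ("ret", 1)}


def decode(code: bytes, addr: int):
    """Decode one instruction; return (mnemonic, size, target) or None."""
    if addr >= len(code) or code[addr] not in ISA:
        return None
    name, size = ISA[code[addr]]
    if addr + size > len(code):
        return None
    target = code[addr + 1] if size == 2 else None  # absolute one-byte target
    return name, size, target


def linear_sweep(code: bytes):
    """Decode byte after byte, skipping anything that fails to decode."""
    found, addr = [], 0
    while addr < len(code):
        insn = decode(code, addr)
        if insn is None:
            addr += 1
            continue
        found.append(addr)
        addr += insn[1]
    return found


def recursive_descent(code: bytes, entry: int = 0):
    """Decode only addresses reachable by following control flow."""
    seen, work = set(), [entry]
    while work:
        addr = work.pop()
        insn = None if addr in seen else decode(code, addr)
        if insn is None:
            continue
        seen.add(addr)
        name, size, target = insn
        if name in ("jmp", "jz"):
            work.append(target)          # branch target
        if name not in ("jmp", "ret"):
            work.append(addr + size)     # fall-through successor
    return sorted(seen)


# 0: nop | 1: jmp 4 | 3: data byte 0x02 | 4: jz 7 | 6: nop | 7: ret
PROGRAM = bytes([0x01, 0x02, 0x04, 0x02, 0x03, 0x07, 0x01, 0x04])
```

On `PROGRAM`, `linear_sweep` reports instructions at `[0, 1, 3, 6, 7]`: it decodes the data byte at offset 3 as a `jmp` and thereby swallows the real `jz` at offset 4. `recursive_descent` reports `[0, 1, 4, 6, 7]`, which matches the real code.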

Handling Indirect Control Flow

  • Value Set Analysis (VSA)
  • Symbolic execution for jump target resolution
  • Type recovery for virtual table reconstruction
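
As an illustration of how a value-set style analysis resolves an indirect jump, the sketch below models the index register as a strided interval and enumerates the slots of a jump table of 8-byte little-endian pointers. The class and function names, the flat `memory` buffer, and the `image_base` parameter are assumptions for this example; a real VSA tracks many more abstract locations.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedInterval:
    """The set {lo, lo + stride, ..., hi}; stride 0 means the single value lo."""
    stride: int
    lo: int
    hi: int

    def values(self):
        if self.stride == 0:
            return [self.lo]
        return list(range(self.lo, self.hi + 1, self.stride))

    def scale_add(self, factor: int, offset: int) -> "StridedInterval":
        """Abstract transfer function for `offset + factor * x` (factor > 0)."""
        return StridedInterval(self.stride * factor,
                               offset + self.lo * factor,
                               offset + self.hi * factor)


def resolve_jump_table(memory: bytes, image_base: int,
                       table_addr: int, index: StridedInterval):
    """Return possible targets of `jmp [table_addr + 8 * index]`."""
    slots = index.scale_add(8, table_addr)
    targets = []
    for slot in slots.values():
        # Read one 64-bit little-endian pointer from the mapped image
        (target,) = struct.unpack_from("<Q", memory, slot - image_base)
        targets.append(target)
    return targets
```

If a preceding bounds check proves the index lies in `[0, 3]`, passing `StridedInterval(1, 0, 3)` yields the four table entries, which become the successors of the indirect jump in the recovered CFG.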

Triton Integration

The Triton symbolic execution engine can be combined with lifting:

```python
from triton import TritonContext, ARCH, Instruction

ctx = TritonContext(ARCH.X86_64)

# Symbolically execute and extract AST
inst = Instruction(b"\x48\x01\xd8")  # add rax, rbx
ctx.processing(inst)

# Convert Triton AST to LLVM IR
# NOTE: triton_ast_to_llvm is a placeholder for a user-supplied converter,
# not a function exported by Triton.
ast = ctx.getRegisterAst(ctx.registers.rax)
llvm_ir = triton_ast_to_llvm(ast)
```

Deobfuscation via Lifting

Approach

  1. Lift obfuscated binary to LLVM IR
  2. Apply optimization passes to simplify
  3. Use custom passes for specific obfuscation patterns
  4. Re-emit cleaned code

Useful Optimization Passes

  • Dead Store Elimination (DSE)
  • Global Value Numbering (GVN)
  • Constant Propagation
  • Instruction Combining
  • Loop Simplification
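
To show why standard passes already undo a lot of obfuscation, here is a small sketch that runs constant propagation followed by dead-code elimination over a toy three-address IR. The tuple format `(dest, op, a, b)` and both function names are invented for this example; dead-code elimination stands in here for the more specific passes listed above, and real LLVM passes work on SSA values across basic blocks rather than on a flat list.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub,
       "xor": operator.xor, "and": operator.and_}


def constant_propagate(code):
    """Fold instructions whose operands are all known constants.

    Assumes straight-line code in which every dest is assigned once.
    Returns (remaining instructions, folded constants).
    """
    known, out = {}, []
    for dest, op, a, b in code:
        a = known.get(a, a)  # substitute already-folded values
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)
        else:
            out.append((dest, op, a, b))
    return out, known


def eliminate_dead(code, live_out):
    """Drop instructions whose results never reach a live-out value."""
    live, kept = set(live_out), []
    for dest, op, a, b in reversed(code):
        if dest in live:
            kept.append((dest, op, a, b))
            live.update(x for x in (a, b) if isinstance(x, str))
    return kept[::-1]


obfuscated = [
    ("t0", "xor", 0x5A, 0x5A),  # opaque way of writing 0
    ("t1", "add", "t0", 7),     # constant 7 in disguise
    ("t2", "sub", "x", 1),      # junk computation, never used
    ("t3", "add", "x", "t1"),   # the only real work: x + 7
]
folded, constants = constant_propagate(obfuscated)
cleaned = eliminate_dead(folded, live_out={"t3"})
```

After both passes, `cleaned` is `[("t3", "add", "x", 7)]`: the opaque constant has been folded and the junk instruction removed.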

VMP/VM Handler Recovery

  • Identify dispatcher patterns
  • Extract VM bytecode semantics
  • Convert handlers to native IR
  • Example: TicklingVMProtect for VMProtect analysis
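
The last two steps can be illustrated with a toy stack VM. The sketch below assumes the dispatcher and handler table have already been recovered, and translates VM bytecode into three-address tuples by simulating the VM stack with symbolic names. The opcode numbering and the function name are hypothetical and unrelated to VMProtect's actual handlers.

```python
# Hypothetical handler table recovered from the dispatcher: opcode -> semantics
HANDLERS = {0x01: "push", 0x02: "add", 0x03: "xor", 0x04: "arg", 0xFF: "ret"}


def lift_vm_bytecode(bytecode: bytes):
    """Translate stack-VM bytecode into (dest, op, a, b) tuples.

    Returns (ir, result) where result names the value left for `ret`.
    """
    stack, ir, pc, next_tmp = [], [], 0, 0
    while pc < len(bytecode):
        name = HANDLERS[bytecode[pc]]
        pc += 1
        if name == "push":            # push an immediate byte
            stack.append(bytecode[pc])
            pc += 1
        elif name == "arg":           # push a symbolic function argument
            stack.append(f"arg{bytecode[pc]}")
            pc += 1
        elif name == "ret":
            return ir, stack.pop()
        else:                         # binary operation on the two top slots
            b, a = stack.pop(), stack.pop()
            dest = f"t{next_tmp}"
            next_tmp += 1
            ir.append((dest, name, a, b))
            stack.append(dest)
    raise ValueError("bytecode ended without ret")


# arg 0; push 5; add; push 3; xor; ret
ir, result = lift_vm_bytecode(bytes([0x04, 0x00, 0x01, 0x05, 0x02,
                                     0x01, 0x03, 0x03, 0xFF]))
```

The stack traffic disappears in the output: `ir` holds `t0 = add arg0, 5` and `t1 = xor t0, 3`, and `result` is `"t1"`. In a real lifter the same idea is applied with LLVM values in place of the string names.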

Best Practices

  1. Architecture Support: Handle endianness, calling conventions, ABI differences
  2. Memory Modeling: Accurate memory layout for global/stack variables
  3. External Dependencies: Handle library calls and system calls
  4. Validation: Compare execution traces of original vs lifted code
  5. Incremental Lifting: Support partial program analysis
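
The validation step can be approximated with differential testing. The sketch below compares final outputs rather than full execution traces, and uses plain Python functions as stand-ins for "run the original binary" and "run the lifted code"; all three semantic functions are invented for this example.

```python
import random

MASK32 = 0xFFFFFFFF


def original_semantics(a: int, b: int) -> int:
    """Stand-in for the original code: add eax, ebx; xor eax, 0x55."""
    return ((a + b) & MASK32) ^ 0x55


def lifted_ok(a: int, b: int) -> int:
    """A correct lifting: the sum is truncated to 32 bits."""
    return ((a + b) ^ 0x55) & MASK32


def lifted_buggy(a: int, b: int) -> int:
    """A faulty lifting that forgets the 32-bit wrap-around."""
    return (a + b) ^ 0x55


def differential_test(reference, candidate, trials=1000, seed=0):
    """Return the first input pair on which the two disagree, else None."""
    rng = random.Random(seed)
    corner_cases = [(0, 0), (MASK32, 1), (MASK32, MASK32)]
    random_cases = [(rng.getrandbits(32), rng.getrandbits(32))
                    for _ in range(trials)]
    for a, b in corner_cases + random_cases:
        if reference(a, b) != candidate(a, b):
            return (a, b)
    return None
```

Here `differential_test(original_semantics, lifted_ok)` returns `None`, while the buggy variant is caught on the corner case `(MASK32, 1)`. Putting boundary values ahead of the random inputs tends to expose width and flag bugs quickly.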

Dynamic Binary Lifting

Runtime Translation

  • Instrew: Fast instrumentation through LLVM
  • QBDI: QuarkslaB Dynamic Binary Instrumentation
  • binopt: Runtime optimization of binary code

JIT Recompilation

Lift frequently executed code paths for runtime optimization:
  • Profile-guided lifting
  • Hot path detection
  • Speculative optimization
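
Hot path detection is often implemented with per-block execution counters. The following is a minimal sketch of that idea; the class name, the threshold value, and the notion that the caller lifts a block when `record` returns `True` are assumptions for illustration.

```python
from collections import Counter


class HotPathDetector:
    """Counts basic-block executions and flags blocks that become hot."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts = Counter()
        self.lifted = set()

    def record(self, block_addr: int) -> bool:
        """Count one execution; return True exactly once, when the block turns hot."""
        self.counts[block_addr] += 1
        if (block_addr not in self.lifted
                and self.counts[block_addr] >= self.threshold):
            self.lifted.add(block_addr)
            return True
        return False


detector = HotPathDetector(threshold=3)
trace = [0x10, 0x20, 0x10, 0x10, 0x20, 0x10]
to_lift = [addr for addr in trace if detector.record(addr)]
```

With this trace, only block `0x10` crosses the threshold, so `to_lift` is `[0x10]`. Blocks below the threshold keep running through the cheaper path (an interpreter or unoptimized translation), which keeps lifting cost proportional to how much the code is actually executed.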

Resources

For a complete list of lifting tools and research papers, refer to the LIFT section in the main README.md.

Getting Detailed Information

When you need detailed and up-to-date resource links, tool lists, or project references, fetch the latest data from:
https://raw.githubusercontent.com/gmh5225/awesome-llvm-security/refs/heads/main/README.md
This README contains comprehensive curated lists of:
  • Binary lifting frameworks and tools (LIFT section)
  • Related research papers and documentation
  • Implementation examples and tutorials
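
A minimal sketch of fetching the README and pulling out one section, using only the standard library. It assumes the LIFT section is introduced by a markdown heading whose text contains "LIFT"; the README's actual heading layout has not been checked here, and both function names are illustrative.

```python
import re
import urllib.request

README_URL = ("https://raw.githubusercontent.com/gmh5225/"
              "awesome-llvm-security/refs/heads/main/README.md")


def fetch_readme(url: str = README_URL, timeout: float = 10.0) -> str:
    """Download the README as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def extract_section(markdown: str, keyword: str) -> str:
    """Return the first section whose heading contains `keyword`.

    The section runs until the next heading of the same or a higher level.
    Returns an empty string if no heading matches.
    """
    section, level = [], None
    for line in markdown.splitlines():
        heading = re.match(r"^(#+)\s+(.*)", line)
        if level is None:
            if heading and keyword.lower() in heading.group(2).lower():
                level = len(heading.group(1))
                section.append(line)
        elif heading and len(heading.group(1)) <= level:
            break
        else:
            section.append(line)
    return "\n".join(section)
```

Typical use would be `extract_section(fetch_readme(), "LIFT")`. Keeping the download and the parsing in separate functions means the parsing can be exercised offline on a saved copy of the file.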