extract-elf
ELF Binary Data Extraction
This skill provides guidance for tasks involving extraction of data from ELF binary files, including reading headers, parsing segments, and converting binary content to structured output formats.
Approach Overview
ELF extraction tasks typically require:
- Parsing the ELF header to understand file structure
- Reading program headers to identify LOAD segments
- Extracting data from segments at correct virtual addresses
- Converting binary data to the required output format
Implementation Steps
Step 1: Validate ELF Header
Before processing, verify the file is a valid ELF binary:
- Check magic bytes at offset 0: 0x7F 'E' 'L' 'F' (hex: 7f 45 4c 46)
- Identify ELF class at offset 4 (1 = 32-bit, 2 = 64-bit)
- Identify endianness at offset 5 (1 = little-endian, 2 = big-endian)
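The three checks above can be sketched in Python as follows. This is a minimal illustration, not a complete validator; the function name `read_elf_ident` and the choice to return a `struct` byte-order prefix are assumptions made for this example.

```python
ELF_MAGIC = b"\x7fELF"  # 7f 45 4c 46

def read_elf_ident(data: bytes) -> tuple:
    """Return (bits, struct byte-order prefix) or raise ValueError.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    if len(data) < 6 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file: bad magic bytes")
    ei_class, ei_data = data[4], data[5]  # indexing bytes yields unsigned ints
    if ei_class not in (1, 2):
        raise ValueError(f"unknown ELF class {ei_class}")
    if ei_data not in (1, 2):
        raise ValueError(f"unknown data encoding {ei_data}")
    bits = 32 if ei_class == 1 else 64
    order = "<" if ei_data == 1 else ">"  # little-endian vs big-endian
    return bits, order
```

Returning the prefix directly means later `struct` calls can be written as `order + "I"` without re-reading byte 5.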
Step 2: Parse ELF Header Fields
Extract key header fields based on ELF class:
For 32-bit ELF:
- Program header offset: bytes 28-31
- Program header entry size: bytes 42-43
- Number of program headers: bytes 44-45
For 64-bit ELF:
- Program header offset: bytes 32-39
- Program header entry size: bytes 54-55
- Number of program headers: bytes 56-57
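As a rough sketch of how these offsets translate into code, the following reads the three fields with Python's `struct` module. The function name and the returned dictionary keys are assumptions for illustration; the byte offsets are the ones listed above.

```python
import struct

def read_program_header_table_info(data: bytes) -> dict:
    """Read program header offset, entry size, and count (hypothetical helper)."""
    is_64 = data[4] == 2
    order = "<" if data[5] == 1 else ">"
    if is_64:
        # 64-bit: offset is 8 bytes at 32; entry size and count at 54 and 56.
        (phoff,) = struct.unpack_from(order + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(order + "HH", data, 54)
    else:
        # 32-bit: offset is 4 bytes at 28; entry size and count at 42 and 44.
        (phoff,) = struct.unpack_from(order + "I", data, 28)
        phentsize, phnum = struct.unpack_from(order + "HH", data, 42)
    return {"phoff": phoff, "phentsize": phentsize, "phnum": phnum}
```

All format characters used here are unsigned, and the explicit `<` or `>` prefix disables native alignment padding.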
Step 3: Process Program Headers
Iterate through program headers and identify LOAD segments (type = 1):
- Extract virtual address (p_vaddr)
- Extract file offset (p_offset)
- Extract file size (p_filesz)
- Extract memory size (p_memsz)
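One detail worth showing in code is that the field order inside a program header differs between the two classes: in the 64-bit layout p_flags comes directly after p_type, while in the 32-bit layout it sits near the end. The sketch below assumes the table location values were already read from the ELF header; the function name and parameters are illustrative.

```python
import struct

PT_LOAD = 1

def iter_load_segments(data, phoff, phentsize, phnum, is_64, order="<"):
    """Yield the unsigned fields of every PT_LOAD program header (sketch)."""
    for index in range(phnum):
        base = phoff + index * phentsize
        if is_64:
            # Elf64_Phdr: p_type, p_flags (4 bytes each), then six 8-byte fields.
            (p_type, _flags, p_offset, p_vaddr, _paddr,
             p_filesz, p_memsz, _align) = struct.unpack_from(order + "IIQQQQQQ", data, base)
        else:
            # Elf32_Phdr: eight 4-byte fields, with p_flags second to last.
            (p_type, p_offset, p_vaddr, _paddr,
             p_filesz, p_memsz, _flags, _align) = struct.unpack_from(order + "8I", data, base)
        if p_type == PT_LOAD:
            yield {"p_vaddr": p_vaddr, "p_offset": p_offset,
                   "p_filesz": p_filesz, "p_memsz": p_memsz}
```

Stepping by `phentsize` rather than a hard-coded struct size keeps the loop correct even if the entry size in the header is larger than expected.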
Step 4: Extract Segment Data
For each LOAD segment:
- Read data from file at p_offset
- Map data to virtual addresses starting at p_vaddr
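- Handle alignment and padding as the task specifies; when p_memsz exceeds p_filesz, the extra bytes are not stored in the file and are zero-filled in memory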
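A minimal sketch of the mapping step, assuming the task wants one unsigned 32-bit word per address (the word size, the output shape, and the function name are all assumptions; the actual task may require something different):

```python
import struct

def segment_words(data, p_offset, p_vaddr, p_filesz, order="<"):
    """Map each virtual address to the unsigned 32-bit word stored there."""
    segment = data[p_offset:p_offset + p_filesz]
    whole = len(segment) - len(segment) % 4  # this sketch skips a trailing partial word
    return {
        p_vaddr + i: struct.unpack_from(order + "I", segment, i)[0]
        for i in range(0, whole, 4)
    }
```

The address of each word is p_vaddr plus its offset within the segment, not its offset within the file.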
Critical Data Type Considerations
Signed vs Unsigned Integers
Choosing the wrong signedness is the most common source of errors in binary extraction tasks.
When reading multi-byte integer values from binary data:
- Memory addresses are always unsigned
- Size fields are always unsigned
- Data values should typically be read as unsigned unless the task explicitly requires signed interpretation
Common API distinctions:
- Node.js Buffer: readUInt32LE vs readInt32LE
- Python struct: 'I' (unsigned) vs 'i' (signed)
- C/C++: uint32_t vs int32_t
Verification: If output contains negative numbers but the expected output shows only positive integers, the wrong signedness was used.
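To make the difference concrete, the same four bytes can be decoded both ways with Python's `struct` module. The byte sequence here is chosen for illustration because its high bit is set.

```python
import struct

raw = bytes.fromhex("f30f1efa")  # little-endian bytes with the high bit set

unsigned_value = struct.unpack("<I", raw)[0]  # 'I' = unsigned 32-bit
signed_value = struct.unpack("<i", raw)[0]    # 'i' = signed 32-bit

print(unsigned_value)  # 4196274163
print(signed_value)    # -98693133

# A value that was read as signed by mistake can be mapped back by masking to 32 bits.
recovered = signed_value & 0xFFFFFFFF
```

Masking is a repair for already-decoded values; reading with the unsigned format in the first place is the simpler fix.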
Endianness
Match the endianness specified in the ELF header:
- Little-endian (most common on x86/x64): Use LE variants
- Big-endian: Use BE variants
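In Python's `struct` module the equivalent of the LE and BE variants is the `<` or `>` format prefix. The sketch below derives the prefix from header byte 5 and decodes the same bytes both ways; `byte_order_prefix` is a hypothetical helper name.

```python
import struct

def byte_order_prefix(ei_data: int) -> str:
    """Translate ELF header byte 5 into a struct format prefix."""
    if ei_data == 1:
        return "<"  # little-endian
    if ei_data == 2:
        return ">"  # big-endian
    raise ValueError(f"unknown data encoding {ei_data}")

raw = bytes.fromhex("41424344")
little = struct.unpack(byte_order_prefix(1) + "I", raw)[0]  # 0x44434241
big = struct.unpack(byte_order_prefix(2) + "I", raw)[0]     # 0x41424344

print(little)  # 1145258561
print(big)     # 1094861636
```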
Integer Sizes
ELF fields vary by class:
- 32-bit ELF: addresses and offsets are 4 bytes
- 64-bit ELF: addresses and offsets are 8 bytes
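With Python's `struct`, these widths correspond to the 'I' (4-byte) and 'Q' (8-byte) format characters. A small sketch, with `ADDRESS_FORMAT` and `address_size` as illustrative names:

```python
import struct

# Unsigned format character for addresses and offsets, keyed by ELF class width.
ADDRESS_FORMAT = {32: "I", 64: "Q"}

def address_size(bits: int) -> int:
    """Size in bytes of one address or offset field for the given class."""
    # An explicit "<" or ">" prefix selects standard sizes, independent of the host.
    return struct.calcsize("<" + ADDRESS_FORMAT[bits])

print(address_size(32))  # 4
print(address_size(64))  # 8
```

Without a byte-order prefix, `struct` uses native sizes and alignment, so a format such as 'L' can be 8 bytes on some platforms; always including the prefix avoids that.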
Verification Strategies
Before Declaring Success
- Validate output format: Ensure JSON is well-formed and that keys and values have the correct types
- Check address ranges: Verify addresses fall within expected segment boundaries
- Sample value verification: Manually compute expected values for a few addresses using hex inspection tools
Manual Verification Commands
Use these tools to verify extracted values:
```bash
# View ELF header information
readelf -h <binary>

# View program headers (segments)
readelf -l <binary>

# Dump section contents in hex
objdump -s <binary>

# View raw hex bytes at specific offset
xxd -s <offset> -l <length> <binary>

# Calculate expected value from hex bytes (little-endian example)
# For bytes: 41 42 43 44 -> value = 0x44434241 = 1145258561
```
Value Sanity Checks
- If the example output shows only positive integers, verify output contains no negative values
- Compare a few computed values against manual hex calculation
- Verify address coverage matches expected segment ranges
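These checks are easy to automate. The sketch below assumes the extracted data is a dictionary of address to integer value and that segments are given as (p_vaddr, p_filesz) pairs; both shapes and the function name are assumptions for illustration.

```python
def check_extracted(values: dict, segments: list) -> list:
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    for address, value in values.items():
        if value < 0:
            problems.append(f"negative value {value} at {address:#x}: signed read?")
        in_segment = any(start <= address < start + size for start, size in segments)
        if not in_segment:
            problems.append(f"address {address:#x} is outside every LOAD segment")
    return problems
```

An empty result only rules out these two classes of error; comparing a few values against a manual hex calculation is still needed.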
Common Pitfalls
- Using signed integer reads for unsigned data: Results in negative numbers for values with the high bit set (e.g., -98693133 instead of 4196274163)
- Incorrect endianness handling: Produces completely wrong values; verify against ELF header byte 5
- Off-by-one errors in segment boundaries: Carefully track whether sizes are inclusive/exclusive
- Assuming 4-byte alignment: Check if segment sizes are multiples of the read size; handle partial reads at boundaries
- Mixing 32-bit and 64-bit field sizes: Always check ELF class and use appropriate field sizes
- Overconfidence without verification: Never assume "values are read directly from binary, so they should match"; always verify sample values manually
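The partial-read pitfall is the least obvious of these, so here is a small sketch of one way to handle a segment whose size is not a multiple of the word size. Whether the trailing bytes should be dropped or zero-padded depends on the task; the `pad_tail` switch and the function name are assumptions made for this example.

```python
def iter_words(segment: bytes, word_size: int = 4,
               byteorder: str = "little", pad_tail: bool = False):
    """Yield (offset, unsigned value) pairs, handling a trailing partial word."""
    full = len(segment) - len(segment) % word_size
    for offset in range(0, full, word_size):
        # int.from_bytes is unsigned by default.
        yield offset, int.from_bytes(segment[offset:offset + word_size], byteorder)
    tail = segment[full:]
    if tail and pad_tail:
        # Treat the missing bytes as zeros, as if the data continued with padding.
        padded = tail + bytes(word_size - len(tail))
        yield full, int.from_bytes(padded, byteorder)
```

For a 5-byte segment, the default yields one word and ignores the last byte, while `pad_tail=True` yields a second word built from that byte plus three zero bytes.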
Output Format Considerations
When producing structured output (e.g., JSON):
- Use string keys for addresses if they need to be JSON object keys (JSON requires string keys)
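- Ensure integer values stay within the JavaScript safe integer range (at most 2^53 - 1) if the JSON will be read by JavaScript; JSON itself sets no limit, but larger integers lose precision when parsed as JavaScript numbers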
- Consider whether addresses should be decimal or hexadecimal strings based on task requirements
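A minimal sketch of these points using Python's `json` module. The function names and the `hex_keys` switch are illustrative; which key format is correct depends on the task's expected output.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer JavaScript represents exactly

def to_json(values: dict, hex_keys: bool = False) -> str:
    """Serialize address -> value with explicit string keys, sorted by address."""
    keyed = {(hex(addr) if hex_keys else str(addr)): value
             for addr, value in sorted(values.items())}
    return json.dumps(keyed)

def unsafe_addresses(values: dict) -> list:
    """Addresses whose value would lose precision in a JavaScript consumer."""
    return [addr for addr, value in values.items() if value > MAX_SAFE_INTEGER]

print(to_json({4096: 1145258561}))                 # {"4096": 1145258561}
print(to_json({4096: 1145258561}, hex_keys=True))  # {"0x1000": 1145258561}
```

Building the string keys explicitly, rather than relying on `json.dumps` to convert integer keys, makes the decimal or hexadecimal choice visible in the code.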