extract-elf

ELF Binary Data Extraction

This skill provides guidance for tasks involving extraction of data from ELF binary files, including reading headers, parsing segments, and converting binary content to structured output formats.

Approach Overview

ELF extraction tasks typically require:

Parsing the ELF header to understand file structure
Reading program headers to identify LOAD segments
Extracting data from segments at correct virtual addresses
Converting binary data to the required output format

Implementation Steps

Step 1: Validate ELF Header

Before processing, verify the file is a valid ELF binary:

Check magic bytes at offset 0: `0x7F 'E' 'L' 'F'` (hex: `7f 45 4c 46`)
Identify ELF class (32-bit vs 64-bit) at offset 4
Identify endianness at offset 5 (1 = little-endian, 2 = big-endian)
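
The checks above could be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper name `read_elf_ident` and its return shape are assumptions made for this example, and the class values (1 = 32-bit, 2 = 64-bit) come from the ELF specification rather than from the text above.

```python
ELF_MAGIC = b"\x7fELF"

def read_elf_ident(data: bytes) -> tuple:
    """Return (bits, byte_order), where byte_order is '<' or '>' for struct.

    Hypothetical helper for illustration; raises ValueError on invalid input.
    """
    if len(data) < 6 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file: bad magic bytes")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2):
        raise ValueError(f"unknown ELF class: {ei_class}")
    if ei_data not in (1, 2):
        raise ValueError(f"unknown data encoding: {ei_data}")
    bits = 32 if ei_class == 1 else 64          # offset 4: 1 = 32-bit, 2 = 64-bit
    byte_order = "<" if ei_data == 1 else ">"   # offset 5: 1 = little, 2 = big
    return bits, byte_order
```

Returning the byte order as a `struct` prefix keeps later header parsing independent of the host machine's native byte order.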

Step 2: Parse ELF Header Fields

Extract key header fields based on ELF class:

For 32-bit ELF:

Program header offset: bytes 28-31
Program header entry size: bytes 42-43
Number of program headers: bytes 44-45

For 64-bit ELF:

Program header offset: bytes 32-39
Program header entry size: bytes 54-55
Number of program headers: bytes 56-57
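
As a rough sketch, the byte ranges listed above can be read with Python's `struct` module. The function name `parse_program_header_table_info` and the returned dictionary keys are assumptions chosen for this example; `bits` and `byte_order` are expected to come from the Step 1 checks.

```python
import struct

def parse_program_header_table_info(data: bytes, bits: int, byte_order: str) -> dict:
    """Read the program header offset, entry size, and count from the ELF header.

    bits is 32 or 64; byte_order is '<' (little-endian) or '>' (big-endian).
    """
    if bits == 32:
        (phoff,) = struct.unpack_from(byte_order + "I", data, 28)             # bytes 28-31
        phentsize, phnum = struct.unpack_from(byte_order + "HH", data, 42)    # bytes 42-45
    else:
        (phoff,) = struct.unpack_from(byte_order + "Q", data, 32)             # bytes 32-39
        phentsize, phnum = struct.unpack_from(byte_order + "HH", data, 54)    # bytes 54-57
    return {"phoff": phoff, "phentsize": phentsize, "phnum": phnum}
```

All format codes used here (`I`, `Q`, `H`) are unsigned, and the explicit `<` or `>` prefix disables native alignment so the offsets match the file layout exactly.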

Step 3: Process Program Headers

Iterate through program headers and identify LOAD segments (type = 1):

Extract virtual address (p_vaddr)
Extract file offset (p_offset)
Extract file size (p_filesz)
Extract memory size (p_memsz)
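
One way to sketch this iteration in Python is shown below. The field order inside each program header differs between 32-bit and 64-bit ELF (per the ELF specification, `p_flags` moves forward in the 64-bit layout), so the two classes need separate format strings. The generator name `iter_load_segments` and its dictionary keys are assumptions for this example.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments

def iter_load_segments(data: bytes, bits: int, byte_order: str,
                       phoff: int, phentsize: int, phnum: int):
    """Yield vaddr, offset, filesz, and memsz for each LOAD segment."""
    for i in range(phnum):
        base = phoff + i * phentsize
        if bits == 32:
            # 32-bit order: p_type, p_offset, p_vaddr, p_paddr,
            #               p_filesz, p_memsz, p_flags, p_align
            (p_type, p_offset, p_vaddr, _paddr,
             p_filesz, p_memsz, _flags, _align) = struct.unpack_from(
                byte_order + "8I", data, base)
        else:
            # 64-bit order: p_type, p_flags, p_offset, p_vaddr,
            #               p_paddr, p_filesz, p_memsz, p_align
            (p_type, _flags, p_offset, p_vaddr,
             _paddr, p_filesz, p_memsz, _align) = struct.unpack_from(
                byte_order + "II6Q", data, base)
        if p_type == PT_LOAD:
            yield {"vaddr": p_vaddr, "offset": p_offset,
                   "filesz": p_filesz, "memsz": p_memsz}
```

Stepping by `phentsize` rather than a hard-coded 32 or 56 follows the entry size the header itself declares.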

Step 4: Extract Segment Data

For each LOAD segment:

Read data from file at p_offset
Map data to virtual addresses starting at p_vaddr
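Handle alignment and padding as the task specifies; when p_memsz exceeds p_filesz, the extra bytes have no file data and are zero-filled in memory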
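
The following sketch shows one possible shape for this step, assuming the task wants each 4-byte word keyed by its virtual address. The word size, the output structure, and the function name `extract_words` are all assumptions; a real task may call for bytes, 8-byte words, or a different treatment of trailing bytes.

```python
def extract_words(data: bytes, vaddr: int, offset: int, filesz: int,
                  byte_order: str = "little", word_size: int = 4) -> dict:
    """Map virtual address -> unsigned integer for each full word in a segment.

    byte_order is 'little' or 'big', as accepted by int.from_bytes.
    Trailing bytes that do not fill a whole word are skipped here; whether
    to skip or zero-pad them is a task-specific choice.
    """
    segment = data[offset:offset + filesz]
    words = {}
    for i in range(0, len(segment) - word_size + 1, word_size):
        chunk = segment[i:i + word_size]
        # The address of a word is the segment base plus its byte offset.
        words[vaddr + i] = int.from_bytes(chunk, byte_order, signed=False)
    return words
```

Reading with `signed=False` keeps every value non-negative, which matters for words whose high bit is set.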

Critical Data Type Considerations

Signed vs Unsigned Integers

This is the most common source of errors in binary extraction tasks.

When reading multi-byte integer values from binary data:

Memory addresses are always unsigned
Size fields are always unsigned
Data values should typically be read as unsigned unless the task explicitly requires signed interpretation

Common API distinctions:

Node.js Buffer: `readUInt32LE` vs `readInt32LE`
Python struct: `'I'` (unsigned) vs `'i'` (signed)
C/C++: `uint32_t` vs `int32_t`

Verification: If output contains negative numbers but the expected output shows only positive integers, the wrong signedness was used.
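
To make the difference concrete, here is a small Python demonstration using `struct`. The four sample bytes are chosen for illustration so that the high bit is set; they are not taken from any particular binary.

```python
import struct

raw = bytes.fromhex("f30f1efa")  # four bytes, little-endian, high bit set

(unsigned_value,) = struct.unpack("<I", raw)  # 'I' = unsigned 32-bit
(signed_value,) = struct.unpack("<i", raw)    # 'i' = signed 32-bit

print(unsigned_value)  # 4196274163
print(signed_value)    # -98693133

# A value read as signed by mistake can be recovered by masking to 32 bits.
assert signed_value & 0xFFFFFFFF == unsigned_value
```

The same bytes yield two different numbers, so a negative value in the output is usually a sign that the signed variant was used.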

Endianness

Match the endianness specified in the ELF header:

Little-endian (the most common case, used by x86/x64): Use `LE` variants
Big-endian: Use `BE` variants
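
As a quick illustration of how much the byte order changes a result, the snippet below decodes the same four sample bytes both ways with Python's built-in `int.from_bytes`.

```python
raw = bytes.fromhex("41424344")

little = int.from_bytes(raw, "little")  # least significant byte first: 0x44434241
big = int.from_bytes(raw, "big")        # most significant byte first: 0x41424344

print(little)  # 1145258561
print(big)     # 1094861636
```

Neither result looks obviously wrong on its own, which is why the choice should be driven by the ELF header byte rather than by inspecting the output.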

Integer Sizes

ELF fields vary by class:

32-bit ELF: addresses and offsets are 4 bytes
64-bit ELF: addresses and offsets are 8 bytes
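
A small sketch of selecting the field width from the ELF class might look like this. The helper name `read_address` and the `ADDR_FORMAT` table are hypothetical names introduced for this example.

```python
import struct

# Standard-size struct codes (with a '<' or '>' prefix): I = 4 bytes, Q = 8 bytes.
ADDR_FORMAT = {32: "I", 64: "Q"}

def read_address(data: bytes, pos: int, bits: int, byte_order: str = "<") -> int:
    """Read one unsigned address or offset field of the width the ELF class implies."""
    (value,) = struct.unpack_from(byte_order + ADDR_FORMAT[bits], data, pos)
    return value
```

Reading an 8-byte field with a 4-byte format silently drops the upper half, so the width should always follow the class byte.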

Verification Strategies

验证策略

Before Declaring Success

Validate output format: Ensure JSON is well-formed and that keys and values have the correct types
Check address ranges: Verify addresses fall within expected segment boundaries
Sample value verification: Manually compute expected values for a few addresses using hex inspection tools

Manual Verification Commands

Use these tools to verify extracted values:

```bash
# View ELF header information
readelf -h <binary>

# View program headers (segments)
readelf -l <binary>

# Dump section contents in hex
objdump -s <binary>

# View raw hex bytes at specific offset
xxd -s <offset> -l <length> <binary>

# Calculate expected value from hex bytes (little-endian example)
# For bytes: 41 42 43 44 -> value = 0x44434241 = 1145258561
```

Value Sanity Checks

If the example output shows only positive integers, verify output contains no negative values
Compare a few computed values against manual hex calculation
Verify address coverage matches expected segment ranges
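
These checks can be automated with a short helper such as the sketch below. The input shapes (a dictionary of address to value, and a list of `(vaddr, filesz)` pairs) and the name `check_extraction` are assumptions for this example, not a required format.

```python
def check_extraction(words: dict, segments: list) -> list:
    """Return a list of problems found; an empty list means the checks passed.

    words maps address (int) -> value (int); segments is a list of
    (vaddr, filesz) pairs. Both shapes are assumptions for this sketch.
    """
    problems = []
    for addr, value in words.items():
        if value < 0:
            problems.append(f"negative value {value} at {addr:#x} (signed read?)")
        # A segment covers vaddr up to, but not including, vaddr + filesz.
        if not any(start <= addr < start + size for start, size in segments):
            problems.append(f"address {addr:#x} lies outside every LOAD segment")
    return problems
```

An empty result does not prove the values are right; it only rules out the two failure modes listed above, so manual spot checks against hex output are still worthwhile.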

Common Pitfalls

Using signed integer reads for unsigned data - Results in negative numbers for values with high bit set (e.g., -98693133 instead of 4196274163)
Incorrect endianness handling - Produces completely wrong values; verify against ELF header byte 5
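Off-by-one errors in segment boundaries - A segment's file data covers addresses from p_vaddr up to but not including p_vaddr + p_filesz; carefully track whether each size or end address is inclusive or exclusive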
Assuming 4-byte alignment - Check if segment sizes are multiples of the read size; handle partial reads at boundaries
Mixing 32-bit and 64-bit field sizes - Always check ELF class and use appropriate field sizes
Overconfidence without verification - Never assume "values are read directly from binary, so they should match" - always verify sample values manually

Output Format Considerations

When producing structured output (e.g., JSON):

Use string keys for addresses if they need to be JSON object keys (JSON requires string keys)
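Ensure integer values stay within JavaScript's safe integer range (at most 2^53 - 1) when the JSON will be parsed by JavaScript; larger values, such as some 64-bit addresses, lose precision unless emitted as strings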
Consider whether addresses should be decimal or hexadecimal strings based on task requirements
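
A minimal sketch of these points using Python's `json` module is shown below. The function name `to_json`, the `hex_keys` switch, and the choice to emit oversized values as strings are assumptions for illustration; the task's expected output format should take precedence.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer JavaScript represents exactly

def to_json(words: dict, hex_keys: bool = False) -> str:
    """Serialize address -> value with string keys, since JSON object keys are strings."""
    out = {}
    for addr, value in sorted(words.items()):
        key = hex(addr) if hex_keys else str(addr)
        # Assumption: values beyond the safe range are written as strings
        # so that a JavaScript consumer does not lose precision.
        out[key] = value if value <= MAX_SAFE_INTEGER else str(value)
    return json.dumps(out)
```

Converting keys explicitly, rather than relying on the serializer's implicit conversion, makes the decimal-versus-hexadecimal decision visible in the code.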
