yara-rule-authoring

YARA-X Rule Authoring


Write detection rules that catch malware without drowning in false positives.

This skill targets YARA-X, the Rust-based successor to legacy YARA. YARA-X powers VirusTotal's production systems and is the recommended implementation. See Migrating from Legacy YARA if you have existing rules.


Core Principles


Strings must generate good atoms: YARA extracts 4-byte subsequences (atoms) for fast matching. Strings with repeated bytes, common sequences, or fewer than 4 bytes force slow bytecode verification on too many files.
Target specific families, not categories: "Detects ransomware" catches everything and nothing. "Detects LockBit 3.0 configuration extraction routine" catches what you want.
Test against goodware before deployment: A rule that fires on Windows system files is useless. Validate against VirusTotal's goodware corpus or your own clean file set.
Short-circuit with cheap checks first: Put `filesize < 10MB and uint16(0) == 0x5A4D` before expensive string searches or module calls.
Metadata is documentation: Future you (and your team) need to know what this catches, why, and where the sample came from.
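
As a minimal sketch of how these principles combine, the rule below places the size and magic-byte checks ahead of string matching and carries full metadata. The family name, mutex, and PDB path are hypothetical placeholders, not indicators from a real sample.

```yara
rule MAL_Win_ExampleFamily_Loader_Jan25
{
    meta:
        description = "Detects hypothetical ExampleFamily loader via mutex name and PDB path"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // Hypothetical family-specific strings; replace with values taken from real samples
        $mutex = "Global\\ExampleFamily_7f3a9c" ascii
        $pdb = "\\ExampleFamily\\loader\\Release\\loader.pdb" ascii

    condition:
        // Cheap checks first, so most files are rejected before strings are considered
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        all of them
}
```

Both strings are long and free of repeated bytes, which should give the scanner usable atoms.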


When to Use


Writing new YARA-X rules for malware detection
Reviewing existing rules for quality or performance issues
Optimizing slow-running rulesets
Converting IOCs or threat intel into detection signatures
Debugging false positive issues
Preparing rules for production deployment
Migrating legacy YARA rules to YARA-X
Analyzing Chrome extensions (crx module)
Analyzing Android apps (dex module)


When NOT to Use


Static analysis requiring disassembly → use Ghidra/IDA skills
Dynamic malware analysis → use sandbox analysis skills
Network-based detection → use Suricata/Snort skills
Memory forensics with Volatility → use memory forensics skills
Simple hash-based detection → just use hash lists


YARA-X Overview


YARA-X is the Rust-based successor to legacy YARA: 5-10x faster regex, better errors, built-in formatter, stricter validation, new modules (crx, dex), 99% rule compatibility.

Install: `brew install yara-x` (macOS) or `cargo install yara-x-cli` (the `yr` binary is published in the CLI crate)

Essential commands: `yr scan`, `yr check`, `yr fmt`, `yr dump`


Platform Considerations


YARA works on any file type. Adapt patterns to your target:

Platform	Magic Bytes	Bad Strings	Good Strings
Windows PE	`uint16(0) == 0x5A4D`	API names, Windows paths	Mutex names, PDB paths
macOS Mach-O	`uint32(0) == 0xFEEDFACE` (32-bit), `0xFEEDFACF` (64-bit), `0xBEBAFECA` or `0xCAFEBABE` (universal)	Common Obj-C methods	Keylogger strings, persistence paths
JavaScript/Node	(none needed)	`require` , `fetch` , `axios`	Obfuscator signatures, eval+decode chains
npm/pip packages	(none needed)	`postinstall` , `dependencies`	Suspicious package names, exfil URLs
Office docs	`uint32be(0) == 0x504B0304` (ZIP header; `uint32` reads little-endian)	VBA keywords	Macro auto-exec, encoded payloads
VS Code extensions	(none needed)	`vscode.workspace`	Uncommon activationEvents, hidden file access
Chrome extensions	Use `crx` module	Common Chrome APIs	Permission abuse, manifest anomalies
Android apps	Use `dex` module	Standard DEX structure	Obfuscated classes, suspicious permissions


macOS Malware Detection


YARA-X ships a `macho` module, but magic byte checks plus string patterns cover most cases without module overhead:

Magic bytes:

```yara
// Mach-O 32-bit
uint32(0) == 0xFEEDFACE
// Mach-O 64-bit
uint32(0) == 0xFEEDFACF
// Universal binary (fat binary)
uint32(0) == 0xCAFEBABE or uint32(0) == 0xBEBAFECA
```

Good indicators for macOS malware:

Keylogger artifacts: `CGEventTapCreate`, `kCGEventKeyDown`
SSH tunnel strings: `ssh -D`, `tunnel`, `socks`
Persistence paths: `~/Library/LaunchAgents`, `/Library/LaunchDaemons`
Credential theft: `security find-generic-password`, `keychain`

Example pattern from Airbnb BinaryAlert:

```yara
rule SUSP_Mac_ProtonRAT
{
    strings:
        // Library indicators
        $lib1 = "SRWebSocket" ascii
        $lib2 = "SocketRocket" ascii

        // Behavioral indicators
        $behav1 = "SSH tunnel not launched" ascii
        $behav2 = "Keylogger" ascii

    condition:
        // uint32() reads little-endian, so a fat header stored as CA FE BA BE reads as 0xBEBAFECA
        (uint32(0) == 0xFEEDFACF or uint32(0) == 0xBEBAFECA or uint32(0) == 0xCAFEBABE) and
        any of ($lib*) and any of ($behav*)
}
```


JavaScript Detection Decision Tree


Writing a JavaScript rule?
├─ npm package?
│  ├─ Check package.json patterns
│  ├─ Look for postinstall/preinstall hooks
│  └─ Target exfil patterns: fetch + env access + credential paths
├─ Browser extension?
│  ├─ Chrome: Use crx module
│  └─ Others: Target manifest patterns, background script behaviors
├─ Standalone JS file?
│  ├─ Look for obfuscation markers: eval+atob, fromCharCode chains
│  ├─ Target unique function/variable names (often survive minification)
│  └─ Check for packed/encoded payloads
└─ Minified/webpack bundle?
   ├─ Target unique strings that survive bundling (URLs, magic values)
   └─ Avoid function names (will be mangled)

JavaScript-specific good strings:

Ethereum function selectors: `{ 70 a0 82 31 }` (`balanceOf(address)`), `{ a9 05 9c bb }` (`transfer(address,uint256)`)
Zero-width characters (steganography): `{ E2 80 8B E2 80 8C }`
Obfuscator signatures: `_0x`, `var _0x` (prefer the longer form; `_0x` alone is under 4 bytes)
Specific C2 patterns: domain names, webhook URLs

JavaScript-specific bad strings:

`require`, `fetch`, `axios`: too common
`Buffer`, `crypto`: legitimate uses everywhere
`process.env` alone: need specific env var names
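
The good and bad string guidance above can be sketched as a single rule. This is an illustrative example, not a production signature: the rule name, the size limit, and the choice of `NPM_TOKEN` as the specific environment variable are assumptions.

```yara
rule SUSP_JS_ObfuscatedDecodeChain_Sketch
{
    meta:
        description = "Detects obfuscated JavaScript that decodes and evaluates a payload or reads a specific token (illustrative)"

    strings:
        // Obfuscator signature, long enough to yield a 4-byte atom
        $obf = "var _0x" ascii
        // eval+decode chain with a bounded base64 body
        $dec = /eval\(atob\(['"][A-Za-z0-9+\/=]{40,200}['"]\)\)/
        // A specific environment variable rather than bare process.env
        $env = "process.env.NPM_TOKEN" ascii

    condition:
        filesize < 2MB and
        $obf and any of ($dec, $env)
}
```

The regex keeps the literal prefix `eval(atob(` so it has an anchor, and the quantifier is bounded rather than `.*`.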


Essential Toolkit


Tool	Purpose
yarGen	Extract candidate strings: `yarGen.py -m samples/ --excludegood` → validate with `yr check`
FLOSS	Extract obfuscated/stack strings: `floss sample.exe` (when yarGen fails)
yr CLI	Validate: `yr check` , scan: `yr scan -s` , inspect: `yr dump -m pe`
signature-base	Study quality examples
YARA-CI	Goodware corpus testing before deployment

Master these five. Don't get distracted by tool catalogs.


Rationalizations to Reject


When you catch yourself thinking these, stop and reconsider.

Rationalization	Expert Response
"This generic string is unique enough"	Test against goodware first. Your intuition is wrong.
"yarGen gave me these strings"	yarGen suggests, you validate. Check each one manually.
"It works on my 10 samples"	10 samples ≠ production. Use VirusTotal goodware corpus.
"One rule to catch all variants"	Causes FP floods. Target specific families.
"I'll make it more specific if we get FPs"	Write tight rules upfront. FPs burn trust.
"This hex pattern is unique"	Unique in one sample ≠ unique across malware ecosystem.
"Performance doesn't matter"	One slow rule slows entire ruleset. Optimize atoms.
"PEiD rules still work"	Obsolete. 32-bit packers aren't relevant.
"I'll add more conditions later"	Weak rules deployed = damage done.
"This is just for hunting"	Hunting rules become detection rules. Same quality bar.
"The API name makes it malicious"	Legitimate software uses same APIs. Need behavioral context.
"any of them is fine for these common strings"	Common strings + any = FP flood. Use `any of` only for individually unique strings.
"This regex is specific enough"	`/fetch.*token/` matches all auth code. Add exfil destination requirement.
"The JavaScript looks clean"	Attackers poison legitimate code with injects. Check for eval+decode chains.
"I'll use .* for flexibility"	Unbounded regex = performance disaster + memory explosion. Use `.{0,30}` .
"I'll use --relaxed-re-syntax everywhere"	Masks real bugs. Fix the regex instead of hiding problems.


Decision Trees


Is This String Good Enough?


Is this string good enough?
├─ Less than 4 bytes?
│  └─ NO — find longer string
├─ Contains repeated bytes (0000, 9090)?
│  └─ NO — add surrounding context
├─ Is an API name (VirtualAlloc, CreateRemoteThread)?
│  └─ NO — use hex pattern of call site instead
├─ Appears in Windows system files?
│  └─ NO — too generic, find something unique
├─ Is it a common path (C:\Windows\, cmd.exe)?
│  └─ NO — find malware-specific paths
├─ Unique to this malware family?
│  └─ YES — use it
└─ Appears in other malware too?
   └─ MAYBE — combine with family-specific marker


When to Use "all of" vs "any of"


Should I require all strings or allow any?
├─ Strings are individually unique to malware?
│  └─ any of them (each alone is suspicious)
├─ Strings are common but combination is suspicious?
│  └─ all of them (require the full pattern)
├─ Strings have different confidence levels?
│  └─ Group: all of ($core_*) and any of ($variant_*)
└─ Seeing many false positives?
   └─ Tighten: switch any → all, add more required strings

Lesson from production: Rules using `any of ($network_*)` where the strings included "fetch", "axios", and "http" matched virtually all web applications. Switching to require a credential path AND a network call AND an exfil destination eliminated the FPs.
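
A hedged sketch of that tightened shape follows. The three string groups and their contents are illustrative assumptions chosen to show the grouping; the production rule's actual strings are not given in this document.

```yara
rule SUSP_NPM_CredentialExfil_Sketch
{
    meta:
        description = "Detects scripts that pair credential paths with a network call and a known exfil endpoint (illustrative)"

    strings:
        // Group 1: credential paths
        $cred_npmrc = ".npmrc" ascii
        $cred_aws = ".aws/credentials" ascii

        // Group 2: network calls (too common on their own)
        $net_fetch = "fetch(" ascii
        $net_axios = "axios.post(" ascii

        // Group 3: exfil destinations
        $exfil_discord = "discord.com/api/webhooks/" ascii
        $exfil_telegram = "api.telegram.org/bot" ascii

    condition:
        filesize < 5MB and
        any of ($cred_*) and
        any of ($net_*) and
        any of ($exfil_*)
}
```

Each group alone would flood with false positives; requiring one hit from every group is what carries the signal.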


When to Abandon a Rule Approach


Stop and pivot when:

yarGen returns only API names and paths → See When Strings Fail, Pivot to Structure
Can't find 3 unique strings → Probably packed. Target the unpacked version or detect the packer.
Rule matches goodware files → Strings aren't unique enough. 1-2 matches = investigate and tighten; 3-5 matches = find different indicators; 6+ matches = start over.
Performance is terrible even after optimization → Architecture problem. Split into multiple focused rules or add strict pre-filters.
Description is hard to write → The rule is too vague. If you can't explain what it catches, it catches too much.


Debugging False Positives


FP Investigation Flow:
│
├─ 1. Which string matched?
│     Run: yr scan -s rule.yar false_positive.exe
│
├─ 2. Is it in a legitimate library?
│     └─ Add: not $fp_vendor_string exclusion
│
├─ 3. Is it a common development pattern?
│     └─ Find more specific indicator, replace the string
│
├─ 4. Are multiple generic strings matching together?
│     └─ Tighten to require all + add unique marker
│
└─ 5. Is the malware using common techniques?
      └─ Target malware-specific implementation details, not the technique
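
Step 2 of the flow can be sketched as below, assuming the false positive traced back to a legitimate vendor library. All three strings, including the vendor marker, are hypothetical.

```yara
rule MAL_Win_ExampleFamily_Config_Jan25
{
    meta:
        description = "Detects hypothetical ExampleFamily config marker and C2 path, excluding a known benign vendor library"

    strings:
        $cfg = "EXAMPLEFAMILY_CFG_V2" ascii
        $c2 = "/gate/submit.php?id=" ascii

        // Exclusion: string seen only in the benign file that triggered the FP
        $fp_vendor_string = "Example Vendor Update Service" ascii

    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        $cfg and $c2 and
        not $fp_vendor_string
}
```

An exclusion like this is a patch for one known FP source; if several exclusions pile up, the underlying strings are probably too generic.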


Hex vs Text vs Regex


What string type should I use?
│
├─ Exact ASCII/Unicode text?
│  └─ TEXT: $s = "MutexName" ascii wide
│
├─ Specific byte sequence?
│  └─ HEX: $h = { 4D 5A 90 00 }
│
├─ Byte sequence with variation?
│  └─ HEX with wildcards: { 4D 5A ?? ?? 50 45 }
│
├─ Pattern with structure (URLs, paths)?
│  └─ BOUNDED REGEX: /https:\/\/[a-z]{5,20}\.onion/
│
└─ Unknown encoding (XOR, base64)?
   └─ TEXT with modifier: $s = "config" xor(0x00-0xFF)


Is the Sample Packed? (Check First)


Before writing any string-based rule:

Is the sample packed?
├─ Entropy > 7.0?
│  └─ Likely packed — find unpacked layer first
├─ Few/no readable strings?
│  └─ Likely packed — use entropy, PE structure, or packer signatures
├─ UPX/MPRESS/custom packer detected?
│  └─ Target the unpacked payload OR detect the packer itself
└─ Readable strings available?
   └─ Proceed with string-based detection

Expert guidance: Don't write rules against packed layers. The packing changes; the payload doesn't.
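
If you choose to detect the packer itself, one possible sketch checks PE section names. `UPX0` and `UPX1` are the section names stock UPX writes; modified builds can rename them, so treat this as illustrative rather than a complete packer signature. The rule name and size limit are assumptions.

```yara
import "pe"

rule SUSP_Win_UPX_Packed_Sketch
{
    meta:
        description = "Detects PE files with stock UPX section names (illustrative)"

    condition:
        filesize < 20MB and
        uint16(0) == 0x5A4D and
        for any section in pe.sections : (
            section.name == "UPX0" or section.name == "UPX1"
        )
}
```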


When Strings Fail, Pivot to Structure


If yarGen returns only API names and generic paths:

String extraction failed — what now?
├─ High entropy sections?
│  └─ Use math.entropy() on specific sections
├─ Unusual imports pattern?
│  └─ Use pe.imphash() for import hash clustering
├─ Consistent PE structure anomalies?
│  └─ Target section names, sizes, characteristics
├─ Metadata present?
│  └─ Target version info, timestamps, resources
└─ Nothing unique?
   └─ This sample may not be detectable with YARA alone

Expert guidance: "One can try to use other file properties, such as metadata, entropy, import hashes or other data which stays constant." — Kaspersky Applied YARA Training
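
A minimal sketch of a structure-based rule, assuming the samples cluster on one import hash and carry a high-entropy section. The imphash value is a placeholder, not a real hash, and the 7.0 threshold simply reuses the packing heuristic from earlier in this guide.

```yara
import "pe"
import "math"

rule SUSP_Win_ImphashHighEntropy_Sketch
{
    meta:
        description = "Detects PE files sharing an import hash and containing a high-entropy section (illustrative)"

    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        // Placeholder: substitute the imphash clustered from your own samples
        pe.imphash() == "00000000000000000000000000000000" and
        for any section in pe.sections : (
            section.raw_data_size > 0 and
            math.entropy(section.raw_data_offset, section.raw_data_size) >= 7.0
        )
}
```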


Expert Heuristics


String selection: Mutex names are gold; C2 paths silver; error messages bronze. Stack strings are almost always unique. If you need >6 strings, you're over-fitting.

Condition design: Start with `filesize <`, then magic bytes, then strings, then modules. If the condition runs past 5 lines, split it into multiple rules.

Quality signals: yarGen output needs 80% filtering. Rules matching <50% of variants are too narrow; matching goodware are too broad.

Modifier discipline:

Never use `nocase` or `wide` speculatively: only when you have confirmed evidence that the case or encoding varies across samples
`nocase` doubles atom generation; `wide` doubles string matching. Both have real costs
"If you don't have a clear reason for using those modifiers, don't do it" (Kaspersky Applied YARA)

Regex anchoring:

Regex without a 4+ byte literal substring evaluates at every file offset, which is catastrophic for performance
Always anchor regex to a distinctive literal: `/mshta\.exe http:\/\/.../` not `/http:\/\/.../`
If you can't anchor, consider hex pattern with wildcards instead
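
As a sketch of an anchored regex, the rule below keeps the literal `mshta.exe http://` prefix and bounds every quantifier. The host and path character classes, the length limits, and the `.hta` suffix are assumptions chosen for illustration.

```yara
rule SUSP_Mshta_RemoteHTA_Sketch
{
    meta:
        description = "Detects command lines that launch mshta.exe against a remote .hta URL (illustrative)"

    strings:
        // Literal prefix gives the scanner atoms; the variable parts are bounded
        $re = /mshta\.exe http:\/\/[a-z0-9.-]{4,60}\/[a-z0-9_-]{1,40}\.hta/

    condition:
        filesize < 5MB and
        $re
}
```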

Loop discipline:

Always bound loops with filesize: `filesize < 100KB and for all i in (1..#a) : ...`
Unbounded `#a` can reach the thousands in large files, and the loop body runs once per match
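
A bounded loop might look like the sketch below. The `EXCFG|` marker and the rule that a digit must follow it are hypothetical; the point is the `filesize` guard ahead of the loop.

```yara
rule Example_BoundedLoop_Sketch
{
    meta:
        description = "Detects files where every hypothetical EXCFG| marker is followed by an ASCII digit (illustrative)"

    strings:
        // Hypothetical 6-byte config marker
        $marker = "EXCFG|" ascii

    condition:
        // The size guard caps how many matches the loop can iterate over
        filesize < 100KB and
        #marker > 0 and
        for all i in (1..#marker) : (
            // Byte immediately after the marker must be '0'..'9'
            uint8(@marker[i] + 6) >= 0x30 and
            uint8(@marker[i] + 6) <= 0x39
        )
}
```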

YARA-X tips: name a pattern `$_unused` to suppress unused-pattern warnings; add the `private` modifier (`$s = "..." private`) to hide a pattern from output; run `yr check` and `yr fmt` before every commit.


When to Use Modules vs. Byte Checks


Should I use a module or raw bytes?
├─ Need imphash/rich header/authenticode?
│  └─ Use PE module — too complex to replicate
├─ Just checking magic bytes or simple offsets?
│  └─ Use uint16/uint32 — faster, no module overhead
├─ Checking section names/sizes?
│  └─ PE module is cleaner, but add magic bytes filter FIRST
├─ Checking Chrome extension permissions?
│  └─ Use crx module — string parsing is fragile
└─ Checking LNK target paths?
   └─ Use lnk module — LNK format is complex

Expert guidance: "Avoid the magic module — use explicit hex checks instead" — Neo23x0. Apply this principle: if you can do it with uint32(), don't load a module.


YARA-X New Features


Key additions from recent releases:

Private patterns (v1.3.0+): `$helper = "pattern" private` matches but is hidden from output
Warning suppression (v1.4.0+): `// suppress: slow_pattern` inline comments
Numeric underscores (v1.5.0+): `filesize < 10_000_000` for readability
Built-in formatter: `yr fmt rules/` to standardize formatting
NDJSON output: `yr scan --output-format ndjson` for tooling


YARA-X Tooling Workflow


YARA-X provides diagnostic tools legacy YARA lacks:

Rule development cycle:

```bash
# 1. Write initial rule
# 2. Check syntax with detailed errors
yr check rule.yar

# 3. Format consistently
yr fmt -w rule.yar

# 4. Dump module output to inspect file structure (no dummy rule needed)
yr dump -m pe sample.exe --output-format yaml

# 5. Scan with timing info
time yr scan -s rule.yar corpus/
```

When to use `yr dump`:
- Investigating what PE/ELF/Mach-O fields are available
- Debugging why module conditions aren't matching
- Exploring new modules (crx, lnk, dotnet) before writing rules

YARA-X diagnostic advantage: Error messages include precise source locations. If `yr check` points to line 15, the issue is actually on line 15 (unlike legacy YARA).

Chrome Extension Analysis (crx module)


The `crx` module enables detection of malicious Chrome extensions. Requires YARA-X v1.5.0+ (basic), v1.11.0+ for `permhash()`.

Key APIs: `crx.is_crx`, `crx.permissions`, `crx.permhash()`

Red flags: `nativeMessaging`, `downloads`, or `debugger` permission; content scripts on `<all_urls>`

```yara
import "crx"

rule SUSP_CRX_HighRiskPerms {
    condition:
        crx.is_crx and
        for any perm in crx.permissions : (perm == "debugger")
}
```

See crx-module.md for complete API reference, permission risk assessment, and example rules.


Android DEX Analysis (dex module)


The `dex` module enables detection of Android malware. Requires YARA-X v1.11.0+. Not compatible with legacy YARA's dex module: the API is completely different.

Key APIs: `dex.is_dex`, `dex.contains_class()`, `dex.contains_method()`, `dex.contains_string()`

Red flags: Single-letter class names (obfuscation), `DexClassLoader` reflection, encrypted assets

```yara
import "dex"

rule SUSP_DEX_DynamicLoading {
    condition:
        dex.is_dex and
        dex.contains_class("Ldalvik/system/DexClassLoader;")
}
```

See dex-module.md for complete API reference, obfuscation detection, and example rules.


Migrating from Legacy YARA


YARA-X has 99% rule compatibility, but enforces stricter validation.

Quick migration:

```bash
yr check --relaxed-re-syntax rules/  # Identify issues
# Fix each issue, then:
yr check rules/  # Verify without relaxed mode
```

Common fixes:

Issue	Legacy	YARA-X Fix
Literal `{` in regex	`/{/`	`/\{/`
Invalid escapes	`\R` silently literal	`\\R` or `R`
Base64 strings	Any length	3+ chars required
Negative indexing	`@a[-1]`	`@a[#a]` for the last match (indexes are 1-based)
Duplicate modifiers	Allowed	Remove duplicates

Note: Use `--relaxed-re-syntax` only as a diagnostic tool. Fix issues rather than relying on relaxed mode.
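
Two of these fixes can be seen together in the sketch below, which assumes a hypothetical `EXCFG|` marker and a regex containing literal braces. It is an illustration of the migrated form, not a rule taken from a real ruleset.

```yara
rule Example_YaraX_Migrated_Sketch
{
    meta:
        description = "Detects nothing real; shows brace escaping and last-match indexing that pass strict validation"

    strings:
        // Literal braces escaped, as strict regex validation requires
        $re = /config\{key\}/
        // Hypothetical marker that may occur several times
        $marker = "EXCFG|" ascii

    condition:
        filesize < 1MB and
        $re and
        #marker > 0 and
        // Match offsets are 1-based, so @marker[#marker] is the last occurrence
        @marker[#marker] > @re[1]
}
```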



Quick Reference


Naming Convention


{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}

Common prefixes: `MAL_` (malware), `HKTL_` (hacking tool), `WEBSHELL_`, `EXPL_` (exploit), `SUSP_` (suspicious), `GEN_` (generic)

Platforms: `Win_`, `Lnx_`, `Mac_`, `Android_`, `CRX_`

Example: `MAL_Win_Emotet_Loader_Jan25`

See style-guide.md for full conventions, metadata requirements, and naming examples.


Required Metadata


Every rule needs: `description` (starts with "Detects"), `author`, `reference`, `date`.

```yara
meta:
    description = "Detects Example malware via unique mutex and C2 path"
    author = "Your Name <email@example.com>"
    reference = "https://example.com/analysis"
    date = "2025-01-29"
```


String Selection


Good: Mutex names, PDB paths, C2 paths, stack strings, configuration markers
Bad: API names, common executables, format specifiers, generic paths

See strings.md for the full decision tree and examples.


Condition Patterns


Order conditions for short-circuit:

`filesize < 10MB` (instant)
`uint16(0) == 0x5A4D` (nearly instant)
String matches (cheap)
Module checks (expensive)

See performance.md for detailed optimization patterns.


Workflow


Gather samples: Multiple samples; single-sample rules are brittle
Extract candidates: `yarGen.py -m samples/ --excludegood`
Validate quality: Use decision tree; yarGen needs 80% filtering
Write initial rule: Follow template with proper metadata
Lint and test: `yr check`, `yr fmt`, linter script
Goodware validation: VirusTotal corpus or local clean files
Deploy: Add to repo with full metadata, monitor for FPs

See testing.md for detailed validation workflow and FP investigation.

For a comprehensive step-by-step guide covering all phases from sample collection to deployment, see rule-development.md.


Common Mistakes


Mistake	Bad	Good
API names as indicators	`"VirtualAlloc"`	Hex pattern of call site + unique mutex
Unbounded regex	`/https?:\/\/.*/`	`/https?:\/\/[a-z0-9]{8,12}\.onion/`
Missing file type filter	`pe.imports(...)` first	`uint16(0) == 0x5A4D and filesize < 10MB` first
Short strings	`"abc"` (3 bytes)	`"abcdef"` (4+ bytes)
Unescaped braces (YARA-X)	`/config{key}/`	`/config\{key\}/`


Performance Optimization


Quick wins: Put `filesize` first, avoid `nocase`, bound regex quantifiers (e.g. `{1,100}`), prefer hex over regex.

Red flags: Strings <4 bytes, unbounded regex (`.*`), modules without file-type filter.

See performance.md for atom theory and optimization details.


Reference Documents


Topic	Document
Naming and metadata conventions	style-guide.md
Performance and atom optimization	performance.md
String types and judgment	strings.md
Testing and validation	testing.md
Chrome extension module (crx)	crx-module.md
Android DEX module (dex)	dex-module.md


Workflows


Topic	Document
Complete rule development process	rule-development.md


Example Rules


The `examples/` directory contains real, attributed rules demonstrating best practices:

Example	Demonstrates	Source
MAL_Win_Remcos_Jan25.yar	PE malware: graduated string counts, multiple rules per family	Elastic Security
MAL_Mac_ProtonRAT_Jan25.yar	macOS: Mach-O magic bytes, multi-category grouping	Airbnb BinaryAlert
MAL_NPM_SupplyChain_Jan25.yar	npm supply chain: real attack patterns, ERC-20 selectors	Stairwell Research
SUSP_JS_Obfuscation_Jan25.yar	JavaScript: obfuscator detection, density-based matching	imp0rtp3, Nils Kuhnert
SUSP_CRX_SuspiciousPermissions.yar	Chrome extensions: crx module, permissions	Educational


Scripts


```bash
uv run {baseDir}/scripts/yara_lint.py rule.yar      # Validate style/metadata
uv run {baseDir}/scripts/atom_analyzer.py rule.yar  # Check string quality
```

See README.md for detailed script documentation.


Quality Checklist

Before deploying any rule:

Name follows `{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}` format
Description starts with "Detects" and explains what/how
All required metadata present (author, reference, date)
Strings are unique (not API names, common paths, or format strings)
All strings have 4+ bytes with good atom potential
Base64 modifier only on strings with 3+ characters
Regex patterns have escaped `{` and valid escape sequences
Condition starts with cheap checks (filesize, magic bytes)
Rule matches all target samples
Rule produces zero matches on goodware corpus
`yr check` passes with no errors
`yr fmt --check` passes (consistent formatting)
Linter passes with no errors
Peer review completed
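
As a rough illustration of the structural items on this checklist, here is a minimal rule skeleton. The family name, strings, author, and reference URL are all hypothetical placeholders invented for this sketch; none of them come from a real sample. It only shows where each checklist item lands in a rule.

```yara
rule MAL_Win_ExampleFamily_Loader_Jan25   // hypothetical: {CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}
{
    meta:
        // Description starts with "Detects" and says what and how
        description = "Detects the hypothetical ExampleFamily loader via its config marker and mutex name"
        author      = "Your Name"                               // placeholder
        reference   = "https://example.com/placeholder-report"  // placeholder
        date        = "2025-01-15"                              // placeholder

    strings:
        // Placeholder indicators: 4+ bytes, distinctive, not API names or common paths
        $cfg = "EXF_CFG_v3::begin" ascii
        $mtx = "Global\\exf_loader_8841" ascii wide
        // Literal braces are escaped; {8} remains a repetition count
        $re  = /exf_task\{[0-9a-f]{8}\}/

    condition:
        // Cheap checks first: file size, then PE magic bytes
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        2 of them
}
```

A skeleton like this covers only the structural items. Matching all target samples, zero goodware matches, `yr check`, `yr fmt --check`, the linter, and peer review still have to be verified separately.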

Resources

Quality YARA Rule Repositories

Learn from production rules. These repositories contain well-tested, properly attributed rules:

Repository	Focus	Maintainer
Neo23x0/signature-base	17,000+ production rules, multi-platform	Florian Roth
Elastic/protections-artifacts	1,000+ endpoint-tested rules	Elastic Security
reversinglabs/reversinglabs-yara-rules	Threat research rules	ReversingLabs
imp0rtp3/js-yara-rules	JavaScript/browser malware	imp0rtp3
InQuest/awesome-yara	Curated index of resources	InQuest

Style & Performance Guides

Guide	Purpose
YARA Style Guide	Naming conventions, metadata, string prefixes
YARA Performance Guidelines	Atom optimization, regex bounds
Kaspersky Applied YARA Training	Expert techniques from production use

Tools

Tool	Purpose
yarGen	Extract candidate strings from samples
FLOSS	Extract obfuscated and stack strings
YARA-CI	Automated goodware testing
YaraDbg	Web-based rule debugger

macOS-Specific Resources

Resource	Purpose
Apple XProtect	Production macOS rules at `/System/Library/CoreServices/XProtect.bundle/`
objective-see	macOS malware research and samples
macOS Security Tools	Reference list

Multi-Indicator Clustering Pattern

Production rules often group indicators by type:

yara

strings:
    // Category A: Library indicators
    $a1 = "SRWebSocket" ascii
    $a2 = "SocketRocket" ascii

    // Category B: Behavioral indicators
    $b1 = "SSH tunnel" ascii
    $b2 = "keylogger" ascii nocase

    // Category C: C2 patterns
    $c1 = /https:\/\/[a-z0-9]{8,16}\.onion/

condition:
    filesize < 10MB and
    (
        $c1 or                             // A C2 pattern alone is treated as sufficient
        (any of ($a*) and any of ($b*))    // Otherwise require evidence from BOTH categories
    )

Why this works: Different indicator types have different confidence levels. A single C2 domain might be definitive, while you need multiple library imports to be confident. Grouping by `$a*`, `$b*`, and `$c*` lets you express graduated requirements.
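
One way to express graduated requirements is to give each category its own threshold. The sketch below reuses the strings from the pattern above inside a complete rule. The rule name and the specific thresholds are assumptions chosen for illustration, and metadata is omitted for brevity.

```yara
rule SUSP_Mac_ClusteredIndicators_Sketch   // hypothetical name, illustration only
{
    strings:
        // Category A: library indicators (weak individually)
        $a1 = "SRWebSocket" ascii
        $a2 = "SocketRocket" ascii

        // Category B: behavioral indicators
        $b1 = "SSH tunnel" ascii
        $b2 = "keylogger" ascii nocase

        // Category C: C2 patterns (strong individually)
        $c1 = /https:\/\/[a-z0-9]{8,16}\.onion/

    condition:
        filesize < 10MB and
        (
            $c1 or                               // high-confidence category: one hit is enough
            (all of ($a*) and any of ($b*))      // low-confidence category: require every library string
        )
}
```

Every declared pattern is referenced in the condition, which avoids unused-pattern complaints from the compiler. Tune the thresholds (`any of`, `2 of`, `all of`) per category based on how each group behaves against your goodware corpus.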
