yara-rule-authoring
YARA-X Rule Authoring
Write detection rules that catch malware without drowning in false positives.
This skill targets YARA-X, the Rust-based successor to legacy YARA. YARA-X powers VirusTotal's production systems and is the recommended implementation. See Migrating from Legacy YARA if you have existing rules.
Core Principles
- Strings must generate good atoms: YARA extracts 4-byte subsequences for fast matching. Strings with repeated bytes, common sequences, or fewer than 4 bytes force slow bytecode verification on too many files.
- Target specific families, not categories: "Detects ransomware" catches everything and nothing. "Detects LockBit 3.0 configuration extraction routine" catches what you want.
- Test against goodware before deployment: A rule that fires on Windows system files is useless. Validate against VirusTotal's goodware corpus or your own clean file set.
- Short-circuit with cheap checks first: Put `filesize < 10MB and uint16(0) == 0x5A4D` before expensive string searches or module calls.
- Metadata is documentation: Future you (and your team) need to know what this catches, why, and where the sample came from.
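The principles above can be sketched as a single rule skeleton. This is a minimal illustration, not a real detection: the family name, mutex, and PDB path are hypothetical placeholders.

```yara
rule MAL_Win_ExampleFamily_Loader_Jan25
{
    meta:
        description = "Detects ExampleFamily loader via its mutex name and PDB path"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // Hypothetical family-specific indicators, each well over 4 bytes
        $mutex = "Global\\ExampleFamily_7f3a9c" ascii
        $pdb = "\\ExampleFamily\\Release\\loader.pdb" ascii

    condition:
        // Cheap checks first, so most files are rejected before strings matter
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        all of them
}
```

Both strings are long and specific, the condition opens with the size and magic-byte checks, and the metadata records what the rule catches and where the sample came from.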
When to Use
- Writing new YARA-X rules for malware detection
- Reviewing existing rules for quality or performance issues
- Optimizing slow-running rulesets
- Converting IOCs or threat intel into detection signatures
- Debugging false positive issues
- Preparing rules for production deployment
- Migrating legacy YARA rules to YARA-X
- Analyzing Chrome extensions (crx module)
- Analyzing Android apps (dex module)
When NOT to Use
- Static analysis requiring disassembly → use Ghidra/IDA skills
- Dynamic malware analysis → use sandbox analysis skills
- Network-based detection → use Suricata/Snort skills
- Memory forensics with Volatility → use memory forensics skills
- Simple hash-based detection → just use hash lists
YARA-X Overview
YARA-X is the Rust-based successor to legacy YARA: 5-10x faster regex, better errors, built-in formatter, stricter validation, new modules (crx, dex), 99% rule compatibility.
Install: `brew install yara-x` (macOS) or `cargo install yara-x`
Essential commands: `yr scan`, `yr check`, `yr fmt`, `yr dump`
Platform Considerations
YARA works on any file type. Adapt patterns to your target:
| Platform | Magic Bytes | Bad Strings | Good Strings |
|---|---|---|---|
| Windows PE | `uint16(0) == 0x5A4D` | API names, Windows paths | Mutex names, PDB paths |
| macOS Mach-O | | Common Obj-C methods | Keylogger strings, persistence paths |
| JavaScript/Node | (none needed) | `require`, `fetch`, `axios` | Obfuscator signatures, eval+decode chains |
| npm/pip packages | (none needed) | | Suspicious package names, exfil URLs |
| Office docs | | VBA keywords | Macro auto-exec, encoded payloads |
| VS Code extensions | (none needed) | | Uncommon activationEvents, hidden file access |
| Chrome extensions | Use `crx` module | Common Chrome APIs | Permission abuse, manifest anomalies |
| Android apps | Use `dex` module | Standard DEX structure | Obfuscated classes, suspicious permissions |
macOS Malware Detection
No dedicated Mach-O module exists yet. Use magic byte checks + string patterns:
Magic bytes:
```yara
// Mach-O 32-bit
uint32(0) == 0xFEEDFACE
// Mach-O 64-bit
uint32(0) == 0xFEEDFACF
// Universal binary (fat binary)
uint32(0) == 0xCAFEBABE or uint32(0) == 0xBEBAFECA
```
Good indicators for macOS malware:
- Keylogger artifacts: `CGEventTapCreate`, `kCGEventKeyDown`
- SSH tunnel strings: `ssh -D`, `tunnel`, `socks`
- Persistence paths: `~/Library/LaunchAgents`, `/Library/LaunchDaemons`
- Credential theft: `security find-generic-password`, `keychain`
Example pattern from Airbnb BinaryAlert:
```yara
rule SUSP_Mac_ProtonRAT
{
    strings:
        // Library indicators
        $lib1 = "SRWebSocket" ascii
        $lib2 = "SocketRocket" ascii
        // Behavioral indicators
        $behav1 = "SSH tunnel not launched" ascii
        $behav2 = "Keylogger" ascii
    condition:
        (uint32(0) == 0xFEEDFACF or uint32(0) == 0xCAFEBABE) and
        any of ($lib*) and any of ($behav*)
}
```
JavaScript Detection Decision Tree
Writing a JavaScript rule?
├─ npm package?
│   ├─ Check package.json patterns
│   ├─ Look for postinstall/preinstall hooks
│   └─ Target exfil patterns: fetch + env access + credential paths
├─ Browser extension?
│   ├─ Chrome: Use crx module
│   └─ Others: Target manifest patterns, background script behaviors
├─ Standalone JS file?
│   ├─ Look for obfuscation markers: eval+atob, fromCharCode chains
│   ├─ Target unique function/variable names (often survive minification)
│   └─ Check for packed/encoded payloads
└─ Minified/webpack bundle?
    ├─ Target unique strings that survive bundling (URLs, magic values)
    └─ Avoid function names (will be mangled)
JavaScript-specific good strings:
- Ethereum function selectors: `{ 70 a0 82 31 }` (`balanceOf(address)`; the ERC-20 `transfer(address,uint256)` selector is `{ a9 05 9c bb }`)
- Zero-width characters (steganography): `{ E2 80 8B E2 80 8C }`
- Obfuscator signatures: `_0x`, `var _0x`
- Specific C2 patterns: domain names, webhook URLs
JavaScript-specific bad strings:
- `require`, `fetch`, `axios`: too common
- `Buffer`, `crypto`: legitimate uses everywhere
- `process.env` alone: need specific env var names
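As a rough sketch of how the good strings combine, the rule below pairs a specific environment variable name with an obfuscation marker. The rule name, the chosen variable, and the size limit are illustrative assumptions rather than indicators taken from a real campaign.

```yara
rule SUSP_JS_ObfuscatedTokenAccess_Sketch
{
    meta:
        description = "Detects JavaScript that reads NPM_TOKEN next to obfuscation markers (illustrative sketch)"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // Specific env var name, not bare process.env
        $env_npm_token = "process.env.NPM_TOKEN" ascii
        // Obfuscator-style declaration prefix
        $obf_decl = "var _0x" ascii
        // Zero-width space followed by zero-width non-joiner (UTF-8)
        $zero_width = { E2 80 8B E2 80 8C }

    condition:
        filesize < 2MB and
        $env_npm_token and
        ($obf_decl or $zero_width)
}
```

None of the three strings is meant to be conclusive alone; the rule only fires when the specific token access appears together with an obfuscation signal.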
Essential Toolkit
| Tool | Purpose |
|---|---|
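| yarGen | Extract candidate strings: `yarGen -m samples/ --excludegood` |
| FLOSS | Extract obfuscated/stack strings |
| yr CLI | Validate, format, scan, and inspect: `yr check`, `yr fmt`, `yr scan`, `yr dump` |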
| signature-base | Study quality examples |
| YARA-CI | Goodware corpus testing before deployment |
Master these five. Don't get distracted by tool catalogs.
Rationalizations to Reject
When you catch yourself thinking these, stop and reconsider.
| Rationalization | Expert Response |
|---|---|
| "This generic string is unique enough" | Test against goodware first. Your intuition is wrong. |
| "yarGen gave me these strings" | yarGen suggests, you validate. Check each one manually. |
| "It works on my 10 samples" | 10 samples ≠ production. Use VirusTotal goodware corpus. |
| "One rule to catch all variants" | Causes FP floods. Target specific families. |
| "I'll make it more specific if we get FPs" | Write tight rules upfront. FPs burn trust. |
| "This hex pattern is unique" | Unique in one sample ≠ unique across malware ecosystem. |
| "Performance doesn't matter" | One slow rule slows entire ruleset. Optimize atoms. |
| "PEiD rules still work" | Obsolete. 32-bit packers aren't relevant. |
| "I'll add more conditions later" | Weak rules deployed = damage done. |
| "This is just for hunting" | Hunting rules become detection rules. Same quality bar. |
| "The API name makes it malicious" | Legitimate software uses same APIs. Need behavioral context. |
| "any of them is fine for these common strings" | Common strings + any = FP flood. Use `any of` only for individually unique strings. |
| "This regex is specific enough" | |
| "The JavaScript looks clean" | Attackers poison legitimate code with injects. Check for eval+decode chains. |
| "I'll use .* for flexibility" | Unbounded regex = performance disaster + memory explosion. Use a bounded quantifier such as `.{1,100}`. |
| "I'll use --relaxed-re-syntax everywhere" | Masks real bugs. Fix the regex instead of hiding problems. |
Decision Trees
Is This String Good Enough?
Is this string good enough?
├─ Less than 4 bytes?
│   └─ NO: find longer string
├─ Contains repeated bytes (0000, 9090)?
│   └─ NO: add surrounding context
├─ Is an API name (VirtualAlloc, CreateRemoteThread)?
│   └─ NO: use hex pattern of call site instead
├─ Appears in Windows system files?
│   └─ NO: too generic, find something unique
├─ Is it a common path (C:\Windows\, cmd.exe)?
│   └─ NO: find malware-specific paths
├─ Unique to this malware family?
│   └─ YES: use it
└─ Appears in other malware too?
    └─ MAYBE: combine with family-specific marker
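To illustrate the "hex pattern of call site" branch, here is a hedged sketch for 32-bit x86 code that pushes the arguments for an RWX allocation and then calls through an import slot. The byte sequence is a generic compiler idiom, so it is paired with a hypothetical family marker; both the marker and the rule name are assumptions.

```yara
rule SUSP_Win_RwxAllocCallSite_Sketch
{
    meta:
        description = "Detects an RWX allocation call site combined with an ExampleFamily marker (illustrative sketch)"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // push 0x40            (PAGE_EXECUTE_READWRITE)
        // push 0x3000          (MEM_COMMIT | MEM_RESERVE)
        // push <imm32 size>
        // push 0
        // call dword ptr [<import slot>]
        $alloc_rwx = { 6A 40 68 00 30 00 00 68 ?? ?? ?? ?? 6A 00 FF 15 ?? ?? ?? ?? }
        // Hypothetical family-specific marker
        $marker = "ExampleFamily_Mutex_7f3a9c" ascii

    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        $alloc_rwx and
        $marker
}
```

The fixed bytes around the wildcards give the scanner usable atoms, which the bare API name `VirtualAlloc` would not do without also matching large amounts of legitimate software.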
When to Use "all of" vs "any of"
Should I require all strings or allow any?
├─ Strings are individually unique to malware?
│   └─ any of them (each alone is suspicious)
├─ Strings are common but combination is suspicious?
│   └─ all of them (require the full pattern)
├─ Strings have different confidence levels?
│   └─ Group: all of ($core_*) and any of ($variant_*)
└─ Seeing many false positives?
    └─ Tighten: switch any → all, add more required strings
Lesson from production: Rules using `any of ($network_*)` where strings included "fetch", "axios", "http" matched virtually all web applications. Switching to require credential path AND network call AND exfil destination eliminated FPs.
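The tightened form described in that lesson might look like the following sketch, which requires a credential path and a network call and then accepts any one exfil destination. All strings and the rule name are illustrative assumptions.

```yara
rule SUSP_JS_CredentialExfil_Sketch
{
    meta:
        description = "Detects JavaScript combining a credential path, a network call, and an exfil destination (illustrative sketch)"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // Core: both must be present
        $core_cred_path = ".aws/credentials" ascii
        $core_net_call = "https.request(" ascii
        // Variants: any one destination is enough
        $variant_discord = "discord.com/api/webhooks/" ascii
        $variant_telegram = "api.telegram.org/bot" ascii

    condition:
        filesize < 2MB and
        all of ($core_*) and
        any of ($variant_*)
}
```

Prefixing string names by role keeps the condition readable and lets the required set and the optional set change independently.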
When to Abandon a Rule Approach
Stop and pivot when:
- yarGen returns only API names and paths → See When Strings Fail, Pivot to Structure
- Can't find 3 unique strings → Probably packed. Target the unpacked version or detect the packer.
- Rule matches goodware files → Strings aren't unique enough. 1-2 matches = investigate and tighten; 3-5 matches = find different indicators; 6+ matches = start over.
- Performance is terrible even after optimization → Architecture problem. Split into multiple focused rules or add strict pre-filters.
- Description is hard to write → The rule is too vague. If you can't explain what it catches, it catches too much.
Debugging False Positives
FP Investigation Flow:
│
├─ 1. Which string matched?
│     Run: yr scan -s rule.yar false_positive.exe
│
├─ 2. Is it in a legitimate library?
│   └─ Add: not $fp_vendor_string exclusion
│
├─ 3. Is it a common development pattern?
│   └─ Find more specific indicator, replace the string
│
├─ 4. Are multiple generic strings matching together?
│   └─ Tighten to require all + add unique marker
│
└─ 5. Is the malware using common techniques?
    └─ Target malware-specific implementation details, not the technique
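Step 2 of the flow can be sketched as follows. The vendor string, family markers, and rule name are hypothetical; in practice the exclusion string would come from the legitimate file that triggered the false positive.

```yara
rule MAL_Win_ExampleFamily_Config_Sketch
{
    meta:
        description = "Detects ExampleFamily configuration markers, excluding a known benign vendor (illustrative sketch)"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        $cfg1 = "ExampleFamily_cfg_v2" ascii
        $cfg2 = "ex_beacon_interval" ascii
        // Present only in the legitimate product that caused the FP (hypothetical)
        $fp_vendor_string = "ExampleVendor Updater Service" ascii

    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        all of ($cfg*) and
        not $fp_vendor_string
}
```

An exclusion like this treats the symptom, so it is usually worth also asking whether the matched strings were specific enough in the first place.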
Hex vs Text vs Regex
What string type should I use?
│
├─ Exact ASCII/Unicode text?
│   └─ TEXT: $s = "MutexName" ascii wide
│
├─ Specific byte sequence?
│   └─ HEX: $h = { 4D 5A 90 00 }
│
├─ Byte sequence with variation?
│   └─ HEX with wildcards: { 4D 5A ?? ?? 50 45 }
│
├─ Pattern with structure (URLs, paths)?
│   └─ BOUNDED REGEX: /https:\/\/[a-z]{5,20}\.onion/
│
└─ Unknown encoding (XOR, base64)?
    └─ TEXT with modifier: $s = "config" xor(0x00-0xFF)
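For comparison, the sketch below places one string of each type in a single rule. Every value is a made-up placeholder chosen only to show the syntax, and the "2 of them" threshold is arbitrary.

```yara
rule SUSP_Win_StringTypes_Sketch
{
    meta:
        description = "Detects nothing real; shows text, hex, regex, and xor string syntax side by side"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // Exact text, seen in both ASCII and UTF-16 in the (hypothetical) samples
        $text = "ExampleMutex_7f3a9c" ascii wide
        // Byte sequence with wildcards for the bytes that vary between builds
        $hex = { E8 ?? ?? ?? ?? 85 C0 74 ?? 68 00 30 00 00 }
        // Bounded regex with a literal suffix
        $regex = /https:\/\/[a-z2-7]{16,56}\.onion/
        // Text stored under an unknown single-byte XOR key
        $xored = "ExampleConfigKey" xor(0x01-0xFF)

    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        2 of them
}
```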
Is the Sample Packed? (Check First)
Before writing any string-based rule:
Is the sample packed?
├─ Entropy > 7.0?
│   └─ Likely packed: find unpacked layer first
├─ Few/no readable strings?
│   └─ Likely packed: use entropy, PE structure, or packer signatures
├─ UPX/MPRESS/custom packer detected?
│   └─ Target the unpacked payload OR detect the packer itself
└─ Readable strings available?
    └─ Proceed with string-based detection
Expert guidance: Don't write rules against packed layers. The packing changes; the payload doesn't.
When Strings Fail, Pivot to Structure
If yarGen returns only API names and generic paths:
String extraction failed: what now?
├─ High entropy sections?
│   └─ Use math.entropy() on specific sections
├─ Unusual imports pattern?
│   └─ Use pe.imphash() for import hash clustering
├─ Consistent PE structure anomalies?
│   └─ Target section names, sizes, characteristics
├─ Metadata present?
│   └─ Target version info, timestamps, resources
└─ Nothing unique?
    └─ This sample may not be detectable with YARA alone
Expert guidance: "One can try to use other file properties, such as metadata, entropy, import hashes or other data which stays constant." (Kaspersky Applied YARA Training)
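A structural pivot along these lines could be sketched as below, using per-section entropy from the `math` and `pe` modules. The 7.0 threshold mirrors the packing heuristic above; the rule name and size limit are assumptions, and a rule this generic is a hunting aid that would need family-specific conditions before deployment.

```yara
import "pe"
import "math"

rule SUSP_Win_HighEntropySection_Sketch
{
    meta:
        description = "Detects PE files containing at least one section with entropy above 7.0 (illustrative sketch)"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        pe.is_pe and
        for any section in pe.sections : (
            // Skip sections with no data on disk
            section.raw_data_size > 0 and
            math.entropy(section.raw_data_offset, section.raw_data_size) > 7.0
        )
}
```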
Expert Heuristics
String selection: Mutex names are gold; C2 paths silver; error messages bronze. Stack strings are almost always unique. If you need >6 strings, you're over-fitting.
Condition design: Start with `filesize <`, then magic bytes, then strings, then modules. If >5 lines, split into multiple rules.
Quality signals: yarGen output needs 80% filtering. Rules matching <50% of variants are too narrow; rules matching goodware are too broad.
Modifier discipline:
- Never use `nocase` or `wide` speculatively; use them only when you have confirmed evidence that the case or encoding varies across samples
- `nocase` doubles atom generation; `wide` doubles string matching. Both have real costs
- "If you don't have a clear reason for using those modifiers, don't do it" (Kaspersky Applied YARA)
Regex anchoring:
- Regex without a 4+ byte literal substring evaluates at every file offset, which is catastrophic for performance
- Always anchor regex to a distinctive literal: `/mshta\.exe http:\/\/.../` not `/http:\/\/.../`
- If you can't anchor, consider hex pattern with wildcards instead
Loop discipline:
- Always bound loops with filesize: `filesize < 100KB and for all i in (1..#a) : ...`
- Unbounded `#a` can be thousands in large files, causing exponential slowdown
YARA-X tips: `$_unused` to suppress warnings; `private $s` to hide from output; `yr check` + `yr fmt` before every commit.
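The anchoring and loop advice can be combined in one hedged sketch. The regex keeps the literal `mshta.exe http://` prefix from the example above, and the loop is bounded by both `filesize` and an occurrence count; the `CFG1` tag, the byte check inside the loop, and the rule name are hypothetical.

```yara
rule SUSP_Win_MshtaRemoteHta_Sketch
{
    meta:
        description = "Detects an mshta remote HTA command next to a tagged config block (illustrative sketch)"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // Anchored to a long literal, with every repetition bounded
        $cmd = /mshta\.exe http:\/\/[a-z0-9.\-]{4,64}\/[a-z0-9_\-]{1,32}\.hta/
        // Hypothetical 4-byte config tag "CFG1"
        $tag = { 43 46 47 31 }

    condition:
        // The size bound keeps the number of $tag occurrences small
        filesize < 100KB and
        $cmd and
        #tag > 0 and
        for all i in (1..#tag) : (
            // The byte right after each tag must be non-zero
            uint8(@tag[i] + 4) != 0x00
        )
}
```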
When to Use Modules vs. Byte Checks
Should I use a module or raw bytes?
├─ Need imphash/rich header/authenticode?
│   └─ Use PE module: too complex to replicate
├─ Just checking magic bytes or simple offsets?
│   └─ Use uint16/uint32: faster, no module overhead
├─ Checking section names/sizes?
│   └─ PE module is cleaner, but add magic bytes filter FIRST
├─ Checking Chrome extension permissions?
│   └─ Use crx module: string parsing is fragile
└─ Checking LNK target paths?
    └─ Use lnk module: LNK format is complex
Expert guidance: "Avoid the magic module; use explicit hex checks instead" (Neo23x0). Apply this principle: if you can do it with uint32(), don't load a module.
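A small sketch of the "magic bytes filter FIRST" branch: raw-byte checks identify a PE file, and the module is used only for the structural field that would be tedious to parse by hand. The section-count threshold and the rule name are arbitrary assumptions.

```yara
import "pe"

rule SUSP_Win_ManySections_Sketch
{
    meta:
        description = "Detects PE files with an unusually high section count (illustrative sketch)"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    condition:
        // Raw-byte checks: "MZ" at offset 0, "PE\0\0" at the e_lfanew offset
        uint16(0) == 0x5A4D and
        uint32(uint32(0x3C)) == 0x00004550 and
        // Structural field read through the PE module
        pe.number_of_sections > 10
}
```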
YARA-X New Features
Key additions from recent releases:
- Private patterns (v1.3.0+): `private $helper = "pattern"` matches but is hidden from output
- Warning suppression (v1.4.0+): `// suppress: slow_pattern` inline comments
- Numeric underscores (v1.5.0+): `filesize < 10_000_000` for readability
- Built-in formatter: `yr fmt rules/` to standardize formatting
- NDJSON output: `yr scan --output-format ndjson` for tooling
YARA-X Tooling Workflow
YARA-X provides diagnostic tools legacy YARA lacks:
Rule development cycle:
```bash
# 1. Write initial rule

# 2. Check syntax with detailed errors
yr check rule.yar

# 3. Format consistently
yr fmt -w rule.yar

# 4. Dump module output to inspect file structure (no dummy rule needed)
yr dump -m pe sample.exe --output-format yaml

# 5. Scan with timing info
time yr scan -s rule.yar corpus/
```
**When to use `yr dump`:**
- Investigating what PE/ELF/Mach-O fields are available
- Debugging why module conditions aren't matching
- Exploring new modules (crx, lnk, dotnet) before writing rules
**YARA-X diagnostic advantage:** Error messages include precise source locations. If `yr check` points to line 15, the issue is actually on line 15 (unlike legacy YARA).
Chrome Extension Analysis (crx module)
The `crx` module enables detection of malicious Chrome extensions. Requires YARA-X v1.5.0+ (basic), v1.11.0+ for `permhash()`.
Key APIs: `crx.is_crx`, `crx.permissions`, `crx.permhash()`
Red flags: `nativeMessaging` + `downloads`, `debugger` permission, content scripts on `<all_urls>`
```yara
import "crx"

rule SUSP_CRX_HighRiskPerms {
    condition:
        crx.is_crx and
        for any perm in crx.permissions : (perm == "debugger")
}
```
See crx-module.md for complete API reference, permission risk assessment, and example rules.
Android DEX Analysis (dex module)
The `dex` module enables detection of Android malware. Requires YARA-X v1.11.0+. It is not compatible with legacy YARA's dex module; the API is completely different.
Key APIs: `dex.is_dex`, `dex.contains_class()`, `dex.contains_method()`, `dex.contains_string()`
Red flags: Single-letter class names (obfuscation), `DexClassLoader` reflection, encrypted assets
```yara
import "dex"

rule SUSP_DEX_DynamicLoading {
    condition:
        dex.is_dex and
        dex.contains_class("Ldalvik/system/DexClassLoader;")
}
```
See dex-module.md for complete API reference, obfuscation detection, and example rules.
Migrating from Legacy YARA
YARA-X has 99% rule compatibility, but enforces stricter validation.
Quick migration:
```bash
yr check --relaxed-re-syntax rules/   # Identify issues
# Fix each issue, then:
yr check rules/                       # Verify without relaxed mode
```
**Common fixes:**
| Issue | Legacy | YARA-X Fix |
|-------|--------|------------|
| Literal `{` in regex | `/{/` | `/\{/` |
| Invalid escapes | `\R` silently literal | `\\R` or `R` |
| Base64 strings | Any length | 3+ chars required |
| Negative indexing | `@a[-1]` | `@a[#a - 1]` |
| Duplicate modifiers | Allowed | Remove duplicates |
> **Note:** Use `--relaxed-re-syntax` only as a diagnostic tool. Fix issues rather than relying on relaxed mode.
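Two of the fixes from the table, shown in a hedged sketch of a rule that should pass strict checking. The pattern contents and the rule name are invented for illustration.

```yara
rule SUSP_Any_MigrationFixes_Sketch
{
    meta:
        description = "Detects nothing real; shows escaped regex braces and a valid base64 string"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // Literal braces are escaped; the inner {8} and {4} remain repetition counts
        $guid = /\{[0-9a-f]{8}-[0-9a-f]{4}\}/
        // base64 modifier applied to a string of at least 3 characters
        $b64 = "ExampleConfig" base64

    condition:
        filesize < 5MB and
        any of them
}
```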
Quick Reference
Naming Convention
`{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}`
Common prefixes: `MAL_` (malware), `HKTL_` (hacking tool), `WEBSHELL_`, `EXPL_`, `SUSP_` (suspicious), `GEN_` (generic)
Platforms: `Win_`, `Lnx_`, `Mac_`, `Android_`, `CRX_`
Example: `MAL_Win_Emotet_Loader_Jan25`
See style-guide.md for full conventions, metadata requirements, and naming examples.
Required Metadata
Every rule needs: `description` (starts with "Detects"), `author`, `reference`, `date`.
```yara
meta:
    description = "Detects Example malware via unique mutex and C2 path"
    author = "Your Name <email@example.com>"
    reference = "https://example.com/analysis"
    date = "2025-01-29"
```
String Selection
Good: Mutex names, PDB paths, C2 paths, stack strings, configuration markers
Bad: API names, common executables, format specifiers, generic paths
See strings.md for the full decision tree and examples.
Condition Patterns
Order conditions for short-circuit:
- `filesize < 10MB` (instant)
- `uint16(0) == 0x5A4D` (nearly instant)
- String matches (cheap)
- Module checks (expensive)
See performance.md for detailed optimization patterns.
Workflow
- Gather samples: Multiple samples; single-sample rules are brittle
- Extract candidates: `yarGen -m samples/ --excludegood`
- Validate quality: Use decision tree; yarGen needs 80% filtering
- Write initial rule: Follow template with proper metadata
- Lint and test: `yr check`, `yr fmt`, linter script
- Goodware validation: VirusTotal corpus or local clean files
- Deploy: Add to repo with full metadata, monitor for FPs
See testing.md for detailed validation workflow and FP investigation.
For a comprehensive step-by-step guide covering all phases from sample collection to deployment, see rule-development.md.
Common Mistakes
| Mistake | Bad | Good |
|---|---|---|
| API names as indicators | | Hex pattern of call site + unique mutex |
| Unbounded regex | | |
| Missing file type filter | | |
| Short strings | | |
| Unescaped braces (YARA-X) | | |
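As a hedged illustration of the "Good" column, the sketch below shows one plausible corrected form for several of these mistakes. The patterns are invented for this example and are not the table's original cell contents.

```yara
rule SUSP_Win_CommonMistakesFixed_Sketch
{
    meta:
        description = "Detects nothing real; shows corrected forms of common rule-writing mistakes"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // Bounded gap instead of an unbounded /cmd\.exe.*powershell/
        $chain = /cmd\.exe \/c .{0,40}powershell -enc/
        // Long, specific literal instead of a 2-3 byte string
        $marker = "ExFam::cfg::v2" ascii
        // Escaped braces, as YARA-X requires
        $tmpl = /\{\{payload_[0-9]{1,4}\}\}/

    condition:
        // File-type filter ahead of the string checks
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        2 of them
}
```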
Performance Optimization
Quick wins: Put `filesize` first, avoid `nocase`, use bounded regex such as `{1,100}`, prefer hex over regex.
Red flags: Strings <4 bytes, unbounded regex (`.*`), modules without file-type filter.
See performance.md for atom theory and optimization details.
Reference Documents
| Topic | Document |
|---|---|
| Naming and metadata conventions | style-guide.md |
| Performance and atom optimization | performance.md |
| String types and judgment | strings.md |
| Testing and validation | testing.md |
| Chrome extension module (crx) | crx-module.md |
| Android DEX module (dex) | dex-module.md |
Workflows
| Topic | Document |
|---|---|
| Complete rule development process | rule-development.md |
Example Rules
The `examples/` directory contains real, attributed rules demonstrating best practices:
| Example | Demonstrates | Source |
|---|---|---|
| MAL_Win_Remcos_Jan25.yar | PE malware: graduated string counts, multiple rules per family | Elastic Security |
| MAL_Mac_ProtonRAT_Jan25.yar | macOS: Mach-O magic bytes, multi-category grouping | Airbnb BinaryAlert |
| MAL_NPM_SupplyChain_Jan25.yar | npm supply chain: real attack patterns, ERC-20 selectors | Stairwell Research |
| SUSP_JS_Obfuscation_Jan25.yar | JavaScript: obfuscator detection, density-based matching | imp0rtp3, Nils Kuhnert |
| SUSP_CRX_SuspiciousPermissions.yar | Chrome extensions: crx module, permissions | Educational |
Scripts
```bash
uv run {baseDir}/scripts/yara_lint.py rule.yar       # Validate style/metadata
uv run {baseDir}/scripts/atom_analyzer.py rule.yar   # Check string quality
```
See README.md for detailed script documentation.
Quality Checklist
Before deploying any rule:
- Name follows `{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}` format
- Description starts with "Detects" and explains what/how
- All required metadata present (author, reference, date)
- Strings are unique (not API names, common paths, or format strings)
- All strings have 4+ bytes with good atom potential
- Base64 modifier only on strings with 3+ characters
- Regex patterns have escaped `{` and valid escape sequences
- Condition starts with cheap checks (filesize, magic bytes)
- Rule matches all target samples
- Rule produces zero matches on goodware corpus
- `yr check` passes with no errors
- `yr fmt --check` passes (consistent formatting)
- Linter passes with no errors
- Peer review completed
Resources
Quality YARA Rule Repositories
Learn from production rules. These repositories contain well-tested, properly attributed rules:
| Repository | Focus | Maintainer |
|---|---|---|
| Neo23x0/signature-base | 17,000+ production rules, multi-platform | Florian Roth |
| Elastic/protections-artifacts | 1,000+ endpoint-tested rules | Elastic Security |
| reversinglabs/reversinglabs-yara-rules | Threat research rules | ReversingLabs |
| imp0rtp3/js-yara-rules | JavaScript/browser malware | imp0rtp3 |
| InQuest/awesome-yara | Curated index of resources | InQuest |
Style & Performance Guides
| Guide | Purpose |
|---|---|
| YARA Style Guide | Naming conventions, metadata, string prefixes |
| YARA Performance Guidelines | Atom optimization, regex bounds |
| Kaspersky Applied YARA Training | Expert techniques from production use |
Tools
macOS-Specific Resources
| Resource | Purpose |
|---|---|
| Apple XProtect | Production macOS rules at |
| objective-see | macOS malware research and samples |
| macOS Security Tools | Reference list |
Multi-Indicator Clustering Pattern
Production rules often group indicators by type:
```yara
strings:
    // Category A: Library indicators
    $a1 = "SRWebSocket" ascii
    $a2 = "SocketRocket" ascii
    // Category B: Behavioral indicators
    $b1 = "SSH tunnel" ascii
    $b2 = "keylogger" ascii nocase
    // Category C: C2 patterns
    $c1 = /https:\/\/[a-z0-9]{8,16}\.onion/
condition:
    filesize < 10MB and
    any of ($a*) and any of ($b*)  // Require evidence from BOTH categories
```
Why this works: Different indicator types have different confidence levels. A single C2 domain might be definitive, while you need multiple library imports to be confident. Grouping by `$a*`, `$b*`, `$c*` lets you express graduated requirements.
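One way to express such graduated requirements is sketched below: a C2 match is accepted alone, while library indicators count only when all are present and a behavioral string accompanies them. The C2 pattern, the rule name, and the exact thresholds are assumptions for illustration.

```yara
rule MAL_Mac_ExampleRAT_Sketch
{
    meta:
        description = "Detects ExampleRAT via a C2 pattern, or via combined library and behavioral indicators (illustrative sketch)"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // Category A: library indicators (weak alone)
        $a1 = "SRWebSocket" ascii
        $a2 = "SocketRocket" ascii
        // Category B: behavioral indicators
        $b1 = "SSH tunnel not launched" ascii
        $b2 = "Keylogger" ascii
        // Category C: hypothetical C2 pattern (strong alone)
        $c1 = /https:\/\/[a-z0-9]{8,16}\.onion\/gate\.php/

    condition:
        filesize < 10MB and
        (uint32(0) == 0xFEEDFACF or uint32(0) == 0xCAFEBABE or uint32(0) == 0xBEBAFECA) and
        (
            any of ($c*) or
            (all of ($a*) and any of ($b*))
        )
}
```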