yara-rule-authoring


YARA-X Rule Authoring


Write detection rules that catch malware without drowning in false positives.
This skill targets YARA-X, the Rust-based successor to legacy YARA. YARA-X powers VirusTotal's production systems and is the recommended implementation. See Migrating from Legacy YARA if you have existing rules.

Core Principles

  1. Strings must generate good atoms: YARA extracts 4-byte subsequences (atoms) for fast matching. Strings with repeated bytes, common sequences, or fewer than 4 bytes force slow bytecode verification on too many files.
  2. Target specific families, not categories: "Detects ransomware" catches everything and nothing. "Detects LockBit 3.0 configuration extraction routine" catches what you want.
  3. Test against goodware before deployment: A rule that fires on Windows system files is useless. Validate against VirusTotal's goodware corpus or your own clean file set.
  4. Short-circuit with cheap checks first: Put `filesize < 10MB and uint16(0) == 0x5A4D` before expensive string searches or module calls.
  5. Metadata is documentation: Future you (and your team) need to know what this catches, why, and where the sample came from.
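A minimal sketch of how these principles might combine in one rule. The family name, mutex, and C2 path are hypothetical placeholders, not real indicators; substitute strings confirmed in your own samples.

```yara
rule MAL_Win_Example_Loader_Jan25
{
    meta:
        description = "Detects Example loader via its mutex name and C2 URI path"
        author = "Your Name <email@example.com>"
        reference = "https://example.com/analysis"
        date = "2025-01-29"

    strings:
        // Hypothetical indicators: both are well over 4 bytes with no repeated-byte runs
        $mutex = "Global\\ExampleLoaderMtx_7f3a" ascii
        $c2_path = "/gate/example_checkin.php" ascii

    condition:
        // Cheapest checks first so evaluation can stop early
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        all of them
}
```

The rule names one family, requires both strings, and carries the four metadata fields, so a reviewer can tell what it targets and where the sample came from.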

When to Use


  • Writing new YARA-X rules for malware detection
  • Reviewing existing rules for quality or performance issues
  • Optimizing slow-running rulesets
  • Converting IOCs or threat intel into detection signatures
  • Debugging false positive issues
  • Preparing rules for production deployment
  • Migrating legacy YARA rules to YARA-X
  • Analyzing Chrome extensions (crx module)
  • Analyzing Android apps (dex module)

When NOT to Use


  • Static analysis requiring disassembly → use Ghidra/IDA skills
  • Dynamic malware analysis → use sandbox analysis skills
  • Network-based detection → use Suricata/Snort skills
  • Memory forensics with Volatility → use memory forensics skills
  • Simple hash-based detection → just use hash lists

YARA-X Overview

YARA-X is the Rust-based successor to legacy YARA: 5-10x faster regex, better errors, built-in formatter, stricter validation, new modules (crx, dex), 99% rule compatibility.
Install: `brew install yara-x` (macOS) or `cargo install yara-x`
Essential commands: `yr scan`, `yr check`, `yr fmt`, `yr dump`

Platform Considerations

YARA works on any file type. Adapt patterns to your target:

| Platform | Magic Bytes | Bad Strings | Good Strings |
|----------|-------------|-------------|--------------|
| Windows PE | `uint16(0) == 0x5A4D` | API names, Windows paths | Mutex names, PDB paths |
| macOS Mach-O | `uint32(0) == 0xFEEDFACE` (32-bit), `0xFEEDFACF` (64-bit), `0xCAFEBABE` or `0xBEBAFECA` (universal) | Common Obj-C methods | Keylogger strings, persistence paths |
| JavaScript/Node | (none needed) | `require`, `fetch`, `axios` | Obfuscator signatures, eval+decode chains |
| npm/pip packages | (none needed) | `postinstall`, `dependencies` | Suspicious package names, exfil URLs |
| Office docs | `uint32be(0) == 0x504B0304` | VBA keywords | Macro auto-exec, encoded payloads |
| VS Code extensions | (none needed) | `vscode.workspace` | Uncommon activationEvents, hidden file access |
| Chrome extensions | Use `crx` module | Common Chrome APIs | Permission abuse, manifest anomalies |
| Android apps | Use `dex` module | Standard DEX structure | Obfuscated classes, suspicious permissions |

Note that `uint32()` reads little-endian, so the ZIP signature bytes `50 4B 03 04` need `uint32be(0)` (or `uint32(0) == 0x04034B50`).

macOS Malware Detection

Use magic byte checks + string patterns (YARA-X also ships a `macho` module for cases that need parsed header fields):
Magic bytes:
```yara
// Mach-O 32-bit
uint32(0) == 0xFEEDFACE
// Mach-O 64-bit
uint32(0) == 0xFEEDFACF
// Universal binary (fat binary)
uint32(0) == 0xCAFEBABE or uint32(0) == 0xBEBAFECA
```
Good indicators for macOS malware:
  • Keylogger artifacts: `CGEventTapCreate`, `kCGEventKeyDown`
  • SSH tunnel strings: `ssh -D`, `tunnel`, `socks`
  • Persistence paths: `~/Library/LaunchAgents`, `/Library/LaunchDaemons`
  • Credential theft: `security find-generic-password`, `keychain`
Example pattern adapted from Airbnb BinaryAlert:
```yara
rule SUSP_Mac_ProtonRAT
{
    strings:
        // Library indicators
        $lib1 = "SRWebSocket" ascii
        $lib2 = "SocketRocket" ascii

        // Behavioral indicators
        $behav1 = "SSH tunnel not launched" ascii
        $behav2 = "Keylogger" ascii

    condition:
        // uint32() reads little-endian, so on-disk CA FE BA BE (universal) reads as 0xBEBAFECA
        (uint32(0) == 0xFEEDFACF or uint32(0) == 0xCAFEBABE or uint32(0) == 0xBEBAFECA) and
        any of ($lib*) and any of ($behav*)
}
```

JavaScript Detection Decision Tree


Writing a JavaScript rule?
├─ npm package?
│  ├─ Check package.json patterns
│  ├─ Look for postinstall/preinstall hooks
│  └─ Target exfil patterns: fetch + env access + credential paths
├─ Browser extension?
│  ├─ Chrome: Use crx module
│  └─ Others: Target manifest patterns, background script behaviors
├─ Standalone JS file?
│  ├─ Look for obfuscation markers: eval+atob, fromCharCode chains
│  ├─ Target unique function/variable names (often survive minification)
│  └─ Check for packed/encoded payloads
└─ Minified/webpack bundle?
   ├─ Target unique strings that survive bundling (URLs, magic values)
   └─ Avoid function names (will be mangled)
JavaScript-specific good strings:
  • Ethereum function selectors: `{ 70 a0 82 31 }` (balanceOf), `{ a9 05 9c bb }` (transfer)
  • Zero-width characters (steganography): `{ E2 80 8B E2 80 8C }`
  • Obfuscator signatures: `_0x`, `var _0x`
  • Specific C2 patterns: domain names, webhook URLs
JavaScript-specific bad strings:
  • `require`, `fetch`, `axios`: too common
  • `Buffer`, `crypto`: legitimate uses everywhere
  • `process.env` alone: need specific env var names

Essential Toolkit

| Tool | Purpose |
|------|---------|
| yarGen | Extract candidate strings: `yarGen.py -m samples/ --excludegood` → validate with `yr check` |
| FLOSS | Extract obfuscated/stack strings: `floss sample.exe` (when yarGen fails) |
| yr CLI | Validate: `yr check`, scan: `yr scan -s`, inspect: `yr dump -m pe` |
| signature-base | Study quality examples |
| YARA-CI | Goodware corpus testing before deployment |

Master these five. Don't get distracted by tool catalogs.

Rationalizations to Reject

When you catch yourself thinking these, stop and reconsider.

| Rationalization | Expert Response |
|-----------------|-----------------|
| "This generic string is unique enough" | Test against goodware first. Your intuition is wrong. |
| "yarGen gave me these strings" | yarGen suggests, you validate. Check each one manually. |
| "It works on my 10 samples" | 10 samples ≠ production. Use VirusTotal goodware corpus. |
| "One rule to catch all variants" | Causes FP floods. Target specific families. |
| "I'll make it more specific if we get FPs" | Write tight rules upfront. FPs burn trust. |
| "This hex pattern is unique" | Unique in one sample ≠ unique across malware ecosystem. |
| "Performance doesn't matter" | One slow rule slows entire ruleset. Optimize atoms. |
| "PEiD rules still work" | Obsolete. 32-bit packers aren't relevant. |
| "I'll add more conditions later" | Weak rules deployed = damage done. |
| "This is just for hunting" | Hunting rules become detection rules. Same quality bar. |
| "The API name makes it malicious" | Legitimate software uses same APIs. Need behavioral context. |
| "any of them is fine for these common strings" | Common strings + any = FP flood. Use `any of` only for individually unique strings. |
| "This regex is specific enough" | `/fetch.*token/` matches all auth code. Add exfil destination requirement. |
| "The JavaScript looks clean" | Attackers poison legitimate code with injects. Check for eval+decode chains. |
| "I'll use .* for flexibility" | Unbounded regex = performance disaster + memory explosion. Use `.{0,30}`. |
| "I'll use --relaxed-re-syntax everywhere" | Masks real bugs. Fix the regex instead of hiding problems. |

Decision Trees


Is This String Good Enough?


Is this string good enough?
├─ Less than 4 bytes?
│  └─ NO — find longer string
├─ Contains repeated bytes (0000, 9090)?
│  └─ NO — add surrounding context
├─ Is an API name (VirtualAlloc, CreateRemoteThread)?
│  └─ NO — use hex pattern of call site instead
├─ Appears in Windows system files?
│  └─ NO — too generic, find something unique
├─ Is it a common path (C:\Windows\, cmd.exe)?
│  └─ NO — find malware-specific paths
├─ Unique to this malware family?
│  └─ YES — use it
└─ Appears in other malware too?
   └─ MAYBE — combine with family-specific marker

When to Use "all of" vs "any of"


Should I require all strings or allow any?
├─ Strings are individually unique to malware?
│  └─ any of them (each alone is suspicious)
├─ Strings are common but combination is suspicious?
│  └─ all of them (require the full pattern)
├─ Strings have different confidence levels?
│  └─ Group: all of ($core_*) and any of ($variant_*)
└─ Seeing many false positives?
   └─ Tighten: switch any → all, add more required strings
Lesson from production: Rules using `any of ($network_*)` where strings included "fetch", "axios", "http" matched virtually all web applications. Switching to require credential path AND network call AND exfil destination eliminated FPs.
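The grouping described in that lesson could be sketched as follows. The specific strings are illustrative assumptions, not taken from the production ruleset, and metadata is omitted for brevity.

```yara
rule SUSP_JS_Example_CredentialExfil
{
    strings:
        // Group 1: credential paths (hypothetical picks)
        $cred_npmrc = ".npmrc" ascii
        $cred_ssh = ".ssh/id_rsa" ascii

        // Group 2: network calls, far too common to match on alone
        $network_fetch = "fetch(" ascii
        $network_axios = "axios.post(" ascii

        // Group 3: exfil destinations
        $exfil_discord = "discord.com/api/webhooks/" ascii
        $exfil_telegram = "api.telegram.org/bot" ascii

    condition:
        filesize < 5MB and
        any of ($cred_*) and
        any of ($network_*) and
        any of ($exfil_*)
}
```

Each group alone would match large amounts of legitimate code; requiring one hit from every group is what narrows the rule.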

When to Abandon a Rule Approach


Stop and pivot when:
  • yarGen returns only API names and paths → See When Strings Fail, Pivot to Structure
  • Can't find 3 unique strings → Probably packed. Target the unpacked version or detect the packer.
  • Rule matches goodware files → Strings aren't unique enough. 1-2 matches = investigate and tighten; 3-5 matches = find different indicators; 6+ matches = start over.
  • Performance is terrible even after optimization → Architecture problem. Split into multiple focused rules or add strict pre-filters.
  • Description is hard to write → The rule is too vague. If you can't explain what it catches, it catches too much.

Debugging False Positives


FP Investigation Flow:
├─ 1. Which string matched?
│     Run: yr scan -s rule.yar false_positive.exe
├─ 2. Is it in a legitimate library?
│     └─ Add: not $fp_vendor_string exclusion
├─ 3. Is it a common development pattern?
│     └─ Find more specific indicator, replace the string
├─ 4. Are multiple generic strings matching together?
│     └─ Tighten to require all + add unique marker
└─ 5. Is the malware using common techniques?
      └─ Target malware-specific implementation details, not the technique
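Step 2 of the flow (excluding a legitimate library) might look like the sketch below. Both strings are hypothetical; the exclusion string should be something observed only in the file that triggered the false positive.

```yara
rule MAL_Win_Example_Stealer_Jan25
{
    strings:
        $cfg_marker = "example_stealer_cfg_v2" ascii
        // Seen only in the legitimate file that caused the FP (hypothetical value)
        $fp_vendor_string = "Example Vendor Update Service" ascii

    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        $cfg_marker and
        not $fp_vendor_string
}
```

An exclusion like this treats one symptom. If several unrelated vendors trigger the rule, the indicator itself is probably too generic and steps 3 to 5 apply instead.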

Hex vs Text vs Regex


What string type should I use?
├─ Exact ASCII/Unicode text?
│  └─ TEXT: $s = "MutexName" ascii wide
├─ Specific byte sequence?
│  └─ HEX: $h = { 4D 5A 90 00 }
├─ Byte sequence with variation?
│  └─ HEX with wildcards: { 4D 5A ?? ?? 50 45 }
├─ Pattern with structure (URLs, paths)?
│  └─ BOUNDED REGEX: /https:\/\/[a-z]{5,20}\.onion/
└─ Unknown encoding (XOR, base64)?
   └─ TEXT with modifier: $s = "config" xor(0x00-0xFF)

Is the Sample Packed? (Check First)


Before writing any string-based rule:
Is the sample packed?
├─ Entropy > 7.0?
│  └─ Likely packed — find unpacked layer first
├─ Few/no readable strings?
│  └─ Likely packed — use entropy, PE structure, or packer signatures
├─ UPX/MPRESS/custom packer detected?
│  └─ Target the unpacked payload OR detect the packer itself
└─ Readable strings available?
   └─ Proceed with string-based detection
Expert guidance: Don't write rules against packed layers. The packing changes; the payload doesn't.

When Strings Fail, Pivot to Structure


If yarGen returns only API names and generic paths:
String extraction failed — what now?
├─ High entropy sections?
│  └─ Use math.entropy() on specific sections
├─ Unusual imports pattern?
│  └─ Use pe.imphash() for import hash clustering
├─ Consistent PE structure anomalies?
│  └─ Target section names, sizes, characteristics
├─ Metadata present?
│  └─ Target version info, timestamps, resources
└─ Nothing unique?
   └─ This sample may not be detectable with YARA alone
Expert guidance: "One can try to use other file properties, such as metadata, entropy, import hashes or other data which stays constant." — Kaspersky Applied YARA Training
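A structural pivot along these lines could be sketched as below. The import hash and the section name `.exmpl` are placeholders, and the 7.2 entropy threshold is an assumption to tune against your own samples.

```yara
import "pe"
import "math"

rule SUSP_Win_Example_StructuralPivot
{
    condition:
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        (
            // Placeholder import hash: replace with the value computed from your samples
            pe.imphash() == "0123456789abcdef0123456789abcdef" or
            for any section in pe.sections : (
                section.name == ".exmpl" and
                math.entropy(section.raw_data_offset, section.raw_data_size) > 7.2
            )
        )
}
```

Import hashes cluster builds that share an import table, while per-section entropy catches an encrypted payload section even when no readable strings survive.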

Expert Heuristics

String selection: Mutex names are gold; C2 paths silver; error messages bronze. Stack strings are almost always unique. If you need >6 strings, you're over-fitting.
Condition design: Start with `filesize <`, then magic bytes, then strings, then modules. If >5 lines, split into multiple rules.
Quality signals: yarGen output needs 80% filtering. Rules matching <50% of variants are too narrow; rules matching goodware are too broad.
Modifier discipline:
  • Never use `nocase` or `wide` speculatively; use them only when you have confirmed evidence the case/encoding varies in samples
  • `nocase` doubles atom generation; `wide` doubles string matching. Both have real costs
  • "If you don't have a clear reason for using those modifiers, don't do it" (Kaspersky Applied YARA)
Regex anchoring:
  • Regex without a 4+ byte literal substring evaluates at every file offset, which is catastrophic for performance
  • Always anchor regex to a distinctive literal: `/mshta\.exe http:\/\/.../` not `/http:\/\/.../`
  • If you can't anchor, consider hex pattern with wildcards instead
Loop discipline:
  • Always bound loops with filesize: `filesize < 100KB and for all i in (1..#a) : ...`
  • Unbounded `#a` can be thousands in large files, causing severe slowdown
YARA-X tips: `$_unused` to suppress warnings; `$s = "..." private` to hide from output; `yr check` + `yr fmt` before every commit.
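As an illustration of the regex-anchoring and loop-discipline points, here is a sketch under assumed indicators. The URL shape, the `example_cfg=` marker, and the brace check are all hypothetical; metadata is omitted for brevity.

```yara
rule SUSP_Win_Example_MshtaDownloader
{
    strings:
        // Anchored on the literal "mshta.exe http://" and bounded on every quantifier
        $re_mshta = /mshta\.exe http:\/\/[a-z0-9.-]{4,60}\/[a-z0-9_]{1,30}\.hta/
        // Plain ascii: no nocase or wide without evidence from samples
        $marker = "example_cfg=" ascii

    condition:
        // The filesize bound keeps #marker small before the loop runs
        filesize < 100KB and
        $re_mshta and
        #marker > 0 and
        for all i in (1..#marker) : (
            // "example_cfg=" is 12 bytes, so offset +12 is the byte right after it
            uint8(@marker[i] + 12) == 0x7B
        )
}
```

The literal prefix gives the scanner a good atom to search for, and the size limit caps how many loop iterations a single file can trigger.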

When to Use Modules vs. Byte Checks


Should I use a module or raw bytes?
├─ Need imphash/rich header/authenticode?
│  └─ Use PE module — too complex to replicate
├─ Just checking magic bytes or simple offsets?
│  └─ Use uint16/uint32 — faster, no module overhead
├─ Checking section names/sizes?
│  └─ PE module is cleaner, but add magic bytes filter FIRST
├─ Checking Chrome extension permissions?
│  └─ Use crx module — string parsing is fragile
└─ Checking LNK target paths?
   └─ Use lnk module — LNK format is complex
Expert guidance: "Avoid the magic module — use explicit hex checks instead" — Neo23x0. Apply this principle: if you can do it with uint32(), don't load a module.
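One way to combine the two approaches, shown as a sketch: raw byte checks confirm the file is a PE, and the module is consulted only for a field that is awkward to parse by hand. The section name `.exmpl` is a placeholder.

```yara
import "pe"

rule SUSP_Win_Example_SectionName
{
    condition:
        // Raw byte checks: MZ header, then "PE\0\0" at the offset stored at 0x3C
        uint16(0) == 0x5A4D and
        uint32(uint32(0x3C)) == 0x00004550 and
        filesize < 10MB and
        // Module field last: section tables are tedious to walk manually
        for any section in pe.sections : (
            section.name == ".exmpl"
        )
}
```

If the rule needed only the first two checks, the `import "pe"` line could be dropped entirely.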

YARA-X New Features

Key additions from recent releases:
  • Private patterns (v1.3.0+): `$helper = "pattern" private` matches but is hidden from output
  • Warning suppression (v1.4.0+): `// suppress: slow_pattern` inline comments
  • Numeric underscores (v1.5.0+): `filesize < 10_000_000` for readability
  • Built-in formatter: `yr fmt rules/` to standardize formatting
  • NDJSON output: `yr scan --output-format ndjson` for tooling
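A short sketch that exercises several of these features together, assuming a YARA-X release new enough to support all of them. The pattern contents are placeholders.

```yara
rule SUSP_Example_YaraXSyntax
{
    strings:
        // Private pattern: used by the condition but not reported in scan output
        $helper = "example_helper_marker" private
        $visible = "example_visible_marker"
        // Leading underscore: may stay unreferenced without an unused-pattern warning
        $_unused = "example_note_for_later"

    condition:
        // Numeric underscores are purely cosmetic
        filesize < 10_000_000 and
        $helper and
        $visible
}
```

Running `yr check` on a file like this is a quick way to confirm which of these features your installed version accepts.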

YARA-X Tooling Workflow

YARA-X provides diagnostic tools legacy YARA lacks:
Rule development cycle:
```bash
# 1. Write initial rule

# 2. Check syntax with detailed errors
yr check rule.yar

# 3. Format consistently
yr fmt -w rule.yar

# 4. Dump module output to inspect file structure (no dummy rule needed)
yr dump -m pe sample.exe --output-format yaml

# 5. Scan with timing info
time yr scan -s rule.yar corpus/
```
When to use `yr dump`:
  • Investigating what PE/ELF/Mach-O fields are available
  • Debugging why module conditions aren't matching
  • Exploring new modules (crx, lnk, dotnet) before writing rules
YARA-X diagnostic advantage: Error messages include precise source locations. If `yr check` points to line 15, the issue is actually on line 15 (unlike legacy YARA).

Chrome Extension Analysis (crx module)

The `crx` module enables detection of malicious Chrome extensions. Requires YARA-X v1.5.0+ (basic), v1.11.0+ for `permhash()`.
Key APIs: `crx.is_crx`, `crx.permissions`, `crx.permhash()`
Red flags: `nativeMessaging` + `downloads`, `debugger` permission, content scripts on `<all_urls>`
```yara
import "crx"

rule SUSP_CRX_HighRiskPerms {
    condition:
        crx.is_crx and
        for any perm in crx.permissions : (perm == "debugger")
}
```
See crx-module.md for complete API reference, permission risk assessment, and example rules.

Android DEX Analysis (dex module)

The `dex` module enables detection of Android malware. Requires YARA-X v1.11.0+. Not compatible with legacy YARA's dex module; the API is completely different.
Key APIs: `dex.is_dex`, `dex.contains_class()`, `dex.contains_method()`, `dex.contains_string()`
Red flags: Single-letter class names (obfuscation), `DexClassLoader` reflection, encrypted assets
```yara
import "dex"

rule SUSP_DEX_DynamicLoading {
    condition:
        dex.is_dex and
        dex.contains_class("Ldalvik/system/DexClassLoader;")
}
```
See dex-module.md for complete API reference, obfuscation detection, and example rules.

Migrating from Legacy YARA

YARA-X has 99% rule compatibility, but enforces stricter validation.
Quick migration:
```bash
yr check --relaxed-re-syntax rules/  # Identify issues
# Fix each issue, then:
yr check rules/  # Verify without relaxed mode
```
Common fixes:

| Issue | Legacy | YARA-X Fix |
|-------|--------|------------|
| Literal `{` in regex | `/{/` | `/\{/` |
| Invalid escapes | `\R` silently literal | `\\R` or `R` |
| Base64 strings | Any length | 3+ chars required |
| Negative indexing | `@a[-1]` | `@a[#a - 1]` |
| Duplicate modifiers | Allowed | Remove duplicates |

Note: Use `--relaxed-re-syntax` only as a diagnostic tool. Fix issues rather than relying on relaxed mode.

Quick Reference


Naming Convention

`{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}`
Common prefixes: `MAL_` (malware), `HKTL_` (hacking tool), `WEBSHELL_`, `EXPL_`, `SUSP_` (suspicious), `GEN_` (generic)
Platforms: `Win_`, `Lnx_`, `Mac_`, `Android_`, `CRX_`
Example: `MAL_Win_Emotet_Loader_Jan25`
See style-guide.md for full conventions, metadata requirements, and naming examples.

Required Metadata

Every rule needs: `description` (starts with "Detects"), `author`, `reference`, `date`.
```yara
meta:
    description = "Detects Example malware via unique mutex and C2 path"
    author = "Your Name <email@example.com>"
    reference = "https://example.com/analysis"
    date = "2025-01-29"
```

String Selection

Good: Mutex names, PDB paths, C2 paths, stack strings, configuration markers
Bad: API names, common executables, format specifiers, generic paths
See strings.md for the full decision tree and examples.

Condition Patterns

Order conditions for short-circuit:
  1. `filesize < 10MB` (instant)
  2. `uint16(0) == 0x5A4D` (nearly instant)
  3. String matches (cheap)
  4. Module checks (expensive)
See performance.md for detailed optimization patterns.

Workflow

  1. Gather samples: Multiple samples; single-sample rules are brittle
  2. Extract candidates: `yarGen.py -m samples/ --excludegood`
  3. Validate quality: Use decision tree; yarGen needs 80% filtering
  4. Write initial rule: Follow template with proper metadata
  5. Lint and test: `yr check`, `yr fmt`, linter script
  6. Goodware validation: VirusTotal corpus or local clean files
  7. Deploy: Add to repo with full metadata, monitor for FPs
See testing.md for detailed validation workflow and FP investigation.
For a comprehensive step-by-step guide covering all phases from sample collection to deployment, see rule-development.md.

Common Mistakes

| Mistake | Bad | Good |
|---------|-----|------|
| API names as indicators | `"VirtualAlloc"` | Hex pattern of call site + unique mutex |
| Unbounded regex | `/https?:\/\/.*/` | `/https?:\/\/[a-z0-9]{8,12}\.onion/` |
| Missing file type filter | `pe.imports(...)` first | `uint16(0) == 0x5A4D and filesize < 10MB` first |
| Short strings | `"abc"` (3 bytes) | `"abcdef"` (4+ bytes) |
| Unescaped braces (YARA-X) | `/config{key}/` | `/config\{key\}/` |

Performance Optimization

Quick wins: Put `filesize` first, avoid `nocase`, bound regex quantifiers (`{1,100}`), prefer hex over regex.
Red flags: Strings <4 bytes, unbounded regex (`.*`), modules without file-type filter.
See performance.md for atom theory and optimization details.

Reference Documents

| Topic | Document |
|-------|----------|
| Naming and metadata conventions | style-guide.md |
| Performance and atom optimization | performance.md |
| String types and judgment | strings.md |
| Testing and validation | testing.md |
| Chrome extension module (crx) | crx-module.md |
| Android DEX module (dex) | dex-module.md |

Workflows

| Topic | Document |
|-------|----------|
| Complete rule development process | rule-development.md |

Example Rules

The `examples/` directory contains real, attributed rules demonstrating best practices:

| Example | Demonstrates | Source |
|---------|--------------|--------|
| MAL_Win_Remcos_Jan25.yar | PE malware: graduated string counts, multiple rules per family | Elastic Security |
| MAL_Mac_ProtonRAT_Jan25.yar | macOS: Mach-O magic bytes, multi-category grouping | Airbnb BinaryAlert |
| MAL_NPM_SupplyChain_Jan25.yar | npm supply chain: real attack patterns, ERC-20 selectors | Stairwell Research |
| SUSP_JS_Obfuscation_Jan25.yar | JavaScript: obfuscator detection, density-based matching | imp0rtp3, Nils Kuhnert |
| SUSP_CRX_SuspiciousPermissions.yar | Chrome extensions: crx module, permissions | Educational |

Scripts

```bash
uv run {baseDir}/scripts/yara_lint.py rule.yar      # Validate style/metadata
uv run {baseDir}/scripts/atom_analyzer.py rule.yar  # Check string quality
```
See README.md for detailed script documentation.

Quality Checklist

质量检查清单

Before deploying any rule:
  • Name follows {CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE} format
  • Description starts with "Detects" and explains what/how
  • All required metadata present (author, reference, date)
  • Strings are unique (not API names, common paths, or format strings)
  • All strings have 4+ bytes with good atom potential
  • Base64 modifier only on strings with 3+ characters
  • Regex patterns have escaped { and valid escape sequences
  • Condition starts with cheap checks (filesize, magic bytes)
  • Rule matches all target samples
  • Rule produces zero matches on goodware corpus
  • yr check passes with no errors
  • yr fmt --check passes (consistent formatting)
  • Linter passes with no errors
  • Peer review completed
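
The structural items on this checklist can be sketched as a rule skeleton. This is a minimal illustration, not a production rule: the family name, author, reference URL, and every string value below are hypothetical placeholders. It covers only what can be seen in the rule text; matching target samples, goodware testing, and peer review still have to be done separately.

```yara
rule MAL_Win_ExampleFamily_Loader_Jan25
{
    meta:
        // Description starts with "Detects" and says what is matched and how
        description = "Detects ExampleFamily loader via its config marker and mutex name"
        // Placeholder values: replace with the real author, source, and date
        author = "Your Name"
        reference = "https://example.com/analysis"
        date = "2025-01-15"

    strings:
        // Hypothetical family-specific strings, each well over 4 bytes
        $cfg = "EXFAM_CFG_v2::" ascii
        $mtx = "Global\\ExFamLoaderMtx" ascii wide
        // Literal braces in the regex are escaped; {32} is a repetition count
        $key = /cfg=\{[0-9a-f]{32}\}/

    condition:
        // Cheap checks first: size bound and PE magic bytes
        filesize < 10MB and
        uint16(0) == 0x5A4D and
        2 of them
}
```

The name follows the {CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE} pattern, and `2 of them` references all three strings so that none is left unused.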

Resources

Quality YARA Rule Repositories

Learn from production rules. These repositories contain well-tested, properly attributed rules:

| Repository | Focus | Maintainer |
| --- | --- | --- |
| Neo23x0/signature-base | 17,000+ production rules, multi-platform | Florian Roth |
| Elastic/protections-artifacts | 1,000+ endpoint-tested rules | Elastic Security |
| reversinglabs/reversinglabs-yara-rules | Threat research rules | ReversingLabs |
| imp0rtp3/js-yara-rules | JavaScript/browser malware | imp0rtp3 |
| InQuest/awesome-yara | Curated index of resources | InQuest |

Style & Performance Guides

| Guide | Purpose |
| --- | --- |
| YARA Style Guide | Naming conventions, metadata, string prefixes |
| YARA Performance Guidelines | Atom optimization, regex bounds |
| Kaspersky Applied YARA Training | Expert techniques from production use |

Tools

| Tool | Purpose |
| --- | --- |
| yarGen | Extract candidate strings from samples |
| FLOSS | Extract obfuscated and stack strings |
| YARA-CI | Automated goodware testing |
| YaraDbg | Web-based rule debugger |

macOS-Specific Resources

| Resource | Purpose |
| --- | --- |
| Apple XProtect | Production macOS rules at /System/Library/CoreServices/XProtect.bundle/ |
| objective-see | macOS malware research and samples |
| macOS Security Tools | Reference list |

Multi-Indicator Clustering Pattern

Production rules often group indicators by type:
yara
strings:
    // Category A: Library indicators
    $a1 = "SRWebSocket" ascii
    $a2 = "SocketRocket" ascii

    // Category B: Behavioral indicators
    $b1 = "SSH tunnel" ascii
    $b2 = "keylogger" ascii nocase

    // Category C: C2 patterns
    $c1 = /https:\/\/[a-z0-9]{8,16}\.onion/

condition:
    filesize < 10MB and
    any of ($a*) and any of ($b*)  // Require evidence from BOTH categories
Why this works: Different indicator types have different confidence levels. A single C2 domain might be definitive, while you need multiple library imports to be confident. Grouping by $a*, $b*, and $c* lets you express graduated requirements.
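
One way the graduated requirements might look in a complete rule is sketched below. The rule name, metadata values, and thresholds are illustrative assumptions, not taken from any of the referenced rule sets. The sketch treats the C2 pattern as sufficient on its own and otherwise requires both library strings plus at least one behavioral string.

```yara
rule MAL_Mac_ExampleRAT_Clustered_Jan25
{
    meta:
        // All metadata values are placeholders for illustration
        description = "Detects a hypothetical macOS RAT by combining library, behavioral, and C2 indicators"
        author = "Your Name"
        reference = "https://example.com/analysis"
        date = "2025-01-15"

    strings:
        // Category A: library indicators (low confidence individually)
        $a1 = "SRWebSocket" ascii
        $a2 = "SocketRocket" ascii

        // Category B: behavioral indicators (medium confidence)
        $b1 = "SSH tunnel" ascii
        $b2 = "keylogger" ascii nocase

        // Category C: C2 pattern (assumed high confidence here; a real rule
        // would normally pin this to a specific address seen in samples)
        $c1 = /https:\/\/[a-z0-9]{8,16}\.onion/

    condition:
        filesize < 10MB and
        (
            // High-confidence indicator: one match is enough
            $c1 or
            // Lower-confidence indicators: require corroboration across categories
            (all of ($a*) and any of ($b*))
        )
}
```

Referencing $c1 in the condition also matters for compilation: legacy YARA rejects a rule whose strings section defines a string that the condition never uses.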