YARA-X Rule Authoring
Write detection rules that catch malware without drowning in false positives. Based on Trail of Bits methodology.
Core Principles
- Strings must generate good atoms: YARA extracts 4-byte subsequences (atoms) from each string for fast matching. Strings with repeated bytes, common sequences, or fewer than 4 bytes force slow bytecode scans.
- Target specific families, not categories: "Detects ransomware" is useless. "Detects LockBit 3.0 config extraction routine" is useful.
- Test against goodware: validate against clean file sets before deployment.
- Short-circuit with cheap checks first: put `filesize < 10MB and uint16(0) == 0x5A4D` before expensive string searches.
- Metadata is documentation: future you needs to know what this catches and why.
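As a rough illustration of the atom and short-circuit principles, the sketch below pairs a string that offers a distinctive run of fixed bytes with cheap header checks placed ahead of it. The rule name, the byte pattern (an x86 `imul eax, eax, 0x01000193` followed by wildcards), and the size limit are assumptions chosen for illustration, not signatures taken from a real family.

```yara
rule Example_Atom_Quality_Sketch {
    meta:
        description = "Illustrative only: a string with a distinctive fixed-byte run"
    strings:
        // Weak candidates (left as comments, not compiled):
        //   { 00 00 00 00 ?? ?? 00 00 }   repeated bytes, no distinctive atom
        //   "MZ"                          shorter than 4 bytes
        // Stronger: six fixed bytes in a row before the first wildcard.
        $hash_step = { 69 C0 93 01 00 01 ?? ?? 89 45 }
    condition:
        uint16(0) == 0x5A4D and   // cheap: PE magic
        filesize < 10MB and       // cheap: size filter
        $hash_step                // string match checked after the cheap tests
}
```

Running `yr check` on a file containing this rule should confirm that it compiles. Whether the pattern is selective enough is a separate question that only a goodware scan can answer.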
YARA-X Basics
YARA-X is the Rust successor to legacy YARA: 5-10x faster, better errors, built-in formatter, stricter validation, new modules (crx, dex).
Install: `brew install yara-x` / `cargo install yara-x`
Commands: `yr scan`, `yr check`, `yr fmt`, `yr dump`
Rule Template
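```yara
import "pe"

rule FamilyNameVariantTechnique : tag1 tag2 {
    meta:
        author = "Your Name"
        date = "2026-02-14"
        description = "Detects [specific behavior] in [malware family]"
        reference = "https://..."
        tlp = "TLP:WHITE"
        hash = ""
        score = 75 // confidence, 0-100
    strings:
        // Strings unique to the sample
        $api1 = "VirtualAllocEx" ascii
        $api2 = "WriteProcessMemory" ascii
        $str1 = { 48 8B 05 ?? ?? ?? ?? 48 85 C0 } // hex with wildcards
        // Bounded repetition keeps the regex from scanning without limit
        $pdb = /[A-Z]:\\.{1,100}\\Release\\.{1,100}\.pdb/ nocase
    condition:
        uint16(0) == 0x5A4D and
        filesize < 5MB and
        // Parentheses matter: "and" binds tighter than "or"
        ((2 of ($api*) and $str1) or $pdb)
}
```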
Naming Convention
`FamilyVariantTechnique`, for example:
- `EmotetLoaderDocumentMacro`
- `CobaltStrikeBeaconx64`
- `GenericCryptominerXMRig`
String Selection
Good strings (unique, specific):
- Mutex names, PDB paths, C2 URLs
- Unique byte sequences from disassembly
- Custom encryption constants
- Uncommon API call sequences
Bad strings (too common, high FP):
- `http://`, `https://`, common API names alone
- Single common words, short strings (<4 bytes)
- Strings found in Windows system files
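A minimal sketch of what "good strings" can look like in practice follows. Every value in it (the mutex name, the PDB path, the C2 path, the rule name) is a hypothetical placeholder, so treat it as a shape to imitate rather than a working signature.

```yara
rule ExampleFamily_Loader_Strings_Sketch {
    meta:
        description = "Illustrative only: combines family-specific strings"
    strings:
        // All values below are hypothetical placeholders.
        $mutex = "Global\\ExampleFamily_7f3a9c" wide
        $pdb   = "\\ExampleFamily\\Release\\loader.pdb" ascii
        $c2    = "updates.example.invalid/gate.php" ascii
    condition:
        uint16(0) == 0x5A4D and
        filesize < 5MB and
        2 of them
}
```

Requiring two of the three strings keeps the rule from firing on a file that merely shares one of them, for example a report that quotes the C2 address.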
Condition Patterns
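```yara
// Ordered by cost (cheap → expensive)
condition:
    uint16(0) == 0x5A4D and      // magic bytes (instant)
    filesize < 10MB and          // size filter (instant)
    2 of ($unique*) and          // string matching (fast)
    pe.imports("kernel32.dll")   // module check (slower)
```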
Common magic bytes:
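| Platform | Check |
|---|---|
| PE (Windows) | `uint16(0) == 0x5A4D` |
| ELF (Linux) | `uint32(0) == 0x464C457F` |
| Mach-O 64-bit | `uint32(0) == 0xFEEDFACF` |
| PDF | `uint32be(0) == 0x25504446` |
| Office/ZIP | `uint32be(0) == 0x504B0304` |

`uint16` and `uint32` read little-endian values, so the PDF and ZIP signatures, which are written here in file byte order, use `uint32be`.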
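Byte order is the usual source of mistakes in magic checks: a constant copied from a hex dump is in file order, while `uint32` interprets the four bytes least-significant first. The sketch below shows both cases side by side. The rule names are placeholders.

```yara
rule Example_Is_PDF_Sketch {
    condition:
        // "%PDF" is stored as bytes 25 50 44 46, so read it big-endian
        uint32be(0) == 0x25504446
}

rule Example_Is_ZIP_Container_Sketch {
    condition:
        // "PK\x03\x04" is stored as bytes 50 4B 03 04
        uint32be(0) == 0x504B0304
}

rule Example_Is_ELF_Sketch {
    condition:
        // Bytes 7F 45 4C 46 read little-endian by uint32 give 0x464C457F
        uint32(0) == 0x464C457F
}
```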
Performance Rules
- Put `filesize` and magic byte checks first in the condition
- Never use unbounded regex like `/.*/`
- Avoid `for all` with complex conditions on large files
- Use `ascii` or `wide`, not both unless needed
- Prefer hex strings with specific bytes over wildcards, and wildcards over regex
- Use `at` for fixed offsets instead of scanning the entire file
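Several of these rules can be combined in one place. The following sketch assumes a hypothetical configuration blob that starts with an 8-byte header and contains a 32-character hex key. Neither the header bytes nor the key format comes from a real sample.

```yara
rule Example_Config_Blob_Sketch {
    meta:
        description = "Illustrative only: anchored header plus bounded regex"
    strings:
        // Hypothetical 8-byte header: "EXCFG" followed by version bytes
        $magic = { 45 58 43 46 47 01 00 02 }
        // Bounded quantifier instead of an open-ended .* or .+
        $key = /key=[0-9a-f]{32};/ ascii
    condition:
        filesize < 64KB and   // cheap size filter first
        $magic at 0 and       // fixed offset, no whole-file search needed
        $key
}
```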
Testing
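```bash
# Validate syntax
yr check rules/
# Scan a sample
yr scan rules/my_rule.yar suspiciousfile.exe
# Scan a directory
yr scan rules/ samples/ --threads 4
# Format rules consistently
yr fmt rules/my_rule.yar
```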
False Positive Reduction
- Add `filesize` constraints (malware has typical size ranges)
- Require multiple string matches (`2 of ($str*)`, not `any of`)
- Exclude known good paths/publishers via `not` conditions
- Score-based approach: assign confidence scores in metadata, triage by threshold
- Test against a goodware corpus before deployment
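The first four techniques might be combined as in the sketch below. The family strings, the benign-product marker, the size window, and the score are all hypothetical values picked for illustration.

```yara
rule ExampleFamily_Stealer_LowFP_Sketch {
    meta:
        description = "Illustrative only: size window, multiple matches, exclusion"
        score = 60 // hypothetical confidence used for triage
    strings:
        // Hypothetical family strings
        $str_log   = "exfil_batch_%04d.dat" ascii
        $str_agent = "ExampleStealer/2.1" ascii
        $str_pipe  = "\\\\.\\pipe\\exstl_ctl" ascii
        // Hypothetical marker of a benign product that shares some strings
        $fp_vendor = "Example Backup Agent" wide
    condition:
        uint16(0) == 0x5A4D and
        filesize > 50KB and filesize < 2MB and
        2 of ($str_*) and
        not $fp_vendor
}
```

A triage step outside YARA could then read the `score` metadata and alert only above a chosen threshold. The exclusion string is only worth adding after a goodware scan has actually produced that false positive.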
Reference
Full methodology, module docs (pe, elf, crx, dex), and migration guide from legacy YARA:
https://github.com/trailofbits/skills/tree/main/plugins/yara-authoring
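YARA-X Rule Authoring
Write detection rules that catch malware without drowning in false positives. Based on Trail of Bits methodology.
Core Principles
- Strings must generate good atoms: YARA extracts 4-byte subsequences (atoms) from each string for fast matching. Strings with repeated bytes, common sequences, or fewer than 4 bytes force slow bytecode scans.
- Target specific families, not categories: "Detects ransomware" is useless. "Detects LockBit 3.0 config extraction routine" is useful.
- Test against goodware: validate against clean file sets before deployment.
- Short-circuit with cheap checks first: check `filesize < 10MB and uint16(0) == 0x5A4D` before expensive string searches.
- Metadata is documentation: future you needs to know what this rule catches and why.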
YARA-X Basics
YARA-X is the Rust successor to legacy YARA: 5-10x faster, better errors, built-in formatter, stricter validation, new modules (crx, dex).
Install: `brew install yara-x` / `cargo install yara-x`
Commands: `yr scan`, `yr check`, `yr fmt`, `yr dump`
Rule Template
```yara
import "pe"

rule FamilyNameVariantTechnique : tag1 tag2 {
    meta:
        author = "Your Name"
        date = "2026-02-14"
        description = "Detects [specific behavior] in [malware family]"
        reference = "https://..."
        tlp = "TLP:WHITE"
        hash = ""
        score = 75 // confidence, 0-100
    strings:
        // Strings unique to the sample
        $api1 = "VirtualAllocEx" ascii
        $api2 = "WriteProcessMemory" ascii
        $str1 = { 48 8B 05 ?? ?? ?? ?? 48 85 C0 } // hex with wildcards
        // Bounded repetition keeps the regex from scanning without limit
        $pdb = /[A-Z]:\\.{1,100}\\Release\\.{1,100}\.pdb/ nocase
    condition:
        uint16(0) == 0x5A4D and
        filesize < 5MB and
        // Parentheses matter: "and" binds tighter than "or"
        ((2 of ($api*) and $str1) or $pdb)
}
```
Naming Convention
`FamilyVariantTechnique`, for example:
- `EmotetLoaderDocumentMacro`
- `CobaltStrikeBeaconx64`
- `GenericCryptominerXMRig`
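String Selection
Good strings (unique, specific):
- Mutex names, PDB paths, C2 URLs
- Unique byte sequences from disassembly
- Custom encryption constants
- Uncommon API call sequences
Bad strings (too common, high FP):
- `http://`, `https://`, common API names alone
- Single common words, short strings (<4 bytes)
- Strings found in Windows system files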
Condition Patterns
```yara
// Ordered by cost (cheap → expensive)
condition:
    uint16(0) == 0x5A4D and      // magic bytes (instant)
    filesize < 10MB and          // size filter (instant)
    2 of ($unique*) and          // string matching (fast)
    pe.imports("kernel32.dll")   // module check (slower)
```
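Common magic bytes:
| Platform | Check |
|---|---|
| PE (Windows) | `uint16(0) == 0x5A4D` |
| ELF (Linux) | `uint32(0) == 0x464C457F` |
| Mach-O 64-bit | `uint32(0) == 0xFEEDFACF` |
| PDF | `uint32be(0) == 0x25504446` |
| Office/ZIP | `uint32be(0) == 0x504B0304` |

`uint16` and `uint32` read little-endian values, so the PDF and ZIP signatures, which are written here in file byte order, use `uint32be`.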
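Performance Rules
- Put `filesize` and magic byte checks first in the condition
- Never use unbounded regex like `/.*/`
- Avoid `for all` with complex conditions on large files
- Use `ascii` or `wide`, not both unless needed
- Prefer hex strings with specific bytes over wildcards, and wildcards over regex
- Use `at` for fixed offsets instead of scanning the entire file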
Testing
```bash
# Validate syntax
yr check rules/
# Scan a sample
yr scan rules/my_rule.yar suspiciousfile.exe
# Scan a directory
yr scan rules/ samples/ --threads 4
# Format rules consistently
yr fmt rules/my_rule.yar
```
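False Positive Reduction
- Add `filesize` constraints (malware has typical size ranges)
- Require multiple string matches (`2 of ($str*)`, not `any of`)
- Exclude known good paths/publishers via `not` conditions
- Score-based approach: assign confidence scores in metadata, triage by threshold
- Test against a goodware corpus before deployment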
Reference
Full methodology, module docs (pe, elf, crx, dex), and migration guide from legacy YARA:
https://github.com/trailofbits/skills/tree/main/plugins/yara-authoring