Read this file when the sample is packed or obfuscated, uses stack strings, or you need detailed hex pattern guidance (atoms, jumps, XOR/base64).
Modern malware rarely presents strings in the clear. Adapt the detection strategy rather than giving up.
Use the xor modifier when the sample uses single-byte XOR on known strings.
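Always pair it with tight scope guards, because xor(0x01-0xff) makes YARA search for 255 encoded variants of every string:
$x1 = "ThisIsMyC2" xor(0x01-0xff) fullword // skip key 0x00, which is the plaintext
Use base64 or base64wide when known strings appear base64-encoded in the sample (for example, an embedded encoded payload or configuration). YARA searches for three encodings of each string, one per alignment offset, so the same performance caveat applies in smaller measure: constrain with filesize and file-type guards.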
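To see what these two modifiers ask the engine to look for, the Python sketch below enumerates the variants by hand. It illustrates the semantics and is not YARA's implementation: the hypothetical xor_variants applies each single-byte key in a range, and the hypothetical base64_variants encodes the string at the three possible alignment offsets, keeping only the characters that do not depend on the bytes surrounding the string.

```python
import base64


def xor_variants(s: bytes, lo: int = 0x01, hi: int = 0xFF) -> list[bytes]:
    """Every single-byte XOR encoding of s for keys lo..hi (like xor(lo-hi))."""
    return [bytes(b ^ key for b in s) for key in range(lo, hi + 1)]


def base64_variants(s: bytes) -> list[bytes]:
    """The three base64 forms of s, one per alignment offset (like base64)."""
    out = []
    for pad in range(3):
        # Prefix `pad` placeholder bytes so s starts at a different bit offset.
        enc = base64.b64encode(b"\x00" * pad + s)
        start_bits = 8 * pad
        end_bits = 8 * (pad + len(s))
        # Keep only 6-bit characters that lie entirely inside s's bits;
        # the others depend on whatever bytes surround s in the file.
        first = -(-start_bits // 6)  # ceiling division
        last = end_bits // 6
        out.append(enc[first:last])
    return out


if __name__ == "__main__":
    print(len(xor_variants(b"ThisIsMyC2")))  # 255: keys 0x01..0xff, plaintext excluded
    for v in base64_variants(b"This program cannot"):
        print(v.decode())
```

Because the boundary characters are trimmed, the first and last bytes of the original string are only partially constrained by a base64 match, which is one reason these strings benefit from extra guards.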
When a sample is highly entropic and yields no useful strings, pivot to structural detection:
- math.entropy(0, filesize) > 7.0 (requires import "math") flags packed content, but pair it with other conditions, because many legitimate installers are packed too.
- PE section anomalies: a section with zero raw_data_size but a large virtual_size is filled at runtime, typically by an unpacking stub.
- pe.imphash() catches families that reuse the same import table across variants.
- A small import table containing little more than LoadLibrary and GetProcAddress suggests dynamic API resolution (a packer/crypter pattern).
- Unusual section names (UPX0, UPX1, .themida, custom names) indicate specific packers.
The goal is to detect the packing behaviour or structural fingerprint rather than the (invisible) payload strings.
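A rough Python model of the entropy and section heuristics can help sanity-check thresholds offline before encoding them in a rule. In this sketch, shannon_entropy computes bits per byte on the same 0 to 8 scale that math.entropy reports; the Section class, the 0x10000 virtual-size cutoff, and the structural_flags helper are illustrative assumptions, not values taken from YARA or from any packer.

```python
import math
from collections import Counter
from dataclasses import dataclass


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for constant data, 8.0 maximum)."""
    if not data:
        return 0.0
    n = len(data)
    return sum(c / n * math.log2(n / c) for c in Counter(data).values())


@dataclass
class Section:  # hypothetical stand-in for a parsed PE section header
    name: str
    raw_data_size: int
    virtual_size: int


KNOWN_PACKER_SECTIONS = {"UPX0", "UPX1", ".themida"}


def structural_flags(data: bytes, sections: list[Section]) -> list[str]:
    """Collect packer indicators; each one alone is weak evidence."""
    flags = []
    if shannon_entropy(data) > 7.0:
        flags.append("high-entropy")
    for s in sections:
        # Empty on disk but large in memory: contents appear only at runtime.
        if s.raw_data_size == 0 and s.virtual_size > 0x10000:  # cutoff is an assumption
            flags.append(f"runtime-unpacked section {s.name}")
        if s.name in KNOWN_PACKER_SECTIONS:
            flags.append(f"packer section {s.name}")
    return flags
```

Returning a list of flags rather than a single verdict mirrors the advice above: high entropy on its own is common in legitimate installers, so a rule should require it together with a second structural signal.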
Modern malware (especially C++ and Go) frequently builds strings
character-by-character on the stack to evade static string extraction. The
resulting assembly produces a distinctive pattern of mov byte instructions:
// Stack string pattern for "cmd" built via mov byte [rbp+offset], char
// C6 45 ?? xx = mov byte ptr [rbp+disp8], imm8 ([ebp+disp8] in 32-bit code): ?? is the displacement, xx the character
$stack_cmd = { C6 45 ?? 63 C6 45 ?? 6D C6 45 ?? 64 } // 'c', 'm', 'd'
When cleartext strings are missing but the sample clearly uses certain strings
(visible in dynamic analysis or capa output), look for this mov byte pattern
in the disassembly and translate it into a hex pattern. Compilers also emit other
encodings of the same store (for example rsp-relative addressing or 4-byte
displacements), so copy the bytes the sample actually contains. FLOSS is
specifically designed to recover stack strings automatically, so run it before
resorting to manual hex extraction.
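Translating a recovered string into the mov byte pattern is mechanical. The Python sketch below assumes only the C6 45 form (mov byte ptr [rbp+disp8], imm8); the hypothetical stack_string_hex produces the YARA hex string, and the hypothetical stack_string_regex builds an equivalent bytes regex for testing the idea against a code buffer.

```python
import re


def stack_string_hex(s: str) -> str:
    """YARA hex string for s built with mov byte ptr [rbp+disp8], imm8."""
    # C6 45 <disp8> <imm8>: wildcard the displacement, pin the character.
    parts = [f"C6 45 ?? {b:02X}" for b in s.encode("ascii")]
    return "{ " + " ".join(parts) + " }"


def stack_string_regex(s: str) -> re.Pattern:
    """Equivalent bytes regex: '.' stands in for the displacement byte."""
    body = b"".join(b"\xc6\x45." + re.escape(bytes([b])) for b in s.encode("ascii"))
    return re.compile(body, re.DOTALL)


if __name__ == "__main__":
    print(stack_string_hex("cmd"))  # prints { C6 45 ?? 63 C6 45 ?? 6D C6 45 ?? 64 }
    # Three consecutive stores to [rbp-0x10], [rbp-0x0f], [rbp-0x0e]
    code = bytes.fromhex("C645F063 C645F16D C645F264")
    print(bool(stack_string_regex("cmd").search(code)))  # prints True
```

For a string as short as "cmd", the longest fixed-byte run in the generated pattern is only three bytes, so prefer longer stack strings when the sample offers them.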
Hex patterns with wildcards are fast and precise for byte-level matching:
// Single-byte wildcards for relative offsets
$h1 = { 48 8B 05 ?? ?? ?? ?? 48 85 C0 74 ?? }
// Variable-length jumps — prefer [min-max] over long chains of ??
$h2 = { E8 [4] 85 C0 0F 84 [4-8] 48 89 }
// Bad: a chain of ?? matches exactly one gap length (6 here), so the 4- and 5-byte variants are missed
// $h3 = { E8 ?? ?? ?? ?? 85 C0 ?? ?? ?? ?? ?? ?? 48 89 }
// Good: express the actual variability
$h3 = { E8 [4] 85 C0 [4-6] 48 89 }
Use ?? for single-byte wildcards and [min-max] for variable-length gaps.
Prefer [N] or [min-max] jumps over long chains of ??: they state the gap length
explicitly, and only a [min-max] jump can cover a gap whose length varies.
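The difference between a run of ?? and a jump is easiest to see by translating both into regular expressions. The sketch below is a deliberately small converter; the hypothetical hex_to_regex handles only hex bytes, ??, [N], [min-max] and [min-] (no nibble wildcards or alternations) and is not part of YARA. It shows that a fixed run of ?? matches exactly one gap length, while [4-6] covers the whole range.

```python
import re


def hex_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple YARA hex string into an equivalent bytes regex."""
    out = []
    for tok in pattern.strip("{} ").split():
        if tok == "??":
            out.append(b".")  # exactly one arbitrary byte
        elif tok.startswith("[") and tok.endswith("]"):
            lo, sep, hi = tok[1:-1].partition("-")
            if not sep:
                out.append(b".{%d}" % int(lo))  # [N]: exactly N bytes
            elif hi:
                out.append(b".{%d,%d}" % (int(lo), int(hi)))  # [min-max]
            else:
                out.append(b".{%d,}" % int(lo))  # [min-]: no upper bound
        else:
            out.append(re.escape(bytes([int(tok, 16)])))  # literal byte
    return re.compile(b"".join(out), re.DOTALL)


if __name__ == "__main__":
    bad = hex_to_regex("{ E8 ?? ?? ?? ?? 85 C0 ?? ?? ?? ?? ?? ?? 48 89 }")
    good = hex_to_regex("{ E8 [4] 85 C0 [4-6] 48 89 }")
    # Same code shape with a 4-byte gap between 85 C0 and 48 89
    sample = bytes.fromhex("E8 11 22 33 44 85 C0 AA BB CC DD 48 89")
    print(bool(bad.search(sample)), bool(good.search(sample)))  # prints False True
```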
YARA extracts short literal substrings called atoms (at most 4 bytes) from each string and searches for all of them at once; only where an atom hits does the engine try to verify the full string. Strings shorter than 4 bytes, or hex patterns whose longest literal run is under 4 bytes, yield short or low-quality atoms that hit at many offsets in almost every file, so the engine ends up verifying candidates nearly everywhere. Avoid:
- Strings shorter than 4 characters.
- Hex patterns where all literal segments are very short (e.g., { AA ?? BB ?? CC }).
- Byte sequences that are extremely common at the binary level (e.g., { 00 00 00 00 }, { CC CC CC CC }).
These degrade to near-brute-force scanning.
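A quick pre-flight check for the hex-pattern cases can be scripted. The sketch below is a simplification, not YARA's actual atom-quality scoring: the hypothetical literal_runs splits a hex string into runs of fixed bytes, and the hypothetical has_usable_atom treats the pattern as weak if no run reaches 4 bytes or the only long runs repeat a single byte value.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def literal_runs(pattern: str) -> list[list[int]]:
    """Split a YARA hex string into runs of consecutive fixed bytes."""
    runs, cur = [], []
    for tok in pattern.strip("{} ").split():
        if len(tok) == 2 and set(tok) <= HEX_DIGITS:
            cur.append(int(tok, 16))
        else:  # ??, nibble wildcards like 4?, and jumps all break a run
            if cur:
                runs.append(cur)
            cur = []
    if cur:
        runs.append(cur)
    return runs


def has_usable_atom(pattern: str, min_len: int = 4) -> bool:
    """True if some run is long enough and not one byte value repeated."""
    return any(len(run) >= min_len and len(set(run)) > 1 for run in literal_runs(pattern))


if __name__ == "__main__":
    for p in ["{ AA ?? BB ?? CC }", "{ CC CC CC CC }", "{ 48 8B 05 ?? ?? ?? ?? 48 85 C0 74 ?? }"]:
        print(p, has_usable_atom(p))
```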
Keep variable-length jumps reasonable, under roughly 200 bytes. Large jumps
(e.g., [0-1000]) force the engine to try many possible gap lengths for every
candidate match and can drastically slow scanning. If the gap between meaningful
byte sequences is larger or unpredictable, split the pattern into two separate
strings and constrain their offsets in the condition (@x is the offset of the
first match of $x):
$part_a and $part_b and @part_b > @part_a and @part_b - @part_a < 500
When a rule risks matching a known legitimate binary, add explicit exclusions rather than removing useful strings:
condition:
uint16(0) == 0x5A4D and
filesize < 400KB and
not pe.exports("DllRegisterServer") and // exclude legit COM DLLs
not pe.imphash() == "a1b2c3d4..." and // exclude known-good imphash
(1 of ($s*) or 3 of ($x*))
Prefer adding constraints over removing strings. Removing a string reduces detection coverage; adding a negative guard preserves it.
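Modelled in Python, an exclusion-style condition is a conjunction of guards with the string logic at the end. This sketch assumes the PE facts (export names, imphash) and the match counts were already extracted elsewhere; rule_matches and its parameters are hypothetical. It also shows why the magic check is written as 0x5A4D: uint16 reads little-endian, so the header bytes "MZ" (4D 5A) read as 0x5A4D.

```python
import struct


def rule_matches(data: bytes, exports: set[str], imphash: str,
                 s_hits: int, x_hits: int, known_good: set[str]) -> bool:
    """Mirror of the example condition; s_hits/x_hits count distinct $s*/$x* strings that matched."""
    # uint16(0) == 0x5A4D: little-endian read of the "MZ" header bytes
    is_mz = len(data) >= 2 and struct.unpack_from("<H", data, 0)[0] == 0x5A4D
    return (
        is_mz
        and len(data) < 400 * 1024              # filesize < 400KB (YARA's KB is 1024 bytes)
        and "DllRegisterServer" not in exports  # exclude legit COM DLLs
        and imphash not in known_good           # exclude known-good imphashes
        and (s_hits >= 1 or x_hits >= 3)        # 1 of ($s*) or 3 of ($x*)
    )
```

Keeping every exclusion as a separate negative clause means each one can be added or removed without touching the positive string logic.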
YARA works best as part of a broader analysis workflow. These FLARE team open-source tools are particularly useful before and during rule writing:
FLOSS (FLARE Obfuscated String Solver) — Run FLOSS against a sample before
writing strings. Unlike the standard strings utility, FLOSS recovers
stack-constructed strings, decoded strings, and tight strings that malware
authors use to evade basic static analysis. The output feeds directly into
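string selection for the rule. Especially valuable for Go binaries, whose string
literals are stored back to back without NUL terminators, so the standard strings
utility splits them poorly; recent FLOSS versions include dedicated Go (and Rust)
string extraction.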
capa (capability detection) — Run capa against a sample to understand what it does before deciding how to detect it. capa identifies capabilities at the code level (API call patterns, behaviours) and maps them to MITRE ATT&CK and MBC. Where YARA matches byte sequences, capa describes features at the function level. Use capa output to: identify which capabilities to target in the rule, find the right MITRE ATT&CK IDs for the meta block, and decide whether to pivot to structural detection (import combinations, PE anomalies) when strings alone are insufficient.
FakeNet-NG — Run FakeNet-NG during dynamic analysis to capture network
indicators (C2 URLs, User-Agent strings, HTTP headers, DNS queries). These
network artefacts are often embedded as strings in the binary and make excellent
$s* or $x* candidates for the rule.
The workflow is: FLOSS + capa for static triage → FakeNet-NG for dynamic indicators → YARA rule using the combined output.
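The last step, turning recovered strings into rule strings, is easy to script once the candidates are collected. The sketch below assumes the FLOSS and FakeNet-NG results have already been reduced to a plain Python list (it does not parse either tool's output format). The hypothetical to_yara_strings escapes each value for a YARA text string, drops duplicates, and skips candidates shorter than 4 bytes in line with the atom guidance.

```python
def yara_escape(s: str) -> str:
    """Escape a string for use inside a YARA text string literal."""
    out = []
    for b in s.encode("utf-8"):
        ch = chr(b)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif 0x20 <= b < 0x7F:
            out.append(ch)
        else:
            out.append(f"\\x{b:02x}")  # non-printable or non-ASCII byte
    return "".join(out)


def to_yara_strings(candidates: list[str], prefix: str = "s", min_len: int = 4) -> list[str]:
    """Emit '$<prefix>N = "..."' lines, skipping duplicates and short strings."""
    lines, seen = [], set()
    for cand in candidates:
        if len(cand.encode("utf-8")) < min_len or cand in seen:
            continue
        seen.add(cand)
        lines.append(f'${prefix}{len(lines) + 1} = "{yara_escape(cand)}"')
    return lines


if __name__ == "__main__":
    found = ["cmd", r"C:\Users\Public\svc.exe", 'User-Agent: "Mozilla/5.0"']
    print("\n".join(to_yara_strings(found)))
```

The output still needs an analyst's judgement: generic values such as common User-Agent strings belong in the weaker $s* tier, or should be dropped, rather than counted as high-confidence $x* indicators.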