Attacker-Controlled Length Prefix Anchor Poisoning
SEC-001 / ADV-001 — found and fixed in PR #211 (issue #38).
When the evaluator resolves OffsetSpec::Relative rules, it advances EvaluationContext::last_match_end by the number of bytes the previous match consumed. For variable-width types (pstring, string, string16), that byte count is computed by a dedicated helper — bytes_consumed_with_pattern in src/evaluator/types/mod.rs — which must mirror every bounds check that the corresponding read function applies.
Before the fix, pstring_bytes_consumed re-read the raw length prefix from the buffer but did not clamp the result against the actual remaining buffer length. A crafted 4-byte big-endian prefix of \xFF\xFF\xFF\xFF (a claimed length of u32::MAX, roughly 4 GiB) caused bytes_consumed_with_pattern to return roughly 4 GiB, advancing last_match_end far past buffer.len(). Every subsequent Relative rule resolved to an out-of-bounds target and was silently skipped via the engine's graceful-skip arm: no panic, no error, and no test failure on benign inputs.
How the Anchor Works (Background)
After each successful rule match, the anchor advancement loop in src/evaluator/engine/mod.rs runs:
consumed = types::bytes_consumed_with_pattern(buffer, absolute_offset, &rule.typ, Some(&rule.value))
new_anchor = absolute_offset.saturating_add(consumed)
context.set_last_match_end(new_anchor)
OffsetSpec::Relative(N) then resolves as last_match_end + N. This means the anchor is the sole bridge between a matched pstring (or any variable-width type) and every sibling or child rule that uses a relative offset. For pstring, the consumed byte count must equal prefix_width + clamped_payload_length: the prefix width plus however many payload bytes read_pstring actually verified fit in the buffer.
The anchor is global/shared and never saved or restored across child recursion, matching GNU file semantics. See GOTCHAS S3.8 for the full anchor model.
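The anchor arithmetic described above can be sketched in a few lines. This is a minimal model, not the engine's code: AnchorContext, advance, and resolve_relative are hypothetical names, and treating the relative delta as a signed isize is an assumption made for illustration.

```rust
/// Hypothetical stand-in for the evaluator's context; only the anchor is modeled.
#[derive(Debug, Default)]
pub struct AnchorContext {
    last_match_end: usize,
}

impl AnchorContext {
    /// Record where the previous match ended: its offset plus the bytes it consumed.
    pub fn advance(&mut self, absolute_offset: usize, consumed: usize) {
        self.last_match_end = absolute_offset.saturating_add(consumed);
    }

    /// Resolve a relative offset against the anchor.
    /// Returns None when the target underflows, overflows, or lies at or past
    /// the end of the buffer (the case the engine would skip).
    pub fn resolve_relative(&self, delta: isize, buffer_len: usize) -> Option<usize> {
        let target = self.last_match_end.checked_add_signed(delta)?;
        (target < buffer_len).then_some(target)
    }
}
```

With a 16-byte buffer, advancing by a correctly clamped 7 bytes lets a relative offset of 1 resolve to byte 8. Advancing by an unclamped value near u32::MAX makes the same relative offset unresolvable, and every later relative rule inherits that poisoned anchor.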
Root Cause: Divergent Helper Contracts
read_pstring checks string_end <= buffer.len() and returns TypeReadError::BufferOverrun if the payload extends beyond the buffer. A successful read_pstring call therefore proves that the payload fit in the remaining buffer.
pstring_bytes_consumed is the consume-side twin. Before PR #211, it re-read the raw length prefix but applied none of the same bounds. The engine assumed both functions were equivalent for any input that produced a successful read. Under adversarial input they diverged: read_pstring rejected the oversized payload (or the rule never reached bytes_consumed because the match failed), but a pstring rule that did match left bytes_consumed free to return an unclamped value near u32::MAX.
The invariant the engine depended on: the bytes the anchor advances must equal the bytes the read function actually consumed from the buffer. Violating it silently corrupts every subsequent Relative offset in the same evaluation pass.
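One way to make a violation visible is to check a necessary condition of this invariant wherever the anchor is advanced: the consumed span must end inside the buffer. The helper below is a hypothetical sketch (anchor_invariant_holds is not a function from the codebase), and it checks only the upper bound, not full equality with what the read function consumed.

```rust
/// Hypothetical check: does `offset + consumed` stay within the buffer?
/// This is a necessary condition of the anchor invariant, not a full proof of it.
pub fn anchor_invariant_holds(buffer_len: usize, offset: usize, consumed: usize) -> bool {
    match offset.checked_add(consumed) {
        // The consumed span may end exactly at the buffer end, but never beyond it.
        Some(end) => end <= buffer_len,
        // Arithmetic overflow can never describe bytes that were really read.
        None => false,
    }
}
```

Used as a debug assertion before the anchor is updated, a check of this shape would turn the silent mis-advance into an immediate, attributable failure in test builds.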
Symptom Fingerprint
- A pstring rule with a 4-byte length prefix matches successfully.
- All sibling and child rules using OffsetSpec::Relative silently fail to match.
- Debug logs show Skipping rule '<name>': BufferOverrun for each suppressed rule. The root cause (anchor saturation from the prior pstring) is invisible without cross-correlating log lines.
- No panic, no error; only the match list returned to the caller is wrong.
- The caller sees only the broad parent match (e.g., "data") instead of the specific type-refinement matches (e.g., "shell script with dangerous interpreter") that would follow from the relative-offset child rules.
A crafted file can trigger this deliberately to suppress classification details and force the engine to report only a generic result.
The Fix
pstring_bytes_consumed now clamps the payload against the remaining buffer after reading the prefix:
let remaining_after_prefix = buffer.len().saturating_sub(prefix_end);
let bounded_payload = payload_length.min(remaining_after_prefix);
let actual_length = max_length.map_or(bounded_payload, |m| m.min(bounded_payload));
width.saturating_add(actual_length)
The fixed-width path in bytes_consumed_with_pattern was hardened at the same time: it now returns 0 when offset + width > buffer.len() or when checked_add overflows, preventing any fixed-width type from advancing the anchor past the buffer end.
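The fixed-width hardening can be sketched as a single guarded addition. This is an illustration under assumptions rather than the code from PR #211: fixed_width_consumed is a hypothetical name, and it takes the buffer length and type width directly instead of the engine's real arguments.

```rust
/// Hypothetical sketch of the hardened fixed-width path.
/// Returns `width` only when the whole value lies inside the buffer;
/// returns 0 on overflow or when the value would run past the end.
pub fn fixed_width_consumed(buffer_len: usize, offset: usize, width: usize) -> usize {
    match offset.checked_add(width) {
        Some(end) if end <= buffer_len => width,
        // Either the addition overflowed or the value does not fit:
        // report zero so the anchor cannot move past the buffer end.
        _ => 0,
    }
}
```

Returning 0 rather than a saturated value keeps the anchor at the match offset, which stays inside the buffer whenever the offset itself was valid.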
Regression tests in src/evaluator/types/tests.rs:
- test_bytes_consumed_pstring_clamps_oversized_prefix_be: \xFF\xFF\xFF\xFF BE prefix on a 7-byte buffer returns 4 + 3 = 7, not 4 + u32::MAX.
- test_bytes_consumed_pstring_clamps_oversized_prefix_le: same for LE; 9-byte buffer returns 4 + 5 = 9.
- test_bytes_consumed_fixed_width_returns_zero_past_end: fixed-width type at offset == buf.len(), past end, and at usize::MAX all return 0.
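The arithmetic the big-endian regression test exercises can be reproduced with a self-contained sketch of the clamp. The function below is hypothetical: pstring_consumed_be32 is not the real signature, it handles only a 4-byte big-endian prefix, and returning 0 when the prefix itself does not fit is an assumption made here, not something the fix is documented to do.

```rust
/// Hypothetical, simplified model of the clamped consume helper for a
/// pstring with a 4-byte big-endian length prefix.
pub fn pstring_consumed_be32(buffer: &[u8], offset: usize, max_length: Option<usize>) -> usize {
    const WIDTH: usize = 4;

    // Assumption: if the prefix cannot be read at all, nothing is consumed.
    let Some(prefix_end) = offset.checked_add(WIDTH) else { return 0 };
    let Some(prefix) = buffer.get(offset..prefix_end) else { return 0 };

    // The attacker-controlled claimed length.
    let payload_length =
        u32::from_be_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;

    // Same clamp as in the fix: never count more payload than the buffer holds.
    let remaining_after_prefix = buffer.len().saturating_sub(prefix_end);
    let bounded_payload = payload_length.min(remaining_after_prefix);
    let actual_length = max_length.map_or(bounded_payload, |m| m.min(bounded_payload));

    WIDTH.saturating_add(actual_length)
}
```

For a 7-byte buffer that starts with \xFF\xFF\xFF\xFF, only 3 bytes remain after the prefix, so the sketch returns 4 + 3 = 7, matching the value the regression test expects.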
Dual-Helper Sync Rule (Prevention)
This vulnerability is an instance of a broader pattern: when read_X and X_bytes_consumed diverge, the engine silently mis-advances the anchor. Three incidents are now documented:
- PR #211: pstring_bytes_consumed missing buffer clamp (this issue).
- PR #233 original: read_string_exact introduced for TypeKind::String; the consume side needed matching updates.
- PR #233 follow-on: Value::Bytes arm added to the read path for backslash-escape patterns (e.g., \177ELF) was missing from bytes_consumed_with_pattern, causing anchor mis-advance on NUL-free ELF headers.
Rule: Every new read_X function and every new pattern-dispatch arm in read_typed_value_with_pattern requires a matching arm in bytes_consumed_with_pattern that applies all the same bounds checks. Catch-all _ => arms in bytes_consumed_with_pattern compile without errors but silently fire the wrong behavior — GOTCHAS S2.1 notes that bytes_consumed is a critical match point where unhandled variable-width variants silently corrupt relative-offset anchors.
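The hazard of a catch-all arm is easiest to see in a small exhaustive match. The enum and function below are hypothetical stand-ins (SketchKind is not the real TypeKind, and its variants are chosen only for illustration); the point is that leaving out the _ => arm makes the compiler reject any newly added variant until it is handled.

```rust
/// Hypothetical, reduced stand-in for the real TypeKind enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchKind {
    Byte,
    Long,
    PString,
    String16,
}

/// Decide which consume path a type needs.
/// There is deliberately no `_ =>` arm: adding a variant to SketchKind
/// fails compilation here instead of silently taking the fixed-width path.
pub fn is_variable_width(kind: SketchKind) -> bool {
    match kind {
        SketchKind::Byte | SketchKind::Long => false,
        SketchKind::PString | SketchKind::String16 => true,
    }
}
```

Exhaustiveness only guarantees that a new variant gets an arm; the bounds checks inside that arm still have to be mirrored from the read side by hand.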
Key cross-references:
- pstring_bytes_consumed implementation: src/evaluator/types/mod.rs
- Anchor advancement call site: src/evaluator/engine/mod.rs
- GOTCHAS S3.8: clamping invariant and anchor model
- GOTCHAS S2.1: exhaustive match update sites for TypeKind
- PR #211: original fix and security review