Indirect Offset Parser-Evaluator Sync#
The core lesson: a feature can be fully implemented on one side of the parser-evaluator split and remain completely unreachable through the public API. In issue #37 (branch 37-evaluator-implement-indirect-offset-resolution), resolve_indirect_offset() in src/evaluator/offset/indirect.rs was complete and covered by 35 unit tests. But parse_offset() in src/parser/grammar/mod.rs had no branch for (-prefixed syntax — any indirect offset like (0x3c.l) was rejected at parse time. The evaluator code was effectively dead from the MagicDatabase::load_from_file() path.
The fix added parse_indirect_offset() and updated parse_offset() to branch on a leading ( . No changes were needed in parse_rule_offset() — it delegates to parse_offset(), so hierarchical forms like >(0x3c.l) work automatically once parse_offset() is fixed.
Parser-Evaluator Parity Checklist#
When adding any new OffsetSpec or TypeKind variant, verify all five of the following before considering the feature done:
| # | What to check | Where |
|---|---|---|
| 1 | Parser produces it — unit test parses raw syntax and asserts correct AST node | src/parser/grammar/tests/ |
| 2 | Evaluator consumes it — unit test constructs AST node and asserts evaluation result | src/evaluator/offset/ or src/evaluator/engine.rs |
| 3 | End-to-end through MagicDatabase — integration test loads a .magic file and evaluates a buffer | tests/indirect_offset_integration.rs |
| 4 | Codegen handles it — if the variant can appear in built-in rules, update src/parser/codegen.rs | src/parser/codegen.rs |
| 5 | Strength calculation covers it — if scoring changes, update src/evaluator/strength.rs | src/evaluator/strength.rs |
The issue with indirect offsets was a failure at step 1: the parser test was missing, and without a step-3 end-to-end test the gap was invisible — the evaluator unit tests passed, but MagicDatabase could never reach the evaluator code at all.
Integration Test Template#
Every parser-evaluator-sync regression must have an end-to-end test through MagicDatabase. The canonical shape lives in tests/indirect_offset_integration.rs:
#[test]
fn test_indirect_offset_pe_detection_via_magic_file() {
let temp_dir = TempDir::new().unwrap();
let magic_path = temp_dir.path().join("pe.magic");
let mut f = fs::File::create(&magic_path).unwrap();
writeln!(f, r#"0 string "MZ" DOS executable"#).unwrap();
writeln!(f, r#">(0x3c.l) string "PE" (PE)"#).unwrap();
let db = MagicDatabase::load_from_file(&magic_path).unwrap();
let buf = build_pe_like_buffer();
let result = db.evaluate_buffer(&buf).unwrap();
assert!(result.description.contains("DOS executable"));
assert!(result.description.contains("(PE)"));
}
Key points:
TempDir::new()isolates the test; the directory is cleaned up on drop.- Load via
MagicDatabase::load_from_file()— this exercises the full parse → AST → evaluate pipeline, unlike unit tests that construct AST nodes directly. - Assert on
result.description. Both the parent rule match and the child indirect-offset match are visible here. - Use uppercase specifiers (
.L= big-endian long) when you control the test buffer. Big-endian byte layout is portable across architectures; native-endian.lmakes test buffers x86-specific and breaks on big-endian targets.
Pointer Specifier Mapping#
pointer_specifier_to_type() maps the single-char specifier after the . in (base.X) to a (TypeKind, Endianness) pair. GNU file semantics: lowercase = little-endian, uppercase = big-endian. All pointer types are signed .
| Specifier | TypeKind | Endianness |
|---|---|---|
b | Byte { signed: true } | Little |
B | Byte { signed: true } | Big |
s | Short { ..., signed: true } | Little |
S | Short { ..., signed: true } | Big |
l | Long { ..., signed: true } | Little |
L | Long { ..., signed: true } | Big |
q | Quad { ..., signed: true } | Little |
Q | Quad { ..., signed: true } | Big |
i/I | Long (treated as 32-bit) | Little/Big |
i/I are magic(5) ID3 variable-byte integer specifiers; they parse successfully but evaluate as plain 32-bit longs. Real 7-bit-per-byte decoding is a tracked follow-up gap .
Note: The initial implementation used incorrect endianness for lowercase specifiers. The GNU-correct mapping (lowercase = little-endian) documented here reflects what is in the current codebase. See
docs/solutions/logic-errors/indirect-offset-gnu-file-semantics.mdfor the full correctness history.
Parser Gotchas Specific to Indirect Offset Syntax#
parse_number does not handle + (GOTCHAS S3.5)#
parse_number handles a leading - internally via opt(char('-')). It does not handle +. For +N adjustments — both inside-paren (base.type+N) and the legacy outside-paren form (base.type)+N — the + must be consumed manually with strip_prefix('+') before calling parse_number .
Do not modify parse_number globally to accept +. It is shared by offset parsing, value parsing, and strength modifier parsing; adding + handling would silently change behavior for all three call sites.
The same pattern applies in parse_offset itself for the &+N relative-offset form .
parse_value does not accept bare strings (GOTCHAS S3.6)#
parse_value requires string values to be quoted: 0 string "MZ", not 0 string MZ. The parse_magic_rule function later added a parse_bare_string_value fallback for string-family types, but direct callers of parse_value still require quoting. Unquoted string values in test magic files produce a parse error at load time — the rule silently fails to load, not an evaluation failure.
Key Source Files#
| File | Role |
|---|---|
src/parser/grammar/mod.rs | parse_indirect_offset(), pointer_specifier_to_type(), parse_offset() dispatcher |
src/evaluator/offset/indirect.rs | resolve_indirect_offset_with_anchor() — pointer read, adjustment, bounds check |
tests/indirect_offset_integration.rs | End-to-end regression tests for all pointer specifiers and adjustment forms |
src/parser/grammar/tests/indirect_offset.rs | Unit tests for grammar-level indirect offset parsing |
Related learning docs:
docs/solutions/logic-errors/indirect-offset-resolution.md— evaluator-side implementation detaildocs/solutions/logic-errors/indirect-offset-gnu-file-semantics.md— the correct endianness/signedness mapping that superseded the initial implementationdocs/MAGIC_FORMAT.mdlines 106–126 — magic(5) indirect offset syntax reference- GitHub issue #37 — original tracking issue
- PR #199 — implementation PR adding parser support