Documents
Indirect Offset Parser-Evaluator Sync
Indirect Offset Parser-Evaluator Sync
Type
Topic
Status
Published
Created
Apr 25, 2026
Updated
Apr 25, 2026
Created by
Dosu Bot
Updated by
Dosu Bot

Indirect Offset Parser-Evaluator Sync#

The core lesson: a feature can be fully implemented on one side of the parser-evaluator split and remain completely unreachable through the public API. In issue #37 (branch 37-evaluator-implement-indirect-offset-resolution), resolve_indirect_offset() in src/evaluator/offset/indirect.rs was complete and covered by 35 unit tests. But parse_offset() in src/parser/grammar/mod.rs had no branch for (-prefixed syntax — any indirect offset like (0x3c.l) was rejected at parse time. The evaluator code was effectively dead from the MagicDatabase::load_from_file() path.

The fix added parse_indirect_offset() and updated parse_offset() to branch on a leading ( . No changes were needed in parse_rule_offset() — it delegates to parse_offset(), so hierarchical forms like >(0x3c.l) work automatically once parse_offset() is fixed.

Parser-Evaluator Parity Checklist#

When adding any new OffsetSpec or TypeKind variant, verify all five of the following before considering the feature done:

#What to checkWhere
1Parser produces it — unit test parses raw syntax and asserts correct AST nodesrc/parser/grammar/tests/
2Evaluator consumes it — unit test constructs AST node and asserts evaluation resultsrc/evaluator/offset/ or src/evaluator/engine.rs
3End-to-end through MagicDatabase — integration test loads a .magic file and evaluates a buffertests/indirect_offset_integration.rs
4Codegen handles it — if the variant can appear in built-in rules, update src/parser/codegen.rssrc/parser/codegen.rs
5Strength calculation covers it — if scoring changes, update src/evaluator/strength.rssrc/evaluator/strength.rs

The issue with indirect offsets was a failure at step 1: the parser test was missing, and without a step-3 end-to-end test the gap was invisible — the evaluator unit tests passed, but MagicDatabase could never reach the evaluator code at all.

Integration Test Template#

Every parser-evaluator-sync regression must have an end-to-end test through MagicDatabase. The canonical shape lives in tests/indirect_offset_integration.rs:

#[test]
fn test_indirect_offset_pe_detection_via_magic_file() {
    let temp_dir = TempDir::new().unwrap();
    let magic_path = temp_dir.path().join("pe.magic");

    let mut f = fs::File::create(&magic_path).unwrap();
    writeln!(f, r#"0 string "MZ" DOS executable"#).unwrap();
    writeln!(f, r#">(0x3c.l) string "PE" (PE)"#).unwrap();

    let db = MagicDatabase::load_from_file(&magic_path).unwrap();
    let buf = build_pe_like_buffer();
    let result = db.evaluate_buffer(&buf).unwrap();

    assert!(result.description.contains("DOS executable"));
    assert!(result.description.contains("(PE)"));
}

Key points:

  • TempDir::new() isolates the test; the directory is cleaned up on drop.
  • Load via MagicDatabase::load_from_file() — this exercises the full parse → AST → evaluate pipeline, unlike unit tests that construct AST nodes directly.
  • Assert on result.description. Both the parent rule match and the child indirect-offset match are visible here.
  • Use uppercase specifiers (.L = big-endian long) when you control the test buffer. Big-endian byte layout is portable across architectures; native-endian .l makes test buffers x86-specific and breaks on big-endian targets.

Pointer Specifier Mapping#

pointer_specifier_to_type() maps the single-char specifier after the . in (base.X) to a (TypeKind, Endianness) pair. GNU file semantics: lowercase = little-endian, uppercase = big-endian. All pointer types are signed .

SpecifierTypeKindEndianness
bByte { signed: true }Little
BByte { signed: true }Big
sShort { ..., signed: true }Little
SShort { ..., signed: true }Big
lLong { ..., signed: true }Little
LLong { ..., signed: true }Big
qQuad { ..., signed: true }Little
QQuad { ..., signed: true }Big
i/ILong (treated as 32-bit)Little/Big

i/I are magic(5) ID3 variable-byte integer specifiers; they parse successfully but evaluate as plain 32-bit longs. Real 7-bit-per-byte decoding is a tracked follow-up gap .

Note: The initial implementation used incorrect endianness for lowercase specifiers. The GNU-correct mapping (lowercase = little-endian) documented here reflects what is in the current codebase. See docs/solutions/logic-errors/indirect-offset-gnu-file-semantics.md for the full correctness history.

Parser Gotchas Specific to Indirect Offset Syntax#

parse_number does not handle + (GOTCHAS S3.5)#

parse_number handles a leading - internally via opt(char('-')). It does not handle +. For +N adjustments — both inside-paren (base.type+N) and the legacy outside-paren form (base.type)+N — the + must be consumed manually with strip_prefix('+') before calling parse_number .

Do not modify parse_number globally to accept +. It is shared by offset parsing, value parsing, and strength modifier parsing; adding + handling would silently change behavior for all three call sites.

The same pattern applies in parse_offset itself for the &+N relative-offset form .

parse_value does not accept bare strings (GOTCHAS S3.6)#

parse_value requires string values to be quoted: 0 string "MZ", not 0 string MZ. The parse_magic_rule function later added a parse_bare_string_value fallback for string-family types, but direct callers of parse_value still require quoting. Unquoted string values in test magic files produce a parse error at load time — the rule silently fails to load, not an evaluation failure.

Key Source Files#

FileRole
src/parser/grammar/mod.rsparse_indirect_offset(), pointer_specifier_to_type(), parse_offset() dispatcher
src/evaluator/offset/indirect.rsresolve_indirect_offset_with_anchor() — pointer read, adjustment, bounds check
tests/indirect_offset_integration.rsEnd-to-end regression tests for all pointer specifiers and adjustment forms
src/parser/grammar/tests/indirect_offset.rsUnit tests for grammar-level indirect offset parsing

Related learning docs:

  • docs/solutions/logic-errors/indirect-offset-resolution.md — evaluator-side implementation detail
  • docs/solutions/logic-errors/indirect-offset-gnu-file-semantics.md — the correct endianness/signedness mapping that superseded the initial implementation
  • docs/MAGIC_FORMAT.md lines 106–126 — magic(5) indirect offset syntax reference
  • GitHub issue #37 — original tracking issue
  • PR #199 — implementation PR adding parser support
Indirect Offset Parser-Evaluator Sync | Dosu