Indirect Offset GNU File Semantic Correctness
Type: Topic
Status: Published
Created: Apr 25, 2026
Updated: Apr 25, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

Overview

The indirect offset parser in src/parser/grammar/mod.rs contained three semantic bugs in pointer_specifier_to_type and parse_indirect_offset. All three produced incorrect AST nodes even though the code compiled cleanly and the tests passed. This is the "tests match code but not spec" anti-pattern: test expectations were written by running the implementation rather than by reading the GNU file specification.

The bugs were tracked in issue #37 and resolved by aligning the parser to GNU file semantics. The primary lesson: any time a type mapping or grammar rule is added, test expectations must be derived from the spec and GOTCHAS.md before the implementation is written.

The Three Semantic Errors

Bug 1: Endianness mapping

Lowercase specifiers (.b, .s, .l, .q) were mapped to Endianness::Native instead of Endianness::Little. GNU file defines lowercase as little-endian and uppercase as big-endian; the uppercase specifiers were already correct.

The fix mapped all lowercase arms to Endianness::Little. The current pointer_specifier_to_type function shows the corrected mapping, codified in GOTCHAS S3.7.

Before and after for the .l arm, where both endian and signed were wrong at once:

// Before (wrong)
'l' => Some((TypeKind::Long { endian: Endianness::Native, signed: false }, Endianness::Native))

// After (correct)
'l' => Some((TypeKind::Long { endian: Endianness::Little, signed: true }, Endianness::Little))
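
The lowercase/uppercase rule can be sketched as a small standalone function. This is an illustration under stated assumptions, not the project's pointer_specifier_to_type: the Endianness and Width enums and the specifier_to_pointer name are hypothetical stand-ins, and only the four specifier letters named in this document are handled (the real grammar may accept more).

```rust
// Hypothetical stand-ins for the project's types; the real definitions
// in the codebase may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Width {
    Byte,
    Short,
    Long,
    Quad,
}

/// Map a pointer specifier character to (width, endianness, signed).
/// Lowercase is little-endian, uppercase is big-endian, and every
/// arm is signed. Endianness::Native never appears.
pub fn specifier_to_pointer(spec: char) -> Option<(Width, Endianness, bool)> {
    let width = match spec.to_ascii_lowercase() {
        'b' => Width::Byte,
        's' => Width::Short,
        'l' => Width::Long,
        'q' => Width::Quad,
        _ => return None,
    };
    let endian = if spec.is_ascii_lowercase() {
        Endianness::Little
    } else {
        Endianness::Big
    };
    Some((width, endian, true))
}
```

Deriving the endianness from the letter case in one place, rather than repeating it in each arm, makes it harder for a single arm to drift away from the rule.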

Bug 2: Signedness default

Every specifier arm set signed: false. libmagic types are signed by default; unsigned variants use a u prefix (ubyte, ushort, etc.) (GOTCHAS S6.3). Pointer types in indirect offsets follow the same convention.

The fix changed all specifier arms to signed: true, which the current implementation confirms.
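
To see why the default matters, consider a pointer value whose high bit is set. The sketch below is an assumption about how such a value might be decoded, not the project's evaluator; read_long_le is a hypothetical helper that widens to i64 so both interpretations can be compared.

```rust
/// Read a 4-byte little-endian pointer value, widening to i64 so that
/// the signed and unsigned interpretations can be compared directly.
/// Hypothetical helper for illustration only.
pub fn read_long_le(bytes: [u8; 4], signed: bool) -> i64 {
    if signed {
        i64::from(i32::from_le_bytes(bytes))
    } else {
        i64::from(u32::from_le_bytes(bytes))
    }
}
```

For the bytes FC FF FF FF, the signed reading is -4 and the unsigned reading is 4294967292, so a wrong signed flag changes which offset an indirect rule resolves to.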

Bug 3: Adjustment placement

The original parser looked for the adjustment inside the parentheses, so (0x3c.l)+4 parsed as an indirect offset with adjustment=0 and left a trailing +4 that broke parse_magic_rule().

The fix closed the paren first, then parsed the adjustment outside it:

// Before (wrong): adjustment consumed inside parens
let (input, adjustment) = parse_adjustment(input)?;
let (input, _) = char(')')(input)?;

// After (correct): close paren first, then adjustment
let (input, _) = char(')')(input)?;
let (input, adjustment) = parse_adjustment(input)?;

Subsequent work expanded the grammar to accept both forms: the canonical (base.type+N), with the adjustment inside the parens and the full magic(5) operator set (+, -, *, /, %, &, |, ^), and the legacy (base.type)+N, with the adjustment after the parens and only + or - allowed. The two forms are mutually exclusive within a single rule. See the current parse_indirect_offset implementation for the full grammar.
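
The two-form grammar can be sketched with a small hand-written parser. This is a simplified illustration using only the standard library, not the project's parse_indirect_offset: the IndirectOffset struct, the helper names, and the restriction to the specifiers b, s, l, q (either case) are assumptions made for the example.

```rust
/// Hypothetical AST node for this sketch; the project's type may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndirectOffset {
    pub base: i64,
    pub specifier: char,
    /// Operator and operand, e.g. ('+', 4); None when absent.
    pub adjustment: Option<(char, i64)>,
}

/// Parse an unsigned decimal or 0x-prefixed hex number.
/// It deliberately does not accept a leading '+' or '-'.
fn parse_number(input: &str) -> Option<(i64, &str)> {
    let (digits, radix) = match input.strip_prefix("0x") {
        Some(hex) => (hex, 16),
        None => (input, 10),
    };
    let end = digits
        .find(|c: char| !c.is_digit(radix))
        .unwrap_or(digits.len());
    let value = i64::from_str_radix(&digits[..end], radix).ok()?;
    Some((value, &digits[end..]))
}

/// Parse an optional adjustment whose operator is one of `ops`.
/// The operator is consumed here, before parse_number is called.
/// An operator with no number after it is an error (outer None).
fn parse_adjustment<'a>(input: &'a str, ops: &str) -> Option<(Option<(char, i64)>, &'a str)> {
    match input.chars().next().filter(|c| ops.contains(*c)) {
        None => Some((None, input)),
        Some(op) => {
            let (value, rest) = parse_number(&input[op.len_utf8()..])?;
            Some((Some((op, value)), rest))
        }
    }
}

/// Parse `(base.type+N)` or `(base.type)+N`, returning the node and
/// the unconsumed remainder of the input.
pub fn parse_indirect_offset(input: &str) -> Option<(IndirectOffset, &str)> {
    let rest = input.strip_prefix('(')?;
    let (base, rest) = parse_number(rest)?;
    let rest = rest.strip_prefix('.')?;
    let specifier = rest.chars().next().filter(|c| "bslqBSLQ".contains(*c))?;
    let rest = &rest[1..];

    // Canonical form: full operator set inside the parentheses.
    let (inside, rest) = parse_adjustment(rest, "+-*/%&|^")?;
    let rest = rest.strip_prefix(')')?;
    // Legacy form: only + or - after the closing parenthesis.
    let (outside, rest) = parse_adjustment(rest, "+-")?;

    let adjustment = match (inside, outside) {
        (Some(_), Some(_)) => return None, // the forms are mutually exclusive
        (inside, outside) => inside.or(outside),
    };
    Some((IndirectOffset { base, specifier, adjustment }, rest))
}
```

With this sketch, both (0x3c.l)+4 and (0x3c.l+4) produce the same node with adjustment ('+', 4) and no leftover input, while (0x3c.l+4)+4 is rejected because it uses both forms.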

Spec-First Test Discipline

All three bugs shared the same root cause: test expectations were derived by running the (wrong) implementation, so the tests confirmed the bugs rather than catching them.

Rules to follow when adding or modifying type mappings:

  1. Write expectations from the spec first. Derive expected values from the GNU file man page and GOTCHAS.md before writing any code. In TDD, the RED phase must use spec-sourced values, not implementation output. Document the spec section above each test case.

  2. Treat GOTCHAS.md as a mandatory checklist:

    • S6.3: types are signed by default (signed: true); unsigned requires a u prefix.
    • S3.7: lowercase specifiers = little-endian, uppercase = big-endian; full operator set valid inside parens, only +/- outside.
    • S3.5: parse_number does not handle +; consume it manually before calling parse_number for indirect offset adjustments.
  3. Never use Endianness::Native in indirect offset resolution. Every endianness value must be explicitly Little or Big. Tests must use explicit byte sequences, not to_ne_bytes().

  4. Extract test inputs from real magic files. Use /usr/share/misc/magic or the upstream file/file repository rather than inventing syntax.
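
The rules above can be sketched as a table of spec-sourced cases built from explicit byte sequences. Everything here is illustrative: read_short, SPEC_CASES, and the local Endianness enum are hypothetical names, and the labels paraphrase the GOTCHAS sections cited in this document rather than quoting them.

```rust
// Hypothetical stand-in for the project's endianness type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

/// Decode a signed 16-bit pointer value with an explicit byte order.
pub fn read_short(bytes: [u8; 2], endian: Endianness) -> i16 {
    match endian {
        Endianness::Little => i16::from_le_bytes(bytes),
        Endianness::Big => i16::from_be_bytes(bytes),
    }
}

/// Spec-sourced cases: (spec reference, input bytes, byte order, expected).
/// The expected values are worked out by hand from the byte sequences,
/// not copied from the output of the implementation under test.
pub const SPEC_CASES: &[(&str, [u8; 2], Endianness, i16)] = &[
    ("S3.7: lowercase .s is little-endian", [0x34, 0x12], Endianness::Little, 0x1234),
    ("S3.7: uppercase .S is big-endian", [0x34, 0x12], Endianness::Big, 0x3412),
    ("S6.3: pointer types are signed by default", [0xFE, 0xFF], Endianness::Little, -2),
];
```

A test can then loop over SPEC_CASES and assert that read_short returns the expected value for each entry. Because the bytes are written out literally, the cases give the same result on little-endian and big-endian hosts, which a round trip through to_ne_bytes() would not guarantee to exercise.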

Diagnostic Symptoms and Cross-References

Look for these symptoms when debugging similar parser bugs:

  • (0x3c.l)+4 parses as indirect with adjustment=0 and a leftover +4: the adjustment was parsed inside the parens.
  • Lowercase pointer specifiers (.s, .l, .q) produce Endianness::Native instead of Endianness::Little.
  • Pointer types carry signed: false, mismatching libmagic's signed-by-default convention.

Related documents:

| Topic | Source |
| --- | --- |
| Evaluator implementation for indirect offsets | docs/solutions/logic-errors/indirect-offset-resolution.md |
| Parser-evaluator reachability sync | docs/solutions/integration-issues/indirect-offset-parser-evaluator-sync.md |
| Magic format specification | docs/MAGIC_FORMAT.md (lines 106–126) |
| Canonical gotchas checklist | GOTCHAS.md S3.5, S3.6, S3.7, S6.3 |
| Current parser implementation | src/parser/grammar/mod.rs |
| Issue | #37 |