Indirect Offset GNU File Semantic Correctness#
Overview#
The indirect offset parser in src/parser/grammar/mod.rs contained three semantic bugs in pointer_specifier_to_type and parse_indirect_offset. All three produced incorrect AST nodes while the code compiled cleanly and the tests passed. This is the "tests match code but not spec" anti-pattern: test expectations were written by running the implementation rather than by reading the GNU file specification.
The bugs were tracked in issue #37 and resolved by aligning the parser to GNU file semantics. The primary lesson: any time a type mapping or grammar rule is added, test expectations must be derived from the spec and GOTCHAS.md before the implementation is written.
The Three Semantic Errors#
Bug 1 — Endianness mapping#
Lowercase specifiers (.b, .s, .l, .q) were mapped to Endianness::Native instead of Endianness::Little. GNU file defines lowercase as little-endian and uppercase as big-endian; the uppercase specifiers were already correct.
The fix mapped all lowercase arms to Endianness::Little. The current pointer_specifier_to_type function shows the corrected mapping, codified in GOTCHAS S3.7.
Before/after for the .l arm, where both endian and signed were wrong simultaneously:

```rust
// Before (wrong)
'l' => Some((TypeKind::Long { endian: Endianness::Native, signed: false }, Endianness::Native)),
// After (correct)
'l' => Some((TypeKind::Long { endian: Endianness::Little, signed: true }, Endianness::Little)),
```
Bug 2 — Signedness default#
Every specifier arm set signed: false. libmagic types are signed by default; unsigned variants use a u prefix (ubyte, ushort, etc.) (GOTCHAS S6.3). Pointer types in indirect offsets follow the same convention.
The fix changed all specifier arms to signed: true, which the current implementation confirms.
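The two mapping rules from Bug 1 and Bug 2 can be sketched together as follows. This is a minimal illustration, not the project's code: the `Endianness` and `TypeKind` definitions are simplified stand-ins, the `Byte` and `Quad` variants and the function name `pointer_specifier_sketch` are assumptions, and only the specifiers mentioned above are covered.

```rust
// Simplified stand-ins for the project's types; the real definitions in
// src/parser/grammar/mod.rs may differ (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeKind {
    Byte { signed: bool },
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    Quad { endian: Endianness, signed: bool },
}

/// Hypothetical sketch: maps a pointer specifier character to a type and
/// endianness. Case selects endianness; every arm is signed.
pub fn pointer_specifier_sketch(spec: char) -> Option<(TypeKind, Endianness)> {
    // Lowercase = little-endian, uppercase = big-endian. Never Native.
    let endian = if spec.is_ascii_lowercase() {
        Endianness::Little
    } else {
        Endianness::Big
    };
    // Signed by default; unsigned would need an explicit `u` prefix.
    let signed = true;
    let kind = match spec.to_ascii_lowercase() {
        'b' => TypeKind::Byte { signed },
        's' => TypeKind::Short { endian, signed },
        'l' => TypeKind::Long { endian, signed },
        'q' => TypeKind::Quad { endian, signed },
        _ => return None, // unknown specifier
    };
    Some((kind, endian))
}
```

Deriving the endianness from the character's case in one place, instead of repeating it per arm, makes it harder for a single arm to drift back to a wrong value.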
Bug 3 — Adjustment placement#
The original parser looked for the adjustment operand inside the parentheses, so (0x3c.l)+4 parsed as an indirect offset with adjustment=0 and a leftover +4 that broke parse_magic_rule().
The fix closed the paren first, then parsed the outside adjustment:
```rust
// Before (wrong): adjustment consumed inside parens
let (input, adjustment) = parse_adjustment(input)?;
let (input, _) = char(')')(input)?;
// After (correct): close paren first, then adjustment
let (input, _) = char(')')(input)?;
let (input, adjustment) = parse_adjustment(input)?;
```
Subsequent work expanded the grammar to accept both forms: the canonical (base.type+N) with the adjustment inside the parens (supporting the full magic(5) operator set: +, -, *, /, %, &, |, ^) and the legacy (base.type)+N with the adjustment after the parens (only + and -). The two forms are mutually exclusive per rule. See the current parse_indirect_offset implementation for the full grammar.
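The two accepted forms can be illustrated with a small hand-written parser. This is a sketch under stated assumptions, not the project's parse_indirect_offset: it uses only the standard library rather than parser combinators, and `IndirectSketch`, `number`, `adjustment`, and `parse_indirect_sketch` are hypothetical names. It tries the inside adjustment first and only looks for an outside `+`/`-` adjustment when none was found inside; how the real parser treats a rule that supplies both is not specified here, so this sketch simply leaves any trailing text for the caller.

```rust
/// Simplified AST node for illustration; field names are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub struct IndirectSketch {
    pub base: u64,
    pub specifier: char,
    pub op: Option<char>,
    pub operand: u64,
}

// Parses a decimal or 0x-prefixed hex number, returning (value, rest).
fn number(input: &str) -> Option<(u64, &str)> {
    let (digits, radix) = match input.strip_prefix("0x") {
        Some(hex) => (hex, 16),
        None => (input, 10),
    };
    let end = digits
        .find(|c: char| !c.is_digit(radix))
        .unwrap_or(digits.len());
    if end == 0 {
        return None; // no digits at all
    }
    let value = u64::from_str_radix(&digits[..end], radix).ok()?;
    Some((value, &digits[end..]))
}

// Parses an optional `<op><number>` adjustment; `op` must be in `allowed`.
fn adjustment<'a>(input: &'a str, allowed: &str) -> Option<(Option<char>, u64, &'a str)> {
    match input.chars().next() {
        Some(op) if allowed.contains(op) => {
            // The operator (including `+`) is consumed here, before the
            // number parser runs.
            let (value, rest) = number(&input[op.len_utf8()..])?;
            Some((Some(op), value, rest))
        }
        _ => Some((None, 0, input)), // no adjustment present
    }
}

pub fn parse_indirect_sketch(input: &str) -> Option<(IndirectSketch, &str)> {
    let rest = input.strip_prefix('(')?;
    let (base, rest) = number(rest)?;
    let rest = rest.strip_prefix('.')?;
    let specifier = rest.chars().next().filter(|c| "bslqBSLQ".contains(*c))?;
    let rest = &rest[1..];
    // Canonical form: full operator set inside the parentheses.
    let (inner_op, inner_val, rest) = adjustment(rest, "+-*/%&|^")?;
    let rest = rest.strip_prefix(')')?;
    // Legacy form: only + or - after the closing paren, and only when no
    // adjustment was given inside.
    let (op, operand, rest) = if inner_op.is_some() {
        (inner_op, inner_val, rest)
    } else {
        adjustment(rest, "+-")?
    };
    Some((IndirectSketch { base, specifier, op, operand }, rest))
}
```

With this sketch, `(0x3c.l)+4` and `(0x3c.l+4)` produce the same node, while `(0x3c.l)*4` yields no adjustment and leaves `*4` unconsumed, because `*` is only accepted inside the parens.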
Spec-First Test Discipline#
All three bugs shared the same root cause: test expectations were derived by running the (wrong) implementation, so the tests confirmed the bugs rather than catching them.
Rules to follow when adding or modifying type mappings:
- Write expectations from the spec first. Derive expected values from the GNU `file` man page and GOTCHAS.md before writing any code. In TDD, the RED phase must use spec-sourced values, not implementation output. Document the spec section above each test case.
- Treat GOTCHAS.md as a mandatory checklist:
  - S6.3: `signed: true` by default; unsigned requires the `u` prefix.
  - S3.7: lowercase specifiers = little-endian, uppercase = big-endian; full operator set valid inside parens, only `+`/`-` outside.
  - S3.5: `parse_number` does not handle `+`; consume it manually before calling `parse_number` for indirect offset adjustments.
- Never use `Endianness::Native` in indirect offset resolution. Every endianness value must be explicitly `Little` or `Big`. Tests must use explicit byte sequences, not `to_ne_bytes()`.
- Extract test inputs from real magic files. Use `/usr/share/misc/magic` or the upstream file/file repository rather than inventing syntax.
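As an illustration of the explicit-byte-sequence rule, the sketch below reads a signed 32-bit pointer from a hand-written buffer. The helper names `read_le_long` and `read_be_long` are hypothetical and not part of the project; the expected values in the usage note are worked out by hand from the byte layout rather than taken from running any implementation.

```rust
/// Hypothetical helper: reads a signed 32-bit little-endian value at
/// `offset`, as a lowercase `.l` pointer would be read.
pub fn read_le_long(buf: &[u8], offset: usize) -> Option<i32> {
    let end = offset.checked_add(4)?;
    let s = buf.get(offset..end)?; // None if the read runs past the buffer
    Some(i32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Hypothetical helper: reads a signed 32-bit big-endian value at
/// `offset`, as an uppercase `.L` pointer would be read.
pub fn read_be_long(buf: &[u8], offset: usize) -> Option<i32> {
    let end = offset.checked_add(4)?;
    let s = buf.get(offset..end)?;
    Some(i32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}
```

For the buffer `[0x2A, 0x00, 0x00, 0x00, 0xFE, 0xFF, 0xFF, 0xFF]`, a little-endian read at offset 0 gives 42 and a big-endian read at the same offset gives 0x2A000000, so a test built on these bytes distinguishes the two byte orders on any host. The little-endian read at offset 4 gives -2 under the signed-by-default convention; an unsigned reading would give 4294967294, so the same buffer also catches a signedness regression.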
Diagnostic Symptoms and Cross-References#
Look for these symptoms when debugging similar parser bugs:
- `(0x3c.l)+4` parses as indirect with `adjustment=0` and a leftover `+4`: the parser looked for the adjustment inside the parens.
- Lowercase pointer specifiers (`.s`, `.l`, `.q`) produce `Endianness::Native` instead of `Endianness::Little`.
- Pointer types carry `signed: false`, mismatching libmagic's signed-by-default convention.
Related documents:
| Topic | Source |
|---|---|
| Evaluator implementation for indirect offsets | docs/solutions/logic-errors/indirect-offset-resolution.md |
| Parser-evaluator reachability sync | docs/solutions/integration-issues/indirect-offset-parser-evaluator-sync.md |
| Magic format specification | docs/MAGIC_FORMAT.md (lines 106–126) |
| Canonical gotchas checklist | GOTCHAS.md S3.5, S3.6, S3.7, S6.3 |
| Current parser implementation | src/parser/grammar/mod.rs |
| Issue | #37 |