Indirect Offset Advanced Syntax And Anchor-Relative Variants
The full indirect offset syntax in libmagic-rs extends well beyond the basic (base.type) form. OffsetSpec::Indirect carries three fields that together describe it: adjustment_op: IndirectAdjustmentOp, base_relative: bool, and result_relative: bool. These fields were added in v0.6.0 and use #[serde(default)], so older serialized AST snapshots still deserialize cleanly.
This article covers the syntax surface and operator semantics. For the 4-step evaluation pipeline, see Indirect Offset Resolution Pipeline. For parser-evaluator reachability, see Indirect Offset Parser-Evaluator Sync. For spec-derived test expectations, see Indirect Offset GNU File Semantic Correctness.
Adjustment Placement: Two Mutually Exclusive Forms
The adjustment operand can appear in exactly one of two positions per rule:
Form 1 — Inside the parens (canonical magic(5)):
(base.type+N), (base.type-N), (base.type*N), (base.type/N), (base.type%N), (base.type&N), (base.type|N), (base.type^N).
The full operator set — +, -, *, /, %, &, |, ^ — is valid here.
Form 2 — After the closing paren (legacy/alternate):
(base.type)+N, (base.type)-N.
Only + and - are accepted in this form. Use Form 1 for arithmetic beyond add/subtract.
Combining both forms in one rule (e.g., (19.b-1)+2) is not permitted. The parser enforces this: parse_inside_adjustment runs first; if it succeeds, parse_outside_adjustment is skipped. The chosen operator is stored in the adjustment_op field of OffsetSpec::Indirect; subtraction is folded into IndirectAdjustmentOp::Add with a negative operand.
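As a rough illustration of the placement rule above, the sketch below parses a simplified (N.X) string and returns the adjustment it finds. The names parse_adjustment_sketch and parse_op are hypothetical and are not the crate's parse_inside_adjustment or parse_outside_adjustment. The sketch ignores the anchor-relative & prefixes, represents operators as plain chars, and assumes that leftover text after a Form 1 match is reported as an error.

```rust
/// Parses one `<op><operand>` pair, or returns `Ok(None)` for empty input.
/// Hypothetical helper: `allowed` lists the operator characters valid here.
fn parse_op(text: &str, allowed: &str) -> Result<Option<(char, i64)>, String> {
    let mut chars = text.chars();
    let Some(op) = chars.next() else {
        return Ok(None);
    };
    if !allowed.contains(op) {
        return Err(format!("operator '{op}' is not allowed here"));
    }
    let operand: i64 = chars
        .as_str()
        .parse()
        .map_err(|_| format!("bad operand in '{text}'"))?;
    if op == '-' {
        // Subtraction is folded into addition with a negated operand.
        let negated = operand.checked_neg().ok_or("operand out of range")?;
        Ok(Some(('+', negated)))
    } else {
        Ok(Some((op, operand)))
    }
}

/// Returns the (operator, operand) pair of a simplified indirect offset,
/// trying the inside-parens form first and the outside form only if the
/// inside form found nothing.
pub fn parse_adjustment_sketch(spec: &str) -> Result<Option<(char, i64)>, String> {
    let rest = spec.strip_prefix('(').ok_or("expected '('")?;
    let close = rest.find(')').ok_or("expected ')'")?;
    let (inner, outer) = (&rest[..close], &rest[close + 1..]);

    // The specifier is the single character after '.'; whatever follows it
    // inside the parens is the Form 1 adjustment.
    let dot = inner.find('.').ok_or("expected '.'")?;
    let inside = inner.get(dot + 2..).unwrap_or("");

    let inside_adjustment = parse_op(inside, "+-*/%&|^")?;
    if inside_adjustment.is_some() {
        // Form 1 matched, so Form 2 is never attempted (assumed: trailing
        // text is then rejected).
        if !outer.is_empty() {
            return Err("adjustment given both inside and outside the parens".into());
        }
        return Ok(inside_adjustment);
    }
    // Form 2 accepts only '+' and '-'.
    parse_op(outer, "+-")
}
```

With this sketch, (19.b-1) and (19.b)+2 both produce an addition (with operands -1 and 2 respectively), while (19.b)*4 and (19.b-1)+2 are rejected.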
IndirectAdjustmentOp Enum
Defined in src/parser/ast.rs with #[derive(Default)] where Add is the #[default] variant:
| Variant | Magic syntax | Operand interpretation in apply_adjustment |
|---|---|---|
| Add | +N / -N | i64 signed; subtraction via negative operand |
| Mul | *N | reinterpreted as u64 bit pattern |
| Div | /N | reinterpreted as u64; zero operand → EvaluationError::InvalidOffset |
| Mod | %N | reinterpreted as u64; zero operand → EvaluationError::InvalidOffset |
| And | &N | reinterpreted as u64 bit pattern |
| Or | \|N | reinterpreted as u64 bit pattern |
| Xor | ^N | reinterpreted as u64 bit pattern |
Add uses signed semantics so that (N.X-1), which the parser encodes as Add(-1), performs subtraction correctly. All other operators reinterpret the i64 adjustment as a u64 bit pattern to match the raw-machine-word behavior of libmagic's apprentice.c::do_offset.
The Add path uses i64::unsigned_abs() so that an operand of i64::MIN does not trigger an overflow panic in debug builds. Mul also rejects integer overflow: a failed checked_mul produces EvaluationError::InvalidOffset.
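The operator semantics above can be sketched in a few lines of Rust. This is a minimal illustration, not the crate's apply_adjustment: the function name apply_adjustment_sketch and the InvalidOffset unit struct (standing in for EvaluationError::InvalidOffset) are hypothetical, and treating Add overflow or underflow as an error is an assumption.

```rust
/// Mirrors the variants in the table above; `Add` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum IndirectAdjustmentOp {
    #[default]
    Add,
    Mul,
    Div,
    Mod,
    And,
    Or,
    Xor,
}

/// Hypothetical stand-in for EvaluationError::InvalidOffset.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidOffset;

/// Applies `adjustment` to a pointer value already read from the buffer.
pub fn apply_adjustment_sketch(
    pointer: u64,
    op: IndirectAdjustmentOp,
    adjustment: i64,
) -> Result<u64, InvalidOffset> {
    // Every operator except Add sees the operand as a raw 64-bit pattern.
    let raw = adjustment as u64;
    match op {
        IndirectAdjustmentOp::Add => {
            // unsigned_abs() is well defined for i64::MIN, unlike abs().
            let magnitude = adjustment.unsigned_abs();
            if adjustment >= 0 {
                pointer.checked_add(magnitude).ok_or(InvalidOffset)
            } else {
                pointer.checked_sub(magnitude).ok_or(InvalidOffset)
            }
        }
        // checked_mul returns None on overflow.
        IndirectAdjustmentOp::Mul => pointer.checked_mul(raw).ok_or(InvalidOffset),
        // checked_div and checked_rem return None for a zero operand.
        IndirectAdjustmentOp::Div => pointer.checked_div(raw).ok_or(InvalidOffset),
        IndirectAdjustmentOp::Mod => pointer.checked_rem(raw).ok_or(InvalidOffset),
        IndirectAdjustmentOp::And => Ok(pointer & raw),
        IndirectAdjustmentOp::Or => Ok(pointer | raw),
        IndirectAdjustmentOp::Xor => Ok(pointer ^ raw),
    }
}
```

For example, a pointer value of 20 with Add and an operand of -1 yields 19, while Div with a zero operand yields the error value.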
Anchor-Relative Wrapper Variants
Two boolean flags on OffsetSpec::Indirect encode GNU file's anchor-relative forms, where the anchor is EvaluationContext::last_match_end():
base_relative: true — (&N.X) syntax
The pointer is read at anchor + base_offset: the base is shifted by the anchor before the pointer is read.
result_relative: true — &(N.X) syntax
The pointer is read at base_offset (absolute). The value read is then added to the anchor to produce the final offset.
Composition: Both flags can be set simultaneously. With both true, the pointer is read at anchor + base_offset, and then the result is added to the anchor. The grammar therefore covers all combinations:
| Magic syntax | base_relative | result_relative |
|---|---|---|
| (N.X) | false | false |
| (&N.X) | true | false |
| &(N.X) | false | true |
| &(&N.X) | true | true |
Adjustment forms compose with the anchor variants: (&N.X+adj), &(N.X)+adj, and similar combinations are all valid.
The parser sets base_relative when it detects a leading & inside the parens. result_relative is set in parse_offset, which detects the &( prefix before calling parse_indirect_offset.
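The four flag combinations can be traced with a small Rust sketch. It is deliberately simplified and is not the crate's resolve_indirect_offset_with_anchor: the function resolve_sketch is hypothetical, the pointer is a single unsigned byte (the real specifiers are signed and wider), and no adjustment is applied.

```rust
/// Resolves a simplified indirect offset whose pointer is one unsigned byte.
/// `anchor` plays the role of EvaluationContext::last_match_end().
pub fn resolve_sketch(
    buffer: &[u8],
    anchor: usize,
    base_offset: usize,
    base_relative: bool,
    result_relative: bool,
) -> Option<usize> {
    // (&N.X): shift the read address by the anchor before reading the pointer.
    let read_at = if base_relative {
        anchor.checked_add(base_offset)?
    } else {
        base_offset
    };
    let pointer = usize::from(*buffer.get(read_at)?);

    // &(N.X): add the anchor to the value that was read.
    let target = if result_relative {
        anchor.checked_add(pointer)?
    } else {
        pointer
    };

    // Final bounds check against the buffer.
    (target < buffer.len()).then_some(target)
}
```

With the buffer [4, 0, 3, 0, 0, 0, 0, 0] and an anchor of 2, the forms (0.b), (&0.b), &(0.b), and &(&0.b) resolve in this sketch to offsets 4, 3, 6, and 5 respectively.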
Pointer Specifier Table
pointer_specifier_to_type() maps the single-character specifier after . to a (TypeKind, Endianness) pair. All types are signed by default per GNU file semantics.
| Specifier | Width | Endianness | Signed |
|---|---|---|---|
| .b | 1 byte | Little | Yes |
| .B | 1 byte | Big | Yes |
| .s | 2 bytes | Little | Yes |
| .S | 2 bytes | Big | Yes |
| .l | 4 bytes | Little | Yes |
| .L | 4 bytes | Big | Yes |
| .q | 8 bytes | Little | Yes |
| .Q | 8 bytes | Big | Yes |
A signed pointer value that reads as negative (e.g., [0xFF, 0xFF, 0xFF, 0xFF] as .l, which is -1) is reinterpreted as a raw unsigned u64 via the *v as u64 cast in extract_raw_unsigned, yielding u64::MAX. The bounds check (step 4 of the pipeline) then rejects this out-of-range value.
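The cast behavior described above can be checked directly in Rust. The helper names raw_pointer_sketch and in_bounds_sketch are hypothetical and only illustrate the arithmetic. For the all-0xFF input the byte order makes no difference, so the choice of from_le_bytes here is incidental.

```rust
/// Reads four bytes as a signed 32-bit pointer, sign-extends it to i64,
/// then reinterprets the bits as u64 (the `as u64` cast the text describes).
pub fn raw_pointer_sketch(bytes: [u8; 4]) -> u64 {
    let signed = i64::from(i32::from_le_bytes(bytes));
    signed as u64
}

/// A simple bounds check: the offset must index into the buffer.
pub fn in_bounds_sketch(offset: u64, buffer_len: usize) -> bool {
    usize::try_from(offset).map_or(false, |o| o < buffer_len)
}
```

Here raw_pointer_sketch([0xFF; 4]) equals u64::MAX, and in_bounds_sketch rejects it for any realistic buffer length.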
Test Discipline for Cross-Platform Byte Buffers
Prefer big-endian specifiers (.L, .S, .Q) over native-byte-order specifiers (.l, .s, .q) when constructing test byte buffers. Big-endian layouts are deterministic across architectures; native-endian .l produces x86-specific byte sequences that break on big-endian targets.
Never use to_ne_bytes() in test fixtures for indirect offset buffers; use to_be_bytes() or explicit byte arrays instead.
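A fixture helper along these lines keeps test buffers identical on every architecture. The function name fixture_with_be_pointer is hypothetical. It writes a 4-byte big-endian pointer at offset 0, which is the layout a (0.L) rule would read according to the specifier table.

```rust
/// Builds a zero-filled buffer of at least four bytes whose first four bytes
/// hold `target` in big-endian order, independent of the host's endianness.
pub fn fixture_with_be_pointer(target: u32, len: usize) -> Vec<u8> {
    let mut buffer = vec![0u8; len.max(4)];
    buffer[..4].copy_from_slice(&target.to_be_bytes());
    buffer
}
```

Calling fixture_with_be_pointer(0x10, 32) produces a 32-byte buffer that starts with [0x00, 0x00, 0x00, 0x10] on both little-endian and big-endian hosts.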
Key Source Files
| File | Purpose |
|---|---|
| src/parser/ast.rs | IndirectAdjustmentOp enum (lines 118–193); OffsetSpec::Indirect fields (lines 240–275) |
| src/parser/grammar/mod.rs | parse_indirect_offset(), pointer_specifier_to_type() |
| src/evaluator/offset/indirect.rs | apply_adjustment(), resolve_indirect_offset_with_anchor() |
| GOTCHAS.md | S3.7 (adjustment forms, specifier mapping), S6.3 (signed-by-default) |
| AGENTS.md | Current Limitations / Offset Specifications |
| tests/indirect_offset_integration.rs | End-to-end tests covering all specifiers and adjustment forms |