Meta-Type Subroutine Dispatch Architecture
Type: Topic
Status: Published
Created: Apr 25, 2026
Updated: Apr 25, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

Meta-Type Subroutine Dispatch Architecture#

Overview#

The magic(5) format includes six meta-type directives (name, use, default, clear, indirect, and offset) that are control-flow rather than value-reading. They do not fit the standard evaluator pipeline (resolve offset → read typed value → apply operator → produce RuleMatch) because there is no buffer read and, in most cases, no value to compare.

These are represented as TypeKind::Meta(MetaType) in the AST, where MetaType has the following variants:

| Variant | Role |
|---|---|
| Name(id) | Declares a subroutine; hoisted out of the rule tree at load time |
| Use(id) | Invokes a named subroutine; splices its matches inline |
| Default | Fires when no sibling at the same level has matched |
| Clear | Resets the per-level sibling-matched flag |
| Indirect | Re-applies the entire root rule set at a resolved offset |
| Offset | Emits the resolved file position as Value::Uint for printf substitution |

TypeKind::Meta(_) returns None from bit_width() because meta-types consume zero on-disk bytes.
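As a rough illustration of the shape described above, here is a minimal sketch. The `Byte` and `Long` variants, the `String` payloads, and the `Option<u32>` return type are assumptions made for the example; the real `TypeKind` is larger and may differ in its details.

```rust
/// Simplified stand-in for the six meta-type variants in the table above.
/// The String payload for Name/Use is an assumption for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MetaType {
    Name(String),
    Use(String),
    Default,
    Clear,
    Indirect,
    Offset,
}

/// Hypothetical, trimmed-down TypeKind: two ordinary typed reads for
/// contrast, plus the Meta wrapper.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TypeKind {
    Byte,
    Long,
    Meta(MetaType),
}

impl TypeKind {
    /// Width in bits of the on-disk value, or None when nothing is read.
    pub fn bit_width(&self) -> Option<u32> {
        match self {
            TypeKind::Byte => Some(8),
            TypeKind::Long => Some(32),
            // Meta-types are control flow: they consume zero on-disk bytes.
            TypeKind::Meta(_) => None,
        }
    }
}
```

Callers that size a read from `bit_width()` can treat `None` as "no buffer access for this rule".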

Support shipped in v0.5.0 via PR #230 (issue #42). Key source files:

Three-Layer Pattern#

Every meta-type directive is handled by the same three-layer architecture. Apply this pattern when adding any new MetaType variant.

Layer 1 — Parse-time extraction, not runtime lookup#

extract_name_table partitions the top-level rule list at load time: Meta(Name(_)) rules are hoisted into a NameTable (a HashMap<String, Arc<[MagicRule]>>) and removed from ParsedMagic::rules. The evaluator's hot loop never sees a Name rule. Consequences:

  • Duplicate name declarations keep the first and emit warn! once per load, not once per buffer evaluation.
  • Nested Name rules (not well-defined in magic(5)) are scrubbed from the tree with a warn! during extraction.
  • Subroutine bodies are strength-sorted once at load time so use-site evaluation is deterministic regardless of source order.
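The load-time partition can be sketched roughly as follows. This is not the project's `extract_name_table`: the `Rule` and `Kind` types are hypothetical reductions, warnings are collected into a `Vec<String>` instead of going through `warn!`, the sort is assumed to be descending by strength, and nested-Name scrubbing is omitted.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical reduced rule kind: only the distinction that matters here.
#[derive(Debug, Clone, PartialEq)]
pub enum Kind {
    Name(String),
    Other,
}

/// Hypothetical reduced rule.
#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub kind: Kind,
    pub strength: u32,
    pub children: Vec<Rule>,
}

pub type NameTable = HashMap<String, Arc<[Rule]>>;

/// Splits top-level rules into (remaining rules, name table, warnings).
pub fn extract_name_table(rules: Vec<Rule>) -> (Vec<Rule>, NameTable, Vec<String>) {
    let mut table = NameTable::new();
    let mut kept = Vec::new();
    let mut warnings = Vec::new();

    for rule in rules {
        let Rule { kind, strength, children } = rule;
        match kind {
            Kind::Name(id) => {
                // First declaration wins; later duplicates are dropped
                // with one warning per load.
                if table.contains_key(&id) {
                    warnings.push(format!("duplicate name '{id}' ignored"));
                    continue;
                }
                // Sort the body once so use-site evaluation order does not
                // depend on source order (descending order is an assumption).
                let mut body = children;
                body.sort_by(|a, b| b.strength.cmp(&a.strength));
                table.insert(id, Arc::from(body));
            }
            // Everything else stays in the rule list the evaluator sees.
            other => kept.push(Rule { kind: other, strength, children }),
        }
    }
    (kept, table, warnings)
}
```

Because the table is built once, the evaluator's loop only ever performs a hash lookup at a use site.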

Layer 2 — ParsedMagic as the parser return type#

parse_text_magic_file, load_magic_file, and load_magic_directory return Result<ParsedMagic, ParseError> instead of Result<Vec<MagicRule>, ParseError>. ParsedMagic carries:

pub struct ParsedMagic {
    pub rules: Vec<MagicRule>, // top-level rules; Name blocks removed
    pub(crate) name_table: NameTable, // extracted subroutine definitions
}

Directory loads merge per-file name tables with a first-wins policy: earlier Magdir/ files shadow later ones, matching GNU file semantics, and each shadowed definition is logged at warn!.
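A first-wins merge of this kind might look like the sketch below. The function name `merge_first_wins` and the generic value type are hypothetical; the shadowed names are returned to the caller, where the real loader would log them.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Merges `next` into `merged`, keeping existing entries on conflict.
/// Returns the names from `next` that were shadowed, sorted for stable output.
pub fn merge_first_wins<V>(
    merged: &mut HashMap<String, V>,
    next: HashMap<String, V>,
) -> Vec<String> {
    let mut shadowed = Vec::new();
    for (name, body) in next {
        match merged.entry(name) {
            // An earlier file already defined this name: keep it.
            Entry::Occupied(existing) => shadowed.push(existing.key().clone()),
            // First definition seen: record it.
            Entry::Vacant(slot) => {
                slot.insert(body);
            }
        }
    }
    shadowed.sort();
    shadowed
}
```

Calling it once per file, in directory order, gives the earliest definition of each name priority.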

Layer 3 — Optional RuleEnvironment threaded through EvaluationContext#

RuleEnvironment carries the two whole-database concerns the evaluator needs:

pub(crate) struct RuleEnvironment {
    name_table: Arc<NameTable>,
    root_rules: Arc<[MagicRule]>,
}

The environment is stored in EvaluationContext::rule_env, typed Option<Arc<RuleEnvironment>>. MagicDatabase::evaluate_file attaches it; programmatic consumers (evaluate_rules_with_config, property tests, fuzz harnesses) default to None. Use and Indirect rules become silent Ok(vec![]) no-ops when rule_env is None. Arc (not &) is required because the context outlives individual rule borrows.
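The Option gate can be illustrated with a reduced sketch. The types below are hypothetical stand-ins (subroutine bodies are reduced to message strings, and the error type is a plain `String`), and treating an unknown name as a silent no-op is an assumption of the example, not something the text above specifies.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub struct RuleMatch {
    pub message: String,
}

/// Hypothetical reduced environment: bodies are just messages here.
pub struct RuleEnvironment {
    pub name_table: HashMap<String, Arc<[String]>>,
}

#[derive(Default)]
pub struct EvaluationContext {
    pub rule_env: Option<Arc<RuleEnvironment>>,
}

/// Evaluates a `use` rule, or does nothing when no environment is attached.
pub fn evaluate_use(ctx: &EvaluationContext, name: &str) -> Result<Vec<RuleMatch>, String> {
    // No environment (property tests, fuzz harnesses): silent no-op.
    let Some(env) = ctx.rule_env.as_ref() else {
        return Ok(vec![]);
    };
    // Unknown subroutine: treated as a no-op in this sketch (assumption).
    let Some(body) = env.name_table.get(name) else {
        return Ok(vec![]);
    };
    Ok(body
        .iter()
        .map(|m| RuleMatch { message: m.clone() })
        .collect())
}
```

A context built with `EvaluationContext::default()` exercises the zero-environment path without any setup.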

EvaluationContext also carries base_offset: usize — per-frame state that biases positive OffsetSpec::Absolute(n) resolution inside a Use subroutine body so >N rules resolve relative to the use-site (magic(5) S3.10 semantics). base_offset is saved and restored via the SubroutineScope RAII guard.
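The biasing rule can be sketched as a small pure function. The name `resolve_absolute`, the `i64` offset, and the from-end handling of negative offsets are assumptions for illustration; only the "non-negative absolute offsets are shifted by base_offset" behaviour comes from the description above.

```rust
/// Resolves an absolute offset against a buffer of `buffer_len` bytes.
/// Non-negative offsets are biased by `base_offset` (0 outside a subroutine).
pub fn resolve_absolute(n: i64, base_offset: usize, buffer_len: usize) -> Option<usize> {
    if n >= 0 {
        // Inside a `use` body, base_offset is the use-site offset.
        base_offset.checked_add(usize::try_from(n).ok()?)
    } else {
        // Assumption: negative offsets count back from the end of the
        // buffer and are not biased.
        buffer_len.checked_sub(usize::try_from(n.unsigned_abs()).ok()?)
    }
}
```

With `base_offset` left at 0 the function behaves like an ordinary absolute lookup, which is why top-level rules are unaffected.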

parse_text_magic_file / load_magic_directory
         │
  extract_name_table
         │
  ┌──────┴──────────┐
  │ ParsedMagic     │
  │   rules         │ ──▶ MagicDatabase.rules
  │   name_table    │ ──▶ Arc<NameTable> ───┐
  └─────────────────┘                       │
                       root_rules (Arc) ────┤
                                            ▼
                                     RuleEnvironment
                              attached to EvaluationContext

Dispatch Decision Matrix#

Meta-types split into two dispatch sites inside the evaluator:

| Dispatch site | Variants | Reason |
|---|---|---|
| Inline in evaluate_rules loop body | Use, Indirect, Default, Clear | Mutates the match vector or sibling_matched flag (state that spans multiple rule evaluations at the same level) |
| Meta(_) wildcard arm in evaluate_single_rule_with_anchor | Name, Offset | Self-contained: produces at most one RuleMatch from a single rule; does not alter sibling or match-vector state |
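The matrix can also be written down as an exhaustive match, which is one way to make the compiler ask for a decision whenever a variant is added. The `DispatchSite` enum and `dispatch_site` function are hypothetical helpers for this sketch and are not described as part of the evaluator.

```rust
/// Simplified stand-in for the meta-type variants (payload types assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MetaType {
    Name(String),
    Use(String),
    Default,
    Clear,
    Indirect,
    Offset,
}

/// Hypothetical label for the two dispatch sites in the matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchSite {
    /// Inline in the evaluate_rules loop body.
    RulesLoop,
    /// Meta(_) wildcard arm in evaluate_single_rule_with_anchor.
    SingleRuleWildcardArm,
}

pub fn dispatch_site(meta: &MetaType) -> DispatchSite {
    // No wildcard arm on purpose: a new variant fails to compile until
    // it is classified here.
    match meta {
        MetaType::Use(_) | MetaType::Indirect | MetaType::Default | MetaType::Clear => {
            DispatchSite::RulesLoop
        }
        MetaType::Name(_) | MetaType::Offset => DispatchSite::SingleRuleWildcardArm,
    }
}
```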

Use dispatch (loop-level)#

Located in evaluate_rules (lines 951–1010). When the name is found in RuleEnvironment::name_table:

  1. SubroutineScope saves last_match_end and base_offset; seeds base_offset with the use-site offset.
  2. The subroutine's child rules are evaluated with the biased context.
  3. Subroutine matches are spliced into the caller's match vector in document order; the caller's anchor is re-advanced to the use-site offset so siblings see use as having consumed that position.
  4. The use rule's own children (continuation rules at deeper indentation) are then evaluated under a RecursionGuard — skipping them was an early bug that silently broke valid libmagic chains.
  5. SubroutineScope drops, restoring last_match_end and base_offset even when RecursionLimitExceeded propagates.
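Steps 1 and 5 rely on a save-and-restore guard. The sketch below shows the general RAII shape under simplifying assumptions: the context is reduced to the two saved fields, and `enter`, `context`, and `run_subroutine` are hypothetical names, not the project's actual API.

```rust
/// Hypothetical reduced context: only the two fields the guard touches.
pub struct EvaluationContext {
    pub last_match_end: usize,
    pub base_offset: usize,
}

/// Saves both fields on entry and restores them on drop.
pub struct SubroutineScope<'a> {
    ctx: &'a mut EvaluationContext,
    saved_last_match_end: usize,
    saved_base_offset: usize,
}

impl<'a> SubroutineScope<'a> {
    pub fn enter(ctx: &'a mut EvaluationContext, use_site: usize) -> Self {
        let saved_last_match_end = ctx.last_match_end;
        let saved_base_offset = ctx.base_offset;
        // Seed the bias so offsets in the body resolve from the use site.
        ctx.base_offset = use_site;
        Self { ctx, saved_last_match_end, saved_base_offset }
    }

    pub fn context(&mut self) -> &mut EvaluationContext {
        &mut *self.ctx
    }
}

impl Drop for SubroutineScope<'_> {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns with Err.
        self.ctx.last_match_end = self.saved_last_match_end;
        self.ctx.base_offset = self.saved_base_offset;
    }
}

/// Stand-in for evaluating a subroutine body; `fail` simulates an error
/// such as a recursion limit being hit part-way through.
pub fn run_subroutine(
    ctx: &mut EvaluationContext,
    use_site: usize,
    fail: bool,
) -> Result<usize, String> {
    let mut scope = SubroutineScope::enter(ctx, use_site);
    let inner = scope.context();
    inner.last_match_end = use_site + 4; // the body moves the anchor
    if fail {
        return Err("recursion limit exceeded".to_string());
    }
    Ok(inner.base_offset)
}
```

Whether `run_subroutine` returns `Ok` or `Err`, the caller sees its original `last_match_end` and `base_offset` afterwards.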

Mutual recursion (a use b; b use a) is caught by RecursionGuard::enter(context)? and surfaced as EvaluationError::RecursionLimitExceeded.
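A depth guard of this kind can be sketched as follows. The field names `recursion_depth` and `max_depth`, the unit-like error variant, and the `mutual_use` driver are assumptions for the example; the real guard and error type may carry more information.

```rust
#[derive(Debug, PartialEq)]
pub enum EvaluationError {
    RecursionLimitExceeded,
}

/// Hypothetical reduced context: only the depth bookkeeping.
pub struct EvaluationContext {
    pub recursion_depth: usize,
    pub max_depth: usize,
}

pub struct RecursionGuard<'a> {
    ctx: &'a mut EvaluationContext,
}

impl<'a> RecursionGuard<'a> {
    /// Increments the depth, or fails if the limit is already reached.
    pub fn enter(ctx: &'a mut EvaluationContext) -> Result<Self, EvaluationError> {
        if ctx.recursion_depth >= ctx.max_depth {
            return Err(EvaluationError::RecursionLimitExceeded);
        }
        ctx.recursion_depth += 1;
        Ok(Self { ctx })
    }

    pub fn context(&mut self) -> &mut EvaluationContext {
        &mut *self.ctx
    }
}

impl Drop for RecursionGuard<'_> {
    fn drop(&mut self) {
        // Restores the depth on every exit path, including errors.
        self.ctx.recursion_depth -= 1;
    }
}

/// Models `a use b; b use a`: every hop re-enters through the guard.
pub fn mutual_use(ctx: &mut EvaluationContext) -> Result<(), EvaluationError> {
    let mut guard = RecursionGuard::enter(ctx)?;
    mutual_use(guard.context())
}
```

The cycle ends with an error instead of a stack overflow, and the depth counter is back at its starting value once the error has propagated.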

Indirect dispatch (loop-level)#

Located at lines 750–864. Uses AnchorScope (not SubroutineScope) because indirect must not restore last_match_end to the caller's value — it re-enters at a computed position and any matches produced belong to the new frame. RuleEnvironment::root_rules provides the full rule set for re-entry.

Default / Clear dispatch (loop-level)#

Clear (lines 704–707) resets a frame-local sibling_matched: bool variable inside evaluate_rules. Default (lines 714–740) fires only when !sibling_matched. This flag is explicitly not a field on EvaluationContext — its lifetime is exactly one recursion frame, and making it a context field would break re-entrant evaluation.
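The interaction between the flag, Clear, and Default can be sketched over a single sibling list. The `Step` enum and `fired_defaults` function are hypothetical; whether a fired Default itself marks the level as matched is not stated above, so the sketch leaves the flag untouched in that case.

```rust
/// Hypothetical reduction of one sibling rule.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    /// An ordinary rule; the bool says whether it matched.
    Rule(bool),
    Default,
    Clear,
}

/// Returns the indices of the `default` rules that fire in one sibling list.
pub fn fired_defaults(siblings: &[Step]) -> Vec<usize> {
    // Frame-local: a recursive call for child rules would get its own flag.
    let mut sibling_matched = false;
    let mut fired = Vec::new();
    for (index, step) in siblings.iter().enumerate() {
        match step {
            Step::Rule(matched) => sibling_matched |= *matched,
            Step::Clear => sibling_matched = false,
            Step::Default => {
                if !sibling_matched {
                    fired.push(index);
                }
            }
        }
    }
    fired
}
```

Because the flag is a local variable, nested evaluation cannot leak a child level's matches into its parent's Default decision.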

Offset dispatch (wildcard arm)#

Located at lines 878–936. Resolves the rule's offset, records a RuleMatch with Value::Uint(resolved_offset), and evaluates children. Printf substitution (%lld, %d) is performed later by format_magic_message in src/output/format.rs, not here.
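The split between recording the value and formatting it can be illustrated with a reduced sketch. The `RuleMatch` fields, `evaluate_offset_rule`, and `render` are hypothetical; in particular `render` is a naive stand-in for format_magic_message and only handles the two specifiers mentioned above.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Uint(u64),
}

/// Hypothetical reduced match record.
#[derive(Debug, Clone, PartialEq)]
pub struct RuleMatch {
    pub offset: usize,
    pub value: Value,
    pub message: String,
}

/// Evaluation step: no buffer read, the resolved offset is the value.
pub fn evaluate_offset_rule(resolved_offset: usize, message: &str) -> RuleMatch {
    RuleMatch {
        offset: resolved_offset,
        value: Value::Uint(resolved_offset as u64),
        message: message.to_string(), // left unformatted at this stage
    }
}

/// Output step: naive substitution, replacing the longer specifier first.
pub fn render(m: &RuleMatch) -> String {
    let Value::Uint(n) = &m.value;
    let text = n.to_string();
    m.message.replace("%lld", &text).replace("%d", &text)
}
```

Keeping the message unformatted in the match record means the evaluator stays independent of output formatting.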

Continuation-sibling anchor reset#

At recursion_depth > 0, each sibling's &N resolves against the parent-level entry anchor; at depth 0 the anchor chains per S3.8. This is controlled by the is_child_sibling_list gate inside evaluate_rules.

Rejected Alternatives#

Four alternatives were explicitly rejected when designing this architecture. They are load-bearing context for future MetaType work.

1. Runtime lookup in the hot loop. Scanning the full rule tree for Name declarations on every Use evaluation turns O(N) dispatch into amortized quadratic cost and moves duplicate-name detection from per-load to per-buffer. Rejected in favor of Layer 1 parse-time extraction.

2. Non-optional RuleEnvironment / required arg on evaluate_rules. Would force every property test, fuzz harness, and integration test to synthesize an empty environment. rule_env: Option<Arc<RuleEnvironment>> keeps the zero-environment path as the default. Tighten to required only if use ever needs to enforce "must have an environment" as a contract.

3. debug_assert! that Name rules never reach the evaluator. prop_arbitrary_rule_evaluation_never_panics synthesizes arbitrary TypeKind instances including Meta(Name(_)). A debug_assert! would break the never-panics invariant on test builds. The Meta(Name(_)) arm in evaluate_single_rule_with_anchor uses debug! logging instead (lines 242–252).

4. Dispatching Use through evaluate_single_rule_with_anchor. That helper returns Result<Option<(usize, Value)>, _> — one match, one value. Use produces a vector of child matches that must be spliced into the caller's match buffer in document order. Use must stay at the evaluate_rules level.

Adding a New MetaType Variant#

Follow this checklist for every new MetaType variant:

1. Decide the dispatch site. Ask: does this variant mutate the match vector or the sibling_matched flag, or reference state outside the single rule being evaluated?

  • Yes → inline loop-level dispatch inside evaluate_rules (like Use, Indirect, Default, Clear)
  • No → handle it in the Meta(_) wildcard arm of evaluate_single_rule_with_anchor (like Name, Offset)

2. Use RecursionGuard::enter(context)? (mod.rs lines 371–374) for any variant that recurses back into the evaluator. Manual increment/decrement is error-prone; the guard restores depth on all exit paths including errors.

3. Choose the right scope guard if the variant saves/restores context state:

  • SubroutineScope (lines 84–113) — saves and restores both last_match_end and base_offset (use for Use-style subroutine frames)
  • AnchorScope (lines 36–68) — saves and restores anchor and base_offset but does not restore last_match_end (use for Indirect-style re-entry)

4. Log leaked Name-style variants with debug!, not debug_assert!. Property tests synthesize arbitrary TypeKind values including new Meta(_) variants. A debug_assert! in the no-op arm breaks the never-panics invariant.

5. If the variant uses whole-database state (e.g., needs a rule list or lookup table), thread it through RuleEnvironment and make access Option-gated with a silent Ok(vec![]) no-op when rule_env is None.

6. Add a smoke test. The existing fixture tests/meta_types_integration.rs covers name/use, default/clear, and indirect. Add a synthetic or fixture-based test that verifies the new variant does not silently skip its own children. The searchbug.magic assertion (result.description.starts_with("Testfmt")) is the canonical guard for subroutine dispatch + continuation rules.

Related documents:

  • [docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md] — SubroutineScope implementation detail
  • [docs/solutions/integration-issues/indirect-offset-parser-evaluator-sync.md] — closest sibling pattern (indirect offset semantics)
  • [docs/solutions/integration-issues/implementing-variable-width-typekind-variant.md] — adding a TypeKind variant beyond fixed-shape read_typed_value
  • GOTCHAS S2.1, S2.11, S3, S3.8, S3.10, S13, S14.2