# Meta-Type Subroutine Dispatch Architecture
## Overview
The magic(5) format includes six meta-type directives (`name`, `use`, `default`, `clear`, `indirect`, and `offset`) that control evaluation flow rather than read values. They do not fit the standard evaluator pipeline (resolve offset → read typed value → apply operator → produce `RuleMatch`) because there is no buffer read and, in most cases, no value to compare.
These are represented in the AST as `TypeKind::Meta(MetaType)`, where `MetaType` has the following variants:

| Variant | Role |
|---|---|
| `Name(id)` | Declares a subroutine; hoisted out of the rule tree at load time |
| `Use(id)` | Invokes a named subroutine; splices its matches inline |
| `Default` | Fires when no sibling at the same level has matched |
| `Clear` | Resets the per-level sibling-matched flag |
| `Indirect` | Re-applies the entire root rule set at a resolved offset |
| `Offset` | Emits the resolved file position as `Value::Uint` for printf substitution |

TypeKind::Meta(_) returns None from bit_width() because meta-types consume zero on-disk bytes.
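The following is a minimal sketch of that shape. It assumes `String` identifiers for `Name`/`Use`. The `Byte` and `Long` variants are hypothetical placeholders for value-reading types, not the crate's real `TypeKind` list:

```rust
/// Meta-type directives: control flow only, never a buffer read.
/// (Assumption: identifiers are plain `String`s in this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub enum MetaType {
    Name(String),
    Use(String),
    Default,
    Clear,
    Indirect,
    Offset,
}

/// Hypothetical, trimmed-down `TypeKind`: two placeholder value types plus `Meta`.
#[derive(Debug, Clone, PartialEq)]
pub enum TypeKind {
    Byte,
    Long,
    Meta(MetaType),
}

impl TypeKind {
    /// Width in bits of the on-disk value this type reads.
    /// Meta-types consume zero bytes, so they report `None` rather than `Some(0)`.
    pub fn bit_width(&self) -> Option<u32> {
        match self {
            TypeKind::Byte => Some(8),
            TypeKind::Long => Some(32),
            TypeKind::Meta(_) => None,
        }
    }
}
```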
Support shipped in v0.5.0 via PR #230 (issue #42). Key source files:

- `src/parser/name_table.rs`: load-time `Name` subroutine extraction
- `src/parser/mod.rs`: `ParsedMagic` struct and parser entry points
- `src/evaluator/mod.rs`: `RuleEnvironment` and `EvaluationContext`
- `src/evaluator/engine/mod.rs`: all meta-type dispatch arms and RAII scope guards
- `tests/meta_types_integration.rs`: integration smoke tests

## Three-Layer Pattern
Every meta-type directive is handled by the same three-layer architecture. Apply this pattern when adding any new MetaType variant.
### Layer 1: Parse-time extraction, not runtime lookup
`extract_name_table` partitions the top-level rule list at load time. `Meta(Name(_))` rules are hoisted into a `NameTable` (a `HashMap<String, Arc<[MagicRule]>>`) and removed from `ParsedMagic::rules`. The evaluator's hot loop never sees a `Name` rule. Consequences:

- Duplicate `name` declarations keep the first definition and emit `warn!` once per load, not once per buffer evaluation.
- Nested `Name` rules (not well-defined in magic(5)) are scrubbed from the tree with a `warn!` during extraction.
- Subroutine bodies are strength-sorted once at load time, so `use`-site evaluation is deterministic regardless of source order.

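The extraction pass can be sketched as follows. `MagicRule` here is a pared-down, hypothetical stand-in: a `name_decl` field marks `name` rules, `strength` approximates libmagic rule strength, and warnings are collected into a `Vec` instead of being logged with `warn!`. The sketch illustrates the partition, first-wins, scrub, and sort steps. It does not reproduce `src/parser/name_table.rs`:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified rule: `name_decl` is `Some(id)` for `name id` rules.
#[derive(Debug, Clone, PartialEq)]
pub struct MagicRule {
    pub name_decl: Option<String>,
    pub strength: u32,
    pub message: String,
    pub children: Vec<MagicRule>,
}

pub type NameTable = HashMap<String, Arc<[MagicRule]>>;

/// Hoist top-level `name` blocks into the table (first definition wins),
/// scrub nested `name` rules, and strength-sort each subroutine body once.
/// Returns (remaining top-level rules, table, warnings standing in for `warn!`).
pub fn extract_name_table(rules: Vec<MagicRule>) -> (Vec<MagicRule>, NameTable, Vec<String>) {
    let mut table = NameTable::new();
    let mut kept = Vec::new();
    let mut warnings = Vec::new();
    for mut rule in rules {
        scrub_nested_names(&mut rule.children, &mut warnings);
        match rule.name_decl.take() {
            Some(id) if table.contains_key(&id) => {
                warnings.push(format!("duplicate name `{id}`; keeping first"));
            }
            Some(id) => {
                let mut body = rule.children;
                // Highest strength first; `sort_by` is stable, so ties keep source order.
                body.sort_by(|a, b| b.strength.cmp(&a.strength));
                table.insert(id, body.into());
            }
            None => kept.push(rule),
        }
    }
    (kept, table, warnings)
}

/// Recursively drop `name` rules that appear below the top level.
fn scrub_nested_names(children: &mut Vec<MagicRule>, warnings: &mut Vec<String>) {
    children.retain(|c| match &c.name_decl {
        Some(id) => {
            warnings.push(format!("nested name `{id}` removed"));
            false
        }
        None => true,
    });
    for c in children.iter_mut() {
        scrub_nested_names(&mut c.children, warnings);
    }
}
```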
### Layer 2: `ParsedMagic` as the parser return type
`parse_text_magic_file`, `load_magic_file`, and `load_magic_directory` return `Result<ParsedMagic, ParseError>` instead of `Result<Vec<MagicRule>, ParseError>`. `ParsedMagic` carries:

```rust
pub struct ParsedMagic {
    pub rules: Vec<MagicRule>,        // top-level rules; Name blocks removed
    pub(crate) name_table: NameTable, // extracted subroutine definitions
}
```

Directory loads merge per-file name tables with first-wins policy: earlier Magdir/ files shadow later ones, matching GNU file semantics, with shadowed definitions logged at warn!.
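Here is a sketch of the first-wins merge. It makes two assumptions: per-file tables arrive already ordered by `Magdir/` load order, and subroutine bodies are simplified to lists of strings:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified table: subroutine bodies are just rule messages here.
pub type NameTable = HashMap<String, Arc<[String]>>;

/// Merge per-file tables in load order. The first definition of a name wins;
/// later ones are shadowed and reported (a stand-in for `warn!`).
pub fn merge_name_tables(per_file: Vec<(String, NameTable)>) -> (NameTable, Vec<String>) {
    let mut merged = NameTable::new();
    let mut shadowed = Vec::new();
    for (file, table) in per_file {
        for (name, body) in table {
            if merged.contains_key(&name) {
                shadowed.push(format!("{file}: `{name}` shadowed by an earlier file"));
            } else {
                merged.insert(name, body);
            }
        }
    }
    (merged, shadowed)
}
```

Iteration order inside one file's table does not matter, because a single table already holds each name at most once.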
### Layer 3: Optional `RuleEnvironment` threaded through `EvaluationContext`
RuleEnvironment carries the two whole-database concerns the evaluator needs:
```rust
pub(crate) struct RuleEnvironment {
    name_table: Arc<NameTable>,
    root_rules: Arc<[MagicRule]>,
}
```
`EvaluationContext` holds the environment as `rule_env: Option<Arc<RuleEnvironment>>`. `MagicDatabase::evaluate_file` attaches the environment. Programmatic consumers (`evaluate_rules_with_config`, property tests, fuzz harnesses) default to `None`. `Use` and `Indirect` rules become silent `Ok(vec![])` no-ops when `rule_env` is `None`. `Arc` (not `&`) is required because the context outlives individual rule borrows.
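The `Option` gate might look like the sketch below. The struct fields, the `eval_use` helper, and the choice to treat an unknown name as an empty result are all assumptions of this example, not the crate's API:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical environment: subroutine bodies simplified to message lists.
pub struct RuleEnvironment {
    name_table: Arc<HashMap<String, Vec<String>>>,
}

pub struct EvaluationContext {
    pub rule_env: Option<Arc<RuleEnvironment>>,
}

/// `use`-style lookup gated on the optional environment.
pub fn eval_use(ctx: &EvaluationContext, id: &str) -> Result<Vec<String>, String> {
    // No environment (property tests, fuzzers, programmatic callers): silent no-op.
    let Some(env) = ctx.rule_env.as_ref() else {
        return Ok(vec![]);
    };
    // Assumption for this sketch: an unknown subroutine name also yields no matches.
    Ok(env.name_table.get(id).cloned().unwrap_or_default())
}
```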
EvaluationContext also carries base_offset: usize — per-frame state that biases positive OffsetSpec::Absolute(n) resolution inside a Use subroutine body so >N rules resolve relative to the use-site (magic(5) S3.10 semantics). base_offset is saved and restored via the SubroutineScope RAII guard.
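Consider a rule at `>4` inside a subroutine invoked at file offset 16. With the bias applied, that rule reads from offset 20. The sketch below models only `OffsetSpec::Absolute`. One assumption is made here: negative values are counted back from the end of the buffer and are not biased:

```rust
/// Hypothetical subset of offset specs; only `Absolute` is modeled.
pub enum OffsetSpec {
    Absolute(i64),
}

/// Resolve an offset inside a frame whose use-site sits at `base_offset`.
pub fn resolve(spec: &OffsetSpec, base_offset: usize, buf_len: usize) -> Option<usize> {
    match *spec {
        // Positive absolute offsets are biased by the use-site offset.
        OffsetSpec::Absolute(n) if n >= 0 => base_offset.checked_add(n as usize),
        // Assumption: negative offsets count back from the end, unbiased.
        OffsetSpec::Absolute(n) => buf_len.checked_sub(n.unsigned_abs() as usize),
    }
}
```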
```text
parse_text_magic_file / load_magic_directory
                │
                ▼
       extract_name_table
                │
       ┌────────┴────────┐
       │ ParsedMagic     │
       │   rules         │ ──▶ MagicDatabase.rules
       │   name_table    │ ──▶ Arc<NameTable> ──┐
       └─────────────────┘                      │
                          root_rules (Arc) ─────┤
                                                ▼
                                         RuleEnvironment
                                  attached to EvaluationContext
```
## Dispatch Decision Matrix
Meta-types split into two dispatch sites inside the evaluator:

| Dispatch site | Variants | Reason |
|---|---|---|
| Inline in `evaluate_rules` loop body | `Use`, `Indirect`, `Default`, `Clear` | Mutates the match vector or the `sibling_matched` flag, which is state spanning multiple rule evaluations at the same level |
| `Meta(_)` wildcard arm in `evaluate_single_rule_with_anchor` | `Name`, `Offset` | Self-contained: produces at most one `RuleMatch` from a single rule; does not alter sibling or match-vector state |

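The matrix can be encoded as an exhaustive `match`, shown below with a hypothetical `dispatch_site` helper. The helper has no `_` arm. As a result, adding a `MetaType` variant fails to compile until someone decides where that variant is dispatched:

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum MetaType {
    Name(String),
    Use(String),
    Default,
    Clear,
    Indirect,
    Offset,
}

#[derive(Debug, PartialEq)]
pub enum DispatchSite {
    /// Handled inline in the `evaluate_rules` loop body.
    LoopLevel,
    /// Handled by the `Meta(_)` arm of `evaluate_single_rule_with_anchor`.
    SingleRule,
}

/// Variants that touch the match vector or sibling flag must be loop-level.
pub fn dispatch_site(meta: &MetaType) -> DispatchSite {
    match meta {
        MetaType::Use(_) | MetaType::Indirect | MetaType::Default | MetaType::Clear => {
            DispatchSite::LoopLevel
        }
        MetaType::Name(_) | MetaType::Offset => DispatchSite::SingleRule,
    }
}
```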
### `Use` dispatch (loop-level)
Located at `evaluate_rules` lines 951–1010. On a cache hit in `RuleEnvironment::name_table`:

1. `SubroutineScope` saves `last_match_end` and `base_offset`, then seeds `base_offset` with the use-site offset.
2. The subroutine's child rules are evaluated with the biased context.
3. Subroutine matches are spliced into the caller's match vector in document order. The caller's anchor is re-advanced to the use-site offset, so siblings see `use` as having consumed that position.
4. The `use` rule's own `children` (continuation rules at deeper indentation) are then evaluated under a `RecursionGuard`. Skipping them was an early bug that silently broke valid libmagic chains.
5. `SubroutineScope` drops, restoring `last_match_end` and `base_offset` even when `RecursionLimitExceeded` propagates.

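Here is a minimal sketch of the save/seed/restore behavior of a `SubroutineScope`-style RAII guard. The two-field `EvaluationContext` and `call_subroutine` are hypothetical simplifications. The point of the sketch is that `Drop` restores the caller's state on the error path as well as the success path:

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical subset of evaluator state touched by a `use` frame.
pub struct EvaluationContext {
    pub last_match_end: usize,
    pub base_offset: usize,
}

pub struct SubroutineScope<'a> {
    ctx: &'a mut EvaluationContext,
    saved_last_match_end: usize,
    saved_base_offset: usize,
}

impl<'a> SubroutineScope<'a> {
    /// Save both fields, then bias `base_offset` to the use-site offset.
    pub fn enter(ctx: &'a mut EvaluationContext, use_site: usize) -> Self {
        let saved_last_match_end = ctx.last_match_end;
        let saved_base_offset = ctx.base_offset;
        ctx.base_offset = use_site;
        SubroutineScope { ctx, saved_last_match_end, saved_base_offset }
    }
}

impl Deref for SubroutineScope<'_> {
    type Target = EvaluationContext;
    fn deref(&self) -> &EvaluationContext {
        &*self.ctx
    }
}

impl DerefMut for SubroutineScope<'_> {
    fn deref_mut(&mut self) -> &mut EvaluationContext {
        &mut *self.ctx
    }
}

impl Drop for SubroutineScope<'_> {
    // Runs on every exit path, including early `return Err(..)` and `?`.
    fn drop(&mut self) {
        self.ctx.last_match_end = self.saved_last_match_end;
        self.ctx.base_offset = self.saved_base_offset;
    }
}

/// Hypothetical call: the body advances `last_match_end`, then may fail.
pub fn call_subroutine(ctx: &mut EvaluationContext, use_site: usize, fail: bool) -> Result<(), String> {
    let mut scope = SubroutineScope::enter(ctx, use_site);
    scope.last_match_end = scope.base_offset + 8; // body "consumed" 8 bytes
    if fail {
        return Err("RecursionLimitExceeded".to_string());
    }
    Ok(())
}
```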
Mutual recursion (a use b; b use a) is caught by RecursionGuard::enter(context)? and surfaced as EvaluationError::RecursionLimitExceeded.
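Below is a sketch of such a guard. A hypothetical `max_depth` field stands in for whatever limit the real evaluator uses. The cycle has no base case, so only the guard terminates it, and each frame's `Drop` unwinds the depth back to zero:

```rust
#[derive(Debug, PartialEq)]
pub enum EvaluationError {
    RecursionLimitExceeded,
}

/// Hypothetical context; `max_depth` is a stand-in for the configured limit.
pub struct EvaluationContext {
    pub recursion_depth: u32,
    pub max_depth: u32,
}

pub struct RecursionGuard<'a> {
    ctx: &'a mut EvaluationContext,
}

impl<'a> RecursionGuard<'a> {
    /// Fails before incrementing, so a refused entry needs no cleanup.
    pub fn enter(ctx: &'a mut EvaluationContext) -> Result<Self, EvaluationError> {
        if ctx.recursion_depth >= ctx.max_depth {
            return Err(EvaluationError::RecursionLimitExceeded);
        }
        ctx.recursion_depth += 1;
        Ok(RecursionGuard { ctx })
    }

    pub fn ctx(&mut self) -> &mut EvaluationContext {
        &mut *self.ctx
    }
}

impl Drop for RecursionGuard<'_> {
    fn drop(&mut self) {
        self.ctx.recursion_depth -= 1; // runs on Ok and Err paths alike
    }
}

/// Hypothetical `a use b; b use a` cycle with no base case.
pub fn eval_use_cycle(ctx: &mut EvaluationContext) -> Result<(), EvaluationError> {
    let mut guard = RecursionGuard::enter(ctx)?;
    eval_use_cycle(guard.ctx())?;
    Ok(())
}
```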
### `Indirect` dispatch (loop-level)
Located at lines 750–864. Uses AnchorScope (not SubroutineScope) because indirect must not restore last_match_end to the caller's value — it re-enters at a computed position and any matches produced belong to the new frame. RuleEnvironment::root_rules provides the full rule set for re-entry.
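For contrast with `SubroutineScope`, an `AnchorScope`-style guard restores the anchor and `base_offset`. It leaves `last_match_end` as the re-entered frame set it. The field names are assumptions of this sketch, and so is the choice to seed both the anchor and `base_offset` with the indirect target:

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical subset of evaluator state.
pub struct EvaluationContext {
    pub anchor: usize,
    pub base_offset: usize,
    pub last_match_end: usize,
}

/// Restores `anchor` and `base_offset` but deliberately NOT `last_match_end`.
pub struct AnchorScope<'a> {
    ctx: &'a mut EvaluationContext,
    saved_anchor: usize,
    saved_base_offset: usize,
}

impl<'a> AnchorScope<'a> {
    /// Assumption: re-entry seeds both the anchor and `base_offset` with the target.
    pub fn enter(ctx: &'a mut EvaluationContext, target: usize) -> Self {
        let (saved_anchor, saved_base_offset) = (ctx.anchor, ctx.base_offset);
        ctx.anchor = target;
        ctx.base_offset = target;
        AnchorScope { ctx, saved_anchor, saved_base_offset }
    }
}

impl Deref for AnchorScope<'_> {
    type Target = EvaluationContext;
    fn deref(&self) -> &EvaluationContext {
        &*self.ctx
    }
}

impl DerefMut for AnchorScope<'_> {
    fn deref_mut(&mut self) -> &mut EvaluationContext {
        &mut *self.ctx
    }
}

impl Drop for AnchorScope<'_> {
    fn drop(&mut self) {
        self.ctx.anchor = self.saved_anchor;
        self.ctx.base_offset = self.saved_base_offset;
        // `last_match_end` intentionally keeps the value the indirect frame set.
    }
}

/// Hypothetical re-entry where the root rules match 4 bytes at `target`.
pub fn run_indirect(ctx: &mut EvaluationContext, target: usize) {
    let mut scope = AnchorScope::enter(ctx, target);
    scope.last_match_end = scope.anchor + 4;
}
```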
### `Default` / `Clear` dispatch (loop-level)
Clear (lines 704–707) resets a frame-local sibling_matched: bool variable inside evaluate_rules. Default (lines 714–740) fires only when !sibling_matched. This flag is explicitly not a field on EvaluationContext — its lifetime is exactly one recursion frame, and making it a context field would break re-entrant evaluation.
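A frame-local flag is easiest to see in a stripped-down loop. The `Rule` enum is hypothetical. The sketch does not model whether a firing `default` itself counts as a sibling match; it leaves the flag unchanged:

```rust
/// Hypothetical, simplified rules at a single indentation level.
pub enum Rule {
    Test { matches: bool, message: &'static str },
    Clear,
    Default { message: &'static str },
}

/// One `evaluate_rules`-style frame. `sibling_matched` is a local variable,
/// so each recursion frame starts with its own `false` and nothing leaks.
pub fn evaluate_level(rules: &[Rule]) -> Vec<&'static str> {
    let mut out = Vec::new();
    let mut sibling_matched = false;
    for rule in rules {
        match rule {
            Rule::Clear => sibling_matched = false,
            Rule::Default { message } => {
                if !sibling_matched {
                    out.push(*message); // flag deliberately left as-is (not modeled)
                }
            }
            Rule::Test { matches, message } => {
                if *matches {
                    out.push(*message);
                    sibling_matched = true;
                }
            }
        }
    }
    out
}
```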
### `Offset` dispatch (wildcard arm)
Located at lines 878–936. Resolves the rule's offset, records a RuleMatch with Value::Uint(resolved_offset), and evaluates children. Printf substitution (%lld, %d) is performed later by format_magic_message in src/output/format.rs, not here.
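Below is a sketch of the two separate stages. It uses simplified `Value`/`RuleMatch` types. A hypothetical `substitute` function stands in for `format_magic_message`; it is not that function's real logic:

```rust
/// Simplified value type; only `Uint` is modeled.
#[derive(Debug, PartialEq)]
pub enum Value {
    Uint(u64),
}

#[derive(Debug)]
pub struct RuleMatch {
    pub message: String,
    pub value: Value,
}

/// Stage 1 (evaluator): no buffer read; the resolved position is the value.
pub fn eval_offset_rule(message: &str, resolved_offset: usize) -> RuleMatch {
    RuleMatch { message: message.to_string(), value: Value::Uint(resolved_offset as u64) }
}

/// Stage 2 (output): replace the first `%lld`, or failing that the first `%d`.
pub fn substitute(m: &RuleMatch) -> String {
    let Value::Uint(v) = &m.value;
    for spec in ["%lld", "%d"] {
        if let Some(pos) = m.message.find(spec) {
            return format!("{}{}{}", &m.message[..pos], v, &m.message[pos + spec.len()..]);
        }
    }
    m.message.clone()
}
```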
### Continuation-sibling anchor reset
At recursion_depth > 0, each sibling's &N resolves against the parent-level entry anchor; at depth 0 the anchor chains per S3.8. This is controlled by the is_child_sibling_list gate inside evaluate_rules.
## Rejected Alternatives
Four alternatives were explicitly rejected when designing this architecture. They are load-bearing context for future `MetaType` work.
1. Runtime lookup in the hot loop. Scanning the full rule tree for Name declarations on every Use evaluation turns O(N) dispatch into amortized quadratic cost and moves duplicate-name detection from per-load to per-buffer. Rejected in favor of Layer 1 parse-time extraction.
2. Non-optional RuleEnvironment / required arg on evaluate_rules. Would force every property test, fuzz harness, and integration test to synthesize an empty environment. rule_env: Option<Arc<RuleEnvironment>> keeps the zero-environment path as the default. Tighten to required only if use ever needs to enforce "must have an environment" as a contract.
3. debug_assert! that Name rules never reach the evaluator. prop_arbitrary_rule_evaluation_never_panics synthesizes arbitrary TypeKind instances including Meta(Name(_)). A debug_assert! would break the never-panics invariant on test builds. The Meta(Name(_)) arm in evaluate_single_rule_with_anchor uses debug! logging instead (lines 242–252).
4. Dispatching Use through evaluate_single_rule_with_anchor. That helper returns Result<Option<(usize, Value)>, _> — one match, one value. Use produces a vector of child matches that must be spliced into the caller's match buffer in document order. Use must stay at the evaluate_rules level.
## Adding a New `MetaType` Variant
Follow this checklist for every new MetaType variant:
1. Decide the dispatch site. Ask: does this variant mutate the match vector or the `sibling_matched` flag, or reference state outside the single rule being evaluated?
   - Yes → inline loop-level dispatch inside `evaluate_rules` (like `Use`, `Indirect`, `Default`, `Clear`).
   - No → handle it in the `Meta(_)` wildcard arm of `evaluate_single_rule_with_anchor` (like `Name`, `Offset`).
2. Use RecursionGuard::enter(context)? (mod.rs lines 371–374) for any variant that recurses back into the evaluator. Manual increment/decrement is error-prone; the guard restores depth on all exit paths including errors.
3. Choose the right scope guard if the variant saves/restores context state:
   - `SubroutineScope` (lines 84–113): saves and restores both `last_match_end` and `base_offset` (use for `Use`-style subroutine frames).
   - `AnchorScope` (lines 36–68): saves and restores the anchor and `base_offset` but does not restore `last_match_end` (use for `Indirect`-style re-entry).
4. Log leaked Name-style variants with debug!, not debug_assert!. Property tests synthesize arbitrary TypeKind values including new Meta(_) variants. A debug_assert! in the no-op arm breaks the never-panics invariant.
5. If the variant uses whole-database state (e.g., needs a rule list or lookup table), thread it through RuleEnvironment and make access Option-gated with a silent Ok(vec![]) no-op when rule_env is None.
6. Add a smoke test. The existing fixture tests/meta_types_integration.rs covers name/use, default/clear, and indirect. Add a synthetic or fixture-based test that verifies the new variant does not silently skip its own children. The searchbug.magic assertion (result.description.starts_with("Testfmt")) is the canonical guard for subroutine dispatch + continuation rules.
Related documents:
- `docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md`: `SubroutineScope` implementation detail
- `docs/solutions/integration-issues/indirect-offset-parser-evaluator-sync.md`: closest sibling pattern (`indirect` offset semantics)
- `docs/solutions/integration-issues/implementing-variable-width-typekind-variant.md`: adding a `TypeKind` variant beyond fixed-shape `read_typed_value`
- GOTCHAS S2.1, S2.11, S3, S3.8, S3.10, S13, S14.2