Printf-Style Format Substitution In Rule Messages
Type: Topic
Status: Published
Created: Apr 25, 2026
Updated: Apr 25, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

Printf-Style Format Substitution In Rule Messages

Rule messages in magic(5) files frequently contain C-style format specifiers (for example at_offset %lld, followed_by 0x%02x, version %d.%d) that reference the rule's matched value. libmagic-rs resolves these at description-assembly time via format_magic_message in src/output/format.rs. Without this pass, the library would emit the literal specifier tokens and diverge visibly from file(1) output.

format_magic_message is wired into MagicDatabase::build_result via concatenate_messages, which calls it once per RuleMatch before appending to the output string.


Supported Specifiers

The supported conversions cover the subset of specifiers that appears in shipping magic corpora:

| Specifier | Conversion |
| --- | --- |
| %d, %i | Signed decimal (i64) |
| %u | Unsigned decimal (u64) |
| %x, %X | Lowercase / uppercase hex (with bit-width masking) |
| %o | Octal (with bit-width masking) |
| %s | String; numbers rendered as decimal, bytes as lossy UTF-8 |
| %c | Single character; bytes 0x80–0xFF as Latin-1 code points |
| %% | Literal % |
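
The %s and %c rows can be illustrated with a minimal sketch. The `Matched` enum and both function names below are hypothetical stand-ins, not the crate's actual types; only the conversion rules come from the table.

```rust
/// Hypothetical stand-in for a rule's matched value (not the crate's real type).
enum Matched {
    Int(i64),
    Uint(u64),
    Bytes(Vec<u8>),
}

/// %s: numbers render as decimal, byte strings as lossy UTF-8
/// (invalid sequences become U+FFFD).
fn render_s(value: &Matched) -> String {
    match value {
        Matched::Int(n) => n.to_string(),
        Matched::Uint(n) => n.to_string(),
        Matched::Bytes(b) => String::from_utf8_lossy(b).into_owned(),
    }
}

/// %c: a single byte as a character. `char::from(u8)` maps 0x80..=0xFF
/// to the Unicode code point with the same value, i.e. Latin-1.
fn render_c(byte: u8) -> char {
    char::from(byte)
}
```

Under these assumptions, `render_c(0xE9)` yields `é`, and a byte string containing an invalid byte such as 0xFF renders with a replacement character in its place.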

Width and padding modifiers are honored: %05d, %-5d, %#06x. The + and space flags are parsed for syntactic completeness but have no rendering effect.

Length modifiers (l, ll, h, hh, j, z, t) are consumed by the parser and silently ignored; all numeric rendering operates on 64-bit values (u64/i64) regardless of the modifier.

Width values are capped at MAX_FORMAT_WIDTH (4096) to prevent unbounded allocation from crafted magic rules.
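
A rough sketch of how flags, width, and length modifiers might be parsed with the width clamp applied. The `Spec` struct and `parse_spec` function are hypothetical illustrations, not the crate's API; only the constant's name and value are taken from the text above.

```rust
/// Cap quoted in the text above; the definition here is an assumed mirror of it.
const MAX_FORMAT_WIDTH: usize = 4096;

/// Hypothetical parsed form of one specifier.
#[derive(Debug, PartialEq)]
struct Spec {
    left_align: bool,
    zero_pad: bool,
    alt_form: bool,
    width: usize,
    conv: char,
}

/// Parses the text following a '%'. Returns the spec and the number of bytes
/// consumed, or None if the input ends before a conversion character.
fn parse_spec(rest: &str) -> Option<(Spec, usize)> {
    let bytes = rest.as_bytes();
    let mut i = 0;
    let (mut left_align, mut zero_pad, mut alt_form) = (false, false, false);

    // Flags: '+' and ' ' are accepted but have no rendering effect.
    while i < bytes.len() {
        match bytes[i] {
            b'-' => left_align = true,
            b'0' => zero_pad = true,
            b'#' => alt_form = true,
            b'+' | b' ' => {}
            _ => break,
        }
        i += 1;
    }

    // Width: saturating arithmetic plus a clamp keeps crafted widths bounded.
    let mut width: usize = 0;
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        let digit = usize::from(bytes[i] - b'0');
        width = width.saturating_mul(10).saturating_add(digit).min(MAX_FORMAT_WIDTH);
        i += 1;
    }

    // Length modifiers (l, ll, h, hh, j, z, t) are consumed and ignored.
    while i < bytes.len() && matches!(bytes[i], b'l' | b'h' | b'j' | b'z' | b't') {
        i += 1;
    }

    // Only ASCII bytes were skipped, so `i` is a valid char boundary.
    let conv = rest[i..].chars().next()?;
    Some((Spec { left_align, zero_pad, alt_form, width, conv }, i + conv.len_utf8()))
}
```

Clamping inside the loop, rather than after it, means an absurdly long digit run never produces an intermediate value larger than the cap.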


Hex Masking via TypeKind::bit_width()

%x, %X, and %o call coerce_to_u64_masked, which masks the coerced u64 to the natural bit width of the rule's TypeKind:

| TypeKind | Mask applied |
| --- | --- |
| Byte | 0xff |
| Short | 0xffff |
| Long / Float / Date | 0xffff_ffff |
| Quad / Double / QDate / unknown | none |

This prevents a signed byte carrying -1 (stored as -1i64 after sign-extension, i.e. all 64 bits set when reinterpreted as u64) from rendering as ffffffffffffffff when the user expects ff, matching GNU file output. Regression tests test_byte_width_masking_on_negative_signed_byte and test_hex_width_masking_respects_16bit pin this behavior.
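
The effect of the mask can be shown with a small sketch. The `Kind` enum, `bit_width`, and `mask_to_width` below are simplified hypothetical stand-ins for TypeKind, TypeKind::bit_width(), and coerce_to_u64_masked; their real signatures are not shown in this document.

```rust
/// Simplified, hypothetical subset of the type kinds in the table above.
#[derive(Clone, Copy)]
enum Kind {
    Byte,
    Short,
    Long,
    Quad,
}

/// Natural bit width per kind; None means "already 64 bits, no mask".
fn bit_width(kind: Kind) -> Option<u32> {
    match kind {
        Kind::Byte => Some(8),
        Kind::Short => Some(16),
        Kind::Long => Some(32),
        Kind::Quad => None,
    }
}

/// Reinterprets a sign-extended i64 as u64 and masks it to the kind's width.
fn mask_to_width(value: i64, kind: Kind) -> u64 {
    let raw = value as u64; // -1 becomes 0xffff_ffff_ffff_ffff
    match bit_width(kind) {
        Some(bits) if bits < 64 => raw & ((1u64 << bits) - 1),
        _ => raw,
    }
}
```

With this sketch, `format!("{:x}", mask_to_width(-1, Kind::Byte))` gives `ff`, while the same value for `Kind::Quad` keeps all sixteen hex digits.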


Composition with the \b Backspace Prefix

concatenate_messages joins per-rule messages with a space separator, except when a rendered message starts with \u{0008} (backspace), in which case the backspace is stripped and no leading space is inserted.

Crucially, substitution runs before the backspace check. A rule with message \b, version %s correctly produces:

Ansible Vault text, version 1.1

rather than Ansible Vault text , version 1.1.
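
A minimal sketch of the joining rule, assuming the messages have already been through substitution. `join_messages` is a hypothetical simplification of concatenate_messages, which in the crate operates on RuleMatch values rather than plain strings.

```rust
/// Joins rendered messages with a single space, except that a message
/// starting with U+0008 (backspace) is appended directly, minus the backspace.
fn join_messages(rendered: &[&str]) -> String {
    let mut out = String::new();
    for msg in rendered {
        if let Some(rest) = msg.strip_prefix('\u{0008}') {
            // Backspace prefix: no separator, and the marker itself is dropped.
            out.push_str(rest);
        } else {
            if !out.is_empty() {
                out.push(' ');
            }
            out.push_str(msg);
        }
    }
    out
}
```

Because the backspace is the first character of the template, it is still the first character after substitution, so checking the rendered string is sufficient.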


Unrecognized Specifier Pass-Through

Any specifier not in the supported subset (e.g., %q) returns None from render, which causes format_magic_message to emit the literal specifier text and log a debug! message. A malformed magic file cannot crash description rendering.

A trailing bare % with nothing following also passes through literally.
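
The pass-through behavior can be sketched with a deliberately tiny formatter. Both functions below are hypothetical: this `render` handles only a few conversions for a single i64 value, ignores flags and width, and omits the debug! logging that the real code performs.

```rust
/// Hypothetical renderer: Some(text) for a supported conversion, None otherwise.
fn render(conv: char, value: i64) -> Option<String> {
    match conv {
        'd' | 'i' => Some(value.to_string()),
        'x' => Some(format!("{value:x}")),
        _ => None,
    }
}

/// Substitutes specifiers in `template`, passing unknown ones through literally.
fn format_message(template: &str, value: i64) -> String {
    let mut out = String::with_capacity(template.len());
    let mut chars = template.chars();
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        match chars.next() {
            None => out.push('%'),      // trailing bare '%'
            Some('%') => out.push('%'), // "%%" escape
            Some(conv) => match render(conv, value) {
                Some(text) => out.push_str(&text),
                None => {
                    // Unsupported specifier: emit it unchanged instead of failing.
                    out.push('%');
                    out.push(conv);
                }
            },
        }
    }
    out
}
```

Since every branch appends something and none returns an error, no template can make this sketch panic or abort rendering.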


Non-ASCII UTF-8 Slice Fix

The function scans the template at the byte level but copies plain-text runs as string slices (&template[plain_start..i]) rather than byte-by-byte. This is safe because % is ASCII (0x25) and every byte of a multi-byte UTF-8 sequence, lead or continuation, is >= 0x80, so a % byte always marks a character boundary and byte-level scanning never splits a multi-byte sequence.

The earlier byte-by-byte approach used out.push(b as char), which mapped bytes 0x80–0xFF to Latin-1 code points, corrupting a two-byte sequence like é (0xC3 0xA9) into two separate characters. The fix is exercised by the test_non_ascii_template_preserved test (covering café, → ok ←, über).
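
The difference between the two approaches can be reproduced in isolation. Both functions below are hypothetical reductions written for illustration; the second one leaves each % in place where a real formatter would parse a specifier.

```rust
/// The corrupting approach: each byte is pushed as its Latin-1 code point.
fn copy_bytewise(template: &str) -> String {
    let mut out = String::new();
    for &b in template.as_bytes() {
        out.push(b as char);
    }
    out
}

/// The slice approach: scan bytes for '%', copy plain runs as &str slices.
fn copy_plain_runs(template: &str) -> String {
    let mut out = String::with_capacity(template.len());
    let mut plain_start = 0;
    for (i, &b) in template.as_bytes().iter().enumerate() {
        if b == b'%' {
            // '%' is a complete one-byte character, so `i` and `i + 1`
            // are both char boundaries and slicing cannot panic.
            out.push_str(&template[plain_start..i]);
            out.push('%'); // placeholder for specifier handling
            plain_start = i + 1;
        }
    }
    out.push_str(&template[plain_start..]);
    out
}
```

Here `copy_bytewise("é")` produces the two characters `Ã©`, whereas `copy_plain_runs` returns its input unchanged for templates such as `café %d → ok ←`.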


Testing

All tests live in the #[cfg(test)] block of src/output/format.rs. Key cases:

  • test_hex_substitution_with_byte_width_masking: canonical searchbug.magic case (0x%02x against ubyte), alt-form prefix placement, zero-value alt-form suppression.
  • test_byte_width_masking_on_negative_signed_byte / test_hex_width_masking_respects_16bit: regression guards for sign-extension masking.
  • test_unknown_specifier_pass_through: %q literal pass-through.
  • test_width_cap_prevents_large_allocation: MAX_FORMAT_WIDTH safety.

Tests that simulate build_result manually (bypassing load_from_file) must either run messages through format_magic_message or use %-free message strings; unescaped % in RuleMatch::message is interpreted as a specifier.

