Printf-Style Format Substitution In Rule Messages#
Magic(5) rule messages frequently contain C-style format specifiers — at_offset %lld, followed_by 0x%02x, version %d.%d — that reference the rule's matched value. libmagic-rs resolves these at description-assembly time via format_magic_message in src/output/format.rs. Without this pass, the library would emit literal specifier tokens and diverge visibly from file(1) output.
format_magic_message is wired into MagicDatabase::build_result via concatenate_messages, which calls it once per RuleMatch before appending to the output string.
Supported Specifiers#
The supported conversion set covers the subset that appears in shipping magic corpora:
| Specifier | Conversion |
|---|---|
| %d, %i | Signed decimal (i64) |
| %u | Unsigned decimal (u64) |
| %x, %X | Lowercase / uppercase hex (with bit-width masking) |
| %o | Octal (with bit-width masking) |
| %s | String; numbers rendered as decimal, bytes as lossy UTF-8 |
| %c | Single character; bytes 0x80–0xFF as Latin-1 code points |
| %% | Literal % |
Width and padding modifiers are honored: %05d, %-5d, %#06x. The + and space flags are parsed for syntactic completeness but have no rendering effect.
Length modifiers (l, ll, h, hh, j, z, t) are consumed by the parser and silently ignored — all numeric rendering uses u64/i64 width regardless.
Width values are capped at MAX_FORMAT_WIDTH (4096) to prevent unbounded allocation from crafted magic rules.
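The width and padding rules above can be sketched with Rust's own formatting machinery. This is a minimal illustration, not the code in src/output/format.rs: the Flags struct and the render_decimal and render_hex helpers are hypothetical names, and the alt-form handling for zero follows C's printf convention (no 0x prefix when the value is 0).

```rust
/// Assumed cap, mirroring the MAX_FORMAT_WIDTH value quoted above.
const MAX_FORMAT_WIDTH: usize = 4096;

/// Hypothetical, simplified view of a parsed specifier's flags.
#[derive(Default, Clone, Copy)]
struct Flags {
    left_align: bool, // '-'
    zero_pad: bool,   // '0'
    alt_form: bool,   // '#'
}

/// Render a signed decimal (%d / %i) with the given flags and width.
fn render_decimal(value: i64, flags: Flags, width: usize) -> String {
    // Clamp before formatting so a crafted width cannot force a huge allocation.
    let width = width.min(MAX_FORMAT_WIDTH);
    if flags.left_align {
        format!("{value:<width$}")
    } else if flags.zero_pad {
        format!("{value:0width$}")
    } else {
        format!("{value:>width$}")
    }
}

/// Render lowercase hex (%x) with the given flags and width.
fn render_hex(value: u64, flags: Flags, width: usize) -> String {
    let width = width.min(MAX_FORMAT_WIDTH);
    // As in C, the 0x prefix is suppressed when the value is zero.
    let alt = flags.alt_form && value != 0;
    match (flags.left_align, flags.zero_pad, alt) {
        (true, _, true) => format!("{value:<#width$x}"),
        (true, _, false) => format!("{value:<width$x}"),
        (false, true, true) => format!("{value:#0width$x}"),
        (false, true, false) => format!("{value:0width$x}"),
        (false, false, true) => format!("{value:>#width$x}"),
        (false, false, false) => format!("{value:>width$x}"),
    }
}
```

Under these assumptions, %05d applied to 42 yields 00042, %-5d yields 42 followed by three spaces, and %#06x applied to 255 yields 0x00ff, because the prefix counts toward the width.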
Hex Masking via TypeKind::bit_width()#
%x, %X, and %o call coerce_to_u64_masked, which masks the coerced u64 to the natural bit width of the rule's TypeKind:
| TypeKind | Mask applied |
|---|---|
| Byte | 0xff |
| Short | 0xffff |
| Long / Float / Date | 0xffff_ffff |
| Quad / Double / QDate / unknown | none |
This prevents a signed byte carrying -1 (sign-extended to -1 as an i64, which is 0xffffffffffffffff once reinterpreted as u64) from rendering as ffffffffffffffff when the user expects ff, matching GNU file output. The regression tests test_byte_width_masking_on_negative_signed_byte and test_hex_width_masking_respects_16bit pin this behavior.
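The masking step can be illustrated with a small sketch. The TypeKind enum here is a hypothetical stand-in that models only four of the variants in the table, and both the Option<u32> return type of bit_width and the coerce_masked_sketch signature are assumptions, not the crate's actual API.

```rust
/// Hypothetical stand-in for the crate's TypeKind; only a few variants are modelled.
#[derive(Clone, Copy)]
enum TypeKind {
    Byte,
    Short,
    Long,
    Quad,
}

impl TypeKind {
    /// Natural width in bits, or None when no masking applies (assumed shape).
    fn bit_width(self) -> Option<u32> {
        match self {
            TypeKind::Byte => Some(8),
            TypeKind::Short => Some(16),
            TypeKind::Long => Some(32),
            TypeKind::Quad => None,
        }
    }
}

/// Reinterpret a sign-extended value as u64, then keep only the low bits
/// that belong to the rule's type.
fn coerce_masked_sketch(value: i64, kind: TypeKind) -> u64 {
    let raw = value as u64; // two's-complement reinterpretation: -1 becomes u64::MAX
    match kind.bit_width() {
        Some(bits) if bits < 64 => raw & ((1u64 << bits) - 1),
        _ => raw,
    }
}
```

With this sketch, -1 formatted through %x renders as ff for Byte, ffff for Short, ffffffff for Long, and the full ffffffffffffffff for Quad.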
Composition with the \b Backspace Prefix#
concatenate_messages joins per-rule messages with a space separator, except when a rendered message starts with \u{0008} (backspace), in which case the backspace is stripped and no leading space is inserted.
Crucially, substitution runs before the backspace check. A rule with message \b, version %s correctly produces:
Ansible Vault text, version 1.1
rather than Ansible Vault text , version 1.1.
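The ordering described above (substitute first, then check for the backspace) can be sketched as follows. The join_messages function is a hypothetical simplification: it takes (template, value) pairs, and a plain replace of %s stands in for format_magic_message.

```rust
/// Join per-rule messages with a space, except when a rendered message
/// begins with a backspace (U+0008), which suppresses the separator.
/// Each rule is a (template, value) pair; both the pair shape and the
/// %s-only substitution are assumptions made for this sketch.
fn join_messages(rules: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (template, value) in rules {
        // Step 1: substitution (stand-in for format_magic_message).
        let rendered = template.replace("%s", value);
        // Step 2: backspace check on the rendered text.
        if let Some(rest) = rendered.strip_prefix('\u{0008}') {
            out.push_str(rest);
        } else {
            if !out.is_empty() {
                out.push(' ');
            }
            out.push_str(&rendered);
        }
    }
    out
}
```

Feeding it the pairs ("Ansible Vault text", "") and ("\u{0008}, version %s", "1.1") gives Ansible Vault text, version 1.1 with no space before the comma.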
Unrecognized Specifier Pass-Through#
Any specifier not in the supported subset (e.g., %q) returns None from render, which causes format_magic_message to emit the literal specifier text and log a debug! message. A malformed magic file cannot crash description rendering.
A trailing bare % with nothing following also passes through literally.
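A stripped-down sketch of the pass-through behavior, ignoring flags, widths, and length modifiers. The render_sketch and substitute_sketch names are hypothetical, the supported set is reduced to %d, %i, %x, and %%, and the debug! logging is omitted.

```rust
/// Render one conversion, or return None when the conversion is unsupported.
fn render_sketch(conv: char, value: i64) -> Option<String> {
    match conv {
        'd' | 'i' => Some(value.to_string()),
        'x' => Some(format!("{:x}", value as u64)),
        _ => None,
    }
}

/// Substitute specifiers in `template`, leaving unknown ones untouched.
fn substitute_sketch(template: &str, value: i64) -> String {
    let mut out = String::with_capacity(template.len());
    let mut chars = template.chars();
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // Trailing bare '%': emit it literally.
            None => out.push('%'),
            // "%%" collapses to a single '%'.
            Some('%') => out.push('%'),
            Some(conv) => match render_sketch(conv, value) {
                Some(text) => out.push_str(&text),
                // Unknown specifier: emit the original text instead of failing.
                None => {
                    out.push('%');
                    out.push(conv);
                }
            },
        }
    }
    out
}
```

In this sketch, "odd %q here" comes back unchanged, "100%" keeps its trailing percent sign, and "50%% done" becomes "50% done".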
Non-ASCII UTF-8 Slice Fix#
The function scans the template at the byte level but copies plain-text runs as string slices (&template[plain_start..i]) rather than byte-by-byte. This is safe because % is ASCII (0x25), while every byte of a multi-byte UTF-8 sequence, lead or continuation, is >= 0x80. A % byte therefore always sits on a character boundary, and byte-level scanning never splits a multi-byte sequence.
The earlier byte-by-byte approach used out.push(b as char), which mapped bytes 0x80–0xFF to Latin-1 code points, corrupting a two-byte sequence like é (0xC3 0xA9) into two separate characters. The fix is exercised by the test_non_ascii_template_preserved test (covering café, → ok ←, über).
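The difference between the two approaches can be reproduced in a few lines. Both functions are hypothetical reductions of the real scanner: copy_bytewise mimics the old out.push(b as char) loop, and copy_plain_runs keeps each % literally where a real implementation would parse and render a specifier.

```rust
/// Old approach: each byte is widened to a char, so bytes 0x80..=0xFF
/// become Latin-1 code points and multi-byte sequences are corrupted.
fn copy_bytewise(template: &str) -> String {
    let mut out = String::new();
    for &b in template.as_bytes() {
        out.push(b as char);
    }
    out
}

/// Fixed approach: scan bytes for '%', but copy plain runs as &str slices.
/// Slicing at a '%' byte is always on a char boundary because '%' is ASCII.
fn copy_plain_runs(template: &str) -> String {
    let bytes = template.as_bytes();
    let mut out = String::with_capacity(template.len());
    let mut plain_start = 0;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            out.push_str(&template[plain_start..i]);
            // Sketch only: a real scanner would parse the specifier here.
            out.push('%');
            i += 1;
            plain_start = i;
        } else {
            i += 1;
        }
    }
    out.push_str(&template[plain_start..]);
    out
}
```

Here copy_bytewise("é") produces the two characters U+00C3 and U+00A9, while copy_plain_runs leaves café, → ok ←, and über intact.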
Testing#
All tests live in the #[cfg(test)] block of src/output/format.rs. Key cases:
- test_hex_substitution_with_byte_width_masking: canonical searchbug.magic case (0x%02x against ubyte), alt-form prefix placement, zero-value alt-form suppression.
- test_byte_width_masking_on_negative_signed_byte / test_hex_width_masking_respects_16bit: regression guards for sign-extension masking.
- test_unknown_specifier_pass_through: %q literal pass-through.
- test_width_cap_prevents_large_allocation: MAX_FORMAT_WIDTH safety.
Tests that simulate build_result manually (bypassing load_from_file) must either run messages through format_magic_message or use %-free message strings; unescaped % in RuleMatch::message is interpreted as a specifier.
References#
- Implementation: src/output/format.rs (format_magic_message, parse_spec, render, coerce_to_u64_masked)
- Call site: src/lib.rs::concatenate_messages
- Canonical reference: GOTCHAS.md S14.2
- Backspace convention: GOTCHAS.md S14.1
- Feature status: AGENTS.md "Currently Implemented"
- TypeKind::bit_width(): src/parser/ast.rs