Type System And Operator Coverage#

Lead Section#

This article inventories the data types and comparison operators that libmagic-rs implements for detecting file formats through magic rule evaluation. As a pure-Rust replacement for the C libmagic library, libmagic-rs follows a phased implementation strategy that targets 95%+ compatibility with GNU file by version 1.0.0.

The v0.1.x releases provided four basic data types (Byte, Short, Long, and String) with endianness and signedness support, alongside four operators: Equal, NotEqual, BitwiseAnd, and BitwiseAndMask. Version 0.2.0 added four comparison operators (<, >, <=, >=), changed Byte from a unit variant to a signed/unsigned variant, and added 64-bit integer support through the Quad type in all endianness and signedness variants. Version 0.4.0 added three operators (BitwiseXor, BitwiseNot, and AnyValue), bringing the total to 11 implemented operators, and added the floating-point types Float and Double with endian variants. Version 0.5.0+ added date and timestamp types (Date and QDate with UTC/local time options) for parsing Unix timestamps, and pattern types (Regex and Search) for text-based format detection through pattern matching. These primitives enable detection of common file formats including executables, archives, images, text documents, and time-stamped data. All six meta-type directives (default, clear, name, use, indirect, offset) are implemented for control-flow subroutines and dynamic reporting.

This article provides a comprehensive technical reference for the type system and operator implementation, tracking both current capabilities and planned enhancements across the version roadmap to full libmagic compatibility.

Current Type System Implementation#

Type Architecture#

The type system in libmagic-rs is implemented across two primary modules: the AST representation in src/parser/ast.rs defines the TypeKind enum for structural representation, while src/evaluator/types.rs provides the runtime implementation for reading typed values from binary data.

Implemented Types#

Byte Type#

Size: 8 bits (1 byte)
Signedness: Explicitly specified with signed: bool field in TypeKind::Byte { signed: bool } (changed from unit variant in v0.2.0)
Endianness: N/A (single byte)
Return Type: Value::Uint(u64) for unsigned, Value::Int(i64) for signed
Implementation: read_byte function (takes 3 parameters as of v0.2.0)

Short Type#

Size: 16 bits (2 bytes)
Signedness: Supports both signed and unsigned via signed: bool parameter
Endianness: Little, Big, and Native variants
Return Type: Value::Uint(u64) for unsigned, Value::Int(i64) for signed
Implementation: read_short function

Long Type#

Size: 32 bits (4 bytes)
Signedness: Supports both signed and unsigned via signed: bool parameter
Endianness: Little, Big, and Native variants
Return Type: Value::Uint(u64) for unsigned, Value::Int(i64) for signed
Implementation: read_long function

Quad Type#

Size: 64 bits (8 bytes)
Signedness: Supports both signed and unsigned via signed: bool parameter
Endianness: Little, Big, and Native variants
Return Type: Value::Uint(u64) for unsigned, Value::Int(i64) for signed
Implementation: read_quad function
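
The integer types above differ only in width, byte order, and signedness. The following is a minimal sketch of a 64-bit read under those three parameters, using only the standard library. The names `read_quad_sketch`, `Endianness`, and `Value` follow the terms used in this article but are illustrative; the actual `read_quad` signature and error handling in libmagic-rs may differ.

```rust
/// Byte order options, mirroring the three variants described in this article.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Endianness {
    Little,
    Big,
    Native,
}

/// Hypothetical value type with the two integer variants named above.
#[derive(Debug, PartialEq)]
enum Value {
    Uint(u64),
    Int(i64),
}

/// Reads 8 bytes at `offset`. Returns `None` rather than panicking
/// when the read would run past the end of the buffer.
fn read_quad_sketch(buf: &[u8], offset: usize, endian: Endianness, signed: bool) -> Option<Value> {
    let end = offset.checked_add(8)?; // guard against offset overflow
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(buf.get(offset..end)?); // bounds-checked slice
    let raw = match endian {
        Endianness::Little => u64::from_le_bytes(bytes),
        Endianness::Big => u64::from_be_bytes(bytes),
        Endianness::Native => u64::from_ne_bytes(bytes),
    };
    Some(if signed {
        Value::Int(raw as i64) // reinterpret the same bits as two's complement
    } else {
        Value::Uint(raw)
    })
}
```

The same shape applies to Short and Long with 2-byte and 4-byte arrays; for those narrower widths a signed read would go through `i16` or `i32` before widening so that the sign is extended.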

Float Type#

Size: 32 bits (4 bytes)
Standard: IEEE 754 single-precision floating-point
Endianness: Little, Big, and Native variants
Return Type: Value::Float(f64) (widened from f32)
Implementation: read_float function

Double Type#

Size: 64 bits (8 bytes)
Standard: IEEE 754 double-precision floating-point
Endianness: Little, Big, and Native variants
Return Type: Value::Float(f64)
Implementation: read_double function
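
As a rough illustration of the widening noted for the Float type, the sketch below reads a big-endian single-precision value and converts it to `f64`. The function name `read_befloat_sketch` is hypothetical and the real `read_float` takes an endianness parameter rather than being fixed to big-endian.

```rust
/// Reads a big-endian IEEE 754 single-precision value at `offset`
/// and widens it to f64. Returns `None` when out of bounds.
fn read_befloat_sketch(buf: &[u8], offset: usize) -> Option<f64> {
    let end = offset.checked_add(4)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(buf.get(offset..end)?);
    // f32 -> f64 conversion is exact: every f32 value is representable as f64.
    Some(f64::from(f32::from_be_bytes(bytes)))
}
```

Widening is exact, but the widened value of a single-precision constant such as 0.1 is not the same number as the double-precision literal 0.1, which matters when a rule compares a Float read against a literal.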

Date Type (32-bit)#

Size: 32 bits (4 bytes)
Standard: Unix timestamp (seconds since epoch)
Endianness: Little, Big, and Native variants
Time Zone: UTC or local time via utc: bool parameter
Return Type: Value::String(String) formatted as "%a %b %e %H:%M:%S %Y" (matching GNU file output)
Implementation: read_date function
Strength Score: 15 points
Version: v0.5.0+

QDate Type (64-bit)#

Size: 64 bits (8 bytes)
Standard: Unix timestamp (seconds since epoch)
Endianness: Little, Big, and Native variants
Time Zone: UTC or local time via utc: bool parameter
Return Type: Value::String(String) formatted as "%a %b %e %H:%M:%S %Y" (matching GNU file output)
Implementation: read_qdate function
Strength Score: 16 points
Version: v0.5.0+
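
To make the "%a %b %e %H:%M:%S %Y" layout concrete, here is a standard-library-only sketch that formats a UTC Unix timestamp using a constant-time days-to-civil-date conversion. It is an assumption-laden illustration: libmagic-rs is described above as using the chrono crate, and the helper names `civil_from_days` and `format_utc_sketch` are invented for this example. Local-time handling is omitted.

```rust
/// Converts days since 1970-01-01 to (year, month, day) in the
/// proleptic Gregorian calendar using fixed arithmetic (no loops).
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day within the cycle
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year
    let mp = (5 * doy + 2) / 153; // March-based month index
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Formats seconds since the epoch as UTC in the layout
/// "%a %b %e %H:%M:%S %Y" (day of month space-padded to width 2).
fn format_utc_sketch(secs: i64) -> String {
    const DAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    let days = secs.div_euclid(86_400);
    let rem = secs.rem_euclid(86_400); // seconds within the day, always >= 0
    let (year, month, day) = civil_from_days(days);
    let weekday = (days + 4).rem_euclid(7) as usize; // 1970-01-01 was a Thursday
    format!(
        "{} {} {:>2} {:02}:{:02}:{:02} {}",
        DAYS[weekday],
        MONTHS[(month - 1) as usize],
        day,
        rem / 3_600,
        rem % 3_600 / 60,
        rem % 60,
        year
    )
}
```

Because the conversion uses Euclidean division throughout, timestamps before the epoch format correctly without special cases.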

String Type#

Size: Variable length
Parameters: max_length: Option<usize> controls maximum bytes to read; flags: StringFlags contains modifier flags
Flags: The StringFlags field carries eight modifier flags parsed from the /[cCwWtTbf] suffix on a string rule:
- /c (ignore_lowercase) -- case-insensitive comparison when the pattern character is lowercase (asymmetric: only lowercase pattern chars trigger fold; uppercase chars are literal)
- /C (ignore_uppercase) -- case-insensitive comparison when the pattern character is uppercase (only uppercase pattern chars trigger fold)
- /w (compact_optional_whitespace) -- pattern whitespace matches zero or more file whitespace bytes
- /W (compact_whitespace) -- pattern whitespace requires at least one whitespace byte in the file; additional whitespace is consumed greedily
- /T (trim) -- leading and trailing ASCII whitespace stripped from the pattern before comparison
- /f (full_word) -- post-match word-boundary check (next byte must be EOF or non-word character)
- /t (text_test) -- hint that this rule applies to text files (captured for MIME-output integration; does not currently alter comparison)
- /b (bin_test) -- hint that this rule applies to binary files (captured for MIME-output integration; does not currently alter comparison)
Behavior: Default flags (StringFlags::is_empty()) take the byte-exact fast path through apply_equal. Non-default flags route through the pattern-bearing-type contract (similar to Regex/Search) via compare_string_with_flags, which performs flag-based comparison semantics (case-insensitive, whitespace-flexible, full-word boundary check). The consumed-bytes count returned by compare_string_with_flags drives the relative-offset anchor for child rules, which is load-bearing when /w or /W allow the file to consume more bytes than pattern.len(). Note: StringFlags::is_empty() takes self by value since StringFlags is Copy.
UTF-8 Handling: Invalid UTF-8 sequences replaced with replacement character (U+FFFD)
Return Type: Value::String(String) for unflagged matches; Value::Bytes(Vec<u8>) for flagged matches (/c, /C, /w, /W, /T, /f, /b, /t). Flagged matches use Bytes to preserve byte-exact libmagic semantics and prevent UTF-8 replacement that would break %s substitution.
Implementation: read_string function, compare_string_with_flags function for flag-based matching
Version: v0.1.x (basic string support); flag semantics fully implemented in PR #288 (issue #234)
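
The asymmetric case folding and the consumed-bytes count are easier to see in code. The sketch below implements only /c, /C, and /w as they are described above and returns the number of file bytes consumed. `FlagsSketch` and `compare_sketch` are hypothetical names; the real `StringFlags` and `compare_string_with_flags` carry more flags and may differ in detail.

```rust
/// Hypothetical subset of the flags described above.
#[derive(Clone, Copy, Default)]
struct FlagsSketch {
    ignore_lowercase: bool,            // /c
    ignore_uppercase: bool,            // /C
    compact_optional_whitespace: bool, // /w
}

/// Compares `pattern` against the start of `data`.
/// Returns the number of bytes of `data` consumed on a match.
fn compare_sketch(pattern: &[u8], data: &[u8], flags: FlagsSketch) -> Option<usize> {
    let mut pos = 0;
    for &p in pattern {
        if flags.compact_optional_whitespace && p.is_ascii_whitespace() {
            // Pattern whitespace absorbs zero or more whitespace bytes in the file.
            while data.get(pos).map_or(false, |d| d.is_ascii_whitespace()) {
                pos += 1;
            }
            continue;
        }
        let d = *data.get(pos)?; // running out of data is a non-match
        // Folding is triggered by the case of the pattern byte, not the file byte.
        let fold = (flags.ignore_lowercase && p.is_ascii_lowercase())
            || (flags.ignore_uppercase && p.is_ascii_uppercase());
        let matched = if fold { d.eq_ignore_ascii_case(&p) } else { d == p };
        if !matched {
            return None;
        }
        pos += 1;
    }
    Some(pos)
}
```

With /w set, the pattern "a b" matched against "a   b" consumes five bytes although the pattern is three bytes long, which is why the consumed count rather than the pattern length has to drive the anchor for child rules.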

String16 Type (UCS-2)#

Description: UCS-2 (16-bit Unicode) string with explicit byte order
Size: Variable length (16 bits per code unit)
Endianness: Little (lestring16) and Big (bestring16) variants
Behavior: Reads 16-bit code units until a U+0000 terminator or the end of the buffer; surrogate code units (U+D800 to U+DFFF) are replaced with U+FFFD
Return Type: Value::String(String)
Implementation: read_string16 function
Version: v0.5.0+
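
A minimal sketch of the decoding behavior described above, assuming the buffer already starts at the rule offset. The name `read_string16_sketch` and the boolean byte-order parameter are illustrative, not the signature of the actual `read_string16`.

```rust
/// Decodes UCS-2 code units until a U+0000 terminator or the end of `buf`.
/// A trailing odd byte is ignored.
fn read_string16_sketch(buf: &[u8], little_endian: bool) -> String {
    buf.chunks_exact(2)
        .map(|pair| {
            let bytes = [pair[0], pair[1]];
            if little_endian {
                u16::from_le_bytes(bytes)
            } else {
                u16::from_be_bytes(bytes)
            }
        })
        .take_while(|&unit| unit != 0) // stop at the terminator
        // char::from_u32 rejects surrogate code units, so they map to U+FFFD.
        .map(|unit| char::from_u32(u32::from(unit)).unwrap_or('\u{FFFD}'))
        .collect()
}
```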

PString Type#

Description: Pascal-style length-prefixed strings where the first byte indicates the string length, followed by that many bytes of string data
Size: Variable (1 byte for length prefix + actual string bytes)
Parameters: max_length: Option<usize> caps the length byte value
Behavior: Reads length byte (0-255), then reads that many bytes as string data; not null-terminated
Bounds Checking: Validates both the length byte is readable and that the full string data is within bounds
UTF-8 Handling: Invalid UTF-8 sequences replaced with replacement character (U+FFFD)
Return Type: Value::String(String)
Implementation: read_pstring function
Comparison Operators: Supports all string comparison operators (=, !=, <, >, <=, >=)
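
The two bounds checks and the length cap can be sketched as follows. This assumes a one-byte length prefix, as described above; `read_pstring_sketch` is a hypothetical name and the real `read_pstring` may report errors differently.

```rust
/// Reads a Pascal-style string: one length byte followed by that many bytes.
/// Returns `None` if the length byte or the full payload is out of bounds.
fn read_pstring_sketch(buf: &[u8], offset: usize, max_length: Option<usize>) -> Option<String> {
    let len = usize::from(*buf.get(offset)?); // the length byte must be readable
    let len = max_length.map_or(len, |max| len.min(max)); // optional cap
    let start = offset.checked_add(1)?;
    let end = start.checked_add(len)?;
    let data = buf.get(start..end)?; // the whole payload must be in bounds
    // Invalid UTF-8 sequences become U+FFFD instead of causing an error.
    Some(String::from_utf8_lossy(data).into_owned())
}
```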

Regex Type#

Description: Binary-safe regular expression matching using regex::bytes::Regex
Parameters: count: RegexCount specifies scan window (measured in bytes or lines); flags: RegexFlags contains modifier flags
Flags: Supports /c (case-insensitive via (?i) prefix), /s (match-start anchor advance instead of match-end), /l (line-count mode where window is measured in lines, not bytes)
Behavior: Multi-line matching always enabled (matching libmagic REG_NEWLINE); scan window capped at 8192 bytes (FILE_REGEX_MAX)
Return Type: Value::String(String) (matched text) or Value::Bytes(Vec<u8>) for binary patterns
Implementation: read_regex function
Version: v0.5.0+

Search Type#

Description: Bounded literal pattern search for fixed byte sequences
Size: Variable length
Parameters: range: NonZeroUsize specifies mandatory scan window (measured in bytes from the rule offset); flags: SearchFlags contains modifier flags parsed from the /[sCcWwTtBbf] suffix
Flags: Nine modifier flags split across two semantic categories:
- Anchor-only flags (preserve SIMD fast path):
  - /s (start_anchor) — anchor lands at match-START instead of match-END (required for TGA footer, sfnt name table)
  - /t (text_test) — MIME-output hint for text files (captured; comparison-time effect deferred to !:mime)
  - /b (bin_test) — MIME-output hint for binary files (captured; comparison-time effect deferred to !:mime)
- Comparison-altering flags (trigger byte-walk slow path):
  - /c (ignore_lowercase) — asymmetric case-insensitive: lowercase pattern chars match either case in buffer; uppercase pattern chars literal
  - /C (ignore_uppercase) — asymmetric case-insensitive: uppercase pattern chars match either case in buffer; lowercase pattern chars literal
  - /w (compact_optional_whitespace) — pattern whitespace matches zero or more buffer whitespace bytes
  - /W (compact_whitespace) — pattern whitespace requires at least one buffer whitespace byte, then consumes greedily
  - /T (trim) — trim leading/trailing ASCII whitespace from pattern before comparison
  - /f (full_word) — post-match word-boundary check (STRING_FULL_WORD)
Behavior: Fast path uses memchr::memmem::find for byte-exact matches when only anchor-only flags are set. Comparison-altering flags route through compare_string_with_flags for byte-by-byte walk with flag semantics. Under /T trim, the effective pattern length drives both anchor advance and comparison; empty-after-trim patterns short-circuit to None with warn! to avoid over-matching. Under /w//W, matched_len can exceed pattern.len() due to greedy whitespace absorption. Anchor advance is clamped against remaining buffer length to defend against adversarial pattern math.
Anchor Semantics: Default behavior lands at match-END (byte past matched pattern). When /s is set, anchor lands at match-START. Mirrors libmagic FILE_SEARCH / moffset logic.
Flag Relation: SearchFlags parallels StringFlags structurally — eight fields shared, plus search-only start_anchor. See SearchFlags::to_string_flags() for comparator handoff.
Return Type: Value::String(String)
Implementation: read_search function, compare_string_with_flags function for flag-based matching
Version: v0.5.0+ (basic search); flag semantics fully implemented in PR #297 (issue #235)
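
The difference between the default match-END anchor and the /s match-START anchor can be sketched with a naive scan. This example makes two simplifying assumptions that are not taken from the source: it uses a plain `windows` scan instead of `memchr::memmem`, and it treats the range as the span of bytes the whole match must fit inside. The name `search_sketch` is hypothetical.

```rust
/// Searches for `pattern` inside `buf[offset .. offset + range]` (clamped to
/// the buffer) and returns the anchor position for child rules.
fn search_sketch(
    buf: &[u8],
    offset: usize,
    range: usize,
    pattern: &[u8],
    start_anchor: bool,
) -> Option<usize> {
    if pattern.is_empty() {
        return None; // an empty pattern would match everywhere
    }
    let end = offset.saturating_add(range).min(buf.len()); // clamp the window
    let window = buf.get(offset..end)?;
    let rel = window.windows(pattern.len()).position(|w| w == pattern)?;
    let match_start = offset + rel;
    Some(if start_anchor {
        match_start // /s: anchor at the first byte of the match
    } else {
        match_start + pattern.len() // default: anchor one past the match
    })
}
```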

Meta-Type Directives#

default: Fires when no sibling at the same indentation level has matched; serves as catch-all fallback
clear: Resets the sibling-matched flag to allow subsequent default rules to fire
name: Declares a named subroutine that can be invoked via use
use: Invokes a named subroutine; expands inline with callee's matches appearing before continuation rules
indirect: Re-evaluates the root rule list at the resolved offset
offset: Reports the resolved file offset as Value::Uint(position) rather than reading typed data; message templates can reference the position via printf-style specifiers (%lld, %d)
Version: v0.5.0+
Implementation: src/evaluator/engine/mod.rs dispatch arms; src/parser/name_table.rs for name/use subroutine table
Semantics: Control-flow directives modify rule traversal; offset is value-bearing but reads no buffer bytes
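
The interaction between default and clear can be modeled as a single flag tracked across siblings at one indentation level. The sketch below is a toy model under stated assumptions: ordinary rules are reduced to a precomputed boolean, a firing default rule is assumed to set the flag like any other match, and clear is assumed not to count as a match itself. `Directive` and `fired_siblings` are invented names and do not reflect the engine's actual data structures.

```rust
/// Toy model of sibling rules at one indentation level.
enum Directive {
    Test(bool), // an ordinary rule with a precomputed match result
    Default,    // fires only if no earlier sibling has matched
    Clear,      // resets the sibling-matched flag
}

/// Returns the indices of the siblings that fire, in order.
fn fired_siblings(siblings: &[Directive]) -> Vec<usize> {
    let mut sibling_matched = false;
    let mut fired = Vec::new();
    for (index, directive) in siblings.iter().enumerate() {
        let fires = match directive {
            Directive::Test(matched) => *matched,
            Directive::Default => !sibling_matched,
            Directive::Clear => {
                sibling_matched = false;
                continue;
            }
        };
        if fires {
            sibling_matched = true;
            fired.push(index);
        }
    }
    fired
}
```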

Endianness Support#

The Endianness enum provides three byte order options:

Little: Least significant byte first (common in x86/x64 architectures)
Big: Most significant byte first (common in network protocols)
Native: System-dependent byte order matching target architecture

All multi-byte types (Short, Long, Quad, Float, Double, Date, and QDate) support all three endianness variants, enabling detection of files from different architectures and platforms.

Current Operator Implementation#

Operator Architecture#

The operator system consists of eleven distinct operators defined in the AST, with runtime evaluation provided by dedicated functions in src/evaluator/operators.rs.

Implemented Operators#

Equal Operator (=)#

Semantics: Tests exact equality between left and right operands
Cross-Type Coercion: Implements intelligent coercion between Uint and Int by converting both to i128 to prevent overflow
Float Equality: Uses epsilon-aware comparison (|a - b| <= f64::EPSILON) for Value::Float; NaN comparisons always return false
Type Compatibility: Direct comparison for same types; cross-integer comparison via i128; false for mismatched non-integer types
Implementation: apply_equal function
Strength Score: +10 points (most specific)
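
A minimal sketch of the coercion rules listed above, assuming a reduced `Value` enum with four variants. The function name `apply_equal_sketch` is illustrative and the real `apply_equal` handles more variants.

```rust
/// Reduced stand-in for the evaluator's value type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Uint(u64),
    Int(i64),
    Float(f64),
    String(String),
}

/// Equality with cross-integer coercion and epsilon-aware float comparison.
fn apply_equal_sketch(left: &Value, right: &Value) -> bool {
    match (left, right) {
        (Value::Uint(a), Value::Uint(b)) => a == b,
        (Value::Int(a), Value::Int(b)) => a == b,
        // i128 holds every u64 and every i64, so neither side can wrap.
        (Value::Uint(a), Value::Int(b)) | (Value::Int(b), Value::Uint(a)) => {
            i128::from(*a) == i128::from(*b)
        }
        // Any comparison involving NaN is false, so NaN never equals anything.
        (Value::Float(a), Value::Float(b)) => (a - b).abs() <= f64::EPSILON,
        (Value::String(a), Value::String(b)) => a == b,
        _ => false, // mismatched non-integer types
    }
}
```

The i128 step is what keeps `Uint(u64::MAX)` and `Int(-1)` unequal; a plain `as u64` cast on the signed side would make their bit patterns identical.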

NotEqual Operator (!=)#

Semantics: Logical negation of Equal operator
Syntax: !, !=, <> -- all three forms map to NotEqual
Implementation: Returns !apply_equal(left, right)
Strength Score: +5 points

LessThan Operator (<)#

Semantics: Returns true if left operand is less than right operand
Type Compatibility: Works with integers, floats, strings, and bytes; uses i128 coercion for cross-integer comparisons
Float Comparison: Uses partial_cmp with IEEE 754 semantics; NaN comparisons return None (false)
Implementation: apply_less_than function
Strength Score: +6 points
Version: v0.2.0

GreaterThan Operator (>)#

Semantics: Returns true if left operand is greater than right operand
Type Compatibility: Works with integers, floats, strings, and bytes; uses i128 coercion for cross-integer comparisons
Float Comparison: Uses partial_cmp with IEEE 754 semantics; NaN comparisons return None (false)
Implementation: apply_greater_than function
Strength Score: +6 points
Version: v0.2.0

LessEqual Operator (<=)#

Semantics: Returns true if left operand is less than or equal to right operand
Type Compatibility: Works with integers, floats, strings, and bytes; uses i128 coercion for cross-integer comparisons
Float Comparison: Uses partial_cmp with IEEE 754 semantics; NaN comparisons return None (false)
Implementation: apply_less_equal function
Strength Score: +6 points
Version: v0.2.0

GreaterEqual Operator (>=)#

Semantics: Returns true if left operand is greater than or equal to right operand
Type Compatibility: Works with integers, floats, strings, and bytes; uses i128 coercion for cross-integer comparisons
Float Comparison: Uses partial_cmp with IEEE 754 semantics; NaN comparisons return None (false)
Implementation: apply_greater_equal function
Strength Score: +6 points
Version: v0.2.0
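
The four comparison operators can share one ordering function that returns `None` for incomparable operands. The sketch below assumes a reduced `Value` enum and hypothetical function names; it is one plausible way to structure the behavior described above, not a copy of the project's code.

```rust
use std::cmp::Ordering;

/// Reduced stand-in for the evaluator's value type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Uint(u64),
    Int(i64),
    Float(f64),
    String(String),
    Bytes(Vec<u8>),
}

/// Orders two values, or returns `None` when they are not comparable.
fn compare_values_sketch(left: &Value, right: &Value) -> Option<Ordering> {
    match (left, right) {
        (Value::Uint(a), Value::Uint(b)) => Some(a.cmp(b)),
        (Value::Int(a), Value::Int(b)) => Some(a.cmp(b)),
        (Value::Uint(a), Value::Int(b)) => Some(i128::from(*a).cmp(&i128::from(*b))),
        (Value::Int(a), Value::Uint(b)) => Some(i128::from(*a).cmp(&i128::from(*b))),
        (Value::Float(a), Value::Float(b)) => a.partial_cmp(b), // None if either is NaN
        (Value::String(a), Value::String(b)) => Some(a.cmp(b)), // lexicographic
        (Value::Bytes(a), Value::Bytes(b)) => Some(a.cmp(b)),   // lexicographic
        _ => None,
    }
}

/// `<=`: true only when an ordering exists and it is not Greater.
fn apply_less_equal_sketch(left: &Value, right: &Value) -> bool {
    compare_values_sketch(left, right).map_or(false, |o| o != Ordering::Greater)
}

/// `>`: true only when an ordering exists and it is Greater.
fn apply_greater_than_sketch(left: &Value, right: &Value) -> bool {
    compare_values_sketch(left, right) == Some(Ordering::Greater)
}
```

Mapping a missing ordering to false means that both `NaN <= x` and `NaN > x` are false, consistent with the NaN behavior noted for each operator.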

BitwiseAnd Operator (&)#

Semantics: Returns true if (left & right) != 0
Use Case: Checking if specific bits are set in flags or bitmasks
Integer Support: Works with Uint, Int, and mixed integer types
Signed Handling: Signed integers cast to u64 for bitwise operations
Implementation: apply_bitwise_and function
Strength Score: +3 points (low specificity)

BitwiseAndMask Operator (&mask=value)#

Semantics: Applies bitmask to left operand, then performs equality comparison with right operand
Syntax: &0xNN=value in magic file format
Use Case: Checking specific bit patterns while ignoring other bits
Implementation: Masks left value, then calls apply_equal
Strength Score: +7 points (moderately specific)

BitwiseXor Operator (^)#

Semantics: Performs bitwise XOR between two integer values; returns true if result is non-zero
Use Case: Detecting bit pattern differences and validation checksums
Integer Support: Works with Uint, Int, and mixed integer types
Signed Handling: Signed integers cast to u64 for bitwise operations
Implementation: apply_bitwise_xor function in src/evaluator/operators/bitwise.rs
Strength Score: +4 points
Version: v0.4.0

BitwiseNot Operator (~)#

Semantics: Computes bitwise complement of left operand, then checks equality with right value
Use Case: Detecting inverted bit patterns
Integer Support: Works with Uint and Int types only
Implementation: apply_bitwise_not function in src/evaluator/operators/bitwise.rs
Strength Score: +4 points
Version: v0.4.0
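
The four bitwise operators reduce to a few lines once both operands are viewed as 64-bit patterns. The sketch below follows the semantics stated in the sections above, with hypothetical function names and one assumption worth flagging: the complement in the NOT case is taken over all 64 bits, whereas an implementation might mask it to the width of the type being read.

```rust
/// Reduced stand-in for the evaluator's integer values.
#[derive(Debug, Clone, Copy)]
enum Value {
    Uint(u64),
    Int(i64),
}

/// Views either integer variant as a raw 64-bit pattern.
fn bits(value: Value) -> u64 {
    match value {
        Value::Uint(u) => u,
        Value::Int(i) => i as u64, // signed values keep their two's-complement bits
    }
}

/// `&`: true if any bit is set in both operands.
fn bitwise_and_sketch(left: Value, right: Value) -> bool {
    (bits(left) & bits(right)) != 0
}

/// `&mask=value`: mask the left operand, then compare for equality.
fn bitwise_and_mask_sketch(left: Value, mask: u64, expected: u64) -> bool {
    (bits(left) & mask) == expected
}

/// `^`: true if the operands differ in at least one bit.
fn bitwise_xor_sketch(left: Value, right: Value) -> bool {
    (bits(left) ^ bits(right)) != 0
}

/// `~`: true if the complement of the left operand equals the right operand.
fn bitwise_not_sketch(left: Value, right: Value) -> bool {
    (!bits(left)) == bits(right)
}
```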

AnyValue Operator (x)#

Semantics: Unconditional match; always returns true regardless of operand values
Use Case: Extracting information without filtering, always-match continuation rules
Type Compatibility: Accepts any value types
Implementation: apply_any_value function in src/evaluator/operators/mod.rs
Strength Score: +1 point (least specific)
Version: v0.4.0

Operator Type Compatibility Matrix#

Left Type	Right Type	Equal/NotEqual	Comparison (<,>,<=,>=)	BitwiseAnd	BitwiseAndMask	BitwiseXor	BitwiseNot	AnyValue
Uint	Uint	✅ Direct	✅ Direct	✅	✅	✅	✅	✅
Int	Int	✅ Direct	✅ Direct	✅	✅	✅	✅	✅
Uint	Int	✅ i128 coerce	✅ i128 coerce	✅	✅	✅	✅	✅
Float	Float	✅ Epsilon	✅ partial_cmp	❌	❌	❌	❌	✅
Bytes	Bytes	✅ Direct	✅ Lexicographic	❌	❌	❌	❌	✅
String	String	✅ Direct	✅ Lexicographic	❌	❌	❌	❌	✅
Mixed	Mixed	❌	❌	❌	❌	❌	❌	✅

Offset Type System#

Offset Architecture#

The offset system defines where in a file to read data, with four offset types represented in the OffsetSpec enum.

Offset Types#

Absolute Offset#

Syntax: Numeric value (positive from start, negative from end)
Status: ✅ Fully implemented
Implementation: resolve_absolute_offset
Features: Comprehensive bounds checking, arithmetic overflow protection, i64::MIN edge case handling
Strength Score: +10 (most reliable)

FromEnd Offset#

Syntax: Negative values from file end
Status: ✅ Fully implemented
Implementation: Uses same logic as negative Absolute offsets
Strength Score: +8 (reliable)
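
Since FromEnd shares its logic with negative Absolute offsets, one function can illustrate both, including the i64::MIN edge case. This is a sketch under assumptions: `resolve_absolute_sketch` is a hypothetical name, it returns `Option` rather than the project's `OffsetError`, and it treats a position equal to the buffer length as out of bounds.

```rust
use std::convert::TryFrom;

/// Resolves a signed offset against a buffer of `buf_len` bytes.
/// Non-negative offsets count from the start, negative ones from the end.
fn resolve_absolute_sketch(offset: i64, buf_len: usize) -> Option<usize> {
    if offset >= 0 {
        let pos = usize::try_from(offset).ok()?;
        (pos < buf_len).then(|| pos)
    } else {
        // unsigned_abs avoids the overflow that negating i64::MIN would cause.
        let back = usize::try_from(offset.unsigned_abs()).ok()?;
        buf_len.checked_sub(back) // None when reaching before the start
    }
}
```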

Indirect Offset#

Syntax: (offset.type)+adjustment - reads pointer value at base offset, dereferences to calculate final offset
Status: ✅ Fully implemented (parsing + evaluation)
Implementation: resolve_indirect_offset
Pointer Types: Supports .b/.B (byte), .s/.S (short), .l/.L (long), .q/.Q (quad) with endianness variants
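Adjustment Operators: The canonical form places the adjustment inside the parentheses, as in (base.type+N), and accepts +N, -N, *N, /N, %N, &N, |N, and ^N. The legacy form places it after the closing parenthesis, as in (base.type)+N, and accepts only +N and -N.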
Anchor-Relative: Supports base_relative ((&N.X) form) and result_relative (&(N.X) wrapper) variants against GNU file previous-match anchor
Tracking: Issue #37, PR #42
Use Cases: PE executables, compound documents, archive formats with header pointers
Strength Score: +5 (depends on pointer dereference)
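
The dereference step can be sketched for the common case of a 4-byte little-endian pointer with an additive adjustment, which is the shape used to locate a PE header from the pointer stored at 0x3c. The name `resolve_indirect_sketch` is hypothetical, only one pointer type and one adjustment operator are shown, and the real `resolve_indirect_offset` covers the other variants listed above.

```rust
use std::convert::TryFrom;

/// Reads a little-endian u32 pointer at `base`, adds `adjustment`,
/// and returns the result if it lies inside the buffer.
fn resolve_indirect_sketch(buf: &[u8], base: usize, adjustment: i64) -> Option<usize> {
    let end = base.checked_add(4)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(buf.get(base..end)?); // the pointer itself must be readable
    let pointer = i64::from(u32::from_le_bytes(raw));
    let target = pointer.checked_add(adjustment)?; // overflow-safe adjustment
    let target = usize::try_from(target).ok()?; // rejects negative results
    (target < buf.len()).then(|| target)
}
```

Every step that could fail (reading the pointer, applying the adjustment, converting to an index, and checking the final position) returns `None` instead of panicking, which matters because the pointer value comes from untrusted file content.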

Relative Offset#

Syntax: &+N or &-N - positions relative to end of previous match
Status: ✅ Implemented
Implementation: resolve_relative_offset
Version: v0.5.0 (PR #211)
Use Cases: Sequential field parsing in nested rule structures
Strength Score: +3 (depends on previous match)

Version Roadmap and Planned Features#

v0.1.x (MVP Complete)#

Status: Released and published on crates.io

Type Coverage: 30 of ~33 types (91%)

✅ Byte
✅ Short (signed/unsigned, all endianness variants)
✅ Long (signed/unsigned, all endianness variants)
✅ Quad (signed/unsigned, all endianness variants)
✅ Float (all endianness variants)
✅ Double (all endianness variants)
✅ Date (32-bit Unix timestamp, all endianness variants, UTC/local time)
✅ QDate (64-bit Unix timestamp, all endianness variants, UTC/local time)
✅ String (with optional max_length)
✅ String16 (UCS-2 with little/big endian variants)
✅ PString (Pascal string with optional max_length)
✅ Regex (pattern matching with /c, /s, /l flags)
✅ Search (bounded literal pattern search with full /s//c//C//w//W//T//f//t//b flag support)
✅ Meta-types: default, clear, name, use, indirect, offset (control-flow directives and offset reporting)

Operator Coverage: 11 of 11 operators (100%)

✅ Equal (=, ==)
✅ NotEqual (!, !=, <>)
✅ LessThan (<)
✅ GreaterThan (>)
✅ LessEqual (<=)
✅ GreaterEqual (>=)
✅ BitwiseAnd (&)
✅ BitwiseAndMask (&mask=value)
✅ BitwiseXor (^)
✅ BitwiseNot (~)
✅ AnyValue (x)

Offset Coverage: 4 of 4 fully functional (100%)

✅ Absolute (positive and negative)
✅ FromEnd
✅ Indirect (parsing + evaluation)
✅ Relative

Metrics:

1,200+ tests, >94% line coverage
10 built-in format rules (ELF, PE, ZIP, TAR, GZIP, JPEG, PNG, GIF, BMP, PDF)
GNU file compatibility tests passing (count requires refresh from latest CI; v1.0.0 target: 77/81 tests, 95%+)

v0.2.0 (Comparison Operators)#

Status: Released and published on crates.io

Focus: Comparison operators and byte signedness

Type Coverage: 7 of ~33 types (21%)

Type Changes:

✅ Byte type changed from unit variant to Byte { signed: bool } (breaking change)
✅ Quad type added with Quad { endian: Endianness, signed: bool }

New Operators:

✅ LessThan (<)
✅ GreaterThan (>)
✅ LessEqual (<=)
✅ GreaterEqual (>=)

Operator Coverage: 8 of 11 operators (73%)

v0.3.0 (Core Primitives)#

Status: Released

Focus: Offset resolution enhancements

Offset Enhancements:

✅ Indirect offset evaluation (pointer dereferencing)
✅ Relative offset evaluation (position tracking) - implemented in v0.5.0

Coverage:

Types: 7 of ~33 (21%)
Operators: 8 of 11 (73%)
Offsets: 4 of 4 (100%)

v0.4.0 (Bitwise Operators and Advanced Features)#

Status: Released and published on crates.io

Focus: Bitwise operators and advanced type support

New Operators:

✅ BitwiseXor (^) - PR #145
✅ BitwiseNot (~) - PR #145
✅ AnyValue (x) - PR #145

New Types:

✅ Float (32-bit IEEE 754) - PR #162
✅ Double (64-bit IEEE 754) - PR #162
✅ Regex (pattern matching) - PR #214
✅ Search (bounded literal search) - PR #214

Planned New Features:

Additional date endianness variants (medate, meldate)

Coverage:

Types: 30 of ~33 (91%)
Operators: 11 of 11 (100%)
Offsets: 4 of 4 (100%)

v0.5.0 (API and UX Polish)#

Status: Released

Focus: API improvements and developer experience

Implemented Features:

✅ Relative offset evaluation (PR #211) - position tracking against previous-match anchor

Planned Features:

Builder pattern for MagicDatabase (Issue #45)
JSON output metadata (Issue #46)
Parse warnings for skipped rules (Issue #47)
Improved error messages (Issue #49)

v1.0.0 (Production Ready - Planned)#

Focus: Full libmagic compatibility and stability

Target Metrics:

95%+ compatibility with GNU file measured by 81-file test corpus
Stable API with semver guarantees
Migration guide from C libmagic
Performance parity validation

Expected Coverage:

Types: ~31 of ~33 (94%)
Operators: 11 of 11 (100%)
Offsets: 4 of 4 (100%)

Compatibility Coverage Tracking#

Test Infrastructure#

The project maintains a dedicated compatibility test suite that validates output against the canonical libmagic implementation:

Test Corpus: Uses test files from file/file repository as git submodule
CI Integration: Automated testing on push to main/develop branches and daily at 2 AM UTC
Validation: Compares output against .result files from GNU file
Metrics: PASS/FAIL/ERROR status with detailed failure information

Current Compatibility Status (v0.4.0)#

By Category (from Issue #57):

Binary formats: 0 of ~45 tests (requires system magic files)
Text formats: 0 of ~15 tests
Audio formats: 0 of ~5 tests
ZIP subtypes: 0 of ~5 tests
Custom magic tests: 0 of ~4 tests
Filesystem images: 0 of ~3 tests

Overall: 0/81 tests passing (0%) - baseline for measuring progress

Gap Analysis: Most tests return "data" (no matching rule) due to reliance on built-in rules only; system magic file compatibility pending.

Planned Compatibility Milestones#

Version	Compatibility Target	Key Enablers
v0.1.x	0% (baseline)	Built-in rules only, 4 basic operators
v0.2.0	0% (baseline)	Added comparison operators, quad type
v0.3.0	~30%	Indirect offsets
v0.4.0	~40%	Bitwise XOR/NOT/any-value operators, float/double types
v0.5.0	~60%	Regex, date/time types, relative offsets, meta-types
v0.6.0	~80%	Additional date endianness variants, remaining edge cases
v1.0.0	95%+	Full feature parity

libmagic Type Coverage Reference#

Complete Type Inventory#

Based on the GNU libmagic specification, the following table shows all ~33 standard types and their implementation status:

Type Category	Type Name	Size	Endianness	Status	Version
8-bit Integer	`byte`	1 byte	N/A	✅ Implemented	v0.1.x
16-bit Integer	`short`	2 bytes	Native	✅ Implemented	v0.1.x
	`leshort`	2 bytes	Little	✅ Implemented	v0.1.x
	`beshort`	2 bytes	Big	✅ Implemented	v0.1.x
32-bit Integer	`long`	4 bytes	Native	✅ Implemented	v0.1.x
	`lelong`	4 bytes	Little	✅ Implemented	v0.1.x
	`belong`	4 bytes	Big	✅ Implemented	v0.1.x
64-bit Integer	`quad`	8 bytes	Native	✅ Implemented	v0.2.0
	`lequad`	8 bytes	Little	✅ Implemented	v0.2.0
	`bequad`	8 bytes	Big	✅ Implemented	v0.2.0
32-bit Float	`float`	4 bytes	Native	✅ Implemented	v0.4.0
	`lefloat`	4 bytes	Little	✅ Implemented	v0.4.0
	`befloat`	4 bytes	Big	✅ Implemented	v0.4.0
64-bit Float	`double`	8 bytes	Native	✅ Implemented	v0.4.0
	`ledouble`	8 bytes	Little	✅ Implemented	v0.4.0
	`bedouble`	8 bytes	Big	✅ Implemented	v0.4.0
String	`string`	Variable	N/A	✅ Implemented	v0.1.x
	`pstring`	Variable	N/A	✅ Implemented	v0.5.0+
	`lestring16`	Variable	Little	✅ Implemented	v0.5.0+
	`bestring16`	Variable	Big	✅ Implemented	v0.5.0+
Date (32-bit)	`date`	4 bytes	Native UTC	✅ Implemented	v0.5.0+
	`ldate`	4 bytes	Native Local	✅ Implemented	v0.5.0+
	`bedate`	4 bytes	Big UTC	✅ Implemented	v0.5.0+
	`beldate`	4 bytes	Big Local	✅ Implemented	v0.5.0+
	`ledate`	4 bytes	Little UTC	✅ Implemented	v0.5.0+
	`leldate`	4 bytes	Little Local	✅ Implemented	v0.5.0+
	`medate`	4 bytes	Middle UTC	📋 Planned	Future
	`meldate`	4 bytes	Middle Local	📋 Planned	Future
Date (64-bit)	`qdate`	8 bytes	Native UTC	✅ Implemented	v0.5.0+
	`qldate`	8 bytes	Native Local	✅ Implemented	v0.5.0+
	`beqdate`	8 bytes	Big UTC	✅ Implemented	v0.5.0+
	`beqldate`	8 bytes	Big Local	✅ Implemented	v0.5.0+
	`leqdate`	8 bytes	Little UTC	✅ Implemented	v0.5.0+
	`leqldate`	8 bytes	Little Local	✅ Implemented	v0.5.0+
Pattern	`regex`	Variable	N/A	✅ Implemented	v0.5.0+
	`search`	Variable	N/A	✅ Implemented	v0.5.0+
Meta	`default`	N/A	N/A	✅ Implemented	v0.5.0+
	`clear`	N/A	N/A	✅ Implemented	v0.5.0+
	`name`	N/A	N/A	✅ Implemented	v0.5.0+
	`use`	N/A	N/A	✅ Implemented	v0.5.0+
	`indirect`	N/A	N/A	✅ Implemented	v0.5.0+
	`offset`	N/A	N/A	✅ Implemented	v0.5.0+

Current Coverage: 30 of 33 types (91%)
v0.3.0 Status: 7 of 33 types (21%)
v0.4.0 Status: 7 of 33 types (21%)
v0.5.0+ Status: 30 of 33 types (91%)
v1.0.0 Target: ~31 of 33 types (94%)

Complete Operator Inventory#

Operator	Symbol	Description	Status	Version
Equality	`=`, `==`	Exact equality	✅ Implemented	v0.1.x
	`!`, `!=`, `<>`	Inequality	✅ Implemented	v0.1.x
Comparison	`>`	Greater than	✅ Implemented	v0.2.0
	`<`	Less than	✅ Implemented	v0.2.0
	`>=`	Greater or equal	✅ Implemented	v0.2.0
	`<=`	Less or equal	✅ Implemented	v0.2.0
Bitwise	`&`	Bitwise AND (test)	✅ Implemented	v0.1.x
	`&mask=value`	Bitwise AND with mask	✅ Implemented	v0.1.x
	`^`	Bitwise XOR	✅ Implemented	v0.4.0
	`~`	Bitwise NOT	✅ Implemented	v0.4.0
Special	`x`	Any value (always match)	✅ Implemented	v0.4.0

Current Coverage: 11 of 11 operators (100%)
v0.4.0 Status: 11 of 11 operators (100%)
v0.5.0 Target: 11 of 11 operators (100%)
v1.0.0 Target: 11 of 11 operators (100%)

Security and Safety Features#

The type system and operator implementation include comprehensive security guarantees:

Strict Bounds Checking: Prevents buffer overruns during type reading
Integer Overflow Protection: Safe arithmetic in offset calculations and value conversions
No Uninitialized Memory: All reads validated against buffer boundaries
SIMD-Accelerated String Scanning: Uses memchr crate for fast, safe null terminator detection
UTF-8 Validation: Invalid sequences replaced with safe replacement character
Cross-Type Safety: Explicit type handling prevents undefined behavior in operator evaluation
Timestamp Formatting: Uses chrono crate with safe O(1) civil-date conversion algorithm avoiding overflow/hang on extreme timestamp values

All implementations avoid unsafe code where possible, relying on Rust's type system and bounds checking for memory safety.

Rule Message Formatting#

Magic rule messages support printf-style format specifiers (%d, %i, %u, %x, %X, %o, %s, %c) that are substituted with the rule's read value at evaluation time via src/output/format.rs::format_magic_message. Width, padding, and length modifiers are supported. Hex specifiers mask to TypeKind::bit_width() so signed byte -1 renders as ff, not ffffffffffffffff. Alt-form (#) follows C printf: %#06x + 0xab → 0x00ab. Literal % requires escaping as %%.
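
The bit-width masking for hex specifiers can be illustrated with a small helper. The function `format_hex_sketch` is hypothetical and takes the width as a plain integer rather than calling `TypeKind::bit_width()`; it shows only the masking step, not the full specifier parser in `format_magic_message`.

```rust
/// Renders `value` as lowercase hex after masking it to `bit_width` bits,
/// so that sign extension does not leak into the output.
fn format_hex_sketch(value: i64, bit_width: u32) -> String {
    let mask = if bit_width >= 64 {
        u64::MAX // shifting by 64 would overflow, so handle it separately
    } else {
        (1u64 << bit_width) - 1
    };
    format!("{:x}", (value as u64) & mask)
}
```

Rust's own formatter follows the same alt-form convention as C printf for this case: `format!("{:#06x}", 0xab)` yields `0x00ab`, with the `0x` prefix counted toward the width.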

Related Topics#

Magic File Format: Specification and syntax for magic rule definitions
Rule Evaluation Engine: Runtime system for matching files against magic rules
Strength Calculation: Confidence scoring algorithm for rule matches
AST Data Structures: Abstract syntax tree representation of magic rules
Parser Implementation: Parsing logic for magic file syntax
Built-in Rules: Default format detection rules compiled into the binary

Relevant Code Files#

File Path	Description	Key Components
src/parser/ast.rs	AST data structures	TypeKind enum, Operator enum, OffsetSpec enum, Endianness enum
src/evaluator/types.rs	Type reading implementation	read_byte, read_short, read_long, read_quad, read_float, read_double, read_date, read_qdate, read_string, read_pstring, read_typed_value
src/evaluator/types/string.rs	String type implementation	read_string, read_pstring
src/evaluator/types/date.rs	Date/timestamp implementation	read_date, read_qdate, format_unix_timestamp_64, local_utc_offset_secs
src/evaluator/operators.rs	Operator evaluation	apply_equal, apply_not_equal, apply_bitwise_and, apply_operator
src/evaluator/offset.rs	Offset resolution	resolve_offset, resolve_absolute_offset, OffsetError
src/evaluator/strength.rs	Strength scoring	Operator strength scores, offset strength scores
src/parser/grammar.rs	Parser implementation	parse_operator, parse_type, parse_offset
ROADMAP.md	Version planning	Milestone definitions, feature tracking
docs/src/compatibility.md	Compatibility tracking	Type/operator status matrices, platform support
docs/MAGIC_FORMAT.md	Format documentation	Type specifications, operator syntax, examples

This article reflects the state of libmagic-rs as of version 0.5.0+. For the latest implementation status, refer to the project roadmap and GitHub issues.
