Type System And Operator Coverage#
Lead Section#
The type system and operator coverage of libmagic-rs is the inventory of data types and comparison operators implemented for detecting file formats through magic rule evaluation. As a pure-Rust replacement for the C libmagic library, libmagic-rs follows a phased implementation strategy toward 95%+ compatibility with GNU file by version 1.0.0.
The v0.1.x release provided four basic data types (Byte, Short, Long, and String) with endianness and signedness support, alongside four operators: equality, inequality, and two bitwise forms (AND test and AND with mask). The v0.2.0 release added four comparison operators (<, >, <=, >=), changed Byte from a unit variant to a signed/unsigned variant, and added 64-bit integer support with the Quad type in all endianness and signedness variants. Version 0.4.0 added three operators (BitwiseXor, BitwiseNot, and AnyValue), bringing the total to 11 implemented operators, and added floating-point types (Float and Double with endian variants). Version 0.5.0+ added date and timestamp types (Date and QDate with UTC/local time options) for parsing Unix timestamps, and pattern types (Regex and Search) for text-based format detection. These primitives enable detection of common file formats including executables, archives, images, text documents, and time-stamped data. All six meta-type directives (default, clear, name, use, indirect, offset) are implemented for control-flow subroutines and dynamic reporting.
This article provides a comprehensive technical reference for the type system and operator implementation, tracking both current capabilities and planned enhancements across the version roadmap to full libmagic compatibility.
Current Type System Implementation#
Type Architecture#
The type system in libmagic-rs is implemented across two primary modules: the AST representation in src/parser/ast.rs defines the TypeKind enum for structural representation, while src/evaluator/types.rs provides the runtime implementation for reading typed values from binary data.
Implemented Types#
Byte Type#
- Size: 8 bits (1 byte)
- Signedness: Explicitly specified with `signed: bool` field in `TypeKind::Byte { signed: bool }` (changed from unit variant in v0.2.0)
- Endianness: N/A (single byte)
- Return Type: `Value::Uint(u64)` for unsigned, `Value::Int(i64)` for signed
- Implementation: read_byte function (takes 3 parameters as of v0.2.0)
Short Type#
- Size: 16 bits (2 bytes)
- Signedness: Supports both signed and unsigned via `signed: bool` parameter
- Endianness: Little, Big, and Native variants
- Return Type: `Value::Uint(u64)` for unsigned, `Value::Int(i64)` for signed
- Implementation: read_short function
Long Type#
- Size: 32 bits (4 bytes)
- Signedness: Supports both signed and unsigned via `signed: bool` parameter
- Endianness: Little, Big, and Native variants
- Return Type: `Value::Uint(u64)` for unsigned, `Value::Int(i64)` for signed
- Implementation: read_long function
Quad Type#
- Size: 64 bits (8 bytes)
- Signedness: Supports both signed and unsigned via `signed: bool` parameter
- Endianness: Little, Big, and Native variants
- Return Type: `Value::Uint(u64)` for unsigned, `Value::Int(i64)` for signed
- Implementation: read_quad function
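The integer readers above share one pattern: bounds-check the slice, decode with the requested byte order, then widen to `u64` or `i64` depending on signedness. The following is a minimal sketch of that pattern for the 16-bit case only. The name `read_short_sketch` and the two-variant `Value` enum are illustrative assumptions, not the crate's actual API.

```rust
/// Byte order of a multi-byte field, mirroring the three variants described in this article.
#[derive(Clone, Copy, Debug)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

/// Minimal stand-in for the evaluator's integer value variants (assumption).
#[derive(Debug, PartialEq)]
pub enum Value {
    Uint(u64),
    Int(i64),
}

/// Hypothetical helper: read a 16-bit integer at `offset`.
/// Returns `None` when fewer than two bytes are available.
pub fn read_short_sketch(
    buf: &[u8],
    offset: usize,
    endian: Endianness,
    signed: bool,
) -> Option<Value> {
    let end = offset.checked_add(2)?; // guard against offset overflow
    let s = buf.get(offset..end)?; // bounds-checked slice
    let bytes = [s[0], s[1]];
    let raw = match endian {
        Endianness::Little => u16::from_le_bytes(bytes),
        Endianness::Big => u16::from_be_bytes(bytes),
        Endianness::Native => u16::from_ne_bytes(bytes),
    };
    Some(if signed {
        Value::Int(i64::from(raw as i16)) // reinterpret as i16, then sign-extend
    } else {
        Value::Uint(u64::from(raw)) // zero-extend
    })
}
```

Under these assumptions, the bytes `[0xff, 0xff]` read as signed yield `Value::Int(-1)`, while the same bytes read as unsigned yield `Value::Uint(65535)`.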
Float Type#
- Size: 32 bits (4 bytes)
- Standard: IEEE 754 single-precision floating-point
- Endianness: Little, Big, and Native variants
- Return Type: `Value::Float(f64)` (widened from f32)
- Implementation: read_float function
Double Type#
- Size: 64 bits (8 bytes)
- Standard: IEEE 754 double-precision floating-point
- Endianness: Little, Big, and Native variants
- Return Type: `Value::Float(f64)`
- Implementation: read_double function
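Widening a 32-bit float to `f64` preserves the stored value exactly, which matters when comparing against literals. The sketch below illustrates the read-and-widen step. `read_float_sketch` and its `big_endian` flag are hypothetical simplifications of the documented `read_float` function.

```rust
/// Hypothetical helper: read an IEEE 754 single-precision value at `offset`
/// and widen it to f64. Returns `None` when fewer than four bytes remain.
pub fn read_float_sketch(buf: &[u8], offset: usize, big_endian: bool) -> Option<f64> {
    let end = offset.checked_add(4)?;
    let s = buf.get(offset..end)?;
    let bytes = [s[0], s[1], s[2], s[3]];
    let single = if big_endian {
        f32::from_be_bytes(bytes)
    } else {
        f32::from_le_bytes(bytes)
    };
    // f32 -> f64 is lossless: the result is the exact f32 value, not the
    // nearest f64 to the decimal literal the f32 was written from.
    Some(f64::from(single))
}
```

One consequence of the widening: a stored `0.1f32` becomes roughly `0.10000000149`, which is not equal to the `f64` literal `0.1`, so rule values may need to be compared at single precision.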
Date Type (32-bit)#
- Size: 32 bits (4 bytes)
- Standard: Unix timestamp (seconds since epoch)
- Endianness: Little, Big, and Native variants
- Time Zone: UTC or local time via `utc: bool` parameter
- Return Type: `Value::String(String)` formatted as `"%a %b %e %H:%M:%S %Y"` (matching GNU file output)
- Implementation: read_date function
- Strength Score: 15 points
- Version: v0.5.0+
QDate Type (64-bit)#
- Size: 64 bits (8 bytes)
- Standard: Unix timestamp (seconds since epoch)
- Endianness: Little, Big, and Native variants
- Time Zone: UTC or local time via `utc: bool` parameter
- Return Type: `Value::String(String)` formatted as `"%a %b %e %H:%M:%S %Y"` (matching GNU file output)
- Implementation: read_qdate function
- Strength Score: 16 points
- Version: v0.5.0+
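To make the `"%a %b %e %H:%M:%S %Y"` layout concrete, the sketch below formats a UTC timestamp using only the standard library and a constant-time days-to-civil-date conversion. The article states that the crate relies on chrono for this step, so the hand-rolled arithmetic here is a swapped-in illustration of the output shape and not the project's implementation. `format_utc_sketch` is a hypothetical name.

```rust
/// Format seconds since the Unix epoch (UTC) as "%a %b %e %H:%M:%S %Y".
/// Uses a closed-form days-to-civil conversion, so cost is constant for any input.
pub fn format_utc_sketch(secs: i64) -> String {
    // Index 0 is Thursday because 1970-01-01 was a Thursday.
    const DAYS: [&str; 7] = ["Thu", "Fri", "Sat", "Sun", "Mon", "Tue", "Wed"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];

    let days = secs.div_euclid(86_400); // whole days since epoch (floors for negatives)
    let rem = secs.rem_euclid(86_400); // seconds within the day, always 0..86399

    // Shift the epoch to 0000-03-01 so leap days fall at the end of a "year".
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = doy - (153 * mp + 2) / 5 + 1; // day of month, 1..=31
    let month = if mp < 10 { mp + 3 } else { mp - 9 }; // calendar month, 1..=12
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    format!(
        "{} {} {:>2} {:02}:{:02}:{:02} {}",
        DAYS[days.rem_euclid(7) as usize],
        MONTHS[(month - 1) as usize],
        day, // %e: space-padded day of month
        rem / 3_600,
        rem % 3_600 / 60,
        rem % 60,
        year
    )
}
```

For example, `format_utc_sketch(0)` produces `Thu Jan  1 00:00:00 1970`, with two spaces before the single-digit day as `%e` requires. Local-time variants would additionally apply a UTC offset before formatting, which this sketch omits.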
String Type#
- Size: Variable length
- Parameters: `max_length: Option<usize>` controls maximum bytes to read
- Behavior: Dual read-mode dispatch. When no comparison value is specified, strings are read with NUL-terminator or max_length bounds (via `read_string_exact`); when a pattern (regex/search) is specified, strings are read as variable-length up to max_length (via `read_string`)
- UTF-8 Handling: Invalid UTF-8 sequences replaced with replacement character (U+FFFD)
- Return Type: `Value::String(String)`
- Implementation: read_string function
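The NUL-terminated read with an optional length cap and lossy UTF-8 decoding can be sketched as follows. `read_string_sketch` is a hypothetical name, and the terminator scan uses a plain iterator here, whereas the article notes that the crate uses memchr for that step.

```rust
/// Hypothetical helper: read a string at `offset`, stopping at the first NUL,
/// at `max_length` bytes, or at the end of the buffer, whichever comes first.
/// Returns `None` only when `offset` lies beyond the buffer.
pub fn read_string_sketch(buf: &[u8], offset: usize, max_length: Option<usize>) -> Option<String> {
    let tail = buf.get(offset..)?; // bounds-checked start
    let limit = max_length.map_or(tail.len(), |m| m.min(tail.len()));
    let window = &tail[..limit];
    // Find the terminator inside the window; absent one, take the whole window.
    let end = window.iter().position(|&b| b == 0).unwrap_or(window.len());
    // Invalid UTF-8 sequences become U+FFFD instead of causing an error.
    Some(String::from_utf8_lossy(&window[..end]).into_owned())
}
```

With this sketch, `read_string_sketch(b"PK\0junk", 0, None)` yields `"PK"`, and a stray `0xff` byte in the input surfaces as the replacement character.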
String16 Type (UCS-2)#
- Description: UCS-2 (16-bit Unicode) string with explicit byte order
- Size: Variable length (16 bits per code unit)
- Endianness: Little (`lestring16`) and Big (`bestring16`) variants
- Behavior: Reads 16-bit code units until U+0000 terminator or buffer end; surrogate pairs (D800-DFFF) replaced with U+FFFD
- Return Type: `Value::String(String)`
- Implementation: read_string16 function
- Version: v0.5.0+
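The UCS-2 decoding rule above (stop at U+0000, map surrogate code units to U+FFFD) fits in a few lines. This is an illustrative sketch: `read_string16_sketch` and its `big_endian` flag are assumptions, and a trailing odd byte is simply ignored here.

```rust
/// Hypothetical helper: decode UCS-2 code units starting at `offset`.
/// Stops at a U+0000 unit or when fewer than two bytes remain.
pub fn read_string16_sketch(buf: &[u8], offset: usize, big_endian: bool) -> Option<String> {
    let tail = buf.get(offset..)?;
    let mut out = String::new();
    for pair in tail.chunks_exact(2) {
        let unit = if big_endian {
            u16::from_be_bytes([pair[0], pair[1]])
        } else {
            u16::from_le_bytes([pair[0], pair[1]])
        };
        if unit == 0 {
            break; // U+0000 terminator
        }
        // For 16-bit input, from_u32 fails only on surrogates (D800-DFFF),
        // which UCS-2 cannot represent as characters.
        out.push(std::char::from_u32(u32::from(unit)).unwrap_or('\u{FFFD}'));
    }
    Some(out)
}
```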
PString Type#
- Description: Pascal-style length-prefixed strings where the first byte indicates the string length, followed by that many bytes of string data
- Size: Variable (1 byte for length prefix + actual string bytes)
- Parameters: `max_length: Option<usize>` caps the length byte value
- Behavior: Reads length byte (0-255), then reads that many bytes as string data; not null-terminated
- Bounds Checking: Validates both the length byte is readable and that the full string data is within bounds
- UTF-8 Handling: Invalid UTF-8 sequences replaced with replacement character (U+FFFD)
- Return Type: `Value::String(String)`
- Implementation: read_pstring function
- Comparison Operators: Supports all string comparison operators (=, !=, <, >, <=, >=)
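A length-prefixed read needs two bounds checks: one for the length byte and one for the data it announces. The sketch below shows both, with `max_length` capping the announced length as described above. The function name is hypothetical.

```rust
/// Hypothetical helper: read a Pascal-style string whose first byte is its length.
/// Returns `None` if the length byte or any announced data byte is out of bounds.
pub fn read_pstring_sketch(buf: &[u8], offset: usize, max_length: Option<usize>) -> Option<String> {
    let announced = usize::from(*buf.get(offset)?); // check 1: length byte readable
    let len = max_length.map_or(announced, |m| announced.min(m));
    let start = offset.checked_add(1)?;
    let end = start.checked_add(len)?;
    let data = buf.get(start..end)?; // check 2: full payload within bounds
    Some(String::from_utf8_lossy(data).into_owned())
}
```

Because the length is explicit, embedded NUL bytes are kept as data rather than treated as terminators.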
Meta-Type Directives#
- `default`: Fires when no sibling at the same indentation level has matched; serves as catch-all fallback
- `clear`: Resets the sibling-matched flag to allow subsequent `default` rules to fire
- `name`: Declares a named subroutine that can be invoked via `use`
- `use`: Invokes a named subroutine; expands inline with callee's matches appearing before continuation rules
- `indirect`: Re-evaluates the root rule list at the resolved offset
- `offset`: Reports the resolved file offset as `Value::Uint(position)` rather than reading typed data; message templates can reference the position via printf-style specifiers (`%lld`, `%d`)
- Version: v0.5.0+
- Implementation: `src/evaluator/engine/mod.rs` dispatch arms; `src/parser/name_table.rs` for `name`/`use` subroutine table
- Semantics: Control-flow directives modify rule traversal; `offset` is value-bearing but reads no buffer bytes
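The interaction between `default` and `clear` can be modeled as a single "sibling matched" flag that is updated while walking rules at one indentation level. The sketch below is a toy model under stated assumptions: ordinary tests are reduced to a precomputed boolean, a firing `default` is assumed to set the flag itself, and `Sibling` and `fired_siblings` are invented names.

```rust
/// Toy model of one indentation level's rules (assumption, not the crate's AST).
pub enum Sibling {
    /// An ordinary test whose outcome is already known.
    Test(bool),
    /// Fires only if no earlier sibling has matched since the last `clear`.
    Default,
    /// Resets the sibling-matched flag.
    Clear,
}

/// Return the indices of the siblings that fire, in order.
pub fn fired_siblings(rules: &[Sibling]) -> Vec<usize> {
    let mut matched = false;
    let mut fired = Vec::new();
    for (i, rule) in rules.iter().enumerate() {
        match *rule {
            Sibling::Test(true) => {
                matched = true;
                fired.push(i);
            }
            Sibling::Test(false) => {}
            Sibling::Default => {
                if !matched {
                    matched = true; // assumed: a fired default counts as a match
                    fired.push(i);
                }
            }
            Sibling::Clear => matched = false,
        }
    }
    fired
}
```

In this model, `[Test(true), Clear, Default]` fires indices 0 and 2, because `clear` re-arms the `default` that the earlier match would otherwise suppress.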
Endianness Support#
The Endianness enum provides three byte order options:
- Little: Least significant byte first (common in x86/x64 architectures)
- Big: Most significant byte first (common in network protocols)
- Native: System-dependent byte order matching target architecture
All multi-byte types (Short, Long, Quad, Float, Double, Date, and QDate) support all three endianness variants, enabling detection of files from different architectures and platforms.
Current Operator Implementation#
Operator Architecture#
The operator system consists of eleven distinct operators defined in the AST, with runtime evaluation provided by dedicated functions in src/evaluator/operators.rs.
Implemented Operators#
Equal Operator (=)#
- Semantics: Tests exact equality between left and right operands
- Cross-Type Coercion: Implements intelligent coercion between `Uint` and `Int` by converting both to `i128` to prevent overflow
- Float Equality: Uses epsilon-aware comparison (`|a - b| <= f64::EPSILON`) for `Value::Float`; NaN comparisons always return false
- Type Compatibility: Direct comparison for same types; cross-integer comparison via i128; false for mismatched non-integer types
- Implementation: apply_equal function
- Strength Score: +10 points (most specific)
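The reason for widening to `i128` is that a naive cast would make `u64::MAX` compare equal to `-1`. The sketch below illustrates the coercion and the epsilon rule from the list above. The four-variant `Value` enum and `apply_equal_sketch` are simplified assumptions, not the crate's definitions.

```rust
/// Simplified stand-in for the evaluator's value type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Uint(u64),
    Int(i64),
    Float(f64),
    String(String),
}

/// Hypothetical equality check following the rules described above.
pub fn apply_equal_sketch(left: &Value, right: &Value) -> bool {
    match (left, right) {
        (Value::Uint(a), Value::Uint(b)) => a == b,
        (Value::Int(a), Value::Int(b)) => a == b,
        // i128 holds every u64 and every i64, so no value wraps around.
        (Value::Uint(u), Value::Int(i)) | (Value::Int(i), Value::Uint(u)) => {
            i128::from(*u) == i128::from(*i)
        }
        // NaN - x is NaN, and NaN <= EPSILON is false, so NaN never matches.
        (Value::Float(a), Value::Float(b)) => (a - b).abs() <= f64::EPSILON,
        (Value::String(a), Value::String(b)) => a == b,
        // Mismatched non-integer types never compare equal.
        _ => false,
    }
}
```

A fixed absolute epsilon is tight for large magnitudes, so under this rule two large floats must be essentially bit-identical to match.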
NotEqual Operator (!=)#
- Semantics: Logical negation of Equal operator
- Syntax: `!`, `!=`, `<>` (all three forms map to NotEqual)
- Implementation: Returns !apply_equal(left, right)
- Strength Score: +5 points
LessThan Operator (<)#
- Semantics: Returns true if left operand is less than right operand
- Type Compatibility: Works with integers, floats, strings, and bytes; uses i128 coercion for cross-integer comparisons
- Float Comparison: Uses `partial_cmp` with IEEE 754 semantics; NaN comparisons return None (false)
- Implementation: apply_less_than function
- Strength Score: +6 points
- Version: v0.2.0
GreaterThan Operator (>)#
- Semantics: Returns true if left operand is greater than right operand
- Type Compatibility: Works with integers, floats, strings, and bytes; uses i128 coercion for cross-integer comparisons
- Float Comparison: Uses `partial_cmp` with IEEE 754 semantics; NaN comparisons return None (false)
- Implementation: apply_greater_than function
- Strength Score: +6 points
- Version: v0.2.0
LessEqual Operator (<=)#
- Semantics: Returns true if left operand is less than or equal to right operand
- Type Compatibility: Works with integers, floats, strings, and bytes; uses i128 coercion for cross-integer comparisons
- Float Comparison: Uses `partial_cmp` with IEEE 754 semantics; NaN comparisons return None (false)
- Implementation: apply_less_equal function
- Strength Score: +6 points
- Version: v0.2.0
GreaterEqual Operator (>=)#
- Semantics: Returns true if left operand is greater than or equal to right operand
- Type Compatibility: Works with integers, floats, strings, and bytes; uses i128 coercion for cross-integer comparisons
- Float Comparison: Uses `partial_cmp` with IEEE 754 semantics; NaN comparisons return None (false)
- Implementation: apply_greater_equal function
- Strength Score: +6 points
- Version: v0.2.0
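All four ordering operators can be expressed on top of one function that returns `Option<Ordering>`, where `None` covers both NaN and mismatched types. The following sketch shows that design with two of the operators. `compare`, `less_than`, `greater_equal`, and the `Value` enum are illustrative assumptions.

```rust
use std::cmp::Ordering;

/// Simplified stand-in for the evaluator's value type (assumption).
pub enum Value {
    Uint(u64),
    Int(i64),
    Float(f64),
    Bytes(Vec<u8>),
    String(String),
}

/// Shared ordering: `None` means "not comparable", which every operator treats as false.
fn compare(left: &Value, right: &Value) -> Option<Ordering> {
    match (left, right) {
        (Value::Uint(a), Value::Uint(b)) => Some(a.cmp(b)),
        (Value::Int(a), Value::Int(b)) => Some(a.cmp(b)),
        // Cross-integer comparisons widen both sides to i128.
        (Value::Uint(a), Value::Int(b)) => Some(i128::from(*a).cmp(&i128::from(*b))),
        (Value::Int(a), Value::Uint(b)) => Some(i128::from(*a).cmp(&i128::from(*b))),
        // partial_cmp yields None when either side is NaN.
        (Value::Float(a), Value::Float(b)) => a.partial_cmp(b),
        // Byte and string comparisons are lexicographic.
        (Value::Bytes(a), Value::Bytes(b)) => Some(a.cmp(b)),
        (Value::String(a), Value::String(b)) => Some(a.cmp(b)),
        _ => None,
    }
}

pub fn less_than(left: &Value, right: &Value) -> bool {
    compare(left, right) == Some(Ordering::Less)
}

pub fn greater_equal(left: &Value, right: &Value) -> bool {
    compare(left, right).map_or(false, |o| o != Ordering::Less)
}
```

Note that `greater_equal` is not simply `!less_than`: with a NaN operand both return false, which the `Option` return type makes explicit.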
BitwiseAnd Operator (&)#
- Semantics: Returns true if `(left & right) != 0`
- Use Case: Checking if specific bits are set in flags or bitmasks
- Integer Support: Works with `Uint`, `Int`, and mixed integer types
- Signed Handling: Signed integers cast to `u64` for bitwise operations
- Implementation: apply_bitwise_and function
- Strength Score: +3 points (low specificity)
BitwiseAndMask Operator (&mask=value)#
- Semantics: Applies bitmask to left operand, then performs equality comparison with right operand
- Syntax: `&0xNN=value` in magic file format
- Use Case: Checking specific bit patterns while ignoring other bits
- Implementation: Masks left value, then calls apply_equal
- Strength Score: +7 points (moderately specific)
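The difference between the two AND forms is whether the masked result is tested for "any bit set" or compared against an exact value. The sketch below contrasts them on plain integers. Both function names are hypothetical, and operands are reduced to `i64`/`u64` for brevity.

```rust
/// `&` form: true when the operands share at least one set bit.
/// A signed left operand is reinterpreted as u64, as described above.
pub fn bitwise_and_sketch(left: i64, right: u64) -> bool {
    ((left as u64) & right) != 0
}

/// `&mask=value` form: mask the value read from the file, then test equality.
pub fn masked_equal_sketch(read_value: u64, mask: u64, expected: u64) -> bool {
    (read_value & mask) == expected
}
```

For a byte `0xAB`, `masked_equal_sketch(0xAB, 0xF0, 0xA0)` holds because only the high nibble is compared, whereas `bitwise_and_sketch` would accept any value with a bit inside the mask.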
BitwiseXor Operator (^)#
- Semantics: Performs bitwise XOR between two integer values; returns true if result is non-zero
- Use Case: Detecting bit pattern differences and validation checksums
- Integer Support: Works with `Uint`, `Int`, and mixed integer types
- Signed Handling: Signed integers cast to `u64` for bitwise operations
- Implementation: apply_bitwise_xor function in `src/evaluator/operators/bitwise.rs`
- Strength Score: +4 points
- Version: v0.4.0
BitwiseNot Operator (~)#
- Semantics: Computes bitwise complement of left operand, then checks equality with right value
- Use Case: Detecting inverted bit patterns
- Integer Support: Works with `Uint` and `Int` types only
- Implementation: apply_bitwise_not function in `src/evaluator/operators/bitwise.rs`
- Strength Score: +4 points
- Version: v0.4.0
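The XOR and NOT semantics above reduce to one-line predicates once both operands are in `u64`. The sketch below assumes the complement is taken over the full 64 bits; whether the crate narrows the complement to the width of the type being read is not stated in this article. Function names are hypothetical.

```rust
/// `^` form: true when the operands differ in at least one bit.
pub fn bitwise_xor_sketch(left: u64, right: u64) -> bool {
    (left ^ right) != 0
}

/// `~` form: complement the left operand, then test equality with the right.
/// Assumption: the complement spans all 64 bits of the widened value.
pub fn bitwise_not_sketch(left: u64, right: u64) -> bool {
    !left == right
}
```

Under the full-width assumption, a byte value of `0x00` complements to `u64::MAX` rather than `0xFF`, so a rule written against `0xFF` would only match if the implementation masks to the type's width first.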
AnyValue Operator (x)#
- Semantics: Unconditional match; always returns true regardless of operand values
- Use Case: Extracting information without filtering, always-match continuation rules
- Type Compatibility: Accepts any value types
- Implementation: apply_any_value function in `src/evaluator/operators/mod.rs`
- Strength Score: +1 point (least specific)
- Version: v0.4.0
Operator Type Compatibility Matrix#
| Left Type | Right Type | Equal/NotEqual | Comparison (<,>,<=,>=) | BitwiseAnd | BitwiseAndMask | BitwiseXor | BitwiseNot | AnyValue |
|---|---|---|---|---|---|---|---|---|
| Uint | Uint | ✅ Direct | ✅ Direct | ✅ | ✅ | ✅ | ✅ | ✅ |
| Int | Int | ✅ Direct | ✅ Direct | ✅ | ✅ | ✅ | ✅ | ✅ |
| Uint | Int | ✅ i128 coerce | ✅ i128 coerce | ✅ | ✅ | ✅ | ✅ | ✅ |
| Float | Float | ✅ Epsilon | ✅ partial_cmp | ❌ | ❌ | ❌ | ❌ | ✅ |
| Bytes | Bytes | ✅ Direct | ✅ Lexicographic | ❌ | ❌ | ❌ | ❌ | ✅ |
| String | String | ✅ Direct | ✅ Lexicographic | ❌ | ❌ | ❌ | ❌ | ✅ |
| Mixed | Mixed | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
Offset Type System#
Offset Architecture#
The offset system defines where in a file to read data, with four offset types represented in the OffsetSpec enum.
Offset Types#
Absolute Offset#
- Syntax: Numeric value (positive from start, negative from end)
- Status: ✅ Fully implemented
- Implementation: resolve_absolute_offset
- Features: Comprehensive bounds checking, arithmetic overflow protection, i64::MIN edge case handling
- Strength Score: +10 (most reliable)
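Resolving a signed absolute offset involves three hazards named above: negative values counted from the end, arithmetic overflow, and `i64::MIN`, whose negation does not fit in `i64`. A minimal sketch that handles all three, using a hypothetical `resolve_absolute_sketch` and `Option` in place of the crate's error type:

```rust
use std::convert::TryFrom;

/// Hypothetical helper: turn a signed offset into a valid index into a buffer
/// of `buf_len` bytes. Negative offsets count back from the end.
pub fn resolve_absolute_sketch(offset: i64, buf_len: usize) -> Option<usize> {
    let pos = if offset >= 0 {
        usize::try_from(offset).ok()?
    } else {
        // unsigned_abs is well-defined for i64::MIN, unlike `-offset` or `abs()`.
        let back = usize::try_from(offset.unsigned_abs()).ok()?;
        buf_len.checked_sub(back)? // None if the offset reaches before the start
    };
    // The resolved position must address an existing byte.
    if pos < buf_len {
        Some(pos)
    } else {
        None
    }
}
```

For a four-byte buffer, an offset of `-1` resolves to index 3, while `-5` and `4` are both rejected.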
FromEnd Offset#
- Syntax: Negative values from file end
- Status: ✅ Fully implemented
- Implementation: Uses same logic as negative Absolute offsets
- Strength Score: +8 (reliable)
Indirect Offset#
- Syntax: `(offset.type)+adjustment` - reads pointer value at base offset, dereferences to calculate final offset
- Status: ✅ Fully implemented (parsing + evaluation)
- Implementation: resolve_indirect_offset
- Pointer Types: Supports `.b/.B` (byte), `.s/.S` (short), `.l/.L` (long), `.q/.Q` (quad) with endianness variants
- Adjustment Operators: Inside parens (canonical: `(base.type+N)`, `-N`, `*N`, `/N`, `%N`, `&N`, `|N`, `^N`) and after paren (legacy: `(base.type)+N`, `-N` only)
- Anchor-Relative: Supports `base_relative` (`(&N.X)` form) and `result_relative` (`&(N.X)` wrapper) variants against GNU file previous-match anchor
- Tracking: Issue #37, PR #42
- Use Cases: PE executables, compound documents, archive formats with header pointers
- Strength Score: +5 (depends on pointer dereference)
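An indirect offset is resolved in two steps: read a pointer at the base offset, then apply the adjustment and bounds-check the result. The sketch below covers only a little-endian 32-bit pointer with an additive adjustment, roughly the shape of a `(base.l+N)` rule. The function name and the fixed pointer type are assumptions made to keep the example short.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: dereference a little-endian u32 pointer stored at `base`
/// and add `adjustment`. Returns `None` on any out-of-bounds or overflow condition.
pub fn resolve_indirect_sketch(buf: &[u8], base: usize, adjustment: i64) -> Option<usize> {
    // Step 1: read the pointer itself, with bounds checking.
    let end = base.checked_add(4)?;
    let s = buf.get(base..end)?;
    let pointer = u32::from_le_bytes([s[0], s[1], s[2], s[3]]);

    // Step 2: apply the adjustment in signed arithmetic, then validate the target.
    let target = i64::from(pointer).checked_add(adjustment)?;
    let target = usize::try_from(target).ok()?; // rejects negative results
    if target < buf.len() {
        Some(target)
    } else {
        None
    }
}
```

This mirrors the PE use case mentioned above, where a header field holds the file position of a later structure.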
Relative Offset#
- Syntax: `&+N` or `&-N` - positions relative to end of previous match
- Status: ✅ Implemented
- Implementation: resolve_relative_offset
- Version: v0.5.0 (PR #211)
- Use Cases: Sequential field parsing in nested rule structures
- Strength Score: +3 (depends on previous match)
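Relative offsets add a signed delta to an anchor that the evaluator tracks: the end position of the previous match. A minimal sketch of that arithmetic, assuming the anchor is passed in explicitly (`resolve_relative_sketch` and its parameters are hypothetical):

```rust
use std::convert::TryFrom;

/// Hypothetical helper: resolve `&+N` / `&-N` against the end of the previous match.
pub fn resolve_relative_sketch(prev_match_end: usize, delta: i64, buf_len: usize) -> Option<usize> {
    let anchor = i64::try_from(prev_match_end).ok()?;
    let target = anchor.checked_add(delta)?; // overflow-safe
    let target = usize::try_from(target).ok()?; // rejects positions before the start
    if target < buf_len {
        Some(target)
    } else {
        None
    }
}
```

With a previous match ending at byte 4, `&+2` resolves to 6 and `&-4` resolves to 0.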
Version Roadmap and Planned Features#
v0.1.x (MVP Complete)#
Status: Released and published on crates.io
Type Coverage: 30 of ~33 types (91%)
- ✅ Byte
- ✅ Short (signed/unsigned, all endianness variants)
- ✅ Long (signed/unsigned, all endianness variants)
- ✅ Quad (signed/unsigned, all endianness variants)
- ✅ Float (all endianness variants)
- ✅ Double (all endianness variants)
- ✅ Date (32-bit Unix timestamp, all endianness variants, UTC/local time)
- ✅ QDate (64-bit Unix timestamp, all endianness variants, UTC/local time)
- ✅ String (with optional max_length)
- ✅ String16 (UCS-2 with little/big endian variants)
- ✅ PString (Pascal string with optional max_length)
- ✅ Regex (pattern matching with `/c`, `/s`, `/l` flags)
- ✅ Search (bounded literal pattern search)
- ✅ Meta-types: `default`, `clear`, `name`, `use`, `indirect`, `offset` (control-flow directives and offset reporting)
Operator Coverage: 11 of 11 operators (100%)
- ✅ Equal (=, ==)
- ✅ NotEqual (!, !=, <>)
- ✅ LessThan (<)
- ✅ GreaterThan (>)
- ✅ LessEqual (<=)
- ✅ GreaterEqual (>=)
- ✅ BitwiseAnd (&)
- ✅ BitwiseAndMask (&mask=value)
- ✅ BitwiseXor (^)
- ✅ BitwiseNot (~)
- ✅ AnyValue (x)
Offset Coverage: 4 of 4 fully functional (100%)
- ✅ Absolute (positive and negative)
- ✅ FromEnd
- ✅ Indirect (parsing + evaluation)
- ✅ Relative
Metrics:
- 1,200+ tests, >94% line coverage
- 10 built-in format rules (ELF, PE, ZIP, TAR, GZIP, JPEG, PNG, GIF, BMP, PDF)
- GNU file compatibility tests passing (count requires refresh from latest CI; v1.0.0 target: 77/81 tests, 95%+)
v0.2.0 (Comparison Operators)#
Status: Released and published on crates.io
Focus: Comparison operators and byte signedness
Type Coverage: 7 of ~33 types (21%)
Type Changes:
- ✅ Byte type changed from unit variant to `Byte { signed: bool }` (breaking change)
- ✅ Quad type added with `Quad { endian: Endianness, signed: bool }`
New Operators:
- ✅ LessThan (<)
- ✅ GreaterThan (>)
- ✅ LessEqual (<=)
- ✅ GreaterEqual (>=)
Operator Coverage: 8 of 11 operators (73%)
v0.3.0 (Core Primitives)#
Status: Released
Focus: Offset resolution enhancements
Offset Enhancements:
- ✅ Indirect offset evaluation (pointer dereferencing)
- ✅ Relative offset evaluation (position tracking) - implemented in v0.5.0
Coverage:
- Types: 7 of ~33 (21%)
- Operators: 8 of 11 (73%)
- Offsets: 4 of 4 (100%)
v0.4.0 (Bitwise Operators and Advanced Features)#
Status: Released and published on crates.io
Focus: Bitwise operators and advanced type support
New Operators:
- ✅ BitwiseXor (^)
- ✅ BitwiseNot (~)
- ✅ AnyValue (x)
New Types:
- ✅ Float (32-bit IEEE 754) - PR #162
- ✅ Double (64-bit IEEE 754) - PR #162
- ✅ Regex (pattern matching) - PR #214
- ✅ Search (bounded literal search) - PR #214
Planned New Features:
- Additional date endianness variants (`medate`, `meldate`)
Coverage:
- Types: 30 of ~33 (91%)
- Operators: 11 of 11 (100%)
- Offsets: 4 of 4 (100%)
v0.5.0 (API and UX Polish)#
Status: Released
Focus: API improvements and developer experience
Implemented Features:
- ✅ Relative offset evaluation (PR #211) - position tracking against previous-match anchor
Planned Features:
- Builder pattern for `MagicDatabase` (Issue #45)
- JSON output metadata (Issue #46)
- Parse warnings for skipped rules (Issue #47)
- Improved error messages (Issue #49)
v1.0.0 (Production Ready - Planned)#
Focus: Full libmagic compatibility and stability
Target Metrics:
- 95%+ compatibility with GNU file measured by 81-file test corpus
- Stable API with semver guarantees
- Migration guide from C libmagic
- Performance parity validation
Expected Coverage:
- Types: ~31 of ~33 (94%)
- Operators: 11 of 11 (100%)
- Offsets: 4 of 4 (100%)
Compatibility Coverage Tracking#
Test Infrastructure#
The project maintains a dedicated compatibility test suite that validates output against the canonical libmagic implementation:
- Test Corpus: Uses test files from file/file repository as git submodule
- CI Integration: Automated testing on push to main/develop branches and daily at 2 AM UTC
- Validation: Compares output against `.result` files from GNU file
- Metrics: PASS/FAIL/ERROR status with detailed failure information
Current Compatibility Status (v0.4.0)#
By Category (from Issue #57):
- Binary formats: 0 of ~45 tests (requires system magic files)
- Text formats: 0 of ~15 tests
- Audio formats: 0 of ~5 tests
- ZIP subtypes: 0 of ~5 tests
- Custom magic tests: 0 of ~4 tests
- Filesystem images: 0 of ~3 tests
Overall: 0/81 tests passing (0%) - baseline for measuring progress
Gap Analysis: Most tests return "data" (no matching rule) due to reliance on built-in rules only; system magic file compatibility pending.
Planned Compatibility Milestones#
| Version | Compatibility Target | Key Enablers |
|---|---|---|
| v0.1.x | 0% (baseline) | Built-in rules only, 4 basic operators |
| v0.2.0 | 0% (baseline) | Added comparison operators, quad type |
| v0.3.0 | ~30% | Indirect offsets |
| v0.4.0 | ~40% | Bitwise XOR/NOT/any-value operators, float/double types |
| v0.5.0 | ~60% | Regex, date/time types, relative offsets, meta-types |
| v0.6.0 | ~80% | Additional date endianness variants, remaining edge cases |
| v1.0.0 | 95%+ | Full feature parity |
libmagic Type Coverage Reference#
Complete Type Inventory#
Based on the GNU libmagic specification, the following table shows all ~33 standard types and their implementation status:
| Type Category | Type Name | Size | Endianness | Status | Version |
|---|---|---|---|---|---|
| 8-bit Integer | byte | 1 byte | N/A | ✅ Implemented | v0.1.x |
| 16-bit Integer | short | 2 bytes | Native | ✅ Implemented | v0.1.x |
| | leshort | 2 bytes | Little | ✅ Implemented | v0.1.x |
| | beshort | 2 bytes | Big | ✅ Implemented | v0.1.x |
| 32-bit Integer | long | 4 bytes | Native | ✅ Implemented | v0.1.x |
| | lelong | 4 bytes | Little | ✅ Implemented | v0.1.x |
| | belong | 4 bytes | Big | ✅ Implemented | v0.1.x |
| 64-bit Integer | quad | 8 bytes | Native | ✅ Implemented | v0.2.0 |
| | lequad | 8 bytes | Little | ✅ Implemented | v0.2.0 |
| | bequad | 8 bytes | Big | ✅ Implemented | v0.2.0 |
| 32-bit Float | float | 4 bytes | Native | ✅ Implemented | v0.4.0 |
| | lefloat | 4 bytes | Little | ✅ Implemented | v0.4.0 |
| | befloat | 4 bytes | Big | ✅ Implemented | v0.4.0 |
| 64-bit Float | double | 8 bytes | Native | ✅ Implemented | v0.4.0 |
| | ledouble | 8 bytes | Little | ✅ Implemented | v0.4.0 |
| | bedouble | 8 bytes | Big | ✅ Implemented | v0.4.0 |
| String | string | Variable | N/A | ✅ Implemented | v0.1.x |
| | pstring | Variable | N/A | ✅ Implemented | v0.5.0+ |
| | lestring16 | Variable | Little | ✅ Implemented | v0.5.0+ |
| | bestring16 | Variable | Big | ✅ Implemented | v0.5.0+ |
| Date (32-bit) | date | 4 bytes | Native UTC | ✅ Implemented | v0.5.0+ |
| | ldate | 4 bytes | Native Local | ✅ Implemented | v0.5.0+ |
| | bedate | 4 bytes | Big UTC | ✅ Implemented | v0.5.0+ |
| | beldate | 4 bytes | Big Local | ✅ Implemented | v0.5.0+ |
| | ledate | 4 bytes | Little UTC | ✅ Implemented | v0.5.0+ |
| | leldate | 4 bytes | Little Local | ✅ Implemented | v0.5.0+ |
| | medate | 4 bytes | Middle UTC | 📋 Planned | Future |
| | meldate | 4 bytes | Middle Local | 📋 Planned | Future |
| Date (64-bit) | qdate | 8 bytes | Native UTC | ✅ Implemented | v0.5.0+ |
| | qldate | 8 bytes | Native Local | ✅ Implemented | v0.5.0+ |
| | beqdate | 8 bytes | Big UTC | ✅ Implemented | v0.5.0+ |
| | beqldate | 8 bytes | Big Local | ✅ Implemented | v0.5.0+ |
| | leqdate | 8 bytes | Little UTC | ✅ Implemented | v0.5.0+ |
| | leqldate | 8 bytes | Little Local | ✅ Implemented | v0.5.0+ |
| Pattern | regex | Variable | N/A | ✅ Implemented | v0.5.0+ |
| | search | Variable | N/A | ✅ Implemented | v0.5.0+ |
| Meta | default | N/A | N/A | ✅ Implemented | v0.5.0+ |
| | clear | N/A | N/A | ✅ Implemented | v0.5.0+ |
| | name | N/A | N/A | ✅ Implemented | v0.5.0+ |
| | use | N/A | N/A | ✅ Implemented | v0.5.0+ |
| | indirect | N/A | N/A | ✅ Implemented | v0.5.0+ |
| | offset | N/A | N/A | ✅ Implemented | v0.5.0+ |
Current Coverage: 30 of 33 types (91%)
v0.3.0 Status: 7 of 33 types (21%)
v0.4.0 Status: 7 of 33 types (21%)
v0.5.0+ Status: 30 of 33 types (91%)
v1.0.0 Target: ~31 of 33 types (94%)
Complete Operator Inventory#
| Operator | Symbol | Description | Status | Version |
|---|---|---|---|---|
| Equality | =, == | Exact equality | ✅ Implemented | v0.1.x |
| | !, !=, <> | Inequality | ✅ Implemented | v0.1.x |
| Comparison | > | Greater than | ✅ Implemented | v0.2.0 |
| | < | Less than | ✅ Implemented | v0.2.0 |
| | >= | Greater or equal | ✅ Implemented | v0.2.0 |
| | <= | Less or equal | ✅ Implemented | v0.2.0 |
| Bitwise | & | Bitwise AND (test) | ✅ Implemented | v0.1.x |
| | &mask=value | Bitwise AND with mask | ✅ Implemented | v0.1.x |
| | ^ | Bitwise XOR | ✅ Implemented | v0.4.0 |
| | ~ | Bitwise NOT | ✅ Implemented | v0.4.0 |
| Special | x | Any value (always match) | ✅ Implemented | v0.4.0 |
Current Coverage: 11 of 11 operators (100%)
v0.4.0 Status: 11 of 11 operators (100%)
v0.5.0 Target: 11 of 11 operators (100%)
v1.0.0 Target: 11 of 11 operators (100%)
Security and Safety Features#
The type system and operator implementation include comprehensive security guarantees:
- Strict Bounds Checking: Prevents buffer overruns during type reading
- Integer Overflow Protection: Safe arithmetic in offset calculations and value conversions
- No Uninitialized Memory: All reads validated against buffer boundaries
- SIMD-Accelerated String Scanning: Uses memchr crate for fast, safe null terminator detection
- UTF-8 Validation: Invalid sequences replaced with safe replacement character
- Cross-Type Safety: Explicit type handling prevents undefined behavior in operator evaluation
- Timestamp Formatting: Uses chrono crate with safe O(1) civil-date conversion algorithm avoiding overflow/hang on extreme timestamp values
All implementations avoid unsafe code where possible, relying on Rust's type system and bounds checking for memory safety.
Rule Message Formatting#
Magic rule messages support printf-style format specifiers (%d, %i, %u, %x, %X, %o, %s, %c) that are substituted with the rule's read value at evaluation time via src/output/format.rs::format_magic_message. Width, padding, and length modifiers are supported. Hex specifiers mask to TypeKind::bit_width() so signed byte -1 renders as ff, not ffffffffffffffff. Alt-form (#) follows C printf: %#06x + 0xab → 0x00ab. Literal % requires escaping as %%.
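The two hex behaviors described above, masking to the type's bit width and C-style alt-form padding, can be reproduced with Rust's standard formatter. The sketch below is illustrative only: `format_hex_sketch` and `format_alt_hex_sketch` are hypothetical names, and the bit width is passed in directly instead of coming from `TypeKind::bit_width()`.

```rust
/// Render a signed value as lowercase hex, keeping only the low `bit_width` bits.
pub fn format_hex_sketch(value: i64, bit_width: u32) -> String {
    // A shift by 64 would overflow, so the full-width case is handled separately.
    let mask = if bit_width >= 64 {
        u64::MAX
    } else {
        (1u64 << bit_width) - 1
    };
    format!("{:x}", (value as u64) & mask)
}

/// Equivalent of C's `%#06x`: `#` adds the `0x` prefix and `06` zero-pads
/// to six characters in total, prefix included.
pub fn format_alt_hex_sketch(value: u64) -> String {
    format!("{:#06x}", value)
}
```

Here `format_hex_sketch(-1, 8)` gives `ff`, matching the signed-byte example above, and `format_alt_hex_sketch(0xab)` gives `0x00ab`.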
Related Topics#
- Magic File Format: Specification and syntax for magic rule definitions
- Rule Evaluation Engine: Runtime system for matching files against magic rules
- Strength Calculation: Confidence scoring algorithm for rule matches
- AST Data Structures: Abstract syntax tree representation of magic rules
- Parser Implementation: Parsing logic for magic file syntax
- Built-in Rules: Default format detection rules compiled into the binary
Relevant Code Files#
| File Path | Description | Key Components |
|---|---|---|
| src/parser/ast.rs | AST data structures | TypeKind enum, Operator enum, OffsetSpec enum, Endianness enum |
| src/evaluator/types.rs | Type reading implementation | read_byte, read_short, read_long, read_quad, read_float, read_double, read_date, read_qdate, read_string, read_pstring, read_typed_value |
| src/evaluator/types/string.rs | String type implementation | read_string, read_pstring |
| src/evaluator/types/date.rs | Date/timestamp implementation | read_date, read_qdate, format_unix_timestamp_64, local_utc_offset_secs |
| src/evaluator/operators.rs | Operator evaluation | apply_equal, apply_not_equal, apply_bitwise_and, apply_operator |
| src/evaluator/offset.rs | Offset resolution | resolve_offset, resolve_absolute_offset, OffsetError |
| src/evaluator/strength.rs | Strength scoring | Operator strength scores, offset strength scores |
| src/parser/grammar.rs | Parser implementation | parse_operator, parse_type, parse_offset |
| ROADMAP.md | Version planning | Milestone definitions, feature tracking |
| docs/src/compatibility.md | Compatibility tracking | Type/operator status matrices, platform support |
| docs/MAGIC_FORMAT.md | Format documentation | Type specifications, operator syntax, examples |
This article reflects the state of libmagic-rs as of version 0.5.0+. For the latest implementation status, refer to the project roadmap and GitHub issues.