Magic File Compatibility Status
Type: Topic
Status: Published
Created: Feb 28, 2026
Updated: Apr 25, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

Magic File Compatibility Status#

libmagic-rs is a pure-Rust clean-room implementation of libmagic, the C library that powers the Unix file command for identifying file types. As of version 0.4.0 (released March 2026), the project is in early development with fundamental file identification capabilities operational. The implementation currently achieves 0% compatibility (0/81 tests passing) against the third-party GNU file test corpus, but follows a structured milestone-based roadmap targeting 95%+ compatibility with GNU file by version 1.0.0.

The implementation emphasizes memory safety with zero unsafe code, thread-safe design, and modern Rust error handling patterns. Development is organized through five strategic epics that address operator completeness, type system expansion, offset resolution mechanisms, specification compliance, and final compatibility validation. The project uses a multi-stage parser architecture built on nom combinators and implements hierarchical rule evaluation with graceful error handling.

This article provides comprehensive tracking of implemented versus unsupported features, magic file directive support status, version milestones spanning v0.1.0 through v1.0.0, and the detailed enhancement roadmap specific to the libmagic-rs implementation.

Current Implementation Status (v0.4.x)#

Implemented Features#

The v0.4.x releases provide foundational file identification capabilities across data types, operators, offsets, nested rules, and string matching. While limited in scope compared to GNU file, these features establish the architectural patterns for future expansion.

Data Types#

The evaluator currently supports fourteen basic data types with endianness variants:

  • Byte: Single byte values (8-bit signed or unsigned)
  • Short: 16-bit integers with native, little-endian, big-endian support (signed and unsigned)
  • Long: 32-bit integers with endianness variants (signed and unsigned)
  • Quad: 64-bit integers with endianness variants (signed and unsigned)
  • Float: IEEE-754 32-bit floating point with endianness variants (float/befloat/lefloat). Implementation in src/evaluator/types/float.rs (PR #162 / v0.3.0) with comparison tolerance for approximate matching.
  • Double: IEEE-754 64-bit floating point with endianness variants (double/bedouble/ledouble). Implementation in src/evaluator/types/float.rs (PR #162 / v0.3.0) with comparison tolerance for approximate matching.
  • Date: 32-bit Unix timestamps (date/ldate/bedate/beldate/ledate/leldate) with endianness variants and UTC/local time formatting. Implementation in src/evaluator/types/date.rs uses chrono crate for timestamp formatting with format string "%a %b %e %H:%M:%S %Y" matching GNU file output.
  • QDate: 64-bit Unix timestamps (qdate/qldate/beqdate/beqldate/leqdate/leqldate) with endianness variants and UTC/local time formatting. Shares formatting implementation with Date type for consistent output.
  • String: Null-terminated or length-limited strings with UTF-8 conversion using SIMD-accelerated null scanning
  • String16: UCS-2 (16-bit Unicode) strings with explicit byte order. Backs the magic(5) lestring16 (little-endian) and bestring16 (big-endian) keywords. Each character occupies two bytes; the reader stops at a U+0000 terminator (encoded as 0x00 0x00) or at the buffer end. Implementation in src/evaluator/types/string.rs decodes code units to Rust String with surrogate pairs replaced by U+FFFD.
  • PString: Pascal-style length-prefixed strings (pstring) with 1/2/4-byte length prefixes plus /J self-inclusive-length flag support. Implementation in src/evaluator/types/string.rs supports combinable flags (e.g., pstring/HJ for 2-byte length prefix with self-inclusive semantics) and provides bounds checking for both the length field and string data.
  • Regex: POSIX-extended regular expression matching using regex::bytes::Regex for binary-safe pattern matching. Implementation in src/evaluator/types/regex.rs supports case-insensitive (/c), start-offset (/s), and line-based (/l) flags with scan window capped at 8192 bytes matching GNU file behavior.
  • Search: Bounded literal byte pattern search within a mandatory range using memchr::memmem::find. Implementation in src/evaluator/types/search.rs scans forward from the rule offset up to the specified range for the first occurrence of the literal pattern.
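
As an illustration of the PString entry above, the following is a minimal sketch of a bounds-checked length-prefixed read. It is not the code in src/evaluator/types/string.rs: the `LengthPrefix` enum and `read_pstring` function are hypothetical names, and the flag letters in the comments follow magic(5) (`/B`, `/H`, `/h`, `/L`, `/l`, `/J`).

```rust
/// Width and byte order of a pstring length prefix (hypothetical type).
#[derive(Clone, Copy, Debug)]
pub enum LengthPrefix {
    OneByte,     // /B, the default
    TwoBytesBe,  // /H
    TwoBytesLe,  // /h
    FourBytesBe, // /L
    FourBytesLe, // /l
}

/// Reads a length-prefixed string at `offset`.
/// Returns the payload bytes and the total bytes consumed (prefix + payload),
/// or `None` if the prefix or the payload falls outside `buf`.
pub fn read_pstring(
    buf: &[u8],
    offset: usize,
    prefix: LengthPrefix,
    self_inclusive: bool, // /J: the stored length counts the prefix itself
) -> Option<(&[u8], usize)> {
    let width = match prefix {
        LengthPrefix::OneByte => 1,
        LengthPrefix::TwoBytesBe | LengthPrefix::TwoBytesLe => 2,
        LengthPrefix::FourBytesBe | LengthPrefix::FourBytesLe => 4,
    };
    // Bounds check for the length field.
    let data_start = offset.checked_add(width)?;
    let raw = buf.get(offset..data_start)?;
    let declared = match prefix {
        LengthPrefix::OneByte => raw[0] as usize,
        LengthPrefix::TwoBytesBe => u16::from_be_bytes([raw[0], raw[1]]) as usize,
        LengthPrefix::TwoBytesLe => u16::from_le_bytes([raw[0], raw[1]]) as usize,
        LengthPrefix::FourBytesBe => {
            u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize
        }
        LengthPrefix::FourBytesLe => {
            u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize
        }
    };
    // With /J the declared length includes the prefix bytes.
    let len = if self_inclusive {
        declared.checked_sub(width)?
    } else {
        declared
    };
    // Bounds check for the string data.
    let data_end = data_start.checked_add(len)?;
    let payload = buf.get(data_start..data_end)?;
    Some((payload, width + len))
}
```

With the bytes `00 05 61 62 63 64`, a `pstring/HJ` read under these assumptions yields the three-byte payload `abc` and consumes five bytes, while the same read without `/J` fails the bounds check.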

Operators#

Eighteen operators are fully implemented in evaluation:

  • Equal (=, ==): Equality comparison with cross-type integer coercion
  • NotEqual (!, !=, <>): Inequality comparison. Accepts bare ! (magic(5) canonical form), !=, and <> as aliases.
  • LessThan (<): Less-than comparison with cross-type integer coercion
  • GreaterThan (>): Greater-than comparison with cross-type integer coercion
  • LessEqual (<=): Less-than-or-equal comparison with cross-type integer coercion
  • GreaterEqual (>=): Greater-than-or-equal comparison with cross-type integer coercion
  • BitwiseAnd (&): Bitwise AND pattern matching (returns true if result is non-zero)
  • BitwiseAndMask (&0xMASK): Applies mask to value before equality comparison
  • BitwiseXor (^): Bitwise XOR pattern matching (returns true if result is non-zero)
  • BitwiseNot (~): Applies bitwise complement to value before equality comparison
  • AnyValue (x): Unconditional match that always returns true

The comparison operators provide version checks and range matching capabilities. The bitwise XOR and NOT operators enable advanced binary pattern matching, and AnyValue provides unconditional matches.
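
The operator semantics listed above can be sketched as a single dispatch function. This is an illustrative sketch, not the project's src/evaluator/operators.rs: the `Operator` enum shape, the `apply` function, and the `bits` parameter (used here to keep the complement within the type's width) are assumptions. Values are widened to `i128` so that signed and unsigned operands of different sizes compare without wrapping.

```rust
/// Operators from the list above (variant names are illustrative).
#[derive(Clone, Copy, Debug)]
pub enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
    BitwiseAndMask(u64),
    BitwiseXor,
    BitwiseNot,
    AnyValue,
}

/// Mask selecting the low `bits` bits (clamped to 64).
fn width_mask(bits: u32) -> i128 {
    (1i128 << bits.min(64)) - 1
}

/// Applies `op` to a value read from the file (`actual`) and the rule's
/// comparison value (`expected`), both already widened to i128.
pub fn apply(op: Operator, actual: i128, expected: i128, bits: u32) -> bool {
    match op {
        Operator::Equal => actual == expected,
        Operator::NotEqual => actual != expected,
        Operator::LessThan => actual < expected,
        Operator::GreaterThan => actual > expected,
        Operator::LessEqual => actual <= expected,
        Operator::GreaterEqual => actual >= expected,
        // True when the AND / XOR result is non-zero.
        Operator::BitwiseAnd => (actual & expected) != 0,
        Operator::BitwiseXor => (actual ^ expected) != 0,
        // Mask first, then compare for equality.
        Operator::BitwiseAndMask(mask) => (actual & i128::from(mask)) == expected,
        // Complement first (restricted to the type width), then compare.
        Operator::BitwiseNot => {
            (!actual & width_mask(bits)) == (expected & width_mask(bits))
        }
        Operator::AnyValue => true,
    }
}
```

Under these assumptions, `apply(Operator::GreaterThan, i128::from(255u8), i128::from(-1i32), 32)` is true, which is the kind of cross-type comparison that a narrower intermediate type could get wrong.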

Offsets#

Four offset types are fully operational, including complete indirect and relative support:

  • Absolute offsets: Positive offsets from file start; negative offsets treated as offsets from file end
  • FromEnd offsets: Explicit offsets from file end with comprehensive bounds checking
  • Relative offsets: Resolve as last_match_end + delta against the previous-match anchor, following GNU file/libmagic semantics. The evaluator threads the anchor through EvaluationContext, advancing it after each successful match by the bytes consumed (variable-width types include c-string NUL terminators and pstring length prefixes). Top-level relative offsets resolve from anchor 0. Fully evaluated and implemented in PR #211.
  • Indirect offsets: Pointer dereferencing where a value read from one offset specifies the test location. Critical for PE executables and Office documents. Fully evaluated and implemented in PR #42.
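
A minimal sketch of how the four offset kinds above could resolve to a buffer position. The `Offset` enum and `resolve` function are hypothetical, the indirect variant is simplified to a little-endian 32-bit pointer, and `FromEnd` is assumed to carry a non-positive delta; the real evaluator's types and pointer widths may differ.

```rust
/// Simplified offset kinds (hypothetical shapes for illustration).
pub enum Offset {
    /// Non-negative: from file start. Negative: from file end.
    Absolute(i64),
    /// Delta from file end (assumed to be zero or negative).
    FromEnd(i64),
    /// Delta from the end of the previous match.
    Relative(i64),
    /// Position of a little-endian u32 whose value is the test location.
    Indirect { pointer_at: usize },
}

/// Adds a signed delta to a base position without wrapping.
fn signed_add(base: usize, delta: i64) -> Option<usize> {
    if delta >= 0 {
        base.checked_add(usize::try_from(delta).ok()?)
    } else {
        base.checked_sub(usize::try_from(delta.unsigned_abs()).ok()?)
    }
}

/// Resolves `offset` to a position inside `buf`, or `None` if out of bounds.
pub fn resolve(offset: &Offset, buf: &[u8], last_match_end: usize) -> Option<usize> {
    let pos = match *offset {
        Offset::Absolute(n) if n >= 0 => usize::try_from(n).ok()?,
        Offset::Absolute(n) | Offset::FromEnd(n) => signed_add(buf.len(), n)?,
        Offset::Relative(delta) => signed_add(last_match_end, delta)?,
        Offset::Indirect { pointer_at } => {
            let end = pointer_at.checked_add(4)?;
            let raw = buf.get(pointer_at..end)?;
            usize::try_from(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]])).ok()?
        }
    };
    (pos < buf.len()).then_some(pos)
}
```

Every branch returns `None` rather than panicking when a position cannot be resolved, which keeps a single bad offset from aborting evaluation of the remaining rules.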

Nested Rules#

The evaluator implements hierarchical rule evaluation: child rules are evaluated only if their parent rule matches.
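
The parent-gates-children behavior can be sketched with a small recursive walk. The `Rule` struct below is a deliberately reduced, hypothetical shape (a fixed byte pattern at an absolute offset); the actual rule type carries offsets, types, and operators as described elsewhere in this article.

```rust
/// Reduced rule shape for illustration (hypothetical).
pub struct Rule {
    pub offset: usize,
    pub expected: Vec<u8>,
    pub message: String,
    pub children: Vec<Rule>,
}

/// Evaluates `rules` against `buf`, appending the message of every match.
/// Children are visited only when their parent matched.
pub fn evaluate(rules: &[Rule], buf: &[u8], out: &mut Vec<String>) {
    for rule in rules {
        let end = rule.offset.saturating_add(rule.expected.len());
        // An out-of-range read is a non-match, not an error.
        let matched = buf.get(rule.offset..end) == Some(rule.expected.as_slice());
        if matched {
            out.push(rule.message.clone());
            evaluate(&rule.children, buf, out);
        }
    }
}
```

For a buffer starting with `\x7fELF\x02`, a parent rule on the ELF magic with a child testing byte 4 for the value 2 would report both messages; for any other buffer the child is never read.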

String Matching#

String evaluation implements:

  • Null-termination detection: Reads until the first NUL byte or buffer end
  • Length constraints: Respects optional max_length parameters
  • UTF-8 conversion: Uses String::from_utf8_lossy to replace invalid sequences with replacement characters
  • SIMD optimization: Leverages the memchr crate for efficient null byte scanning
  • Byte-exact comparison: For rules with comparison values (e.g., 0 string PATTERN), reads exactly the pattern length from the file with no NUL truncation, matching magic(5) semantics where embedded NULs in patterns must match corresponding bytes in the file
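
The two read modes above (NUL-terminated read versus byte-exact pattern comparison) can be sketched with the standard library alone. The function names are hypothetical, and `Iterator::position` stands in for the memchr crate's accelerated scan so that the example has no external dependencies.

```rust
/// Reads a string starting at `offset`, stopping at the first NUL byte,
/// at `max_length` bytes, or at the end of the buffer, whichever comes first.
/// Invalid UTF-8 is replaced with U+FFFD.
pub fn read_c_string(buf: &[u8], offset: usize, max_length: Option<usize>) -> Option<String> {
    let tail = buf.get(offset..)?;
    let window = match max_length {
        Some(max) => &tail[..tail.len().min(max)],
        None => tail,
    };
    // Stand-in for a memchr-based scan.
    let end = window.iter().position(|&b| b == 0).unwrap_or(window.len());
    Some(String::from_utf8_lossy(&window[..end]).into_owned())
}

/// Byte-exact comparison: reads exactly `pattern.len()` bytes with no NUL
/// truncation, so NUL bytes embedded in the pattern must match the file.
pub fn matches_pattern(buf: &[u8], offset: usize, pattern: &[u8]) -> bool {
    match offset.checked_add(pattern.len()) {
        Some(end) => buf.get(offset..end) == Some(pattern),
        None => false,
    }
}
```

Keeping the two paths separate matters for patterns such as `ab\0c`: a NUL-terminated read would stop after `ab` and could never match, whereas the byte-exact path compares all four bytes.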

Parsed But Not Yet Evaluated#

This section previously tracked indirect offsets and relative offsets. Both offset types are fully implemented and evaluated in the evaluator/offset/ submodule:

  • Indirect offsets: Working implementation in src/evaluator/offset/indirect.rs (PR #42)
  • Relative offsets: Working implementation in src/evaluator/offset/relative.rs (PR #211)

Directive Support Status#

Magic file directives control rule behavior and output formatting. The current implementation fully supports strength modification and the meta-type directives for rule composition, but lacks support for MIME type specification, file extension hints, and Apple-specific metadata.

Fully Implemented: !:strength#

The parser fully supports the !:strength directive for modifying rule confidence scores. Supported operations include:

  • Addition: !:strength +N
  • Subtraction: !:strength -N
  • Multiplication: !:strength *N
  • Division: !:strength /N
  • Absolute set: !:strength =N or !:strength N
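
The five operations above can be sketched as a tiny parser plus an apply step. This is an assumption-laden illustration: `StrengthOp`, `parse_strength`, and `apply_strength` are hypothetical names, the score is modeled as an `i32`, and the saturating arithmetic and divide-by-zero fallback are choices made for the sketch rather than documented project behavior.

```rust
/// A parsed `!:strength` modifier (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum StrengthOp {
    Add(i32),
    Sub(i32),
    Mul(i32),
    Div(i32),
    Set(i32),
}

/// Parses lines such as `!:strength +10` or `!:strength 50`.
/// Returns `None` for anything that is not a well-formed strength directive.
pub fn parse_strength(line: &str) -> Option<StrengthOp> {
    let rest = line.strip_prefix("!:strength")?.trim();
    let (op, digits) = match rest.chars().next()? {
        // The operator is a single ASCII character, so byte index 1 is safe.
        c @ ('+' | '-' | '*' | '/' | '=') => (c, rest[1..].trim()),
        // A bare number means "set to N".
        _ => ('=', rest),
    };
    let n: i32 = digits.parse().ok()?;
    Some(match op {
        '+' => StrengthOp::Add(n),
        '-' => StrengthOp::Sub(n),
        '*' => StrengthOp::Mul(n),
        '/' => StrengthOp::Div(n),
        _ => StrengthOp::Set(n),
    })
}

/// Applies a modifier to a base score without overflowing or panicking.
pub fn apply_strength(base: i32, op: &StrengthOp) -> i32 {
    match *op {
        StrengthOp::Add(n) => base.saturating_add(n),
        StrengthOp::Sub(n) => base.saturating_sub(n),
        StrengthOp::Mul(n) => base.saturating_mul(n),
        // Division by zero (or overflow) leaves the score unchanged here.
        StrengthOp::Div(n) => base.checked_div(n).unwrap_or(base),
        StrengthOp::Set(n) => n,
    }
}
```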

Implemented: Meta-Type Directives#

The parser and evaluator fully support six meta-type directives for control-flow and rule composition, implemented in PR #42 with evaluator dispatch and printf-style format substitution (%d, %u, %x, %X, %o, %s, %c) in message rendering:

  • name: Named test definitions for rule composition
  • use: References to named subroutines with endian-flip support
  • default: Conditional execution when no sibling at the same level matched
  • clear: Resets the default flag for subsequent siblings
  • indirect: Re-applies root rules at the resolved offset
  • offset: Reports the file position as a value (for message substitution)

Hex specifiers mask to the type's natural bit width, avoiding sign-extended renderings.
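
To illustrate the hex masking described above, here is a minimal sketch of rendering a signed value as hex at its natural width. The `render_hex` helper is a hypothetical name; it assumes values are carried as `i64` internally and that the rule's type width in bits is known at render time.

```rust
/// Renders `value` as hexadecimal after masking it to `width_bits` bits,
/// so a negative 8-bit value prints as two hex digits instead of sixteen.
pub fn render_hex(value: i64, width_bits: u32, upper: bool) -> String {
    let mask: u64 = if width_bits >= 64 {
        u64::MAX
    } else {
        (1u64 << width_bits) - 1
    };
    // Reinterpret the two's-complement bits, then keep only the low bits.
    let masked = (value as u64) & mask;
    if upper {
        format!("{masked:X}")
    } else {
        format!("{masked:x}")
    }
}
```

Without the mask, formatting `-1i64` with `{:x}` yields `ffffffffffffffff`; with an 8-bit mask the same value renders as `ff`.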

Not Yet Implemented#

The following directives are parsed and silently skipped at preprocessing time. Only !:strength is parsed and evaluated among !: directives:

  • !:mime: MIME type specification for structured output (planned for v0.6.0 Directive extension point)
  • !:ext: File extension suggestions (planned for v0.6.0 Directive extension point)
  • !:apple: Apple-specific metadata annotations (planned for v0.6.0 Directive extension point)

These directives parse cleanly (so magic files using them can load) but are not evaluated. They are silently dropped during preprocessing rather than causing errors, allowing system magic databases like /usr/share/file/magic/filesystems to load end-to-end.
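
A minimal sketch of the skip-at-preprocessing behavior described above, assuming a line-oriented pass over the magic file text. The `preprocess` function is hypothetical and far simpler than a real preprocessing stage; it only shows the idea of dropping unsupported `!:` lines while keeping `!:strength`.

```rust
/// Returns the lines of `source` that should reach the rule parser.
/// Unsupported `!:` directives are dropped silently; `!:strength` is kept.
pub fn preprocess(source: &str) -> Vec<&str> {
    source
        .lines()
        .filter(|line| {
            let trimmed = line.trim_start();
            !trimmed.starts_with("!:") || trimmed.starts_with("!:strength")
        })
        .collect()
}
```

Because the filter never returns an error, a database that uses `!:mime`, `!:ext`, or `!:apple` still loads; the cost is that the information those lines carry is unavailable to the output stage.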

Feature Gaps and Development Epics#

Development is organized into five epics that systematically address compatibility gaps. Each epic targets specific version milestones and has defined success criteria.

Epic #53: Operator Completeness (v0.2.0 + v0.4.0)#

Status: Complete. All 18 required operators are implemented.

Implemented operators:

Epic #54: Type System Expansion (v0.2.0 + v0.3.0)#

Status: 14 of 33+ types implemented. The type system expansion is split across two releases to manage code complexity.

Version 0.2.0 targets:

Version 0.3.0 targets:

Epic #55: Offset Resolution (v0.3.0)#

Status: Complete. Both relative and indirect offset mechanisms are fully implemented.

  • Relative offset resolution tracked in Issue #38 - Implemented in PR #211. Resolves Relative(delta) as last_match_end + delta with bounds checking. Follows GNU file semantics where the anchor is global-monotonic across child recursion (no save/restore). Magic-file parser syntax (&+N/&-N) remains TODO.
  • Indirect offset resolution tracked in Issue #37 - Implemented in PR #42. Pointer dereferencing where a value read from one offset specifies the test location, critical for PE executables and Office documents.
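
The "global-monotonic" anchor behavior noted above can be sketched as follows. The `Context` and `Rule` types are hypothetical stand-ins for EvaluationContext and the real rule type, every rule here is treated as a relative-offset byte-pattern test with a non-negative delta, and the point is only that the anchor advanced by a child is not restored when control returns to the parent's siblings.

```rust
/// Carries the previous-match anchor through evaluation (hypothetical).
pub struct Context {
    pub last_match_end: usize,
}

/// A relative-offset rule matching a literal byte pattern (hypothetical).
pub struct Rule {
    pub delta: usize,
    pub pattern: Vec<u8>,
    pub children: Vec<Rule>,
}

/// Evaluates rules, recording the start position of each match in `hits`.
pub fn evaluate(rules: &[Rule], buf: &[u8], ctx: &mut Context, hits: &mut Vec<usize>) {
    for rule in rules {
        // Relative(delta) resolves as last_match_end + delta.
        let start = ctx.last_match_end.saturating_add(rule.delta);
        let end = start.saturating_add(rule.pattern.len());
        if buf.get(start..end) == Some(rule.pattern.as_slice()) {
            hits.push(start);
            // Advance the anchor by the bytes consumed.
            ctx.last_match_end = end;
            // No save/restore: children leave the anchor wherever they ended.
            evaluate(&rule.children, buf, ctx, hits);
        }
    }
}
```

With the buffer `ABCDEF`, a rule matching `AB` whose child matches `CD`, followed by a sibling rule matching `EF` at delta 0, produces hits at 0, 2, and 4 under this model: the sibling resolves from the anchor left by the child, not from the anchor at the time the parent matched.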

Submodule files will be pre-created as placeholders (offset_indirect.rs, offset_relative.rs) to prevent the evaluator module from becoming oversized.

Epic #56: Core Flow Spec Compliance (v0.5.0)#

Status: 7 of 12 flows complete. This epic ensures the library and CLI behave exactly as documented in the Core Flows specification. Version 0.5.0 has been released, and work on the v0.5.x series is in progress.

Completed flows:

  • Flow 1 (CLI Single File), Flow 2 (CLI Multiple Files), Flow 4 (Library Simple Usage), Flow 6 (Public Evaluation APIs), Flow 7 (Error Communication), Flow 9 (Hierarchical Matching), Flow 10 (Stdin Input)

Open gaps:

Epic #57: Compatibility Validation & v1.0 (v1.0.0)#

Status: 0/81 tests passing (0%). This epic validates compatibility against the GNU file test corpus and serves as the final gate for production release. All prerequisites from epics #53-#56 must complete before validation can begin.

Current test corpus results:

Compatibility targets by format category:

| Category | Tests | Target Pass Rate | Current |
| --- | --- | --- | --- |
| Binary formats (RPM, zstd, PGP, etc.) | ~45 | 43+ (96%) | 0 |
| Text formats (JSON, scripts, PNM) | ~15 | 14+ (93%) | 0 |
| Audio formats (MP3, DSD) | ~5 | 5 (100%) | 0 |
| ZIP subtypes (DOCX, XLSX, HWP) | ~5 | 4+ (80%) | 0 |
| Custom magic tests | ~4 | 4 (100%) | 0 |
| Filesystem images | ~3 | 3 (100%) | 0 |

Version Milestones and Release History#

The project follows semantic versioning with incremental feature releases building toward full GNU file compatibility at version 1.0.0.

v0.1.0 - Released February 15, 2026#

The first public release established baseline functionality with these components:

  • CLI tool (rmagic) supporting file and stdin input
  • Built-in magic rules for 10 common file formats: ELF, PE, ZIP, TAR, GZIP, JPEG, PNG, GIF, BMP, and PDF
  • Text magic file parser with basic data types and operators
  • Core library API including MagicDatabase, evaluate_file(), and evaluate_buffer()
  • Dual output formats: JSON and human-readable text
  • Comprehensive test coverage at 94.32%

v0.1.1 - Released February 15, 2026#

An API-compatible maintenance release addressing miscellaneous tasks and regenerating the changelog to fix duplicate entries. No breaking changes or new features.

v0.2.0 - Released March 1, 2026#

Focus: Operator completeness and comparison capabilities. This release unlocks significant compatibility gains through enhanced numeric comparison capabilities.

Implemented features:

v0.3.0 - Planned#

Focus: Offset resolution and type system expansion. This release adds support for indirect offsets, text file detection, and advanced numeric types.

Completed features:

Planned features:

  • ZIP content inspection enhancements

Mandatory prerequisite refactoring:

v0.4.0 - Released March 6, 2026#

Focus: Bitwise operator completion. This release completes the operator implementation roadmap with advanced bitwise operations.

Implemented features:

Compatibility impact:

  • Increased operator coverage to 100% (18 of 18 operators implemented)
  • Enhanced support for magic files using bitwise operations
  • Breaking change: Added new Operator enum variants (requires handling in exhaustive matches)

v0.5.0 - Released#

Focus: Core flow specification compliance. This release aims to make all documented behaviors work exactly as specified. Work on the v0.5.x series is still in progress.

Implementation goals:

  • Complete all 12 Core Flows as specified with no deviations
  • Builder pattern API for advanced configuration
  • Enhanced JSON output with metadata object
  • Improved error messages with actionable troubleshooting suggestions
  • Timeout handling that returns partial results instead of hard errors

v1.0.0 - Planned#

Focus: Production-ready compatibility validation. The 1.0.0 release marks the project as production-ready with validated GNU file compatibility.

Release criteria:

Required dependencies:

Enhancement Roadmap and Development Strategy#

The project follows a carefully sequenced development approach that prioritizes prerequisite work and code quality maintenance alongside feature implementation.

Sequential Development Phases#

The roadmap emphasizes mandatory prerequisite completion ordering to prevent technical debt:

  1. v0.1.0: Baseline release with core functionality - Completed February 2026
  2. v0.2.0: Comparison operators - Completed March 2026
  3. v0.4.0: Bitwise operators - Completed March 2026
  4. v0.3.0: Refactoring tasks (#61, #62, #63) must complete before implementing offset resolution (#37, #38) and additional type features
  5. v0.5.0: All prior feature phases must complete before beginning specification compliance validation
  6. v1.0.0: All epics (#53-#56) must complete before final compatibility validation and production release

Critical Path Features#

These features have the highest impact on overall compatibility and unblock the most test cases:

  1. Comparison operators via PR #104 - Released in v0.2.0. Enable version checks and range matching for numeric values.
  2. Relative offset resolution via PR #211 - Implemented. Resolves offsets relative to previous match locations, essential for nested structures.
  3. Regex and search types via PR #214 - Implemented. Binary-safe regex matching with /c, /s, /l flags and bounded literal search, enabling text file detection (JSON, XML, scripts).
  4. Meta-type directives via PR #42 - Implemented. Control-flow directives (default, clear, name, use, indirect, offset) for rule composition, conditional execution, and subroutines. Required for advanced magic database patterns.
  5. Indirect offsets via PR #42 - Implemented. Pointer dereferencing where a value read from one offset specifies the test location, required for detecting PE executables, Office documents, and formats using internal pointers.

Code Complexity Management#

The project enforces specific code complexity thresholds that trigger mandatory refactoring before feature additions:

  • src/evaluator/operators.rs: Must not exceed 1,620 lines
  • src/evaluator/types.rs: Must not exceed 3,500-4,000 lines
  • src/main.rs: Target maximum of 600 lines

These thresholds prevent individual files from becoming unmaintainable as features expand and encourage proper modularization.

Implementation Architecture#

The libmagic-rs codebase is organized into distinct modules handling parsing, evaluation, and rule management. Key implementation files and their current status:

| File | Purpose | Implementation Status |
| --- | --- | --- |
| src/parser/grammar.rs | Magic file parsing using nom combinators | 2,448 lines; implements parsing for basic types, operators, offsets, and the !:strength directive |
| src/parser/ast.rs | Abstract syntax tree definitions | Defines all offset types including indirect/relative; includes TODO notes for validation method additions |
| src/parser/hierarchy.rs | Stack-based hierarchy construction | Converts flat rule lists parsed from magic files into parent-child tree structures |
| src/evaluator/mod.rs | Core rule evaluation engine | 474 lines; implements hierarchical rule evaluation with graceful error handling and timeout checking |
| src/evaluator/offset.rs | Offset resolution logic | Implements absolute and FromEnd offsets; delegates to indirect.rs (PR #42) and relative.rs (PR #211) submodules for full offset evaluation |
| src/evaluator/types.rs | Type reading and coercion | Implements byte, short, long, quad, string with SIMD optimizations; target for directory module conversion |
| src/evaluator/operators.rs | Operator evaluation | Implements =, !=, <, >, <=, >=, &, &mask, ^, ~, x with cross-type integer coercion using i128 intermediate type |
| docs/src/compatibility.md | Compatibility tracking document | Comprehensive comparison of libmagic versus libmagic-rs features with implementation status |
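
The stack-based hierarchy construction attributed to src/parser/hierarchy.rs can be sketched as follows. This is an illustration under assumptions, not the project's code: `Node`, `build_hierarchy`, and `close_top` are hypothetical names, and each flat rule is reduced to a nesting level (the number of leading `>` characters in magic file syntax) plus its text.

```rust
/// A rule with its nested children (hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct Node {
    pub text: String,
    pub children: Vec<Node>,
}

/// Pops the top of the stack and attaches it to its parent,
/// or to `roots` when no parent remains open.
fn close_top(stack: &mut Vec<(usize, Node)>, roots: &mut Vec<Node>) {
    if let Some((_, node)) = stack.pop() {
        match stack.last_mut() {
            Some((_, parent)) => parent.children.push(node),
            None => roots.push(node),
        }
    }
}

/// Converts a flat list of `(level, text)` pairs into a tree.
pub fn build_hierarchy(flat: &[(usize, &str)]) -> Vec<Node> {
    let mut roots = Vec::new();
    // The stack holds the chain of currently open ancestors.
    let mut stack: Vec<(usize, Node)> = Vec::new();
    for &(level, text) in flat {
        // A rule at level N closes every open rule at level N or deeper.
        while stack.last().map_or(false, |(open_level, _)| *open_level >= level) {
            close_top(&mut stack, &mut roots);
        }
        stack.push((level, Node { text: text.to_string(), children: Vec::new() }));
    }
    // Close whatever is still open at end of input.
    while !stack.is_empty() {
        close_top(&mut stack, &mut roots);
    }
    roots
}
```

Each rule is pushed and popped exactly once, so construction is linear in the number of rules, and child order follows the order in the source file.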

Architectural Differences from Original libmagic#

libmagic-rs diverges from the original C implementation in several fundamental design decisions that improve safety and maintainability at the cost of current feature parity.

Memory Safety#

Error Handling#

Thread Safety#

Performance Characteristics#

Preliminary benchmarks show competitive or superior performance despite early development status:

  • Single ELF file identification: 1.17× faster than GNU file
  • Batch processing (1000 small files): 1.09× faster
  • Large file processing (1GB): 1.07× faster
  • Magic file database loading: 1.5× faster
  • Base memory footprint: ~1.5MB versus ~2MB for GNU file
  • Large file memory usage: ~2MB versus ~16MB (benefits from memory-mapped I/O)

Related Topics#

  • Magic File Format Specification: The text-based format used to define file identification rules with offsets, types, operators, and messages
  • GNU file Command: The reference implementation that libmagic-rs aims to replace while maintaining compatibility
  • File Type Detection: Techniques for identifying file formats through magic numbers, headers, and structural patterns
  • Rust Systems Programming: Memory-safe alternatives to traditional C system utilities
  • nom Parser Combinators: The Rust parser combinator library used to implement the magic file grammar parser