Enum Extension And Exhaustive Match Synchronization
Type: Topic
Status: Published
Created: Mar 1, 2026
Updated: Mar 8, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

Enum Extension And Exhaustive Match Synchronization#

Enum Extension And Exhaustive Match Synchronization is a project-specific architectural pattern in libmagic-rs that ensures when core enums (Operator, TypeKind, Value) are extended with new variants, all exhaustive pattern matches across 7+ files remain synchronized. The pattern leverages Rust's compile-time exhaustiveness checking to prevent runtime failures and maintain consistency across the parser, evaluator, code generator, and test suite. Enum extensions may also require trait derivation changes (e.g., removing Eq when adding IEEE 754 float types).

The pattern exists because libmagic-rs uses a three-layer architecture where AST definitions live in src/parser/ast.rs, parser grammar uses nom combinators in src/parser/grammar.rs, and evaluator dispatch functions reside in src/evaluator/*.rs. Each layer requires explicit handling of all enum variants. When extending the Operator enum to add comparison operators like <, >, <=, >=, developers must update pattern matches in parser token ordering, operator evaluation, strength scoring, build-time serialization (in two locations), and property test strategies. Missing any of these updates triggers compiler errors due to non-exhaustive matches, but understanding which files need updates and in what order requires project-specific knowledge.

The pattern is particularly critical due to duplicate serialize_* functions in both build.rs and src/build_helpers.rs. These functions exist because Rust build scripts cannot import from the crate being built. The functions generate Rust code for built-in magic rules and must be kept manually synchronized. Additionally, parser token ordering rules require longer operators to be tried before shorter prefixes (e.g., <= before <), adding ordering constraints beyond simple variant additions.

Core Enums#

Operator Enum#

The Operator enum defines comparison and bitwise operators for magic rule tests. Current variants:

  • Equal: Equality comparison (= or ==)
  • NotEqual: Inequality comparison (!= or <>)
  • LessThan: Less-than comparison (<)
  • GreaterThan: Greater-than comparison (>)
  • LessEqual: Less-than-or-equal comparison (<=)
  • GreaterEqual: Greater-than-or-equal comparison (>=)
  • BitwiseAnd: Bitwise AND operation (&)
  • BitwiseAndMask(u64): Bitwise AND with mask value (&0xFF)

The enum derives Debug, Clone, Serialize, Deserialize, PartialEq, Eq for use across parsing, evaluation, and serialization contexts.
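
As a rough sketch, the variant list above corresponds to a definition along the following lines. The Serialize and Deserialize derives are left out so the snippet compiles without serde, and the doc comments paraphrase the list rather than quote src/parser/ast.rs.

```rust
/// Sketch of the operator enum described above.
/// Assumption: serde derives are omitted to keep this dependency-free.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Operator {
    /// `=` or `==`
    Equal,
    /// `!=` or `<>`
    NotEqual,
    /// `<`
    LessThan,
    /// `>`
    GreaterThan,
    /// `<=`
    LessEqual,
    /// `>=`
    GreaterEqual,
    /// `&`
    BitwiseAnd,
    /// `&` followed by a mask value, e.g. `&0xFF`
    BitwiseAndMask(u64),
}
```

BitwiseAndMask is the only variant that carries data, which is why match arms and test strategies treat it separately from the unit variants.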

TypeKind Enum#

The TypeKind enum defines data types for interpreting bytes from file buffers. Current variants:

  • Byte { signed: bool }: Single 8-bit byte with explicit signedness
  • Short { endian: Endianness, signed: bool }: 16-bit integer with endianness and signedness options
  • Long { endian: Endianness, signed: bool }: 32-bit integer with endianness and signedness options
  • Quad { endian: Endianness, signed: bool }: 64-bit integer with endianness and signedness options
  • String { max_length: Option<usize> }: Variable-length string with optional maximum length
  • Float { endian: Endianness }: 32-bit IEEE 754 floating-point with endianness (no signed field)
  • Double { endian: Endianness }: 64-bit IEEE 754 double-precision floating-point with endianness (no signed field)
  • Date { endian: Endianness, utc: bool }: 32-bit Unix timestamp (seconds since epoch) with endianness and timezone options
  • QDate { endian: Endianness, utc: bool }: 64-bit Unix timestamp (seconds since epoch) with endianness and timezone options

The Endianness enum provides Little, Big, and Native byte order options for multi-byte types.
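
The variants listed in this section can be sketched as follows. The derives are assumptions, and bit_width_sketch is a hypothetical helper (not the project's TypeKind::bit_size(), whose signature is not shown here) included only to show what an exhaustive match over TypeKind looks like, using the widths stated in the list.

```rust
/// Byte order options for multi-byte types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

/// Sketch of the type enum; derives are an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum TypeKind {
    Byte { signed: bool },
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    Quad { endian: Endianness, signed: bool },
    Float { endian: Endianness },
    Double { endian: Endianness },
    Date { endian: Endianness, utc: bool },
    QDate { endian: Endianness, utc: bool },
    String { max_length: Option<usize> },
}

impl TypeKind {
    /// Hypothetical helper: width in bits, or None for variable-length strings.
    /// There is no wildcard arm, so a new variant fails to compile until handled.
    pub fn bit_width_sketch(&self) -> Option<u32> {
        match self {
            TypeKind::Byte { .. } => Some(8),
            TypeKind::Short { .. } => Some(16),
            TypeKind::Long { .. } | TypeKind::Float { .. } | TypeKind::Date { .. } => Some(32),
            TypeKind::Quad { .. } | TypeKind::Double { .. } | TypeKind::QDate { .. } => Some(64),
            TypeKind::String { .. } => None,
        }
    }
}
```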

Exhaustive Match Locations#

When adding a new variant to Operator or TypeKind, exhaustive match statements must be updated in the seven locations below (the build-time serialization step spans two files):

1. AST Definition (src/parser/ast.rs)#

Define the enum variant with documentation. Include rustdoc comments explaining the variant's semantics.

2. Parser Grammar (src/parser/grammar.rs)#

Update parse_operator() or the equivalent parser functions to recognize the new syntax. For operators, follow the token ordering rule: longer tokens before shorter prefixes. The function uses sequential if let checks rather than nom's alt() combinator to keep precedence explicit.

When adding <= and >=, they must be parsed before < and > to prevent premature matching. The implementation uses manual lookahead with .starts_with() to reject invalid sequences like === or &&. This token ordering requirement was implemented in PR #104 for comparison operator support (released in v0.2.0).
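
The ordering rule can be illustrated with a dependency-free sketch. This is not the project's parse_operator(): it uses str::strip_prefix instead of nom, the name parse_operator_sketch and the reduced Operator enum (without BitwiseAndMask) are assumptions, and only the ordering and lookahead ideas are taken from the description above.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
}

/// Returns the parsed operator and the remaining input, or None if no operator matches.
fn parse_operator_sketch(input: &str) -> Option<(Operator, &str)> {
    // Two-character tokens are tried first so that `<=` is never read as `<` plus `=`.
    if let Some(rest) = input.strip_prefix("==") {
        // Manual lookahead: `===` is not a valid operator.
        if rest.starts_with('=') {
            return None;
        }
        return Some((Operator::Equal, rest));
    }
    if let Some(rest) = input.strip_prefix("!=") {
        return Some((Operator::NotEqual, rest));
    }
    if let Some(rest) = input.strip_prefix("<>") {
        return Some((Operator::NotEqual, rest));
    }
    if let Some(rest) = input.strip_prefix("<=") {
        return Some((Operator::LessEqual, rest));
    }
    if let Some(rest) = input.strip_prefix(">=") {
        return Some((Operator::GreaterEqual, rest));
    }
    // Single-character tokens come last.
    if let Some(rest) = input.strip_prefix('=') {
        return Some((Operator::Equal, rest));
    }
    if let Some(rest) = input.strip_prefix('<') {
        return Some((Operator::LessThan, rest));
    }
    if let Some(rest) = input.strip_prefix('>') {
        return Some((Operator::GreaterThan, rest));
    }
    if let Some(rest) = input.strip_prefix('&') {
        // Manual lookahead: `&&` is rejected rather than parsed as `&` twice.
        if rest.starts_with('&') {
            return None;
        }
        return Some((Operator::BitwiseAnd, rest));
    }
    None
}
```

If the single-character checks were moved above the two-character ones, an input such as `<=5` would return LessThan with `=5` left over, which is the premature match the ordering rule prevents.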

3. Operator Evaluation (src/evaluator/operators.rs)#

Update the apply_operator() function to dispatch the new variant to a handler function:

pub fn apply_operator(operator: &Operator, left: &Value, right: &Value) -> bool {
    match operator {
        Operator::Equal => apply_equal(left, right),
        Operator::NotEqual => apply_not_equal(left, right),
        Operator::LessThan => apply_less_than(left, right),
        Operator::GreaterThan => apply_greater_than(left, right),
        Operator::LessEqual => apply_less_equal(left, right),
        Operator::GreaterEqual => apply_greater_equal(left, right),
        Operator::BitwiseAnd => apply_bitwise_and(left, right),
        Operator::BitwiseAndMask(mask) => { /* inline logic */ },
    }
}

The test suite includes test_apply_operator_all_combinations with a second exhaustive match that verifies consistency between apply_operator() and individual handler functions.

4. Type Reading (src/evaluator/types.rs)#

Update the read_typed_value() function to dispatch TypeKind variants to specialized reading functions:

pub fn read_typed_value(
    buffer: &[u8],
    offset: usize,
    type_kind: &TypeKind,
) -> Result<Value, TypeReadError> {
    match type_kind {
        TypeKind::Byte { signed } => read_byte(buffer, offset, *signed),
        TypeKind::Short { endian, signed } => read_short(buffer, offset, *endian, *signed),
        TypeKind::Long { endian, signed } => read_long(buffer, offset, *endian, *signed),
        TypeKind::Quad { endian, signed } => read_quad(buffer, offset, *endian, *signed),
        TypeKind::Float { endian } => read_float(buffer, offset, *endian),
        TypeKind::Double { endian } => read_double(buffer, offset, *endian),
        TypeKind::Date { endian, utc } => read_date(buffer, offset, *endian, *utc),
        TypeKind::QDate { endian, utc } => read_qdate(buffer, offset, *endian, *utc),
        TypeKind::String { max_length } => read_string(buffer, offset, *max_length),
    }
}

Multi-byte types (Short, Long, Quad, Float, Double) include nested exhaustive matches on Endianness using the byteorder crate for Little, Big, and Native variants. IEEE 754 floating-point types require special handling:

  • Epsilon-aware equality: Float comparisons use |a - b| <= f64::EPSILON rather than exact equality
  • Partial ordering: Comparison operators use partial_cmp to handle NaN cases (returns None for NaN operands)
  • Special values: Explicit handling for NaN (never equal to anything, including itself) and infinity (only equal to same-sign infinity)
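
The three float rules above can be sketched with standard f64 methods. The function names float_eq_sketch and float_cmp_sketch are hypothetical; the real handlers operate on Value rather than bare f64, so this only illustrates the comparison logic.

```rust
use std::cmp::Ordering;

/// Epsilon-aware equality with explicit NaN and infinity handling.
fn float_eq_sketch(a: f64, b: f64) -> bool {
    // NaN is never equal to anything, including itself.
    if a.is_nan() || b.is_nan() {
        return false;
    }
    // Infinities are equal only to the same-sign infinity. Handling them here
    // also avoids computing inf - inf, which would yield NaN.
    if a.is_infinite() || b.is_infinite() {
        return a == b;
    }
    (a - b).abs() <= f64::EPSILON
}

/// Ordering via partial_cmp: None when either operand is NaN.
fn float_cmp_sketch(a: f64, b: f64) -> Option<Ordering> {
    a.partial_cmp(&b)
}
```

A caller implementing < or >= would treat a None result as "no match" rather than picking an arbitrary ordering.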

5. Strength Scoring (src/evaluator/strength.rs)#

Update operator scoring and TypeKind scoring:

// Operator strength scores
strength += match &rule.op {
    Operator::Equal => 10, // Most specific
    Operator::NotEqual => 5,
    Operator::LessThan
    | Operator::GreaterThan
    | Operator::LessEqual
    | Operator::GreaterEqual => 6, // Moderately specific
    Operator::BitwiseAndMask(_) => 7, // Moderately specific
    Operator::BitwiseAnd => 3, // Least specific
};

// TypeKind strength scores
strength += match &rule.typ {
    TypeKind::String { max_length } => {
        if max_length.is_some() { 25 } else { 20 }
    }
    TypeKind::Quad { .. } | TypeKind::Double { .. } | TypeKind::QDate { .. } => 16,
    TypeKind::Long { .. } | TypeKind::Float { .. } | TypeKind::Date { .. } => 15,
    TypeKind::Short { .. } => 10,
    TypeKind::Byte { .. } => 5,
};

Strength scores prioritize more specific operators and types for accurate file type detection. Floating-point types (Float, Double) are assigned the same scores as their integer counterparts (Long, Quad) based on bit width. Date and QDate types follow the same scoring pattern based on their 32-bit and 64-bit sizes.

6. Build-Time Serialization (build.rs AND src/build_helpers.rs)#

Critical: Both files contain duplicate serialize_operator() and serialize_type_kind() functions that must be updated identically:

fn serialize_operator(op: &Operator) -> String {
    match op {
        Operator::Equal => "Operator::Equal".to_string(),
        Operator::NotEqual => "Operator::NotEqual".to_string(),
        Operator::LessThan => "Operator::LessThan".to_string(),
        Operator::GreaterThan => "Operator::GreaterThan".to_string(),
        Operator::LessEqual => "Operator::LessEqual".to_string(),
        Operator::GreaterEqual => "Operator::GreaterEqual".to_string(),
        Operator::BitwiseAnd => "Operator::BitwiseAnd".to_string(),
        Operator::BitwiseAndMask(mask) => format!("Operator::BitwiseAndMask({mask})"),
    }
}

The duplication exists because Rust build scripts cannot import from the crate being built. The build_helpers module is conditionally compiled with #[cfg(any(test, doc))] to enable comprehensive testing of build process logic. Tests exist only in build_helpers.rs, not in build.rs.
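
Only serialize_operator() is shown above. A hedged sketch of the matching TypeKind serializer follows, using a reduced stand-in enum with one variant per field shape. The exact strings the project emits are an assumption; the point is that each arm must produce a valid Rust expression for the generated rules, and that the same arm has to exist in both files.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
    Native,
}

// Reduced stand-in for TypeKind (assumption: the real enum has more variants).
#[derive(Debug, Clone, PartialEq)]
enum TypeKind {
    Byte { signed: bool },
    Long { endian: Endianness, signed: bool },
    Float { endian: Endianness },
    Date { endian: Endianness, utc: bool },
    String { max_length: Option<usize> },
}

/// Hypothetical serializer: emits Rust source text that reconstructs `kind`.
fn serialize_type_kind_sketch(kind: &TypeKind) -> String {
    match kind {
        TypeKind::Byte { signed } => format!("TypeKind::Byte {{ signed: {signed} }}"),
        TypeKind::Long { endian, signed } => {
            format!("TypeKind::Long {{ endian: Endianness::{endian:?}, signed: {signed} }}")
        }
        TypeKind::Float { endian } => {
            format!("TypeKind::Float {{ endian: Endianness::{endian:?} }}")
        }
        TypeKind::Date { endian, utc } => {
            format!("TypeKind::Date {{ endian: Endianness::{endian:?}, utc: {utc} }}")
        }
        // Debug output of Option<usize> ("Some(16)" / "None") is itself valid Rust.
        TypeKind::String { max_length } => {
            format!("TypeKind::String {{ max_length: {max_length:?} }}")
        }
    }
}
```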

7. Property Test Strategies (tests/property_tests.rs)#

Update proptest generators to exhaustively cover all enum variants:

fn arb_operator() -> impl Strategy<Value = Operator> {
    prop_oneof![
        Just(Operator::Equal),
        Just(Operator::NotEqual),
        Just(Operator::LessThan),
        Just(Operator::GreaterThan),
        Just(Operator::LessEqual),
        Just(Operator::GreaterEqual),
        Just(Operator::BitwiseAnd),
        (0u64..=255u64).prop_map(Operator::BitwiseAndMask),
    ]
}

The arb_type_kind() strategy generates all TypeKind variants with combinations of endianness and signedness options. PR #133 updated this strategy to include Quad variants and all three Endianness options (Little, Big, and Native).

Critical Synchronization Challenges#

Duplicate Serialization Functions#

The most error-prone aspect is maintaining identical implementations in build.rs and src/build_helpers.rs. When adding a new operator or type variant, both files must receive identical updates, or generated built-in rules will be malformed. The build_helpers module comment explicitly documents this architectural constraint.

Parser Token Ordering#

When adding operators to the grammar, follow the rule: longer tokens must be parsed before shorter prefixes. The parse_operator() implementation parses <= and >= before < and > to prevent premature matching. The implementation uses sequential if let checks with manual lookahead rather than nom's alt() combinator for explicit precedence control.

Implemented ordering for comparison operators in src/parser/grammar.rs:

  1. <= and >= (two-character operators)
  2. < and > (single-character operators)

Test coverage validates this precedence behavior.

Proptest Strategy Completeness#

Property test strategies must exhaustively generate all enum variants for comprehensive fuzzing. The current strategies have coverage gaps that should be addressed when extending enums.
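
One way to make coverage gaps visible is a small guard that pairs a sample list with an exhaustive match. The sketch below is dependency-free and hypothetical (variant_index, all_operator_samples, and samples_cover_all_variants are not project functions, and it does not use proptest); it only shows the idea that a list of generated variants can be checked against a match the compiler keeps complete.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
    BitwiseAndMask(u64),
}

const VARIANT_COUNT: usize = 8;

/// Exhaustive match with no wildcard: adding a variant breaks compilation here.
fn variant_index(op: &Operator) -> usize {
    match op {
        Operator::Equal => 0,
        Operator::NotEqual => 1,
        Operator::LessThan => 2,
        Operator::GreaterThan => 3,
        Operator::LessEqual => 4,
        Operator::GreaterEqual => 5,
        Operator::BitwiseAnd => 6,
        Operator::BitwiseAndMask(_) => 7,
    }
}

/// One representative per variant, mirroring what a generator should cover.
fn all_operator_samples() -> Vec<Operator> {
    vec![
        Operator::Equal,
        Operator::NotEqual,
        Operator::LessThan,
        Operator::GreaterThan,
        Operator::LessEqual,
        Operator::GreaterEqual,
        Operator::BitwiseAnd,
        Operator::BitwiseAndMask(0xFF),
    ]
}

/// True only if every variant index is produced by at least one sample.
fn samples_cover_all_variants() -> bool {
    let mut seen = [false; VARIANT_COUNT];
    for op in all_operator_samples() {
        seen[variant_index(&op)] = true;
    }
    seen.iter().all(|&hit| hit)
}
```

Unlike a match, a prop_oneof! list compiles even when a variant is missing, so a check of this kind turns a silent gap into a failing test.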

Usage and Extension Workflow#

Follow this sequence when adding a new enum variant:

  1. Define the enum variant in src/parser/ast.rs with documentation
  2. Update parser grammar in src/parser/grammar.rs, respecting token ordering rules for operators
  3. Implement evaluator logic in appropriate modules (operators.rs for Operator, types.rs for TypeKind)
  4. Update strength scoring in src/evaluator/strength.rs with appropriate score assignments
  5. Duplicate in build files - Update both build.rs AND src/build_helpers.rs identically
  6. Extend property test strategies in tests/property_tests.rs for exhaustive coverage
  7. Run exhaustive match compiler checks - Rust's compiler will flag all non-exhaustive matches
  8. Verify serialization round-trips - Test that built-in rules remain valid after changes
  9. Add comprehensive unit tests for the new variant in relevant module test sections

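Non-exhaustive matches are hard compile errors raised by rustc itself (error E0004), so the compiler guides developers to the remaining update locations. This only works for matches written without a wildcard (_) arm, since a wildcard silently absorbs new variants. The project's zero-warnings Clippy policy is a separate, additional check.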

Historical Examples#

StrengthModifier Enum Addition (PR #30)#

PR #30 added the StrengthModifier enum with variants Add, Subtract, Multiply, Divide, and Set. Files modified:

  • build.rs - Import statements updated
  • src/build_helpers.rs - Import statements and duplicate serialization functions added
  • Test files - All MagicRule constructors updated to include strength_modifier: None

Issues encountered:

Comparison Operators (PR #104, Released in v0.2.0)#

PR #104 implemented comparison operators (<, >, <=, >=) as requested in Issue #34. Released in v0.2.0, this PR serves as a real-world example of the enum extension process described in this document. Changes made:

  1. Added 4 new Operator enum variants (LessThan, GreaterThan, LessEqual, GreaterEqual) to src/parser/ast.rs
  2. Updated parser in src/parser/grammar.rs with correct token ordering (<= before <, >= before >)
  3. Implemented comparison logic in src/evaluator/operators.rs with compare_values() helper function supporting cross-type integer coercion via i128 and lexicographic string comparison
  4. Updated strength scoring in src/evaluator/strength.rs (comparison operators scored at 6, moderately specific)
  5. Synchronized updates to both build.rs and src/build_helpers.rs serialization functions
  6. Extended proptest strategies in tests/property_tests.rs to cover all comparison operator variants

The PR also changed TypeKind::Byte from a unit variant to Byte { signed: bool } for explicit signedness, which required cascading updates across all exhaustive matches in the codebase. As part of this change, the read_byte function gained a third parameter, signed.
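
The cross-type coercion mentioned for compare_values() can be sketched as below. The Value enum here is a reduced stand-in and compare_values_sketch is a hypothetical name; the only ideas taken from the description are widening mixed signed/unsigned integers to i128 and comparing strings lexicographically.

```rust
use std::cmp::Ordering;

// Reduced stand-in for the project's Value enum (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Uint(u64),
    Int(i64),
    String(String),
}

/// Returns None when the two values have no meaningful ordering.
fn compare_values_sketch(left: &Value, right: &Value) -> Option<Ordering> {
    match (left, right) {
        (Value::Uint(a), Value::Uint(b)) => Some(a.cmp(b)),
        (Value::Int(a), Value::Int(b)) => Some(a.cmp(b)),
        // i128 holds every u64 and every i64, so the widening is lossless
        // and u64::MAX correctly compares greater than -1.
        (Value::Uint(a), Value::Int(b)) => Some(i128::from(*a).cmp(&i128::from(*b))),
        (Value::Int(a), Value::Uint(b)) => Some(i128::from(*a).cmp(&i128::from(*b))),
        // String's Ord is lexicographic by bytes.
        (Value::String(a), Value::String(b)) => Some(a.cmp(b)),
        // Mixed string/integer pairs are not comparable in this sketch.
        _ => None,
    }
}
```

Casting the i64 side to u64 instead would make -1 compare as the largest value, which is the mistake the i128 widening avoids.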

Quad Type Implementation (PR #133)#

PR #133 added 64-bit integer support through the TypeKind::Quad variant, demonstrating the exact enum extension workflow described in this document. All seven exhaustive match locations were updated:

  1. AST definition (src/parser/ast.rs): Added Quad { endian, signed } variant with documentation and examples
  2. Parser grammar (src/parser/grammar.rs): Implemented parsing for quad, uquad, lequad, ulequad, bequad, and ubequad type names with support for full 64-bit mask values in attached operators
  3. Type reading (src/evaluator/types.rs): Implemented read_quad() function with endianness handling and integrated into read_typed_value() dispatch
  4. Strength scoring (src/evaluator/strength.rs): Assigned Quad types a score of 16, higher than Long (15) to reflect greater specificity
  5. Build-time serialization: Updated serialize_type_kind() identically in both build.rs and src/build_helpers.rs to handle Quad variants
  6. Property test strategies (tests/property_tests.rs): Extended arb_type_kind() to generate Quad variants with all endianness options
  7. Value coercion (src/evaluator/types.rs): Added handling for unsigned values above i64::MAX when coercing to signed Quad types

The PR also enhanced parser numeric literal handling to support the full unsigned 64-bit range (0 to u64::MAX), required for magic rules matching values like 0xffffffffffffffff.
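
A 64-bit read with endianness and signedness handling might look like the sketch below. It uses the standard library's from_le_bytes family rather than the byteorder crate, and the types Value, ReadError, and the name read_quad_sketch are simplified assumptions rather than the project's definitions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, PartialEq)]
enum Value {
    Uint(u64),
    Int(i64),
}

#[derive(Debug, PartialEq)]
enum ReadError {
    OutOfBounds,
}

fn read_quad_sketch(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
    signed: bool,
) -> Result<Value, ReadError> {
    // checked_add guards against offset overflow; get() guards against buffer overrun.
    let end = offset.checked_add(8).ok_or(ReadError::OutOfBounds)?;
    let slice = buffer.get(offset..end).ok_or(ReadError::OutOfBounds)?;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(slice);

    // Nested exhaustive match on Endianness.
    let raw = match endian {
        Endianness::Little => u64::from_le_bytes(bytes),
        Endianness::Big => u64::from_be_bytes(bytes),
        Endianness::Native => u64::from_ne_bytes(bytes),
    };

    // For signed reads the bit pattern is reinterpreted, so 0xffffffffffffffff becomes -1.
    Ok(if signed {
        Value::Int(raw as i64)
    } else {
        Value::Uint(raw)
    })
}
```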

Float and Double Type Implementation (PR #162)#

PR #162 added IEEE 754 floating-point support through TypeKind::Float and TypeKind::Double variants and a new Value::Float(f64) variant, demonstrating the enum extension pattern across 17 changed files. This PR illustrates how enum extensions can cascade to affect trait derivations: the Value enum no longer derives Eq due to IEEE 754 NaN semantics. All seven exhaustive match locations were updated:

  1. AST definition (src/parser/ast.rs): Added Float { endian } and Double { endian } variants to TypeKind (no signed field, since IEEE 754 encodes the sign internally). Added Value::Float(f64) variant. Removed the Eq derivation from the Value enum because NaN is not equal to itself, which violates the reflexivity that Eq requires.
  2. Parser grammar (src/parser/types.rs): Implemented parsing for 6 new type keywords (float, befloat, lefloat, double, bedouble, ledouble) in parse_type_keyword and type_keyword_to_kind functions. Added parse_float_value grammar function with mandatory decimal point to distinguish float literals from integers.
  3. Type reading (src/evaluator/types/float.rs): Created new submodule with read_float() (4 bytes, widened to f64) and read_double() (8 bytes) functions. Both return Value::Float(f64) and include endianness dispatch and comprehensive unit tests covering buffer overrun, offset overflow, and all endianness variants.
  4. Strength scoring (src/evaluator/strength.rs): Assigned Float types a score of 15 (same as Long) and Double types a score of 16 (same as Quad) based on bit width. Updated value length bonus logic to handle Value::Float (no length bonus for numeric types).
  5. Build-time serialization (src/build_helpers.rs): Updated serialize_type_kind() to handle Float and Double variants. Added serialize_value() support for Value::Float. Tests verify correct serialization of all endianness combinations.
  6. Property test strategies (tests/property_tests.rs): Extended arb_type_kind() to generate Float and Double variants with all endianness options. Extended arb_value() to generate Value::Float with range -1e10..1e10.
  7. Operator evaluation: Updated apply_equal() and apply_not_equal() in src/evaluator/operators/equality.rs to use epsilon-aware equality (|a - b| <= f64::EPSILON) with explicit NaN and infinity handling. Updated compare_values() in src/evaluator/operators/comparison.rs to use partial_cmp for float ordering, returning None for NaN comparisons.

The PR also updated output modules (src/output/json.rs, src/output/mod.rs) and code generation (src/parser/codegen.rs) to handle Value::Float in exhaustive matches. Integration tests in tests/evaluator_tests.rs verify end-to-end float/double evaluation through evaluate_rules.
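
The "4 bytes, widened to f64" behavior described for read_float() can be sketched as follows. The return type is simplified to Option<f64> instead of the project's Result<Value, TypeReadError>, and read_float_sketch is a hypothetical name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
    Native,
}

/// Reads a 32-bit IEEE 754 float and widens it to f64.
/// Returns None on offset overflow or buffer overrun.
fn read_float_sketch(buffer: &[u8], offset: usize, endian: Endianness) -> Option<f64> {
    let end = offset.checked_add(4)?;
    let slice = buffer.get(offset..end)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(slice);

    let narrow = match endian {
        Endianness::Little => f32::from_le_bytes(bytes),
        Endianness::Big => f32::from_be_bytes(bytes),
        Endianness::Native => f32::from_ne_bytes(bytes),
    };

    // f32 -> f64 is lossless, so a single Value::Float(f64) can hold both widths.
    Some(f64::from(narrow))
}
```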

Date and QDate Type Implementation (PR #165, Released in v0.5.0)#

PR #165 added Unix timestamp support through TypeKind::Date (32-bit) and TypeKind::QDate (64-bit) variants, completing the date/timestamp feature requested in Issue #41. Released in v0.5.0, this PR demonstrates the enum extension workflow with special handling for timestamp formatting. All seven exhaustive match locations were updated:

  1. AST definition (src/parser/ast.rs): Added Date { endian, utc } and QDate { endian, utc } variants to TypeKind. Both include utc: bool field to control UTC vs local time interpretation. Updated TypeKind::bit_size() to return 32 for Date and 64 for QDate.
  2. Parser grammar (src/parser/types.rs): Implemented parsing for 12 new type keywords covering all endianness and timezone combinations (date, ldate, bedate, beldate, ledate, leldate, qdate, qldate, beqdate, beqldate, leqdate, leqldate) in parse_type_keyword and type_keyword_to_kind functions.
  3. Type reading (src/evaluator/types/date.rs): Created new submodule with read_date() (4 bytes) and read_qdate() (8 bytes) functions. Both return Value::String containing formatted timestamps matching GNU file output format. Used chrono crate for timestamp formatting with UTC and local time support.
  4. Strength scoring (src/evaluator/strength.rs): Assigned Date types a score of 15 (same as Long) and QDate types a score of 16 (same as Quad) based on bit width, maintaining consistency with integer types of the same size.
  5. Build-time serialization (src/parser/codegen.rs): Updated serialize_type_kind() to handle Date and QDate variants, serializing both endianness and utc fields.
  6. Property test strategies: Extended type generation to cover Date and QDate variants with all endianness and timezone options.
  7. Value coercion (src/evaluator/types/mod.rs): Added special handling to coerce numeric expected values (both Value::Uint and Value::Int) into formatted timestamp strings for date types, ensuring correct matching behavior.

Date types read integer timestamps from file buffers and format them as human-readable strings, requiring value coercion logic to convert numeric expected values from magic rules into formatted strings for comparison.
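
The twelve date keywords factor into an endianness prefix and a base name, which the following sketch maps onto TypeKind. It assumes the magic(5) convention that an l in the base name (ldate, qldate) means local time and that an unprefixed keyword uses native byte order; date_keyword_to_kind is a hypothetical function, not the project's type_keyword_to_kind.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
    Native,
}

// Reduced stand-in containing only the two date variants.
#[derive(Debug, Clone, PartialEq)]
enum TypeKind {
    Date { endian: Endianness, utc: bool },
    QDate { endian: Endianness, utc: bool },
}

fn date_keyword_to_kind(keyword: &str) -> Option<TypeKind> {
    // Split off the optional endianness prefix.
    // Assumption: no prefix means native byte order.
    let (endian, base) = if let Some(rest) = keyword.strip_prefix("be") {
        (Endianness::Big, rest)
    } else if let Some(rest) = keyword.strip_prefix("le") {
        (Endianness::Little, rest)
    } else {
        (Endianness::Native, keyword)
    };

    // Assumption: `l` marks local time, following the magic(5) naming convention.
    match base {
        "date" => Some(TypeKind::Date { endian, utc: true }),
        "ldate" => Some(TypeKind::Date { endian, utc: false }),
        "qdate" => Some(TypeKind::QDate { endian, utc: true }),
        "qldate" => Some(TypeKind::QDate { endian, utc: false }),
        _ => None,
    }
}
```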

Preemptive Modularization Strategy#

Issue #62 recommended creating focused submodules before implementing v0.2.0 features to prevent file oversizing as enums grow. The approach:

  • Create new submodules for unimplemented features with placeholder functions marked with #[ignore] tests and TODO comments
  • For comparison operators: create src/evaluator/comparison.rs with placeholder functions
  • This preemptive modularization keeps individual files under 600 lines and maintains clear separation of concerns

Relevant Code Files#

| File | Purpose | Key Functions/Matches |
| --- | --- | --- |
| src/parser/ast.rs | Enum definitions | Operator (106-117), TypeKind (80-104), Endianness (133-141) |
| src/parser/grammar.rs | Parser implementations | parse_operator() (165-220), token ordering logic |
| src/evaluator/operators.rs | Operator evaluation | apply_operator() (219-239), test match (1569-1619) |
| src/evaluator/types.rs | Type reading | read_typed_value() (350-361), endianness matches |
| src/evaluator/strength.rs | Strength calculation | Operator match (89-98), TypeKind match (72-86) |
| src/build_helpers.rs | Build helper functions | serialize_operator() (234-241), serialize_type_kind() (212-232) |
| build.rs | Build script | Duplicate serialization functions (270-290, 292-299) |
| tests/property_tests.rs | Proptest strategies | arb_operator() (58-65), arb_type_kind() (28-55) |

Related Topics#

  • Parser-Evaluator Architecture: The three-layer design requiring exhaustive matches across AST, parser, and evaluator
  • Type System And Operator Coverage: Current and planned type/operator variants
  • Magic File Compatibility Status: Feature gaps addressed by enum extensions
  • Rust Exhaustiveness Checking: Language-level guarantee that all enum variants are handled in match expressions
  • Build Script Limitations: Architectural constraint preventing code reuse between build.rs and library code