Enum Extension And Exhaustive Match Synchronization
Enum Extension And Exhaustive Match Synchronization is a project-specific architectural pattern in libmagic-rs. It keeps every exhaustive pattern match across seven or more files synchronized when the core enums (Operator, TypeKind, Value) gain new variants. The pattern relies on Rust's compile-time exhaustiveness checking to turn a forgotten case into a build error instead of a runtime failure, and it keeps the parser, evaluator, code generator, and test suite consistent with one another. Enum extensions can also force trait derivation changes: adding IEEE 754 floating-point values, for example, required removing Eq from the Value enum.
The pattern exists because libmagic-rs uses a three-layer architecture:
- AST definitions live in src/parser/ast.rs.
- The parser grammar is built from nom combinators in src/parser/grammar.rs.
- Evaluator dispatch functions reside in src/evaluator/*.rs.

Each layer must handle every enum variant explicitly. Adding the comparison operators <, >, <=, and >= to Operator, for example, required updating parser token ordering, operator evaluation, strength scoring, build-time serialization (in two locations), and property test strategies. The compiler reports every match statement that misses a variant as a non-exhaustive pattern error. Parser token ordering and proptest strategy lists are not match statements, however, so the compiler cannot flag them. Knowing which files need updates, and in what order, therefore still requires project-specific knowledge.
The pattern is particularly critical due to duplicate serialize_* functions in both build.rs and src/build_helpers.rs. These functions exist because Rust build scripts cannot import from the crate being built. The functions generate Rust code for built-in magic rules and must be kept manually synchronized. Additionally, parser token ordering rules require longer operators to be tried before shorter prefixes (e.g., <= before <), adding ordering constraints beyond simple variant additions.
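The compile-time guarantee only holds for matches that name every variant. The minimal sketch below uses a reduced, hypothetical Operator enum (three variants, not the crate's definition) to contrast an exhaustive match with one that ends in a wildcard arm. Adding a variant breaks the first at compile time but silently falls through the second.

```rust
/// Reduced, hypothetical stand-in for the crate's Operator enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operator {
    Equal,
    NotEqual,
    BitwiseAnd,
}

/// Exhaustive: adding a variant such as LessThan turns this function into a
/// compile error (E0004, non-exhaustive patterns) until an arm is written.
pub fn describe_exhaustive(op: Operator) -> &'static str {
    match op {
        Operator::Equal => "equal",
        Operator::NotEqual => "not equal",
        Operator::BitwiseAnd => "bitwise and",
    }
}

/// Wildcard: a new variant compiles without complaint and is silently
/// described as "bitwise and", the kind of drift this pattern prevents.
pub fn describe_with_wildcard(op: Operator) -> &'static str {
    match op {
        Operator::Equal => "equal",
        Operator::NotEqual => "not equal",
        _ => "bitwise and",
    }
}
```

For that reason, matches on Operator and TypeKind should avoid _ arms. Clippy's restriction lint wildcard_enum_match_arm can flag such arms in projects that opt into it.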
Core Enums
Operator Enum
The Operator enum defines comparison and bitwise operators for magic rule tests. Current variants:
- Equal: equality comparison (= or ==)
- NotEqual: inequality comparison (!= or <>)
- LessThan: less-than comparison (<)
- GreaterThan: greater-than comparison (>)
- LessEqual: less-than-or-equal comparison (<=)
- GreaterEqual: greater-than-or-equal comparison (>=)
- BitwiseAnd: bitwise AND operation (&)
- BitwiseAndMask(u64): bitwise AND with a mask value (&0xFF)
The enum derives Debug, Clone, Serialize, Deserialize, PartialEq, Eq for use across parsing, evaluation, and serialization contexts.
TypeKind Enum
The TypeKind enum defines data types for interpreting bytes from file buffers. Current variants:
- Byte { signed: bool }: single 8-bit byte with explicit signedness
- Short { endian: Endianness, signed: bool }: 16-bit integer with endianness and signedness options
- Long { endian: Endianness, signed: bool }: 32-bit integer with endianness and signedness options
- Quad { endian: Endianness, signed: bool }: 64-bit integer with endianness and signedness options
- String { max_length: Option<usize> }: variable-length string with optional maximum length
- Float { endian: Endianness }: 32-bit IEEE 754 floating-point value with endianness (no signed field)
- Double { endian: Endianness }: 64-bit IEEE 754 double-precision floating-point value with endianness (no signed field)
- Date { endian: Endianness, utc: bool }: 32-bit Unix timestamp (seconds since the epoch) with endianness and timezone options
- QDate { endian: Endianness, utc: bool }: 64-bit Unix timestamp (seconds since the epoch) with endianness and timezone options

The Endianness enum provides Little, Big, and Native byte order options for multi-byte types.
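The definitions below restate the listed variants in Rust to show how they fit together. They also add a bit_size() method in the spirit of the TypeKind::bit_size() mentioned for PR #165. The derives and the Option<u32> return type (None for variable-length strings) are assumptions for illustration, not the crate's confirmed signatures.

```rust
/// Byte order for multi-byte types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

/// Sketch of the TypeKind variants listed above (derives are an assumption).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TypeKind {
    Byte { signed: bool },
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    Quad { endian: Endianness, signed: bool },
    String { max_length: Option<usize> },
    Float { endian: Endianness },
    Double { endian: Endianness },
    Date { endian: Endianness, utc: bool },
    QDate { endian: Endianness, utc: bool },
}

impl TypeKind {
    /// Width in bits of fixed-size types; None for variable-length strings
    /// (the Option return type is an assumption for this sketch).
    pub fn bit_size(&self) -> Option<u32> {
        match self {
            TypeKind::Byte { .. } => Some(8),
            TypeKind::Short { .. } => Some(16),
            TypeKind::Long { .. } | TypeKind::Float { .. } | TypeKind::Date { .. } => Some(32),
            TypeKind::Quad { .. } | TypeKind::Double { .. } | TypeKind::QDate { .. } => Some(64),
            TypeKind::String { .. } => None,
        }
    }
}
```

Because bit_size() is exhaustive and has no wildcard arm, adding a tenth variant will not compile until its width is decided.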
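Exhaustive Match Locations
When adding a new variant to Operator or TypeKind, exhaustive match statements and related code must be updated in seven locations. These span eight files, because build-time serialization (location 6) is duplicated across build.rs and src/build_helpers.rs. Not every location applies to every enum: operator evaluation handles Operator, and type reading handles TypeKind.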
1. AST Definition (src/parser/ast.rs)
Define the enum variant with rustdoc comments explaining its semantics.
2. Parser Grammar (src/parser/grammar.rs)
Update parse_operator() or the equivalent parser functions to recognize the new syntax. For operators, follow the token ordering rule: try longer tokens before their shorter prefixes. The function uses sequential if let checks rather than nom's alt() combinator, which keeps precedence explicit.
When adding <= and >=, they must be parsed before < and >. Otherwise the input <= would match < and leave = at the start of the value. The implementation also uses manual lookahead with .starts_with() to reject invalid sequences such as === and &&. This token ordering requirement was implemented in PR #104 for comparison operator support (released in v0.2.0).
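The ordering rule can be sketched without nom as a plain string function. This is a hypothetical simplification, not the crate's parse_operator(). It covers only the symbolic operators and omits BitwiseAndMask. It also walks a token table longest-first in place of the sequential if let checks, while keeping the starts_with() lookahead that rejects === and &&.

```rust
/// Reduced, hypothetical Operator enum (BitwiseAndMask omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
}

/// Parses a leading operator token and returns it with the remaining input.
pub fn parse_operator(input: &str) -> Option<(Operator, &str)> {
    // Lookahead: reject doubled tokens that would otherwise parse as a valid prefix.
    if input.starts_with("===") || input.starts_with("&&") {
        return None;
    }
    // Two-character tokens come first, so "<=" is never split into "<" and "=".
    let two_char = [
        ("==", Operator::Equal),
        ("!=", Operator::NotEqual),
        ("<>", Operator::NotEqual),
        ("<=", Operator::LessEqual),
        (">=", Operator::GreaterEqual),
    ];
    // Single-character tokens are tried only after every longer token has failed.
    let one_char = [
        ("=", Operator::Equal),
        ("<", Operator::LessThan),
        (">", Operator::GreaterThan),
        ("&", Operator::BitwiseAnd),
    ];
    for &(token, op) in two_char.iter().chain(one_char.iter()) {
        if let Some(rest) = input.strip_prefix(token) {
            return Some((op, rest));
        }
    }
    None
}
```

Swapping the two tables would make "<=5" parse as LessThan followed by the value "=5", which is exactly the premature match the rule guards against.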
3. Operator Evaluation (src/evaluator/operators.rs)
Update the apply_operator() function to dispatch the new variant to a handler function:
```rust
pub fn apply_operator(operator: &Operator, left: &Value, right: &Value) -> bool {
    match operator {
        Operator::Equal => apply_equal(left, right),
        Operator::NotEqual => apply_not_equal(left, right),
        Operator::LessThan => apply_less_than(left, right),
        Operator::GreaterThan => apply_greater_than(left, right),
        Operator::LessEqual => apply_less_equal(left, right),
        Operator::GreaterEqual => apply_greater_equal(left, right),
        Operator::BitwiseAnd => apply_bitwise_and(left, right),
        // Inline masking logic elided in this excerpt; the arm evaluates to bool.
        Operator::BitwiseAndMask(mask) => { /* inline logic */ }
    }
}
```
The test suite includes test_apply_operator_all_combinations with a second exhaustive match that verifies consistency between apply_operator() and individual handler functions.
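The idea behind test_apply_operator_all_combinations can be sketched with reduced, hypothetical Operator and Value enums. The dispatcher and a test-side helper each contain their own exhaustive match, and a test checks that both routes agree for every operator and value pair. The handler bodies are simplified assumptions, not the crate's implementations.

```rust
/// Reduced, hypothetical enums for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operator {
    Equal,
    NotEqual,
    LessThan,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Uint(u64),
    Str(String),
}

fn apply_equal(left: &Value, right: &Value) -> bool {
    left == right
}

fn apply_not_equal(left: &Value, right: &Value) -> bool {
    !apply_equal(left, right)
}

fn apply_less_than(left: &Value, right: &Value) -> bool {
    match (left, right) {
        (Value::Uint(a), Value::Uint(b)) => a < b,
        (Value::Str(a), Value::Str(b)) => a < b,
        // Mixed types never compare as less-than in this sketch.
        _ => false,
    }
}

/// Production-side dispatch: the first exhaustive match.
pub fn apply_operator(operator: &Operator, left: &Value, right: &Value) -> bool {
    match operator {
        Operator::Equal => apply_equal(left, right),
        Operator::NotEqual => apply_not_equal(left, right),
        Operator::LessThan => apply_less_than(left, right),
    }
}

/// Test-side mirror: a second exhaustive match, so a new variant breaks the
/// test build as well as the dispatcher.
pub fn handler_for(operator: &Operator) -> fn(&Value, &Value) -> bool {
    match operator {
        Operator::Equal => apply_equal,
        Operator::NotEqual => apply_not_equal,
        Operator::LessThan => apply_less_than,
    }
}
```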
4. Type Reading (src/evaluator/types.rs)
Update the read_typed_value() function to dispatch TypeKind variants to specialized reading functions:
```rust
pub fn read_typed_value(
    buffer: &[u8],
    offset: usize,
    type_kind: &TypeKind,
) -> Result<Value, TypeReadError> {
    match type_kind {
        TypeKind::Byte { signed } => read_byte(buffer, offset, *signed),
        TypeKind::Short { endian, signed } => read_short(buffer, offset, *endian, *signed),
        TypeKind::Long { endian, signed } => read_long(buffer, offset, *endian, *signed),
        TypeKind::Quad { endian, signed } => read_quad(buffer, offset, *endian, *signed),
        TypeKind::Float { endian } => read_float(buffer, offset, *endian),
        TypeKind::Double { endian } => read_double(buffer, offset, *endian),
        TypeKind::Date { endian, utc } => read_date(buffer, offset, *endian, *utc),
        TypeKind::QDate { endian, utc } => read_qdate(buffer, offset, *endian, *utc),
        TypeKind::String { max_length } => read_string(buffer, offset, *max_length),
    }
}
```
Multi-byte types (Short, Long, Quad, Float, Double, Date, QDate) include nested exhaustive matches on Endianness, using the byteorder crate for the Little, Big, and Native variants. Floating-point values also need special handling when they are compared:
- Epsilon-aware equality: float comparisons use |a - b| <= f64::EPSILON rather than exact equality.
- Partial ordering: comparison operators use partial_cmp, which returns None when either operand is NaN.
- Special values: NaN is never equal to anything, including itself, and an infinity is equal only to an infinity of the same sign.
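The three rules above can be sketched with plain f64 arithmetic. The function names here are hypothetical. The crate implements this logic inside apply_equal(), apply_not_equal(), and compare_values().

```rust
use std::cmp::Ordering;

/// Epsilon-aware equality with explicit NaN and infinity handling.
pub fn floats_equal(a: f64, b: f64) -> bool {
    if a.is_nan() || b.is_nan() {
        return false; // NaN equals nothing, including itself
    }
    if a.is_infinite() || b.is_infinite() {
        return a == b; // only same-sign infinities are equal
    }
    (a - b).abs() <= f64::EPSILON
}

/// Ordering via partial_cmp: None when either operand is NaN.
pub fn compare_floats(a: f64, b: f64) -> Option<Ordering> {
    a.partial_cmp(&b)
}
```

A fixed tolerance of f64::EPSILON (about 2.2e-16) only absorbs rounding error for values near magnitude 1. For larger magnitudes it behaves like exact equality, and for values far below 1 it treats everything within that absolute distance as equal.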
5. Strength Scoring (src/evaluator/strength.rs)
Update the operator and TypeKind scoring matches:
```rust
// Operator strength scores
strength += match &rule.op {
    Operator::Equal => 10, // Most specific
    Operator::NotEqual => 5,
    Operator::LessThan
    | Operator::GreaterThan
    | Operator::LessEqual
    | Operator::GreaterEqual => 6, // Moderately specific
    Operator::BitwiseAndMask(_) => 7, // Moderately specific
    Operator::BitwiseAnd => 3, // Least specific
};

// TypeKind strength scores
strength += match &rule.typ {
    TypeKind::String { max_length } => {
        if max_length.is_some() { 25 } else { 20 }
    }
    TypeKind::Quad { .. } | TypeKind::Double { .. } | TypeKind::QDate { .. } => 16,
    TypeKind::Long { .. } | TypeKind::Float { .. } | TypeKind::Date { .. } => 15,
    TypeKind::Short { .. } => 10,
    TypeKind::Byte { .. } => 5,
};
```
Strength scores prioritize more specific operators and types for accurate file type detection. Floating-point types (Float, Double) are assigned the same scores as their integer counterparts (Long, Quad) based on bit width. Date and QDate types follow the same scoring pattern based on their 32-bit and 64-bit sizes.
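The sketch below makes the arithmetic concrete by adding the two contributions shown above for a few rule shapes. The enums are reduced, hypothetical stand-ins, with Float and Double sharing the Long and Quad arms as described. Other components the real calculation may include, such as the value length bonus, are left out.

```rust
/// Reduced, hypothetical enums with just enough variants for the arithmetic.
#[derive(Debug, Clone, Copy)]
pub enum Operator {
    Equal,
    NotEqual,
    LessThan,
    BitwiseAndMask(u64),
    BitwiseAnd,
}

#[derive(Debug, Clone, Copy)]
pub enum TypeKind {
    Byte,
    Short,
    Long,
    Quad,
    Float,
    Double,
    String { max_length: Option<usize> },
}

/// Sum of the operator and type contributions from the two matches above only.
pub fn op_and_type_strength(op: Operator, typ: TypeKind) -> i32 {
    let op_score = match op {
        Operator::Equal => 10,
        Operator::NotEqual => 5,
        Operator::LessThan => 6,
        Operator::BitwiseAndMask(_) => 7,
        Operator::BitwiseAnd => 3,
    };
    let type_score = match typ {
        TypeKind::String { max_length } => {
            if max_length.is_some() { 25 } else { 20 }
        }
        TypeKind::Quad | TypeKind::Double => 16,
        TypeKind::Long | TypeKind::Float => 15,
        TypeKind::Short => 10,
        TypeKind::Byte => 5,
    };
    op_score + type_score
}
```

Under these two matches, an equality test on a quad scores 16 + 10 = 26, while a plain & test on a byte scores only 5 + 3 = 8, so the quad rule ranks higher.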
6. Build-Time Serialization (build.rs AND src/build_helpers.rs)
Critical: both files contain duplicate serialize_operator() and serialize_type_kind() functions that must be updated identically:
```rust
fn serialize_operator(op: &Operator) -> String {
    match op {
        Operator::Equal => "Operator::Equal".to_string(),
        Operator::NotEqual => "Operator::NotEqual".to_string(),
        Operator::LessThan => "Operator::LessThan".to_string(),
        Operator::GreaterThan => "Operator::GreaterThan".to_string(),
        Operator::LessEqual => "Operator::LessEqual".to_string(),
        Operator::GreaterEqual => "Operator::GreaterEqual".to_string(),
        Operator::BitwiseAnd => "Operator::BitwiseAnd".to_string(),
        Operator::BitwiseAndMask(mask) => format!("Operator::BitwiseAndMask({mask})"),
    }
}
```
The duplication exists because Rust build scripts cannot import from the crate being built. The build_helpers module is conditionally compiled with #[cfg(any(test, doc))] to enable comprehensive testing of build process logic. Tests exist only in build_helpers.rs, not in build.rs.
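The serialize_* functions turn an in-memory value back into Rust source text that the build embeds as built-in rules. The minimal sketch below covers a subset of TypeKind. It assumes the generated text is a struct-literal expression, like the Operator::BitwiseAndMask({mask}) output above; the exact strings the crate emits may differ. Any divergence between the copies in build.rs and src/build_helpers.rs would show up as different generated text for the same input.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

/// Hypothetical subset of TypeKind, enough to show each field shape.
pub enum TypeKind {
    Byte { signed: bool },
    Short { endian: Endianness, signed: bool },
    Float { endian: Endianness },
    Date { endian: Endianness, utc: bool },
    String { max_length: Option<usize> },
}

fn serialize_endianness(endian: Endianness) -> String {
    // Debug output of a unit variant is its name, e.g. "Little".
    format!("Endianness::{:?}", endian)
}

/// Emits a Rust expression that reconstructs the given TypeKind.
pub fn serialize_type_kind(typ: &TypeKind) -> String {
    match typ {
        TypeKind::Byte { signed } => format!("TypeKind::Byte {{ signed: {} }}", signed),
        TypeKind::Short { endian, signed } => format!(
            "TypeKind::Short {{ endian: {}, signed: {} }}",
            serialize_endianness(*endian),
            signed
        ),
        TypeKind::Float { endian } => format!(
            "TypeKind::Float {{ endian: {} }}",
            serialize_endianness(*endian)
        ),
        TypeKind::Date { endian, utc } => format!(
            "TypeKind::Date {{ endian: {}, utc: {} }}",
            serialize_endianness(*endian),
            utc
        ),
        TypeKind::String { max_length } => match max_length {
            Some(n) => format!("TypeKind::String {{ max_length: Some({}) }}", n),
            None => "TypeKind::String { max_length: None }".to_string(),
        },
    }
}
```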
7. Property Test Strategies (tests/property_tests.rs)
Update the proptest generators so they cover all enum variants:
```rust
fn arb_operator() -> impl Strategy<Value = Operator> {
    prop_oneof![
        Just(Operator::Equal),
        Just(Operator::NotEqual),
        Just(Operator::LessThan),
        Just(Operator::GreaterThan),
        Just(Operator::LessEqual),
        Just(Operator::GreaterEqual),
        Just(Operator::BitwiseAnd),
        (0u64..=255u64).prop_map(Operator::BitwiseAndMask),
    ]
}
```
The arb_type_kind() strategy generates all TypeKind variants with combinations of endianness and signedness options. PR #133 updated this strategy to include Quad variants and all three Endianness options (Little, Big, and Native).
Critical Synchronization Challenges
Duplicate Serialization Functions
The most error-prone aspect is maintaining identical implementations in build.rs and src/build_helpers.rs. When adding a new operator or type variant, both files must receive identical updates, or generated built-in rules will be malformed. The build_helpers module comment explicitly documents this architectural constraint.
Parser Token Ordering
When adding operators to the grammar, follow the rule that longer tokens are parsed before their shorter prefixes. The parse_operator() implementation parses <= and >= before < and > to prevent premature matching. It uses sequential if let checks with manual lookahead, rather than nom's alt() combinator, for explicit precedence control.
Implemented ordering for comparison operators in src/parser/grammar.rs:
1. <= and >= (two-character operators)
2. < and > (single-character operators)

Test coverage validates this precedence behavior.
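Proptest Strategy Completeness
Property test strategies must generate every enum variant for fuzzing to be comprehensive. Unlike match statements, prop_oneof! lists are not checked for exhaustiveness, so a new variant is silently left out until someone adds it. PR #133, for example, had to extend arb_type_kind() with Quad variants and all three Endianness options. The current strategies have coverage gaps that should be closed whenever an enum is extended.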
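One way to make such gaps visible is to pair the strategy's variant list with an exhaustive match. This is sketched here as a hypothetical technique, not something the project is known to use. The variant_index() function stops compiling when a variant is added, and covers_all_variants() fails if the list does not produce every index.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
    BitwiseAndMask(u64),
}

/// Number of variants; must be bumped together with variant_index().
pub const VARIANT_COUNT: usize = 8;

/// Exhaustive match assigning each variant a distinct index. A new variant
/// fails to compile here until it is given one.
pub fn variant_index(op: &Operator) -> usize {
    match op {
        Operator::Equal => 0,
        Operator::NotEqual => 1,
        Operator::LessThan => 2,
        Operator::GreaterThan => 3,
        Operator::LessEqual => 4,
        Operator::GreaterEqual => 5,
        Operator::BitwiseAnd => 6,
        Operator::BitwiseAndMask(_) => 7,
    }
}

/// True when the samples contain at least one value of every variant.
pub fn covers_all_variants(samples: &[Operator]) -> bool {
    let mut seen = [false; VARIANT_COUNT];
    for op in samples {
        seen[variant_index(op)] = true;
    }
    seen.iter().all(|&hit| hit)
}

/// Hypothetical stand-in for the list passed to prop_oneof! in arb_operator().
pub fn strategy_variants() -> Vec<Operator> {
    vec![
        Operator::Equal,
        Operator::NotEqual,
        Operator::LessThan,
        Operator::GreaterThan,
        Operator::LessEqual,
        Operator::GreaterEqual,
        Operator::BitwiseAnd,
        Operator::BitwiseAndMask(0xFF),
    ]
}
```

If a variant is added but VARIANT_COUNT is not bumped, indexing seen panics, which also surfaces the mismatch. In a real test the samples could instead be drawn from arb_operator() itself, for example through proptest's TestRunner.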
Usage and Extension Workflow
Follow this sequence when adding a new enum variant:
1. Define the enum variant in src/parser/ast.rs with documentation.
2. Update the parser grammar in src/parser/grammar.rs, respecting token ordering rules for operators.
3. Implement evaluator logic in the appropriate module (operators.rs for Operator, types.rs for TypeKind).
4. Update strength scoring in src/evaluator/strength.rs with appropriate score assignments.
5. Duplicate in build files: update both build.rs and src/build_helpers.rs identically.
6. Extend property test strategies in tests/property_tests.rs for exhaustive coverage.
7. Run the compiler's exhaustiveness checks: rustc flags every remaining non-exhaustive match.
8. Verify serialization round-trips: test that built-in rules remain valid after the change.
9. Add comprehensive unit tests for the new variant in the relevant module test sections.

Rust rejects non-exhaustive matches at compile time (error E0004) regardless of Clippy. Each match that still lacks the new variant therefore appears as a build error that points at a remaining update location. The project's zero-warnings Clippy policy complements this by treating related warnings as failures. Neither check covers parser token ordering or proptest strategy lists, which must be reviewed by hand.
Historical Examples
StrengthModifier Enum Addition (PR #30)
PR #30 added the StrengthModifier enum with variants Add, Subtract, Multiply, Divide, and Set. Files modified:
- build.rs: import statements updated
- src/build_helpers.rs: import statements and duplicate serialization functions added
- Test files: all MagicRule constructors updated to include strength_modifier: None

Issues encountered:
- Dead code warnings requiring #[allow(dead_code, unused_imports)] annotations in build scripts
- Test constructor cascades: every test that constructs MagicRule structs needed updating for the new optional field
Comparison Operators (PR #104, Released in v0.2.0)
PR #104 implemented comparison operators (<, >, <=, >=) as requested in Issue #34. Released in v0.2.0, this PR serves as a real-world example of the enum extension process described in this document. Files modified:
- Added 4 new Operator enum variants (LessThan, GreaterThan, LessEqual, GreaterEqual) to src/parser/ast.rs
- Updated the parser in src/parser/grammar.rs with correct token ordering (<= before <, >= before >)
- Implemented comparison logic in src/evaluator/operators.rs with a compare_values() helper function supporting cross-type integer coercion via i128 and lexicographic string comparison
- Updated strength scoring in src/evaluator/strength.rs (comparison operators scored at 6, moderately specific)
- Synchronized updates to the serialization functions in both build.rs and src/build_helpers.rs
- Extended proptest strategies in tests/property_tests.rs to cover all comparison operator variants
The PR also changed TypeKind::Byte from a unit variant to Byte { signed: bool } for explicit signedness, which required cascading updates across every exhaustive match in the codebase. The read_byte function gained a third parameter, signed, alongside buffer and offset.
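The cross-type coercion in compare_values() can be sketched as follows. The sketch assumes a reduced Value enum and an Option<Ordering> return type (None when the operands cannot be compared); the crate's real signature is not shown here. Widening both integer kinds to i128 is lossless, since i128 holds every u64 and every i64.

```rust
use std::cmp::Ordering;

/// Reduced, hypothetical Value enum for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Uint(u64),
    Int(i64),
    String(String),
}

/// Orders two values: integers via i128, strings lexicographically by bytes.
pub fn compare_values(left: &Value, right: &Value) -> Option<Ordering> {
    fn as_i128(v: &Value) -> Option<i128> {
        match v {
            Value::Uint(u) => Some(i128::from(*u)),
            Value::Int(i) => Some(i128::from(*i)),
            Value::String(_) => None,
        }
    }
    match (left, right) {
        (Value::String(a), Value::String(b)) => Some(a.cmp(b)),
        // Mixed string/integer pairs yield None via the ? operator.
        _ => Some(as_i128(left)?.cmp(&as_i128(right)?)),
    }
}
```

Casting both sides to either 64-bit type instead would misorder values such as Int(-1) and Uint(u64::MAX).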
Quad Type Implementation (PR #133)
PR #133 added 64-bit integer support through the TypeKind::Quad variant, demonstrating the exact enum extension workflow described in this document. All seven exhaustive match locations were updated:
- AST definition (src/parser/ast.rs): Added the Quad { endian, signed } variant with documentation and examples
- Parser grammar (src/parser/grammar.rs): Implemented parsing for the quad, uquad, lequad, ulequad, bequad, and ubequad type names, with support for full 64-bit mask values in attached operators
- Type reading (src/evaluator/types.rs): Implemented the read_quad() function with endianness handling and integrated it into read_typed_value() dispatch
- Strength scoring (src/evaluator/strength.rs): Assigned Quad types a score of 16, higher than Long (15), to reflect greater specificity
- Build-time serialization: Updated serialize_type_kind() identically in both build.rs and src/build_helpers.rs to handle Quad variants
- Property test strategies (tests/property_tests.rs): Extended arb_type_kind() to generate Quad variants with all endianness options
- Value coercion (src/evaluator/types.rs): Added handling for unsigned values above i64::MAX when coercing to signed Quad types
The PR also enhanced parser numeric literal handling to support the full unsigned 64-bit range (0 to u64::MAX), required for magic rules matching values like 0xffffffffffffffff.
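A minimal sketch of the Quad read path and the signed coercion follows. It uses the standard library's u64::from_le_bytes, from_be_bytes, and from_ne_bytes in place of the byteorder crate. The names read_quad_raw() and coerce_to_signed_quad() are hypothetical. Treating the coercion as a two's-complement bit reinterpretation is an assumption about how the crate handles values above i64::MAX.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

/// Reads 8 bytes at offset as a u64; None on offset overflow or a short buffer.
pub fn read_quad_raw(buffer: &[u8], offset: usize, endian: Endianness) -> Option<u64> {
    let end = offset.checked_add(8)?;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(buffer.get(offset..end)?);
    Some(match endian {
        Endianness::Little => u64::from_le_bytes(bytes),
        Endianness::Big => u64::from_be_bytes(bytes),
        Endianness::Native => u64::from_ne_bytes(bytes),
    })
}

/// Reinterprets an unsigned literal as the signed 64-bit value with the same
/// bit pattern (two's complement), e.g. 0xffffffffffffffff becomes -1.
pub fn coerce_to_signed_quad(value: u64) -> i64 {
    value as i64
}
```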
Float and Double Type Implementation (PR #162)
PR #162 added IEEE 754 floating-point support through TypeKind::Float and TypeKind::Double variants and a new Value::Float(f64) variant, demonstrating the enum extension pattern across 17 changed files. This PR illustrates how enum extensions can cascade to affect trait derivations: the Value enum no longer derives Eq due to IEEE 754 NaN semantics. All seven exhaustive match locations were updated:
- AST definition (src/parser/ast.rs): Added Float { endian } and Double { endian } variants to TypeKind (no signed field; IEEE 754 handles sign internally). Added the Value::Float(f64) variant. Removed the Eq derivation from the Value enum, because NaN is not equal to itself and Eq requires reflexive equality.
- Parser grammar (src/parser/types.rs): Implemented parsing for 6 new type keywords (float, befloat, lefloat, double, bedouble, ledouble) in the parse_type_keyword and type_keyword_to_kind functions. Added a parse_float_value grammar function that requires a decimal point to distinguish float literals from integers.
- Type reading (src/evaluator/types/float.rs): Created a new submodule with read_float() (4 bytes, widened to f64) and read_double() (8 bytes) functions. Both return Value::Float(f64) and include endianness dispatch. Their unit tests cover buffer overrun, offset overflow, and all endianness variants.
- Strength scoring (src/evaluator/strength.rs): Assigned Float types a score of 15 (same as Long) and Double types a score of 16 (same as Quad) based on bit width. Updated the value length bonus logic to handle Value::Float (no length bonus for numeric types).
- Build-time serialization (src/build_helpers.rs): Updated serialize_type_kind() to handle Float and Double variants and added serialize_value() support for Value::Float. Tests verify correct serialization of all endianness combinations.
- Property test strategies (tests/property_tests.rs): Extended arb_type_kind() to generate Float and Double variants with all endianness options, and extended arb_value() to generate Value::Float in the range -1e10..1e10.
- Operator evaluation: Updated apply_equal() and apply_not_equal() in src/evaluator/operators/equality.rs to use epsilon-aware equality (|a - b| <= f64::EPSILON) with explicit NaN and infinity handling. Updated compare_values() in src/evaluator/operators/comparison.rs to use partial_cmp for float ordering, returning None for NaN comparisons.
The PR also updated output modules (src/output/json.rs, src/output/mod.rs) and code generation (src/parser/codegen.rs) to handle Value::Float in exhaustive matches. Integration tests in tests/evaluator_tests.rs verify end-to-end float/double evaluation through evaluate_rules.
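The float reading path can be sketched with the standard library's f32 and f64 from_*_bytes constructors. The functions read_float() and read_double() mirror the names above. The Option return type (the crate returns Result<Value, TypeReadError>), the take() helper, and the reduced Value enum are assumptions. Value derives PartialEq but not Eq, matching the trait change described above.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

/// Reduced Value enum: PartialEq only, since f64 has no total equality.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Uint(u64),
    Float(f64),
}

/// Copies N bytes starting at offset; None on overflow or a short buffer.
fn take<const N: usize>(buffer: &[u8], offset: usize) -> Option<[u8; N]> {
    let end = offset.checked_add(N)?;
    let mut bytes = [0u8; N];
    bytes.copy_from_slice(buffer.get(offset..end)?);
    Some(bytes)
}

/// Reads a 4-byte float and widens it to f64 (f32 to f64 is lossless).
pub fn read_float(buffer: &[u8], offset: usize, endian: Endianness) -> Option<Value> {
    let bytes = take::<4>(buffer, offset)?;
    let x = match endian {
        Endianness::Little => f32::from_le_bytes(bytes),
        Endianness::Big => f32::from_be_bytes(bytes),
        Endianness::Native => f32::from_ne_bytes(bytes),
    };
    Some(Value::Float(f64::from(x)))
}

/// Reads an 8-byte double.
pub fn read_double(buffer: &[u8], offset: usize, endian: Endianness) -> Option<Value> {
    let bytes = take::<8>(buffer, offset)?;
    let x = match endian {
        Endianness::Little => f64::from_le_bytes(bytes),
        Endianness::Big => f64::from_be_bytes(bytes),
        Endianness::Native => f64::from_ne_bytes(bytes),
    };
    Some(Value::Float(x))
}
```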
Date and QDate Type Implementation (PR #165, Released in v0.5.0)
PR #165 added Unix timestamp support through TypeKind::Date (32-bit) and TypeKind::QDate (64-bit) variants, completing the date/timestamp feature requested in Issue #41. Released in v0.5.0, this PR demonstrates the enum extension workflow with special handling for timestamp formatting. All seven exhaustive match locations were updated:
- AST definition (src/parser/ast.rs): Added Date { endian, utc } and QDate { endian, utc } variants to TypeKind. Both include a utc: bool field that selects UTC or local time interpretation. Updated TypeKind::bit_size() to return 32 for Date and 64 for QDate.
- Parser grammar (src/parser/types.rs): Implemented parsing for 12 new type keywords covering all endianness and timezone combinations (date, ldate, bedate, beldate, ledate, leldate, qdate, qldate, beqdate, beqldate, leqdate, leqldate) in the parse_type_keyword and type_keyword_to_kind functions.
- Type reading (src/evaluator/types/date.rs): Created a new submodule with read_date() (4 bytes) and read_qdate() (8 bytes) functions. Both return Value::String containing formatted timestamps that match GNU file output format, using the chrono crate for UTC and local time formatting.
- Strength scoring (src/evaluator/strength.rs): Assigned Date types a score of 15 (same as Long) and QDate types a score of 16 (same as Quad) based on bit width, maintaining consistency with integer types of the same size.
- Build-time serialization (src/parser/codegen.rs): Updated serialize_type_kind() to handle Date and QDate variants, serializing both the endianness and utc fields.
- Property test strategies: Extended type generation to cover Date and QDate variants with all endianness and timezone options.
- Value coercion (src/evaluator/types/mod.rs): Added special handling that coerces numeric expected values (both Value::Uint and Value::Int) into formatted timestamp strings for date types, ensuring correct matching behavior.
Date types read integer timestamps from file buffers and format them as human-readable strings, requiring value coercion logic to convert numeric expected values from magic rules into formatted strings for comparison.
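The coercion step can be sketched as follows. This sketch does not reproduce chrono or GNU file's exact output format. Instead, format_utc() is a hypothetical UTC formatter that produces YYYY-MM-DD HH:MM:SS via Howard Hinnant's civil_from_days algorithm, and local time is not handled. The point is the shape of the logic: the expected value is converted into the same string form the reader produces before comparison.

```rust
/// Reduced, hypothetical Value enum for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Uint(u64),
    Int(i64),
    String(String),
}

/// Converts days since 1970-01-01 into (year, month, day) using Howard
/// Hinnant's civil_from_days algorithm (proleptic Gregorian calendar).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = (z - era * 146_097) as u64; // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe as i64 + era * 400 + i64::from(month <= 2);
    (year, month as u32, day as u32)
}

/// Hypothetical formatter: UTC "YYYY-MM-DD HH:MM:SS" (not GNU file's format).
pub fn format_utc(secs: i64) -> String {
    let (year, month, day) = civil_from_days(secs.div_euclid(86_400));
    let rem = secs.rem_euclid(86_400);
    format!(
        "{:04}-{:02}-{:02} {:02}:{:02}:{:02}",
        year, month, day, rem / 3_600, (rem % 3_600) / 60, rem % 60
    )
}

/// Turns a numeric expected value into the string form the date reader
/// produces, so matching compares strings with strings.
pub fn coerce_expected_for_date(expected: &Value) -> Value {
    match expected {
        Value::Uint(u) if *u <= i64::MAX as u64 => Value::String(format_utc(*u as i64)),
        Value::Int(i) => Value::String(format_utc(*i)),
        // Strings and out-of-range integers pass through unchanged (assumption).
        Value::Uint(_) | Value::String(_) => expected.clone(),
    }
}
```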
Preemptive Modularization Strategy
Issue #62 recommended creating focused submodules before implementing v0.2.0 features, to keep files from growing too large as the enums gain variants. The approach:
- Create new submodules for unimplemented features, containing placeholder functions, TODO comments, and tests marked #[ignore]
- For comparison operators: create src/evaluator/comparison.rs with placeholder functions
- Keep individual files under 600 lines and maintain a clear separation of concerns
Relevant Code Files
| File | Purpose | Key Functions/Matches (line ranges) |
|---|---|---|
| src/parser/ast.rs | Enum definitions | Operator (106-117), TypeKind (80-104), Endianness (133-141) |
| src/parser/grammar.rs | Parser implementations | parse_operator() (165-220), token ordering logic |
| src/evaluator/operators.rs | Operator evaluation | apply_operator() (219-239), test match (1569-1619) |
| src/evaluator/types.rs | Type reading | read_typed_value() (350-361), endianness matches |
| src/evaluator/strength.rs | Strength calculation | Operator match (89-98), TypeKind match (72-86) |
| src/build_helpers.rs | Build helper functions | serialize_operator() (234-241), serialize_type_kind() (212-232) |
| build.rs | Build script | Duplicate serialization functions (270-290, 292-299) |
| tests/property_tests.rs | Proptest strategies | arb_operator() (58-65), arb_type_kind() (28-55) |
Related Topics
- Parser-Evaluator Architecture: The three-layer design requiring exhaustive matches across AST, parser, and evaluator
- Type System And Operator Coverage: Current and planned type/operator variants
- Magic File Compatibility Status: Feature gaps addressed by enum extensions
- Rust Exhaustiveness Checking: Language-level guarantee that all enum variants are handled in match expressions
- Build Script Limitations: Architectural constraint preventing code reuse between build.rs and library code