Float and Double Type Support#

Lead Section#

Float and Double type support is an implemented feature (as of PR #162) for libmagic-rs that enables detection and evaluation of IEEE 754 floating-point data in binary files. The implementation adds six magic file keywords (float, befloat, lefloat, double, bedouble, ledouble) corresponding to 32-bit and 64-bit floating-point types with configurable endianness.

The implementation extends the TypeKind enum in src/parser/ast.rs with Float and Double variants, adds a Value::Float(f64) variant to the Value enum, and implements buffer reading logic for IEEE 754 binary representation. Critically, the Eq derive has been removed from the Value enum due to IEEE 754 NaN semantics, which violate Rust's Eq trait requirements (NaN != NaN by specification).

This article documents the complete architecture for float/double support, including AST structure, parser grammar, evaluator logic, strength calculation, comparison operations, and the cascading code changes across 10+ files required to maintain exhaustive pattern matching invariants.

IEEE 754 Type Architecture#

TypeKind Enum Extensions#

The TypeKind enum has been extended with two floating-point variants following the same pattern as multi-byte integer types (Short, Long, Quad):

pub enum TypeKind {
    // ... existing variants
    Float { endian: Endianness }, // 32-bit IEEE 754 single-precision
    Double { endian: Endianness }, // 64-bit IEEE 754 double-precision
}

Unlike integer types, Float and Double do not include a signed field because IEEE 754 represents the sign as a separate bit in the floating-point format, making all floating-point values inherently signed. This contrasts with integer types like Long and Quad, which have both endian: Endianness and signed: bool fields.

Endianness Support#

The Endianness enum provides three byte order options that apply to both Float and Double types:

Little: Least significant byte first (little-endian)
Big: Most significant byte first (big-endian)
Native: System-dependent byte order (matches target architecture)

These correspond to the six magic file keywords:

Keyword	Type	Endianness	TypeKind Variant
`float`	32-bit	Native	`TypeKind::Float { endian: Endianness::Native }`
`lefloat`	32-bit	Little	`TypeKind::Float { endian: Endianness::Little }`
`befloat`	32-bit	Big	`TypeKind::Float { endian: Endianness::Big }`
`double`	64-bit	Native	`TypeKind::Double { endian: Endianness::Native }`
`ledouble`	64-bit	Little	`TypeKind::Double { endian: Endianness::Little }`
`bedouble`	64-bit	Big	`TypeKind::Double { endian: Endianness::Big }`
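To make the endianness column concrete, the sketch below decodes the value 1.0 from its big-endian and little-endian byte layouts using only the standard library. The local `Endianness` enum mirrors the one described above, and `decode_f32` is a hypothetical helper for illustration, not a function from libmagic-rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endianness {
    Little,
    Big,
    Native,
}

/// Interpret four bytes as an IEEE 754 single-precision value
/// in the requested byte order.
fn decode_f32(bytes: [u8; 4], endian: Endianness) -> f32 {
    match endian {
        Endianness::Little => f32::from_le_bytes(bytes),
        Endianness::Big => f32::from_be_bytes(bytes),
        Endianness::Native => f32::from_ne_bytes(bytes),
    }
}

// The bit pattern of 1.0f32 is 0x3F80_0000.
const ONE_BIG_ENDIAN: [u8; 4] = [0x3F, 0x80, 0x00, 0x00];
const ONE_LITTLE_ENDIAN: [u8; 4] = [0x00, 0x00, 0x80, 0x3F];
```

Decoding `ONE_BIG_ENDIAN` with `Endianness::Little` produces a tiny subnormal number rather than 1.0, which is why a rule written with `befloat` will not match data stored in little-endian order.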

Parser Grammar Implementation#

Type Keyword Parsing#

Type keyword parsing in src/parser/types.rs uses nom's alt() combinator to recognize type names. Following the pattern for integer types, float and double keywords must be organized by bit width with longest prefixes first to prevent ambiguous matches:

// 64-bit floating-point types
alt((
    tag("bedouble"),
    tag("ledouble"),
    tag("double"),
)),
// 32-bit floating-point types
alt((
    tag("befloat"),
    tag("lefloat"),
    tag("float"),
)),

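The ordering matters because alt() returns the first branch that succeeds: if a shorter tag that is a prefix of a longer keyword were listed first, it would match and leave the rest of the keyword unparsed. Listing the be/le-prefixed forms ahead of the bare keyword keeps the float groups consistent with the integer groups.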

Type Name to TypeKind Mapping#

The type_keyword_to_kind function maps string keywords to TypeKind enum variants. Following the pattern for integer types:

"float" => TypeKind::Float { endian: Endianness::Native },
"lefloat" => TypeKind::Float { endian: Endianness::Little },
"befloat" => TypeKind::Float { endian: Endianness::Big },
"double" => TypeKind::Double { endian: Endianness::Native },
"ledouble" => TypeKind::Double { endian: Endianness::Little },
"bedouble" => TypeKind::Double { endian: Endianness::Big },
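The keyword recognition and mapping steps can be sketched together without nom. The example below is a simplified stand-in under stated assumptions: `parse_float_keyword` is a hypothetical function, the enums are local copies of the ones described in this article, and the first-match loop only imitates how alt() tries its branches in order.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum TypeKind {
    Float { endian: Endianness },
    Double { endian: Endianness },
}

/// Try each keyword in table order and return the first match together
/// with the unparsed remainder of the input.
fn parse_float_keyword(input: &str) -> Option<(TypeKind, &str)> {
    // Prefixed forms come before the bare keyword, as in the alt() groups.
    let table = [
        ("bedouble", TypeKind::Double { endian: Endianness::Big }),
        ("ledouble", TypeKind::Double { endian: Endianness::Little }),
        ("double", TypeKind::Double { endian: Endianness::Native }),
        ("befloat", TypeKind::Float { endian: Endianness::Big }),
        ("lefloat", TypeKind::Float { endian: Endianness::Little }),
        ("float", TypeKind::Float { endian: Endianness::Native }),
    ];
    table
        .iter()
        .find_map(|&(keyword, kind)| input.strip_prefix(keyword).map(|rest| (kind, rest)))
}
```

Keeping the keyword and its variant in one table row makes it harder for the recognizer and the mapping to drift apart, at the cost of departing from the two-function structure the project uses.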

Float Literal Parsing Requirements#

The magic file format requires a decimal point in floating-point literals to distinguish them from integer values. Examples:

Valid: 3.14, 1.0, 0.5, 2.718, 2.5e10 (scientific notation)
Invalid: 3, 14, 1 (parsed as integers, not floats)

The parser must use nom's recognize combinator with a mandatory decimal point pattern to distinguish float literals from integers, and float parsing must be ordered before integer parsing in the parse_value alt chain to ensure proper matching.
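The mandatory decimal point rule can be illustrated with a hand-written recognizer. This is a minimal sketch using only the standard library, not the project's parse_float_value; the name `parse_float_literal` and the choice to require digits on both sides of the point are assumptions made for the example.

```rust
/// Split a leading float literal off `input`, returning the value and the rest.
/// Returns `None` when there is no decimal point, so "3" is left for an
/// integer parser to handle.
fn parse_float_literal(input: &str) -> Option<(f64, &str)> {
    let bytes = input.as_bytes();
    let mut i = 0;

    // Optional sign.
    if matches!(bytes.first(), Some(&b'+') | Some(&b'-')) {
        i += 1;
    }

    // Integer digits, then a mandatory '.', then fraction digits.
    let int_start = i;
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        i += 1;
    }
    if i == int_start || bytes.get(i) != Some(&b'.') {
        return None;
    }
    i += 1;
    let frac_start = i;
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        i += 1;
    }
    if i == frac_start {
        return None;
    }

    // Optional exponent, consumed only if it contains at least one digit.
    if matches!(bytes.get(i), Some(&b'e') | Some(&b'E')) {
        let mut j = i + 1;
        if matches!(bytes.get(j), Some(&b'+') | Some(&b'-')) {
            j += 1;
        }
        let exp_start = j;
        while j < bytes.len() && bytes[j].is_ascii_digit() {
            j += 1;
        }
        if j > exp_start {
            i = j;
        }
    }

    // Every consumed byte is ASCII, so `i` is a valid split point.
    let (literal, rest) = input.split_at(i);
    literal.parse::<f64>().ok().map(|value| (value, rest))
}
```

Because the function returns `None` for inputs such as "3", trying it before an integer parser does not steal integer literals, which is the reason the float branch can safely come first in the alt chain.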

Value Enum and Type Representation#

Value::Float Variant#

The Value enum has been extended with a new variant:

pub enum Value {
    Uint(u64),
    Int(i64),
    Bytes(Vec<u8>),
    String(String),
    Float(f64), // New variant
}

Both 32-bit Float and 64-bit Double types use Value::Float(f64) for unified representation. When a 32-bit float is read from a buffer, the f32 value is promoted to f64 for storage. The promotion loses no precision, because every f32 value is exactly representable as an f64.
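The exactness of the f32 to f64 promotion can be checked directly. The sketch below uses two hypothetical helpers, `promote` and `round_trips`, to show that widening and then narrowing returns the original bits.

```rust
/// Widen an f32 to f64, as a reader would before storing the value.
fn promote(value: f32) -> f64 {
    f64::from(value)
}

/// True when narrowing the promoted value gives back the original bit pattern.
fn round_trips(value: f32) -> bool {
    (promote(value) as f32).to_bits() == value.to_bits()
}
```

One consequence worth keeping in mind: exact promotion does not make the promoted value equal to the f64 literal with the same decimal spelling. 0.1 stored as an f32 widens to roughly 0.100000001490116, which differs from the f64 literal 0.1 by about 1.5e-9, far more than f64::EPSILON. Values such as 0.5 that are exactly representable in both widths compare equal after promotion.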

Critical: Removal of Eq Derive#

The Value enum previously derived Eq:

#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub enum Value {
    // ...
}

The Eq derive has been removed with the addition of Value::Float(f64) because IEEE 754 floating-point semantics violate Rust's Eq trait requirements:

NaN inequality: IEEE 754 specifies that NaN != NaN, violating reflexivity (the Eq trait requires a == a for all values)
Trait contract: Rust's Eq trait is a marker trait indicating reflexive, symmetric, and transitive equality; NaN breaks reflexivity
Type safety: f32 and f64 implement PartialEq but not Eq, so #[derive(Eq)] fails to compile on any type that contains them

The Value enum now retains PartialEq for comparison operators, but Eq has been removed:

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] // Eq removed
pub enum Value {
    // ...
}

This breaking change cascades to any struct that contains a Value (such as MagicRule) and previously derived Eq. These structs must either:

Remove their own Eq derive and keep PartialEq
Implement custom equality that explicitly handles the NaN case
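Both options can be sketched with a reduced copy of the enum. In the example below, `Rule` is a hypothetical stand-in for a struct such as MagicRule, and `same_value` shows one possible custom policy (treating NaN as equal to NaN); neither is taken from the libmagic-rs source.

```rust
// Adding `Eq` to this derive list would fail to compile, because f64 is not Eq.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Uint(u64),
    Float(f64),
}

/// Option one: a containing struct keeps `PartialEq` and drops `Eq`.
#[derive(Debug, Clone, PartialEq)]
struct Rule {
    expected: Value,
}

/// Option two: custom equality that handles NaN explicitly.
/// Here two NaN values are considered the same; other policies are possible.
fn same_value(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::Float(x), Value::Float(y)) => (x.is_nan() && y.is_nan()) || x == y,
        _ => a == b,
    }
}
```

With the derived PartialEq, a `Rule` holding `Value::Float(f64::NAN)` does not compare equal to its own clone, which is exactly the reflexivity failure that rules out Eq.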

Evaluator Buffer Reading Logic#

read_float and read_double Implementation#

Float and double types are implemented in src/evaluator/types/float.rs following the pattern established for integer types in src/evaluator/types/numeric.rs:

pub fn read_float(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
) -> Result<Value, TypeReadError> {
    // 1. Bounds check with checked arithmetic
    let end = offset.checked_add(4).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;

    // 2. Extract 4-byte slice
    let bytes = buffer.get(offset..end).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;

    // 3. Read as u32 with appropriate endianness using byteorder crate
    let bits = match endian {
        Endianness::Little => LittleEndian::read_u32(bytes),
        Endianness::Big => BigEndian::read_u32(bytes),
        Endianness::Native => NativeEndian::read_u32(bytes),
    };

    // 4. Interpret bit pattern as f32, promote to f64
    let value = f32::from_bits(bits);
    Ok(Value::Float(f64::from(value)))
}

pub fn read_double(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
) -> Result<Value, TypeReadError> {
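    // Same bounds-checking pattern as read_float, with an 8-byte window
    let end = offset.checked_add(8).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;
    let bytes = buffer.get(offset..end).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;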

    let bits = match endian {
        Endianness::Little => LittleEndian::read_u64(bytes),
        Endianness::Big => BigEndian::read_u64(bytes),
        Endianness::Native => NativeEndian::read_u64(bytes),
    };

    let value = f64::from_bits(bits);
    Ok(Value::Float(value))
}

Both functions use the byteorder crate (already a project dependency) for endianness conversion, following the pattern in read_long and read_quad. As shown, they read the raw bit pattern with read_u32 or read_u64 and reinterpret it with f32::from_bits or f64::from_bits. The read_f32 and read_f64 methods that byteorder provides on LittleEndian, BigEndian, and NativeEndian perform the same conversion in a single call.
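For readers who want to experiment without the byteorder dependency, the same read can be sketched with the standard library alone. `read_f32_at` and `ReadError` are hypothetical names for this example; the bounds-checking steps follow the ones listed in read_float.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, PartialEq)]
enum ReadError {
    BufferOverrun { offset: usize, buffer_len: usize },
}

/// Read a 32-bit float at `offset` and promote it to f64.
fn read_f32_at(buffer: &[u8], offset: usize, endian: Endianness) -> Result<f64, ReadError> {
    let overrun = ReadError::BufferOverrun { offset, buffer_len: buffer.len() };

    // Checked addition guards against `offset + 4` wrapping around.
    let end = match offset.checked_add(4) {
        Some(end) => end,
        None => return Err(overrun),
    };
    // `get` returns None instead of panicking when the range is out of bounds.
    let slice = match buffer.get(offset..end) {
        Some(slice) => slice,
        None => return Err(overrun),
    };

    // The slice is exactly four bytes long, so the copy cannot panic.
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(slice);

    let value = match endian {
        Endianness::Little => f32::from_le_bytes(bytes),
        Endianness::Big => f32::from_be_bytes(bytes),
        Endianness::Native => f32::from_ne_bytes(bytes),
    };
    Ok(f64::from(value))
}
```

A read that starts too close to the end of the buffer returns `BufferOverrun` rather than a partial value, so a rule with an out-of-range offset fails cleanly instead of panicking.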

Dispatcher Integration#

The read_typed_value dispatcher in src/evaluator/types/mod.rs includes Float and Double cases:

pub fn read_typed_value(
    buffer: &[u8],
    offset: usize,
    type_kind: &TypeKind,
) -> Result<Value, TypeReadError> {
    match type_kind {
        // ... existing variants
        TypeKind::Float { endian } => read_float(buffer, offset, *endian),
        TypeKind::Double { endian } => read_double(buffer, offset, *endian),
    }
}

Note that Float and Double patterns do not destructure a signed field, unlike integer type patterns.

Comparison Operations and NaN Handling#

IEEE 754 Comparison Semantics#

Ordering operators in src/evaluator/operators/comparison.rs follow IEEE 754 semantics through f64::partial_cmp, while equality is handled separately in src/evaluator/operators/equality.rs:

Equality (==): Uses epsilon-aware comparison (|a - b| <= f64::EPSILON) with explicit NaN and infinity handling in src/evaluator/operators/equality.rs
Inequality (!=): Negates epsilon-aware equality; NaN != NaN returns true by IEEE 754 specification
Ordering (<, >, <=, >=): Any comparison with NaN returns None (propagates as false)
Bitwise operations (&, ^, ~): Not meaningful for floating-point; return error

Implementation using f64::partial_cmp:

pub fn compare_values(left: &Value, right: &Value) -> Option<Ordering> {
    match (left, right) {
        // ... existing integer and string patterns
        (Value::Float(a), Value::Float(b)) => a.partial_cmp(b),
        _ => None,
    }
}

The partial_cmp method returns None when either operand is NaN, which correctly propagates to comparison operators as "false".
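The equality and ordering rules described in this section can be sketched as two small functions. `float_eq` and `float_lt` are hypothetical names, and the exact treatment of infinities in the project's equality.rs is not shown in this article; comparing infinities exactly is an assumption made here because subtracting two infinities of the same sign yields NaN.

```rust
use std::cmp::Ordering;

/// Epsilon-aware equality with explicit NaN and infinity handling.
fn float_eq(a: f64, b: f64) -> bool {
    if a.is_nan() || b.is_nan() {
        // NaN never equals anything, including itself.
        false
    } else if a.is_infinite() || b.is_infinite() {
        // Assumed policy: infinities must match exactly (same sign).
        a == b
    } else {
        (a - b).abs() <= f64::EPSILON
    }
}

/// Less-than built on partial_cmp: a NaN operand yields None, hence false.
fn float_lt(a: f64, b: f64) -> bool {
    a.partial_cmp(&b) == Some(Ordering::Less)
}
```

Note that f64::EPSILON is an absolute tolerance of about 2.2e-16, so it absorbs rounding noise for values near 1.0 but behaves like exact comparison for large magnitudes.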

Cross-Type Comparison Policy#

The existing comparison logic uses i128 coercion for cross-type integer comparisons. Floating-point values do not participate in cross-type coercion:

Value::Float compared with Value::Int or Value::Uint → returns None (type mismatch)
Value::Float compared with Value::Float → uses IEEE 754 semantics via partial_cmp
Value::Float compared with Value::String or Value::Bytes → returns None (type mismatch)

This prevents ambiguous implicit conversions and maintains type safety.
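The cross-type policy can be sketched with a reduced Value enum. The example below assumes a hypothetical helper `as_i128` for the integer coercion and omits the String and Bytes variants for brevity; it illustrates the policy and is not the project's compare_values.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Uint(u64),
    Int(i64),
    Float(f64),
}

/// Widen an integer variant to i128 so signed and unsigned values compare safely.
fn as_i128(value: &Value) -> Option<i128> {
    match value {
        Value::Uint(u) => Some(i128::from(*u)),
        Value::Int(i) => Some(i128::from(*i)),
        Value::Float(_) => None,
    }
}

fn compare_values(left: &Value, right: &Value) -> Option<Ordering> {
    match (left, right) {
        // Float against Float: IEEE 754 ordering, None if either is NaN.
        (Value::Float(a), Value::Float(b)) => a.partial_cmp(b),
        // Float against anything else: type mismatch, no implicit conversion.
        (Value::Float(_), _) | (_, Value::Float(_)) => None,
        // Integer against integer: compare through i128.
        (a, b) => Some(as_i128(a)?.cmp(&as_i128(b)?)),
    }
}
```

Returning `None` for Float against Int means a rule such as a float type compared with an integer literal never matches, which is consistent with the requirement that float literals carry a decimal point.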

Strength Calculation#

The strength scoring system in src/evaluator/strength.rs assigns confidence points based on type specificity. Float and Double types are scored similarly to their integer counterparts:

strength += match &rule.typ {
    TypeKind::String { max_length } => {
        let base = 20;
        if max_length.is_some() { base + 5 } else { base }
    },
    TypeKind::Quad { .. } => 16,
    TypeKind::Double { .. } => 16, // Same as Quad (64-bit specificity)
    TypeKind::Long { .. } => 15,
    TypeKind::Float { .. } => 15, // Same as Long (32-bit specificity)
    TypeKind::Short { .. } => 10,
    TypeKind::Byte { .. } => 5,
};

Rationale:

Double (16 points): Matches Quad scoring to reflect 64-bit data size and equivalent specificity
Float (15 points): Matches Long scoring to reflect 32-bit data size and comparable detection confidence

Build-Time Code Generation#

TypeKind Serialization#

The serialize_type_kind function in src/parser/codegen.rs generates Rust code for TypeKind variants during the build process. Float and Double variants have been added following the pattern for integer types:

TypeKind::Float { endian } => format!(
    "TypeKind::Float {{ endian: {} }}",
    serialize_endianness(*endian)
),
TypeKind::Double { endian } => format!(
    "TypeKind::Double {{ endian: {} }}",
    serialize_endianness(*endian)
),

Note that Float and Double serialization does not include a signed field, unlike Long and Quad variants.
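A self-contained version of these two arms shows what the generated text looks like. The body of `serialize_endianness` is not shown in this article, so the `Endianness::...` strings it returns here are an assumption; the doubled braces in format! are how a literal `{` or `}` is written in a Rust format string.

```rust
#[derive(Debug, Clone, Copy)]
enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, Clone, Copy)]
enum TypeKind {
    Float { endian: Endianness },
    Double { endian: Endianness },
}

/// Assumed output format: the fully qualified variant path.
fn serialize_endianness(endian: Endianness) -> &'static str {
    match endian {
        Endianness::Little => "Endianness::Little",
        Endianness::Big => "Endianness::Big",
        Endianness::Native => "Endianness::Native",
    }
}

/// Emit Rust source text that reconstructs the given variant.
fn serialize_type_kind(kind: &TypeKind) -> String {
    match kind {
        TypeKind::Float { endian } => format!(
            "TypeKind::Float {{ endian: {} }}",
            serialize_endianness(*endian)
        ),
        TypeKind::Double { endian } => format!(
            "TypeKind::Double {{ endian: {} }}",
            serialize_endianness(*endian)
        ),
    }
}
```

The returned string is valid Rust source for the same variant, which is what allows it to be embedded into generated code at build time.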

Critical Synchronization Requirement#

The serialize_type_kind function appears in two locations that have been synchronized:

src/parser/codegen.rs (used for runtime code generation)
src/build_helpers.rs (used by build.rs for compile-time embedding)

Both functions have been updated identically to support Float and Double variants. This dual-location pattern is a known architectural constraint in libmagic-rs.

Property-Based Testing#

The arb_type_kind strategy in tests/property_tests.rs generates TypeKind variants for fuzzing. Float and Double generators have been added:

fn arb_type_kind() -> impl Strategy<Value = TypeKind> {
    prop_oneof![
        // ... existing generators
        arb_endianness().prop_map(|endian| TypeKind::Float { endian }),
        arb_endianness().prop_map(|endian| TypeKind::Double { endian }),
    ]
}

These generators only vary endian (no signed field), ensuring exhaustive coverage of all three endianness variants (Little, Big, Native) for both Float and Double types.

Complete Implementation Checklist#

Float and Double types required synchronized updates across multiple files due to Rust's exhaustive pattern matching. The complete implementation:

1. AST Definition#

File: src/parser/ast.rs
Change: ✅ Added Float { endian: Endianness } and Double { endian: Endianness } to TypeKind enum
Change: ✅ Added Float(f64) to Value enum
Critical: ✅ Removed Eq derive from Value enum

2. Parser Grammar#

File: src/parser/types.rs
Change: ✅ Added float/double keyword groups to parse_type_keyword alt combinator (lines 43-78)
Change: ✅ Added six keyword-to-TypeKind mappings in type_keyword_to_kind (lines 112-201)
Ordering: ✅ Longest prefixes first (bedouble before double, befloat before float)

3. Type Reading Functions#

File: src/evaluator/types/float.rs (new file)
Change: ✅ Implemented read_float() and read_double() functions
File: src/evaluator/types/mod.rs
Change: ✅ Added Float and Double cases to read_typed_value() dispatcher (lines 64-76)
Change: ✅ Exported read_float and read_double functions

4. Strength Scoring#

File: src/evaluator/strength.rs
Change: ✅ Added TypeKind::Float { .. } => 15 and TypeKind::Double { .. } => 16 cases (lines 72-88)

5. Comparison Operators#

File: src/evaluator/operators/comparison.rs
Change: ✅ Added (Value::Float(a), Value::Float(b)) => a.partial_cmp(b) case to compare_values (lines 29-39)
File: src/evaluator/operators/equality.rs
Change: ✅ Implemented epsilon-aware equality with explicit NaN/infinity handling

6. Build-Time Serialization (Dual Locations)#

File: src/parser/codegen.rs
Change: ✅ Added Float and Double serialization cases to serialize_type_kind (lines 172-196)
File: src/build_helpers.rs
Critical: ✅ Updated identical serialize_type_kind function synchronously

7. Property Tests#

File: tests/property_tests.rs
Change: ✅ Added Float and Double generators to arb_type_kind() strategy (lines 37-50)

8. Output Formatting#

File: src/output/json.rs
Change: ✅ Added Value::Float(f) case to formatting functions (lines 255-293)

9. Grammar Parsing#

File: src/parser/grammar/mod.rs
Change: ✅ Implemented parse_float_value grammar function with decimal point requirement
Ordering: ✅ Placed float literal parser before integer parser in parse_value alt chain

10. Additional Exhaustive Matches#

Various files: ✅ Updated all non-exhaustive TypeKind and Value pattern matches
Test files: ✅ Updated test constructors and fixtures to handle new variants

Parser-Evaluator Architecture#

libmagic-rs uses a three-layer architecture where AST definitions, parser grammar, and evaluator dispatch functions exist in separate files. Each layer requires explicit handling of new enum variants:

AST layer (src/parser/ast.rs): Define TypeKind and Value variants
Parser layer (src/parser/types.rs, src/parser/grammar/mod.rs): Recognize keywords, parse literals, construct variants
Evaluator layer (src/evaluator/types/*.rs): Implement buffer reading logic, return appropriate Value variants

All three layers must remain synchronized for the type system to function correctly.

Exhaustive Match Synchronization Pattern#

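When an enum variant is added, the Rust compiler rejects every match on that enum that neither names the new variant nor has a wildcard arm. For Float and Double this flags:

Missing TypeKind match arms in evaluator functions
Missing Value match arms in comparison and formatting functions
Missing match arms in test helpers that match on TypeKind or Value

Matches that end in a catch-all arm (such as the _ => None arm in compare_values) and property test generators (such as arb_type_kind) still compile without the new variants, so they must be reviewed by hand.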

This compile-time safety is a key design principle in libmagic-rs, ensuring that all code paths are updated when the type system changes.
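The difference between an exhaustive match and one with a wildcard arm can be seen in a small example. The enum and both functions below are hypothetical; `width_wildcard` plays the role of code written before the `Double` variant existed.

```rust
enum TypeKind {
    Byte,
    Long,
    Float,
    Double,
}

/// Exhaustive: deleting any arm here is a compile error (E0004),
/// so adding a variant forces this function to be revisited.
fn width_exhaustive(kind: &TypeKind) -> usize {
    match kind {
        TypeKind::Byte => 1,
        TypeKind::Long => 4,
        TypeKind::Float => 4,
        TypeKind::Double => 8,
    }
}

/// Wildcard: still compiles after `Double` is added, and silently
/// reports the wrong width for it.
fn width_wildcard(kind: &TypeKind) -> usize {
    match kind {
        TypeKind::Byte => 1,
        _ => 4,
    }
}
```

This is why the compile-time guarantee only covers matches without a catch-all arm; code that uses `_` needs to be found and checked separately when a variant is added.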

Related Topics#

Type System and Operator Coverage: Complete documentation of supported types and operators
Enum Extension and Exhaustive Match Synchronization: Checklist pattern for adding enum variants across the codebase
Type Signedness Defaults and Unsigned Type Variants: How signedness cascades through integer types (contrasts with float/double which are always signed)
Nom Alt Combinator Branch Limit and Nesting Pattern: Parser structure for handling multiple type name branches

Implementation Status#

Status: ✅ Implemented in v0.1.0 (PR #162)

Float and Double type support has been fully implemented and merged. The implementation includes:

✅ TypeKind::Float and TypeKind::Double variants with Endianness field
✅ Value::Float(f64) variant (Value no longer derives Eq)
✅ Six type keywords: float, double, befloat, bedouble, lefloat, ledouble
✅ parse_float_value grammar for parsing float literals with mandatory decimal point
✅ read_float and read_double evaluator functions in src/evaluator/types/float.rs
✅ Epsilon-aware equality (f64::EPSILON) with NaN/infinity handling in src/evaluator/operators/equality.rs
✅ IEEE 754 partial_cmp for comparison operators in src/evaluator/operators/comparison.rs
✅ Strength scoring for Float (15 points) and Double (16 points) types
✅ JSON output support for float values
✅ Exhaustive match updates across all affected files (ast.rs, types.rs, mod.rs, codegen.rs, build_helpers.rs, strength.rs, comparison.rs, equality.rs, json.rs)
✅ Property-based test generators in tests/property_tests.rs
✅ Comprehensive unit tests for float reading, endianness, equality semantics, NaN/infinity edge cases

The implementation follows the architectural patterns established for integer types (Long, Quad) and maintains exhaustive pattern matching across 10+ files. All 150 tests pass with zero clippy warnings.

Relevant Code Files#

File	Purpose	Implementation Status
`src/parser/ast.rs`	AST type definitions	✅ Float/Double TypeKind variants, Float Value variant, Eq derive removed
`src/parser/types.rs`	Type keyword parsing	✅ 6 float/double keyword mappings
`src/parser/grammar/mod.rs`	Grammar and literal parsing	✅ parse_float_value with decimal point requirement
`src/evaluator/types/mod.rs`	Type reading dispatcher	✅ Float/Double dispatch cases
`src/evaluator/types/float.rs`	Float type readers	✅ read_float and read_double implementations
`src/evaluator/strength.rs`	Strength scoring	✅ Float (15) and Double (16) scores
`src/evaluator/operators/comparison.rs`	Comparison operators	✅ Float comparison with partial_cmp
`src/evaluator/operators/equality.rs`	Equality operators	✅ Epsilon-aware equality with NaN/infinity handling
`src/parser/codegen.rs`	Code generation	✅ Float/Double serialization
`src/build_helpers.rs`	Build-time code gen	✅ Float/Double serialization (synced with codegen.rs)
`src/output/json.rs`	JSON output formatting	✅ Value::Float formatting
`tests/property_tests.rs`	Property-based tests	✅ Float/Double generators
`tests/evaluator_tests.rs`	Integration tests	✅ Float/double rule evaluation tests
`src/parser/grammar/tests.rs`	Parser tests	✅ Float/double parsing tests
