Float and Double Type Support#
Lead Section#
Float and Double type support is an implemented feature (as of PR #162) for libmagic-rs that enables detection and evaluation of IEEE 754 floating-point data in binary files. The implementation adds six magic file keywords (float, befloat, lefloat, double, bedouble, ledouble) corresponding to 32-bit and 64-bit floating-point types with configurable endianness.
The implementation extends the TypeKind enum in src/parser/ast.rs with Float and Double variants, adds a Value::Float(f64) variant to the Value enum, and implements buffer reading logic for IEEE 754 binary representation. Critically, the Eq derive has been removed from the Value enum due to IEEE 754 NaN semantics, which violate Rust's Eq trait requirements (NaN != NaN by specification).
This article documents the complete architecture for float/double support, including AST structure, parser grammar, evaluator logic, strength calculation, comparison operations, and the cascading code changes across 10+ files required to maintain exhaustive pattern matching invariants.
IEEE 754 Type Architecture#
TypeKind Enum Extensions#
The TypeKind enum has been extended with two floating-point variants following the same pattern as multi-byte integer types (Short, Long, Quad):
pub enum TypeKind {
    // ... existing variants
    Float { endian: Endianness },  // 32-bit IEEE 754 single-precision
    Double { endian: Endianness }, // 64-bit IEEE 754 double-precision
}
Unlike integer types, Float and Double do not include a signed field because IEEE 754 represents the sign as a separate bit in the floating-point format, making all floating-point values inherently signed. This contrasts with integer types like Long and Quad, which have both endian: Endianness and signed: bool fields.
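As a small illustration of why no signed flag is needed, the sketch below inspects the sign bit of a 32-bit pattern directly. The helper names sign_bit_set and negate_bits are hypothetical and are not part of libmagic-rs.

```rust
/// Returns true when the IEEE 754 sign bit (bit 31) is set in a 32-bit pattern.
/// Hypothetical helper, for illustration only.
fn sign_bit_set(bits: u32) -> bool {
    bits >> 31 == 1
}

/// Flips only the sign bit, leaving the exponent and mantissa bits untouched.
/// Hypothetical helper, for illustration only.
fn negate_bits(bits: u32) -> u32 {
    bits ^ 0x8000_0000
}
```

Because the sign lives inside the bit pattern, reading the same four bytes always yields a signed value; there is no separate unsigned interpretation to select.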
Endianness Support#
The Endianness enum provides three byte order options that apply to both Float and Double types:
- Little: Least significant byte first (little-endian)
- Big: Most significant byte first (big-endian)
- Native: System-dependent byte order (matches target architecture)
These correspond to the six magic file keywords:
| Keyword | Type | Endianness | TypeKind Variant |
|---|---|---|---|
| float | 32-bit | Native | TypeKind::Float { endian: Endianness::Native } |
| lefloat | 32-bit | Little | TypeKind::Float { endian: Endianness::Little } |
| befloat | 32-bit | Big | TypeKind::Float { endian: Endianness::Big } |
| double | 64-bit | Native | TypeKind::Double { endian: Endianness::Native } |
| ledouble | 64-bit | Little | TypeKind::Double { endian: Endianness::Little } |
| bedouble | 64-bit | Big | TypeKind::Double { endian: Endianness::Big } |
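The Native rows mean that a plain float or double rule depends on the machine evaluating it. A minimal sketch of how Native could be resolved to a concrete byte order is shown here; the resolve function and the local Endianness definition are assumptions for illustration, and the source does not say libmagic-rs contains such a helper.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endianness {
    Little,
    Big,
    Native,
}

/// Resolves `Native` to the byte order of the compilation target.
/// Hypothetical helper; concrete orders are returned unchanged.
fn resolve(endian: Endianness) -> Endianness {
    match endian {
        Endianness::Native => {
            if cfg!(target_endian = "little") {
                Endianness::Little
            } else {
                Endianness::Big
            }
        }
        concrete => concrete,
    }
}
```

Rules that must behave identically on every host are better written with the explicit lefloat, befloat, ledouble, or bedouble keywords.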
Parser Grammar Implementation#
Type Keyword Parsing#
Type keyword parsing in src/parser/types.rs uses nom's alt() combinator to recognize type names. Following the pattern for integer types, float and double keywords must be organized by bit width with longest prefixes first to prevent ambiguous matches:
// 64-bit floating-point types
alt((
    tag("bedouble"),
    tag("ledouble"),
    tag("double"),
)),
// 32-bit floating-point types
alt((
    tag("befloat"),
    tag("lefloat"),
    tag("float"),
)),
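None of these six keywords is a prefix of another, so the order inside each group does not change which branch matches here. Listing the endian-prefixed forms first keeps the groups consistent with the integer keywords and follows the longest-first rule, which matters whenever one keyword is a prefix of another.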
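For readers without nom at hand, the following std-only sketch recognizes a float or double keyword at the start of the input and returns the remaining input together with the matching variant. The FLOAT_KEYWORDS table, the parse_float_keyword function, and the local enum definitions are hypothetical stand-ins, not the project's parser.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum TypeKind {
    Float { endian: Endianness },
    Double { endian: Endianness },
}

/// Keywords are tried in table order, endian-prefixed forms first.
const FLOAT_KEYWORDS: [(&str, TypeKind); 6] = [
    ("bedouble", TypeKind::Double { endian: Endianness::Big }),
    ("ledouble", TypeKind::Double { endian: Endianness::Little }),
    ("double", TypeKind::Double { endian: Endianness::Native }),
    ("befloat", TypeKind::Float { endian: Endianness::Big }),
    ("lefloat", TypeKind::Float { endian: Endianness::Little }),
    ("float", TypeKind::Float { endian: Endianness::Native }),
];

/// Returns the unconsumed input and the TypeKind of the first keyword that
/// prefixes `input`, or None when no float/double keyword matches.
fn parse_float_keyword(input: &str) -> Option<(&str, TypeKind)> {
    FLOAT_KEYWORDS
        .iter()
        .find_map(|(keyword, kind)| input.strip_prefix(*keyword).map(|rest| (rest, *kind)))
}
```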
Type Name to TypeKind Mapping#
The type_keyword_to_kind function maps string keywords to TypeKind enum variants. Following the pattern for integer types:
"float" => TypeKind::Float { endian: Endianness::Native },
"lefloat" => TypeKind::Float { endian: Endianness::Little },
"befloat" => TypeKind::Float { endian: Endianness::Big },
"double" => TypeKind::Double { endian: Endianness::Native },
"ledouble" => TypeKind::Double { endian: Endianness::Little },
"bedouble" => TypeKind::Double { endian: Endianness::Big },
Float Literal Parsing Requirements#
Magic file format requires decimal points in floating-point literals to disambiguate them from integer values. Examples:
- Valid: 3.14, 1.0, 0.5, 2.718, 2.5e10 (scientific notation)
- Invalid: 3, 14, 1 (parsed as integers, not floats)
The parser must use nom's recognize combinator with a mandatory decimal point pattern to distinguish float literals from integers, and float parsing must be ordered before integer parsing in the parse_value alt chain to ensure proper matching.
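The mandatory-decimal-point rule can be sketched without nom as follows. This is a std-only stand-in for illustration; the function name parse_float_literal and the exact grammar it accepts (optional sign, at least one digit on each side of the point, optional exponent) are assumptions, not the grammar of parse_float_value.

```rust
/// Parses a float literal that must contain a decimal point, returning the
/// value and the unconsumed remainder. Hypothetical std-only sketch.
fn parse_float_literal(input: &str) -> Option<(f64, &str)> {
    let bytes = input.as_bytes();
    // Number of consecutive ASCII digits starting at index `from`.
    let digits = |from: usize| {
        bytes[from..].iter().take_while(|b| b.is_ascii_digit()).count()
    };
    let is_sign = |at: usize| matches!(bytes.get(at), Some(b'+') | Some(b'-'));

    // Optional sign, then an integer part of at least one digit.
    let mut pos = if is_sign(0) { 1 } else { 0 };
    let int_len = digits(pos);
    if int_len == 0 {
        return None;
    }
    pos += int_len;

    // Mandatory decimal point followed by at least one digit.
    if bytes.get(pos) != Some(&b'.') {
        return None;
    }
    let frac_len = digits(pos + 1);
    if frac_len == 0 {
        return None;
    }
    pos += 1 + frac_len;

    // Optional exponent; consumed only when it carries at least one digit.
    if matches!(bytes.get(pos), Some(b'e') | Some(b'E')) {
        let exp_start = if is_sign(pos + 1) { pos + 2 } else { pos + 1 };
        let exp_len = digits(exp_start);
        if exp_len > 0 {
            pos = exp_start + exp_len;
        }
    }

    let value: f64 = input[..pos].parse().ok()?;
    Some((value, &input[pos..]))
}
```

Trying this function before an integer parser mirrors the ordering requirement: an integer parser run first would consume the 3 of 3.14 and leave .14 behind.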
Value Enum and Type Representation#
Value::Float Variant#
The Value enum has been extended with a new variant:
pub enum Value {
    Uint(u64),
    Int(i64),
    Bytes(Vec<u8>),
    String(String),
    Float(f64), // New variant
}
Both 32-bit Float and 64-bit Double types use Value::Float(f64) for unified representation. When reading 32-bit floats from buffers, the f32 value is promoted to f64 for storage. No precision is lost in this promotion, because every f32 value is exactly representable as an f64.
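The exactness of the f32 to f64 promotion, and one consequence of it, can be checked with a few lines. The promote function is a hypothetical wrapper around f64::from used only for this illustration.

```rust
/// Widens an f32 to f64. The conversion is exact: narrowing the result back
/// to f32 returns the original value.
fn promote(x: f32) -> f64 {
    f64::from(x)
}
```

Exact promotion does not mean the promoted value equals the nearest f64 to the same decimal text: promote(0.1f32) differs from 0.1f64 by roughly 1.5e-9, because 0.1 is rounded differently at each width. If rule literals are parsed as f64, a 32-bit field therefore only compares equal to literals that are exactly representable as f32.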
Critical: Removal of Eq Derive#
The Value enum previously derived Eq:
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub enum Value {
    // ...
}
The Eq derive has been removed with the addition of Value::Float(f64) because IEEE 754 floating-point semantics violate Rust's Eq trait requirements:
- NaN inequality: IEEE 754 specifies that NaN != NaN, violating reflexivity (the Eq trait requires a == a for all values)
- Trait contract: Rust's Eq trait is a marker trait indicating reflexive, symmetric, and transitive equality; NaN breaks reflexivity
- Type safety: f32 and f64 do not implement Eq, so deriving Eq on a type that contains either fails to compile
The Value enum now retains PartialEq for comparison operators, but Eq has been removed:
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] // Eq removed
pub enum Value {
    // ...
}
This breaking change cascades to any struct that contains a Value (such as MagicRule) and previously relied on automatic Eq derivation. These structs must either:
- Remove their own Eq derive and keep PartialEq
- Implement custom equality that explicitly handles the NaN case
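The reflexivity failure is easy to reproduce. The sketch below uses a reduced two-variant Value enum (an assumption for brevity, without the serde derives) and a hypothetical is_reflexive helper.

```rust
// Adding Eq to this derive list would not compile, because f64 is not Eq.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Uint(u64),
    Float(f64),
}

/// Checks a == a. This holds for every value of an Eq type, but not for NaN.
fn is_reflexive(v: &Value) -> bool {
    v == v
}
```

One practical consequence is that a Value can no longer serve as a HashMap or HashSet key through derived traits, since those collections require Eq.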
Evaluator Buffer Reading Logic#
read_float and read_double Implementation#
Float and double types are implemented in src/evaluator/types/float.rs following the pattern established for integer types in src/evaluator/types/numeric.rs:
use byteorder::{BigEndian, ByteOrder, LittleEndian, NativeEndian};

pub fn read_float(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
) -> Result<Value, TypeReadError> {
    // 1. Bounds check with checked arithmetic (offset + 4 must not overflow)
    let end = offset.checked_add(4).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;
    // 2. Extract 4-byte slice; fails if the buffer is too short
    let bytes = buffer.get(offset..end).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;
    // 3. Read as u32 with appropriate endianness using byteorder crate
    let bits = match endian {
        Endianness::Little => LittleEndian::read_u32(bytes),
        Endianness::Big => BigEndian::read_u32(bytes),
        Endianness::Native => NativeEndian::read_u32(bytes),
    };
    // 4. Interpret bit pattern as f32, promote to f64
    let value = f32::from_bits(bits);
    Ok(Value::Float(f64::from(value)))
}

pub fn read_double(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
) -> Result<Value, TypeReadError> {
    // Same pattern as read_float, with 8 bytes
    let end = offset.checked_add(8).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;
    let bytes = buffer.get(offset..end).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;
    let bits = match endian {
        Endianness::Little => LittleEndian::read_u64(bytes),
        Endianness::Big => BigEndian::read_u64(bytes),
        Endianness::Native => NativeEndian::read_u64(bytes),
    };
    // Interpret bit pattern as f64; no promotion needed
    let value = f64::from_bits(bits);
    Ok(Value::Float(value))
}
Both functions use the byteorder crate (already a project dependency) for endianness conversion, following the pattern in read_long and read_quad. The listing above reads the raw bit pattern with read_u32 or read_u64 and reinterprets it with f32::from_bits or f64::from_bits. The read_f32 and read_f64 methods that byteorder provides on LittleEndian, BigEndian, and NativeEndian produce the same IEEE 754 value in a single call.
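The same bounds-checked read can be tried without any external crate. The sketch below swaps byteorder for the standard library's f32::from_le_bytes, from_be_bytes, and from_ne_bytes; the name read_float_std, the ReadError type, and the bare f64 return value are assumptions made to keep the example self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, PartialEq)]
enum ReadError {
    BufferOverrun { offset: usize, buffer_len: usize },
}

/// Reads a 32-bit IEEE 754 value at `offset` and widens it to f64.
/// Hypothetical std-only counterpart of read_float.
fn read_float_std(buffer: &[u8], offset: usize, endian: Endianness) -> Result<f64, ReadError> {
    // checked_add guards against offset + 4 overflowing usize;
    // get() returns None when the range extends past the buffer.
    let slice = offset
        .checked_add(4)
        .and_then(|end| buffer.get(offset..end))
        .ok_or(ReadError::BufferOverrun { offset, buffer_len: buffer.len() })?;

    // Copy into a fixed-size array, as required by the from_*_bytes functions.
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(slice);

    let value = match endian {
        Endianness::Little => f32::from_le_bytes(bytes),
        Endianness::Big => f32::from_be_bytes(bytes),
        Endianness::Native => f32::from_ne_bytes(bytes),
    };
    Ok(f64::from(value))
}
```

The bytes 00 00 80 3F decode to 1.0 as little-endian, and 3F 80 00 00 decode to 1.0 as big-endian, which makes a convenient smoke test for either implementation.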
Dispatcher Integration#
The read_typed_value dispatcher in src/evaluator/types/mod.rs includes Float and Double cases:
pub fn read_typed_value(
    buffer: &[u8],
    offset: usize,
    type_kind: &TypeKind,
) -> Result<Value, TypeReadError> {
    match type_kind {
        // ... existing variants
        TypeKind::Float { endian } => read_float(buffer, offset, *endian),
        TypeKind::Double { endian } => read_double(buffer, offset, *endian),
    }
}
Note that Float and Double patterns do not destructure a signed field, unlike integer type patterns.
Comparison Operations and NaN Handling#
IEEE 754 Comparison Semantics#
Comparison operators in src/evaluator/operators/comparison.rs implement IEEE 754 semantics for floating-point values using f64::partial_cmp:
- Equality (==): Uses epsilon-aware comparison (|a - b| <= f64::EPSILON) with explicit NaN and infinity handling in src/evaluator/operators/equality.rs
- Inequality (!=): Negates epsilon-aware equality; NaN != NaN returns true by IEEE 754 specification
- Ordering (<, >, <=, >=): Any comparison with NaN returns None (propagates as false)
- Bitwise operations (&, ^, ~): Not meaningful for floating-point; return error
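One plausible shape for the epsilon-aware equality described in the list is sketched here. The source does not show the code in equality.rs, so the function names float_eq and float_ne and the exact treatment of infinities are assumptions.

```rust
/// Epsilon-aware equality sketch. NaN never equals anything. Infinities are
/// compared exactly, because inf - inf is NaN and would defeat the epsilon test.
fn float_eq(a: f64, b: f64) -> bool {
    if a.is_nan() || b.is_nan() {
        return false;
    }
    if a.is_infinite() || b.is_infinite() {
        return a == b;
    }
    (a - b).abs() <= f64::EPSILON
}

/// Inequality is the negation of float_eq, so NaN != NaN is true.
fn float_ne(a: f64, b: f64) -> bool {
    !float_eq(a, b)
}
```

An absolute tolerance of f64::EPSILON is only forgiving for values of magnitude around 1. For large magnitudes, adjacent f64 values are further apart than f64::EPSILON, so the test behaves like exact equality there.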
Implementation using f64::partial_cmp:
pub fn compare_values(left: &Value, right: &Value) -> Option<Ordering> {
    match (left, right) {
        // ... existing integer and string patterns
        (Value::Float(a), Value::Float(b)) => a.partial_cmp(b),
        _ => None,
    }
}
The partial_cmp method returns None when either operand is NaN, which correctly propagates to comparison operators as "false".
Cross-Type Comparison Policy#
The existing comparison logic uses i128 coercion for cross-type integer comparisons. Floating-point values do not participate in cross-type coercion:
- Value::Float compared with Value::Int or Value::Uint → returns None (type mismatch)
- Value::Float compared with Value::Float → uses IEEE 754 semantics via partial_cmp
- Value::Float compared with Value::String or Value::Bytes → returns None (type mismatch)
This prevents ambiguous implicit conversions and maintains type safety.
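The policy can be exercised with a reduced model. The sketch below keeps only the numeric Value variants, widens integers to i128 as described, and adds a hypothetical less_than helper to show how a None ordering turns into false; none of this is copied from comparison.rs.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Uint(u64),
    Int(i64),
    Float(f64),
}

/// Float pairs use partial_cmp; integer pairs are widened to i128;
/// any float/integer mix yields None.
fn compare_values(left: &Value, right: &Value) -> Option<Ordering> {
    let as_i128 = |v: &Value| match v {
        Value::Uint(u) => Some(i128::from(*u)),
        Value::Int(i) => Some(i128::from(*i)),
        Value::Float(_) => None,
    };
    match (left, right) {
        (Value::Float(a), Value::Float(b)) => a.partial_cmp(b),
        _ => Some(as_i128(left)?.cmp(&as_i128(right)?)),
    }
}

/// `<` holds only when an ordering exists and it is Less.
fn less_than(left: &Value, right: &Value) -> bool {
    compare_values(left, right) == Some(Ordering::Less)
}
```

Under this model a rule that compares a float field against an integer literal never matches, which is why the literal parser insists on a decimal point for float values.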
Strength Calculation#
The strength scoring system in src/evaluator/strength.rs assigns confidence points based on type specificity. Float and Double types are scored similarly to their integer counterparts:
strength += match &rule.typ {
    TypeKind::String { max_length } => {
        let base = 20;
        if max_length.is_some() { base + 5 } else { base }
    },
    TypeKind::Quad { .. } => 16,
    TypeKind::Double { .. } => 16, // Same as Quad (64-bit specificity)
    TypeKind::Long { .. } => 15,
    TypeKind::Float { .. } => 15, // Same as Long (32-bit specificity)
    TypeKind::Short { .. } => 10,
    TypeKind::Byte { .. } => 5,
};
Rationale:
- Double (16 points): Matches Quad scoring to reflect 64-bit data size and equivalent specificity
- Float (15 points): Matches Long scoring to reflect 32-bit data size and comparable detection confidence
Build-Time Code Generation#
TypeKind Serialization#
The serialize_type_kind function in src/parser/codegen.rs generates Rust code for TypeKind variants during the build process. Float and Double variants have been added following the pattern for integer types:
TypeKind::Float { endian } => format!(
    "TypeKind::Float {{ endian: {} }}",
    serialize_endianness(*endian)
),
TypeKind::Double { endian } => format!(
    "TypeKind::Double {{ endian: {} }}",
    serialize_endianness(*endian)
),
Note that Float and Double serialization does not include a signed field, unlike Long and Quad variants.
Critical Synchronization Requirement#
The serialize_type_kind function appears in two locations that have been synchronized:
- src/parser/codegen.rs (used for runtime code generation)
- src/build_helpers.rs (used by build.rs for compile-time embedding)
Both functions have been updated identically to support Float and Double variants. This dual-location pattern is a known architectural constraint in libmagic-rs.
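To make the generated text concrete, here is a self-contained sketch of the two serialization arms. The local enum definitions and the strings returned by serialize_endianness are assumptions; the real helper's output format is not shown in the source.

```rust
#[derive(Debug, Clone, Copy)]
enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, Clone)]
enum TypeKind {
    Float { endian: Endianness },
    Double { endian: Endianness },
}

/// Emits the Rust path of an Endianness value (assumed output format).
fn serialize_endianness(endian: Endianness) -> &'static str {
    match endian {
        Endianness::Little => "Endianness::Little",
        Endianness::Big => "Endianness::Big",
        Endianness::Native => "Endianness::Native",
    }
}

/// Emits Rust source text that reconstructs the given TypeKind.
/// Doubled braces in the format string produce literal `{` and `}`.
fn serialize_type_kind(kind: &TypeKind) -> String {
    match kind {
        TypeKind::Float { endian } => {
            format!("TypeKind::Float {{ endian: {} }}", serialize_endianness(*endian))
        }
        TypeKind::Double { endian } => {
            format!("TypeKind::Double {{ endian: {} }}", serialize_endianness(*endian))
        }
    }
}
```

Since the two copies of serialize_type_kind must agree, a test that feeds the same variants to both and compares the resulting strings would be one way to detect drift; whether the project has such a test is not stated in the source.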
Property-Based Testing#
The arb_type_kind strategy in tests/property_tests.rs generates TypeKind variants for fuzzing. Float and Double generators have been added:
fn arb_type_kind() -> impl Strategy<Value = TypeKind> {
    prop_oneof![
        // ... existing generators
        arb_endianness().prop_map(|endian| TypeKind::Float { endian }),
        arb_endianness().prop_map(|endian| TypeKind::Double { endian }),
    ]
}
These generators only vary endian (no signed field), ensuring exhaustive coverage of all three endianness variants (Little, Big, Native) for both Float and Double types.
Complete Implementation Checklist#
Float and Double types required synchronized updates across multiple files due to Rust's exhaustive pattern matching. The complete implementation:
1. AST Definition#
- File: src/parser/ast.rs
- Change: ✅ Added Float { endian: Endianness } and Double { endian: Endianness } to TypeKind enum
- Change: ✅ Added Float(f64) to Value enum
- Critical: ✅ Removed Eq derive from Value enum
2. Parser Grammar#
- File: src/parser/types.rs
- Change: ✅ Added float/double keyword groups to parse_type_keyword alt combinator (lines 43-78)
- Change: ✅ Added six keyword-to-TypeKind mappings in type_keyword_to_kind (lines 112-201)
- Ordering: ✅ Longest prefixes first (bedouble before double, befloat before float)
3. Type Reading Functions#
- File: src/evaluator/types/float.rs (new file)
- Change: ✅ Implemented read_float() and read_double() functions
- File: src/evaluator/types/mod.rs
- Change: ✅ Added Float and Double cases to read_typed_value() dispatcher (lines 64-76)
- Change: ✅ Exported read_float and read_double functions
4. Strength Scoring#
- File: src/evaluator/strength.rs
- Change: ✅ Added TypeKind::Float { .. } => 15 and TypeKind::Double { .. } => 16 cases (lines 72-88)
5. Comparison Operators#
- File: src/evaluator/operators/comparison.rs
- Change: ✅ Added (Value::Float(a), Value::Float(b)) => a.partial_cmp(b) case to compare_values (lines 29-39)
- File: src/evaluator/operators/equality.rs
- Change: ✅ Implemented epsilon-aware equality with explicit NaN/infinity handling
6. Build-Time Serialization (Dual Locations)#
- File: src/parser/codegen.rs
- Change: ✅ Added Float and Double serialization cases to serialize_type_kind (lines 172-196)
- File: src/build_helpers.rs
- Critical: ✅ Updated identical serialize_type_kind function synchronously
7. Property Tests#
- File: tests/property_tests.rs
- Change: ✅ Added Float and Double generators to arb_type_kind() strategy (lines 37-50)
8. Output Formatting#
- File: src/output/json.rs
- Change: ✅ Added Value::Float(f) case to formatting functions (lines 255-293)
9. Grammar Parsing#
- File: src/parser/grammar/mod.rs
- Change: ✅ Implemented parse_float_value grammar function with decimal point requirement
- Ordering: ✅ Placed float literal parser before integer parser in parse_value alt chain
10. Additional Exhaustive Matches#
- Various files: ✅ Updated all non-exhaustive TypeKind and Value pattern matches
- Test files: ✅ Updated test constructors and fixtures to handle new variants
Architectural Context and Related Patterns#
Parser-Evaluator Architecture#
libmagic-rs uses a three-layer architecture where AST definitions, parser grammar, and evaluator dispatch functions exist in separate files. Each layer requires explicit handling of new enum variants:
- AST layer (src/parser/ast.rs): Define TypeKind and Value variants
- Parser layer (src/parser/types.rs, src/parser/grammar/mod.rs): Recognize keywords, parse literals, construct variants
- Evaluator layer (src/evaluator/types/*.rs): Implement buffer reading logic, return appropriate Value variants
All three layers must remain synchronized for the type system to function correctly.
Exhaustive Match Synchronization Pattern#
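Rust's exhaustiveness checking rejects, at compile time, any match without a wildcard arm that fails to cover a newly added enum variant. The compiler will flag:
- Missing TypeKind match arms in evaluator functions
- Missing Value match arms in comparison and formatting functions
- Missing arms in test helpers that match on these enums
Matches that end in a wildcard arm (such as the _ => None arm in compare_values) and generators such as arb_type_kind are not covered by this check, so they must be reviewed by hand when a variant is added.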
This compile-time safety is a key design principle in libmagic-rs, ensuring that all code paths are updated when the type system changes.
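The effect can be seen in a small example. The TypeKind below is a simplified stand-in (Byte and Long carry fewer fields than the variants described in this article), and byte_width is a hypothetical function written without a wildcard arm.

```rust
#[derive(Debug, Clone, Copy)]
enum Endianness {
    Little,
    Big,
    Native,
}

// Simplified stand-in for the real TypeKind enum.
#[derive(Debug)]
enum TypeKind {
    Byte,
    Long { endian: Endianness },
    Float { endian: Endianness },
    Double { endian: Endianness },
}

/// Number of bytes each type reads. There is no `_` arm, so adding a variant
/// to TypeKind makes this function fail to compile until an arm is written.
fn byte_width(kind: &TypeKind) -> usize {
    match kind {
        TypeKind::Byte => 1,
        TypeKind::Long { .. } | TypeKind::Float { .. } => 4,
        TypeKind::Double { .. } => 8,
    }
}
```

Replacing the last arm with a wildcard would still compile after a new variant is added, silently assigning it a width of 8, which is the kind of drift the no-wildcard convention is meant to prevent.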
Related Topics#
- Type System and Operator Coverage: Complete documentation of supported types and operators
- Enum Extension and Exhaustive Match Synchronization: Checklist pattern for adding enum variants across the codebase
- Type Signedness Defaults and Unsigned Type Variants: How signedness cascades through integer types (contrasts with float/double which are always signed)
- Nom Alt Combinator Branch Limit and Nesting Pattern: Parser structure for handling multiple type name branches
Implementation Status#
Status: ✅ Implemented in v0.1.0 (PR #162)
Float and Double type support has been fully implemented and merged. The implementation includes:
- ✅ TypeKind::Float and TypeKind::Double variants with Endianness field
- ✅ Value::Float(f64) variant (Value no longer derives Eq)
- ✅ Six type keywords: float, double, befloat, bedouble, lefloat, ledouble
- ✅ parse_float_value grammar for parsing float literals with mandatory decimal point
- ✅ read_float and read_double evaluator functions in src/evaluator/types/float.rs
- ✅ Epsilon-aware equality (f64::EPSILON) with NaN/infinity handling in src/evaluator/operators/equality.rs
- ✅ IEEE 754 partial_cmp for comparison operators in src/evaluator/operators/comparison.rs
- ✅ Strength scoring for Float (15 points) and Double (16 points) types
- ✅ JSON output support for float values
- ✅ Exhaustive match updates across all affected files (ast.rs, types.rs, mod.rs, codegen.rs, build_helpers.rs, strength.rs, comparison.rs, equality.rs, json.rs)
- ✅ Property-based test generators in tests/property_tests.rs
- ✅ Comprehensive unit tests for float reading, endianness, equality semantics, NaN/infinity edge cases
The implementation follows the architectural patterns established for integer types (Long, Quad) and maintains exhaustive pattern matching across 10+ files. All 150 tests pass with zero clippy warnings.
Relevant Code Files#
| File | Purpose | Implementation Status |
|---|---|---|
| src/parser/ast.rs | AST type definitions | ✅ Float/Double TypeKind variants, Float Value variant, Eq derive removed |
| src/parser/types.rs | Type keyword parsing | ✅ 6 float/double keyword mappings |
| src/parser/grammar/mod.rs | Grammar and literal parsing | ✅ parse_float_value with decimal point requirement |
| src/evaluator/types/mod.rs | Type reading dispatcher | ✅ Float/Double dispatch cases |
| src/evaluator/types/float.rs | Float type readers | ✅ read_float and read_double implementations |
| src/evaluator/strength.rs | Strength scoring | ✅ Float (15) and Double (16) scores |
| src/evaluator/operators/comparison.rs | Comparison operators | ✅ Float comparison with partial_cmp |
| src/evaluator/operators/equality.rs | Equality operators | ✅ Epsilon-aware equality with NaN/infinity handling |
| src/parser/codegen.rs | Code generation | ✅ Float/Double serialization |
| src/build_helpers.rs | Build-time code gen | ✅ Float/Double serialization (synced with codegen.rs) |
| src/output/json.rs | JSON output formatting | ✅ Value::Float formatting |
| tests/property_tests.rs | Property-based tests | ✅ Float/Double generators |
| tests/evaluator_tests.rs | Integration tests | ✅ Float/double rule evaluation tests |
| src/parser/grammar/tests.rs | Parser tests | ✅ Float/double parsing tests |