Float Epsilon Equality Pattern#

Lead Section#

The Float Epsilon Equality Pattern is an architectural pattern for implementing epsilon-aware floating-point equality comparisons with explicit handling of IEEE 754 edge cases (NaN and infinity). This pattern addresses the fundamental challenge that floating-point arithmetic introduces precision errors, making exact equality comparisons unreliable for computed values.

In the libmagic-rs project, this pattern was implemented in v0.1.0 (PR #162) to support float and double magic file types. The pattern enforces a critical semantic distinction: equality operators (=, ==, !=) use epsilon-based comparison to accommodate floating-point precision limitations, while ordering operators (<, >, <=, >=) use strict IEEE 754 semantics to preserve mathematical ordering properties. This split exists because epsilon comparison violates transitivity, which would break ordering relationships.

The pattern's implementation requires explicit handling of special IEEE 754 values: NaN must never equal anything (including itself), infinities are compared exactly so that each equals only an infinity of the same sign, and signed zeros must be treated as equal. For all other finite values, the implementation uses f64::EPSILON as the tolerance threshold.

Background and Motivation#

Floating-Point Precision Challenges#

Floating-point arithmetic in computers follows the IEEE 754 standard, which represents real numbers in binary format with finite precision. This representation inherently introduces rounding errors in many arithmetic operations. For example, the mathematical identity (0.1 + 0.2) == 0.3 fails in most programming languages because 0.1 and 0.2 cannot be represented exactly in binary floating-point.

Direct equality comparison using == for floating-point values is unreliable for computed results. Even mathematically equivalent expressions may produce slightly different bit patterns due to differences in operation order, compiler optimizations, or hardware architectures. This makes naive equality tests brittle and error-prone.
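The rounding problem described above can be reproduced in a few lines of Rust. This is a minimal sketch; the helper names `exact_match` and `abs_difference` are illustrative and do not come from libmagic-rs.

```rust
/// Returns whether `0.1 + 0.2` compares exactly equal to `0.3` in f64 arithmetic.
#[allow(clippy::float_cmp)]
fn exact_match() -> bool {
    0.1_f64 + 0.2_f64 == 0.3_f64
}

/// Returns the absolute rounding error between the computed sum and the literal.
fn abs_difference() -> f64 {
    ((0.1_f64 + 0.2_f64) - 0.3_f64).abs()
}
```

The exact comparison fails, yet the two values differ by less than `f64::EPSILON`, which is the gap an epsilon tolerance is meant to absorb.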

The Epsilon Tolerance Solution#

Epsilon-based equality comparison treats two floating-point values as equal if they differ by no more than a small tolerance threshold called epsilon. The standard approach uses machine epsilon (f32::EPSILON for 32-bit floats, f64::EPSILON for 64-bit doubles), which is the difference between 1.0 and the next larger representable value.

The implementation in libmagic-rs uses the formula (a - b).abs() <= epsilon to determine equality, allowing values within the tolerance threshold to be considered equal while preserving the ability to distinguish values with meaningful differences.

Why Ordering Must Differ from Equality#

A critical architectural decision in this pattern is that ordering operators cannot use epsilon comparison. This is because epsilon-based equality is not transitive: A ≈ B (within epsilon) and B ≈ C (within epsilon) do not guarantee that A ≈ C. For example, with epsilon = 0.01:

1.000 ≈ 1.008 (difference 0.008 <= 0.01)
1.008 ≈ 1.016 (difference 0.008 <= 0.01)
But 1.000 and 1.016 differ by 0.016 > 0.01, so they are not equal

This violation of transitivity would corrupt ordering relationships, making comparison operators unreliable for sorting or range checking. Therefore, ordering operators must use strict IEEE 754 comparison via partial_cmp(), which provides mathematically sound (but partial) ordering semantics.
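The loss of transitivity can be checked directly. The sketch below uses a hypothetical `approx_eq` helper with an explicit tolerance parameter (not part of libmagic-rs) and three evenly spaced values chosen so that each neighbouring pair is within tolerance while the outer pair is not.

```rust
/// Tolerance-based equality on finite values (illustrative helper).
fn approx_eq(a: f64, b: f64, tolerance: f64) -> bool {
    (a - b).abs() <= tolerance
}

/// Returns (a ≈ b, b ≈ c, a ≈ c) for three evenly spaced values.
fn transitivity_probe() -> (bool, bool, bool) {
    let (a, b, c) = (1.000_f64, 1.008_f64, 1.016_f64);
    let tolerance = 0.01;
    (
        approx_eq(a, b, tolerance),
        approx_eq(b, c, tolerance),
        approx_eq(a, c, tolerance),
    )
}
```

A sort or range check built on such a relation could see `a` and `c` as both equal to `b` and yet different from each other, which is why ordering stays strict.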

Pattern Architecture#

Operator Semantic Split#

The pattern enforces distinct comparison semantics for two operator categories:

Equality operators (=, ==, !=):

Use epsilon-based comparison for floating-point values
Delegate to floats_equal() helper function
Return true when values differ by less than or equal to epsilon
Handle NaN and infinity as special cases

Ordering operators (<, >, <=, >=):

Use strict IEEE 754 partial_cmp() semantics
Delegate to compare_values() helper function
Return None when comparing NaN values (partial ordering)
Preserve mathematical transitivity and total ordering properties (except for NaN)

The operator implementation architecture in libmagic-rs uses helper functions to centralize type-specific comparison logic, with individual operator functions serving as simple delegators. This pattern extends naturally to floating-point types.
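The split can be illustrated with two small functions. This is a sketch under assumptions: `floats_equal` mirrors the helper named above, while `floats_ordering` is a hypothetical stand-in for the float branch of the ordering helper.

```rust
use std::cmp::Ordering;

const FLOAT_EPSILON: f64 = f64::EPSILON;

/// Epsilon-aware equality with NaN and infinity handled up front.
#[allow(clippy::float_cmp)]
fn floats_equal(a: f64, b: f64) -> bool {
    if a.is_nan() || b.is_nan() {
        return false;
    }
    if a.is_infinite() || b.is_infinite() {
        return a == b;
    }
    (a - b).abs() <= FLOAT_EPSILON
}

/// Strict IEEE 754 ordering: no tolerance, `None` when either side is NaN.
fn floats_ordering(a: f64, b: f64) -> Option<Ordering> {
    a.partial_cmp(&b)
}
```

For `1.0` and `1.0 + f64::EPSILON`, the equality helper reports a match while the ordering helper still reports `Less`, which is exactly the intended asymmetry.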

The floats_equal() Helper Function#

The core of the pattern is the floats_equal() helper function, which implements epsilon-aware equality with IEEE 754 edge case handling. The implementation in src/evaluator/operators/equality.rs follows this logic:

const FLOAT_EPSILON: f64 = f64::EPSILON;

fn floats_equal(a: f64, b: f64) -> bool {
    // NaN is never equal to anything, including itself
    if a.is_nan() || b.is_nan() {
        return false;
    }

    // Infinities must match exactly (same sign)
    if a.is_infinite() || b.is_infinite() {
        #[allow(clippy::float_cmp)]
        return a == b; // Exact comparison for infinities
    }

    // Standard epsilon comparison for normal values
    (a - b).abs() <= FLOAT_EPSILON
}

Design rationale:

NaN checks must come first: IEEE 754 specifies that NaN is not equal to anything, including itself. The bare formula (NaN - NaN).abs() <= epsilon does evaluate to false, but only because every comparison involving NaN is false, not because the formula models NaN at all. An explicit check states the intent and stays correct if the formula is later changed.
Infinity requires exact comparison: The naive formula (inf - inf).abs() <= epsilon fails because inf - inf = NaN, so the comparison returns false even for two infinities of the same sign. Infinities are therefore compared with ==: positive infinity equals only positive infinity, and negative infinity equals only negative infinity.
Epsilon comparison for finite values: Only after ruling out NaN and infinity can the standard epsilon formula safely apply. This ordering ensures each case is handled with appropriate semantics.
Clippy lint suppression: Direct float comparison (a == b for infinities) requires #[allow(clippy::float_cmp)] annotation, as the project enables the float_cmp lint at "warn" level to catch inappropriate float comparisons elsewhere in the codebase.

Integration with Value Enum#

The Value enum in libmagic-rs supports floating-point values through a unified Float variant that stores f64:

pub enum Value {
    Uint(u64),
    Int(i64),
    Float(f64), // Stores both 32-bit float and 64-bit double
    Bytes(Vec<u8>),
    String(String),
}

Both 32-bit float and 64-bit double types are widened to f64 and stored as Value::Float. The apply_equal() function in equality.rs pattern-matches on float variants and delegates to floats_equal():

pub fn apply_equal(left: &Value, right: &Value) -> bool {
    if let (Value::Float(a), Value::Float(b)) = (left, right) {
        return floats_equal(*a, *b);
    }
    compare_values(left, right) == Some(Ordering::Equal)
}

Cross-type comparisons between floats and integers are not supported; such comparisons return false for equality and None for ordering.
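How the pieces might fit together is sketched below. The `Value` enum, `floats_equal()`, and `apply_equal()` follow the snippets above; the body of `compare_values()` is an assumption written for illustration (the article only states its return type and that integers coerce through i128), so the real helper may differ.

```rust
use std::cmp::Ordering;

pub enum Value {
    Uint(u64),
    Int(i64),
    Float(f64),
    Bytes(Vec<u8>),
    String(String),
}

#[allow(clippy::float_cmp)]
fn floats_equal(a: f64, b: f64) -> bool {
    if a.is_nan() || b.is_nan() {
        return false;
    }
    if a.is_infinite() || b.is_infinite() {
        return a == b;
    }
    (a - b).abs() <= f64::EPSILON
}

/// Hypothetical ordering helper: same-kind values compare directly, mixed
/// integers coerce through i128, and every other pairing is incomparable.
pub fn compare_values(left: &Value, right: &Value) -> Option<Ordering> {
    match (left, right) {
        (Value::Uint(a), Value::Uint(b)) => Some(a.cmp(b)),
        (Value::Int(a), Value::Int(b)) => Some(a.cmp(b)),
        (Value::Uint(a), Value::Int(b)) => Some(i128::from(*a).cmp(&i128::from(*b))),
        (Value::Int(a), Value::Uint(b)) => Some(i128::from(*a).cmp(&i128::from(*b))),
        // Strict IEEE 754 ordering: None when either operand is NaN.
        (Value::Float(a), Value::Float(b)) => a.partial_cmp(b),
        (Value::Bytes(a), Value::Bytes(b)) => Some(a.cmp(b)),
        (Value::String(a), Value::String(b)) => Some(a.cmp(b)),
        // Includes float-vs-integer pairs, which are not supported.
        _ => None,
    }
}

pub fn apply_equal(left: &Value, right: &Value) -> bool {
    if let (Value::Float(a), Value::Float(b)) = (left, right) {
        return floats_equal(*a, *b);
    }
    compare_values(left, right) == Some(Ordering::Equal)
}
```

With this shape, a float compared against an integer falls through to the catch-all arm, so equality is false and ordering is `None`, matching the behavior stated above.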

IEEE 754 Edge Case Handling#

NaN Behavior#

IEEE 754 specifies that NaN (Not-a-Number) is not equal to any value, including itself. This property is explicitly enforced:

NaN == NaN returns false
NaN == any_value returns false
NaN != NaN returns true

Any floating-point operation involving NaN produces NaN (e.g., NaN + 1.0 = NaN, NaN * 0.0 = NaN). This "poisoning" behavior means the floats_equal() function must check for NaN before attempting arithmetic operations like subtraction.

Infinity Handling#

Positive and negative infinity are distinct values, and each compares equal only to itself under IEEE 754 semantics:

+inf == +inf returns true
-inf == -inf returns true
+inf == -inf returns false
inf == finite_value returns false (no finite value equals infinity)

The critical implementation detail is that inf - inf = NaN, which breaks the epsilon comparison formula. Therefore, infinities must be detected and compared exactly with == before the epsilon logic runs.
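The arithmetic facts behind the NaN and infinity rules can be confirmed with a short sketch. All three function names are illustrative; `naive_equal` is the bare formula with no special-case handling.

```rust
/// Returns true when subtracting same-signed infinities yields NaN.
fn inf_minus_inf_is_nan() -> bool {
    (f64::INFINITY - f64::INFINITY).is_nan()
}

/// Returns true when NaN survives ordinary arithmetic.
fn nan_propagates() -> bool {
    (f64::NAN + 1.0).is_nan() && (f64::NAN * 0.0).is_nan()
}

/// Bare tolerance check with no NaN or infinity guards.
fn naive_equal(a: f64, b: f64) -> bool {
    (a - b).abs() <= f64::EPSILON
}
```

Because `inf - inf` is NaN, `naive_equal(f64::INFINITY, f64::INFINITY)` returns false even though `f64::INFINITY == f64::INFINITY` is true, which is the case the explicit infinity branch exists to repair.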

Signed Zero#

IEEE 754 distinguishes between positive zero (+0.0) and negative zero (-0.0), although the two compare as equal under ==. The epsilon comparison handles this correctly without special handling: |+0.0 - (-0.0)| = 0.0 <= epsilon.

Denormalized Numbers#

Denormalized (subnormal) numbers are very small values near zero that use a different representation to extend the range of floating-point. Every subnormal f64 is many orders of magnitude smaller than f64::EPSILON, so under the absolute tolerance all subnormals compare equal to each other and to zero. This is acceptable for most use cases, but applications requiring precise control over underflow behavior may need additional logic.
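A small sketch of how signed zeros and subnormals behave under an absolute tolerance. The helpers `abs_equal` and `smallest_subnormal` are illustrative names, and `abs_equal` assumes finite inputs.

```rust
const FLOAT_EPSILON: f64 = f64::EPSILON;

/// Absolute-tolerance comparison for finite inputs only.
fn abs_equal(a: f64, b: f64) -> bool {
    (a - b).abs() <= FLOAT_EPSILON
}

/// Smallest positive subnormal f64: only the lowest mantissa bit is set.
fn smallest_subnormal() -> f64 {
    f64::from_bits(1)
}
```

The two zeros have different bit patterns but a difference of exactly 0.0, and any value below `f64::MIN_POSITIVE` is indistinguishable from zero at this tolerance.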

Epsilon Value Selection#

Machine Epsilon#

The implementation uses f64::EPSILON as the epsilon threshold. This constant is the difference between 1.0 and the next larger value representable in the f64 format:

f64::EPSILON ≈ 2.22 × 10⁻¹⁶ (approximately 16 decimal digits of precision)

Machine epsilon represents the fundamental precision limit of floating-point arithmetic and provides a natural threshold for equality comparison.
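The meaning of machine epsilon can be checked by stepping to the next representable value after 1.0. The helper name `next_after_one` is illustrative; it relies on the fact that incrementing the bit pattern of a positive finite f64 yields its upper neighbour.

```rust
/// The representable f64 immediately above 1.0, found by bumping the bit pattern.
fn next_after_one() -> f64 {
    f64::from_bits(1.0_f64.to_bits() + 1)
}
```

The gap `next_after_one() - 1.0` equals `f64::EPSILON`. Adding half of epsilon to 1.0 rounds back to 1.0, while adding three quarters of epsilon rounds up to the neighbour, so epsilon describes the spacing at 1.0 rather than a hard cut-off on what can change a sum.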

Absolute vs. Relative Epsilon#

The pattern uses absolute epsilon: |a - b| <= epsilon. This works well for values near 1.0 but has limitations:

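For very large values (e.g., 10¹⁰), the absolute epsilon becomes insignificant relative to the value magnitude: the gap between adjacent representable values already exceeds f64::EPSILON, so the comparison behaves like exact equality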
For very small values (e.g., 10⁻¹⁰), the absolute epsilon may be too large, causing distinct values to compare as equal

More sophisticated implementations may use relative epsilon: |a - b| < epsilon * max(|a|, |b|), which scales the tolerance to the magnitude of the operands. However, the implementation uses absolute epsilon for simplicity and compatibility with the original libmagic behavior.
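For comparison, a relative-tolerance check might look like the sketch below. This is not what libmagic-rs does; `absolute_eq`, `relative_eq`, and the `rel_tolerance` parameter are hypothetical, and the sketch uses `<=` rather than `<` so that two zeros still compare equal.

```rust
/// Absolute tolerance: the form the pattern uses.
fn absolute_eq(a: f64, b: f64, tolerance: f64) -> bool {
    (a - b).abs() <= tolerance
}

/// Relative tolerance: scales the allowed difference by the larger magnitude.
/// Assumes finite inputs; NaN and infinity would need the same guards as
/// `floats_equal()`.
fn relative_eq(a: f64, b: f64, rel_tolerance: f64) -> bool {
    let scale = a.abs().max(b.abs());
    (a - b).abs() <= rel_tolerance * scale
}
```

Near 1e10, a difference of about 1e-5 is rejected by the absolute check at `f64::EPSILON` but accepted by a relative tolerance of 1e-12, since the allowed difference scales to roughly 0.01.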

ULP-Based Comparison#

An alternative approach is Units in the Last Place (ULP) comparison, which treats floating-point values as integers by comparing their bit representations. This provides magnitude-independent precision but requires more complex implementation. The Float Epsilon Equality Pattern as described does not use ULP comparison.
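For readers curious what a ULP comparison involves, here is a minimal sketch. It is not part of the pattern; `ordered_key` and `ulp_distance` are hypothetical names, and the mapping shown is one common way to place f64 bit patterns on a single integer line.

```rust
/// Maps an f64 onto a signed integer line where adjacent representable
/// floats become adjacent integers and both zeros map to 0.
fn ordered_key(x: f64) -> i64 {
    let bits = x.to_bits() as i64;
    if bits < 0 {
        // Sign bit set: reflect the magnitude onto the negative side.
        i64::MIN - bits
    } else {
        bits
    }
}

/// Number of representable values separating `a` and `b`,
/// or `None` if either operand is NaN.
fn ulp_distance(a: f64, b: f64) -> Option<u64> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    // Widen to i128 so opposite-sign operands cannot overflow.
    let diff = i128::from(ordered_key(a)) - i128::from(ordered_key(b));
    Some(diff.unsigned_abs() as u64)
}
```

A caller would then treat two values as equal when the distance is at most some small count, which gives the same relative strictness at every magnitude.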

Testing Strategy#

Required Test Coverage#

Comprehensive testing for the pattern must cover multiple dimensions:

1. Epsilon tolerance tests:

Values within epsilon should be equal: assert!(floats_equal(0.1 + 0.2, 0.3))
Values outside epsilon should not be equal: assert!(!floats_equal(1.0, 1.0 + 2.0 * f64::EPSILON))
Boundary cases at exactly epsilon distance: assert!(floats_equal(1.0, 1.0 + f64::EPSILON)), because the comparison uses <=

2. NaN edge case tests:

NaN == NaN must return false
NaN == finite_value must return false
finite_value == NaN must return false
Both operands as NaN

3. Infinity edge case tests:

inf == inf must return true
-inf == -inf must return true
inf == -inf must return false
inf == large_finite_value must return false
Mixed infinity and finite values

4. Zero handling tests:

+0.0 == -0.0 must return true
+0.0 == tiny_value (within epsilon) must return true
Zero comparison consistency with epsilon

5. Ordering operator tests (strict IEEE 754):

1.0 < 1.0 + f64::EPSILON must return true even though floats_equal() treats the two values as equal (no epsilon tolerance in ordering)
Ordering preserves transitivity
NaN comparisons return None (partial ordering)

6. Cross-type comparison tests:

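Float vs. Double comparisons (both are widened to f64 and stored as Value::Float)
Float/Double vs. Integer comparisons return false for equality and None for ordering
Widened 32-bit values versus 64-bit literals (e.g., 0.1_f32 widened to f64 differs from 0.1_f64 by more than f64::EPSILON)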
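Several of the categories above can be sketched as ordinary unit tests. The test names are illustrative, and `floats_equal()` is reproduced from the article so the block stands alone; the project's actual tests may be organized differently.

```rust
const FLOAT_EPSILON: f64 = f64::EPSILON;

#[allow(clippy::float_cmp)]
fn floats_equal(a: f64, b: f64) -> bool {
    if a.is_nan() || b.is_nan() {
        return false;
    }
    if a.is_infinite() || b.is_infinite() {
        return a == b;
    }
    (a - b).abs() <= FLOAT_EPSILON
}

#[cfg(test)]
mod tests {
    use super::floats_equal;

    #[test]
    fn tolerance_and_boundary() {
        assert!(floats_equal(0.1 + 0.2, 0.3));
        // Exactly epsilon apart: still equal because the check uses <=.
        assert!(floats_equal(1.0, 1.0 + f64::EPSILON));
        assert!(!floats_equal(1.0, 1.0 + 2.0 * f64::EPSILON));
    }

    #[test]
    fn nan_is_never_equal() {
        assert!(!floats_equal(f64::NAN, f64::NAN));
        assert!(!floats_equal(f64::NAN, 1.0));
        assert!(!floats_equal(1.0, f64::NAN));
    }

    #[test]
    fn infinities_match_only_same_sign() {
        assert!(floats_equal(f64::INFINITY, f64::INFINITY));
        assert!(floats_equal(f64::NEG_INFINITY, f64::NEG_INFINITY));
        assert!(!floats_equal(f64::INFINITY, f64::NEG_INFINITY));
        assert!(!floats_equal(f64::INFINITY, f64::MAX));
    }

    #[test]
    fn zeros_and_tiny_values() {
        assert!(floats_equal(0.0, -0.0));
        assert!(floats_equal(0.0, 1e-300));
    }

    #[test]
    fn ordering_is_strict() {
        assert!(1.0 < 1.0 + f64::EPSILON);
        assert_eq!(f64::NAN.partial_cmp(&1.0), None);
    }
}
```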

Property-Based Testing#

The pattern should be validated with property-based tests using tools like proptest:

Reflexivity (for non-NaN): For all x (not NaN), floats_equal(x, x) is true
Symmetry: If floats_equal(a, b) then floats_equal(b, a)
Transitivity violation: Document that epsilon equality is NOT transitive (intentional design choice)
Ordering consistency: If a < b (ordering operator), then !floats_equal(a, b) should typically be true (with epsilon edge cases documented)

The test infrastructure uses proptest for property-based testing, which extends to float types.
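The reflexivity and symmetry properties can be expressed as plain predicates. The sketch below is a dependency-free stand-in: it checks the properties over a small hand-picked sample rather than generated inputs, so it is not a proptest suite, and the names `sample`, `reflexive_for_non_nan`, and `symmetric` are assumptions. A proptest version would feed generated f64 values into the same predicates.

```rust
#[allow(clippy::float_cmp)]
fn floats_equal(a: f64, b: f64) -> bool {
    if a.is_nan() || b.is_nan() {
        return false;
    }
    if a.is_infinite() || b.is_infinite() {
        return a == b;
    }
    (a - b).abs() <= f64::EPSILON
}

/// Hand-picked values standing in for generated inputs.
fn sample() -> Vec<f64> {
    vec![
        0.0,
        -0.0,
        1.0,
        1.0 + f64::EPSILON,
        -1.0,
        1e10,
        1e-300,
        f64::MAX,
        f64::MIN,
        f64::INFINITY,
        f64::NEG_INFINITY,
        f64::NAN,
    ]
}

/// Reflexivity: every non-NaN value equals itself.
fn reflexive_for_non_nan(values: &[f64]) -> bool {
    values
        .iter()
        .filter(|x| !x.is_nan())
        .all(|&x| floats_equal(x, x))
}

/// Symmetry: swapping the operands never changes the answer.
fn symmetric(values: &[f64]) -> bool {
    values
        .iter()
        .all(|&a| values.iter().all(|&b| floats_equal(a, b) == floats_equal(b, a)))
}
```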

Integration Testing with Magic Rules#

Magic file rules testing requires constructing MagicRule instances with TypeKind::Float or TypeKind::Double and evaluating them against IEEE 754-encoded byte buffers. Test cases should include:

Parsing float values from big-endian and little-endian byte sequences
Evaluating equality conditions with computed float values
Handling special value encodings (NaN, infinity in binary form)
Cross-type comparisons in magic rule contexts

The byteorder crate provides read_f32() and read_f64() methods with endianness support, enabling byte-level testing of float parsing and comparison.
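A byte-level decoding test might start from helpers like the ones below. To stay dependency-free, this sketch uses the standard library's `from_be_bytes` and `from_le_bytes` rather than the byteorder crate, and the names `decode_f32_be` and `decode_f64_le` are hypothetical, not the project's `read_float()` or `read_double()`.

```rust
use std::convert::TryInto;

/// Decodes a big-endian IEEE 754 single at `offset` and widens it to f64.
/// Returns `None` on offset overflow or if fewer than 4 bytes remain.
fn decode_f32_be(buf: &[u8], offset: usize) -> Option<f64> {
    let end = offset.checked_add(4)?;
    let bytes: [u8; 4] = buf.get(offset..end)?.try_into().ok()?;
    Some(f64::from(f32::from_be_bytes(bytes)))
}

/// Decodes a little-endian IEEE 754 double at `offset`.
/// Returns `None` on offset overflow or if fewer than 8 bytes remain.
fn decode_f64_le(buf: &[u8], offset: usize) -> Option<f64> {
    let end = offset.checked_add(8)?;
    let bytes: [u8; 8] = buf.get(offset..end)?.try_into().ok()?;
    Some(f64::from_le_bytes(bytes))
}
```

The bytes `3F 80 00 00` decode to 1.0 as a big-endian single, and `7F C0 00 00` decodes to a NaN, which gives a convenient way to exercise the special-value paths from raw buffers.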

Implementation Status in libmagic-rs#

Floating-point types were implemented in v0.1.0 (PR #162) with six type variants to support IEEE 754 single and double precision with endianness control:

32-bit float types: float (native), befloat (big-endian), lefloat (little-endian)
64-bit double types: double (native), bedouble (big-endian), ledouble (little-endian)

Implementation status:

✅ Integer equality (Uint, Int) with cross-type coercion via i128
✅ String and byte sequence equality
✅ Operator dispatch architecture with helper functions
✅ Value::Float(f64) variant storing both float and double values
✅ TypeKind::Float and TypeKind::Double variants with endianness
✅ floats_equal() helper function with epsilon-aware comparison
✅ Float type parsing via read_float() and read_double() in evaluator/types/float.rs
✅ Six type keywords (float, befloat, lefloat, double, bedouble, ledouble) in parser

Test Coverage#

The implementation includes comprehensive test coverage:

Unit tests: evaluator/types/float.rs covers endianness variants, buffer overrun, offset overflow
Equality tests: epsilon-aware comparison, NaN, infinity edge cases in evaluator/operators/equality.rs
Comparison tests: float ordering, NaN returns None in evaluator/operators/comparison.rs
Integration tests: float/double rules through evaluate_rules in tests/evaluator_tests.rs
Grammar tests: float literal parsing, type precedence in parser/grammar/tests.rs
Property tests: arb_type_kind includes Float/Double variants in tests/property_tests.rs
Roundtrip test: all 27 type keywords parse and convert without panic

The test suite includes 150 passing tests with zero clippy warnings.

Common Pitfalls and Gotchas#

1. Forgetting NaN Checks#

Pitfall: Implementing epsilon comparison without explicit NaN handling:

// FRAGILE: returns false for NaN only as a side effect of NaN comparison rules
fn floats_equal_naive(a: f64, b: f64, epsilon: f64) -> bool {
    (a - b).abs() < epsilon // NaN - NaN = NaN, NaN.abs() = NaN, NaN < epsilon = false
}

The expression NaN - NaN produces NaN, and NaN < epsilon is always false, so this function returns the correct result by accident. The behavior is fragile: rewriting the test in an equivalent-looking form such as !((a - b).abs() >= epsilon) would make NaN compare equal to everything. Explicit NaN checks are required for clarity and correctness.

2. Infinity Subtraction Trap#

Pitfall: Not handling infinity before epsilon comparison:

// WRONG: Fails for infinity
fn floats_equal_naive(a: f64, b: f64, epsilon: f64) -> bool {
    if a.is_nan() || b.is_nan() { return false; }
    (a - b).abs() < epsilon // inf - inf = NaN, breaks the comparison
}

When both operands are infinities of the same sign, inf - inf = NaN, causing the comparison to return false even though the values are equal. Infinities must be checked and handled with exact comparison before the epsilon formula.

3. Epsilon Too Small#

Pitfall: Using machine epsilon (f64::EPSILON) for values far from 1.0:

For very large values (e.g., 1e10), machine epsilon is smaller than the spacing between representable numbers:

Adjacent f64 values near 1e10 are about 1.9e-6 apart, far more than f64::EPSILON ≈ 2.2e-16
Any two distinct values at this magnitude therefore differ by more than epsilon
The epsilon comparison then behaves exactly like == and provides no tolerance

If computational error is expected at the magnitude of the values, an absolute epsilon of f64::EPSILON is too strict. Applications may need larger tolerances based on expected error.
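The spacing of representable values at a given magnitude can be measured directly, which shows why `f64::EPSILON` offers no slack for large operands. The helpers `next_above` and `gap_above` are illustrative and assume a positive finite input.

```rust
/// Next representable f64 above a positive finite value (bit-pattern increment).
fn next_above(x: f64) -> f64 {
    f64::from_bits(x.to_bits() + 1)
}

/// Distance from `x` to its upper neighbour.
fn gap_above(x: f64) -> f64 {
    next_above(x) - x
}
```

At 1.0 the gap is exactly `f64::EPSILON`; at 1e10 it is on the order of 1e-6, so even the closest distinct neighbour already fails an absolute check at `f64::EPSILON`.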

4. Applying Epsilon to Ordering#

Pitfall: Using epsilon comparison in ordering operators:

// WRONG: Breaks transitivity
fn partial_less_than(a: f64, b: f64, epsilon: f64) -> bool {
    (b - a) > epsilon // Attempting epsilon-aware less-than
}

This violates transitivity and creates inconsistent ordering relationships. Ordering operators must use strict IEEE 754 comparison via partial_cmp() to preserve mathematical properties.

5. Sign Bit Comparisons#

Pitfall: Using bitwise comparison (memcmp or integer casting) for floats:

Floating-point values should never be compared by casting to integers or using bitwise operations (except for specialized ULP comparison). Signed zeros, denormalized numbers, and NaN encodings have complex bit patterns that don't correspond to arithmetic value ordering. Always use IEEE 754-aware comparison methods.

6. Incompatible Epsilon Across Magnitudes#

Pitfall: Using absolute epsilon for values across many orders of magnitude:

An absolute epsilon of 1e-10 is appropriate for values near 1.0 but:

Too large for values near 1e-15 (may conflate distinct values)
Too small for values near 1e10 (may fail to account for computational error)

The libmagic-rs implementation uses fixed absolute epsilon, which is appropriate for magic file type detection where floating-point values typically represent file format metadata in predictable ranges.

Cross-Type Integer Coercion Pattern#

The Cross-Type Integer Coercion Pattern demonstrates a similar architectural principle: complex type-pair handling logic is consolidated in helper functions, with operator functions serving as simple delegators. This pattern ensures "single source of truth" for comparison semantics and extends to float comparisons through the floats_equal() helper.

Comparison Operators and compare_values() Helper Pattern#

The operator architecture uses a compare_values() helper that returns Option<Ordering>. This design naturally accommodates float comparisons:

Some(Ordering::Equal | Less | Greater) for comparable values
None for incomparable cases (e.g., NaN comparisons, incompatible types)

The Float Epsilon Equality Pattern fits naturally into this architecture, with equality operators using floats_equal() while ordering operators use partial_cmp() through compare_values().

IEEE 754 Standard#

The IEEE 754 floating-point standard defines the representation and behavior of floating-point arithmetic. The Float Epsilon Equality Pattern explicitly conforms to IEEE 754 semantics for NaN, infinity, and signed zero handling, ensuring compatibility with standard floating-point operations across languages and platforms.

Relevant Code Files#

| File Path | Description |
| --- | --- |
| `src/evaluator/operators/equality.rs` | Equality operator implementations with `floats_equal()` helper and `FLOAT_EPSILON` constant |
| `src/evaluator/operators/comparison.rs` | Ordering operator implementations using strict IEEE 754 `partial_cmp()` semantics |
| `src/parser/ast.rs` | `Value::Float(f64)` variant and `TypeKind::Float`/`TypeKind::Double` definitions |
| `src/evaluator/types/float.rs` | `read_float()` and `read_double()` implementations with endianness support |
| `src/parser/types.rs` | Six type keywords (`float`, `befloat`, `lefloat`, `double`, `bedouble`, `ledouble`) |
| `tests/evaluator_tests.rs` | Integration tests for float/double magic rules |
| `tests/property_tests.rs` | Property-based testing with `arb_type_kind` including float types |

References#

IEEE 754 Floating-Point Standard - Official specification for floating-point arithmetic
libmagic-rs PR #162: Float and Double Type Implementation - Implementation pull request with full code changes
libmagic-rs Issue #40: Float and Double Type Implementation - Original specification and design discussion
Comparing Floating Point Numbers - Comprehensive guide to floating-point comparison approaches
What Every Computer Scientist Should Know About Floating-Point Arithmetic - Classic paper on floating-point behavior