Evaluator Engine
The evaluator engine executes magic rules against file buffers to identify file types. It provides safe, efficient rule evaluation with hierarchical processing, graceful error recovery, and configurable resource limits.
Overview
The evaluator processes magic rules hierarchically:
- Load file into memory-mapped buffer
- Resolve offsets (absolute, relative, from-end)
- Read typed values from buffer with bounds checking
- Apply operators for comparison
- Process children if parent rule matches
- Collect results with match metadata
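Architecture
File Buffer → Offset Resolution → Type Reading → Operator Application → Results
Supporting component for each stage:
- File Buffer: memory map
- Offset Resolution: context state
- Type Reading: endian handling
- Operator Application: match logic
- Results: hierarchical processing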
Module Organization
The evaluator module separates public interface from implementation:
- evaluator/mod.rs - Public API surface: defines EvaluationContext and RuleMatch types, re-exports core evaluation functions from the engine submodule
- evaluator/engine/mod.rs - Core evaluation implementation: evaluate_single_rule, evaluate_rules, evaluate_rules_with_config
- evaluator/offset/mod.rs - Offset resolution
- evaluator/operators/mod.rs - Operator application
- evaluator/types/ - Type reading and coercion (organized as submodules as of v0.4.2)
  - types/mod.rs - Public API surface: read_typed_value, coerce_value_to_type, re-exports type functions
  - types/numeric.rs - Numeric type handling: read_byte, read_short, read_long, read_quad with endianness and signedness support
  - types/float.rs - Floating-point type handling: read_float (32-bit IEEE 754), read_double (64-bit IEEE 754) with endianness support
  - types/date.rs - Date and timestamp type handling: read_date (32-bit Unix timestamps), read_qdate (64-bit Unix timestamps) with endianness and UTC/local time support
  - types/string.rs - String type handling: read_string with null-termination and UTF-8 conversion, read_pstring with configurable length-prefix widths (1, 2, or 4 bytes)
  - types/tests.rs - Module tests
- evaluator/strength.rs - Rule strength calculation
The refactoring separates concerns: mod.rs handles the public API surface and data types, while engine/ contains the core evaluation logic. In v0.4.2 the types module was split from a single 1,836-line file into focused submodules for numeric, floating-point, date/timestamp, and string handling, improving maintainability without changing the public API. Library users still import all types and functions from the evaluator module as before; the internal organization is transparent to them.
Core Components
EvaluationContext
Maintains state during rule processing:
pub struct EvaluationContext {
/// Current offset position for relative calculations
current_offset: usize,
/// Current recursion depth for safety limits
recursion_depth: u32,
/// Configuration for evaluation behavior
config: EvaluationConfig,
}
Note: Fields are private; use accessor methods like current_offset(), recursion_depth(), and config().
Key Methods:
- new() - Create context with default configuration
- current_offset() / set_current_offset() - Track current buffer position
- recursion_depth() - Query current recursion depth
- increment_recursion_depth() / decrement_recursion_depth() - Track recursion safely
- timeout_ms() - Query configured timeout
- reset() - Reset context state for reuse
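The depth-tracking methods above can be illustrated with a stand-alone sketch. `DepthGuard` is a hypothetical type, not the crate's `EvaluationContext`, and it assumes that incrementing fails once the configured maximum depth would be exceeded, which this page implies but does not spell out.

```rust
/// Hypothetical stand-in for the offset/depth bookkeeping in EvaluationContext.
#[derive(Debug)]
pub struct DepthGuard {
    current_offset: usize,
    recursion_depth: u32,
    max_recursion_depth: u32,
}

impl DepthGuard {
    pub fn new(max_recursion_depth: u32) -> Self {
        Self { current_offset: 0, recursion_depth: 0, max_recursion_depth }
    }

    pub fn current_offset(&self) -> usize {
        self.current_offset
    }

    pub fn set_current_offset(&mut self, offset: usize) {
        self.current_offset = offset;
    }

    pub fn recursion_depth(&self) -> u32 {
        self.recursion_depth
    }

    /// Enter one level of child rules; refuses to go past the limit.
    pub fn increment_recursion_depth(&mut self) -> Result<(), String> {
        if self.recursion_depth >= self.max_recursion_depth {
            return Err(format!("recursion limit {} exceeded", self.max_recursion_depth));
        }
        self.recursion_depth += 1;
        Ok(())
    }

    /// Leave one level; saturating so an unbalanced call cannot underflow.
    pub fn decrement_recursion_depth(&mut self) {
        self.recursion_depth = self.recursion_depth.saturating_sub(1);
    }

    /// Clear per-buffer state so the guard can be reused.
    pub fn reset(&mut self) {
        self.current_offset = 0;
        self.recursion_depth = 0;
    }
}
```

Pairing every increment with a decrement on the way back up keeps the depth equal to the current nesting level of child rules.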
RuleMatch
Represents a successful rule match:
pub struct RuleMatch {
/// Human-readable description from the matched rule
pub message: String,
/// Offset where the match occurred
pub offset: usize,
/// Depth in the rule hierarchy (0 = root rule)
pub level: u32,
/// The matched value (parsed according to rule type)
pub value: Value,
/// Confidence score (0.0 to 1.0) based on rule hierarchy depth
pub confidence: f64,
}
The Value type is from parser::ast::Value and represents the actual matched content according to the rule's type specification. Note that Value implements only PartialEq (not Eq) due to floating-point NaN semantics.
Offset Resolution (evaluator/offset/)
Handles all offset types safely:
- Absolute offsets: Direct file positions (0, 0x100)
- Relative offsets: Based on previous match positions (&+4)
- From-end offsets: Calculated from file size (-4 from end)
- Bounds checking: All offset calculations are validated
pub fn resolve_offset(
spec: &OffsetSpec,
buffer: &[u8],
) -> Result<usize, LibmagicError>
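A minimal sketch of how the three offset kinds could be resolved with checked arithmetic. The `Offset` enum and `resolve` function are hypothetical simplifications of `OffsetSpec` and `resolve_offset`; in particular, the relative base is passed explicitly as `current_offset` and is assumed to be where the previous match ended, as in GNU file.

```rust
/// Hypothetical, simplified offset specification (not the crate's OffsetSpec).
#[derive(Debug, Clone, Copy)]
pub enum Offset {
    /// Direct file position, e.g. `0` or `0x100`.
    Absolute(u64),
    /// Signed displacement from the previous match, e.g. `&+4`.
    Relative(i64),
    /// Distance back from the end of the buffer, e.g. `-4`.
    FromEnd(u64),
}

/// Resolve `spec` to an in-bounds index into a buffer of `len` bytes.
/// Every step uses checked conversions/arithmetic, so overflow yields None.
pub fn resolve(spec: Offset, len: usize, current_offset: usize) -> Option<usize> {
    let pos = match spec {
        Offset::Absolute(n) => usize::try_from(n).ok()?,
        Offset::Relative(delta) => {
            let base = i64::try_from(current_offset).ok()?;
            // A negative result fails the conversion back to usize.
            usize::try_from(base.checked_add(delta)?).ok()?
        }
        Offset::FromEnd(n) => len.checked_sub(usize::try_from(n).ok()?)?,
    };
    // An offset equal to `len` points past the last byte, so reject it as well.
    if pos < len { Some(pos) } else { None }
}
```

Returning `None` instead of panicking fits the "skip rule, continue with others" policy described under Error Handling.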
Type Reading (evaluator/types/)
Interprets bytes according to type specifications. The types module is organized into submodules for numeric, floating-point, date/timestamp, and string type handling (refactored from a single file in v0.4.2):
- Byte: Single byte values (signed or unsigned)
- Short: 16-bit integers with endianness
- Long: 32-bit integers with endianness
- Quad: 64-bit integers with endianness
- Float: 32-bit IEEE 754 floating-point with endianness (native, big-endian befloat, little-endian lefloat)
- Double: 64-bit IEEE 754 floating-point with endianness (native, big-endian bedouble, little-endian ledouble)
- Date: 32-bit Unix timestamps (signed seconds since epoch) with configurable endianness and UTC/local time formatting
- QDate: 64-bit Unix timestamps (signed seconds since epoch) with configurable endianness and UTC/local time formatting
- String: Byte sequences with length limits
- PString: Pascal-style length-prefixed strings with 1-byte (/B), 2-byte (/H or /h), or 4-byte (/L or /l) length prefixes, supporting big-endian and little-endian byte order
- Bounds checking: Prevents buffer overruns
pub fn read_typed_value(
buffer: &[u8],
offset: usize,
type_kind: &TypeKind,
) -> Result<Value, TypeReadError>
The read_byte function signature changed in v0.2.0 to accept three parameters (buffer, offset, and signed) instead of two, allowing explicit control over signed vs unsigned byte interpretation.
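The following sketch shows bounds-checked integer reads, including the three-parameter `read_byte`. `Val`, `Endian`, `take`, and the four-parameter `read_short` are hypothetical stand-ins chosen for illustration; only `read_byte`'s parameter list comes from this page.

```rust
/// Byte order for multi-byte reads (hypothetical stand-in for the crate's Endianness).
#[derive(Debug, Clone, Copy)]
pub enum Endian {
    Little,
    Big,
    Native,
}

/// Hypothetical result type standing in for parser::ast::Value.
#[derive(Debug, PartialEq)]
pub enum Val {
    Int(i64),
    Uint(u64),
}

/// Copy exactly N bytes starting at `offset`, or return None if they do not fit.
fn take<const N: usize>(buffer: &[u8], offset: usize) -> Option<[u8; N]> {
    let end = offset.checked_add(N)?; // guards against usize overflow
    let slice = buffer.get(offset..end)?; // guards against reading past the end
    let mut out = [0u8; N];
    out.copy_from_slice(slice);
    Some(out)
}

/// Three-parameter form: `signed` chooses i8 or u8 interpretation.
pub fn read_byte(buffer: &[u8], offset: usize, signed: bool) -> Option<Val> {
    let [b] = take::<1>(buffer, offset)?;
    Some(if signed { Val::Int(i64::from(b as i8)) } else { Val::Uint(u64::from(b)) })
}

/// 16-bit read respecting byte order and signedness.
pub fn read_short(buffer: &[u8], offset: usize, endian: Endian, signed: bool) -> Option<Val> {
    let bytes = take::<2>(buffer, offset)?;
    let raw = match endian {
        Endian::Little => u16::from_le_bytes(bytes),
        Endian::Big => u16::from_be_bytes(bytes),
        Endian::Native => u16::from_ne_bytes(bytes),
    };
    Some(if signed { Val::Int(i64::from(raw as i16)) } else { Val::Uint(u64::from(raw)) })
}
```

The same `take` pattern extends to 32- and 64-bit reads by changing `N` and using the matching `from_*_bytes` constructor.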
Floating-Point Type Reading (evaluator/types/float.rs):
pub fn read_float(
buffer: &[u8],
offset: usize,
endian: Endianness,
) -> Result<Value, TypeReadError>
pub fn read_double(
buffer: &[u8],
offset: usize,
endian: Endianness,
) -> Result<Value, TypeReadError>
- read_float() reads 4 bytes and interprets them as f32, converting to f64 and returning Value::Float(f64)
- read_double() reads 8 bytes and interprets them as f64, returning Value::Float(f64)
- Both respect the endianness specified in TypeKind::Float or TypeKind::Double
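A minimal sketch of the float reads described above, assuming a hypothetical `Endian` enum and returning a bare `f64` instead of `Value::Float`. It relies on the standard `f32::from_le_bytes` / `from_be_bytes` / `from_ne_bytes` constructors (and their `f64` counterparts).

```rust
/// Hypothetical byte-order selector.
#[derive(Debug, Clone, Copy)]
pub enum Endian {
    Little,
    Big,
    Native,
}

/// Copy exactly N bytes starting at `offset`, or None if out of bounds.
fn take<const N: usize>(buffer: &[u8], offset: usize) -> Option<[u8; N]> {
    let slice = buffer.get(offset..offset.checked_add(N)?)?;
    let mut out = [0u8; N];
    out.copy_from_slice(slice);
    Some(out)
}

pub fn read_float(buffer: &[u8], offset: usize, endian: Endian) -> Option<f64> {
    let bytes = take::<4>(buffer, offset)?;
    let v = match endian {
        Endian::Little => f32::from_le_bytes(bytes),
        Endian::Big => f32::from_be_bytes(bytes),
        Endian::Native => f32::from_ne_bytes(bytes),
    };
    // Widening f32 -> f64 is exact, so no precision is lost here.
    Some(f64::from(v))
}

pub fn read_double(buffer: &[u8], offset: usize, endian: Endian) -> Option<f64> {
    let bytes = take::<8>(buffer, offset)?;
    Some(match endian {
        Endian::Little => f64::from_le_bytes(bytes),
        Endian::Big => f64::from_be_bytes(bytes),
        Endian::Native => f64::from_ne_bytes(bytes),
    })
}
```

Note that the widened value carries the f32 rounding error: the bytes `d0 0f 49 40` decode to roughly 3.1415901, not exactly 3.14159.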
Date and QDate Type Reading (evaluator/types/date.rs):
pub fn read_date(
buffer: &[u8],
offset: usize,
endian: Endianness,
utc: bool,
) -> Result<Value, TypeReadError>
pub fn read_qdate(
buffer: &[u8],
offset: usize,
endian: Endianness,
utc: bool,
) -> Result<Value, TypeReadError>
- read_date() reads 4 bytes as a 32-bit Unix timestamp (seconds since epoch) and returns Value::String formatted as "Www Mmm DD HH:MM:SS YYYY" to match GNU file output
- read_qdate() reads 8 bytes as a 64-bit Unix timestamp (seconds since epoch) and returns Value::String in the same format
- Both support endianness (little-endian, big-endian, native)
- Both support UTC or local time formatting
- The evaluator reads raw integer timestamps from the buffer and converts them to formatted date strings for comparison
- Example: a 32-bit value 1234567890 at offset 0 is formatted as "Fri Feb 13 23:31:30 2009" in UTC; for a local-time type such as ldate, the resulting string depends on the local time zone
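The timestamp-to-string conversion can be sketched without any date library. This version handles UTC only (local time needs a time-zone database) and uses Howard Hinnant's `civil_from_days` algorithm; the crate may well use a different method. `format_unix_utc` and `read_date_utc` are hypothetical names, and space-padding the day (asctime style) is an assumption.

```rust
const WEEKDAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Days since 1970-01-01 -> (year, month 1..=12, day 1..=31).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z.rem_euclid(146_097); // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + i64::from(month <= 2); // Jan/Feb belong to the next year
    (year, month, day)
}

/// Format signed seconds since the epoch as "Www Mmm DD HH:MM:SS YYYY" (UTC).
pub fn format_unix_utc(ts: i64) -> String {
    let days = ts.div_euclid(86_400);
    let secs = ts.rem_euclid(86_400);
    let (year, month, day) = civil_from_days(days);
    let weekday = (days + 4).rem_euclid(7) as usize; // 1970-01-01 was a Thursday
    format!(
        "{} {} {:2} {:02}:{:02}:{:02} {}",
        WEEKDAYS[weekday],
        MONTHS[(month - 1) as usize],
        day,
        secs / 3_600,
        secs % 3_600 / 60,
        secs % 60,
        year
    )
}

/// Read a signed 32-bit timestamp and format it in UTC.
pub fn read_date_utc(buffer: &[u8], offset: usize, little_endian: bool) -> Option<String> {
    let s = buffer.get(offset..offset.checked_add(4)?)?;
    let bytes = [s[0], s[1], s[2], s[3]];
    let raw = if little_endian { i32::from_le_bytes(bytes) } else { i32::from_be_bytes(bytes) };
    Some(format_unix_utc(i64::from(raw)))
}
```

A 64-bit qdate variant would read 8 bytes with `i64::from_le_bytes` / `from_be_bytes` and call the same formatter.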
Pascal String Type Reading (evaluator/types/string.rs):
pub fn read_pstring(
buffer: &[u8],
offset: usize,
max_length: Option<usize>,
length_width: PStringLengthWidth,
length_includes_itself: bool,
) -> Result<Value, TypeReadError>
- read_pstring() reads a length-prefixed Pascal string with a configurable prefix width
- Length prefix width (length_width):
  - PStringLengthWidth::OneByte - 1-byte length prefix (/B suffix, default)
  - PStringLengthWidth::TwoByteBE - 2-byte big-endian length prefix (/H suffix)
  - PStringLengthWidth::TwoByteLE - 2-byte little-endian length prefix (/h suffix)
  - PStringLengthWidth::FourByteBE - 4-byte big-endian length prefix (/L suffix)
  - PStringLengthWidth::FourByteLE - 4-byte little-endian length prefix (/l suffix)
- Length interpretation:
  - Reads 1, 2, or 4 bytes from the buffer using from_be_bytes or from_le_bytes depending on the variant
  - The length value specifies how many bytes of string data follow the prefix
- /J flag (length_includes_itself):
  - When true, the stored length value includes the prefix width itself (JPEG-style)
  - The evaluator subtracts the prefix width (1, 2, or 4 bytes) from the length to get the effective content length
  - Example: a 2-byte big-endian prefix with value 7 and the /J flag yields 7 - 2 = 5 bytes of string content
- Returns Value::String with UTF-8 conversion (using lossy conversion for invalid UTF-8)
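The length-prefix and /J rules above fit in one short function. This is a sketch, not the crate's `read_pstring`: `PrefixWidth` and `read_pstring_sketch` are hypothetical names, it returns a plain `String`, and it assumes `max_length` truncates the content rather than rejecting it.

```rust
/// Hypothetical mirror of PStringLengthWidth.
#[derive(Debug, Clone, Copy)]
pub enum PrefixWidth {
    OneByte,
    TwoByteBE,
    TwoByteLE,
    FourByteBE,
    FourByteLE,
}

impl PrefixWidth {
    /// Number of bytes occupied by the length prefix.
    fn size(self) -> usize {
        match self {
            PrefixWidth::OneByte => 1,
            PrefixWidth::TwoByteBE | PrefixWidth::TwoByteLE => 2,
            PrefixWidth::FourByteBE | PrefixWidth::FourByteLE => 4,
        }
    }
}

pub fn read_pstring_sketch(
    buffer: &[u8],
    offset: usize,
    max_length: Option<usize>,
    width: PrefixWidth,
    length_includes_itself: bool,
) -> Option<String> {
    let start = offset.checked_add(width.size())?; // first byte after the prefix
    let p = buffer.get(offset..start)?;
    let stored = match width {
        PrefixWidth::OneByte => usize::from(p[0]),
        PrefixWidth::TwoByteBE => usize::from(u16::from_be_bytes([p[0], p[1]])),
        PrefixWidth::TwoByteLE => usize::from(u16::from_le_bytes([p[0], p[1]])),
        PrefixWidth::FourByteBE => u32::from_be_bytes([p[0], p[1], p[2], p[3]]) as usize,
        PrefixWidth::FourByteLE => u32::from_le_bytes([p[0], p[1], p[2], p[3]]) as usize,
    };
    // /J: the stored value counts the prefix too, so subtract its width.
    let mut len = if length_includes_itself { stored.checked_sub(width.size())? } else { stored };
    // Assumption: max_length truncates instead of failing.
    if let Some(max) = max_length {
        len = len.min(max);
    }
    let data = buffer.get(start..start.checked_add(len)?)?;
    // Invalid UTF-8 is replaced with U+FFFD rather than rejected.
    Some(String::from_utf8_lossy(data).into_owned())
}
```

A length that points past the end of the buffer yields `None`, which corresponds to the bounds checking listed for type reading.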
Operator Application (evaluator/operators/)
Applies comparison operations:
- Equal (=, ==): Exact value matching
- NotEqual (!=, <>): Non-matching values
- LessThan (<): Less-than comparison (numeric or lexicographic) (added in v0.2.0)
- GreaterThan (>): Greater-than comparison (numeric or lexicographic) (added in v0.2.0)
- LessEqual (<=): Less-than-or-equal comparison (numeric or lexicographic) (added in v0.2.0)
- GreaterEqual (>=): Greater-than-or-equal comparison (numeric or lexicographic) (added in v0.2.0)
- BitwiseAnd (&): Pattern matching for flags
- BitwiseAndMask: AND with mask then compare
Comparison operators support numeric comparisons across different integer types using i128 coercion for cross-type compatibility.
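Why i128 works for cross-type comparison: every i64 and every u64 value fits in i128, so widening both sides never wraps. A naive cast of `-1i64` to `u64` produces `u64::MAX` and would compare greater than 0. The `Num` enum and `compare_ints` below are hypothetical illustrations, not the crate's `compare_values`.

```rust
use std::cmp::Ordering;

/// Hypothetical integer value, standing in for Value::Int / Value::Uint.
#[derive(Debug, Clone, Copy)]
pub enum Num {
    Int(i64),
    Uint(u64),
}

/// Widen to i128; both source ranges fit, so the conversion is lossless.
fn widen(n: Num) -> i128 {
    match n {
        Num::Int(v) => i128::from(v),
        Num::Uint(v) => i128::from(v),
    }
}

/// Total ordering across signed and unsigned integers.
pub fn compare_ints(a: Num, b: Num) -> Ordering {
    widen(a).cmp(&widen(b))
}
```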
Floating-Point Operator Semantics:
Float values (Value::Float) work with comparison and equality operators but have special handling:
- Equality operators (==, !=): Use epsilon-aware comparison with f64::EPSILON tolerance
  - Two floats are considered equal when |a - b| <= f64::EPSILON
  - Implementation is in the floats_equal() helper function (evaluator/operators/equality.rs)
- Ordering operators (<, >, <=, >=): Use IEEE 754 partial_cmp semantics
  - Standard floating-point ordering: -∞ < finite values < +∞
  - Implementation is in the compare_values() function (evaluator/operators/comparison.rs)
- NaN handling:
  - NaN != NaN returns true (NaN is never equal to anything, including itself)
  - All ordering comparisons (<, >, <=, >=) involving NaN return false (NaN is not comparable)
- Infinity handling:
  - Positive and negative infinity are only equal to infinity of the same sign
  - Infinities are ordered correctly: NEG_INFINITY < finite < INFINITY
- Type mismatch: Float values cannot be compared with Int or Uint (returns false or None)
pub fn apply_operator(
operator: &Operator,
left: &Value,
right: &Value,
) -> bool
Example with comparison operators:
use libmagic_rs::parser::ast::{Operator, Value};
use libmagic_rs::evaluator::operators::apply_operator;
// Less-than comparison (v0.2.0+)
assert!(apply_operator(
&Operator::LessThan,
&Value::Uint(5),
&Value::Uint(10)
));
// Greater-than-or-equal comparison (v0.2.0+)
assert!(apply_operator(
&Operator::GreaterEqual,
&Value::Uint(10),
&Value::Uint(10)
));
// Cross-type integer comparison (v0.2.0+)
assert!(apply_operator(
&Operator::LessThan,
&Value::Int(-1),
&Value::Uint(0)
));
Example with floating-point operators:
use libmagic_rs::parser::ast::{Operator, Value};
use libmagic_rs::evaluator::operators::apply_operator;
// Epsilon-aware equality
assert!(apply_operator(
&Operator::Equal,
&Value::Float(1.0),
&Value::Float(1.0 + f64::EPSILON)
));
// Float ordering
assert!(apply_operator(
&Operator::LessThan,
&Value::Float(1.5),
&Value::Float(2.0)
));
// NaN inequality
assert!(apply_operator(
&Operator::NotEqual,
&Value::Float(f64::NAN),
&Value::Float(f64::NAN)
));
// Infinity comparison
assert!(apply_operator(
&Operator::LessThan,
&Value::Float(f64::NEG_INFINITY),
&Value::Float(0.0)
));
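The floating-point rules listed earlier in this section can be condensed into a small sketch. `floats_equal` here reuses the helper name mentioned above, but the body is an illustration, not the crate's code, and `apply_float` takes the operator as a string instead of the crate's `Operator` enum. Infinities need a special case because `INFINITY - INFINITY` is NaN.

```rust
use std::cmp::Ordering;

/// Epsilon-aware equality: |a - b| <= f64::EPSILON.
pub fn floats_equal(a: f64, b: f64) -> bool {
    if a.is_infinite() || b.is_infinite() {
        // inf - inf is NaN, so compare infinities exactly instead.
        return a == b;
    }
    // NaN falls through and fails this test, since comparisons with NaN are false.
    (a - b).abs() <= f64::EPSILON
}

/// Hypothetical operator dispatcher for float operands.
pub fn apply_float(op: &str, a: f64, b: f64) -> bool {
    let ord = a.partial_cmp(&b); // None whenever either side is NaN
    match op {
        "==" => floats_equal(a, b),
        "!=" => !floats_equal(a, b),
        "<" => ord == Some(Ordering::Less),
        ">" => ord == Some(Ordering::Greater),
        "<=" => matches!(ord, Some(Ordering::Less) | Some(Ordering::Equal)),
        ">=" => matches!(ord, Some(Ordering::Greater) | Some(Ordering::Equal)),
        _ => false, // unknown operator in this sketch
    }
}
```

Because equality is tolerance-based while ordering uses `partial_cmp`, `1.0 == 1.0 + f64::EPSILON` holds even though `1.0 < 1.0 + f64::EPSILON` also holds.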
Evaluation Algorithm
The evaluator uses a depth-first hierarchical algorithm:
pub fn evaluate_rules(
rules: &[MagicRule],
buffer: &[u8],
context: &mut EvaluationContext,
) -> Result<Vec<RuleMatch>, LibmagicError>
Algorithm:
- For each root rule:
  - Resolve offset from buffer
  - Read value at offset according to type
  - Apply operator to compare actual vs expected
  - If match: add to results, recursively evaluate children
  - If no match: skip children, continue to next rule
- Child rules inherit context from the parent match
- Results accumulate hierarchically (parent message + child details)
Hierarchical Processing
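The depth-first traversal can be sketched with a deliberately tiny rule model: each hypothetical `Rule` just tests "byte at `offset` equals `expected`", standing in for the full offset/type/operator pipeline. `Rule`, `Hit`, and `evaluate` are illustrative names; relative-offset context passing is omitted, and applying `stop_at_first_match` only to the root list is an assumption.

```rust
/// Hypothetical, heavily simplified rule: "byte at `offset` equals `expected`".
pub struct Rule {
    pub offset: usize,
    pub expected: u8,
    pub message: &'static str,
    pub children: Vec<Rule>,
}

/// Hypothetical match record (cf. RuleMatch's message/offset/level fields).
#[derive(Debug, PartialEq)]
pub struct Hit {
    pub message: &'static str,
    pub offset: usize,
    pub level: u32,
}

/// Depth-first evaluation: a rule's children are visited only if the rule matched.
pub fn evaluate(
    rules: &[Rule],
    buffer: &[u8],
    level: u32,
    max_depth: u32,
    stop_at_first_match: bool,
    out: &mut Vec<Hit>,
) {
    if level > max_depth {
        return; // recursion limit: stop descending; the caller continues with siblings
    }
    for rule in rules {
        // Out-of-bounds reads count as "no match"; move on to the next rule.
        let actual = match buffer.get(rule.offset) {
            Some(&b) => b,
            None => continue,
        };
        if actual != rule.expected {
            continue; // no match: this rule's children are skipped entirely
        }
        out.push(Hit { message: rule.message, offset: rule.offset, level });
        // Children are evaluated in full; only the root list honors stop_at_first_match.
        evaluate(&rule.children, buffer, level + 1, max_depth, false, out);
        if stop_at_first_match {
            break;
        }
    }
}
```

Applied to a cut-down version of the ELF rules from the usage example, a buffer with byte 2 at offset 4 produces the root message followed by the "64-bit" child at level 1.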
Configuration
Evaluation behavior is controlled via EvaluationConfig:
pub struct EvaluationConfig {
/// Maximum recursion depth for nested rules (default: 20)
pub max_recursion_depth: u32,
/// Maximum string length to read (default: 8192)
pub max_string_length: usize,
/// Stop at first match or continue for all matches (default: true)
pub stop_at_first_match: bool,
/// Enable MIME type mapping in results (default: false)
pub enable_mime_types: bool,
/// Timeout for evaluation in milliseconds (default: None)
pub timeout_ms: Option<u64>,
}
Preset Configurations:
// Default balanced configuration
let config = EvaluationConfig::default();
// Optimized for speed
let config = EvaluationConfig::performance();
// Find all matches with full details
let config = EvaluationConfig::comprehensive();
Safety Features
Memory Safety
- Bounds checking: All buffer access is validated before reading
- Integer overflow protection: Safe arithmetic using checked_* and saturating_* methods
- Resource limits: Configurable limits prevent resource exhaustion
Error Handling
The evaluator uses graceful degradation:
- Invalid offsets: Skip rule, continue with others
- Type mismatches: Skip rule, continue with others
- Timeout exceeded: Return error (partial results are not preserved)
- Recursion limit: Stop descent, continue siblings
pub enum EvaluationError {
BufferOverrun { offset: usize },
InvalidOffset { offset: i64 },
UnsupportedType { type_name: String },
RecursionLimitExceeded { depth: u32 },
StringLengthExceeded { length: usize, max_length: usize },
InvalidStringEncoding { offset: usize },
Timeout { timeout_ms: u64 },
TypeReadError(TypeReadError),
}
Timeout Protection
// With a 5-second timeout
let config = EvaluationConfig {
timeout_ms: Some(5000),
..Default::default()
};
let result = evaluate_rules_with_config(&rules, &buffer, &config)?;
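One way to enforce `timeout_ms` is a deadline checked between units of work, discarding partial results on expiry as described under Error Handling. This is a sketch using `std::time::Instant`; `Deadline` and `run_with_deadline` are hypothetical, and how often the real evaluator checks the clock is not documented on this page.

```rust
use std::time::{Duration, Instant};

/// Hypothetical deadline; None means "no timeout", mirroring timeout_ms: Option<u64>.
pub struct Deadline {
    end: Option<Instant>,
}

impl Deadline {
    pub fn new(timeout_ms: Option<u64>) -> Self {
        // checked_add avoids a panic for absurdly large timeouts (treated as "no deadline").
        let end = timeout_ms.and_then(|ms| Instant::now().checked_add(Duration::from_millis(ms)));
        Self { end }
    }

    pub fn check(&self) -> Result<(), &'static str> {
        match self.end {
            Some(end) if Instant::now() >= end => Err("timeout"),
            _ => Ok(()),
        }
    }
}

/// Run `steps` units of work, checking the deadline before each one.
/// On timeout the partial results are dropped and only the error is returned.
pub fn run_with_deadline(
    steps: usize,
    timeout_ms: Option<u64>,
    work: impl Fn(usize) -> u64,
) -> Result<Vec<u64>, &'static str> {
    let deadline = Deadline::new(timeout_ms);
    let mut results = Vec::new();
    for i in 0..steps {
        deadline.check()?;
        results.push(work(i));
    }
    Ok(results)
}
```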
API Reference
Primary Functions
/// Evaluate rules with context for recursion tracking
pub fn evaluate_rules(
rules: &[MagicRule],
buffer: &[u8],
context: &mut EvaluationContext,
) -> Result<Vec<RuleMatch>, LibmagicError>;
/// Evaluate rules with custom configuration (creates context internally)
pub fn evaluate_rules_with_config(
rules: &[MagicRule],
buffer: &[u8],
config: &EvaluationConfig,
) -> Result<Vec<RuleMatch>, LibmagicError>;
/// Evaluate a single rule (used internally and for testing)
pub fn evaluate_single_rule(
rule: &MagicRule,
buffer: &[u8],
) -> Result<Option<(usize, Value)>, LibmagicError>;
Usage Example
use libmagic_rs::{evaluate_rules_with_config, EvaluationConfig};
use libmagic_rs::parser::parse_text_magic_file;
// Parse magic rules
let magic_content = r#"
0 string \x7fELF ELF executable
>4 byte 1 32-bit
>4 byte 2 64-bit
"#;
let rules = parse_text_magic_file(magic_content)?;
// Read target file
let buffer = std::fs::read("sample.bin")?;
// Evaluate with the default config (evaluate_rules itself requires an EvaluationContext)
let matches = evaluate_rules_with_config(&rules, &buffer, &EvaluationConfig::default())?;
for m in matches {
println!("Match at offset {}: {}", m.offset, m.message);
}
Example with comparison operators (v0.2.0+):
use libmagic_rs::{evaluate_rules_with_config, EvaluationConfig};
use libmagic_rs::parser::parse_text_magic_file;
// Parse magic rule with comparison operator
let magic_content = r#"
0 leshort <100 Small value detected
0 leshort >=1000 Large value detected
"#;
let rules = parse_text_magic_file(magic_content)?;
let buffer = vec![0x0A, 0x00]; // Little-endian 10
let matches = evaluate_rules_with_config(&rules, &buffer, &EvaluationConfig::default())?;
// Matches first rule (<100)
assert_eq!(matches[0].message, "Small value detected");
Example with floating-point types:
use libmagic_rs::{evaluate_rules_with_config, EvaluationConfig};
use libmagic_rs::parser::parse_text_magic_file;
// Parse magic rule with float type
let magic_content = r#"
0 lefloat 3.14159 Pi constant detected
0 bedouble >100.0 Large double value
"#;
let rules = parse_text_magic_file(magic_content)?;
// IEEE 754 little-endian representation of 3.14159f32
let buffer = vec![0xd0, 0x0f, 0x49, 0x40];
let matches = evaluate_rules_with_config(&rules, &buffer, &EvaluationConfig::default())?;
assert_eq!(matches[0].message, "Pi constant detected");
Example with pstring types:
use libmagic_rs::{evaluate_rules_with_config, EvaluationConfig};
use libmagic_rs::parser::parse_text_magic_file;
let config = EvaluationConfig::default();
// Parse magic rules with pstring variants; the test value is the string content, not the length prefix
let magic_content = r#"
0 pstring/B MAGIC Pascal string (1-byte prefix)
0 pstring/H =MAGIC Pascal string (2-byte BE prefix)
0 pstring/h =MAGIC Pascal string (2-byte LE prefix)
0 pstring/L =MAGIC Pascal string (4-byte BE prefix)
0 pstring/l =MAGIC Pascal string (4-byte LE prefix)
"#;
let rules = parse_text_magic_file(magic_content)?;
// 1-byte prefix: length=5, then "MAGIC"
let buffer: &[u8] = b"\x05MAGIC";
let matches = evaluate_rules_with_config(&rules, buffer, &config)?;
assert_eq!(matches[0].message, "Pascal string (1-byte prefix)");
// 2-byte big-endian prefix with /J flag: stored length 7 (includes the 2-byte prefix), effective content 5 bytes
let magic_content_j = r#"
0 pstring/HJ =MAGIC JPEG-style pstring with self-inclusive length
"#;
let rules_j = parse_text_magic_file(magic_content_j)?;
let buffer_j: &[u8] = b"\x00\x07MAGIC"; // 2-byte BE prefix: value 7, minus 2 = 5 bytes of content
let matches_j = evaluate_rules_with_config(&rules_j, buffer_j, &config)?;
assert_eq!(matches_j[0].message, "JPEG-style pstring with self-inclusive length");
Implementation Status
- Basic evaluation engine structure
- Offset resolution (absolute, relative, from-end)
- Type reading with endianness support (Byte, Short, Long, Quad, Float, Double, Date, QDate, String, PString with 1/2/4-byte prefixes)
- Operator application (Equal, NotEqual, LessThan, GreaterThan, LessEqual, GreaterEqual, BitwiseAnd, BitwiseAndMask)
- Hierarchical rule processing with child evaluation
- Error handling with graceful degradation
- Timeout protection
- Recursion depth limiting
- Comprehensive test coverage (150+ tests)
- Indirect offset support (pointer dereferencing)
- Regex type support
- Performance optimizations (rule ordering, caching)
Performance Considerations
Lazy Evaluation
- Parent-first: Only evaluate children if parent matches
- Early termination: Stop on first match when configured
- Skip on error: Continue evaluation after non-fatal errors
Memory Efficiency
- Memory mapping: Files accessed via mmap, not loaded entirely
- Zero-copy reads: Slice references where possible
- Bounded strings: String reads limited to prevent memory exhaustion