Parser Doubled Operator Rejection Pattern#
The Parser Doubled Operator Rejection Pattern is a validation technique in the libmagic-rs parser that prevents silent misparsing of malformed operator syntax by explicitly rejecting doubled operator sequences like &&, ===, and similar invalid token combinations. Implemented in the parse_operator function of src/parser/grammar.rs, this pattern uses manual lookahead checking via .starts_with() to detect when additional operator characters follow a successfully parsed operator token, returning an error rather than allowing the parser to continue with incorrect tokenization.
The pattern emerged from the need to prevent ambiguous tokenization in the magic rule format, where operators like & (bitwise AND) and = (equality) must be distinguished from their doubled variants that might appear due to typographical errors or confusion with other programming language syntax (e.g., && for logical AND in C, === for strict equality in JavaScript). Without these guards, input like 0 uquad && 0xff could be silently misparsed: the first & consumed as a standalone operator, the second & parsed separately, and 0xff treated as the comparison value. The result is a syntactically valid but semantically incorrect rule.
This architectural pattern represents reusable knowledge for parser validation: when the same token type can appear in multiple parsing contexts, all code paths must implement consistent validation guards to prevent partial matching from causing silent correctness bugs. The pattern is particularly critical in domain-specific languages like the magic file format, where subtle parsing differences can cause incorrect file type identification that is difficult to debug.
Problem: Silent Misparsing Through Partial Token Matching#
The Misparsing Scenario#
The magic rule format follows the syntax:
offset type [operator] [mask] value message
For example:
- Valid: 0 long = 0xcafebabe Java class file
- Valid: 4 byte & 0x80 (compressed)
- Invalid: 0 long === 0xcafebabe (triple equals not supported)
- Invalid: 4 byte && 0x80 (double ampersand not supported)
Without explicit guards, a naive parser implementation could partially match invalid sequences:
Example 1: Triple Equals (===)
- Input: 0 long === 0xcafebabe
- Without guard: Parser matches ==, leaves = 0xcafebabe as remaining input
- Result: The remaining = could be misinterpreted as a new token or silently consume part of the value
- Expected: Parse error rejecting === as invalid operator syntax
Example 2: Double Ampersand (&&)
- Input: 4 byte && 0x80
- Without guard: Parser matches &, leaves & 0x80 as remaining input
- Result: Second & parsed as separate standalone operator, 0x80 becomes value
- Expected: Parse error rejecting && as invalid operator syntax
These silent failures are particularly dangerous in file format detection because the parser produces valid AST structures that lead to incorrect file type identification, making bugs difficult to trace back to the parsing stage.
Why Partial Matching Occurs#
The parser uses sequential if let checks rather than nom's alt() combinator to maintain explicit precedence control. This approach requires manual validation because each operator check consumes only the characters it recognizes:
// Without guard - VULNERABLE TO MISPARSING
if let Ok((remaining, _)) = tag("&")(input) {
    return Ok((remaining, Operator::BitwiseAnd));
}
// If input is "&&0xff", this matches "&", leaving "&0xff" unparsed
The remaining input &0xff then continues through the parser, potentially being processed again as a separate operator, creating incorrect rule structure.
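The partial match can be reproduced without nom. The following is a minimal std-only sketch, not the project's code: `parse_and_unguarded` is a hypothetical name, the one-variant `Operator` enum is a stand-in, and `str::strip_prefix` plays the role of `tag("&")`.

```rust
// Stand-in for the project's Operator enum (one variant is enough here).
#[derive(Debug, PartialEq)]
enum Operator {
    BitwiseAnd,
}

/// Hypothetical unguarded matcher: consumes exactly one '&' and hands back
/// whatever follows, without inspecting it.
fn parse_and_unguarded(input: &str) -> Option<(&str, Operator)> {
    // `strip_prefix` stands in for nom's `tag("&")` in this sketch.
    let remaining = input.strip_prefix('&')?;
    Some((remaining, Operator::BitwiseAnd))
}
```

Calling `parse_and_unguarded("&&0xff")` succeeds and returns `"&0xff"` as the remaining input, which is exactly the leftover that a later parsing step could misread.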
Solution: Manual Lookahead Guards#
Implementation Pattern#
The doubled operator rejection pattern adds explicit validation after successful operator matching:
if let Ok((remaining, _)) = tag::<&str, &str, nom::error::Error<&str>>("&")(input) {
    // Check that we don't have another '&' following (to reject "&&")
    if remaining.starts_with('&') {
        return Err(nom::Err::Error(nom::error::Error::new(
            input,
            nom::error::ErrorKind::Tag,
        )));
    }
    let (remaining, _) = multispace0(remaining)?;
    return Ok((remaining, Operator::BitwiseAnd));
}
This implementation:
- Matches the valid operator using tag("&") to consume a single & character
- Checks remaining input via remaining.starts_with('&') to detect doubled operators
- Returns error immediately if invalid sequence detected, preventing misparsing
- Continues normally if validation passes, consuming optional whitespace
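The four steps above can be sketched without nom as follows. This is an illustration under stated assumptions rather than the code in grammar.rs: `parse_and_guarded` and `GuardError` are hypothetical names, `strip_prefix` stands in for `tag("&")`, and `trim_start` approximates `multispace0` (it strips any Unicode whitespace, which is slightly broader).

```rust
// Stand-in for the project's Operator enum.
#[derive(Debug, PartialEq)]
enum Operator {
    BitwiseAnd,
}

/// Hypothetical error type; like the nom error in the source, it carries the
/// original input so the caller can report where parsing failed.
#[derive(Debug, PartialEq)]
struct GuardError<'a> {
    input: &'a str,
}

fn parse_and_guarded(input: &str) -> Result<(&str, Operator), GuardError<'_>> {
    // Step 1: match a single '&'.
    let remaining = input.strip_prefix('&').ok_or(GuardError { input })?;
    // Steps 2 and 3: look ahead and reject "&&" before anything else runs.
    if remaining.starts_with('&') {
        return Err(GuardError { input });
    }
    // Step 4: consume optional whitespace and return the operator.
    Ok((remaining.trim_start(), Operator::BitwiseAnd))
}
```

Note that the error reports the original `input`, not `remaining`, so the failure points at the start of the offending operator rather than at its second character.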
Complete Implementation in parse_operator#
The parse_operator function implements three doubled operator guards:
Guard 1: Reject === after matching ==
Lines 178-184:
if let Ok((remaining, _)) = tag::<&str, &str, nom::error::Error<&str>>("==")(input) {
    // Check that we don't have another '=' following (to reject "===")
    if remaining.starts_with('=') {
        return Err(nom::Err::Error(nom::error::Error::new(
            input,
            nom::error::ErrorKind::Tag,
        )));
    }
    let (remaining, _) = multispace0(remaining)?;
    return Ok((remaining, Operator::Equal));
}
Guard 2: Reject == after matching single =
Lines 209-219:
if let Ok((remaining, _)) = tag::<&str, &str, nom::error::Error<&str>>("=")(input) {
    // Check that we don't have another '=' following (to reject "==")
    if remaining.starts_with('=') {
        return Err(nom::Err::Error(nom::error::Error::new(
            input,
            nom::error::ErrorKind::Tag,
        )));
    }
    let (remaining, _) = multispace0(remaining)?;
    return Ok((remaining, Operator::Equal));
}
Guard 3: Reject && after matching single &
Lines 221-228:
if let Ok((remaining, _)) = tag::<&str, &str, nom::error::Error<&str>>("&")(input) {
    // Check that we don't have another '&' following (to reject "&&")
    if remaining.starts_with('&') {
        return Err(nom::Err::Error(nom::error::Error::new(
            input,
            nom::error::ErrorKind::Tag,
        )));
    }
    let (remaining, _) = multispace0(remaining)?;
    return Ok((remaining, Operator::BitwiseAnd));
}
Test Coverage#
The test test_parse_operator_invalid_input explicitly verifies the guards work correctly:
#[test]
fn test_parse_operator_invalid_input() {
    // Should fail on invalid operators
    assert!(parse_operator("").is_err());
    assert!(parse_operator("abc").is_err());
    assert!(parse_operator("123").is_err());
    assert!(parse_operator("!").is_err());
    assert!(parse_operator("===").is_err()); // Too many equals
    assert!(parse_operator("&&").is_err()); // Double ampersand not supported
}
This test confirms that the guards reject doubled operators as expected. It covers only the failure cases; it does not itself exercise valid sequences like == and single =.
Relationship to Token Ordering Pattern#
The doubled operator rejection pattern works in conjunction with the token ordering pattern, which requires parsing longer operators before shorter prefixes:
Token Ordering Requirements:
- Parse == before single =
- Parse != and <> before checking for invalid sequences
- Parse <= and >= before single < and >
This ordering was implemented in PR #104 when comparison operators were added. The pattern is:
Parsing Sequence (lines 173-248):
1. Check for "==" (two-character valid operator)
- Guard: reject if "===" detected
2. Check for "!=" and "<>" (two-character valid operators)
3. Check for "<=" and ">=" (two-character valid operators)
4. Check for single "<" and ">" (one-character valid operators)
5. Check for single "=" (one-character valid operator)
- Guard: reject if "==" detected
6. Check for "&" (one-character valid operator)
- Guard: reject if "&&" detected
Without proper ordering, byte<=255 would parse as byte < =255, causing silent misparsing where = is treated as the operand instead of recognizing <= as a single operator.
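The effect of ordering can be shown with a small table-driven matcher. This is a std-only sketch under assumptions: `match_first` and the two tables are hypothetical and do not mirror how parse_operator is structured. Only the four comparison operators are included.

```rust
// Stand-in for the comparison variants of the project's Operator enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
}

/// Try each (token, operator) pair in table order; the first prefix match wins.
fn match_first<'a>(input: &'a str, table: &[(&str, Operator)]) -> Option<(&'a str, Operator)> {
    table
        .iter()
        .find_map(|&(token, op)| input.strip_prefix(token).map(|rest| (rest, op)))
}

// Correct: two-character operators come before their one-character prefixes.
const LONGEST_FIRST: [(&str, Operator); 4] = [
    ("<=", Operator::LessEqual),
    (">=", Operator::GreaterEqual),
    ("<", Operator::LessThan),
    (">", Operator::GreaterThan),
];

// Incorrect: "<" shadows "<=", so "<=255" is split as "<" followed by "=255".
const SHORTEST_FIRST: [(&str, Operator); 4] = [
    ("<", Operator::LessThan),
    (">", Operator::GreaterThan),
    ("<=", Operator::LessEqual),
    (">=", Operator::GreaterEqual),
];
```

With `LONGEST_FIRST`, the input `"<=255"` yields `LessEqual` and leaves `"255"`. With `SHORTEST_FIRST`, the same input yields `LessThan` and leaves `"=255"`, which is the misparse described above.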
Multiple Parsing Paths Consideration#
parse_operator vs parse_type_and_operator#
Operators are parsed in two places:
- parse_operator - Standalone operator between type and value
- parse_type_and_operator - Attached operator like uquad&0xff
parse_type_and_operator uses a different approach: it employs char('&') from nom, which only matches a single character and cannot match doubled operators by design. Therefore, it does not require the same explicit guards:
// From parse_type_and_operator (lines 1576-1580)
let (input, attached_op) = opt(alt((
    // Parse &mask format
    map(pair(char('&'), parse_number), |(_, mask)| {
        Operator::BitwiseAndMask(mask.unsigned_abs())
    }),
    // Parse standalone & (for backward compatibility)
    map(char('&'), |_| Operator::BitwiseAnd),
)))(input)?;
The char('&') combinator inherently prevents matching && because it consumes exactly one character. This demonstrates an important principle: the need for doubled operator guards depends on the parsing approach used. Sequential if let checks with tag() require explicit guards, while nom combinators like char() provide implicit protection.
Architectural Lessons#
When to Apply This Pattern#
The doubled operator rejection pattern should be applied when:
- Multiple character sequences share a common prefix (e.g., =, ==, ===)
- Sequential matching is used instead of alternatives (manual if let instead of alt())
- Silent partial matching would produce valid but incorrect AST
- The same token appears in multiple parsing contexts (requires consistent validation across all paths)
Alternative Approaches Considered#
The parser could use nom's alt() combinator, but this was rejected because:
- Ordering without validation: alt() tries parsers in order but doesn't enforce semantic constraints
- Less explicit error handling: Harder to distinguish between "no operator found" vs "invalid operator syntax"
- Reduced flexibility: Manual checks allow custom validation logic beyond simple token matching
The manual lookahead approach provides explicit control over validation and error reporting at the cost of some code duplication.
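To illustrate the error-handling point, here is a sketch of how manual checks could separate "no operator found" from "invalid operator syntax". It is an assumption-laden example, not the project's behavior: the guards shown earlier return the same nom ErrorKind::Tag in both cases, and `OperatorError` and `parse_equal` are hypothetical names.

```rust
// Stand-in for the project's Operator enum.
#[derive(Debug, PartialEq)]
enum Operator {
    Equal,
}

/// Hypothetical error type that keeps the two failure modes apart.
#[derive(Debug, PartialEq)]
enum OperatorError {
    /// The input does not start with any known operator.
    NotAnOperator,
    /// The input starts with a recognizable but unsupported sequence.
    InvalidSyntax(&'static str),
}

fn parse_equal(input: &str) -> Result<(&str, Operator), OperatorError> {
    // Longer token first, then the lookahead guard against "===".
    if let Some(rest) = input.strip_prefix("==") {
        if rest.starts_with('=') {
            return Err(OperatorError::InvalidSyntax("==="));
        }
        return Ok((rest.trim_start(), Operator::Equal));
    }
    // Single '=' is only reached when the input does not start with "==".
    if let Some(rest) = input.strip_prefix('=') {
        return Ok((rest.trim_start(), Operator::Equal));
    }
    Err(OperatorError::NotAnOperator)
}
```

A caller can then report `===` as a syntax mistake while treating `abc` as "try the next parser", a distinction that is harder to express when every branch is folded into a single combinator result.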
Relationship to Parser-Evaluator Architecture#
The libmagic-rs parser follows the Parser-Evaluator Architecture, which consists of:
- Preprocessing - Remove comments, handle line continuations
- Parsing - Convert lines to AST structures
- Hierarchy building - Construct parent-child relationships from indentation
- Evaluation - Execute rules against file buffers
The doubled operator rejection pattern operates in the parsing stage, ensuring that only valid operators reach the AST. The Operator enum defines all valid operator variants:
pub enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
    BitwiseAndMask(u64),
    // No variants for && or === because they are invalid
}
By rejecting invalid operator syntax at parse time, the pattern ensures the evaluator never encounters malformed operators in the AST, maintaining clear separation between parsing validation and evaluation logic.
Historical Context#
Evolution of Operator Support#
PR #4 (merged October 2, 2025) introduced the initial parser infrastructure with basic operator parsing, including:
- Equality operators (=, ==)
- Inequality operators (!=, <>)
- Bitwise AND (&)
- "Rejects ambiguous or unsupported sequences" through "careful precedence and whitespace handling"
PR #104 (merged March 1, 2026) added comparison operators:
- LessThan (<)
- GreaterThan (>)
- LessEqual (<=)
- GreaterEqual (>=)
This PR also included critical error handling fixes that addressed silent misparsing by replacing catch-all error patterns with explicit error variant matching. The doubled operator rejection guards were present from the initial implementation in PR #4, suggesting they were recognized as essential from the start of operator parsing development.
No Specific Bug Report Found#
Research did not find a specific issue or PR explicitly titled "fix doubled operator parsing bug." This suggests the pattern was implemented proactively as a defensive validation technique rather than in response to a discovered bug. The test case explicitly checking === and && indicates the developers anticipated these error patterns during initial implementation.
Usage Examples#
Valid Operator Syntax#
From test cases and builtin rules:
# Equality (both forms accepted)
0 belong 0x7f454c46 ELF 32-bit
0 belong =0x7f454c46 ELF 32-bit
0 belong ==0x7f454c46 ELF 32-bit
# Inequality
>4 byte !=0 Non-zero
>4 byte <>0 Non-zero (alternate syntax)
# Comparison
0 leshort <100 Small value
0 leshort >=1000 Large value
# Bitwise AND
4 byte &0x80 High bit set
# Bitwise AND with mask
0 lelong&0xf0000000 0x10000000 MIPS-II executable
Invalid Operator Syntax (Rejected)#
These inputs are explicitly rejected by the doubled operator guards:
# Triple equals - rejected by === guard
0 long ===0xcafebabe INVALID
# Double ampersand - rejected by && guard
4 byte &&0x80 INVALID
# Equals forms side by side: = and == are accepted, === is rejected by the === guard
0 long =0x0 Valid
0 long ==0x0 Valid
0 long ===0x0 INVALID
Related Patterns#
Enum Extension and Exhaustive Match Synchronization#
When extending the Operator enum, developers must:
- Add new variants to src/parser/ast.rs
- Update parse_operator token ordering and guards in src/parser/grammar.rs
- Modify evaluation logic in evaluator module
- Update strength scoring
- Update build-time serialization in build.rs and src/build_helpers.rs
- Add property test strategies
This synchronization ensures new operators receive proper validation guards and don't introduce silent misparsing bugs.
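Part of this synchronization can be delegated to the compiler through exhaustive matching. The sketch below is illustrative only: the enum mirrors the variants listed earlier, and `symbol` is a hypothetical helper, not a function from the project.

```rust
// Mirrors the variants of the Operator enum shown in this document.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
    BitwiseAndMask(u64),
}

/// Hypothetical helper mapping each operator to its canonical token.
/// There is deliberately no `_ =>` arm: adding a variant to `Operator`
/// turns this match into a compile error until the new case is handled.
fn symbol(op: Operator) -> &'static str {
    match op {
        Operator::Equal => "=",
        Operator::NotEqual => "!=",
        Operator::LessThan => "<",
        Operator::GreaterThan => ">",
        Operator::LessEqual => "<=",
        Operator::GreaterEqual => ">=",
        Operator::BitwiseAnd => "&",
        Operator::BitwiseAndMask(_) => "&",
    }
}
```

The compiler can flag a missing match arm, but it cannot flag a missing lookahead guard or a wrong token order in the parser, so those steps of the checklist still need manual review and tests.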
Parser-Evaluator Architecture#
The separation between parsing and evaluation ensures validation happens at parse time. The doubled operator rejection pattern is a parser responsibility, producing a clean AST that the evaluator can trust contains only valid operators.
Relevant Code Files#
| File Path | Description | Key Functions/Types |
|---|---|---|
| src/parser/grammar.rs (2,448 lines) | Main parser implementation with operator validation | parse_operator (lines 173-248), parse_type_and_operator (lines 1549-1643), operator test cases (lines 891-899) |
| src/parser/ast.rs | AST type definitions | Operator enum (lines 106-117), MagicRule struct (lines 175-192) |
| src/parser/mod.rs | Parser module entry point | Three-stage parsing pipeline (lines 191-194) |
| src/builtin_rules.magic | Example magic rules | Valid operator usage patterns |
Related Topics#
- Type System And Operator Coverage - Overview of supported types and operators, token ordering requirements
- Enum Extension And Exhaustive Match Synchronization - Pattern for maintaining consistency when adding new operators
- Parser-Evaluator Architecture - High-level architecture separating parsing from evaluation
- Magic File Format Specification - The domain-specific language syntax being parsed
- Nom Parser Combinators - The underlying parsing library used for tokenization