Parser Doubled Operator Rejection Pattern

The Parser Doubled Operator Rejection Pattern is a validation technique in the libmagic-rs parser that prevents silent misparsing of malformed operator syntax by explicitly rejecting doubled operator sequences like &&, ===, and similar invalid token combinations. Implemented in the parse_operator function of src/parser/grammar.rs, this pattern uses manual lookahead checking via .starts_with() to detect when additional operator characters follow a successfully parsed operator token, returning an error rather than allowing the parser to continue with incorrect tokenization.

The pattern emerged from the need to prevent ambiguous tokenization in the magic rule format. There, operators like & (bitwise AND) and = (equality) must be distinguished from doubled variants that might appear through typographical errors or confusion with other languages' syntax (e.g., && for logical AND in C, === for strict equality in JavaScript). Without these guards, input like 0 uquad && 0xff could be silently misparsed. The first & would be consumed as a standalone operator, the second & parsed separately, and 0xff treated as the comparison value. The result is a syntactically valid but semantically incorrect rule.

The pattern carries a lesson beyond this parser: when the same token can appear in multiple parsing contexts, every code path that accepts it must apply the same validation guard, or partial matching on one path can cause silent correctness bugs. This is particularly important in domain-specific languages such as the magic file format, where a subtle parsing difference produces an incorrect file type identification that is difficult to trace back to the parser.

Problem: Silent Misparsing Through Partial Token Matching

The Misparsing Scenario

The magic rule format follows the syntax:

offset type [operator] [mask] value message

For example:

Valid: 0 long = 0xcafebabe Java class file
Valid: 4 byte & 0x80 (compressed)
Invalid: 0 long === 0xcafebabe (triple equals not supported)
Invalid: 4 byte && 0x80 (double ampersand not supported)

Without explicit guards, a naive parser implementation could partially match invalid sequences:

Example 1: Triple Equals (===)

Input: 0 long === 0xcafebabe
Without guard: Parser matches ==, leaves = 0xcafebabe as remaining input
Result: The remaining = could be misinterpreted as a new token or silently consume part of the value
Expected: Parse error rejecting === as invalid operator syntax

Example 2: Double Ampersand (&&)

Input: 4 byte && 0x80
Without guard: Parser matches &, leaves & 0x80 as remaining input
Result: Second & parsed as separate standalone operator, 0x80 becomes value
Expected: Parse error rejecting && as invalid operator syntax

These silent failures are particularly dangerous in file format detection because the parser produces valid AST structures that lead to incorrect file type identification, making bugs difficult to trace back to the parsing stage.

Why Partial Matching Occurs

The parser uses sequential if let checks rather than nom's alt() combinator to maintain explicit precedence control. This approach requires manual validation because each operator check consumes only the characters it recognizes:

// Without guard - VULNERABLE TO MISPARSING
if let Ok((remaining, _)) = tag::<&str, &str, nom::error::Error<&str>>("&")(input) {
    return Ok((remaining, Operator::BitwiseAnd));
}
// If input is "&&0xff", this matches "&", leaving "&0xff" unparsed

The remaining input &0xff then continues through the parser, potentially being processed again as a separate operator, creating incorrect rule structure.
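The misparse can be reproduced without nom. The following is a dependency-free model, not libmagic-rs code. It uses std's str::strip_prefix in place of tag("&") and trim_start in place of multispace0, and the function names naive_bitwise_and and guarded_bitwise_and are hypothetical. It contrasts the unguarded match, which leaves &0xff behind, with a guarded version that refuses the input.

```rust
/// Unguarded matcher: mirrors `tag("&")` with no lookahead check.
/// Returns whatever input remains after a leading '&'.
fn naive_bitwise_and(input: &str) -> Option<&str> {
    input.strip_prefix('&')
}

/// Guarded matcher: same match, but a second '&' is an error.
/// Whitespace after the operator is skipped, roughly like `multispace0`.
fn guarded_bitwise_and(input: &str) -> Option<&str> {
    let rest = input.strip_prefix('&')?;
    if rest.starts_with('&') {
        // Reject "&&" outright instead of splitting it into two '&' tokens.
        return None;
    }
    Some(rest.trim_start())
}
```

For "&&0xff", naive_bitwise_and returns Some("&0xff"), handing the second & to whatever parses next. guarded_bitwise_and returns None.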

Solution: Manual Lookahead Guards

Implementation Pattern

The doubled operator rejection pattern adds explicit validation after successful operator matching:

if let Ok((remaining, _)) = tag::<&str, &str, nom::error::Error<&str>>("&")(input) {
    // Check that we don't have another '&' following (to reject "&&")
    if remaining.starts_with('&') {
        return Err(nom::Err::Error(nom::error::Error::new(
            input,
            nom::error::ErrorKind::Tag,
        )));
    }
    let (remaining, _) = multispace0(remaining)?;
    return Ok((remaining, Operator::BitwiseAnd));
}

This implementation:

Matches the valid operator using tag("&") to consume a single & character
Checks remaining input via remaining.starts_with('&') to detect doubled operators
Returns error immediately if invalid sequence detected, preventing misparsing
Continues normally if validation passes, consuming optional whitespace
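The three guards differ only in the token and the forbidden follow-up character, so the check can be factored into one helper. This is a minimal sketch that assumes a hypothetical tag_not_followed_by function and Guarded result type; neither exists in libmagic-rs. The nom version returns ErrorKind::Tag in both failure cases, but this sketch also distinguishes "operator absent" from "operator doubled".

```rust
/// Outcome of matching one operator token with a doubled-operator guard.
#[derive(Debug, PartialEq)]
enum Guarded<'a> {
    /// Token matched; remaining input with leading whitespace skipped.
    Matched(&'a str),
    /// Token matched but was immediately followed by the forbidden char.
    Doubled,
    /// Token not present at the start of the input.
    Absent,
}

/// Hypothetical helper: match `token`, then apply the lookahead guard.
fn tag_not_followed_by<'a>(input: &'a str, token: &str, forbidden: char) -> Guarded<'a> {
    match input.strip_prefix(token) {
        None => Guarded::Absent,
        // Lookahead guard: "==" + '=' or "&" + '&' is an error, not a match.
        Some(rest) if rest.starts_with(forbidden) => Guarded::Doubled,
        Some(rest) => Guarded::Matched(rest.trim_start()),
    }
}
```

On Absent, a caller would move on to the next token. On Doubled, it would stop with an error, so a doubled operator never falls through to a shorter match.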

Complete Implementation in parse_operator

The parse_operator function implements three doubled operator guards:

Guard 1: Reject === after matching ==
Lines 178-184:

if let Ok((remaining, _)) = tag::<&str, &str, nom::error::Error<&str>>("==")(input) {
    // Check that we don't have another '=' following (to reject "===")
    if remaining.starts_with('=') {
        return Err(nom::Err::Error(nom::error::Error::new(
            input,
            nom::error::ErrorKind::Tag,
        )));
    }
    let (remaining, _) = multispace0(remaining)?;
    return Ok((remaining, Operator::Equal));
}

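Guard 2: Reject == after matching single =
Lines 209-219 (because == is tried first, this guard is a backstop that only matters if the checks are ever reordered):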

if let Ok((remaining, _)) = tag::<&str, &str, nom::error::Error<&str>>("=")(input) {
    // Check that we don't have another '=' following (to reject "==")
    if remaining.starts_with('=') {
        return Err(nom::Err::Error(nom::error::Error::new(
            input,
            nom::error::ErrorKind::Tag,
        )));
    }
    let (remaining, _) = multispace0(remaining)?;
    return Ok((remaining, Operator::Equal));
}

Guard 3: Reject && after matching single &
Lines 221-228:

if let Ok((remaining, _)) = tag::<&str, &str, nom::error::Error<&str>>("&")(input) {
    // Check that we don't have another '&' following (to reject "&&")
    if remaining.starts_with('&') {
        return Err(nom::Err::Error(nom::error::Error::new(
            input,
            nom::error::ErrorKind::Tag,
        )));
    }
    let (remaining, _) = multispace0(remaining)?;
    return Ok((remaining, Operator::BitwiseAnd));
}
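The guards can be combined in the order given by the parsing sequence (lines 173-248) described later on this page. The following is a simplified, dependency-free model of that control flow, named parse_operator_sketch (hypothetical). It uses strip_prefix instead of nom's tag and returns Option instead of IResult. It also omits BitwiseAndMask, which comes from the attached-operator path.

```rust
/// Simplified stand-in for the AST enum. BitwiseAndMask is omitted because
/// it is produced by the attached-operator path, not by parse_operator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
}

/// Hypothetical model of parse_operator's control flow. Returns the operator
/// and the remaining input (leading whitespace skipped, like multispace0),
/// or None when the input is rejected.
fn parse_operator_sketch(input: &str) -> Option<(Operator, &str)> {
    // Step 1: "==", with Guard 1 rejecting "===".
    if let Some(rest) = input.strip_prefix("==") {
        if rest.starts_with('=') {
            return None;
        }
        return Some((Operator::Equal, rest.trim_start()));
    }
    // Steps 2-4: unguarded tokens, two-character forms before their
    // one-character prefixes.
    for (token, op) in [
        ("!=", Operator::NotEqual),
        ("<>", Operator::NotEqual),
        ("<=", Operator::LessEqual),
        (">=", Operator::GreaterEqual),
        ("<", Operator::LessThan),
        (">", Operator::GreaterThan),
    ] {
        if let Some(rest) = input.strip_prefix(token) {
            return Some((op, rest.trim_start()));
        }
    }
    // Step 5: single "=", with Guard 2 rejecting "==". Step 1 already
    // claims any input starting with "==", so this guard is a backstop.
    if let Some(rest) = input.strip_prefix('=') {
        if rest.starts_with('=') {
            return None;
        }
        return Some((Operator::Equal, rest.trim_start()));
    }
    // Step 6: "&", with Guard 3 rejecting "&&".
    if let Some(rest) = input.strip_prefix('&') {
        if rest.starts_with('&') {
            return None;
        }
        return Some((Operator::BitwiseAnd, rest.trim_start()));
    }
    None
}
```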

Test Coverage

The test test_parse_operator_invalid_input explicitly verifies the guards work correctly:

#[test]
fn test_parse_operator_invalid_input() {
    // Should fail on invalid operators
    assert!(parse_operator("").is_err());
    assert!(parse_operator("abc").is_err());
    assert!(parse_operator("123").is_err());
    assert!(parse_operator("!").is_err());
    assert!(parse_operator("===").is_err()); // Too many equals
    assert!(parse_operator("&&").is_err()); // Double ampersand not supported
}

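This test confirms that doubled operators are rejected. Because it asserts only failures, it does not by itself show that valid sequences such as == and single = still parse. That needs positive assertions, for example that parse_operator("==") succeeds with Operator::Equal.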

Relationship to Token Ordering Pattern

The doubled operator rejection pattern works in conjunction with the token ordering pattern, which requires parsing longer operators before shorter prefixes:

Token Ordering Requirements:

Parse == before single =
Parse <> before single < (and != as a two-character unit, since a lone ! is not an operator)
Parse <= and >= before single < and >

This ordering was implemented in PR #104 when comparison operators were added. The pattern is:

Parsing Sequence (lines 173-248):
1. Check for "==" (two-character valid operator)
   - Guard: reject if "===" detected
2. Check for "!=" and "<>" (two-character valid operators)
3. Check for "<=" and ">=" (two-character valid operators)
4. Check for single "<" and ">" (one-character valid operators)
5. Check for single "=" (one-character valid operator)
   - Guard: reject if "==" detected
6. Check for "&" (one-character valid operator)
   - Guard: reject if "&&" detected

Without proper ordering, byte<=255 would parse as byte < =255. The < check would succeed first and hand =255 to the value parser, instead of <= being recognized as a single operator.
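The ordering requirement can also be checked mechanically, which is useful when operators are added. This is a minimal sketch with two hypothetical names. PARSE_ORDER copies the token order from the parsing sequence above. is_prefix_safe fails if any token is tried before a longer token that starts with it. The check does not replace the guards: === is not a token at all, so no ordering can catch it.

```rust
/// Operator tokens in the order the parsing sequence above tries them.
const PARSE_ORDER: &[&str] = &["==", "!=", "<>", "<=", ">=", "<", ">", "=", "&"];

/// Returns false if an earlier token is a strict prefix of a later one,
/// because the earlier token would shadow it ("<" would swallow "<=").
fn is_prefix_safe(order: &[&str]) -> bool {
    for (i, earlier) in order.iter().enumerate() {
        for later in &order[i + 1..] {
            if later.len() > earlier.len() && later.starts_with(*earlier) {
                return false;
            }
        }
    }
    true
}
```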

Multiple Parsing Paths Consideration

parse_operator vs parse_type_and_operator

Operators are parsed in two places:

parse_operator - Standalone operator between type and value
parse_type_and_operator - Attached operator like uquad&0xff

parse_type_and_operator takes a different approach. It uses nom's char('&'), which matches exactly one character, and it has no explicit doubled-operator guard:

// From parse_type_and_operator (lines 1576-1580)
let (input, attached_op) = opt(alt((
    // Parse &mask format
    map(pair(char('&'), parse_number), |(_, mask)| {
        Operator::BitwiseAndMask(mask.unsigned_abs())
    }),
    // Parse standalone & (for backward compatibility)
    map(char('&'), |_| Operator::BitwiseAnd),
)))(input)?;

char('&') cannot consume && as a single token. However, matching exactly one character is also what tag("&") does, and that is the source of the partial-match problem, not a defense against it. Assuming parse_number rejects a leading &, input &&0xff plays out as follows. The mask branch fails, alt() backtracks, and the fallback char('&') branch succeeds, returning Operator::BitwiseAnd with &0xff still unconsumed. Whether this path is safe therefore depends on how its caller handles the leftover input, which the excerpt does not show. The general principle is that the need for a guard depends on what may follow the match, not on whether char() or tag() is used. Every path that accepts a single & should either apply the lookahead guard or be shown to reject the leftover & downstream.
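A dependency-free model of the alt() above makes the behavior on && concrete. This is a sketch under stated assumptions, not libmagic-rs code. parse_number here is a hypothetical stand-in that reads unsigned hex (0x...) or decimal digits and returns u64 directly instead of a signed value passed through unsigned_abs(). attached_operator and attached_operator_guarded are hypothetical names.

```rust
/// Simplified result of the attached-operator step.
#[derive(Debug, PartialEq)]
enum Attached {
    Mask(u64),
    BitwiseAnd,
}

/// Hypothetical stand-in for parse_number: unsigned hex or decimal.
fn parse_number(input: &str) -> Option<(u64, &str)> {
    let (digits, radix): (&str, u32) = match input.strip_prefix("0x") {
        Some(hex) => (hex, 16),
        None => (input, 10),
    };
    let end = digits
        .find(|c: char| !c.is_digit(radix))
        .unwrap_or(digits.len());
    if end == 0 {
        return None; // no digits, e.g. input starting with '&'
    }
    let value = u64::from_str_radix(&digits[..end], radix).ok()?;
    Some((value, &digits[end..]))
}

/// Mirrors opt(alt((pair(char('&'), parse_number), char('&')))):
/// try "&mask" first, then fall back to a bare '&'. Like the nom version,
/// it consumes exactly one '&' and never inspects the next character.
fn attached_operator(input: &str) -> (Option<Attached>, &str) {
    if let Some(rest) = input.strip_prefix('&') {
        if let Some((mask, after)) = parse_number(rest) {
            return (Some(Attached::Mask(mask)), after);
        }
        return (Some(Attached::BitwiseAnd), rest);
    }
    (None, input)
}

/// The same step with the lookahead guard added in front.
fn attached_operator_guarded(input: &str) -> Result<(Option<Attached>, &str), &str> {
    if input.starts_with("&&") {
        return Err(input); // reject instead of leaving "&..." behind
    }
    Ok(attached_operator(input))
}
```

In this model, attached_operator("&&0xff") yields BitwiseAnd with "&0xff" left over, while the guarded variant returns an error.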

Architectural Lessons

When to Apply This Pattern

The doubled operator rejection pattern should be applied when:

Multiple character sequences share a common prefix (e.g., =, ==, ===)
Sequential matching is used instead of alternatives (manual if let instead of alt())
Silent partial matching would produce valid but incorrect AST
The same token appears in multiple parsing contexts (requires consistent validation across all paths)

Alternative Approaches Considered

The parser could use nom's alt() combinator, but this was rejected because:

No semantic constraints: alt() tries parsers in order, but it does not by itself enforce constraints such as "not followed by another ="
Less explicit error handling: Harder to distinguish between "no operator found" vs "invalid operator syntax"
Reduced flexibility: Manual checks allow custom validation logic beyond simple token matching

The manual lookahead approach provides explicit control over validation and error reporting at the cost of some code duplication.
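One way to reduce that duplication while keeping ordering explicit is to drive the checks from a table. Each entry carries its own forbidden follow-up character, so a single guard implementation serves every operator. This is an illustrative alternative, not the libmagic-rs implementation. Op, OPERATORS, and parse_op_table are hypothetical names, and strip_prefix stands in for nom's tag.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
}

/// (token, operator, character that must not follow the token).
/// Longer tokens are listed before their prefixes.
const OPERATORS: &[(&str, Op, Option<char>)] = &[
    ("==", Op::Equal, Some('=')),
    ("!=", Op::NotEqual, None),
    ("<>", Op::NotEqual, None),
    ("<=", Op::LessEqual, None),
    (">=", Op::GreaterEqual, None),
    ("<", Op::LessThan, None),
    (">", Op::GreaterThan, None),
    ("=", Op::Equal, Some('=')),
    ("&", Op::BitwiseAnd, Some('&')),
];

fn parse_op_table(input: &str) -> Option<(Op, &str)> {
    for &(token, op, forbidden) in OPERATORS {
        if let Some(rest) = input.strip_prefix(token) {
            // One guard shared by every entry; a doubled token stops here
            // rather than falling through to a shorter entry.
            if forbidden.map_or(false, |c| rest.starts_with(c)) {
                return None;
            }
            return Some((op, rest.trim_start()));
        }
    }
    None
}
```

Adding an operator then means adding one row and deciding its forbidden character, rather than copying another guarded if let block.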

Relationship to Parser-Evaluator Architecture

The libmagic-rs parser follows the Parser-Evaluator Architecture, which consists of:

Preprocessing - Remove comments, handle line continuations
Parsing - Convert lines to AST structures
Hierarchy building - Construct parent-child relationships from indentation
Evaluation - Execute rules against file buffers

The doubled operator rejection pattern operates in the parsing stage, ensuring that only valid operators reach the AST. The Operator enum defines all valid operator variants:

pub enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
    BitwiseAndMask(u64),
    // No variants for && or === because they are invalid
}

By rejecting invalid operator syntax at parse time, the pattern ensures the evaluator never encounters malformed operators in the AST, maintaining clear separation between parsing validation and evaluation logic.
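On the evaluation side, the match over Operator only needs arms for the variants above. The following is a hypothetical sketch of such an evaluator step; the function name apply and its semantics are assumptions, not libmagic-rs code. BitwiseAnd follows the magic(5) meaning of &: every bit set in the operand must also be set in the value. BitwiseAndMask is assumed to mask the file value before an equality comparison, as in the lelong&0xf0000000 0x10000000 rule shown in the usage examples. Because the match has no wildcard arm, adding a variant to Operator makes this function fail to compile until the new case is handled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
    BitwiseAndMask(u64),
}

/// Hypothetical evaluator step: compare a value read from the file with
/// the rule's operand. No `_` arm, so the match stays exhaustive.
fn apply(op: Operator, file_value: u64, operand: u64) -> bool {
    match op {
        Operator::Equal => file_value == operand,
        Operator::NotEqual => file_value != operand,
        Operator::LessThan => file_value < operand,
        Operator::GreaterThan => file_value > operand,
        Operator::LessEqual => file_value <= operand,
        Operator::GreaterEqual => file_value >= operand,
        // Assumed magic(5) semantics: all operand bits set in the value.
        Operator::BitwiseAnd => (file_value & operand) == operand,
        // Assumed: mask the file value, then compare for equality.
        Operator::BitwiseAndMask(mask) => (file_value & mask) == operand,
    }
}
```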

Historical Context

Evolution of Operator Support

PR #4 (merged October 2, 2025) introduced the initial parser infrastructure with basic operator parsing, including:

Equality operators (=, ==)
Inequality operators (!=, <>)
Bitwise AND (&)
"Rejects ambiguous or unsupported sequences" through "careful precedence and whitespace handling"

PR #104 (merged March 1, 2026) added comparison operators:

LessThan (<)
GreaterThan (>)
LessEqual (<=)
GreaterEqual (>=)

This PR also included critical error handling fixes that addressed silent misparsing by replacing catch-all error patterns with explicit error variant matching. The doubled operator rejection guards were present from the initial implementation in PR #4, suggesting they were recognized as essential from the start of operator parsing development.

No Specific Bug Report Found

Research did not find a specific issue or PR explicitly titled "fix doubled operator parsing bug." This suggests the pattern was implemented proactively as a defensive validation technique rather than in response to a discovered bug. The test case explicitly checking === and && indicates the developers anticipated these error patterns during initial implementation.

Usage Examples

Valid Operator Syntax

From test cases and builtin rules:

# Equality (implicit, = and == forms all accepted)
0 belong 0x7f454c46 ELF 32-bit
0 belong =0x7f454c46 ELF 32-bit
0 belong ==0x7f454c46 ELF 32-bit

# Inequality
>4 byte !=0 Non-zero
>4 byte <>0 Non-zero (alternate syntax)

# Comparison
0 leshort <100 Small value
0 leshort >=1000 Large value

# Bitwise AND
4 byte &0x80 High bit set

# Bitwise AND with mask
0 lelong&0xf0000000 0x10000000 MIPS-II executable

Invalid Operator Syntax (Rejected)

These inputs are explicitly rejected by the doubled operator guards:

# Triple equals - rejected by === guard
0 long ===0xcafebabe INVALID

# Double ampersand - rejected by && guard 
4 byte &&0x80 INVALID

# Equality spellings - = and == are valid, === is rejected by the === guard
0 long =0x0 Valid
0 long ==0x0 Valid
0 long ===0x0 INVALID

Enum Extension and Exhaustive Match Synchronization

When extending the Operator enum, developers must:

Add new variants to src/parser/ast.rs
Update parse_operator token ordering and guards in src/parser/grammar.rs
Modify evaluation logic in evaluator module
Update strength scoring
Update build-time serialization in build.rs and src/build_helpers.rs
Add property test strategies

This synchronization ensures new operators receive proper validation guards and don't introduce silent misparsing bugs.
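The compiler can enforce part of this checklist. This is a minimal sketch that assumes a hypothetical doubling_guard function kept next to the parser. It uses an exhaustive match with no wildcard arm to record, for each variant, which character must not follow its token. Adding a variant to Operator then breaks the build until someone decides whether the new token needs a guard. The assignments mirror the three guards described on this page.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operator {
    Equal,
    NotEqual,
    LessThan,
    GreaterThan,
    LessEqual,
    GreaterEqual,
    BitwiseAnd,
    BitwiseAndMask(u64),
}

/// Hypothetical: the character that must not directly follow each
/// operator's token. No `_` arm, so new variants force a decision here.
fn doubling_guard(op: Operator) -> Option<char> {
    match op {
        // Guards 1 and 2: reject "===" and "==" after a single "=".
        Operator::Equal => Some('='),
        // Guard 3: reject "&&".
        Operator::BitwiseAnd => Some('&'),
        // No doubled-operator guard is described for these tokens.
        Operator::NotEqual
        | Operator::LessThan
        | Operator::GreaterThan
        | Operator::LessEqual
        | Operator::GreaterEqual => None,
        // Produced by the attached path (e.g. lelong&0xf0000000).
        Operator::BitwiseAndMask(_) => None,
    }
}
```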

Parser-Evaluator Architecture

The separation between parsing and evaluation ensures validation happens at parse time. The doubled operator rejection pattern is a parser responsibility, producing a clean AST that the evaluator can trust contains only valid operators.

Relevant Code Files

File Path	Description	Key Functions/Types
`src/parser/grammar.rs` (2,448 lines)	Main parser implementation with operator validation	`parse_operator` (lines 173-248), `parse_type_and_operator` (lines 1549-1643), operator test cases (lines 891-899)
`src/parser/ast.rs`	AST type definitions	`Operator` enum (lines 106-117), `MagicRule` struct (lines 175-192)
`src/parser/mod.rs`	Parser module entry point	Three-stage parsing pipeline (lines 191-194)
`src/builtin_rules.magic`	Example magic rules	Valid operator usage patterns

Type System And Operator Coverage - Overview of supported types and operators, token ordering requirements
Enum Extension And Exhaustive Match Synchronization - Pattern for maintaining consistency when adding new operators
Parser-Evaluator Architecture - High-level architecture separating parsing from evaluation
Magic File Format Specification - The domain-specific language syntax being parsed
Nom Parser Combinators - The underlying parsing library used for tokenization