Checked Arithmetic For Buffer Offset Safety#

Checked Arithmetic For Buffer Offset Safety is a memory safety pattern in the libmagic-rs project that prevents integer overflow vulnerabilities in buffer offset calculations during file type detection. The pattern mandates overflow-checked offset arithmetic together with Rust's safe slice access via .get() with range expressions for all multi-byte buffer reads in type-read functions, so that an overflowing offset calculation can neither panic nor slip past a bounds check.

This pattern forms a critical component of libmagic-rs's defense-in-depth security architecture, which forbids unsafe code at the workspace level and requires that all library code handle errors gracefully without panicking. When processing potentially malicious file input, unchecked integer arithmetic in offset calculations lets attackers craft files that trigger overflow: the addition panics in debug builds, and in release builds the offset wraps to a small value that can pass a hand-written bounds check. Computing the end offset with checked_add() and reading through .get() with a range removes both failure modes: checked_add() returns None on overflow, and .get() returns None for any range that is inverted or extends past the end of the buffer.

The pattern applies to all type-read functions in src/evaluator/types.rs: read_byte, read_short, read_long, read_quad, and read_string. It represents durable architectural knowledge that must be followed by all current and future type readers to maintain the project's memory safety guarantees.

The Problem: Integer Overflow in Buffer Offset Arithmetic#

Buffer reads in file type detection require calculating end offsets from a starting position and type size. The natural expression offset + type_size poses a security risk when offset approaches the maximum value for usize. In Rust, integer overflow behavior differs between debug and release builds, creating two distinct failure modes that both violate libmagic-rs's safety requirements.

Debug Build Behavior#

In debug builds, Rust checks arithmetic operations for overflow and panics when overflow occurs. For a type-read function, this means:

// Vulnerable pattern (DO NOT USE)
let end_offset = offset + 2; // Panics in debug if offset > usize::MAX - 2
let bytes = buffer.get(offset..end_offset)?;

This violates libmagic-rs's no-panic requirement for library code. A panic converts a predictable error condition into an uncontrolled program termination, preventing graceful error handling and making the library unsuitable for production use in long-running services.

Release Build Behavior#

In release builds, where overflow checks are disabled by default, integer overflow wraps around modulo 2^N. For unsigned integers, usize::MAX + 1 becomes 0, and usize::MAX + N becomes N - 1. A wrapped end offset can defeat a hand-written bounds check:

// Example: usize is 64-bit, offset is usize::MAX (18,446,744,073,709,551,615)
let offset = usize::MAX;
let end_offset = offset + 2; // Wraps to 1 in release build

// The manual bounds check passes because end_offset (1) <= buffer.len()
if end_offset <= buffer.len() {
    // buffer[usize::MAX..1] has start > end: safe Rust panics here
    // instead of reading out of bounds
    let bytes = &buffer[offset..end_offset];
}

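An attacker who controls file content or magic rule parameters can craft inputs with carefully chosen large offsets to trigger this overflow. The wrapped end offset passes the manual bounds check, and the slice operation that follows panics. Because the project forbids unsafe code, the outcome is a crash (denial of service) rather than an out-of-bounds read; the same wrapped arithmetic feeding an unchecked access in unsafe code would be undefined behavior.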
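The wrap-around can be reproduced in any build profile with wrapping_add, which always has release-build semantics. The following is a minimal sketch for illustration only; the helper names are hypothetical and not part of libmagic-rs. It shows a naive end-offset check passing on the wrapped value, while .get() rejects the resulting inverted range.

```rust
/// Hypothetical helper: computes the end offset the way a release build
/// would, using explicit wrapping addition.
pub fn wrapped_end(offset: usize, size: usize) -> usize {
    offset.wrapping_add(size)
}

/// Hypothetical naive check that only compares the end offset to the length.
pub fn naive_check_passes(buffer: &[u8], offset: usize, size: usize) -> bool {
    wrapped_end(offset, size) <= buffer.len()
}

/// Safe access: `.get()` returns `None` when the range start exceeds its end
/// or when the end lies past the buffer.
pub fn safe_slice(buffer: &[u8], offset: usize, size: usize) -> Option<&[u8]> {
    buffer.get(offset..wrapped_end(offset, size))
}
```

With a four-byte buffer and offset usize::MAX, wrapped_end returns 1, the naive check passes, and safe_slice still returns None.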

Attack Vector#

In libmagic-rs, offsets can be influenced by:

File content: Indirect offsets computed from values read from files
Magic rule parameters: Offset specifications in custom magic database entries
Computed offsets: Results of offset calculations based on previous reads

A malicious actor could create a crafted file or magic rule that sets an offset near usize::MAX, causing subsequent arithmetic to overflow and bypass safety checks.
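To make the first and third vectors concrete, the sketch below resolves an indirect offset: a base value is read from the file and a rule-supplied displacement is added to it. The function name, the 32-bit little-endian pointer format, and the Option return type are assumptions made for illustration; they are not taken from the libmagic-rs source. Both inputs are attacker-influenced, so every addition is checked.

```rust
// In the prelude since the 2021 edition; imported for older editions.
use std::convert::TryFrom;

/// Hypothetical: read a little-endian u32 at `pointer_offset`, treat it as a
/// base offset, and add `displacement`. Returns `None` on any overflow or
/// out-of-bounds read instead of panicking or wrapping.
pub fn resolve_indirect(
    buffer: &[u8],
    pointer_offset: usize,
    displacement: usize,
) -> Option<usize> {
    // Checked end offset for the 4-byte pointer read.
    let end = pointer_offset.checked_add(4)?;
    let base = match buffer.get(pointer_offset..end)? {
        &[a, b, c, d] => u32::from_le_bytes([a, b, c, d]),
        _ => return None,
    };
    // The file-controlled base plus the rule-controlled displacement
    // is the addition an attacker would try to overflow.
    usize::try_from(base).ok()?.checked_add(displacement)
}
```

The resolved offset is only known not to have overflowed; it still has to pass the bounds check of whichever type-read function consumes it.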

The Solution: Range-Based Slice Access#

The solution combines overflow-aware computation of the end offset with safe slice access via the .get() method. Instead of adding to the offset with plain arithmetic and then checking bounds by hand, the pattern obtains the end offset from checked_add(), builds a range from it, and lets .get() validate that range against the buffer.

Core Pattern#

Two patterns have appeared in the type-read functions, an earlier direct form and the current explicit form:

Pattern 1: Direct Range Construction (used by read_short and read_long before PR #133)

pub fn read_short(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
    signed: bool,
) -> Result<Value, TypeReadError> {
    let bytes = buffer
        .get(offset..offset + 2) // Caution: this addition panics in debug builds and wraps in release builds
        .ok_or(TypeReadError::BufferOverrun {
            offset,
            buffer_len: buffer.len(),
        })?;

    // Endianness handling...
}

Pattern 2: Explicit Checked Addition (used by read_short, read_long, and read_quad)

pub fn read_quad(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
    signed: bool,
) -> Result<Value, TypeReadError> {
    let end = offset.checked_add(8).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;
    let bytes = buffer
        .get(offset..end)
        .ok_or(TypeReadError::BufferOverrun {
            offset,
            buffer_len: buffer.len(),
        })?;

    // Endianness handling...
}

The two patterns do not provide equivalent guarantees:

Pattern 1 evaluates offset + N with ordinary integer arithmetic before the range is built. With overflow checks enabled (the default in debug builds) an overflowing addition panics; with them disabled (the default in release builds) the sum wraps, producing a range whose start exceeds its end, which .get() rejects by returning None
Pattern 2 uses explicit checked_add() to detect overflow before range construction, so overflow becomes an error value in every build profile

Pattern 2 is preferred, and it is the only one of the two that upholds the no-panic requirement in debug builds. It also makes the overflow protection visible in the code and satisfies the arithmetic_side_effects lint, which warns about potentially overflowing operations.

The key safety properties of the checked pattern:

Overflow protection: checked_add() detects overflow explicitly and returns None instead of panicking or wrapping
Bounds validation: .get() checks whether the range is valid and within buffer bounds, returning None for invalid ranges
Error conversion: The pattern converts None to a structured TypeReadError::BufferOverrun error
No panics: All error conditions return Result types, enabling graceful error handling
No unsafe code: The pattern operates entirely within Rust's safe subset
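The properties above can be collected into one small helper. This is a sketch under stated assumptions: slice_at is a hypothetical name that does not appear in the documented source, and TypeReadError is reduced here to the single variant this article discusses.

```rust
/// Reduced stand-in for the project's error type (one variant only).
#[derive(Debug, PartialEq, Eq)]
pub enum TypeReadError {
    BufferOverrun { offset: usize, buffer_len: usize },
}

/// Hypothetical helper: returns `len` bytes starting at `offset`, or a
/// `BufferOverrun` error if the end offset overflows or exceeds the buffer.
pub fn slice_at(buffer: &[u8], offset: usize, len: usize) -> Result<&[u8], TypeReadError> {
    // Built lazily so the success path constructs no error value.
    let overrun = || TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    };
    // Stage 1: overflow check on the offset arithmetic.
    let end = offset.checked_add(len).ok_or_else(overrun)?;
    // Stage 2: bounds check on the resulting range.
    buffer.get(offset..end).ok_or_else(overrun)
}
```

A fixed-width reader could then call slice_at(buffer, offset, 2) and go straight to endianness handling. Whether such a helper is worth introducing is a design choice; the documented functions inline both stages.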

How Overflow Protection Works#

The two patterns respond to integer overflow through different mechanisms:

Plain Addition in the Range Expression (Pattern 1)

Rust's range expressions perform no arithmetic of their own. In offset..offset + N, the sum offset + N is evaluated as an ordinary addition before the range is constructed:

With overflow checks enabled (debug builds by default), an overflowing offset + N panics
With overflow checks disabled (release builds by default), the sum wraps to a small value
The wrapped value creates a range where start > end (since offset is near usize::MAX)
.get() rejects such an inverted range and returns None

Explicit Checked Addition (Pattern 2)

The checked_add() method explicitly checks for overflow:

offset.checked_add(N) returns Some(end) if the addition succeeds
Returns None if the addition would overflow
The overflow is caught before range construction
The explicit check makes the safety property visible in code and satisfies linter warnings
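The two stages of the checked pattern catch different inputs. The sketch below, whose Outcome enum and classify function are hypothetical names introduced for illustration, reports which stage rejects a given read. The documented readers map both rejection cases to the same BufferOverrun error.

```rust
/// Hypothetical classification of a read request of `size` bytes at `offset`.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// `offset + size` does not fit in `usize`; caught by `checked_add`.
    Overflow,
    /// The sum fits but the range ends past the buffer; caught by `.get()`.
    OutOfBounds,
    /// The range lies entirely within the buffer.
    InBounds,
}

pub fn classify(buffer_len: usize, offset: usize, size: usize) -> Outcome {
    match offset.checked_add(size) {
        None => Outcome::Overflow,
        Some(end) if end > buffer_len => Outcome::OutOfBounds,
        Some(_) => Outcome::InBounds,
    }
}
```

For an 8-byte buffer and a 2-byte read, offset 6 is in bounds, offset 7 is out of bounds, and offset usize::MAX - 1 overflows.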

Implementation in Type-Read Functions#

The pattern is implemented consistently across all type-read functions in src/evaluator/types.rs, with each function adapted to the specific type size and requirements.

read_byte: Single-Byte Access#

read_byte reads a single byte with optional signed/unsigned interpretation:

pub fn read_byte(buffer: &[u8], offset: usize, signed: bool) -> Result<Value, TypeReadError> {
    buffer
        .get(offset)
        .map(|&byte| {
            if signed {
                Value::Int(i64::from(byte as i8))
            } else {
                Value::Uint(u64::from(byte))
            }
        })
        .ok_or(TypeReadError::BufferOverrun {
            offset,
            buffer_len: buffer.len(),
        })
}

Single-byte access uses .get(offset) rather than a range, as no arithmetic is required. The bounds check is inherent in the .get() method.

read_short: 16-bit Access#

read_short reads two bytes with endianness handling:

pub fn read_short(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
    signed: bool,
) -> Result<Value, TypeReadError> {
    let end = offset.checked_add(2).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;
    let bytes = buffer
        .get(offset..end)
        .ok_or(TypeReadError::BufferOverrun {
            offset,
            buffer_len: buffer.len(),
        })?;

    let value = match endian {
        Endianness::Little => LittleEndian::read_u16(bytes),
        Endianness::Big => BigEndian::read_u16(bytes),
        Endianness::Native => NativeEndian::read_u16(bytes),
    };

    if signed {
        Ok(Value::Int(i64::from(value as i16)))
    } else {
        Ok(Value::Uint(u64::from(value)))
    }
}

The explicit checked_add(2) detects overflow before constructing the range, ensuring that offset > usize::MAX - 2 is caught and converted to a BufferOverrun error.
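The boundary behavior can be exercised with a self-contained variant of the reader. This sketch is not the project's code: it swaps the byteorder calls for the standard library's u16::from_le_bytes family so that it compiles without dependencies, and it uses reduced stand-ins for Value, Endianness, and TypeReadError.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Value {
    Uint(u64),
    Int(i64),
}

#[derive(Debug, PartialEq, Eq)]
pub enum TypeReadError {
    BufferOverrun { offset: usize, buffer_len: usize },
}

#[derive(Debug, Clone, Copy)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

/// Std-only sketch of a 16-bit read using the checked pattern.
pub fn read_short(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
    signed: bool,
) -> Result<Value, TypeReadError> {
    // checked_add catches overflow; .get() catches out-of-bounds ranges.
    let bytes = match offset.checked_add(2).and_then(|end| buffer.get(offset..end)) {
        Some(&[a, b]) => [a, b],
        _ => {
            return Err(TypeReadError::BufferOverrun {
                offset,
                buffer_len: buffer.len(),
            })
        }
    };
    let value = match endian {
        Endianness::Little => u16::from_le_bytes(bytes),
        Endianness::Big => u16::from_be_bytes(bytes),
        Endianness::Native => u16::from_ne_bytes(bytes),
    };
    if signed {
        // Reinterpret the 16 bits as two's complement, then widen.
        Ok(Value::Int(i64::from(value as i16)))
    } else {
        Ok(Value::Uint(u64::from(value)))
    }
}
```

Calling this with offset usize::MAX returns BufferOverrun in both debug and release builds, which is the property the unit tests for the real readers are meant to verify.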

read_long: 32-bit Access#

read_long follows the same pattern for four-byte reads:

pub fn read_long(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
    signed: bool,
) -> Result<Value, TypeReadError> {
    let end = offset.checked_add(4).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;
    let bytes = buffer
        .get(offset..end)
        .ok_or(TypeReadError::BufferOverrun {
            offset,
            buffer_len: buffer.len(),
        })?;

    let value = match endian {
        Endianness::Little => LittleEndian::read_u32(bytes),
        Endianness::Big => BigEndian::read_u32(bytes),
        Endianness::Native => NativeEndian::read_u32(bytes),
    };

    if signed {
        Ok(Value::Int(i64::from(value as i32)))
    } else {
        Ok(Value::Uint(u64::from(value)))
    }
}

read_quad: 64-bit Access#

read_quad reads eight bytes for 64-bit integer types:

pub fn read_quad(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
    signed: bool,
) -> Result<Value, TypeReadError> {
    let end = offset.checked_add(8).ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })?;
    let bytes = buffer
        .get(offset..end)
        .ok_or(TypeReadError::BufferOverrun {
            offset,
            buffer_len: buffer.len(),
        })?;

    let value = match endian {
        Endianness::Little => LittleEndian::read_u64(bytes),
        Endianness::Big => BigEndian::read_u64(bytes),
        Endianness::Native => NativeEndian::read_u64(bytes),
    };

    if signed {
        #[allow(clippy::cast_possible_wrap)]
        Ok(Value::Int(value as i64))
    } else {
        Ok(Value::Uint(value))
    }
}

The quad reader uses the explicit checked_add pattern consistently with read_short and read_long. This two-stage approach—explicit overflow check followed by bounds-checked slice access—makes the safety properties clear and satisfies the arithmetic_side_effects lint.

read_string: Variable-Length Access#

read_string uses a two-stage approach due to its variable-length nature:

pub fn read_string(
    buffer: &[u8],
    offset: usize,
    max_length: Option<usize>,
) -> Result<Value, TypeReadError> {
    // First: check if offset is within buffer bounds
    if offset >= buffer.len() {
        return Err(TypeReadError::BufferOverrun {
            offset,
            buffer_len: buffer.len(),
        });
    }

    // Second: slice from offset to end (safe because bounds pre-checked)
    let remaining_buffer = &buffer[offset..];

    // Find null terminator within max_length constraint
    let read_length = if let Some(max_len) = max_length {
        let search_len = std::cmp::min(max_len, remaining_buffer.len());
        memchr::memchr(0, &remaining_buffer[..search_len]).unwrap_or(search_len)
    } else {
        memchr::memchr(0, remaining_buffer).unwrap_or(remaining_buffer.len())
    };

    let string_bytes = &remaining_buffer[..read_length];
    let string_value = String::from_utf8_lossy(string_bytes).into_owned();

    Ok(Value::String(string_value))
}

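String reading pre-checks the initial offset, then slices the remaining buffer directly. The index expressions cannot panic: offset is known to be less than buffer.len(), and both search_len and read_length are bounded by remaining_buffer.len(). No offset addition takes place, so there is nothing to overflow. The memchr crate provides SIMD-accelerated null-terminator scanning within the validated buffer region.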
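The same behavior can be written without any index expressions, which may be convenient under the indexing_slicing lint. The sketch below is an alternative formulation, not the project's implementation: it replaces memchr with the standard library's slice split, and Value and TypeReadError are reduced stand-ins.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Value {
    String(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum TypeReadError {
    BufferOverrun { offset: usize, buffer_len: usize },
}

/// Std-only sketch: reads a NUL-terminated string using only `.get()`.
pub fn read_string(
    buffer: &[u8],
    offset: usize,
    max_length: Option<usize>,
) -> Result<Value, TypeReadError> {
    // `.get(offset..)` yields an empty slice when offset == len, so the
    // filter reproduces the `offset >= buffer.len()` rejection.
    let remaining = buffer
        .get(offset..)
        .filter(|rest| !rest.is_empty())
        .ok_or(TypeReadError::BufferOverrun {
            offset,
            buffer_len: buffer.len(),
        })?;

    // Clamp the search window to max_length when one is given.
    let window = match max_length {
        Some(max) => remaining.get(..max).unwrap_or(remaining),
        None => remaining,
    };

    // `split` always yields at least one piece: the bytes before the first NUL.
    let text = window.split(|&b| b == 0).next().unwrap_or(window);

    Ok(Value::String(String::from_utf8_lossy(text).into_owned()))
}
```

The trade-off is that the scan is no longer SIMD-accelerated, which may matter for long strings.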

Security Guarantees#

The checked arithmetic pattern provides multiple layers of security assurance, documented in the module-level security documentation:

Integer Overflow Protection#

The pattern prevents integer overflow in offset calculations through explicit checked_add() calls. An overflowing end offset is reported as BufferOverrun before any range is built, and .get() rejects ranges that extend past the buffer, converting potential vulnerabilities into predictable error conditions.

Memory Safety Without Unsafe Code#

All buffer access operates within Rust's safe subset. The workspace-level unsafe_code = "forbid" lint enforces this at compile time, making it impossible to bypass safety checks through unsafe blocks.

No-Panic Guarantee#

Error conditions return Result<Value, TypeReadError> rather than panicking. This is enforced by workspace lints: panic = "deny" and unwrap_used = "deny".

Consistent Error Reporting#

All bounds check failures return the same error structure:

TypeReadError::BufferOverrun {
    offset: usize, // The attempted offset
    buffer_len: usize, // The actual buffer length
}

This consistent error type enables uniform error handling throughout the evaluator, where BufferOverrun errors are treated as non-critical and allow evaluation to continue with remaining rules.

Graceful Degradation#

When a type-read function fails due to buffer overrun, the evaluator's error handling strategy treats this as a rule evaluation failure rather than a fatal error. File type detection continues with other rules, allowing the system to extract as much information as possible even when some rules fail.
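A minimal sketch of this strategy follows. The ByteRule type, the read_u8 helper, and the evaluate function are hypothetical and much simpler than the real evaluator; they only illustrate how a BufferOverrun from one rule is skipped so that later rules still run.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TypeReadError {
    BufferOverrun { offset: usize, buffer_len: usize },
}

/// Hypothetical rule: matches when the byte at `offset` equals `expected`.
pub struct ByteRule {
    pub offset: usize,
    pub expected: u8,
    pub label: &'static str,
}

fn read_u8(buffer: &[u8], offset: usize) -> Result<u8, TypeReadError> {
    buffer.get(offset).copied().ok_or(TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    })
}

/// Evaluates every rule; a failed read skips that rule only.
pub fn evaluate(buffer: &[u8], rules: &[ByteRule]) -> Vec<&'static str> {
    let mut matches = Vec::new();
    for rule in rules {
        match read_u8(buffer, rule.offset) {
            Ok(byte) if byte == rule.expected => matches.push(rule.label),
            Ok(_) => {}
            // Non-critical: the rule cannot apply to this buffer.
            Err(TypeReadError::BufferOverrun { .. }) => {}
        }
    }
    matches
}
```

A rule with an offset of usize::MAX placed first in the list does not prevent a later, valid rule from matching.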

Project Context: Memory Safety Policy#

The checked arithmetic pattern is one component of libmagic-rs's comprehensive memory safety policy, which establishes multiple defensive layers:

Workspace-Level Safety Enforcement#

The project's Cargo.toml configures workspace-wide lints that enforce safety requirements at compile time:

[workspace.lints.rust]
unsafe_code = "forbid" # Cannot be overridden at any scope
warnings = "deny" # Zero warnings policy

[workspace.lints.clippy]
indexing_slicing = "warn" # Discourages unchecked indexing
arithmetic_side_effects = "warn" # Flags overflow risks
panic = "deny" # Forbids panic calls
unwrap_used = "deny" # Forbids unwrap()

The forbid level is stronger than deny—it cannot be overridden with #[allow(unsafe_code)] at any inner scope, creating a compile-time barrier against introducing unsafe code.

Additional Safety Patterns#

The memory safety policy extends beyond checked arithmetic:

Bounds-checked buffer access: Fixed-width reads use .get() methods returning Option; read_string slices directly only after an explicit bounds pre-check
Result-based error handling: No unwrap() or expect() in library code
Type safety: Strong typing prevents incorrect interpretations of buffer data
Resource limits: Configurable timeouts and rule count limits prevent denial-of-service attacks

Historical Context#

The earliest comprehensive safety patterns were established in PR #4 (October 2025), which introduced overflow protection in parser functions and safe buffer helpers. The type-read functions have maintained these safety properties throughout the project's evolution, with the pattern codified as architectural knowledge for all future development.

Future Applicability#

The checked arithmetic pattern represents a durable architectural requirement that must be followed by all type readers. The current TypeKind enum supports Byte, Short, Long, Quad, and String types, with future extensions requiring additional readers.

Planned Type Readers#

Future type readers must follow the same pattern:

Floating-point (float): Would use checked_add(4) for IEEE 754 single precision
Double-precision (double): Would use checked_add(8) for IEEE 754 double precision
Date types: Variable size depending on format, would follow read_string pattern

Pattern Requirements#

Any new type reader must:

Use explicit checked_add() for offset arithmetic; do not write offset + N inside a range expression, because that addition panics on overflow in debug builds
Use .get() with range expressions for multi-byte access
Return TypeReadError::BufferOverrun on bounds check failure
Return Result<Value, TypeReadError> for error handling
Include documentation of security guarantees
Avoid unwrap(), expect(), or panic calls

The explicit checked_add() pattern (Pattern 2) is preferred for clarity and linter compatibility, as demonstrated by read_quad, read_short, and read_long.
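As an illustration of these requirements, a single-precision float reader might look like the sketch below. It is speculative: read_float and the Value::Float variant are hypothetical and are not described in the current source, the types are reduced stand-ins, and the standard library's f32::from_le_bytes family is used in place of byteorder.

```rust
#[derive(Debug, PartialEq)]
pub enum Value {
    /// Hypothetical variant; the documented enum has no float type yet.
    Float(f64),
}

#[derive(Debug, PartialEq, Eq)]
pub enum TypeReadError {
    BufferOverrun { offset: usize, buffer_len: usize },
}

#[derive(Debug, Clone, Copy)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

/// Hypothetical reader for a 4-byte IEEE 754 single-precision value.
pub fn read_float(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
) -> Result<Value, TypeReadError> {
    let overrun = TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    };
    // Requirement: explicit checked_add for the end offset.
    let end = match offset.checked_add(4) {
        Some(end) => end,
        None => return Err(overrun),
    };
    // Requirement: .get() with a range, no direct indexing.
    let bytes = match buffer.get(offset..end) {
        Some(&[a, b, c, d]) => [a, b, c, d],
        _ => return Err(overrun),
    };
    let value = match endian {
        Endianness::Little => f32::from_le_bytes(bytes),
        Endianness::Big => f32::from_be_bytes(bytes),
        Endianness::Native => f32::from_ne_bytes(bytes),
    };
    // Widening f32 to f64 is lossless.
    Ok(Value::Float(f64::from(value)))
}
```

A double-precision reader would differ only in the size passed to checked_add and in the byte-conversion calls.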

Enforcement#

The pattern is enforced through:

Code review: All type reader changes require review for safety properties
Linting: Workspace lints catch common violations (indexing_slicing, arithmetic_side_effects)
Testing: Unit tests verify overflow behavior and error handling
Documentation: Architectural knowledge base articles like this one

The workspace-level unsafe_code = "forbid" policy ensures that no future contributor can bypass these safety requirements through unsafe code.

Relevant Code Files#

File	Purpose	Key Elements
`src/evaluator/types.rs`	Type-read functions	`read_byte`, `read_short`, `read_long`, `read_quad`, `read_string`, `TypeReadError` enum
`Cargo.toml` (lines 32-140)	Workspace lint enforcement	`unsafe_code = "forbid"`, `indexing_slicing`, `arithmetic_side_effects`, `panic = "deny"`, `unwrap_used = "deny"`

Memory Safety And Unsafe Code Policy - Comprehensive memory safety requirements and enforcement
Evaluation Configuration And Resource Limits - Error handling strategy and graceful degradation
Overflow protection in parser functions (PR #4) - Historical context for safety pattern adoption