Pstring Max_Length Bounds Validation Pattern
Type: Topic
Status: Published
Created: Mar 9, 2026
Updated: Mar 9, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

Pstring Max_Length Bounds Validation Pattern#

Lead Section#

The Pstring Max_Length Bounds Validation Pattern is a deliberate design choice in libmagic-rs: bounds for Pascal-style length-prefixed strings (pstring) are validated against the capped length rather than the raw length byte value. This keeps buffer access memory-safe while staying compatible with GNU file's handling of truncated or malformed data.

The core principle: when a max_length parameter is specified, bounds checking validates against min(length_byte, max_length) rather than the raw length byte. This allows magic rules to succeed even when the length byte indicates more data than actually exists in the buffer, provided the capped portion is available. For example, if a length byte claims 10 bytes but max_length caps the read to 3 and at least 3 bytes follow the length byte, the read succeeds and returns the first 3 bytes.

This design matches GNU libmagic's behavior and enables file type detection on truncated files, partially downloaded data, and files where length fields may reference more data than exists. The pattern represents reusable architectural knowledge for implementing robust string type handling in file format parsers.

Background: Pascal Strings (pstring)#

A PString (Pascal string) is a length-prefixed string format where the first byte contains the string length (0-255), followed by that many bytes of string data. Unlike C-style null-terminated strings, Pascal strings store length explicitly and can contain null bytes within the string data.

Structure:

  • Length byte: 1 byte indicating string length (0-255)
  • String data: The number of bytes specified by the length byte
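The layout above can be illustrated with a small encoder. This is a minimal sketch for illustration only; encode_pstring is a hypothetical helper and is not part of libmagic-rs.

```rust
/// Encode `data` as a Pascal string: one length byte followed by the data.
/// Hypothetical helper for illustration; not a libmagic-rs function.
/// Returns `None` when `data` is longer than 255 bytes, because a single
/// length byte cannot describe it.
fn encode_pstring(data: &[u8]) -> Option<Vec<u8>> {
    if data.len() > 255 {
        return None;
    }
    let mut out = Vec::with_capacity(1 + data.len());
    out.push(data.len() as u8); // length byte (0-255), checked above
    out.extend_from_slice(data); // string data, may contain null bytes
    Some(out)
}
```

Because the length is stored explicitly, a payload such as b"a\0b" encodes without any special handling of the embedded null byte.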

Example magic file syntax:

0 pstring =JPEG JPEG image (Pascal string)

The max_length parameter is an optional constraint that caps the maximum bytes read from the string data, regardless of what the length byte claims.

The Validation Pattern#

Design Principle#

The implementation validates pstring bounds using the minimum of two values:

  • length_byte: The value read from the first byte (0-255)
  • max_length: An optional parameter capping the maximum string read

In the code, string_length holds the value read from the length byte:

let actual_length = if let Some(max_len) = max_length {
    std::cmp::min(string_length, max_len)
} else {
    string_length
};

Rationale#

The code includes an explicit NOTE comment explaining this behavior:

"We intentionally validate bounds against actual_length (after capping), not against the raw length byte. This matches GNU file's behavior: if the length byte claims 10 bytes but max_length caps to 3 and 3+ bytes exist, the read succeeds. Validating against the raw length byte would reject valid magic rules where max_length is used precisely to handle truncated data."

This design handles real-world scenarios where:

  1. Files may be truncated or partially downloaded
  2. Length bytes may reference more data than actually exists
  3. Magic rules use max_length as a safety mechanism for potentially malicious or malformed length fields
  4. Format detection should succeed on incomplete but valid-enough data

Implementation Details#

The read_pstring() function in src/evaluator/types/string.rs follows this validation sequence:

  1. Read length byte: buffer.get(offset) reads the first byte at the given offset, returning TypeReadError::BufferOverrun if out of bounds

  2. Calculate capped length: Apply min(length_byte, max_length) to get actual_length

  3. Validate string data bounds: Use checked arithmetic to verify offset + 1 + actual_length ≤ buffer.len()

  4. Extract and convert: Safe slicing &buffer[string_start..string_end] followed by String::from_utf8_lossy() to handle invalid UTF-8
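The four steps above can be sketched as follows. This is an approximation under stated assumptions, not the code in src/evaluator/types/string.rs: the function signature, the String return type, and the fields of BufferOverrun are assumed here, and the enum is a local stand-in for the real TypeReadError.

```rust
/// Local stand-in for the shared error enum. The field names are an
/// assumption; the source elides them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TypeReadError {
    BufferOverrun { offset: usize, buffer_len: usize },
}

/// Sketch of the validation sequence. Signature and return type are assumed.
fn read_pstring(
    buffer: &[u8],
    offset: usize,
    max_length: Option<usize>,
) -> Result<String, TypeReadError> {
    let overrun = TypeReadError::BufferOverrun {
        offset,
        buffer_len: buffer.len(),
    };

    // Step 1: read the length byte; fail if the offset is out of bounds.
    let length_byte = *buffer.get(offset).ok_or(overrun)?;
    let string_length = usize::from(length_byte);

    // Step 2: cap the length when max_length is given.
    let actual_length = match max_length {
        Some(max_len) => string_length.min(max_len),
        None => string_length,
    };

    // Step 3: validate offset + 1 + actual_length <= buffer.len()
    // using checked arithmetic and a bounds-checked slice lookup.
    let string_start = offset.checked_add(1).ok_or(overrun)?;
    let string_end = string_start.checked_add(actual_length).ok_or(overrun)?;
    let bytes = buffer.get(string_start..string_end).ok_or(overrun)?;

    // Step 4: convert, replacing invalid UTF-8 with U+FFFD.
    Ok(String::from_utf8_lossy(bytes).into_owned())
}
```

Note that the bounds check in step 3 uses actual_length, so a length byte of 10 with max_length of Some(3) only requires 3 bytes of string data to be present.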

Behavioral Examples#

Example 1: Truncated file with max_length

  • Length byte: 10
  • max_length: 5
  • Available buffer: 5 bytes after length byte
  • Result: Success - reads and returns 5 bytes

This is validated by the test case test_read_pstring_max_length_caps_when_buffer_short.

Example 2: Truncated file without max_length

  • Length byte: 10
  • max_length: None
  • Available buffer: 3 bytes after length byte
  • Result: Failure - returns TypeReadError::BufferOverrun

This is validated by the test case test_read_pstring_buffer_overrun_length_exceeds_data.

Example 3: Well-formed string

  • Length byte: 5
  • max_length: 10
  • Available buffer: 10 bytes after length byte
  • Result: Success - reads and returns 5 bytes (the length byte is smaller than max_length, so the cap has no effect)
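
The three examples reduce to one arithmetic decision, which the following sketch isolates from any buffer handling. The function pstring_read_len is a hypothetical helper written for this illustration; it returns how many bytes a read would yield, or None where the real code would report BufferOverrun.

```rust
/// Number of string bytes a pstring read would return, given the length
/// byte, the optional cap, and the bytes available after the length byte.
/// Hypothetical helper; `None` stands in for a BufferOverrun error.
fn pstring_read_len(
    length_byte: u8,
    max_length: Option<usize>,
    available: usize,
) -> Option<usize> {
    let raw = usize::from(length_byte);
    // Validate against the capped length, not the raw length byte.
    let actual = max_length.map_or(raw, |max_len| raw.min(max_len));
    if actual <= available {
        Some(actual)
    } else {
        None
    }
}
```

With the inputs from Examples 1 to 3, this yields Some(5), None, and Some(5) respectively.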

GNU File Compatibility#

When a pstring length byte exceeds max_length, libmagic-rs truncates to max_length rather than returning an error. This truncation strategy matches GNU file's behavior for handling:

  • Corrupted or truncated files: Files damaged in transit or storage
  • Partially downloaded files: Network transfers interrupted mid-stream
  • Malicious length fields: Files with intentionally oversized length claims
  • Format probing with incomplete data: Detecting file types from file headers alone

By validating against the capped length, libmagic-rs maintains the same detection capabilities as GNU file on real-world imperfect data.

Architectural Context#

Type System Integration#

PString is part of libmagic-rs's broader type system, alongside the numeric types implemented in src/evaluator/types/numeric.rs. All type implementations share the same TypeReadError enum with BufferOverrun and UnsupportedType variants.

Reusable Safety Pattern#

The pattern of using checked arithmetic (checked_add()) for all offset calculations is consistent across all type implementations. This prevents integer overflow vulnerabilities when computing buffer ranges:

let end = offset.checked_add(SIZE).ok_or(TypeReadError::BufferOverrun { ... })?;
let bytes = buffer.get(offset..end).ok_or(TypeReadError::BufferOverrun { ... })?;

PString applies this same pattern, adapted for variable-length data.
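
As a concrete instance of the fixed-size form of this pattern, the sketch below reads a little-endian 32-bit value. The function read_u32_le is a hypothetical example, and it returns Option instead of TypeReadError to stay self-contained.

```rust
/// Read a little-endian u32 at `offset`, or `None` if the range does not
/// fit in the buffer. Hypothetical example; `None` stands in for
/// TypeReadError::BufferOverrun.
fn read_u32_le(buffer: &[u8], offset: usize) -> Option<u32> {
    // checked_add turns an overflowing offset into an explicit failure
    // instead of a debug-build panic or a release-build wraparound.
    let end = offset.checked_add(4)?;
    // get() performs the bounds check and never panics.
    let bytes = buffer.get(offset..end)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}
```

For pstring, the only difference is that the added size is computed at run time from the length byte and max_length rather than being a constant.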

Multi-Byte Length Prefix Extensions#

Libmagic-rs extends pstring support beyond the basic 1-byte length prefix to 2-byte and 4-byte prefixes, for compatibility with GNU libmagic.

Each variant applies the same validation pattern: bounds checking validates offset + prefix_width + min(length_value, max_length) ≤ buffer.len(), where prefix_width is 1, 2, or 4 bytes depending on the variant.
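
A minimal sketch of that generalized check follows. The PrefixWidth enum, the big_endian flag, and the function name are assumptions made for this illustration; the source does not show how libmagic-rs represents the variants or their byte order.

```rust
/// Hypothetical representation of the length-prefix width.
enum PrefixWidth {
    One,
    Two,
    Four,
}

/// Return the string data of a length-prefixed string, validating
/// offset + prefix_width + min(length_value, max_length) <= buffer.len().
/// Hypothetical sketch; `None` stands in for a BufferOverrun error.
fn read_prefixed(
    buffer: &[u8],
    offset: usize,
    width: PrefixWidth,
    big_endian: bool,
    max_length: Option<usize>,
) -> Option<&[u8]> {
    let prefix_width = match width {
        PrefixWidth::One => 1,
        PrefixWidth::Two => 2,
        PrefixWidth::Four => 4,
    };
    let data_start = offset.checked_add(prefix_width)?;
    let prefix = buffer.get(offset..data_start)?;

    // Fold the prefix bytes into an integer in the requested byte order.
    let fold = |acc: usize, b: &u8| (acc << 8) | usize::from(*b);
    let length_value = if big_endian {
        prefix.iter().fold(0usize, fold)
    } else {
        prefix.iter().rev().fold(0usize, fold)
    };

    // Same capping rule as the 1-byte case.
    let actual_length = max_length.map_or(length_value, |m| length_value.min(m));
    let data_end = data_start.checked_add(actual_length)?;
    buffer.get(data_start..data_end)
}
```

A 4-byte prefix claiming 256 bytes over a 2-byte payload fails without a cap but succeeds with max_length of Some(2), mirroring the 1-byte behavior.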

Testing and Validation#

The implementation's tests cover:

  • Edge cases: Empty strings (length byte = 0), maximum length (255), truncated files
  • Boundary conditions: File too short, exact fit, extra data
  • Overflow protection: test_read_pstring_offset_overflow verifies offset=usize::MAX is caught by checked_add
  • UTF-8 handling: Valid sequences, invalid bytes, mixed content with lossy conversion
  • Operator behavior: All comparison operators (=, !=, <, >, <=, >=) across different string types

Usage Example#

In a magic file, a pstring rule with max_length might look like:

# Match JPEG files by reading Pascal-style length-prefixed marker
0 pstring/10 =JFIF JPEG image with JFIF header

The /10 suffix specifies max_length=10, meaning:

  • Read the length byte at offset 0
  • Cap the read to the minimum of (length_byte, 10)
  • If at least that many bytes are available, read them and compare to "JFIF"
  • This succeeds even if the length byte claims more than 10 bytes
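
To make the mapping from the /10 suffix to max_length concrete, here is a hypothetical parsing sketch. It is not the libmagic-rs parser; the function name and the Option<Option<usize>> return shape are assumptions, and it only handles the numeric suffix form shown above.

```rust
/// Parse a type token such as "pstring" or "pstring/10".
/// Returns Some(None) for a bare pstring, Some(Some(n)) when a numeric
/// max_length suffix is present, and None if the token is not a pstring
/// type or the suffix is not a number. Hypothetical helper.
fn parse_pstring_type(token: &str) -> Option<Option<usize>> {
    let rest = token.strip_prefix("pstring")?;
    if rest.is_empty() {
        return Some(None); // no suffix: max_length is unset
    }
    let digits = rest.strip_prefix('/')?;
    digits.parse::<usize>().ok().map(Some)
}
```

The parsed value would then be passed as the max_length argument of the read, where it takes part in min(length_byte, max_length).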

Design Trade-offs#

Benefits#

  1. Robustness: Handles real-world truncated and malformed files gracefully
  2. Compatibility: Matches GNU file behavior for established magic rule semantics
  3. Security: max_length provides a cap on potentially malicious length fields
  4. Memory safety: Validating against the capped length prevents buffer overruns

Considerations#

  1. Semantic ambiguity: A successful read doesn't guarantee the length byte was accurate
  2. Partial data matching: Magic rules may match on incomplete strings
  3. Testing complexity: Requires careful test design to validate capped vs. raw length behavior
  4. Documentation burden: The pattern is non-obvious and requires explicit explanation

Relevant Code Files#

  • src/evaluator/types/string.rs: Implementation of read_pstring() with max_length bounds validation
  • src/evaluator/types/mod.rs: Type system module with shared TypeReadError enum
  • src/evaluator/types/numeric.rs: Numeric type implementations demonstrating reusable checked arithmetic pattern
  • src/parser/ast.rs: AST definition of TypeKind::PString with optional max_length field
