Type: Topic
Status: Published
Created: Mar 1, 2026 by Dosu Bot
Updated: Mar 2, 2026 by Dosu Bot

Type Signedness Defaults And Unsigned Type Variants#

The libmagic-rs type system implements explicit signedness control through TypeKind enum variants with signed: bool fields. This design provides libmagic-compatible type interpretation for multi-byte integers (Short, Long, and Quad types), enabling correct detection of file formats whose magic numbers have high bits set. The parser recognizes endian-prefixed type names (beshort, leshort, belong, lelong, bequad, lequad) and maps them to TypeKind variants with appropriate endianness and signedness settings.

All TypeKind variants (Byte, Short, Long, Quad) include explicit signed: bool fields that control whether values are interpreted via signed casts (i8/i16/i32/i64) or unsigned conversions (u8/u16/u32/u64). The read_byte function returns Value::Uint(u64) for unsigned bytes and Value::Int(i64) for signed bytes. This distinction matters for formats like JPEG (0xffd8) and PNG (0x89504e47), where unsigned interpretation prevents misreading high-bit magic numbers as negative values.

The signedness field cascades through multiple subsystems: type reading functions in src/evaluator/types.rs, strength scoring in src/evaluator/strength.rs, serialization in build.rs and src/build_helpers.rs, and property test strategies in tests/property_tests.rs. Rust's exhaustive pattern matching ensures all code paths handle signedness consistently.

TypeKind Enum Structure#

The TypeKind enum in src/parser/ast.rs defines five data type variants for interpreting bytes in magic rules:

pub enum TypeKind {
    Byte {
        signed: bool,
    },
    Short {
        endian: Endianness,
        signed: bool,
    },
    Long {
        endian: Endianness,
        signed: bool,
    },
    Quad {
        endian: Endianness,
        signed: bool,
    },
    String {
        max_length: Option<usize>,
    },
}

Variant Details#

Byte: Single 8-bit byte with signed: bool field. When signed: true, values are cast to i8 and returned as Value::Int(i64). When signed: false, values remain u8 and return as Value::Uint(u64).

Short: 16-bit integer with endian: Endianness and signed: bool fields. When signed: true, values are cast to i16 and returned as Value::Int(i64). When signed: false, values remain u16 and return as Value::Uint(u64).

Long: 32-bit integer with endian: Endianness and signed: bool fields. Similar to Short, signed: true triggers casting to i32 and returns Value::Int(i64), while signed: false uses u32 and returns Value::Uint(u64).

Quad: 64-bit integer with endian: Endianness and signed: bool fields. When signed: true, values are cast to i64 and returned as Value::Int(i64). When signed: false, values remain u64 and return as Value::Uint(u64).

String: Variable-length string with optional max_length constraint.

The Endianness enum provides three byte order options: Little (LSB first), Big (MSB first), and Native (system-dependent).
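As a rough illustration of the three byte orders, the sketch below decodes the same two bytes under each Endianness value using the standard library's from_le_bytes, from_be_bytes, and from_ne_bytes. The Endianness enum is redeclared locally so the example stands alone, and decode_u16 is a hypothetical helper, not a function taken from the crate.

```rust
/// Local stand-in for the crate's byte-order enum (redeclared for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

/// Hypothetical helper: decode two bytes as an unsigned 16-bit integer.
pub fn decode_u16(bytes: [u8; 2], endian: Endianness) -> u16 {
    match endian {
        // Least significant byte comes first in the buffer.
        Endianness::Little => u16::from_le_bytes(bytes),
        // Most significant byte comes first in the buffer.
        Endianness::Big => u16::from_be_bytes(bytes),
        // Whatever order the host CPU uses.
        Endianness::Native => u16::from_ne_bytes(bytes),
    }
}
```

With the bytes [0x1f, 0x8b], Big yields 0x1f8b and Little yields 0x8b1f; Native matches whichever of the two the host uses.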

Parser Grammar and Type Name Mapping#

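The parser in src/parser/grammar.rs recognizes type name keywords using nom's alt() and tag() combinators. The snippet below lists eight of them; the mapping table in the next section also covers quad and the u-prefixed names. Keywords are listed from longest to shortest to prevent premature matching of prefixes: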

let (input, type_name) = alt((
    tag("lelong"),
    tag("belong"),
    tag("leshort"),
    tag("beshort"),
    tag("long"),
    tag("short"),
    tag("byte"),
    tag("string"),
))
.parse(input)?;

Type Name to TypeKind Mapping#

The parser maps type names to TypeKind variants with the following signedness and endianness patterns:

| Type Name | TypeKind Variant | Endianness | Signedness |
|-----------|------------------|------------|------------|
| byte | TypeKind::Byte | N/A | signed: true |
| ubyte | TypeKind::Byte | N/A | signed: false |
| short | TypeKind::Short | Native | signed: true |
| ushort | TypeKind::Short | Native | signed: false |
| leshort | TypeKind::Short | Little | signed: true |
| uleshort | TypeKind::Short | Little | signed: false |
| beshort | TypeKind::Short | Big | signed: true |
| ubeshort | TypeKind::Short | Big | signed: false |
| long | TypeKind::Long | Native | signed: true |
| ulong | TypeKind::Long | Native | signed: false |
| lelong | TypeKind::Long | Little | signed: true |
| ulelong | TypeKind::Long | Little | signed: false |
| belong | TypeKind::Long | Big | signed: true |
| ubelong | TypeKind::Long | Big | signed: false |
| quad | TypeKind::Quad | Native | signed: true |
| uquad | TypeKind::Quad | Native | signed: false |
| lequad | TypeKind::Quad | Little | signed: true |
| ulequad | TypeKind::Quad | Little | signed: false |
| bequad | TypeKind::Quad | Big | signed: true |
| ubequad | TypeKind::Quad | Big | signed: false |
| string | TypeKind::String | N/A | N/A |

Following libmagic conventions, type names without the u prefix (byte, short, long, quad, beshort, belong, leshort, lelong, bequad, lequad) default to signed interpretation. The u-prefixed variants (ubyte, ushort, ulong, uquad, ubeshort, ubelong, uleshort, ulelong, ubequad, ulequad) explicitly request unsigned interpretation.
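The naming scheme in the table is regular enough to express as a small lookup. The following is a minimal sketch using only the standard library, not the crate's nom-based parser: type_kind_from_name is a hypothetical function, the enums are redeclared locally, and max_length: None for string is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TypeKind {
    Byte { signed: bool },
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    Quad { endian: Endianness, signed: bool },
    String { max_length: Option<usize> },
}

/// Hypothetical lookup that mirrors the mapping table.
pub fn type_kind_from_name(name: &str) -> Option<TypeKind> {
    if name == "string" {
        // Assumption: a bare `string` carries no length limit.
        return Some(TypeKind::String { max_length: None });
    }
    // A leading 'u' requests unsigned; every other name defaults to signed.
    let (signed, rest) = match name.strip_prefix('u') {
        Some(rest) => (false, rest),
        None => (true, name),
    };
    // An optional "le" / "be" prefix selects the byte order.
    let (endian, base) = if let Some(base) = rest.strip_prefix("le") {
        (Endianness::Little, base)
    } else if let Some(base) = rest.strip_prefix("be") {
        (Endianness::Big, base)
    } else {
        (Endianness::Native, rest)
    };
    match (base, endian) {
        // A single byte has no byte order, so "lebyte" / "bebyte" are rejected.
        ("byte", Endianness::Native) => Some(TypeKind::Byte { signed }),
        ("short", _) => Some(TypeKind::Short { endian, signed }),
        ("long", _) => Some(TypeKind::Long { endian, signed }),
        ("quad", _) => Some(TypeKind::Quad { endian, signed }),
        _ => None,
    }
}
```

Decomposing the name into prefix parts keeps the signedness rule in one place; an explicit 21-arm match would be an equally valid way to encode the same table.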

Type Reading and Signed/Unsigned Conversion#

The read_typed_value function in src/evaluator/types.rs dispatches to specialized reading functions based on TypeKind:

match type_kind {
    TypeKind::Byte { signed } => read_byte(buffer, offset, *signed),
    TypeKind::Short { endian, signed } => read_short(buffer, offset, *endian, *signed),
    TypeKind::Long { endian, signed } => read_long(buffer, offset, *endian, *signed),
    TypeKind::Quad { endian, signed } => read_quad(buffer, offset, *endian, *signed),
    TypeKind::String { max_length } => read_string(buffer, offset, *max_length),
}

Signedness Conversion Pattern#

All four integer reading functions (read_byte, read_short, read_long, read_quad) follow an identical pattern:

  1. Read raw bytes as unsigned (u8, u16, u32, or u64) using the appropriate endianness
  2. If signed == false: convert to u64 and return Value::Uint(u64)
  3. If signed == true: cast to signed type (value as i8 / value as i16 / value as i32 / value as i64), convert to i64, and return Value::Int(i64)

The casting approach reinterprets bit patterns as signed integers using two's complement. For example, the 16-bit pattern 0xFFFF reads as 65,535 when unsigned and as -1 when cast to i16.
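The three-step pattern can be sketched for the 16-bit case as follows. This is an illustration under stated assumptions rather than the crate's code: the enums are redeclared locally, and returning Option on an out-of-bounds read is an assumption (the real function may report errors differently).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Uint(u64),
    Int(i64),
}

/// Sketch of a 16-bit read; `None` means the read would leave the buffer.
pub fn read_short(buffer: &[u8], offset: usize, endian: Endianness, signed: bool) -> Option<Value> {
    let end = offset.checked_add(2)?;
    let slice = buffer.get(offset..end)?;
    let bytes = [slice[0], slice[1]];

    // Step 1: read the raw bits as unsigned in the requested byte order.
    let raw = match endian {
        Endianness::Little => u16::from_le_bytes(bytes),
        Endianness::Big => u16::from_be_bytes(bytes),
        Endianness::Native => u16::from_ne_bytes(bytes),
    };

    if signed {
        // Step 3: reinterpret the same bits as i16, then sign-extend to i64.
        Some(Value::Int(i64::from(raw as i16)))
    } else {
        // Step 2: zero-extend to u64.
        Some(Value::Uint(u64::from(raw)))
    }
}
```

The `as i16` cast never changes the bits; only the later widening to i64 differs between the signed and unsigned paths.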

Return Value Types#

Functions return variants of the Value enum:

  • read_byte: Returns Value::Int(i64) if signed: true, otherwise Value::Uint(u64)
  • read_short: Returns Value::Int(i64) if signed: true, otherwise Value::Uint(u64)
  • read_long: Returns Value::Int(i64) if signed: true, otherwise Value::Uint(u64)
  • read_quad: Returns Value::Int(i64) if signed: true, otherwise Value::Uint(u64)
  • read_string: Returns Value::String(String)

Built-in Magic Rules and High-Bit Magic Numbers#

The src/builtin_rules.magic file contains detection rules for common file formats. Several formats require unsigned type interpretation because their magic numbers have high bits set, which would be misread as negative values with signed interpretation.

JPEG Detection#

Line 47:

0 ubeshort 0xffd8 JPEG image data

Uses ubeshort (big-endian unsigned short) to match the JPEG start-of-image marker 0xFFD8. Both bytes have high bits set (0xFF = 255 decimal), requiring unsigned interpretation.

PNG Detection#

Line 50:

0 ubelong 0x89504e47 PNG image data

Uses ubelong (big-endian unsigned long) to match the PNG signature 0x89504E47 (byte sequence 0x89 'P' 'N' 'G'). The first byte 0x89 (137 decimal) exceeds the signed byte maximum of 127, making unsigned interpretation essential.

GZIP Detection#

Line 40:

0 beshort 0x1f8b gzip compressed data

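The GZIP magic 0x1F8B has bit 7 set in the second byte (0x8B = 139 decimal), but the sign bit of a big-endian short lies in the first byte (0x1F), which is below 0x80. The value reads as 8075 under both signed and unsigned interpretation, so the signed beshort type matches it correctly.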
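Whether signed interpretation changes a magic number depends on the most significant bit of the whole value, not on any individual byte. A quick check, with hypothetical helper names, might look like this:

```rust
/// True if the 16-bit pattern is negative when reinterpreted as i16.
pub fn negative_as_i16(raw: u16) -> bool {
    (raw as i16) < 0
}

/// True if the 32-bit pattern is negative when reinterpreted as i32.
pub fn negative_as_i32(raw: u32) -> bool {
    (raw as i32) < 0
}
```

Under this check, the JPEG value 0xffd8 and the PNG value 0x89504e47 come out negative when signed, while the GZIP value 0x1f8b is 8075 either way.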

Magic rule entries follow the structure: offset type value message, where hierarchical rules use > prefix for nesting.
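To make the offset type value message layout concrete, here is a deliberately simplified splitter. It is not the crate's parser: RuleFields and split_rule_line are hypothetical names, fields are assumed to be separated by plain whitespace, and escaped spaces inside values are not handled.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct RuleFields<'a> {
    /// Number of leading '>' characters (nesting depth).
    pub level: usize,
    pub offset: &'a str,
    pub type_name: &'a str,
    pub value: &'a str,
    pub message: &'a str,
}

/// Split off the next whitespace-delimited field, returning (field, remainder).
fn next_field(s: &str) -> Option<(&str, &str)> {
    let s = s.trim_start();
    if s.is_empty() {
        return None;
    }
    let end = s.find(char::is_whitespace).unwrap_or(s.len());
    Some((&s[..end], &s[end..]))
}

/// Hypothetical helper: break one rule line into its four fields.
pub fn split_rule_line(line: &str) -> Option<RuleFields<'_>> {
    let trimmed = line.trim_start();
    // '>' is a single byte, so the count is also a valid slice index.
    let level = trimmed.bytes().take_while(|b| *b == b'>').count();
    let (offset, rest) = next_field(&trimmed[level..])?;
    let (type_name, rest) = next_field(rest)?;
    let (value, rest) = next_field(rest)?;
    // Everything after the value is the message and may contain spaces.
    Some(RuleFields { level, offset, type_name, value, message: rest.trim() })
}
```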

Cascading Implementation Impacts#

The signed: bool field in all TypeKind variants (Byte, Short, Long, Quad) requires exhaustive pattern match updates across multiple modules. Rust's compiler enforces this through exhaustiveness checking, preventing runtime failures.
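A small sketch shows how this enforcement works in practice. The byte_width function below is hypothetical and the enums are redeclared locally; the point is that a match with no wildcard arm stops compiling when a variant is added, and a pattern that names every field (without `..`) stops compiling when a field is added.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TypeKind {
    Byte { signed: bool },
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    Quad { endian: Endianness, signed: bool },
    String { max_length: Option<usize> },
}

/// Hypothetical helper: fixed width in bytes, or None for variable-length types.
pub fn byte_width(kind: &TypeKind) -> Option<usize> {
    match kind {
        // Fields are named explicitly, so adding a field to a variant
        // turns these patterns into compile errors until they are updated.
        TypeKind::Byte { signed: _ } => Some(1),
        TypeKind::Short { endian: _, signed: _ } => Some(2),
        TypeKind::Long { endian: _, signed: _ } => Some(4),
        TypeKind::Quad { endian: _, signed: _ } => Some(8),
        TypeKind::String { max_length: _ } => None,
        // No `_ =>` arm: a new variant would also fail to compile here.
    }
}
```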

Core Subsystems#

AST Definition: src/parser/ast.rs lines 80-104 defines the TypeKind enum with signed fields in all integer variants (Byte, Short, Long, Quad).

Parser Grammar: src/parser/grammar.rs lines 1460-1488 maps type name strings to TypeKind variants with explicit signed values.

Type Reading: src/evaluator/types.rs lines 122-209 implements read_byte, read_short, read_long, and read_quad with signed parameter handling.

Strength Scoring: src/evaluator/strength.rs lines 72-86 matches on TypeKind variants (signedness doesn't affect score, but pattern must match structure). Quad types receive a base strength of 16, the highest among integer types.

Build System Duplication#

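The most critical synchronization requirement involves code generation. The serialize_type_kind function exists in two places: in build.rs, where it generates code, and in src/build_helpers.rs, as a duplicate for test coverage.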

Both functions must be updated identically because Rust build scripts cannot import from the crate being built. The duplication exists to enable comprehensive testing of build logic through the #[cfg(any(test, doc))] conditionally-compiled module. Both serialization functions handle all four integer types (Byte, Short, Long, Quad).

Example serialization output:

TypeKind::Byte { signed: false }
TypeKind::Short { endian: Endianness::Big, signed: false }
TypeKind::Long { endian: Endianness::Little, signed: true }
TypeKind::Quad { endian: Endianness::Big, signed: false }
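A serializer that produces exactly these strings could look like the sketch below. The signature is an assumption, the enums are redeclared locally, and the format of the String arm is a guess, since the example output above does not show it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TypeKind {
    Byte { signed: bool },
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    Quad { endian: Endianness, signed: bool },
    String { max_length: Option<usize> },
}

/// Shared formatting for the three multi-byte integer variants.
fn int_variant(name: &str, endian: Endianness, signed: bool) -> String {
    // The derived Debug output of Endianness is the bare variant name, e.g. "Big".
    format!("TypeKind::{} {{ endian: Endianness::{:?}, signed: {} }}", name, endian, signed)
}

/// Sketch of a serializer emitting Rust source text for a TypeKind value.
pub fn serialize_type_kind(kind: &TypeKind) -> String {
    match kind {
        TypeKind::Byte { signed } => format!("TypeKind::Byte {{ signed: {} }}", signed),
        TypeKind::Short { endian, signed } => int_variant("Short", *endian, *signed),
        TypeKind::Long { endian, signed } => int_variant("Long", *endian, *signed),
        TypeKind::Quad { endian, signed } => int_variant("Quad", *endian, *signed),
        // Assumption: the Option is written in its Debug form (None / Some(n)).
        TypeKind::String { max_length } => {
            format!("TypeKind::String {{ max_length: {:?} }}", max_length)
        }
    }
}
```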

Testing Infrastructure#

Property Tests: tests/property_tests.rs lines 28-54 contains the arb_type_kind() strategy that generates all four integer type variants (Byte, Short, Long, Quad) with any::<bool>() for signedness, producing all combinations of endianness and signedness.

Unit Tests: src/evaluator/types.rs contains dedicated tests comparing signed vs unsigned interpretation (e.g., 0xFFFF as 65,535 unsigned vs -1 signed). Quad type tests verify reading of 64-bit values including values above i64::MAX.

Build Helper Tests: src/build_helpers.rs lines 439-467 validates serialization for Byte, Short, Long, and Quad variants.
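The space that the property test strategy samples from is small enough to list exhaustively. The sketch below enumerates it with plain loops instead of a proptest strategy, so it is a stand-in for arb_type_kind(), not a copy of it; all_integer_type_kinds is a hypothetical name and the enums are redeclared locally.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TypeKind {
    Byte { signed: bool },
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    Quad { endian: Endianness, signed: bool },
    String { max_length: Option<usize> },
}

/// Enumerate every integer TypeKind: 2 Byte values plus
/// 3 widths x 3 byte orders x 2 signedness settings.
pub fn all_integer_type_kinds() -> Vec<TypeKind> {
    let mut kinds = Vec::new();
    for signed in [true, false] {
        kinds.push(TypeKind::Byte { signed });
        for endian in [Endianness::Little, Endianness::Big, Endianness::Native] {
            kinds.push(TypeKind::Short { endian, signed });
            kinds.push(TypeKind::Long { endian, signed });
            kinds.push(TypeKind::Quad { endian, signed });
        }
    }
    kinds
}
```

The result has 20 entries, one for each integer type name in the parser's mapping table.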

Design Rationale and Compatibility#

libmagic Compatibility Requirements#

The signedness control design addresses several libmagic compatibility challenges:

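High-bit magic numbers: File formats like JPEG (0xFFD8) and PNG (0x89504E47) use magic numbers whose most significant bit is set. Signed interpretation would read these as negative values (0xFFD8 becomes -40 as a signed short), breaking detection. GZIP (0x1F8B) is not affected as a big-endian short: its top bit is clear, so it reads as 8075 under either interpretation.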

Two's complement interpretation: The casting pattern (value as i16 / value as i32) correctly handles two's complement representation, where 0xFFFF becomes -1 for signed shorts and 65,535 for unsigned shorts.

Cross-type comparison safety: The operator evaluation system uses i128 intermediate values to safely compare mixed Value::Int and Value::Uint operands without overflow.
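The idea behind the i128 intermediate can be shown in a few lines. This is a sketch of the technique only: compare_integers and to_i128 are hypothetical names, and the Value enum is redeclared locally with the three variants mentioned in this document.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Uint(u64),
    Int(i64),
    String(String),
}

/// Widen an integer value to i128; every i64 and every u64 fits losslessly.
fn to_i128(value: &Value) -> Option<i128> {
    match value {
        Value::Uint(u) => Some(i128::from(*u)),
        Value::Int(i) => Some(i128::from(*i)),
        Value::String(_) => None,
    }
}

/// Compare two integer values of possibly different signedness.
/// Returns None if either operand is not an integer.
pub fn compare_integers(a: &Value, b: &Value) -> Option<Ordering> {
    Some(to_i128(a)?.cmp(&to_i128(b)?))
}
```

A direct `as u64` cast would instead make Value::Int(-1) compare equal to Value::Uint(u64::MAX), which is the kind of error the wider intermediate avoids.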

Current Implementation Status#

The parser supports both signed and unsigned type variants through u-prefixed type names (ubyte, ushort, ubeshort, ulong, ubelong, ulelong, uleshort, uquad, ubequad, ulequad). Type names without the u prefix default to signed interpretation, following libmagic conventions. The built-in rules use unsigned types (ubeshort, ubelong) for file formats with high-bit magic numbers like JPEG and PNG.

Memory Safety and Type Safety#

Safe buffer access: All reading functions use .get() for bounds checking instead of direct indexing, preventing panics.

Exhaustive matching: Rust's compiler enforces exhaustive pattern matching on TypeKind variants, ensuring all code paths handle signedness fields.

Type preservation: The Value enum preserves signedness through the evaluation pipeline, with Value::Int(i64) for signed values and Value::Uint(u64) for unsigned values.
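The bounds-checked access described under safe buffer access can be sketched with a single generic helper. read_array is a hypothetical name, and the use of checked_add to guard against offset overflow is an assumption about how such a helper would be written, not a statement about the crate's code.

```rust
/// Copy N bytes starting at `offset`, or return None if the range
/// does not lie entirely inside `buffer`.
pub fn read_array<const N: usize>(buffer: &[u8], offset: usize) -> Option<[u8; N]> {
    // checked_add avoids wrapping when offset is close to usize::MAX.
    let end = offset.checked_add(N)?;
    // .get() returns None instead of panicking on an out-of-range slice.
    let slice = buffer.get(offset..end)?;
    let mut out = [0u8; N];
    out.copy_from_slice(slice);
    Some(out)
}
```

A caller would then pass the array to u32::from_be_bytes or a similar constructor, so a short buffer surfaces as None rather than a panic.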

Relevant Code Files#

| File | Purpose | Key Content |
|------|---------|-------------|
| src/parser/ast.rs | AST definitions | TypeKind enum with signed: bool fields in all integer variants (Byte, Short, Long, Quad) |
| src/parser/grammar.rs | Parser implementation | Type name parsing and mapping to TypeKind variants |
| src/evaluator/types.rs | Type reading functions | read_byte, read_short, read_long, read_quad with signedness handling |
| src/evaluator/operators.rs | Operator evaluation | Cross-type integer coercion via i128 |
| src/evaluator/strength.rs | Match scoring | TypeKind pattern matching for strength calculation |
| build.rs | Build script | serialize_type_kind code generation function |
| src/build_helpers.rs | Build testing | Duplicate serialize_type_kind for test coverage |
| tests/property_tests.rs | Property testing | arb_type_kind() strategy with signedness generation |
| src/builtin_rules.magic | Magic rules | Detection rules for JPEG, PNG, GZIP, and other formats |

See Also#