Date And Timestamp Type Support
Type: Topic
Status: Published
Created: Mar 7, 2026
Updated: Mar 8, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

Date and Timestamp Type Support#

Status: Implemented in v0.5.0+

Date and timestamp type support extends the libmagic-rs type system to enable file format detection based on embedded temporal data. The type family defines 14 magic file keywords covering 32-bit (Date) and 64-bit (QDate) Unix timestamps. Each keyword selects a byte order (native, big-endian, little-endian, or, for 32-bit only, middle-endian) and a timezone interpretation (UTC or local time). The two middle-endian keywords (medate, meldate) are planned but not yet implemented. This feature allows magic rules to match files based on creation dates, modification timestamps, and other temporal metadata commonly found in archive formats, filesystem images, and binary executables.

Tracked in Issue #41, the implementation integrates with the existing type system architecture and follows the patterns established by the 32-bit Long and 64-bit Quad integer types. Date values are read as Unix timestamps (seconds since January 1, 1970 UTC) and formatted as human-readable strings using the chrono crate, matching the output format of the GNU file command. The work is part of the broader Type System Expansion initiative to achieve full libmagic compatibility.

Type Keywords#

The implementation supports 14 date/timestamp type keywords organized into two bit-width families. These keywords follow a systematic naming convention that encodes byte order (endianness) and timezone interpretation directly into the type name.

32-bit Date Types#

Eight keywords represent 4-byte Unix timestamps (Date variant):

| Keyword | Endianness | Timezone | Description |
| --- | --- | --- | --- |
| date | Native | UTC | System byte order, Coordinated Universal Time |
| ldate | Native | Local | System byte order, local timezone |
| bedate | Big | UTC | Big-endian, UTC |
| beldate | Big | Local | Big-endian, local timezone |
| ledate | Little | UTC | Little-endian, UTC |
| leldate | Little | Local | Little-endian, local timezone |
| medate | Middle | UTC | Middle-endian (PDP-11), UTC |
| meldate | Middle | Local | Middle-endian (PDP-11), local timezone |

64-bit Date Types#

Six keywords represent 8-byte Unix timestamps (QDate variant). 64-bit date types support only native, big-endian, and little-endian byte orders (no middle-endian):

| Keyword | Endianness | Timezone | Description |
| --- | --- | --- | --- |
| qdate | Native | UTC | System byte order, UTC |
| qldate | Native | Local | System byte order, local timezone |
| beqdate | Big | UTC | Big-endian, UTC |
| beqldate | Big | Local | Big-endian, local timezone |
| leqdate | Little | UTC | Little-endian, UTC |
| leqldate | Little | Local | Little-endian, local timezone |

Naming Convention#

The keyword naming follows a systematic pattern:

  • Base type: date (32-bit) or qdate (64-bit, "quad date")
  • Endian prefix: be (big-endian), le (little-endian), me (middle-endian for 32-bit only), or none (native)
  • Timezone marker: l before date indicates local time; absence means UTC

For example, beldate decodes as: big-endian (be) + local time (l) + 32-bit date (date).
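The naming convention is regular enough to decode mechanically. The following is a minimal sketch using only the standard library; `Endian`, `DateSpec`, and `decode_date_keyword` are hypothetical names introduced for illustration and are not the crate's actual API.

```rust
/// Byte order encoded by the keyword prefix (hypothetical illustration type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Endian {
    Native,
    Big,
    Little,
    Middle,
}

/// Properties encoded in a date keyword (hypothetical illustration type).
#[derive(Debug, PartialEq, Clone, Copy)]
struct DateSpec {
    bits: u8,
    endian: Endian,
    utc: bool,
}

/// Decodes a keyword such as "beldate" by peeling off its parts.
fn decode_date_keyword(keyword: &str) -> Option<DateSpec> {
    // Every keyword ends in "date"; the modifiers precede it.
    let stem = keyword.strip_suffix("date")?;

    // Optional endian prefix: be, le, me, or none (native).
    let (endian, rest) = if let Some(r) = stem.strip_prefix("be") {
        (Endian::Big, r)
    } else if let Some(r) = stem.strip_prefix("le") {
        (Endian::Little, r)
    } else if let Some(r) = stem.strip_prefix("me") {
        (Endian::Middle, r)
    } else {
        (Endian::Native, stem)
    };

    // What remains is the width marker (q) and the local-time marker (l).
    let (bits, utc) = match rest {
        "" => (32, true),
        "l" => (32, false),
        "q" => (64, true),
        "ql" => (64, false),
        _ => return None,
    };

    // Middle-endian exists only for the 32-bit family.
    if endian == Endian::Middle && bits == 64 {
        return None;
    }
    Some(DateSpec { bits, endian, utc })
}
```

Under these assumptions, `decode_date_keyword("beldate")` yields a 32-bit, big-endian, local-time spec, while a non-existent combination such as `meqdate` is rejected.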

Type System Architecture#

TypeKind Enum Extension#

The TypeKind enum in src/parser/ast.rs includes two new variants. The design follows the pattern of existing Long and Quad integer types, which include endianness and signedness fields:

pub enum TypeKind {
    // ... existing variants (Byte, Short, Long, Quad, Float, Double, String) ...

    /// 32-bit Unix timestamp
    Date {
        /// Byte order
        endian: Endianness,
        /// true = UTC, false = local time
        utc: bool,
    },

    /// 64-bit Unix timestamp
    QDate {
        /// Byte order
        endian: Endianness,
        /// true = UTC, false = local time
        utc: bool,
    },
}

The structure mirrors integer types but uses the utc: bool field (where true indicates UTC, false indicates local time) instead of signed: bool to indicate timezone interpretation.

Endianness Enum Extension#

The implementation adds a Middle variant to the existing Endianness enum to support PDP-11 byte order. The enum now has four variants: Little, Big, Native, and Middle.

pub enum Endianness {
    Little, // Little-endian byte order
    Big, // Big-endian byte order
    Native, // Native system byte order
    Middle, // Middle-endian (PDP-11) byte order
}

Note: The implementation in PR #165 does not yet include middle-endian support for date types. Middle-endian is planned for the 32-bit Date type only (medate, meldate); QDate has no middle-endian variants.

Timezone Semantics#

Timezone handling is encoded in the type name by the l marker that precedes date (for example ldate or beqldate). This lets a magic rule state which timezone interpretation it expects for a timestamp, which is critical for correctly matching files with region-specific temporal metadata.

UTC Variants#

Keywords without the l marker interpret timestamps as seconds since the Unix epoch in Coordinated Universal Time. UTC variants include: date, bedate, ledate, medate, qdate, beqdate, leqdate.

The evaluator processes these types using chrono::Utc::timestamp_opt() for conversion from Unix timestamps to formatted dates.

Local Time Variants#

Keywords with the l marker before date treat the value as seconds since the Unix epoch and render it in the system's local timezone, which requires a timezone conversion during evaluation. Local variants include: ldate, beldate, leldate, meldate, qldate, beqldate, leqldate.

The evaluator processes these types using chrono::Local::timestamp_opt() for conversion, which applies the system's configured timezone offset.

Endianness Handling#

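The date type family defines four byte order variants for cross-platform file format compatibility. Native, big-endian, and little-endian are implemented; middle-endian is planned for 32-bit dates.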

Native Endianness#

System-dependent byte order matches the target architecture's native representation. Keywords: date, ldate, qdate, qldate. Native endianness enables efficient processing when the file's byte order matches the host system.

Big-Endian#

Most significant byte first. Common in network protocols (TCP/IP headers) and in binary formats that originated on big-endian platforms, such as classic Mac OS. Keywords: bedate, beldate, beqdate, beqldate.

Little-Endian#

Least significant byte first. Common in x86/x64 architectures, Windows binaries (PE format), and Intel-based file formats. Keywords: ledate, leldate, leqdate, leqldate.

Middle-Endian (PDP-11)#

PDP-11 byte order, where the two 16-bit halves of a 32-bit word are stored most significant half first but the bytes within each half are swapped. For a 32-bit value 0xAABBCCDD, bytes are stored as [0xBB, 0xAA, 0xDD, 0xCC]. Keywords: medate, meldate.

Middle-endian is planned for 32-bit dates but not yet implemented in the current version. 64-bit date types do not include middle-endian variants.

The evaluator uses the byteorder crate for byte order conversion, implementing LittleEndian::read_u32(), BigEndian::read_u32(), and NativeEndian::read_u32(). Middle-endian support requires implementing a custom read_middle_endian_u32() helper function since the byteorder crate does not provide this variant.
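The byte layouts described in this section can be sketched with the standard library alone. `read_middle_endian_u32` is the helper name proposed above, but its body here is an assumption rather than the project's code, and `u32::from_be_bytes` / `u32::from_le_bytes` stand in for the byteorder calls.

```rust
/// Hypothetical helper: decodes a PDP-11 middle-endian 32-bit value.
/// The 16-bit halves are stored most significant half first,
/// and each half is stored little-endian.
fn read_middle_endian_u32(bytes: [u8; 4]) -> u32 {
    let high = u16::from_le_bytes([bytes[0], bytes[1]]);
    let low = u16::from_le_bytes([bytes[2], bytes[3]]);
    (u32::from(high) << 16) | u32::from(low)
}

/// Decodes the same logical value from a big-endian layout.
fn read_big_endian_u32(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

/// Decodes the same logical value from a little-endian layout.
fn read_little_endian_u32(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}
```

All three functions return 0xAABBCCDD when given that value's bytes in their respective layouts: `[0xBB, 0xAA, 0xDD, 0xCC]` for middle-endian, `[0xAA, 0xBB, 0xCC, 0xDD]` for big-endian, and `[0xDD, 0xCC, 0xBB, 0xAA]` for little-endian.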

Implementation#

Parser Integration#

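In src/parser/types.rs, the parse_type_keyword() function's alt combinator recognizes the date type specifiers. The listing shows the 12 native, big-endian, and little-endian keywords; the two middle-endian keywords (medate, meldate) are not part of it because middle-endian dates are not yet implemented. Following the existing pattern of ordering longer keywords before shorter ones, the implementation includes: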

// Date type family (longest to shortest to prevent prefix matching)
alt((
    tag("beqldate"), // 8 chars
    tag("leqldate"), // 8 chars
    tag("beqdate"), // 7 chars
    tag("leqdate"), // 7 chars
    tag("beldate"), // 7 chars
    tag("leldate"), // 7 chars
    tag("qldate"), // 6 chars
    tag("bedate"), // 6 chars
    tag("ledate"), // 6 chars
    tag("ldate"), // 5 chars
    tag("qdate"), // 5 chars
    tag("date"), // 4 chars
))

The type_keyword_to_kind() mapping function includes a match arm for each recognized date keyword, converting the keyword string to a TypeKind variant with the appropriate endianness and timezone settings.
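The first-match-wins behaviour of an ordered alternation can be illustrated without the parser combinator library. This is a sketch only: `DATE_KEYWORDS` and `match_date_keyword` are hypothetical names, and `str::strip_prefix` stands in for the `tag` parsers.

```rust
/// Date keywords ordered longest first, mirroring the alternation above.
const DATE_KEYWORDS: [&str; 12] = [
    "beqldate", "leqldate",
    "beqdate", "leqdate", "beldate", "leldate",
    "qldate", "bedate", "ledate",
    "ldate", "qdate",
    "date",
];

/// Returns the first keyword that prefixes `input`, together with
/// the unconsumed remainder, or None if no keyword matches.
fn match_date_keyword(input: &str) -> Option<(&'static str, &str)> {
    DATE_KEYWORDS
        .iter()
        .find_map(|kw| input.strip_prefix(*kw).map(|rest| (*kw, rest)))
}
```

For example, `match_date_keyword("beqldate x")` returns the keyword `beqldate` and the remainder `" x"`. Within this list no keyword is a prefix of another, so the longest-first order here is a convention kept for consistency with the rest of the parser rather than a correctness requirement.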

Evaluator Type Reading#

Following the pattern established by read_long() and read_quad() in src/evaluator/types/numeric.rs, the implementation adds new functions in src/evaluator/types/date.rs:

read_date() - 32-bit timestamp reading:

The function uses offset.checked_add(4) for overflow-safe range construction and buffer.get(offset..end) for bounds-checked access. The implementation:

  1. Reads 4 bytes from the buffer at the specified offset
  2. Dispatches on endianness to convert bytes to u32 (Little/Big/Native)
  3. Widens the u32 to i64, the seconds type chrono expects (a direct u32-to-i64 cast zero-extends, so a 32-bit value never becomes negative)
  4. Converts to DateTime using chrono::Utc::timestamp_opt() or chrono::Local::timestamp_opt()
  5. Formats using "%a %b %e %H:%M:%S %Y" (e.g., "Sat Feb 14 12:34:56 2026")
  6. Returns Value::String with the formatted date
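Steps 1 to 3 can be sketched with the standard library as follows. This is not the project's read_date(): `ByteOrder` and `read_u32_timestamp` are hypothetical names, `Option` replaces the crate's error type, and the chrono conversion and formatting of steps 4 to 6 are left out.

```rust
/// Byte orders handled by the sketch (hypothetical illustration type).
#[derive(Clone, Copy)]
enum ByteOrder {
    Little,
    Big,
    Native,
}

/// Reads a 32-bit timestamp at `offset` and widens it to i64 seconds.
/// Returns None when the read would leave the buffer.
fn read_u32_timestamp(buffer: &[u8], offset: usize, order: ByteOrder) -> Option<i64> {
    // Overflow-safe end index, then a bounds-checked slice.
    let end = offset.checked_add(4)?;
    let slice = buffer.get(offset..end)?;
    let bytes = [slice[0], slice[1], slice[2], slice[3]];

    // Dispatch on byte order.
    let raw = match order {
        ByteOrder::Little => u32::from_le_bytes(bytes),
        ByteOrder::Big => u32::from_be_bytes(bytes),
        ByteOrder::Native => u32::from_ne_bytes(bytes),
    };

    // Zero-extending conversion: the result is always non-negative.
    Some(i64::from(raw))
}
```

A read that starts too close to the end of the buffer, or whose offset would overflow `usize`, yields `None` instead of panicking.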

read_qdate() - 64-bit timestamp reading:

The function follows the same pattern but reads 8 bytes and interprets the value as i64.
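The 64-bit read differs only in width and in interpreting the bytes as a signed value. A minimal sketch under the same assumptions (`ByteOrder64` and `read_i64_timestamp` are hypothetical names, and `Option` replaces the crate's error type):

```rust
/// Byte orders available for 64-bit dates (hypothetical illustration type).
#[derive(Clone, Copy)]
enum ByteOrder64 {
    Little,
    Big,
    Native,
}

/// Reads a signed 64-bit timestamp at `offset`.
/// Returns None when the read would leave the buffer.
fn read_i64_timestamp(buffer: &[u8], offset: usize, order: ByteOrder64) -> Option<i64> {
    let end = offset.checked_add(8)?;
    let slice = buffer.get(offset..end)?;

    // The slice is exactly 8 bytes long here, so the copy cannot panic.
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(slice);

    Some(match order {
        ByteOrder64::Little => i64::from_le_bytes(bytes),
        ByteOrder64::Big => i64::from_be_bytes(bytes),
        ByteOrder64::Native => i64::from_ne_bytes(bytes),
    })
}
```

Because the value is signed, eight 0xFF bytes decode to -1 (one second before the epoch) in every byte order.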

The read_typed_value() dispatcher in src/evaluator/types/mod.rs includes match arms for TypeKind::Date and TypeKind::QDate.

Chrono Integration#

The implementation uses chrono = "0.4.41" with features ["std", "clock"] as added to Cargo.toml. The chrono library provides Unix timestamp to DateTime conversion, UTC and local timezone handling, and strftime-compatible date formatting.

Value Representation#

String Representation#

Date values are stored as Value::String with formatted dates rather than numeric timestamps. This design simplifies the implementation and lets the existing string comparison operators (=, !=, <, >) work on formatted dates without modification. Note that < and > then compare the formatted text lexicographically, which is not the same as chronological order.

Output Format#

Timestamps are formatted using "%a %b %e %H:%M:%S %Y", matching GNU file command output. Example formatted values:

  • "Sat Feb 14 12:34:56 2026" - Example 2026 date
  • "Thu Jan  1 00:00:00 1970" - Unix epoch (timestamp 0); %e pads the single-digit day with a space
  • "Tue Jan 19 03:14:07 2038" - 32-bit timestamp overflow boundary (2147483647)

The format breaks down as:

  • %a - Abbreviated weekday name (Mon, Tue, Wed, etc.)
  • %b - Abbreviated month name (Jan, Feb, Mar, etc.)
  • %e - Day of month, space-padded (1-31)
  • %H:%M:%S - 24-hour time
  • %Y - Four-digit year
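To make the format concrete, the sketch below produces the same layout for UTC using only the standard library. It is a stand-in for illustration, not the project's code, which delegates this work to chrono; `format_utc_timestamp` is a hypothetical name, and the calendar arithmetic is the widely used days-to-civil conversion for the proleptic Gregorian calendar.

```rust
const WEEKDAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Formats seconds since the Unix epoch (UTC) as "%a %b %e %H:%M:%S %Y".
fn format_utc_timestamp(secs: i64) -> String {
    // Euclidean division keeps pre-1970 values on the correct day.
    let days = secs.div_euclid(86_400);
    let tod = secs.rem_euclid(86_400);

    // 1970-01-01 was a Thursday (index 4 with Sunday = 0).
    let weekday = (days + 4).rem_euclid(7) as usize;

    // Days-to-civil conversion with the epoch shifted to 0000-03-01.
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of 400-year era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month index
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    format!(
        "{} {} {:>2} {:02}:{:02}:{:02} {}",
        WEEKDAYS[weekday],
        MONTHS[(month - 1) as usize],
        day, // {:>2} reproduces the space padding of %e
        tod / 3_600,
        tod % 3_600 / 60,
        tod % 60,
        year
    )
}
```

With this sketch, timestamp 0 formats as "Thu Jan  1 00:00:00 1970" and 2147483647 as "Tue Jan 19 03:14:07 2038".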

Alternative Considerations#

The design notes mention considering an optional Value::Timestamp(i64) variant to preserve numeric timestamp values for numeric comparisons, or storing both timestamp and formatted string. For the current implementation, string representation is sufficient for equality checks and lexicographic comparisons. Future enhancements could add numeric comparison support if use cases require range-based timestamp matching.

Strength Scoring#

Strength scores prioritize more specific types for accurate file detection. Date types follow the bit-width pattern established by existing numeric types:

  • Date (32-bit): Strength score 15 (same as Long and Float)
  • QDate (64-bit): Strength score 16 (same as Quad and Double)

The scoring system assigns:

  • String types: 20-25 points (most specific)
  • 64-bit types (Quad/Double/QDate): 16 points
  • 32-bit types (Long/Float/Date): 15 points
  • 16-bit types (Short): 10 points
  • 8-bit types (Byte): 5 points

Timezone semantics (UTC vs local) do not affect strength scores, as they represent the same underlying information with different interpretations. Similarly, endianness variants receive the same strength score since byte order does not change type specificity.
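The fixed-width part of the scoring table can be expressed as a single match. This is a sketch: `FixedWidthKind` and `base_strength` are hypothetical names, and string types are omitted because their score is a range rather than a single value.

```rust
/// Fixed-width types from the scoring table (hypothetical illustration type).
/// Endianness and timezone are deliberately absent: they do not affect the score.
#[derive(Clone, Copy)]
enum FixedWidthKind {
    Byte,
    Short,
    Long,
    Float,
    Date,
    Quad,
    Double,
    QDate,
}

/// Returns the strength score associated with a type's bit width.
fn base_strength(kind: FixedWidthKind) -> u32 {
    match kind {
        FixedWidthKind::Byte => 5,
        FixedWidthKind::Short => 10,
        FixedWidthKind::Long | FixedWidthKind::Float | FixedWidthKind::Date => 15,
        FixedWidthKind::Quad | FixedWidthKind::Double | FixedWidthKind::QDate => 16,
    }
}
```

Grouping the variants by width in one match keeps Date aligned with Long and Float, and QDate aligned with Quad and Double, as the list above specifies.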

Type System Integration#

Date types integrate with the existing Parser-Evaluator Architecture following established patterns:

  1. AST Layer: New TypeKind variants with endianness and timezone fields
  2. Parser Layer: Recognition of 14 keywords via nested alt() combinators with ordering conventions
  3. Evaluator Layer: Type reading functions returning formatted Value::String
  4. Strength Scoring: Assignment of 15-16 scores based on bit width
  5. Operator Compatibility: String comparison operators work on formatted timestamps

The implementation follows Enum Extension and Exhaustive Match Synchronization patterns, requiring synchronized updates across multiple files:

  • AST definition (src/parser/ast.rs) - TypeKind enum and Display trait
  • Parser grammar (src/parser/types.rs) - Keyword recognition and mapping
  • Operator evaluation (src/evaluator/types/mod.rs) - Type reading dispatch
  • Strength scoring (src/evaluator/strength.rs) - Score assignment
  • Build serialization (src/parser/codegen.rs) - Codegen support
  • Test coverage (src/evaluator/types/tests.rs, src/parser/ast.rs) - Test case generation

Testing Requirements#

Comprehensive test coverage includes:

Parser Tests

  • All 14 date type specifiers recognized correctly
  • Keyword ordering prevents prefix matching issues
  • Error handling for malformed type specifications

Endianness Tests

  • Each byte order option (Native/Little/Big) produces correct values
  • Endianness combinations with timezone variants

Timezone Tests

  • UTC vs local time formatting differs appropriately
  • System timezone configuration affects local time output
  • Timezone boundary conditions (DST transitions, UTC offsets)

Buffer Safety Tests

  • Buffer overrun error handling for out-of-bounds reads
  • Offset overflow detection via checked_add()
  • Partial buffer reads near buffer end

Known Timestamps

  • Unix epoch (0) formats as "Thu Jan  1 00:00:00 1970"
  • Y2K timestamp (946684800) formats as "Sat Jan  1 00:00:00 2000"
  • 32-bit overflow (2147483647) formats as "Tue Jan 19 03:14:07 2038"

Edge Cases

  • Negative timestamps (pre-1970 dates)
  • Far future dates beyond typical timestamp ranges
  • Leap seconds and other calendar anomalies

Integration Tests

  • Magic rules with date types match real file formats
  • TAR archive timestamp detection
  • ELF binary build timestamp matching

Use Cases#

Date and timestamp types enable file format detection based on temporal metadata embedded in binary file formats.

Archive Formats#

Archive formats commonly embed creation and modification timestamps in file headers:

  • TAR archives: Each file entry includes a 12-byte ASCII octal modification time field at offset 136
  • ZIP files: MSDOS-format creation/modification dates in central directory entries
  • CPIO archives: Device-independent archive format with ASCII or binary timestamp encodings

Magic rules using date types can validate timestamp consistency, detect corrupted archives with invalid dates, or match archives from specific time periods.

Filesystem Images#

Filesystem metadata structures include temporal information:

  • Ext2/3/4 superblocks: Creation time, last mount time, last write time at fixed offsets
  • NTFS filesystem timestamps: FILETIME structures with 100-nanosecond precision
  • ISO 9660 creation dates: ASCII-encoded date/time descriptors in volume descriptors

Date types enable detection of filesystem images and validation of superblock metadata integrity.

Binary Executables#

Compiled binaries often include build timestamps:

  • ELF binaries: Build timestamps in .note sections or custom headers
  • PE (Windows) executables: TimeDateStamp field at offset 4 of the COFF file header (offset 8 from the start of the PE signature)
  • Mach-O binaries: Timestamps in dylib load commands

These timestamps help identify compiler versions, detect tampered binaries, or match executables built in specific time ranges.

Document Formats#

Document formats embed creation and modification metadata:

  • PDF documents: CreationDate and ModDate in document information dictionary
  • Office documents: OLE property sets with creation/modification timestamps
  • Database files: SQLite header modification counter and version-valid-for number

Date type matching enables document provenance verification and version identification.

Related Topics#

  • Type System and Operator Coverage - The broader type system including integer, float, and string types with operator support
  • Parser-Evaluator Architecture - The three-layer design pattern for type system extension
  • Enum Extension and Exhaustive Match Synchronization - Patterns for adding new TypeKind variants across multiple modules
  • Endianness Handling - Byte order conversion patterns used across all multi-byte types
  • Strength-Based Rule Prioritization - How type specificity affects file detection accuracy and rule ordering

Implementation Status#

| Aspect | Status |
| --- | --- |
| Current State | Implemented in v0.5.0+ |
| Tracking | Issue #41 (Resolved) |
| Target Release | v0.5.0+ |
| Dependencies | chrono 0.4.41 with features ["std", "clock"] |
| Blockers | None |

Relevant Code Files#

| File | Purpose | Status |
| --- | --- | --- |
| src/parser/ast.rs | TypeKind enum - includes Date/QDate variants | Updated |
| src/parser/types.rs | Type keyword parsing and mapping | Updated |
| src/evaluator/types/mod.rs | Type reading dispatcher | Updated |
| src/evaluator/types/date.rs | Date type reading functions (read_date, read_qdate) | Implemented |
| src/evaluator/strength.rs | Strength scoring calculation | Updated |
| src/parser/codegen.rs | Build-time serialization | Updated |
| src/evaluator/types/tests.rs | Type reading test coverage | Updated |
| Cargo.toml | Dependencies - includes chrono 0.4.41 | Updated |