Date and Timestamp Type Support#
Status: Implemented in v0.5.0+
Date and timestamp type support extends the libmagic-rs type system to enable file format detection based on embedded temporal data. The design defines 14 magic file keywords covering 32-bit (Date) and 64-bit (QDate) Unix timestamps, each with a fixed byte order (native, big-endian, little-endian, or, for 32-bit only, middle-endian) and timezone semantics (UTC or local time). The two middle-endian keywords (medate, meldate) are planned but not yet implemented. This feature allows magic rules to match files based on creation dates, modification timestamps, and other temporal metadata commonly found in archive formats, filesystem images, and binary executables.
Tracked in Issue #41, the implementation integrates with the existing type system architecture, following patterns established by the 32-bit Long and 64-bit Quad integer types. Date values are read as Unix timestamps (seconds since January 1, 1970 UTC) and formatted as human-readable strings using the chrono crate, matching the output format of the GNU file command. The implementation is part of the broader Type System Expansion initiative to achieve full libmagic compatibility.
Type Keywords#
The implementation supports 14 date/timestamp type keywords organized into two bit-width families. These keywords follow a systematic naming convention that encodes byte order (endianness) and timezone interpretation directly into the type name.
32-bit Date Types#
Eight keywords represent 4-byte Unix timestamps (Date variant):
| Keyword | Endianness | Timezone | Description |
|---|---|---|---|
| `date` | Native | UTC | System byte order, Coordinated Universal Time |
| `ldate` | Native | Local | System byte order, local timezone |
| `bedate` | Big | UTC | Big-endian, UTC |
| `beldate` | Big | Local | Big-endian, local timezone |
| `ledate` | Little | UTC | Little-endian, UTC |
| `leldate` | Little | Local | Little-endian, local timezone |
| `medate` | Middle | UTC | Middle-endian (PDP-11), UTC |
| `meldate` | Middle | Local | Middle-endian (PDP-11), local timezone |
64-bit Date Types#
Six keywords represent 8-byte Unix timestamps (QDate variant). 64-bit date types support only native, big-endian, and little-endian byte orders (no middle-endian):
| Keyword | Endianness | Timezone | Description |
|---|---|---|---|
| `qdate` | Native | UTC | System byte order, UTC |
| `qldate` | Native | Local | System byte order, local timezone |
| `beqdate` | Big | UTC | Big-endian, UTC |
| `beqldate` | Big | Local | Big-endian, local timezone |
| `leqdate` | Little | UTC | Little-endian, UTC |
| `leqldate` | Little | Local | Little-endian, local timezone |
Naming Convention#
The keyword naming follows a systematic pattern:
- Base type: `date` (32-bit) or `qdate` (64-bit, "quad date")
- Endian prefix: `be` (big-endian), `le` (little-endian), `me` (middle-endian for 32-bit only), or none (native)
- Timezone marker: `l` before `date` indicates local time; absence means UTC
For example, beldate decodes as: big-endian (be) + local time (l) + 32-bit date (date).
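The naming convention can be applied mechanically. The following is a minimal sketch, not the crate's parser: `decode_date_keyword` and the local `Endianness` mirror are hypothetical names introduced only for illustration, and the function returns a plain tuple rather than a `TypeKind`.

```rust
/// Byte order encoded by a keyword prefix (hypothetical mirror of the crate's enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Endianness {
    Little,
    Big,
    Native,
    Middle,
}

/// Decode a date keyword into (bit width, byte order, is_utc).
/// Returns None for anything that does not follow the naming convention.
pub fn decode_date_keyword(keyword: &str) -> Option<(u8, Endianness, bool)> {
    // 1. Strip the optional endian prefix; no prefix means native order.
    let (endian, rest) = if let Some(r) = keyword.strip_prefix("be") {
        (Endianness::Big, r)
    } else if let Some(r) = keyword.strip_prefix("le") {
        (Endianness::Little, r)
    } else if let Some(r) = keyword.strip_prefix("me") {
        (Endianness::Middle, r)
    } else {
        (Endianness::Native, keyword)
    };

    // 2. The remainder encodes bit width (`q`) and timezone (`l` before `date`).
    let (bits, utc) = match rest {
        "date" => (32, true),
        "ldate" => (32, false),
        "qdate" => (64, true),
        "qldate" => (64, false),
        _ => return None,
    };

    // 3. Middle-endian exists only in the 32-bit family.
    if endian == Endianness::Middle && bits == 64 {
        return None;
    }
    Some((bits, endian, utc))
}
```

With this sketch, `decode_date_keyword("beldate")` yields `Some((32, Endianness::Big, false))`, matching the decomposition above, while a non-existent keyword such as `meqdate` yields `None`.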
Type System Architecture#
TypeKind Enum Extension#
The TypeKind enum in src/parser/ast.rs includes two new variants. The design follows the pattern of existing Long and Quad integer types, which include endianness and signedness fields:
    pub enum TypeKind {
        // ... existing variants (Byte, Short, Long, Quad, Float, Double, String) ...

        /// 32-bit Unix timestamp
        Date {
            /// Byte order
            endian: Endianness,
            /// true = UTC, false = local time
            utc: bool,
        },

        /// 64-bit Unix timestamp
        QDate {
            /// Byte order
            endian: Endianness,
            /// true = UTC, false = local time
            utc: bool,
        },
    }
The structure mirrors integer types but uses the utc: bool field (where true indicates UTC, false indicates local time) instead of signed: bool to indicate timezone interpretation.
Endianness Enum Extension#
The implementation adds a Middle variant to the existing Endianness enum to support PDP-11 byte order. The enum now has four variants: Little, Big, Native, and Middle.
    pub enum Endianness {
        Little, // Little-endian byte order
        Big,    // Big-endian byte order
        Native, // Native system byte order
        Middle, // Middle-endian (PDP-11) byte order
    }
Note: The implementation in PR #165 does not yet wire middle-endian into the Date/QDate types. Middle-endian support is planned for the 32-bit Date type only.
Timezone Semantics#
Timezone handling is encoded in type names via the l marker that precedes date. This lets magic rules state the expected timezone for timestamp comparisons, which matters when matching files with region-specific temporal metadata.
UTC Variants#
Base types without the l marker interpret timestamps as seconds since the Unix epoch in Coordinated Universal Time. UTC variants include: date, bedate, ledate, medate, qdate, beqdate, leqdate.
The evaluator processes these types using chrono::Utc::timestamp_opt() for conversion from Unix timestamps to formatted dates.
Local Time Variants#
Types with the l marker before date interpret timestamps as seconds since the epoch in the system's local timezone, requiring timezone conversion during evaluation. Local variants include: ldate, beldate, leldate, meldate, qldate, beqldate, leqldate.
The evaluator processes these types using chrono::Local::timestamp_opt() for conversion, which applies the system's configured timezone offset.
Endianness Handling#
Date types support four byte order variants for cross-platform file format compatibility.
Native Endianness#
System-dependent byte order matches the target architecture's native representation. Keywords: date, ldate, qdate, qldate. Native endianness enables efficient processing when the file's byte order matches the host system.
Big-Endian#
Most significant byte first. Common in network protocols (TCP/IP headers), Mac OS X file formats, and many archive formats (TAR, CPIO). Keywords: bedate, beldate, beqdate, beqldate.
Little-Endian#
Least significant byte first. Common in x86/x64 architectures, Windows binaries (PE format), and Intel-based file formats. Keywords: ledate, leldate, leqdate, leqldate.
Middle-Endian (PDP-11)#
PDP-11 byte order, where bytes are swapped in pairs within 32-bit words. For a 32-bit value 0xAABBCCDD, bytes are stored as [0xBB, 0xAA, 0xDD, 0xCC]. Keywords: medate, meldate.
Middle-endian is planned for 32-bit dates but not yet implemented in the current version. 64-bit date types do not include middle-endian variants.
The evaluator uses the byteorder crate for byte order conversion, calling LittleEndian::read_u32(), BigEndian::read_u32(), and NativeEndian::read_u32(). Middle-endian support requires a custom read_middle_endian_u32() helper function, since the byteorder crate does not provide this variant.
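Because middle-endian is not yet implemented, the following is only a sketch of what such a helper might look like. It borrows the `read_middle_endian_u32` name from the text above, but the signature (a fixed 4-byte array in, a `u32` out) is an assumption, and it uses only the standard library.

```rust
/// Hypothetical helper: decode a 32-bit PDP-11 (middle-endian) value.
///
/// Each 16-bit half is stored little-endian, and the more significant
/// half is stored first, so 0xAABBCCDD appears as [0xBB, 0xAA, 0xDD, 0xCC].
pub fn read_middle_endian_u32(bytes: [u8; 4]) -> u32 {
    // Decode each 16-bit half from its little-endian byte pair.
    let high = u16::from_le_bytes([bytes[0], bytes[1]]);
    let low = u16::from_le_bytes([bytes[2], bytes[3]]);
    // Recombine with the first half in the upper 16 bits.
    (u32::from(high) << 16) | u32::from(low)
}
```

Feeding it the byte sequence from the example, `[0xBB, 0xAA, 0xDD, 0xCC]`, recovers `0xAABBCCDD`.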
Implementation#
Parser Integration#
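In src/parser/types.rs, the parse_type_keyword() function's alt combinator recognizes the date type specifiers. The listing below covers the 12 native, big-endian, and little-endian keywords; medate and meldate are absent because middle-endian support is not yet implemented. Following the existing pattern of ordering longer keywords before shorter ones, the implementation includes: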
    // Date type family (longest to shortest to prevent prefix matching)
    alt((
        tag("beqldate"), // 8 chars
        tag("leqldate"), // 8 chars
        tag("beqdate"),  // 7 chars
        tag("leqdate"),  // 7 chars
        tag("beldate"),  // 7 chars
        tag("leldate"),  // 7 chars
        tag("qldate"),   // 6 chars
        tag("bedate"),   // 6 chars
        tag("ledate"),   // 6 chars
        tag("ldate"),    // 5 chars
        tag("qdate"),    // 5 chars
        tag("date"),     // 4 chars
    ))
The type_keyword_to_kind() mapping function includes a new match arm for each recognized date keyword, converting the keyword string to a TypeKind variant with the appropriate endianness and timezone settings.
Evaluator Type Reading#
Following the pattern established by read_long() and read_quad() in src/evaluator/types/numeric.rs, the implementation adds new functions in src/evaluator/types/date.rs:
read_date() - 32-bit timestamp reading:
The function uses offset.checked_add(4) for overflow-safe range construction and buffer.get(offset..end) for bounds-checked access. The implementation:
- Reads 4 bytes from the buffer at the specified offset
- Dispatches on endianness to convert bytes to `u32` (Little/Big/Native)
- Widens the `u32` to `i64`, the type that chrono's `timestamp_opt()` expects
- Converts to `DateTime` using `chrono::Utc::timestamp_opt()` or `chrono::Local::timestamp_opt()`
- Formats using `"%a %b %e %H:%M:%S %Y"` (e.g., "Sat Feb 14 12:34:56 2026")
- Returns `Value::String` with the formatted date
read_qdate() - 64-bit timestamp reading:
The function follows the same pattern but reads 8 bytes and interprets the value as i64.
The read_typed_value() dispatcher in src/evaluator/types/mod.rs includes match arms for TypeKind::Date and TypeKind::QDate.
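The bounds-checking and endianness steps can be sketched with the standard library alone. This is an illustration under assumptions, not the code in src/evaluator/types/date.rs: `read_timestamp32`, `read_timestamp64`, and the three-variant `Endianness` are hypothetical names, `from_*_bytes` stands in for the byteorder crate, and the functions stop at the raw seconds value instead of formatting it into a `Value::String`.

```rust
/// Byte orders currently handled for date types (hypothetical mirror).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Endianness {
    Little,
    Big,
    Native,
}

/// Copy `N` bytes starting at `offset`, or None if the range is out of bounds.
fn take<const N: usize>(buffer: &[u8], offset: usize) -> Option<[u8; N]> {
    // checked_add guards against usize overflow for very large offsets.
    let end = offset.checked_add(N)?;
    // get() returns None instead of panicking when the range is too long.
    let slice = buffer.get(offset..end)?;
    let mut bytes = [0u8; N];
    bytes.copy_from_slice(slice);
    Some(bytes)
}

/// Read a 32-bit timestamp and widen it to i64 seconds.
pub fn read_timestamp32(buffer: &[u8], offset: usize, endian: Endianness) -> Option<i64> {
    let bytes = take::<4>(buffer, offset)?;
    let raw = match endian {
        Endianness::Little => u32::from_le_bytes(bytes),
        Endianness::Big => u32::from_be_bytes(bytes),
        Endianness::Native => u32::from_ne_bytes(bytes),
    };
    Some(i64::from(raw))
}

/// Read a 64-bit timestamp directly as i64 seconds.
pub fn read_timestamp64(buffer: &[u8], offset: usize, endian: Endianness) -> Option<i64> {
    let bytes = take::<8>(buffer, offset)?;
    Some(match endian {
        Endianness::Little => i64::from_le_bytes(bytes),
        Endianness::Big => i64::from_be_bytes(bytes),
        Endianness::Native => i64::from_ne_bytes(bytes),
    })
}
```

Returning `Option` keeps the sketch short; a real evaluator would more likely map the `None` cases to a buffer-overrun error.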
Chrono Integration#
The implementation uses chrono = "0.4.41" with features ["std", "clock"] as added to Cargo.toml. The chrono library provides Unix timestamp to DateTime conversion, UTC and local timezone handling, and strftime-compatible date formatting.
Value Representation#
String Representation#
Date values are stored as Value::String with formatted dates rather than numeric timestamps. This design simplifies the implementation and lets the existing string comparison operators (=, !=, <, >) work on formatted dates without modification. Note that < and > compare the formatted text lexicographically, which does not correspond to chronological order.
Output Format#
Timestamps are formatted using "%a %b %e %H:%M:%S %Y", matching GNU file command output. Example formatted values:
- `"Sat Feb 14 12:34:56 2026"` - Future date
- `"Thu Jan  1 00:00:00 1970"` - Unix epoch (timestamp 0); note the two spaces produced by the space-padded day
- `"Tue Jan 19 03:14:07 2038"` - Signed 32-bit timestamp overflow boundary (2147483647)
The format breaks down as:
- `%a` - Abbreviated weekday name (Mon, Tue, Wed, etc.)
- `%b` - Abbreviated month name (Jan, Feb, Mar, etc.)
- `%e` - Day of month, space-padded (1-31)
- `%H:%M:%S` - 24-hour time
- `%Y` - Four-digit year
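To make the layout concrete, here is a standard-library-only sketch that produces the same field order and padding for UTC timestamps. It is not how the crate formats dates (that is done by chrono); `format_utc_timestamp` is a hypothetical name, and the calendar conversion uses the well-known days-to-civil-date arithmetic for the proleptic Gregorian calendar.

```rust
// Day 0 of the Unix epoch (1970-01-01) was a Thursday.
const WEEKDAYS: [&str; 7] = ["Thu", "Fri", "Sat", "Sun", "Mon", "Tue", "Wed"];
const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Format seconds since the Unix epoch (UTC) as "%a %b %e %H:%M:%S %Y".
pub fn format_utc_timestamp(timestamp: i64) -> String {
    // Split into whole days and seconds within the day (Euclidean, so
    // negative timestamps still give a non-negative time of day).
    let days = timestamp.div_euclid(86_400);
    let secs = timestamp.rem_euclid(86_400);
    let weekday = WEEKDAYS[days.rem_euclid(7) as usize];

    // Convert a day count to year/month/day, counting from 0000-03-01
    // so that the leap day falls at the end of each shifted year.
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of shifted year
    let mp = (5 * doy + 2) / 153; // shifted month, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1; // 1..=31
    let month = if mp < 10 { mp + 3 } else { mp - 9 }; // 1..=12
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    // `{:>2}` space-pads the day like %e; `{:02}` zero-pads the time fields.
    format!(
        "{} {} {:>2} {:02}:{:02}:{:02} {}",
        weekday,
        MONTHS[(month - 1) as usize],
        day,
        secs / 3_600,
        secs % 3_600 / 60,
        secs % 60,
        year
    )
}
```

For timestamp 0 this yields `Thu Jan  1 00:00:00 1970`, with two spaces before the single-digit day, and for 2147483647 it yields `Tue Jan 19 03:14:07 2038`.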
Alternative Considerations#
The design notes mention considering an optional Value::Timestamp(i64) variant to preserve numeric timestamp values for numeric comparisons, or storing both timestamp and formatted string. For the current implementation, string representation is sufficient for equality checks and lexicographic comparisons. Future enhancements could add numeric comparison support if use cases require range-based timestamp matching.
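The trade-off can be seen in a small sketch. The `Value` enum below is a hypothetical two-variant stand-in (the real enum has more variants, and `Timestamp` is only the option under consideration); it shows that ordering formatted strings is not the same as ordering the underlying timestamps.

```rust
/// Hypothetical stand-in for the evaluator's value type.
#[derive(Debug, Clone, PartialEq, PartialOrd)]
pub enum Value {
    /// Formatted date, as produced today.
    String(String),
    /// Raw seconds since the epoch, the alternative under consideration.
    Timestamp(i64),
}

/// The Unix epoch and 2000-01-01 as formatted strings.
pub fn formatted_pair() -> (Value, Value) {
    (
        Value::String("Thu Jan  1 00:00:00 1970".to_string()),
        Value::String("Sat Jan  1 00:00:00 2000".to_string()),
    )
}

/// The same two instants as numeric timestamps.
pub fn numeric_pair() -> (Value, Value) {
    (Value::Timestamp(0), Value::Timestamp(946_684_800))
}

/// True when `earlier` sorts before `later` under the derived ordering.
pub fn sorts_before(earlier: &Value, later: &Value) -> bool {
    earlier < later
}
```

With the formatted pair, `sorts_before` returns `false` because "Thu" sorts after "Sat", even though 1970 precedes 2000; with the numeric pair it returns `true`. Equality checks behave the same in both representations.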
Strength Scoring#
Strength scores prioritize more specific types for accurate file detection. Date types follow the bit-width pattern established by existing numeric types:
- `Date` (32-bit): Strength score 15 (same as `Long` and `Float`)
- `QDate` (64-bit): Strength score 16 (same as `Quad` and `Double`)
The scoring system assigns:
- String types: 20-25 points (most specific)
- 64-bit types (Quad/Double/QDate): 16 points
- 32-bit types (Long/Float/Date): 15 points
- 16-bit types (Short): 10 points
- 8-bit types (Byte): 5 points
Timezone semantics (UTC vs local) do not affect strength scores, as they represent the same underlying information with different interpretations. Similarly, endianness variants receive the same strength score since byte order does not change type specificity.
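The scoring rule for fixed-width types can be sketched as a single match. The enum below is a simplified, hypothetical stand-in for `TypeKind` (string types and the endianness field are omitted), and `type_strength` is an illustrative name rather than the function in src/evaluator/strength.rs.

```rust
/// Simplified, hypothetical stand-in for the fixed-width TypeKind variants.
#[derive(Debug, Clone, Copy)]
pub enum TypeKind {
    Byte,
    Short,
    Long,
    Quad,
    Float,
    Double,
    Date { utc: bool },
    QDate { utc: bool },
}

/// Strength by bit width; the `utc` flag is deliberately ignored.
pub fn type_strength(kind: &TypeKind) -> u32 {
    match kind {
        TypeKind::Byte => 5,
        TypeKind::Short => 10,
        // 32-bit family
        TypeKind::Long | TypeKind::Float | TypeKind::Date { .. } => 15,
        // 64-bit family
        TypeKind::Quad | TypeKind::Double | TypeKind::QDate { .. } => 16,
    }
}
```

Matching on `Date { .. }` rather than on the field values is what keeps UTC and local variants at the same score.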
Type System Integration#
Date types integrate with the existing Parser-Evaluator Architecture following established patterns:
- AST Layer: New `TypeKind` variants with endianness and timezone fields
- Parser Layer: Recognition of the date keywords via nested `alt()` combinators with ordering conventions
- Evaluator Layer: Type reading functions returning formatted `Value::String`
- Strength Scoring: Assignment of 15-16 scores based on bit width
- Operator Compatibility: String comparison operators work on formatted timestamps
The implementation follows Enum Extension and Exhaustive Match Synchronization patterns, requiring synchronized updates across multiple files:
- AST definition (`src/parser/ast.rs`) - TypeKind enum and Display trait
- Parser grammar (`src/parser/types.rs`) - Keyword recognition and mapping
- Operator evaluation (`src/evaluator/types/mod.rs`) - Type reading dispatch
- Strength scoring (`src/evaluator/strength.rs`) - Score assignment
- Build serialization (`src/parser/codegen.rs`) - Codegen support
- Test coverage (`src/evaluator/types/tests.rs`, `src/parser/ast.rs`) - Test case generation
Testing Requirements#
Comprehensive test coverage includes:
Parser Tests
- All 14 date type specifiers recognized correctly
- Keyword ordering prevents prefix matching issues
- Error handling for malformed type specifications
Endianness Tests
- Each byte order option (Native/Little/Big) produces correct values
- Endianness combinations with timezone variants
Timezone Tests
- UTC vs local time formatting differs appropriately
- System timezone configuration affects local time output
- Timezone boundary conditions (DST transitions, UTC offsets)
Buffer Safety Tests
- Buffer overrun error handling for out-of-bounds reads
- Offset overflow detection via `checked_add()`
- Partial buffer reads near buffer end
Known Timestamps
- Unix epoch (0) formats as "Thu Jan  1 00:00:00 1970" (two spaces before the day, because %e is space-padded)
- Y2K timestamp (946684800) formats correctly
- 32-bit overflow (2147483647) formats as "Tue Jan 19 03:14:07 2038"
Edge Cases
- Negative timestamps (pre-1970 dates)
- Far future dates beyond typical timestamp ranges
- Leap seconds and other calendar anomalies
Integration Tests
- Magic rules with date types match real file formats
- TAR archive timestamp detection
- ELF binary build timestamp matching
Use Cases#
Date and timestamp types enable file format detection based on temporal metadata embedded in binary file formats.
Archive Formats#
Archive formats commonly embed creation and modification timestamps in file headers:
- TAR archives: Each file entry includes a 12-byte octal modification time field at offset 136
- ZIP files: MSDOS-format creation/modification dates in central directory entries
- CPIO archives: Device-independent archive format with ASCII or binary timestamp encodings
Magic rules using date types can validate timestamp consistency, detect corrupted archives with invalid dates, or match archives from specific time periods.
Filesystem Images#
Filesystem metadata structures include temporal information:
- Ext2/3/4 superblocks: Creation time, last mount time, last write time at fixed offsets
- NTFS filesystem timestamps: FILETIME structures with 100-nanosecond precision
- ISO 9660 creation dates: ASCII-encoded date/time descriptors in volume descriptors
Date types enable detection of filesystem images and validation of superblock metadata integrity.
Binary Executables#
Compiled binaries often include build timestamps:
- ELF binaries: Build timestamps in `.note` sections or custom headers
- PE (Windows) executables: TimeDateStamp field at offset 4 of the COFF file header (8 bytes past the start of the PE signature)
- Mach-O binaries: Timestamp fields in dylib load commands
These timestamps help identify compiler versions, detect tampered binaries, or match executables built in specific time ranges.
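As a concrete instance, the PE TimeDateStamp can be located with a few bounds-checked reads. This is a standalone sketch, independent of libmagic-rs: `pe_time_date_stamp` and `read_u32_le` are hypothetical helpers, and the field is read as a little-endian 32-bit value, which is the kind of field a little-endian 32-bit date keyword is meant for.

```rust
/// Bounds-checked little-endian u32 read (hypothetical helper).
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(buf.get(offset..end)?);
    Some(u32::from_le_bytes(bytes))
}

/// Return the COFF TimeDateStamp of a PE image, if the headers are present.
pub fn pe_time_date_stamp(file: &[u8]) -> Option<u32> {
    // A PE image starts with the DOS header magic "MZ".
    if file.get(0..2)? != &b"MZ"[..] {
        return None;
    }
    // e_lfanew at 0x3C holds the file offset of the PE signature.
    let pe_offset = read_u32_le(file, 0x3C)? as usize;
    let sig_end = pe_offset.checked_add(4)?;
    if file.get(pe_offset..sig_end)? != &b"PE\0\0"[..] {
        return None;
    }
    // The COFF header follows the 4-byte signature: Machine (2 bytes),
    // NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    read_u32_le(file, pe_offset.checked_add(8)?)
}
```

The returned value is seconds since the Unix epoch, so it can be formatted in the same way as any other 32-bit date value.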
Document Formats#
Document formats embed creation and modification metadata:
- PDF documents: CreationDate and ModDate in document information dictionary
- Office documents: OLE property sets with creation/modification timestamps
- Database files: SQLite header modification counter and version-valid-for number
Date type matching enables document provenance verification and version identification.
Related Topics#
- Type System and Operator Coverage - The broader type system including integer, float, and string types with operator support
- Parser-Evaluator Architecture - The three-layer design pattern for type system extension
- Enum Extension and Exhaustive Match Synchronization - Patterns for adding new `TypeKind` variants across multiple modules
- Endianness Handling - Byte order conversion patterns used across all multi-byte types
- Strength-Based Rule Prioritization - How type specificity affects file detection accuracy and rule ordering
Implementation Status#
| Aspect | Status |
|---|---|
| Current State | Implemented in v0.5.0+ |
| Tracking | Issue #41 (Resolved) |
| Target Release | v0.5.0+ |
| Dependencies | chrono 0.4.41 with features ["std", "clock"] |
| Blockers | None |
Relevant Code Files#
| File | Purpose | Status |
|---|---|---|
| `src/parser/ast.rs` | TypeKind enum - includes Date/QDate variants | Updated |
| `src/parser/types.rs` | Type keyword parsing and mapping | Updated |
| `src/evaluator/types/mod.rs` | Type reading dispatcher | Updated |
| `src/evaluator/types/date.rs` | Date type reading functions (read_date, read_qdate) | Implemented |
| `src/evaluator/strength.rs` | Strength scoring calculation | Updated |
| `src/parser/codegen.rs` | Build-time serialization | Updated |
| `src/evaluator/types/tests.rs` | Type reading test coverage | Updated |
| `Cargo.toml` | Dependencies - includes chrono 0.4.41 | Updated |