# Magic File Format
Magic files define rules for identifying file types through byte-level patterns. This chapter documents the magic file format supported by libmagic-rs.
## Overview
Magic files contain rules that describe file formats by specifying byte patterns at specific offsets. Each rule consists of:
- Offset - Where to look in the file
- Type - How to interpret the bytes
- Value - What to match against
- Message - Description to display on match
### Basic Format

```
offset type value message
```

Example:

```
0 string PK ZIP archive data
```
This rule matches files starting with "PK" and labels them as "ZIP archive data".
## Basic Syntax

### Rule Structure

```
[level>]offset type [operator]value message
```

| Component | Required | Description |
|---|---|---|
| `level>` | No | Nesting level, written as one `>` per level |
| `offset` | Yes | Where to read data |
| `type` | Yes | Data type to read |
| `operator` | No | Comparison operator (default: `=`) |
| `value` | Yes | Expected value |
| `message` | Yes | Description text |
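To make the field order concrete, here is a minimal Rust sketch that splits one rule line into the components in the table. The `RuleLine` type, `parse_rule_line` function, and operator list are hypothetical illustrations, not the libmagic-rs parser. The sketch does not handle escaped spaces inside string values or the mask form `&0x0f=0x0f`, and it treats a leading operator character in a value as an operator.

```rust
/// One rule line split into its fields (hypothetical type, not the libmagic-rs AST).
#[derive(Debug, PartialEq)]
pub struct RuleLine<'a> {
    pub level: usize,       // number of leading '>' characters
    pub offset: &'a str,    // e.g. "0", "0x3c", "(0x3c.l)"
    pub type_name: &'a str, // e.g. "string", "leshort", "pstring/H"
    pub operator: &'a str,  // "=" when the rule writes no operator
    pub value: &'a str,     // value with the operator removed
    pub message: &'a str,   // rest of the line; may contain spaces
}

// Two-character operators come first so ">=" is not read as ">".
const OPERATORS: [&str; 9] = ["!=", ">=", "<=", "=", "!", ">", "<", "&", "^"];

/// Return the next whitespace-delimited field and the remaining text.
fn next_field(s: &str) -> Option<(&str, &str)> {
    let s = s.trim_start();
    if s.is_empty() {
        return None;
    }
    let end = s.find(char::is_whitespace).unwrap_or(s.len());
    Some((&s[..end], &s[end..]))
}

/// Split a leading comparison operator off a value token.
fn split_operator(token: &str) -> (&str, &str) {
    for op in OPERATORS {
        if let Some(rest) = token.strip_prefix(op) {
            if !rest.is_empty() {
                return (op, rest);
            }
        }
    }
    ("=", token)
}

/// Parse `[level>]offset type [operator]value message`; None for blanks and comments.
pub fn parse_rule_line(line: &str) -> Option<RuleLine<'_>> {
    let trimmed = line.trim();
    if trimmed.is_empty() || trimmed.starts_with('#') {
        return None;
    }
    let level = trimmed.bytes().take_while(|&b| b == b'>').count();
    let (offset, rest) = next_field(&trimmed[level..])?;
    let (type_name, rest) = next_field(rest)?;
    let (raw_value, rest) = next_field(rest)?;
    let (operator, value) = split_operator(raw_value);
    Some(RuleLine { level, offset, type_name, operator, value, message: rest.trim() })
}
```

For example, `>0x3c long >0 (PE offset present)` yields level 1, offset `0x3c`, type `long`, operator `>`, value `0`, and the message `(PE offset present)`.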
### Comments

Lines starting with `#` are comments:

```
# This is a comment
0 string PK ZIP archive
```
### Whitespace

- Fields are separated by whitespace (spaces or tabs); the message is the rest of the line and may itself contain spaces
- A leading `>` prefix, one per level, indicates the rule's nesting level
- Trailing whitespace is ignored
## Offset Specifications

### Absolute Offset

Direct byte position from the start of the file:

```
0 string \x7fELF ELF executable
>16 short 2 (executable)
```
### Hexadecimal Offset

Use the `0x` prefix for hexadecimal offsets:

```
0x0 string MZ DOS executable
>0x3c long >0 (PE offset present)
```
### Negative Offset (From End)

Read relative to the end of the file:

```
-4 string .ZIP ZIP file (end marker)
```
### Indirect Offset

Read a pointer value from the file and use it as the offset:

```
# Read 4-byte pointer at offset 60, then check that location
(0x3c.l) string PE\0\0 PE executable
```

Indirect offset syntax:

- `(base.type)`: read the pointer at `base` and interpret it as `type`
- `(base.type+adj)`: add `adj` to the pointer value before using it

Types for indirect offsets:

- `.b`: byte (1 byte)
- `.s`: short (2 bytes)
- `.l`: long (4 bytes)
- `.q`: quad (8 bytes)
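A minimal sketch of how `(base.type+adj)` could be resolved. It assumes the pointer is stored little-endian, as GNU `file` does for the lowercase suffixes (libmagic-rs may handle byte order differently), and that an out-of-range read simply fails the rule. `PtrType` and `resolve_indirect` are hypothetical names.

```rust
use std::convert::TryFrom;

/// Pointer width for an indirect offset: .b, .s, .l, .q (hypothetical enum).
#[derive(Clone, Copy, Debug)]
pub enum PtrType {
    Byte,
    Short,
    Long,
    Quad,
}

/// Resolve `(base.type+adj)`: read a pointer of the given width at `base`,
/// assuming little-endian storage, add `adj`, and return the absolute offset.
/// Returns None if the pointer lies outside `buf` or the result is negative.
pub fn resolve_indirect(buf: &[u8], base: usize, ty: PtrType, adj: i64) -> Option<usize> {
    let width = match ty {
        PtrType::Byte => 1,
        PtrType::Short => 2,
        PtrType::Long => 4,
        PtrType::Quad => 8,
    };
    let bytes = buf.get(base..base.checked_add(width)?)?;
    // Little-endian: byte i contributes bits 8*i .. 8*i+7.
    let mut ptr: u64 = 0;
    for (i, &b) in bytes.iter().enumerate() {
        ptr |= u64::from(b) << (8 * i);
    }
    let target = i64::try_from(ptr).ok()?.checked_add(adj)?;
    usize::try_from(target).ok()
}
```

For a PE file, the `lelong` at `0x3c` holds the offset of the `PE\0\0` signature, so `(0x3c.l)` lands exactly on it.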
### Relative Offset

Offset relative to the previous match:

```
0 string PK\x03\x04 ZIP archive
>&2 short >0 (with data)
```

The `&` prefix indicates a relative offset.
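The three direct offset forms (absolute, from the end, and relative) differ only in their starting point. The sketch below assumes a relative offset counts from the end of the previous match, as in GNU `file`; the `Offset` enum and `resolve` function are hypothetical.

```rust
/// Hypothetical representation of the direct offset forms.
#[derive(Clone, Copy, Debug)]
pub enum Offset {
    Absolute(u64), // `0`, `16`, `0x3c`
    FromEnd(u64),  // `-4` is stored as FromEnd(4)
    Relative(i64), // `&2`, counted from the end of the previous match (assumed)
}

/// Turn an offset into a byte position. `file_len` is the file size and
/// `prev_match_end` is where the previous (parent) match ended.
/// Returns None when the position would fall before the start of the file.
pub fn resolve(offset: &Offset, file_len: u64, prev_match_end: u64) -> Option<u64> {
    match *offset {
        Offset::Absolute(n) => Some(n),
        Offset::FromEnd(n) => file_len.checked_sub(n),
        Offset::Relative(delta) => prev_match_end.checked_add_signed(delta),
    }
}
```

Under this assumption, once `PK\x03\x04` has matched bytes 0..4, `&2` points at byte 6.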
## Type Specifications

### Integer Types

| Type | Size | Endianness |
|---|---|---|
| `byte` | 1 byte | N/A |
| `short` | 2 bytes | native |
| `leshort` | 2 bytes | little-endian |
| `beshort` | 2 bytes | big-endian |
| `long` | 4 bytes | native |
| `lelong` | 4 bytes | little-endian |
| `belong` | 4 bytes | big-endian |
| `quad` | 8 bytes | native |
| `lequad` | 8 bytes | little-endian |
| `bequad` | 8 bytes | big-endian |

All integer types have unsigned variants prefixed with `u`:

- `ubyte`, `ushort`, `uleshort`, `ubeshort`
- `ulong`, `ulelong`, `ubelong`
- `uquad`, `ulequad`, `ubequad`
Examples:

```
0 byte 0x7f (byte match)
0 leshort 0x5a4d DOS MZ signature
0 belong 0xcafebabe Java class file
0 lequad 0x1234567890abcdef (64-bit little-endian)
8 uquad >0x8000000000000000 (unsigned 64-bit check)
```
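Reading these types comes down to taking `size` bytes at the offset and combining them in the requested byte order. This Rust sketch (hypothetical `read_uint` and `sign_extend` helpers, not the libmagic-rs API) reads the value unsigned and sign-extends it for the signed variants. `Native` follows the host's byte order, which is why the native types are platform-dependent.

```rust
/// Byte order of an integer type (hypothetical enum).
#[derive(Clone, Copy, Debug)]
pub enum Endian {
    Little,
    Big,
    Native,
}

/// Read a 1-, 2-, 4- or 8-byte unsigned integer at `offset`.
/// Returns None if the value would extend past the end of `buf`.
pub fn read_uint(buf: &[u8], offset: usize, size: usize, endian: Endian) -> Option<u64> {
    assert!(matches!(size, 1 | 2 | 4 | 8), "unsupported integer size");
    let bytes = buf.get(offset..offset.checked_add(size)?)?;
    let little = match endian {
        Endian::Little => true,
        Endian::Big => false,
        Endian::Native => cfg!(target_endian = "little"),
    };
    let mut value = 0u64;
    if little {
        // Most significant byte is last, so fold from the end.
        for &b in bytes.iter().rev() {
            value = (value << 8) | u64::from(b);
        }
    } else {
        for &b in bytes {
            value = (value << 8) | u64::from(b);
        }
    }
    Some(value)
}

/// Interpret a `size`-byte value from `read_uint` as two's-complement signed.
pub fn sign_extend(value: u64, size: usize) -> i64 {
    let shift = 64 - 8 * size as u32;
    ((value << shift) as i64) >> shift
}
```

The same four bytes `ca fe ba be` read as `belong` give `0xcafebabe` but as `lelong` give `0xbebafeca`, which is why explicit endianness matters.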
### Floating-Point Types

| Type | Size | Endianness | IEEE 754 |
|---|---|---|---|
| `float` | 4 bytes | native | 32-bit |
| `befloat` | 4 bytes | big-endian | 32-bit |
| `lefloat` | 4 bytes | little-endian | 32-bit |
| `double` | 8 bytes | native | 64-bit |
| `bedouble` | 8 bytes | big-endian | 64-bit |
| `ledouble` | 8 bytes | little-endian | 64-bit |

Floating-point types follow the IEEE 754 standard. Unlike integer types, float types do not have signed or unsigned variants (the IEEE 754 format carries its own sign bit).
Examples:

```
0 lefloat =3.14159 File with float value pi
0 bedouble >1.0 Double value greater than 1.0
```

Float comparison behavior:

- Equality: uses epsilon-aware comparison (`f64::EPSILON` tolerance)
- Ordering: uses IEEE 754 semantics via `partial_cmp`
- NaN: NaN is never equal to any value, including itself, and ordering comparisons involving NaN return false
- Infinity: positive and negative infinity are properly ordered
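The comparison rules above can be sketched in Rust as follows. This assumes equality means "exactly equal, or absolute difference within `f64::EPSILON`" and that `>` is derived from `partial_cmp`; `read_f64`, `float_eq` and `float_gt` are hypothetical helpers rather than the library's functions. The `a == b` shortcut is needed because `inf - inf` is NaN.

```rust
use std::cmp::Ordering;

/// Read an IEEE 754 double from 8 bytes in the given byte order.
pub fn read_f64(buf: &[u8], offset: usize, big_endian: bool) -> Option<f64> {
    let slice = buf.get(offset..offset.checked_add(8)?)?;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(slice);
    Some(if big_endian { f64::from_be_bytes(bytes) } else { f64::from_le_bytes(bytes) })
}

/// Epsilon-aware equality: exact equality (so each infinity equals itself)
/// or an absolute difference within f64::EPSILON. NaN fails both tests.
pub fn float_eq(a: f64, b: f64) -> bool {
    a == b || (a - b).abs() <= f64::EPSILON
}

/// `>` via partial_cmp: a NaN operand yields None, so the result is false.
pub fn float_gt(a: f64, b: f64) -> bool {
    a.partial_cmp(&b) == Some(Ordering::Greater)
}
```

One consequence of an absolute `f64::EPSILON` tolerance: a 32-bit `float` widened to `f64` differs from the literal `3.14159` by roughly 1e-7, so the two do not compare equal. An implementation would need to round the literal to 32-bit precision (or use a wider tolerance) for `lefloat =3.14159` to match; this sketch does not assume which approach libmagic-rs takes.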
### Date/Timestamp Types

| Type | Size | Endianness | UTC/Local | Description |
|---|---|---|---|---|
| `date` | 4 bytes | native | UTC | 32-bit Unix timestamp (signed seconds since epoch), formatted as UTC |
| `ldate` | 4 bytes | native | Local | 32-bit Unix timestamp, formatted as local time |
| `bedate` | 4 bytes | big-endian | UTC | 32-bit Unix timestamp, big-endian byte order, UTC |
| `beldate` | 4 bytes | big-endian | Local | 32-bit Unix timestamp, big-endian byte order, local time |
| `ledate` | 4 bytes | little-endian | UTC | 32-bit Unix timestamp, little-endian byte order, UTC |
| `leldate` | 4 bytes | little-endian | Local | 32-bit Unix timestamp, little-endian byte order, local time |
| `qdate` | 8 bytes | native | UTC | 64-bit Unix timestamp (signed seconds since epoch), formatted as UTC |
| `qldate` | 8 bytes | native | Local | 64-bit Unix timestamp, formatted as local time |
| `beqdate` | 8 bytes | big-endian | UTC | 64-bit Unix timestamp, big-endian byte order, UTC |
| `beqldate` | 8 bytes | big-endian | Local | 64-bit Unix timestamp, big-endian byte order, local time |
| `leqdate` | 8 bytes | little-endian | UTC | 64-bit Unix timestamp, little-endian byte order, UTC |
| `leqldate` | 8 bytes | little-endian | Local | 64-bit Unix timestamp, little-endian byte order, local time |

Timestamp values are formatted as strings matching the GNU `file` output format: `Www Mmm DD HH:MM YYYY`.
Examples:

```
# Match file modified at Unix epoch
0 date =0 File created at epoch
# Check timestamp in file header (big-endian)
8 bedate >946684800 File created after 2000-01-01
# 64-bit timestamp (little-endian, local time)
16 leqldate x \b, timestamp %s
```
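The 32-bit date types hold signed seconds since 1970-01-01 UTC. The sketch below decodes a `bedate`/`ledate` field and splits it into UTC calendar fields with Howard Hinnant's days-to-civil algorithm. It does not produce the exact GNU string format, and the local-time variants would additionally need time-zone data. `read_date32` and `utc_fields` are hypothetical helpers.

```rust
/// Decode a 32-bit date field (signed seconds since the Unix epoch).
pub fn read_date32(buf: &[u8], offset: usize, big_endian: bool) -> Option<i64> {
    let slice = buf.get(offset..offset.checked_add(4)?)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(slice);
    let secs = if big_endian { i32::from_be_bytes(bytes) } else { i32::from_le_bytes(bytes) };
    Some(i64::from(secs))
}

/// Split seconds since the epoch into UTC (year, month, day, hour, minute, second)
/// in the proleptic Gregorian calendar.
pub fn utc_fields(secs: i64) -> (i64, u32, u32, u32, u32, u32) {
    let days = secs.div_euclid(86_400);
    let tod = secs.rem_euclid(86_400) as u32; // seconds into the day
    // Count days from 0000-03-01 so each leap day falls at the end of a year.
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z.rem_euclid(146_097); // day of era [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year [0, 365]
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day, tod / 3_600, tod % 3_600 / 60, tod % 60)
}
```

For the `bedate >946684800` rule, the bytes `38 6d 43 80` decode to 946684800, which is 2000-01-01 00:00:00 UTC.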
### String Types

Match literal string data:

```
0 string %PDF PDF document
0 string GIF89a GIF image data
```

String escape sequences:

- `\x00`: hex byte
- `\n`: newline
- `\t`: tab
- `\\`: backslash
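A sketch of decoding these escapes into the bytes that are actually compared. Besides the four sequences listed, it handles `\r` and `\0`, which appear in the PNG and PE examples in this chapter. Other octal escapes are not handled, and an unknown escape keeps the escaped character. `decode_escapes` is a hypothetical helper.

```rust
/// Decode magic-file string escapes into raw bytes.
pub fn decode_escapes(s: &str) -> Vec<u8> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // Plain byte, or a trailing backslash with nothing after it.
        if bytes[i] != b'\\' || i + 1 == bytes.len() {
            out.push(bytes[i]);
            i += 1;
            continue;
        }
        let esc = bytes[i + 1];
        i += 2;
        match esc {
            b'n' => out.push(b'\n'),
            b't' => out.push(b'\t'),
            b'r' => out.push(b'\r'),
            b'0' => out.push(0),
            b'\\' => out.push(b'\\'),
            b'x' => {
                // Consume up to two hex digits after \x.
                let mut value = 0u8;
                let mut digits = 0;
                while digits < 2 && i < bytes.len() {
                    match (bytes[i] as char).to_digit(16) {
                        Some(d) => {
                            value = value * 16 + d as u8;
                            i += 1;
                            digits += 1;
                        }
                        None => break,
                    }
                }
                out.push(if digits == 0 { b'x' } else { value });
            }
            other => out.push(other),
        }
    }
    out
}
```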
### Pascal String Type
Pascal string (pstring) is a length-prefixed string type. The length prefix can be 1, 2, or 4 bytes depending on the suffix flag. Unlike C strings, Pascal strings are not null-terminated.
#### Length Prefix Width

The default `pstring` type uses a 1-byte length prefix (0-255 range). Use suffix flags to specify different prefix widths:

| Suffix | Width | Endianness | Range |
|---|---|---|---|
| `/B` | 1 byte | N/A | 0-255 (default) |
| `/H` | 2 bytes | big-endian | 0-65535 |
| `/h` | 2 bytes | little-endian | 0-65535 |
| `/L` | 4 bytes | big-endian | 0-4294967295 |
| `/l` | 4 bytes | little-endian | 0-4294967295 |
#### Self-Inclusive Length (/J Flag)

The `/J` flag indicates that the stored length value includes the size of the length prefix itself (JPEG-style). This flag can be combined with any width variant.

#### Examples

Basic pstring with default 1-byte prefix:

```
0 pstring =JPEG JPEG image (Pascal string)
```

2-byte big-endian length prefix:

```
0 pstring/H =JPEG JPEG image (2-byte BE prefix)
```

4-byte little-endian length prefix:

```
0 pstring/l x \b, name: %s
```

Self-inclusive length with 2-byte big-endian prefix:

```
0 pstring/HJ x \b, JPEG-style length
```

Self-inclusive length with default 1-byte prefix:

```
0 pstring/J x \b, self-inclusive length
```

The optional `max_length` parameter caps the length value:

```
0 pstring x \b, name: %s
```
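A Rust sketch of reading a `pstring` with each prefix width and the `/J` adjustment. The `Prefix` enum and `read_pstring` are hypothetical, `max_length` capping is left out, and with `/J` a stored length smaller than the prefix is treated as malformed.

```rust
/// Length-prefix width and byte order, mirroring /B, /H, /h, /L, /l (hypothetical).
#[derive(Clone, Copy, Debug)]
pub enum Prefix {
    B,   // 1 byte
    BeH, // /H: 2 bytes, big-endian
    LeH, // /h: 2 bytes, little-endian
    BeL, // /L: 4 bytes, big-endian
    LeL, // /l: 4 bytes, little-endian
}

/// Return the string bytes of a Pascal string at `offset`, or None if truncated.
/// With `self_inclusive` (/J), the stored length counts the prefix bytes too.
pub fn read_pstring(buf: &[u8], offset: usize, prefix: Prefix, self_inclusive: bool) -> Option<&[u8]> {
    let width = match prefix {
        Prefix::B => 1,
        Prefix::BeH | Prefix::LeH => 2,
        Prefix::BeL | Prefix::LeL => 4,
    };
    let raw = buf.get(offset..offset.checked_add(width)?)?;
    let mut len = 0usize;
    if matches!(prefix, Prefix::BeH | Prefix::BeL) {
        for &b in raw {
            len = (len << 8) | b as usize;
        }
    } else {
        for &b in raw.iter().rev() {
            len = (len << 8) | b as usize;
        }
    }
    if self_inclusive {
        len = len.checked_sub(width)?; // remove the prefix's own size
    }
    let start = offset + width; // cannot overflow: raw was in bounds
    buf.get(start..start.checked_add(len)?)
}
```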
### String Flags (Not Yet Implemented)

Note: String flags are documented for libmagic compatibility reference but are not yet implemented in libmagic-rs.

| Flag | Description |
|---|---|
| `/c` | Case-insensitive match |
| `/w` | Whitespace-insensitive |
| `/b` | Force binary file test |

Example:

```
0 string/c <!doctype HTML document
```
## Operators

### Comparison Operators

| Operator | Description | Example |
|---|---|---|
| `=` | Equal (default) | `0 long =0xcafebabe` |
| `!=` | Not equal | `4 byte !=0` |
| `>` | Greater than | `8 long >1000` |
| `<` | Less than | `8 long <100` |
| `>=` | Greater than or equal | `8 long >=1000` |
| `<=` | Less than or equal | `8 long <=100` |
| `&` | Bitwise AND | `4 byte &0x80` |
| `^` | Bitwise XOR (not yet implemented) | `4 byte ^0xff` |
### Bitwise AND with Mask

Test specific bits:

```
# Check if bit 7 is set
4 byte &0x80 (compressed)
# Check if lower nibble is 0x0f
4 byte &0x0f=0x0f (all bits set)
```

### Negation

Prefix the operator with `!` for negation:

```
# Match if NOT equal to zero
4 long !0 (non-zero)
```
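The operators above reduce to a single comparison on the value read from the file. This sketch uses unsigned values only and assumes GNU `file`'s meaning for `&` (every bit set in the rule value must also be set in the file value). `Op` and `compare` are hypothetical names, and `^` is omitted because it is not implemented.

```rust
/// Comparison operators from the table (hypothetical enum).
#[derive(Clone, Copy, Debug)]
pub enum Op {
    Eq,
    Ne,
    Gt,
    Lt,
    Ge,
    Le,
    And,
}

/// Compare the value read from the file (`actual`) with the rule's value.
pub fn compare(actual: u64, op: Op, expected: u64) -> bool {
    match op {
        Op::Eq => actual == expected,
        Op::Ne => actual != expected,
        Op::Gt => actual > expected,
        Op::Lt => actual < expected,
        Op::Ge => actual >= expected,
        Op::Le => actual <= expected,
        // All bits that are set in `expected` must be set in `actual`.
        Op::And => (actual & expected) == expected,
    }
}
```

In these terms, `!0` corresponds to `Op::Ne` with 0, and the mask form `&0x0f=0x0f` gives the same result as `Op::And` with `0x0f`.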
## Values

### Numeric Values

```
# Decimal
0 long 1234
# Hexadecimal
0 long 0x4d5a
# Octal
0 byte 0177
```
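The three numeric notations can be told apart by their prefix, following the C convention that a leading `0` followed by more digits means octal. `parse_number` is a hypothetical sketch and does not handle negative values.

```rust
/// Parse a numeric rule value: `0x`/`0X` = hexadecimal, leading `0` plus more
/// digits = octal, anything else = decimal. Returns None on invalid digits.
pub fn parse_number(s: &str) -> Option<u64> {
    if let Some(hex) = s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        u64::from_str_radix(hex, 16).ok()
    } else if s.len() > 1 && s.starts_with('0') {
        u64::from_str_radix(&s[1..], 8).ok()
    } else {
        s.parse().ok()
    }
}
```

So `0177` is 127, while `0x4d5a` is 19802 and `1234` stays 1234.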
### String Values

```
# Plain string
0 string RIFF
# With escape sequences
0 string PK\x03\x04
# Unicode (as bytes)
0 string \xff\xfe
```
### Special Values

| Value | Description |
|---|---|
| `x` | Match any value (always true) |

Example:

```
0 string PK ZIP archive
>4 short x version %d
```

The `x` value matches anything and `%d` formats the matched value.
## Nested Rules

Rules can be nested to create hierarchical matches. Deeper matches indicate more specific identification.

### Indentation Levels

Use the `>` prefix for nested rules:

```
0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
>5 byte 1 LSB
>5 byte 2 MSB
```

Evaluation:

- Check offset 0 for the ELF magic
- If it matches, evaluate each level-1 (`>`) rule: offset 4 for the word size and offset 5 for the byte order
- Sibling rules at the same level are checked independently, so the offset 5 rules run whether or not an offset 4 rule matched
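The evaluation order above can be sketched as a single pass over the rules in file order: a rule at level N runs only while the most recent rule at level N-1 matched, and siblings are tried independently. `Rule`, `evaluate`, and the closure-based tests (`byte_eq`, `starts_with`) are hypothetical, chosen to keep the example self-contained.

```rust
/// A rule in file order: nesting level (count of '>'), a test, and its message.
pub struct Rule<'a> {
    pub level: usize,
    pub test: Box<dyn Fn(&[u8]) -> bool + 'a>,
    pub message: &'a str,
}

/// Evaluate one entry (a level-0 rule plus its nested rules) against `buf`
/// and return the messages of every rule that ran and matched.
pub fn evaluate<'a>(rules: &[Rule<'a>], buf: &[u8]) -> Vec<&'a str> {
    let mut messages = Vec::new();
    // Deepest level whose rules are currently allowed to run.
    let mut allowed = 0;
    for rule in rules {
        if rule.level > allowed {
            continue; // its parent did not match
        }
        if (rule.test)(buf) {
            messages.push(rule.message);
            allowed = rule.level + 1; // children of this rule may now run
        } else if rule.level == 0 {
            break; // the entry itself did not match
        } else {
            allowed = rule.level; // siblings may run, children may not
        }
    }
    messages
}

/// Test helper: byte at `offset` equals `expected`.
pub fn byte_eq(offset: usize, expected: u8) -> Box<dyn Fn(&[u8]) -> bool> {
    Box::new(move |buf: &[u8]| buf.get(offset) == Some(&expected))
}

/// Test helper: buffer starts with `magic`.
pub fn starts_with(magic: &'static [u8]) -> Box<dyn Fn(&[u8]) -> bool> {
    Box::new(move |buf: &[u8]| buf.starts_with(magic))
}
```

For a 64-bit little-endian ELF header, the flat `>` rules above produce the messages `ELF`, `64-bit`, `LSB`.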
### Multiple Nesting Levels

```
0 string \x7fELF ELF
>4 byte 2 64-bit
>>5 byte 1 LSB
>>>16 short 2 (executable)
>>>16 short 3 (shared object)
```
### Continuation Messages

Start a message with `\b` (backspace) to suppress the space that would otherwise separate it from the previous message:

```
0 string GIF8 GIF image data
>4 string 7a \b, version 87a
>4 string 9a \b, version 89a
```

Output: `GIF image data, version 89a`
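A sketch of how matched messages could be joined, assuming a space is inserted before each message unless it starts with `\b`. It accepts either the two-character `\b` as written in the file or an already-decoded backspace character (U+0008), and skips empty messages. `join_messages` is a hypothetical helper.

```rust
/// Join matched messages with spaces, except where a message starts with `\b`.
pub fn join_messages(messages: &[&str]) -> String {
    let mut out = String::new();
    for msg in messages {
        if msg.is_empty() {
            continue;
        }
        // `\b` suppresses the separating space and is not printed itself.
        if let Some(rest) = msg.strip_prefix("\\b").or_else(|| msg.strip_prefix('\u{8}')) {
            out.push_str(rest);
        } else {
            if !out.is_empty() {
                out.push(' ');
            }
            out.push_str(msg);
        }
    }
    out
}
```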
## Examples

### ELF Executable

```
# ELF (Executable and Linkable Format)
0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
>5 byte 1 LSB
>5 byte 2 MSB
>16 leshort 2 (executable)
>16 leshort 3 (shared object)
```
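### ZIP Archive

```
# ZIP archive
0 string PK\x03\x04 ZIP archive data
>4 leshort x \b, version %d needed to extract
>6 leshort &0x0001 \b, encrypted
>6 leshort &0x0008 \b, with data descriptor
```

The version field holds 10 × major + minor (for example, 20 for version 2.0), so a single `%d` prints that raw value.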
### JPEG Image

```
# JPEG
0 string \xff\xd8\xff JPEG image data
>3 byte 0xe0 \b, JFIF standard
>3 byte 0xe1 \b, Exif format
```
### PDF Document

```
# PDF
0 string %PDF- PDF document
>5 string 1. \b, version 1.x
>5 string 2. \b, version 2.x
```
### PE Executable

```
# DOS MZ executable with PE header
0 string MZ DOS executable
>0x3c lelong >0 (PE offset)
>(0x3c.l) string PE\0\0 PE executable
```
### GZIP Compressed

```
# GZIP
0 string \x1f\x8b gzip compressed data
>2 byte 8 \b, deflated
>3 byte &0x01 \b, ASCII text
>3 byte &0x02 \b, with header CRC
>3 byte &0x04 \b, with extra field
>3 byte &0x08 \b, with original name
>3 byte &0x10 \b, with comment
```
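The same header fields can be decoded directly to check these rules against RFC 1952, where byte 2 is the compression method (8 = deflate) and byte 3 holds the FTEXT, FHCRC, FEXTRA, FNAME, and FCOMMENT flags. `gzip_description` is a hypothetical function that builds the same text the rules produce.

```rust
/// Describe a gzip header the way the rules above would, or None if the
/// buffer does not start with the gzip magic bytes 1f 8b.
pub fn gzip_description(buf: &[u8]) -> Option<String> {
    if buf.len() < 4 || buf[0] != 0x1f || buf[1] != 0x8b {
        return None;
    }
    // FLG bits from RFC 1952, in the order the rules list them.
    const FLAGS: [(u8, &str); 5] = [
        (0x01, ", ASCII text"),         // FTEXT
        (0x02, ", with header CRC"),    // FHCRC
        (0x04, ", with extra field"),   // FEXTRA
        (0x08, ", with original name"), // FNAME
        (0x10, ", with comment"),       // FCOMMENT
    ];
    let mut out = String::from("gzip compressed data");
    if buf[2] == 8 {
        out.push_str(", deflated");
    }
    for (bit, text) in FLAGS {
        if (buf[3] & bit) != 0 {
            out.push_str(text);
        }
    }
    Some(out)
}
```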
### PNG Image

```
# PNG
0 string \x89PNG\r\n\x1a\n PNG image data
>16 belong x \b, %d x
>20 belong x %d
>25 byte 0 \b, grayscale
>25 byte 2 \b, RGB
>25 byte 3 \b, palette
>25 byte 4 \b, grayscale+alpha
>25 byte 6 \b, RGBA
```
### Floating-Point Values

```
# Check for specific float value
0 lefloat =3.14159 File with float value pi
# Float comparison
0 float >1.0 Float value greater than 1.0
# Double precision
0 bedouble =0.45455 Double value 0.45455
```
## Best Practices

### 1. Order Rules by Specificity

Put more specific rules first:

```
# Good: Specific before general
0 string PK\x03\x04 ZIP archive
0 string PK (generic PK signature)
# Bad: General catches all, so the ZIP rule below is never reached
0 string PK (generic PK signature)
0 string PK\x03\x04 ZIP archive
```
### 2. Use Nested Rules for Details

```
# Good: Hierarchical structure
0 string \x7fELF ELF
>4 byte 2 64-bit
>>5 byte 1 LSB
# Bad: Flat rules
0 string \x7fELF ELF
4 byte 2 64-bit
5 byte 1 LSB
```
### 3. Document Complex Rules

```
# JPEG with Exif metadata
# The Exif APP1 marker (0xFFE1) contains camera metadata
0 string \xff\xd8\xff JPEG image data
>3 byte 0xe1 \b, Exif format
```
### 4. Test Edge Cases
Consider:
- Empty files
- Truncated files
- Minimum valid file size
- Maximum offset values
### 5. Use Appropriate Types

```
# Good: Match exact size needed
0 leshort 0x5a4d DOS executable
# Bad: Over-reading
0 lelong x (reads 4 bytes when 2 needed)
```
### 6. Handle Endianness Explicitly

```
# Good: Explicit endianness
0 lelong 0xcafebabe (little-endian)
0 belong 0xcafebabe (big-endian)
# Risky: Native endianness
0 long 0xcafebabe (platform-dependent)
```
## Supported Features

### Currently Supported
- Absolute offsets
- Relative offsets
- Indirect offsets (basic)
- Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
- Float and double types (32-bit and 64-bit IEEE 754 floating-point)
- Date and qdate types (32-bit and 64-bit Unix timestamps)
- String and pstring types (null-terminated and length-prefixed strings)
- Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
- Bitwise AND operator
- Nested rules
- Comments
### Not Yet Supported
- Regex patterns
- 128-bit integer types
- Use/name directives
- Default rules
### Recently Added

- Strength modifiers: the `!:strength` directive for adjusting rule priority
- 64-bit integers: the `quad` type family (`quad`, `uquad`, `lequad`, `ulequad`, `bequad`, `ubequad`)
- Floating-point types: the `float` and `double` type families (`float`, `befloat`, `lefloat`, `double`, `bedouble`, `ledouble`) with IEEE 754 semantics and epsilon-aware equality
## Troubleshooting

### Rule Not Matching

- Check that the offset is correct (offsets are 0-indexed)
- Verify that the endianness matches the file format
- Inspect the bytes with `hexdump -C file | head`
- Ensure no conflicting rules
### Unexpected Results
- Check rule order (first match wins)
- Verify nested rule levels
- Test with simpler rules first
### Performance Issues
- Avoid unnecessary string searches
- Use specific offsets over searches
- Order rules by likelihood of match
## See Also
- magic(5) - Original magic format
- file(1) - GNU file command
- API Reference - libmagic-rs API documentation