Magic File Format#

Magic files define rules for identifying file types through byte-level patterns.

Overview#

Magic files contain rules that describe file formats by specifying byte patterns at specific offsets. Each rule consists of:

Offset - Where to look in the file
Type - How to interpret the bytes
Value - What to match against
Message - Description to display on match

Basic Format#

offset type value message

Example:

0 string PK ZIP archive data

This rule matches files starting with "PK" and labels them as "ZIP archive data".

Basic Syntax#

Rule Structure#

[level>]offset type [operator]value message

Component	Required	Description
`level>`	No	Indentation level for nested rules
`offset`	Yes	Where to read data
`type`	Yes	Data type to read
`operator`	No	Comparison operator (default: `=`)
`value`	Yes	Expected value
`message`	Yes	Description text
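
The field layout in the table above can be sketched as a small line splitter. This is a minimal illustration and not the libmagic-rs parser: `RuleLine`, `take_field`, and `parse_rule_line` are hypothetical names, the operator is left attached to the value, and escaped spaces inside a value (such as `\ `) are not handled.

```rust
/// One rule line split into the columns described above.
#[derive(Debug, PartialEq)]
struct RuleLine {
    level: usize,    // number of leading `>` markers
    offset: String,  // offset field, unparsed
    kind: String,    // type field, e.g. "string" or "leshort"
    test: String,    // "[operator]value", unparsed
    message: String, // rest of the line, may be empty
}

/// Take the next whitespace-delimited field; return it and the remainder.
fn take_field(s: &str) -> Option<(&str, &str)> {
    let s = s.trim_start();
    if s.is_empty() {
        return None;
    }
    let end = s.find(char::is_whitespace).unwrap_or(s.len());
    Some((&s[..end], &s[end..]))
}

/// Split one line. Returns `None` for blank lines, comments, and lines
/// with fewer than three fields. Leading whitespace is ignored here and
/// only `>` markers are counted (an assumption of this sketch).
fn parse_rule_line(line: &str) -> Option<RuleLine> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let level = line.chars().take_while(|&c| c == '>').count();
    let (offset, rest) = take_field(&line[level..])?;
    let (kind, rest) = take_field(rest)?;
    let (test, rest) = take_field(rest)?;
    Some(RuleLine {
        level,
        offset: offset.to_string(),
        kind: kind.to_string(),
        test: test.to_string(),
        message: rest.trim().to_string(),
    })
}
```

Calling `parse_rule_line(">>5 byte 1 LSB")` yields level 2, offset `5`, type `byte`, test `1`, and message `LSB`.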

Comments#

Lines starting with # are comments:

# This is a comment
0 string PK ZIP archive

Whitespace#

Fields are separated by whitespace (spaces or tabs)
Leading whitespace indicates rule nesting level
Trailing whitespace is ignored

Offset Specifications#

Absolute Offset#

Direct byte position from file start:

0 string \x7fELF ELF executable
16 short 2 (executable)

Hexadecimal Offset#

Use 0x prefix for hex offsets:

0x0 string MZ DOS executable
0x3c long >0 (PE offset present)

Negative Offset (From End)#

Read from end of file:

-4 string .ZIP ZIP file (end marker)
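
The decimal, hexadecimal, and negative offset forms above can be resolved against a buffer roughly as follows. This is a sketch under stated assumptions, not the library's code: `parse_offset` and `resolve_offset` are hypothetical helpers, and a negative offset is taken to count back from the end of the buffer.

```rust
use std::convert::TryFrom;

/// Parse an offset field: decimal or `0x` hexadecimal, optionally negative.
fn parse_offset(text: &str) -> Option<i64> {
    let (negative, digits) = match text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, text),
    };
    let magnitude = match digits.strip_prefix("0x") {
        Some(hex) => i64::from_str_radix(hex, 16).ok()?,
        None => digits.parse::<i64>().ok()?,
    };
    Some(if negative { -magnitude } else { magnitude })
}

/// Turn a signed offset into a position inside a buffer of `len` bytes.
/// Returns `None` when the position would fall outside the buffer.
fn resolve_offset(offset: i64, len: usize) -> Option<usize> {
    if offset >= 0 {
        let pos = usize::try_from(offset).ok()?;
        if pos <= len { Some(pos) } else { None }
    } else {
        let back = usize::try_from(offset.unsigned_abs()).ok()?;
        len.checked_sub(back)
    }
}
```

For a 100-byte file, `-4` resolves to position 96, while the same offset against a 2-byte file resolves to nothing.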

Indirect Offset#

Read pointer value and use as offset:

# Read 4-byte pointer at offset 60, then check that location
(0x3c.l) string PE\0\0 PE executable

Indirect offset syntax:

(base.type) - Read pointer at base, interpret as type
(base.type+adj) - Add adjustment to pointer value

Types for indirect offsets:

.b - byte (1 byte)
.s - short (2 bytes)
.l - long (4 bytes)
.q - quad (8 bytes)
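
Indirect resolution, as described above, reads a pointer and then uses it as the offset. The sketch below assumes the pointer is stored little-endian (the text above does not state the byte order) and uses hypothetical names `PtrWidth` and `resolve_indirect`.

```rust
use std::convert::TryFrom;

/// Width of the stored pointer, matching `.b`, `.s`, `.l`, and `.q`.
#[derive(Clone, Copy)]
enum PtrWidth {
    Byte,
    Short,
    Long,
    Quad,
}

/// Resolve `(base.type+adj)`: read the pointer at `base`, add `adj`,
/// and return the target if it lies within the buffer.
fn resolve_indirect(buf: &[u8], base: usize, width: PtrWidth, adj: i64) -> Option<usize> {
    let n = match width {
        PtrWidth::Byte => 1,
        PtrWidth::Short => 2,
        PtrWidth::Long => 4,
        PtrWidth::Quad => 8,
    };
    // Bounds-checked read of the pointer bytes.
    let bytes = buf.get(base..base.checked_add(n)?)?;
    // Zero-extend into a u64 (little-endian is an assumption of this sketch).
    let mut raw = [0u8; 8];
    raw[..n].copy_from_slice(bytes);
    let pointer = u64::from_le_bytes(raw);
    let target = i64::try_from(pointer).ok()?.checked_add(adj)?;
    let target = usize::try_from(target).ok()?;
    if target <= buf.len() { Some(target) } else { None }
}
```

With a 4-byte pointer value of 0x40 stored at offset 0x3c, `(0x3c.l)` resolves to 0x40 and `(0x3c.l+4)` to 0x44.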

Relative Offset#

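Offset relative to the end of the previous (parent) match:

0 string PK\x03\x04 ZIP archive
>&2 short >0 (with data)

The & prefix marks the offset as relative. It is used on nested rules, where a previous match exists to measure from.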

Type Specifications#

Integer Types#

Type	Size	Endianness
`byte`	1 byte	N/A
`short`	2 bytes	native
`leshort`	2 bytes	little-endian
`beshort`	2 bytes	big-endian
`long`	4 bytes	native
`lelong`	4 bytes	little-endian
`belong`	4 bytes	big-endian
`quad`	8 bytes	native
`lequad`	8 bytes	little-endian
`bequad`	8 bytes	big-endian

All integer types have unsigned variants prefixed with u:

ubyte, ushort, uleshort, ubeshort
ulong, ulelong, ubelong
uquad, ulequad, ubequad

Examples:

0 byte 0x7f (byte match)
0 leshort 0x5a4d DOS MZ signature
0 belong 0xcafebabe Java class file
0 lequad 0x1234567890abcdef (64-bit little-endian)
8 uquad >0x8000000000000000 (unsigned 64-bit check)
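
The endianness column matters because the same bytes decode to different numbers. The following sketch uses only standard-library conversions; the helper names `read_uleshort`, `read_ubelong`, and `byte_as_signed` are hypothetical.

```rust
/// Read an unsigned 16-bit little-endian value (`uleshort`) at `offset`.
fn read_uleshort(buf: &[u8], offset: usize) -> Option<u16> {
    let b = buf.get(offset..offset.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

/// Read an unsigned 32-bit big-endian value (`ubelong`) at `offset`.
fn read_ubelong(buf: &[u8], offset: usize) -> Option<u32> {
    let b = buf.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Reinterpret a raw byte as signed, as a signed `byte` read would.
fn byte_as_signed(raw: u8) -> i8 {
    raw as i8
}
```

The bytes `M` `Z` (0x4d 0x5a) read as a little-endian short give 0x5a4d, which is why the DOS rule above tests `leshort 0x5a4d`. A raw byte 0xff is 255 when read unsigned and -1 when read signed.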

Floating-Point Types#

Type	Size	Endianness	IEEE 754
`float`	4 bytes	native	32-bit
`befloat`	4 bytes	big-endian	32-bit
`lefloat`	4 bytes	little-endian	32-bit
`double`	8 bytes	native	64-bit
`bedouble`	8 bytes	big-endian	64-bit
`ledouble`	8 bytes	little-endian	64-bit

Floating-point types follow IEEE 754 standard. Unlike integer types, float types do not have signed or unsigned variants (the IEEE 754 format handles sign internally).

Examples:

0 lefloat =3.14159 File with float value pi
0 bedouble >1.0 Double value greater than 1.0

Float comparison behavior:

Equality: Uses epsilon-aware comparison (f64::EPSILON tolerance)
Ordering: Uses IEEE 754 semantics via partial_cmp
NaN: NaN != NaN, comparisons with NaN always return false
Infinity: Positive and negative infinity are properly ordered
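
The comparison behavior listed above can be sketched with standard-library calls. This assumes the tolerance is applied as an absolute difference of at most `f64::EPSILON`, which the text does not spell out; `float_eq` and `float_gt` are hypothetical helper names.

```rust
use std::cmp::Ordering;

/// Epsilon-aware equality. The `a == b` arm keeps equal infinities equal;
/// any comparison involving NaN is false.
fn float_eq(a: f64, b: f64) -> bool {
    a == b || (a - b).abs() <= f64::EPSILON
}

/// Ordering via `partial_cmp`, which returns `None` when either side is NaN.
fn float_gt(a: f64, b: f64) -> bool {
    matches!(a.partial_cmp(&b), Some(Ordering::Greater))
}
```

Under this sketch `0.1 + 0.2` compares equal to `0.3`, NaN is never equal to or greater than anything, and positive infinity is greater than every finite value.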

Date/Timestamp Types#

Type	Size	Endianness	UTC/Local	Description
`date`	4 bytes	native	UTC	32-bit Unix timestamp (signed seconds since epoch), formatted as UTC
`ldate`	4 bytes	native	Local	32-bit Unix timestamp, formatted as local time
`bedate`	4 bytes	big-endian	UTC	32-bit Unix timestamp, big-endian byte order, UTC
`beldate`	4 bytes	big-endian	Local	32-bit Unix timestamp, big-endian byte order, local time
`ledate`	4 bytes	little-endian	UTC	32-bit Unix timestamp, little-endian byte order, UTC
`leldate`	4 bytes	little-endian	Local	32-bit Unix timestamp, little-endian byte order, local time
`qdate`	8 bytes	native	UTC	64-bit Unix timestamp (signed seconds since epoch), formatted as UTC
`qldate`	8 bytes	native	Local	64-bit Unix timestamp, formatted as local time
`beqdate`	8 bytes	big-endian	UTC	64-bit Unix timestamp, big-endian byte order, UTC
`beqldate`	8 bytes	big-endian	Local	64-bit Unix timestamp, big-endian byte order, local time
`leqdate`	8 bytes	little-endian	UTC	64-bit Unix timestamp, little-endian byte order, UTC
`leqldate`	8 bytes	little-endian	Local	64-bit Unix timestamp, little-endian byte order, local time

Timestamp values are formatted as strings matching GNU file output format: "Www Mmm DD HH:MM:SS YYYY"

Examples:

# Match a creation timestamp equal to the Unix epoch
0 date =0 File created at epoch

# Check timestamp in file header (big-endian)
8 bedate >946684800 File created after 2000-01-01

# 64-bit timestamp (little-endian, local time)
16 leqldate x \b, timestamp %s

String Types#

Match literal string data:

0 string %PDF PDF document
0 string GIF89a GIF image data

String escape sequences:

\x00 - hex byte
\n - newline
\t - tab
\\ - backslash

Pascal String Type#

Pascal string (pstring) is a length-prefixed string type. The length prefix can be 1, 2, or 4 bytes depending on the suffix flag. Unlike C strings, Pascal strings are not null-terminated.

Length Prefix Width#

The default pstring type uses a 1-byte length prefix (0-255 range). Use suffix flags to specify different prefix widths:

Suffix	Width	Endianness	Range
`/B`	1 byte	N/A	0-255 (default)
`/H`	2 bytes	big-endian	0-65535
`/h`	2 bytes	little-endian	0-65535
`/L`	4 bytes	big-endian	0-4294967295
`/l`	4 bytes	little-endian	0-4294967295

Self-Inclusive Length (`/J` Flag)#

The /J flag indicates that the stored length value includes the size of the length prefix itself (JPEG-style). This flag can be combined with any width variant.
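
How the prefix width and the /J flag combine can be sketched as a bounds-checked reader. The names `PrefixWidth` and `read_pstring` are hypothetical and this is not the library's implementation; it only mirrors the width table and the self-inclusive rule described above.

```rust
use std::convert::TryFrom;

/// Length-prefix variants from the suffix table.
#[derive(Clone, Copy)]
enum PrefixWidth {
    OneByte,     // /B (default)
    TwoBytesBe,  // /H
    TwoBytesLe,  // /h
    FourBytesBe, // /L
    FourBytesLe, // /l
}

/// Return the string body at `offset`, or `None` if the prefix or body
/// runs past the end of the buffer.
fn read_pstring(buf: &[u8], offset: usize, width: PrefixWidth, self_inclusive: bool) -> Option<&[u8]> {
    let prefix_len = match width {
        PrefixWidth::OneByte => 1,
        PrefixWidth::TwoBytesBe | PrefixWidth::TwoBytesLe => 2,
        PrefixWidth::FourBytesBe | PrefixWidth::FourBytesLe => 4,
    };
    let body_start = offset.checked_add(prefix_len)?;
    let p = buf.get(offset..body_start)?;
    let stored = match width {
        PrefixWidth::OneByte => usize::from(p[0]),
        PrefixWidth::TwoBytesBe => usize::from(u16::from_be_bytes([p[0], p[1]])),
        PrefixWidth::TwoBytesLe => usize::from(u16::from_le_bytes([p[0], p[1]])),
        PrefixWidth::FourBytesBe => {
            usize::try_from(u32::from_be_bytes([p[0], p[1], p[2], p[3]])).ok()?
        }
        PrefixWidth::FourBytesLe => {
            usize::try_from(u32::from_le_bytes([p[0], p[1], p[2], p[3]])).ok()?
        }
    };
    // With /J the stored length also counts the prefix bytes.
    let body_len = if self_inclusive {
        stored.checked_sub(prefix_len)?
    } else {
        stored
    };
    buf.get(body_start..body_start.checked_add(body_len)?)
}
```

For the bytes `00 06 4A 50 45 47`, a /HJ read yields the four-byte body `JPEG`: the stored length 6 minus the 2-byte prefix.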

Examples#

Basic pstring with default 1-byte prefix:

0 pstring =JPEG JPEG image (Pascal string)

2-byte big-endian length prefix:

0 pstring/H =JPEG JPEG image (2-byte BE prefix)

4-byte little-endian length prefix:

0 pstring/l x \b, name: %s

Self-inclusive length with 2-byte big-endian prefix:

0 pstring/HJ x \b, JPEG-style length

Self-inclusive length with default 1-byte prefix:

0 pstring/J x \b, self-inclusive length

The optional max_length parameter caps the length value:

0 pstring x \b, name: %s

String Flags#

String flags are now implemented (issue #234, landed in PR #288), providing libmagic-compatible string comparison semantics.

Flag	Description
`/c`	Case-insensitive (lowercase pattern chars trigger fold)
`/C`	Case-insensitive (uppercase pattern chars trigger fold)
`/w`	Whitespace-optional (pattern whitespace matches zero or more)
`/W`	Whitespace-required-compact (at least one, greedy consume)
`/T`	Trim leading/trailing ASCII whitespace from pattern
`/f`	Full-word match (post-match word boundary check)
`/b`	Force binary test (hint for MIME output)
`/t`	Force text test (hint for MIME output)

Note: /c and /C are asymmetric — the pattern character controls fold direction. With /c, only lowercase pattern chars cause the file byte to be folded to lowercase. With /C, only uppercase pattern chars cause the file byte to be folded to uppercase. See GOTCHAS section S6.5 for details on mixed-case behavior. /B (uppercase) is not a string flag; it is reserved for pstring length-width specification and is rejected on string types.
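
The asymmetry described in the note can be made concrete with a per-byte comparison. This is an illustrative sketch of the stated rule, with hypothetical function names `byte_matches` and `matches_at_start`, and it restricts folding to ASCII letters.

```rust
/// Compare one pattern byte with one file byte under `/c` (fold_lower)
/// and `/C` (fold_upper). Only the pattern byte's case triggers folding.
fn byte_matches(pattern: u8, file: u8, fold_lower: bool, fold_upper: bool) -> bool {
    if fold_lower && pattern.is_ascii_lowercase() {
        file.to_ascii_lowercase() == pattern
    } else if fold_upper && pattern.is_ascii_uppercase() {
        file.to_ascii_uppercase() == pattern
    } else {
        file == pattern
    }
}

/// Check whether `file` begins with `pattern` under the given flags.
fn matches_at_start(pattern: &[u8], file: &[u8], fold_lower: bool, fold_upper: bool) -> bool {
    file.len() >= pattern.len()
        && pattern
            .iter()
            .zip(file)
            .all(|(&p, &f)| byte_matches(p, f, fold_lower, fold_upper))
}
```

With /c, the lowercase pattern `<!doctype` matches a file starting with `<!DOCTYPE`, but the uppercase pattern `<!DOCTYPE` does not match `<!doctype`, because uppercase pattern bytes are compared exactly unless /C is set.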

Examples:

# Case-insensitive match
0 string/c <!doctype HTML document

# Whitespace-optional (matches "ab", "a b", "a  b")
0 string/w a\ b Pattern with flexible whitespace

# Combined flags
0 string/cw <!doctype\ html> HTML document (case and space insensitive)

# Full-word boundary check
0 string/f int C int keyword (not "integer")

# Trim leading/trailing whitespace from the pattern (`/T` = STRING_TRIM)
0 string/T " hello " Hello marker (matches "hello" without surrounding spaces)

# Binary-mode hint (`/b` = STRING_BINTEST) -- parsed and stored; MIME-output
# wiring deferred to the `!:mime` evaluation work
24 string/b FTCOMP FTCOMP compressed archive

# Text-mode hint (`/t` = STRING_TEXTTEST) -- parsed and stored; MIME-output
# wiring deferred to the `!:mime` evaluation work
0 string/t #!/bin/sh POSIX shell script text

Note on /T empty patterns: string/T " " trims to an empty pattern. The evaluator treats this as no-match (with a warn! log) rather than letting it silently match every file. Fix the rule.

Search Flags#

Search flags are specified as /flags after the range in search types: search/N/<flags>. libmagic-rs implements the full search-type flag semantics (issue #235).

Search flags share most semantics with string flags. Eight flags (/c, /C, /w, /W, /T, /f, /t, /b) carry the same comparison-altering or metadata-hint meanings as their string-type counterparts. The ninth flag, /s, is search-specific: it controls where the previous-match anchor lands for relative-offset children.

Flag	Description
`/s`	Start anchor: sets the previous-match anchor to match-START instead of match-END for relative-offset children
`/c`	Case-insensitive (lowercase): pattern lowercase letters match both cases in buffer
`/C`	Case-insensitive (uppercase): pattern uppercase letters match both cases in buffer
`/w`	Optional whitespace: pattern whitespace matches zero-or-more buffer whitespace
`/W`	Compact whitespace: pattern whitespace requires ≥1 buffer whitespace
`/T`	Trim whitespace: leading/trailing whitespace in pattern is ignored
`/f`	Full word: post-match word boundary check (same semantics as string type)
`/t`	Text test hint: MIME output hint (parsed, no comparison effect)
`/b`	Binary test hint: MIME output hint (parsed, no comparison effect)

Performance note: Flags /c, /C, /w, /W, /T, /f force byte-by-byte comparison, while /s, /t, /b preserve the fast SIMD-accelerated search path (via memchr::memmem::find).

/s anchor semantics: By default, a search match advances the previous-match anchor to the byte just past the matched pattern (match-END). With /s, the anchor lands on the first byte of the match (match-START). This is required for file formats that place magic signatures in trailers or use relative-offset children that reference the signature start (TGA footer, sfnt name table).
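
The difference between the two anchor positions can be shown with a naive search. This sketch does not use the SIMD path mentioned above, treats the range N as the number of candidate start positions (an assumption), and uses a hypothetical function name `search_anchor`.

```rust
/// Search for `pattern` starting at positions `start..start + range`.
/// Returns the anchor that relative-offset children would resolve against:
/// the match start when `start_anchor` is set (`/s`), otherwise the byte
/// just past the match.
fn search_anchor(
    buf: &[u8],
    pattern: &[u8],
    start: usize,
    range: usize,
    start_anchor: bool,
) -> Option<usize> {
    // An empty pattern is treated as no-match.
    if pattern.is_empty() {
        return None;
    }
    // Last position at which the whole pattern still fits in the buffer.
    let last_start = buf.len().checked_sub(pattern.len())?;
    let end = start.saturating_add(range).min(last_start.saturating_add(1));
    let pos = (start..end).find(|&i| &buf[i..i + pattern.len()] == pattern)?;
    Some(if start_anchor { pos } else { pos + pattern.len() })
}
```

For the buffer `xxxxABCyy` and pattern `ABC`, the default anchor is 7 (match end) and the /s anchor is 4 (match start), so a child at `&-2` would land on offset 5 or offset 2 respectively.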

Examples:

# TGA footer with start-anchor (images:114)
# The magic string "TRUEVISION-XFILE.\0" is in the trailer; /s lets
# relative-offset children resolve against the signature's start position
0 search/4261301/s TRUEVISION-XFILE.\0 TGA image data
>&-8 lelong x \b, offset %d

# Python shebang with optional whitespace (commands:20)
# Pattern has one space; /w allows zero or more whitespace in the file
0 search/1/w #!\040/usr/bin/python Python script text executable

# BinHex with binary hint (macintosh:17)
# /b is parsed and stored; comparison-time MIME effect deferred to !:mime
0 search/2652/b (This\ file\ must\ be\ converted\ with\ BinHex BinHex binary text

Note on /T empty patterns: a rule such as search/256/T " " (or any search/N/T with a whitespace-only pattern) trims to an empty pattern. The evaluator treats that as no-match (with a warn! log) rather than letting it silently match every offset, so the rule itself needs fixing. Because the /N range is mandatory, bare search/T never reaches the evaluator: it is a parse error before the trim runs.

Operators#

Comparison Operators#

Operator	Description	Example
`=`	Equal (default)	`0 long =0xcafebabe`
`!=`	Not equal	`4 byte !=0`
`>`	Greater than	`8 long >1000`
`<`	Less than	`8 long <100`
`>=`	Greater than or equal	`8 long >=1000`
`<=`	Less than or equal	`8 long <=100`
`&`	Bitwise AND	`4 byte &0x80`
`^`	Bitwise XOR (not yet implemented)	`4 byte ^0xff`

Bitwise AND with Mask#

Test specific bits:

# Check if bit 7 is set
4 byte &0x80 (compressed)

# Check if lower nibble is 0x0f
4 byte &0x0f=0x0f (all bits set)
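
The comparison operators and the mask tests above reduce to a small dispatch. The sketch below works on unsigned values only and assumes `&` means "every bit of the rule value is set in the file value", which is consistent with both mask examples; `Operator` and `apply` are hypothetical names.

```rust
/// Operators from the comparison table (XOR omitted, as it is not implemented).
#[derive(Clone, Copy)]
enum Operator {
    Equal,
    NotEqual,
    Greater,
    Less,
    GreaterEqual,
    LessEqual,
    BitAnd,
}

/// Apply `op` with the value read from the file on the left.
fn apply(op: Operator, file_value: u64, rule_value: u64) -> bool {
    match op {
        Operator::Equal => file_value == rule_value,
        Operator::NotEqual => file_value != rule_value,
        Operator::Greater => file_value > rule_value,
        Operator::Less => file_value < rule_value,
        Operator::GreaterEqual => file_value >= rule_value,
        Operator::LessEqual => file_value <= rule_value,
        // Assumption: all bits of the mask must be present.
        Operator::BitAnd => (file_value & rule_value) == rule_value,
    }
}
```

A byte 0x85 passes `&0x80` because bit 7 is set, while 0x17 fails a lower-nibble test against 0x0f because bit 3 is clear.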

Negation#

Prefix operator with ! for negation:

# Match if NOT equal to zero
4 long !0 (non-zero)

Values#

Numeric Values#

# Decimal
0 long 1234

# Hexadecimal
0 long 0x4d5a

# Octal
0 byte 0177

String Values#

# Plain string
0 string RIFF

# With escape sequences
0 string PK\x03\x04

# UTF-16LE byte-order mark (as raw bytes)
0 string \xff\xfe

Special Values#

Value	Description
`x`	Match any value (always true)

Example:

0 string PK ZIP archive
>4 short x version %d

The x value matches anything and %d formats the matched value.

Nested Rules#

Rules can be nested to create hierarchical matches. Deeper matches indicate more specific identification.

Indentation Levels#

Use > prefix for nested rules:

0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
>5 byte 1 LSB
>5 byte 2 MSB

Evaluation:

Check offset 0 for ELF magic
If matched, check offset 4 for bit size
If matched, check offset 5 for endianness
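
The evaluation steps above amount to a depth-first walk in which children are tested only after their parent matched. The sketch below hard-codes byte-sequence tests and uses hypothetical names (`Rule`, `leaf`, `elf_rule`, `evaluate`); it is not the library's evaluator.

```rust
/// A rule that compares literal bytes at a fixed offset.
struct Rule {
    offset: usize,
    expected: &'static [u8],
    message: &'static str,
    children: Vec<Rule>,
}

/// Build a rule without children.
fn leaf(offset: usize, expected: &'static [u8], message: &'static str) -> Rule {
    Rule { offset, expected, message, children: Vec::new() }
}

/// The ELF example above: one parent with four first-level children.
fn elf_rule() -> Rule {
    Rule {
        offset: 0,
        expected: b"\x7fELF",
        message: "ELF",
        children: vec![
            leaf(4, b"\x01", "32-bit"),
            leaf(4, b"\x02", "64-bit"),
            leaf(5, b"\x01", "LSB"),
            leaf(5, b"\x02", "MSB"),
        ],
    }
}

/// Collect messages of matching rules; skip a rule's children on a miss.
fn evaluate(rule: &Rule, buf: &[u8], out: &mut Vec<&'static str>) {
    let end = match rule.offset.checked_add(rule.expected.len()) {
        Some(end) => end,
        None => return,
    };
    if buf.get(rule.offset..end) != Some(rule.expected) {
        return;
    }
    out.push(rule.message);
    for child in &rule.children {
        evaluate(child, buf, out);
    }
}
```

Against a header beginning `7f 45 4c 46 02 01`, the walk collects `ELF`, `64-bit`, and `LSB`. Against a non-ELF buffer it collects nothing, because the children are never visited.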

Multiple Nesting Levels#

0 string \x7fELF ELF
>4 byte 2 64-bit
>>5 byte 1 LSB
>>>16 short 2 (executable)
>>>16 short 3 (shared object)

Continuation Messages#

Use \b (backspace) to suppress space before message:

0 string GIF8 GIF image data
>4 string 7a \b, version 87a
>4 string 9a \b, version 89a

Output: GIF image data, version 89a
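
The effect of \b on the assembled description can be sketched as a join that inserts a space unless the next message starts with the two characters `\b`. The function name `join_messages` is hypothetical.

```rust
/// Join matched messages with single spaces, except that a message
/// beginning with a literal backslash-b is appended without the space.
fn join_messages(parts: &[&str]) -> String {
    let mut out = String::new();
    for part in parts {
        if let Some(rest) = part.strip_prefix("\\b") {
            out.push_str(rest);
        } else {
            if !out.is_empty() {
                out.push(' ');
            }
            out.push_str(part);
        }
    }
    out
}
```

Joining `GIF image data` with `\b, version 89a` gives the output shown above, while messages without the prefix, such as `ELF`, `64-bit`, `LSB`, are separated by spaces.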

Examples#

ELF Executable#

# ELF (Executable and Linkable Format)
0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
>5 byte 1 LSB
>5 byte 2 MSB
>16 leshort 2 (executable)
>16 leshort 3 (shared object)

ZIP Archive#

# ZIP archive
0 string PK\x03\x04 ZIP archive data
>4 leshort x \b, version %d to extract
>6 leshort &0x0001 \b, encrypted
>6 leshort &0x0008 \b, with data descriptor

JPEG Image#

# JPEG
0 string \xff\xd8\xff JPEG image data
>3 byte 0xe0 \b, JFIF standard
>3 byte 0xe1 \b, Exif format

PDF Document#

# PDF
0 string %PDF- PDF document
>5 string 1. \b, version 1.x
>5 string 2. \b, version 2.x

PE Executable#

# DOS MZ executable with PE header
0 string MZ DOS executable
>0x3c lelong >0 (PE offset)
>(0x3c.l) string PE\0\0 PE executable

GZIP Compressed#

# GZIP
0 string \x1f\x8b gzip compressed data
>2 byte 8 \b, deflated
>3 byte &0x01 \b, ASCII text
>3 byte &0x02 \b, with header CRC
>3 byte &0x04 \b, with extra field
>3 byte &0x08 \b, with original name
>3 byte &0x10 \b, with comment

PNG Image#

# PNG
0 string \x89PNG\r\n\x1a\n PNG image data
>16 belong x \b, %d x
>20 belong x %d
>25 byte 0 \b, grayscale
>25 byte 2 \b, RGB
>25 byte 3 \b, palette
>25 byte 4 \b, grayscale+alpha
>25 byte 6 \b, RGBA

Floating-Point Values#

# Check for specific float value
0 lefloat =3.14159 File with float value pi

# Float comparison
0 float >1.0 Float value greater than 1.0

# Double precision
0 bedouble =0.45455 Data with double value 0.45455

Meta-types / Control Directives#

Meta-types are pseudo-types that do not read bytes from the buffer. Instead, they control the evaluation flow: defining named subroutines, invoking them, providing fallbacks when no sibling matched, resetting per-level match state, or re-applying the entire rule database at a resolved offset.

Keyword	Syntax	Description
`name <id>`	`0 name part2`	Defines a named subroutine block; children are the subroutine body
`use <id>`	`>0 use part2`	Invokes a named subroutine at the resolved offset
`default`	`0 default x Fallback`	Fires only when no sibling at the same level has matched
`clear`	`0 clear`	Resets the per-level sibling-matched flag
`indirect`	`8 indirect x`	Re-applies the full rule database at the resolved offset
`offset`	`0 offset x at_offset %lld`	Emits the resolved file position as a `Value::Uint` for printf-style substitution

`name` and `use` — Named Subroutines#

name <id> defines a named subroutine block at the top level; its children are the subroutine body. use <id> invokes that subroutine at a given offset.

# Define a reusable subroutine
0 name part2
>0 search/64 ABC found_ABC
>>&0 byte x followed_by 0x%x

# Top-level rule that invokes the subroutine
0 string TEST Testfmt
>0 use part2
>64 use part2

Top-level name blocks are hoisted out of the flat rule list at parse time into a NameTable keyed by identifier. Duplicate names retain the first definition and emit a warning. name rules nested inside another rule's children are not well-defined in magic(5) and are scrubbed at load time.
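
The first-definition-wins behavior for duplicate names can be sketched with a hash map. This is only an illustration: `hoist_names` is a hypothetical function, subroutine bodies are stood in for by plain strings, and the actual NameTable type and its warning mechanism are not shown in this document.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Build a name-to-body table from `(name, body)` pairs in file order.
/// A repeated name keeps its first body and produces a warning string.
fn hoist_names<'a>(blocks: &[(&'a str, &'a str)]) -> (HashMap<&'a str, &'a str>, Vec<String>) {
    let mut table = HashMap::new();
    let mut warnings = Vec::new();
    for &(name, body) in blocks {
        match table.entry(name) {
            Entry::Vacant(slot) => {
                slot.insert(body);
            }
            Entry::Occupied(_) => {
                warnings.push(format!("duplicate name `{}` ignored", name));
            }
        }
    }
    (table, warnings)
}
```

Given two blocks both named `part2`, the table keeps the first body and one warning is recorded.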

`default` — Fallback Rule#

A default rule at a given level fires only when none of its siblings at the same level have matched. The value is conventionally x (any value); whatever appears in the value column is ignored.

0 byte 0xAA Real-Match
0 default x DEFAULT-FALLBACK

Against a buffer starting with 0xAA, only Real-Match fires. Against a buffer starting with any other byte, DEFAULT-FALLBACK fires.

`clear` — Reset Sibling-Matched Flag#

A clear directive resets the per-level "sibling matched" flag, so a subsequent default at the same level can fire again even after an earlier sibling matched. Pair with EvaluationConfig::with_stop_at_first_match(false) to walk all top-level siblings.

0 byte 0xAA Match-A
0 default x DEFAULT-SKIPPED
0 clear
0 default x DEFAULT-FIRES

Against a buffer starting with 0xAA: Match-A fires, DEFAULT-SKIPPED is suppressed (a sibling matched), clear resets the flag, and DEFAULT-FIRES fires.
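
The interaction between default, clear, and the sibling-matched flag can be traced with a small state machine. The sketch assumes every sibling is visited (as with stop-at-first-match disabled) and that a default which fires also sets the flag; `Sibling` and `run_level` are hypothetical names.

```rust
/// One entry at a single nesting level.
enum Sibling {
    /// An ordinary rule, with its match outcome precomputed.
    Test { matched: bool, message: &'static str },
    /// A `default` rule.
    DefaultRule { message: &'static str },
    /// A `clear` directive.
    Clear,
}

/// Walk the siblings in order and return the messages that fire.
fn run_level(siblings: &[Sibling]) -> Vec<&'static str> {
    let mut sibling_matched = false;
    let mut fired = Vec::new();
    for sibling in siblings {
        match sibling {
            Sibling::Test { matched, message } => {
                if *matched {
                    sibling_matched = true;
                    fired.push(*message);
                }
            }
            Sibling::DefaultRule { message } => {
                if !sibling_matched {
                    // Assumption: a fired default counts as a match.
                    sibling_matched = true;
                    fired.push(*message);
                }
            }
            Sibling::Clear => sibling_matched = false,
        }
    }
    fired
}
```

Feeding it the four-rule example above with the first test matching yields `Match-A` followed by `DEFAULT-FIRES`.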

`indirect` — Re-apply Root Rules at a Resolved Offset#

An indirect rule resolves its offset, slices the buffer at that point, and re-applies the full rule database against the sub-buffer. Recursion is bounded by EvaluationConfig::max_recursion_depth.

0 byte 0x42 Inner-Match
8 indirect x

Against a 16-byte buffer with buf[8] = 0x42: the top-level byte rule at offset 0 does not match, and the indirect rule re-applies the root rules at offset 8 — where buf[8] = 0x42 matches the inner byte rule, producing Inner-Match.
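
The walk-through above can be reproduced with a recursive function that hard-codes the two example rules. This is a sketch only: `apply_rules` is a hypothetical name, and `max_depth` stands in for the configured recursion limit.

```rust
/// Apply the two example root rules to `buf`, collecting messages.
fn apply_rules(buf: &[u8], depth: usize, max_depth: usize, out: &mut Vec<&'static str>) {
    // Root rule 1: `0 byte 0x42 Inner-Match`
    if buf.first() == Some(&0x42) {
        out.push("Inner-Match");
    }
    // Root rule 2: `8 indirect x` re-applies the root rules to buf[8..],
    // as long as the recursion limit has not been reached.
    if depth < max_depth {
        if let Some(sub) = buf.get(8..) {
            if !sub.is_empty() {
                apply_rules(sub, depth + 1, max_depth, out);
            }
        }
    }
}
```

For a 16-byte buffer with 0x42 at index 8, the outer pass matches nothing and the inner pass, which sees the sub-buffer starting at offset 8, reports `Inner-Match`. With a limit of zero, the indirect rule is never followed.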

Best Practices#

1. Order Rules by Specificity#

Put more specific rules first:

# Good: Specific before general
0 string PK\x03\x04 ZIP archive
0 string PK (generic PK signature)

# Bad: General catches all, so the ZIP rule is never reached
0 string PK (generic PK signature)
0 string PK\x03\x04 ZIP archive

2. Use Nested Rules for Details#

# Good: Hierarchical structure
0 string \x7fELF ELF
>4 byte 2 64-bit
>>5 byte 1 LSB

# Bad: Flat rules
0 string \x7fELF ELF
4 byte 2 64-bit
5 byte 1 LSB

3. Document Complex Rules#

# JPEG with Exif metadata
# The Exif APP1 marker (0xFFE1) contains camera metadata
0 string \xff\xd8\xff JPEG image data
>3 byte 0xe1 \b, Exif format

4. Test Edge Cases#

Consider:

Empty files
Truncated files
Minimum valid file size
Maximum offset values

5. Use Appropriate Types#

# Good: Match exact size needed
0 leshort 0x5a4d DOS executable

# Bad: Over-reading
0 lelong x (reads 4 bytes when 2 needed)

6. Handle Endianness Explicitly#

# Good: Explicit endianness
0 lelong 0xcafebabe (little-endian)
0 belong 0xcafebabe (big-endian)

# Risky: Native endianness
0 long 0xcafebabe (platform-dependent)

Supported Features#

Currently Supported#

Absolute offsets
Relative offsets
Indirect offsets (basic)
Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
Float and double types (32-bit and 64-bit IEEE 754 floating-point)
Date and qdate types (32-bit and 64-bit Unix timestamps)
String and pstring types (null-terminated and length-prefixed strings)
Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
Bitwise AND operator
Nested rules
Comments

Not Yet Supported#

Regex patterns
128-bit integer types

Recently Added#

Strength modifiers: The !:strength directive for adjusting rule priority
64-bit integers: quad type family (quad, uquad, lequad, ulequad, bequad, ubequad)
Floating-point types: float and double type families (float, befloat, lefloat, double, bedouble, ledouble) with IEEE 754 semantics and epsilon-aware equality

Troubleshooting#

Rule Not Matching#

Check offset is correct (0-indexed)
Verify endianness matches file format
Test with hexdump -C file | head
Ensure no conflicting rules

Unexpected Results#

Check rule order (first match wins)
Verify nested rule levels
Test with simpler rules first

Performance Issues#

Avoid unnecessary string searches
Use specific offsets over searches
Order rules by likelihood of match
