Type: External
Status: Published
Created: Mar 1, 2026
Updated: Mar 30, 2026
Updated by: Dosu Bot

Magic File Format#

Magic files define rules for identifying file types through byte-level patterns. This chapter documents the magic file format supported by libmagic-rs.

Overview#

Magic files contain rules that describe file formats by specifying byte patterns at specific offsets. Each rule consists of:

  1. Offset - Where to look in the file
  2. Type - How to interpret the bytes
  3. Value - What to match against
  4. Message - Description to display on match

Basic Format#

offset type value message

Example:

0 string PK ZIP archive data

This rule matches files starting with "PK" and labels them as "ZIP archive data".

Basic Syntax#

Rule Structure#

[level>]offset type [operator]value message
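| Component | Required | Description |
|-----------|----------|-------------|
| level> | No | Indentation level for nested rules |
| offset | Yes | Where to read data |
| type | Yes | Data type to read |
| operator | No | Comparison operator (default: =) |
| value | Yes | Expected value |
| message | Yes | Description text |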
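
As a rough illustration of the rule structure, the following Python sketch splits one rule line into its components. The `Rule` tuple and `parse_rule` helper are hypothetical names invented for this example and are not part of libmagic-rs. The sketch assumes fields are separated by plain whitespace, treats a missing message as empty, and does not handle escaped spaces or comment lines.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    level: int      # number of leading ">" characters
    offset: str
    type: str
    operator: str
    value: str
    message: str

# Longest operators first so ">=" is not read as ">" followed by "=".
OPERATORS = ("!=", ">=", "<=", "=", ">", "<", "&", "^", "!")

def parse_rule(line: str) -> Rule:
    """Split a rule line into level, offset, type, operator, value, message."""
    text = line.strip()
    level = len(text) - len(text.lstrip(">"))
    fields = text[level:].split(None, 3)
    if len(fields) < 3:
        raise ValueError(f"expected offset, type and value: {line!r}")
    offset, type_name, value = fields[:3]
    message = fields[3] if len(fields) == 4 else ""
    operator = "="  # default when no operator prefix is present
    for candidate in OPERATORS:
        if value.startswith(candidate) and len(value) > len(candidate):
            operator, value = candidate, value[len(candidate):]
            break
    return Rule(level, offset, type_name, operator, value, message)
```

For example, `parse_rule("0 string PK ZIP archive data")` yields level 0, offset "0", type "string", operator "=", value "PK", and the message "ZIP archive data".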

Comments#

Lines starting with # are comments:

# This is a comment
0 string PK ZIP archive

Whitespace#

  • Fields are separated by whitespace (spaces or tabs)
  • Nesting level is set by the leading > characters described under Nested Rules
  • Trailing whitespace is ignored

Offset Specifications#

Absolute Offset#

Direct byte position from file start:

0 string \x7fELF ELF executable
16 short 3 (shared object)

Hexadecimal Offset#

Use 0x prefix for hex offsets:

0x0 string MZ DOS executable
0x3c long >0 (PE offset present)

Negative Offset (From End)#

A negative offset is measured from the end of the file:

-22 string PK\x05\x06 ZIP archive (end of central directory record, no comment)
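
The three offset forms above (decimal, hexadecimal, negative) can be reduced to one byte index. The sketch below is a minimal Python illustration; `resolve_offset` is a hypothetical helper, and the bounds check is an assumption about reasonable behavior rather than a description of libmagic-rs internals.

```python
def resolve_offset(spec: str, file_size: int) -> int:
    """Turn an absolute, hexadecimal, or negative offset into a byte index."""
    value = int(spec, 0)  # base 0 accepts "16", "0x3c" and "-4"
    position = file_size + value if value < 0 else value
    if not 0 <= position < file_size:
        raise ValueError(f"offset {spec} is outside a {file_size}-byte file")
    return position
```

With a 100-byte file, "0x3c" resolves to 60 and "-4" resolves to 96.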

Indirect Offset#

Read pointer value and use as offset:

# Read 4-byte pointer at offset 60, then check that location
(0x3c.l) string PE\0\0 PE executable

Indirect offset syntax:

  • (base.type) - Read pointer at base, interpret as type
  • (base.type+adj) - Add adjustment to pointer value

Types for indirect offsets:

  • .b - byte (1 byte)
  • .s - short (2 bytes)
  • .l - long (4 bytes)
  • .q - quad (8 bytes)
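
To make the pointer lookup concrete, here is a small Python sketch of indirect offset resolution. `resolve_indirect` and `POINTER_FORMATS` are hypothetical names, and the sketch assumes the pointer is stored little-endian and unsigned, which the list above does not state.

```python
import struct

# Pointer widths from the list above; little-endian unsigned is an assumption.
POINTER_FORMATS = {"b": "<B", "s": "<H", "l": "<I", "q": "<Q"}

def resolve_indirect(data: bytes, base: int, kind: str, adjust: int = 0) -> int:
    """Read a pointer of the given kind at `base` and return pointer + adjust."""
    (pointer,) = struct.unpack_from(POINTER_FORMATS[kind], data, base)
    return pointer + adjust

# Build a fake DOS/PE header: the pointer at 0x3c leads to "PE\0\0" at 0x80.
image = bytearray(0x100)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x80)
image[0x80:0x84] = b"PE\x00\x00"
target = resolve_indirect(bytes(image), 0x3C, "l")
```

Here `target` is 0x80, and the four bytes at that position are the PE signature that the (0x3c.l) rule tests.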

Relative Offset#

Offset relative to previous match:

0 string PK\x03\x04 ZIP archive
&2 short >0 (with data)

The & prefix indicates relative offset.

Type Specifications#

Integer Types#

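| Type | Size | Endianness |
|------|------|------------|
| byte | 1 byte | N/A |
| short | 2 bytes | native |
| leshort | 2 bytes | little-endian |
| beshort | 2 bytes | big-endian |
| long | 4 bytes | native |
| lelong | 4 bytes | little-endian |
| belong | 4 bytes | big-endian |
| quad | 8 bytes | native |
| lequad | 8 bytes | little-endian |
| bequad | 8 bytes | big-endian |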

All integer types have unsigned variants prefixed with u:

  • ubyte, ushort, uleshort, ubeshort
  • ulong, ulelong, ubelong
  • uquad, ulequad, ubequad

Examples:

0 byte 0x7f (byte match)
0 leshort 0x5a4d DOS MZ signature
0 belong 0xcafebabe Java class file
0 lequad 0x1234567890abcdef (64-bit little-endian)
8 uquad >0x8000000000000000 (unsigned 64-bit check)
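
The type names map naturally onto fixed-width reads. The following Python sketch uses the standard struct module to show the idea; `read_integer` and `INTEGER_FORMATS` are hypothetical, and treating the unprefixed types as signed is an assumption inferred from the existence of the u-prefixed variants.

```python
import struct

# struct codes for the base types; "=" means native byte order.
INTEGER_FORMATS = {
    "byte": "=b",
    "short": "=h", "leshort": "<h", "beshort": ">h",
    "long": "=i", "lelong": "<i", "belong": ">i",
    "quad": "=q", "lequad": "<q", "bequad": ">q",
}

def read_integer(data: bytes, offset: int, type_name: str) -> int:
    """Decode one integer of the named magic type at the given offset."""
    unsigned = type_name.startswith("u")
    fmt = INTEGER_FORMATS[type_name[1:] if unsigned else type_name]
    if unsigned:
        fmt = fmt.upper()  # uppercase struct codes are the unsigned forms
    return struct.unpack_from(fmt, data, offset)[0]
```

Reading the bytes "MZ" as leshort gives 0x5a4d, which is why the DOS signature rule above uses that value.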

Floating-Point Types#

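| Type | Size | Endianness | IEEE 754 |
|------|------|------------|----------|
| float | 4 bytes | native | 32-bit |
| befloat | 4 bytes | big-endian | 32-bit |
| lefloat | 4 bytes | little-endian | 32-bit |
| double | 8 bytes | native | 64-bit |
| bedouble | 8 bytes | big-endian | 64-bit |
| ledouble | 8 bytes | little-endian | 64-bit |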

Floating-point types follow IEEE 754 standard. Unlike integer types, float types do not have signed or unsigned variants (the IEEE 754 format handles sign internally).

Examples:

0 lefloat =3.14159 File with float value pi
0 bedouble >1.0 Double value greater than 1.0

Float comparison behavior:

  • Equality: Uses epsilon-aware comparison (f64::EPSILON tolerance)
  • Ordering: Uses IEEE 754 semantics via partial_cmp
  • NaN: NaN != NaN, comparisons with NaN always return false
  • Infinity: Positive and negative infinity are properly ordered
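
The comparison rules listed above can be modeled in a few lines of Python. This is a sketch of the described behavior, not the libmagic-rs implementation: `float_equal` is a hypothetical helper, and it assumes the tolerance is applied as an absolute difference. Python's sys.float_info.epsilon has the same value as Rust's f64::EPSILON.

```python
import math
import sys

EPSILON = sys.float_info.epsilon  # 2**-52, the same value as f64::EPSILON

def float_equal(a: float, b: float) -> bool:
    """Epsilon-aware equality where NaN never equals anything."""
    if math.isnan(a) or math.isnan(b):
        return False
    if a == b:  # exact matches, including equal infinities
        return True
    return abs(a - b) <= EPSILON
```

Under this model `float_equal(0.1 + 0.2, 0.3)` is true even though the two values differ in their last bit, while any comparison involving NaN is false.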

Date/Timestamp Types#

| Type | Size | Endianness | UTC/Local | Description |
|------|------|------------|-----------|-------------|
| date | 4 bytes | native | UTC | 32-bit Unix timestamp (signed seconds since epoch), formatted as UTC |
| ldate | 4 bytes | native | Local | 32-bit Unix timestamp, formatted as local time |
| bedate | 4 bytes | big-endian | UTC | 32-bit Unix timestamp, big-endian byte order, UTC |
| beldate | 4 bytes | big-endian | Local | 32-bit Unix timestamp, big-endian byte order, local time |
| ledate | 4 bytes | little-endian | UTC | 32-bit Unix timestamp, little-endian byte order, UTC |
| leldate | 4 bytes | little-endian | Local | 32-bit Unix timestamp, little-endian byte order, local time |
| qdate | 8 bytes | native | UTC | 64-bit Unix timestamp (signed seconds since epoch), formatted as UTC |
| qldate | 8 bytes | native | Local | 64-bit Unix timestamp, formatted as local time |
| beqdate | 8 bytes | big-endian | UTC | 64-bit Unix timestamp, big-endian byte order, UTC |
| beqldate | 8 bytes | big-endian | Local | 64-bit Unix timestamp, big-endian byte order, local time |
| leqdate | 8 bytes | little-endian | UTC | 64-bit Unix timestamp, little-endian byte order, UTC |
| leqldate | 8 bytes | little-endian | Local | 64-bit Unix timestamp, little-endian byte order, local time |

Timestamp values are formatted as strings matching GNU file output format: "Www Mmm DD HH:MM:SS YYYY"

Examples:

# Match a timestamp equal to the Unix epoch
0 date =0 File created at epoch

# Check timestamp in file header (big-endian)
8 bedate >946684800 File created after 2000-01-01

# 64-bit timestamp (little-endian, local time)
16 leqldate x \b, timestamp %s
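
To show what a date rule compares against, this Python sketch decodes a big-endian 32-bit timestamp the way the bedate row of the table describes it. `read_bedate` is a hypothetical helper, and the 12-byte header is made up for the example.

```python
import struct
from datetime import datetime, timezone

def read_bedate(data: bytes, offset: int) -> datetime:
    """Decode a signed 32-bit big-endian Unix timestamp as a UTC datetime."""
    (seconds,) = struct.unpack_from(">i", data, offset)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# A made-up header with the timestamp 946684800 stored at offset 8.
header = bytes(8) + struct.pack(">i", 946684800)
stamp = read_bedate(header, 8)
```

The value 946684800 decodes to midnight UTC on 2000-01-01, which is the threshold used in the bedate example above.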

String Types#

Match literal string data:

0 string %PDF PDF document
0 string GIF89a GIF image data

String escape sequences:

  • \x00 - hex byte
  • \n - newline
  • \t - tab
  • \\ - backslash
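
The escape sequences turn a rule's text value into raw bytes before matching. A minimal Python sketch of that decoding follows; `decode_escapes` is a hypothetical helper. It assumes hex escapes always carry exactly two digits, and it also accepts \r and \0 because examples elsewhere in this chapter use them.

```python
# Single-character escapes; \r and \0 are included because later examples use them.
SIMPLE_ESCAPES = {"n": 0x0A, "t": 0x09, "r": 0x0D, "0": 0x00, "\\": 0x5C}

def decode_escapes(text: str) -> bytes:
    """Convert a magic string value with backslash escapes into bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("latin-1")
            i += 1
        elif text[i + 1] == "x":
            out.append(int(text[i + 2:i + 4], 16))  # two hex digits assumed
            i += 4
        else:
            out.append(SIMPLE_ESCAPES[text[i + 1]])
            i += 2
    return bytes(out)
```

For instance, the rule value written as PK\x03\x04 decodes to the four bytes 0x50 0x4B 0x03 0x04.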

Pascal String Type#

Pascal string (pstring) is a length-prefixed string type. The length prefix can be 1, 2, or 4 bytes depending on the suffix flag. Unlike C strings, Pascal strings are not null-terminated.

Length Prefix Width#

The default pstring type uses a 1-byte length prefix (0-255 range). Use suffix flags to specify different prefix widths:

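| Suffix | Width | Endianness | Range |
|--------|-------|------------|-------|
| /B | 1 byte | N/A | 0-255 (default) |
| /H | 2 bytes | big-endian | 0-65535 |
| /h | 2 bytes | little-endian | 0-65535 |
| /L | 4 bytes | big-endian | 0-4294967295 |
| /l | 4 bytes | little-endian | 0-4294967295 |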

Self-Inclusive Length (/J Flag)#

The /J flag indicates that the stored length value includes the size of the length prefix itself (JPEG-style). This flag can be combined with any width variant.

Examples#

Basic pstring with default 1-byte prefix:

0 pstring =JPEG JPEG image (Pascal string)

2-byte big-endian length prefix:

0 pstring/H =JPEG JPEG image (2-byte BE prefix)

4-byte little-endian length prefix:

0 pstring/l x \b, name: %s

Self-inclusive length with 2-byte big-endian prefix:

0 pstring/HJ x \b, JPEG-style length

Self-inclusive length with default 1-byte prefix:

0 pstring/J x \b, self-inclusive length

The optional max_length parameter caps the length value:

0 pstring x \b, name: %s
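
The prefix widths and the /J flag can be illustrated with a short Python reader. This is a sketch under the assumptions stated in this section only: `read_pstring` is a hypothetical helper, the width letters mirror the suffix table, and the max_length parameter is not modeled.

```python
import struct

# Width letters from the suffix table mapped to struct codes.
PREFIX_FORMATS = {"B": ">B", "H": ">H", "h": "<H", "L": ">I", "l": "<I"}

def read_pstring(data: bytes, offset: int = 0, width: str = "B",
                 self_inclusive: bool = False) -> bytes:
    """Read a length-prefixed string; self_inclusive models the /J flag."""
    fmt = PREFIX_FORMATS[width]
    prefix_size = struct.calcsize(fmt)
    (length,) = struct.unpack_from(fmt, data, offset)
    if self_inclusive:
        length = max(length - prefix_size, 0)  # stored length counts the prefix
    start = offset + prefix_size
    return data[start:start + length]
```

With a 2-byte big-endian prefix and /J, the bytes 00 06 followed by "JPEG" yield "JPEG", because the stored length 6 includes the 2 prefix bytes.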

String Flags (Not Yet Implemented)#

Note: String flags are documented for libmagic compatibility reference but are not yet implemented in libmagic-rs.

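| Flag | Description |
|------|-------------|
| /c | Case-insensitive match |
| /w | Treat each blank in the pattern as optional whitespace |
| /b | Force the test to be treated as a binary-file test |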

Example:

0 string/c <!doctype HTML document

Operators#

Comparison Operators#

| Operator | Description | Example |
|----------|-------------|---------|
| = | Equal (default) | 0 long =0xcafebabe |
| != | Not equal | 4 byte !=0 |
| > | Greater than | 8 long >1000 |
| < | Less than | 8 long <100 |
| >= | Greater than or equal | 8 long >=1000 |
| <= | Less than or equal | 8 long <=100 |
| & | Bitwise AND | 4 byte &0x80 |
| ^ | Bitwise XOR (not yet implemented) | 4 byte ^0xff |

Bitwise AND with Mask#

Test specific bits:

# Check if bit 7 is set
4 byte &0x80 (compressed)

# Check if lower nibble is 0x0f
4 byte &0x0f=0x0f (all bits set)
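
The operators reduce to ordinary comparisons on the decoded value. The Python sketch below models them with hypothetical helpers `compare` and `masked_match`. One assumption is labeled in the code: when a mask is given without a comparison value, the sketch treats the test as "any masked bit is set", which this chapter does not spell out.

```python
import operator

COMPARISONS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}

def compare(op: str, actual: int, expected: int) -> bool:
    """Apply one of the comparison operators from the table."""
    return COMPARISONS[op](actual, expected)

def masked_match(actual: int, mask: int, expected=None) -> bool:
    """Apply a bitwise AND mask, then optionally compare the result."""
    masked = actual & mask
    if expected is None:
        return masked != 0  # assumption: any masked bit set counts as a match
    return masked == expected
```

For a byte of 0x85, `masked_match(0x85, 0x80)` is true because bit 7 is set, and `masked_match(0x3f, 0x0f, 0x0f)` is true because the whole lower nibble is set.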

Negation#

Prefix operator with ! for negation:

# Match if NOT equal to zero
4 long !0 (non-zero)

Values#

Numeric Values#

# Decimal
0 long 1234

# Hexadecimal
0 long 0x4d5a

# Octal
0 byte 0177
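
The three numeric notations can be told apart by prefix alone. A minimal Python sketch, using a hypothetical `parse_number` helper and assuming that a leading 0 followed by more digits always means octal:

```python
def parse_number(text: str) -> int:
    """Parse a decimal, hexadecimal (0x...), or octal (leading 0) literal."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if digits.lower().startswith("0x"):
        value = int(digits, 16)   # int() accepts the 0x prefix with base 16
    elif len(digits) > 1 and digits.startswith("0"):
        value = int(digits, 8)
    else:
        value = int(digits, 10)
    return -value if negative else value
```

The octal literal 0177 and the hexadecimal literal 0x7f both parse to 127.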

String Values#

# Plain string
0 string RIFF

# With escape sequences
0 string PK\x03\x04

# UTF-16LE byte-order mark (as raw bytes)
0 string \xff\xfe

Special Values#

| Value | Description |
|-------|-------------|
| x | Match any value (always true) |

Example:

0 string PK ZIP archive
>4 short x version %d

The x value matches anything and %d formats the matched value.

Nested Rules#

Rules can be nested to create hierarchical matches. Deeper matches indicate more specific identification.

Indentation Levels#

Use > prefix for nested rules:

0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
>5 byte 1 LSB
>5 byte 2 MSB

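Evaluation:

  1. Check offset 0 for the ELF magic
  2. If it matches, check offset 4 for the bit size (32-bit or 64-bit)
  3. Independently of step 2, check offset 5 for the endianness, since every > rule here is a direct child of the ELF rule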
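
The evaluation order can be sketched in Python. The tuple layout (level, offset, expected bytes, message) and the `evaluate` function are hypothetical simplifications for this example; the sketch assumes a child rule is tested only when the nearest preceding rule one level up has matched, and it compares raw bytes instead of typed values.

```python
def evaluate(rules, data: bytes) -> list:
    """Walk rules in file order and collect messages of matching rules.

    rules: list of (level, offset, expected_bytes, message) tuples.
    """
    messages = []
    matched_level = -1  # deepest level that matched on the current path
    for level, offset, expected, message in rules:
        if level > matched_level + 1:
            continue  # the parent did not match, so skip this child
        if data[offset:offset + len(expected)] == expected:
            messages.append(message)
            matched_level = level
        else:
            matched_level = level - 1  # siblings may still be tried
    return messages

RULES = [
    (0, 0, b"\x7fELF", "ELF"),
    (1, 4, b"\x02", "64-bit"),
    (2, 5, b"\x01", "LSB"),
    (1, 4, b"\x01", "32-bit"),
    (2, 5, b"\x02", "MSB"),
]
```

For a header beginning with 7f 45 4c 46 02 01 the collected messages are "ELF", "64-bit", "LSB"; for a file that does not start with the ELF magic nothing below the first rule is tested.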

Multiple Nesting Levels#

0 string \x7fELF ELF
>4 byte 2 64-bit
>>5 byte 1 LSB
>>>16 short 2 (executable)
>>>16 short 3 (shared object)

Continuation Messages#

Use \b (backspace) to suppress space before message:

0 string GIF8 GIF image data
>4 string 7a \b, version 87a
>4 string 9a \b, version 89a

Output: GIF image data, version 89a
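
How messages combine into the final output can be sketched as follows. `append_message` is a hypothetical helper; it assumes messages are joined with a single space unless they begin with \b, and it uses Python's printf-style % formatting as a stand-in for the %d and %s placeholders.

```python
def append_message(output: str, message: str, value=None) -> str:
    """Append one rule message to the output, honoring a leading \\b."""
    if value is not None and "%" in message:
        message = message % value  # printf-style substitution of the matched value
    if message.startswith("\\b"):
        return output + message[2:]  # \b suppresses the separating space
    return f"{output} {message}" if output else message

result = append_message("", "GIF image data")
result = append_message(result, r"\b, version 89a")
```

After these two calls `result` is "GIF image data, version 89a", matching the output shown above.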

Examples#

ELF Executable#

# ELF (Executable and Linkable Format)
0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
>5 byte 1 LSB
>5 byte 2 MSB
>16 leshort 2 (executable)
>16 leshort 3 (shared object)

ZIP Archive#

# ZIP archive
0 string PK\x03\x04 ZIP archive data
>4 leshort x \b, version %d to extract
>6 leshort &0x0001 \b, encrypted
>6 leshort &0x0008 \b, with data descriptor

JPEG Image#

# JPEG
0 string \xff\xd8\xff JPEG image data
>3 byte 0xe0 \b, JFIF standard
>3 byte 0xe1 \b, Exif format

PDF Document#

# PDF
0 string %PDF- PDF document
>5 string 1. \b, version 1.x
>5 string 2. \b, version 2.x

PE Executable#

# DOS MZ executable with PE header
0 string MZ DOS executable
>0x3c lelong >0 (PE offset)
>(0x3c.l) string PE\0\0 PE executable

GZIP Compressed#

# GZIP
0 string \x1f\x8b gzip compressed data
>2 byte 8 \b, deflated
>3 byte &0x01 \b, ASCII text
>3 byte &0x02 \b, with header CRC
>3 byte &0x04 \b, with extra field
>3 byte &0x08 \b, with original name
>3 byte &0x10 \b, with comment

PNG Image#

# PNG
0 string \x89PNG\r\n\x1a\n PNG image data
>16 belong x \b, %d x
>20 belong x %d
>25 byte 0 \b, grayscale
>25 byte 2 \b, RGB
>25 byte 3 \b, palette
>25 byte 4 \b, grayscale+alpha
>25 byte 6 \b, RGBA

Floating-Point Values#

# Check for specific float value
0 lefloat =3.14159 File with float value pi

# Float comparison
0 float >1.0 Float value greater than 1.0

# Double precision
0 bedouble =0.45455 Double value equal to 0.45455

Best Practices#

1. Order Rules by Specificity#

Put more specific rules first:

# Good: Specific before general
0 string PK\x03\x04 ZIP archive
0 string PK (generic PK signature)

# Bad: General catches all, so the second rule is never reached
0 string PK (generic PK signature)
0 string PK\x03\x04 ZIP archive
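
The effect of rule order is easy to reproduce with a toy matcher. This Python sketch assumes a first-match-wins policy over simple prefix rules; `first_match` and the two rule lists are hypothetical.

```python
def first_match(rules, data: bytes):
    """Return the label of the first rule whose prefix matches, else None."""
    for prefix, label in rules:
        if data.startswith(prefix):
            return label
    return None

SPECIFIC_FIRST = [(b"PK\x03\x04", "ZIP archive"), (b"PK", "(generic PK signature)")]
GENERAL_FIRST = [(b"PK", "(generic PK signature)"), (b"PK\x03\x04", "ZIP archive")]
sample = b"PK\x03\x04" + bytes(26)
```

With `SPECIFIC_FIRST` the sample is labeled "ZIP archive"; with `GENERAL_FIRST` the generic rule shadows the specific one.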

2. Use Nested Rules for Details#

# Good: Hierarchical structure
0 string \x7fELF ELF
>4 byte 2 64-bit
>>5 byte 1 LSB

# Bad: Flat rules
0 string \x7fELF ELF
4 byte 2 64-bit
5 byte 1 LSB

3. Document Complex Rules#

# JPEG with Exif metadata
# The Exif APP1 marker (0xFFE1) contains camera metadata
0 string \xff\xd8\xff JPEG image data
>3 byte 0xe1 \b, Exif format

4. Test Edge Cases#

Consider:

  • Empty files
  • Truncated files
  • Minimum valid file size
  • Maximum offset values

5. Use Appropriate Types#

# Good: Match exact size needed
0 leshort 0x5a4d DOS executable

# Bad: Over-reading
0 lelong x (reads 4 bytes when 2 needed)

6. Handle Endianness Explicitly#

# Good: Explicit endianness
0 lelong 0xcafebabe (little-endian)
0 belong 0xcafebabe (big-endian)

# Risky: Native endianness
0 long 0xcafebabe (platform-dependent)
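
The risk with native endianness is that the same four bytes decode to different numbers on different machines. A short Python demonstration using the standard struct module (the variable names are only for this example):

```python
import struct
import sys

raw = bytes([0xCA, 0xFE, 0xBA, 0xBE])

big = struct.unpack(">I", raw)[0]      # belong-style read
little = struct.unpack("<I", raw)[0]   # lelong-style read
native = struct.unpack("=I", raw)[0]   # long-style read, depends on the host

# The native result equals one of the explicit reads, depending on the platform.
expected_native = little if sys.byteorder == "little" else big
```

The big-endian read gives 0xcafebabe and the little-endian read gives 0xbebafeca, so a rule written with long matches on only one class of hosts.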

Supported Features#

Currently Supported#

  • Absolute offsets
  • Relative offsets
  • Indirect offsets (basic)
  • Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
  • Float and double types (32-bit and 64-bit IEEE 754 floating-point)
  • Date and qdate types (32-bit and 64-bit Unix timestamps)
  • String and pstring types (null-terminated and length-prefixed strings)
  • Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
  • Bitwise AND operator
  • Nested rules
  • Comments

Not Yet Supported#

  • Regex patterns
  • 128-bit integer types
  • Use/name directives
  • Default rules

Recently Added#

  • Strength modifiers: The !:strength directive for adjusting rule priority
  • 64-bit integers: quad type family (quad, uquad, lequad, ulequad, bequad, ubequad)
  • Floating-point types: float and double type families (float, befloat, lefloat, double, bedouble, ledouble) with IEEE 754 semantics and epsilon-aware equality

Troubleshooting#

Rule Not Matching#

  1. Check offset is correct (0-indexed)
  2. Verify endianness matches file format
  3. Test with hexdump -C file | head
  4. Ensure no conflicting rules

Unexpected Results#

  1. Check rule order (first match wins)
  2. Verify nested rule levels
  3. Test with simpler rules first

Performance Issues#

  1. Avoid unnecessary string searches
  2. Use specific offsets over searches
  3. Order rules by likelihood of match

See Also#