MAGIC_FORMAT

Type: External
Status: Published
Created: Mar 1, 2026
Updated: Mar 25, 2026
Updated by: Dosu Bot

Magic File Format Guide

A comprehensive guide to the magic file format used by libmagic-rs.

Overview

Magic files contain rules that describe file formats by specifying byte patterns at specific offsets. Each rule consists of:

  1. Offset - Where to look in the file
  2. Type - How to interpret the bytes
  3. Value - What to match against
  4. Message - Description to display on match

Basic Format

offset type value message

Example:

0 string PK ZIP archive data

This rule matches files starting with "PK" and labels them as "ZIP archive data".


Basic Syntax

Rule Structure

[>...]offset type [operator]value message

| Component | Required | Description |
|---|---|---|
| > prefix | No | Nesting level: one > per level, none for top-level rules |
| offset | Yes | Where to read data |
| type | Yes | Data type to read |
| operator | No | Comparison operator (default: =) |
| value | Yes | Expected value |
| message | Yes | Description text (the rest of the line) |
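As a rough illustration of this structure, the Rust sketch below splits one rule line into its components. `RuleLine`, `next_field`, and `parse_rule_line` are hypothetical names, not part of the libmagic-rs API; the sketch keeps the optional operator attached to the value and assumes values contain no escaped spaces.

```rust
/// The fields of one magic rule line (hypothetical representation).
#[derive(Debug, PartialEq)]
struct RuleLine<'a> {
    level: usize,      // number of leading '>' characters
    offset: &'a str,   // e.g. "0", "0x3c", "(0x3c.l)", "&2"
    type_: &'a str,    // e.g. "byte", "string/c", "pstring/HJ"
    test: &'a str,     // optional operator plus value, e.g. ">0" or "0x7f"
    message: &'a str,  // everything after the test field, possibly empty
}

/// Split off the next whitespace-delimited field, returning (field, rest).
fn next_field(s: &str) -> (&str, &str) {
    let s = s.trim_start();
    match s.find(char::is_whitespace) {
        Some(i) => (&s[..i], &s[i..]),
        None => (s, ""),
    }
}

/// Parse one line; returns None for blank lines, comments, or missing fields.
/// Assumption: values contain no escaped spaces.
fn parse_rule_line(line: &str) -> Option<RuleLine<'_>> {
    let line = line.trim_end();
    let trimmed = line.trim_start();
    if trimmed.is_empty() || trimmed.starts_with('#') {
        return None; // blank lines and comments carry no rule
    }
    // '>' is ASCII, so the count of chars equals the byte index to slice at.
    let level = line.chars().take_while(|&c| c == '>').count();
    let (offset, rest) = next_field(&line[level..]);
    let (type_, rest) = next_field(rest);
    let (test, rest) = next_field(rest);
    if offset.is_empty() || type_.is_empty() || test.is_empty() {
        return None;
    }
    Some(RuleLine { level, offset, type_, test, message: rest.trim_start() })
}
```

For example, `>4 byte 1 32-bit` yields level 1, offset `4`, type `byte`, test `1`, and message `32-bit`.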

Comments

Lines starting with # are comments:

# This is a comment
0 string PK ZIP archive

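Whitespace

  • Fields are separated by whitespace (spaces or tabs); the message is the rest of the line after the value and may contain spaces
  • A leading > prefix, repeated once per level, indicates the rule's nesting level
  • Trailing whitespace is ignored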

Offset Specifications

Absolute Offset

Direct byte position from file start:

0 string \x7fELF ELF executable
>16 leshort 3 (shared object)

Hexadecimal Offset

Use the 0x prefix for hex offsets:

0x0 string MZ DOS executable
>0x3c lelong >0 (PE offset present)

Negative Offset (From End)

Read relative to the end of the file:

-22 string PK\x05\x06 ZIP end of central directory (archive without comment)

Indirect Offset

Read pointer value and use as offset:

# Read 4-byte pointer at offset 60, then check that location
(0x3c.l) string PE\0\0 PE executable

Indirect offset syntax:

  • (base.type) - Read pointer at base, interpret as type
  • (base.type+adj) - Add adjustment to pointer value

Types for indirect offsets:

  • .b - byte (1 byte)
  • .s - short (2 bytes)
  • .l - long (4 bytes)
  • .q - quad (8 bytes)
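The following Rust sketch shows how an indirect offset such as (0x3c.l+adj) could be resolved: read the pointer, widen it, then add the adjustment. `PtrType` and `resolve_indirect` are hypothetical names, and the sketch assumes the pointer is stored little-endian (true for the PE `e_lfanew` field at 0x3c), which may differ from how libmagic-rs interprets these suffixes.

```rust
/// Width of the pointer read by an indirect offset (.b, .s, .l, .q).
#[derive(Clone, Copy)]
enum PtrType {
    Byte,  // .b
    Short, // .s
    Long,  // .l
    Quad,  // .q
}

/// Hypothetical helper: resolve (base.type+adj) against `data`.
/// Assumption: the pointer is little-endian. Returns None if the pointer
/// lies outside the buffer or the adjusted offset would be negative.
fn resolve_indirect(data: &[u8], base: usize, ty: PtrType, adj: i64) -> Option<u64> {
    let width = match ty {
        PtrType::Byte => 1,
        PtrType::Short => 2,
        PtrType::Long => 4,
        PtrType::Quad => 8,
    };
    let bytes = data.get(base..base.checked_add(width)?)?;
    // Assemble the little-endian value: last byte is most significant.
    let ptr = bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
    // Widen to i128 so the adjustment cannot overflow, then reject negatives.
    let target = i128::from(ptr) + i128::from(adj);
    u64::try_from(target).ok()
}
```

For the PE rule, `resolve_indirect(data, 0x3c, PtrType::Long, 0)` gives the offset at which PE\0\0 is then tested.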

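Relative Offset

Offset relative to the end of the parent rule's match, used in nested rules:

0 string PK\x03\x04 ZIP archive
>&2 leshort >0 (general-purpose flags set)

The & prefix indicates a relative offset. Here the 4-byte signature ends at offset 4, so &2 reads the 16-bit field at absolute offset 6.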
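A small Rust sketch of that arithmetic, assuming the parent match ends at its offset plus the number of bytes it consumed; `resolve_relative` is a hypothetical helper, and a negative result is treated as invalid.

```rust
/// Hypothetical helper: resolve "&delta" from where the parent match ended.
fn resolve_relative(parent_offset: usize, parent_len: usize, delta: i64) -> Option<usize> {
    // End of the parent match, e.g. 0 + 4 for the "PK\x03\x04" signature.
    let end = parent_offset.checked_add(parent_len)?;
    // Do the signed arithmetic in i128, then reject negative targets.
    let target = i128::try_from(end).ok()? + i128::from(delta);
    usize::try_from(target).ok()
}
```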


Type Specifications

Integer Types

| Type | Size | Endianness |
|---|---|---|
| byte | 1 byte | N/A |
| short | 2 bytes | native |
| leshort | 2 bytes | little-endian |
| beshort | 2 bytes | big-endian |
| long | 4 bytes | native |
| lelong | 4 bytes | little-endian |
| belong | 4 bytes | big-endian |
| quad | 8 bytes | native |
| lequad | 8 bytes | little-endian |
| bequad | 8 bytes | big-endian |

All integer types have unsigned variants prefixed with u:

  • ubyte, ushort, uleshort, ubeshort
  • ulong, ulelong, ubelong
  • uquad, ulequad, ubequad

Examples:

0 byte 0x7f (byte match)
0 leshort 0x5a4d DOS MZ signature
0 belong 0xcafebabe Java class file
0 lequad 0x1234567890abcdef (64-bit little-endian)
8 uquad >0x8000000000000000 (unsigned 64-bit check)
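To make the endianness column concrete, here is a Rust sketch that reads an unsigned integer of a given width and byte order. `Endian` and `read_uint` are illustrative names, not the libmagic-rs API; signed types would reinterpret the same bits (for example with `as i16`).

```rust
/// Byte order for multi-byte integers.
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
    Native, // whatever the host uses
}

/// Hypothetical helper: read a `size`-byte unsigned integer (1, 2, 4 or 8)
/// at `offset`, widened to u64. Returns None for truncated input.
fn read_uint(data: &[u8], offset: usize, size: usize, endian: Endian) -> Option<u64> {
    if !matches!(size, 1 | 2 | 4 | 8) {
        return None; // only the widths listed in the table
    }
    let bytes = data.get(offset..offset.checked_add(size)?)?;
    let little = match endian {
        Endian::Little => true,
        Endian::Big => false,
        Endian::Native => cfg!(target_endian = "little"),
    };
    let mut buf = [0u8; 8];
    if little {
        // Least significant byte first: fill from the start of the buffer.
        buf[..size].copy_from_slice(bytes);
        Some(u64::from_le_bytes(buf))
    } else {
        // Most significant byte first: right-align in the buffer.
        buf[8 - size..].copy_from_slice(bytes);
        Some(u64::from_be_bytes(buf))
    }
}
```

The bytes "MZ" (0x4d, 0x5a) read as a little-endian short give 0x5a4d, which is why the DOS rule uses leshort 0x5a4d.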

String Types

Match literal string data:

0 string %PDF PDF document
0 string GIF89a GIF image data

String escape sequences:

  • \x00 - hex byte
  • \0 - NUL byte (octal escape, as in PE\0\0)
  • \n - newline
  • \r - carriage return
  • \t - tab
  • \\ - backslash
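The Rust sketch below decodes these escapes (plus octal escapes of up to three digits, so that \0 is a NUL byte) into the raw bytes a rule compares against. `decode_escapes` is a hypothetical helper; treating any other escaped character as itself is an assumption.

```rust
/// Hypothetical helper: turn a magic-file string value into raw bytes.
fn decode_escapes(s: &str) -> Vec<u8> {
    let b = s.as_bytes();
    let mut out = Vec::with_capacity(b.len());
    let mut i = 0;
    while i < b.len() {
        // Ordinary byte, or a trailing lone backslash: copy it through.
        if b[i] != b'\\' || i + 1 == b.len() {
            out.push(b[i]);
            i += 1;
            continue;
        }
        let c = b[i + 1];
        i += 2; // skip the backslash and the escape character
        match c {
            b'n' => out.push(b'\n'),
            b'r' => out.push(b'\r'),
            b't' => out.push(b'\t'),
            b'x' => {
                // Up to two hex digits follow \x.
                let start = i;
                while i < b.len() && i - start < 2 && b[i].is_ascii_hexdigit() {
                    i += 1;
                }
                if i == start {
                    out.push(b'x'); // "\x" without digits: keep the letter
                } else {
                    let hex = std::str::from_utf8(&b[start..i]).unwrap();
                    out.push(u8::from_str_radix(hex, 16).unwrap());
                }
            }
            b'0'..=b'7' => {
                // Octal: the digit already consumed plus up to two more.
                let start = i - 1;
                while i < b.len() && i - start < 3 && matches!(b[i], b'0'..=b'7') {
                    i += 1;
                }
                let oct = std::str::from_utf8(&b[start..i]).unwrap();
                // Three octal digits can exceed 255; keep the low byte.
                out.push((u32::from_str_radix(oct, 8).unwrap() & 0xff) as u8);
            }
            other => out.push(other), // includes \\ -> backslash
        }
    }
    out
}
```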

Pascal String (pstring)

Length-prefixed string type where a length prefix (1, 2, or 4 bytes) specifies the number of bytes of string data that follow. Unlike C strings, Pascal strings are not null-terminated.

The length prefix width is controlled by suffix flags:

| Suffix | Length Prefix Width | Byte Order |
|---|---|---|
| /B | 1 byte (default) | N/A |
| /H | 2 bytes | big-endian |
| /h | 2 bytes | little-endian |
| /L | 4 bytes | big-endian |
| /l | 4 bytes | little-endian |

The /J flag indicates JPEG-style self-inclusive length where the stored length value includes the size of the length prefix itself. This flag can be combined with any width suffix (/HJ, /lJ, etc.) or used alone (/J defaults to 1-byte width).

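Examples (illustrative: each rule matches the string JPEG stored after a length prefix of the indicated form, not actual JPEG image data):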

0 pstring =JPEG JPEG image (1-byte prefix, default)
0 pstring/B =JPEG JPEG image (1-byte prefix, explicit)
0 pstring/H =JPEG JPEG image (2-byte big-endian prefix)
0 pstring/h =JPEG JPEG image (2-byte little-endian prefix)
0 pstring/L =JPEG JPEG image (4-byte big-endian prefix)
0 pstring/l =JPEG JPEG image (4-byte little-endian prefix)
0 pstring/HJ =JPEG JPEG image (2-byte BE, self-inclusive length)

If max_length is specified in the magic file (not shown in the basic syntax), it caps the length value to prevent reading excessive data.
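The Rust sketch below reads a Pascal string following the prefix rules in the table, including the /J adjustment and an optional cap standing in for max_length. `Prefix` and `read_pstring` are hypothetical names; returning `None` for truncated data or for a /J length smaller than the prefix itself is a design assumption.

```rust
/// Width and byte order of a pstring length prefix.
#[derive(Clone, Copy)]
enum Prefix {
    Byte, // /B: 1 byte (default)
    Be16, // /H: 2 bytes, big-endian
    Le16, // /h: 2 bytes, little-endian
    Be32, // /L: 4 bytes, big-endian
    Le32, // /l: 4 bytes, little-endian
}

/// Hypothetical helper: return the string bytes of a pstring at `offset`.
fn read_pstring(
    data: &[u8],
    offset: usize,
    prefix: Prefix,
    self_inclusive: bool,   // the /J flag
    max_len: Option<usize>, // optional cap on the length value
) -> Option<&[u8]> {
    let width = match prefix {
        Prefix::Byte => 1,
        Prefix::Be16 | Prefix::Le16 => 2,
        Prefix::Be32 | Prefix::Le32 => 4,
    };
    let p = data.get(offset..offset.checked_add(width)?)?;
    let stored: u64 = match prefix {
        Prefix::Byte => u64::from(p[0]),
        Prefix::Be16 => u64::from(u16::from_be_bytes([p[0], p[1]])),
        Prefix::Le16 => u64::from(u16::from_le_bytes([p[0], p[1]])),
        Prefix::Be32 => u64::from(u32::from_be_bytes([p[0], p[1], p[2], p[3]])),
        Prefix::Le32 => u64::from(u32::from_le_bytes([p[0], p[1], p[2], p[3]])),
    };
    // With /J the stored length also counts the prefix bytes themselves.
    let len = if self_inclusive { stored.checked_sub(width as u64)? } else { stored };
    let mut len = usize::try_from(len).ok()?;
    if let Some(cap) = max_len {
        len = len.min(cap);
    }
    let start = offset + width; // cannot overflow: checked above
    data.get(start..start.checked_add(len)?)
}
```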

String Flags

Flags for string type:

| Flag | Description |
|---|---|
| /c | Case-insensitive match |
| /w | Whitespace-insensitive |
| /b | Match at word boundary |

Example:

0 string/c <!doctype HTML document
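As a rough model of the /c flag, this Rust sketch compares a pattern against the bytes at an offset, optionally ignoring ASCII case. `string_matches` is a hypothetical helper; it folds only ASCII letters, which may not match the exact /c semantics in libmagic-rs.

```rust
/// Hypothetical helper: `string` vs. `string/c` at a fixed offset.
fn string_matches(data: &[u8], offset: usize, pattern: &[u8], ignore_case: bool) -> bool {
    // Take exactly pattern.len() bytes; out-of-range windows never match.
    let window = match offset.checked_add(pattern.len()).and_then(|end| data.get(offset..end)) {
        Some(w) => w,
        None => return false,
    };
    if ignore_case {
        window.eq_ignore_ascii_case(pattern)
    } else {
        window == pattern
    }
}
```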

Flags for pstring type are documented in the Pascal String section above.

Date/Timestamp Types

Date and timestamp types read Unix timestamps (signed seconds since epoch) and format them as human-readable strings.

32-bit timestamps (4 bytes):

| Type | Size | Endianness | Timezone |
|---|---|---|---|
| date | 4 bytes | native | UTC |
| ldate | 4 bytes | native | local time |
| bedate | 4 bytes | big-endian | UTC |
| beldate | 4 bytes | big-endian | local time |
| ledate | 4 bytes | little-endian | UTC |
| leldate | 4 bytes | little-endian | local time |

64-bit timestamps (8 bytes):

| Type | Size | Endianness | Timezone |
|---|---|---|---|
| qdate | 8 bytes | native | UTC |
| qldate | 8 bytes | native | local time |
| beqdate | 8 bytes | big-endian | UTC |
| beqldate | 8 bytes | big-endian | local time |
| leqdate | 8 bytes | little-endian | UTC |
| leqldate | 8 bytes | little-endian | local time |

All timestamp values are formatted as strings in the format "Www Mmm DD HH:MM:SS YYYY" to match GNU file output.
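For the UTC variants, the formatting can be sketched in Rust without external crates, using Howard Hinnant's days-to-civil-date algorithm. `format_unix_utc` and `civil_from_days` are illustrative names; padding the day with a space follows C's `asctime()` and is an assumption about the exact output, and the local-time variants would additionally need time-zone data.

```rust
/// Hypothetical helper: format seconds since the epoch (UTC) as
/// "Www Mmm DD HH:MM:SS YYYY", with the day space-padded like asctime().
fn format_unix_utc(ts: i64) -> String {
    const WEEKDAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    // Euclidean division keeps pre-1970 (negative) timestamps correct.
    let days = ts.div_euclid(86_400);
    let secs = ts.rem_euclid(86_400);
    let (year, month, day) = civil_from_days(days);
    let weekday = WEEKDAYS[(days + 4).rem_euclid(7) as usize]; // 1970-01-01 was a Thursday
    format!(
        "{} {} {:>2} {:02}:{:02}:{:02} {}",
        weekday, MONTHS[(month - 1) as usize], day,
        secs / 3600, secs % 3600 / 60, secs % 60, year
    )
}

/// Days since 1970-01-01 to proleptic Gregorian (year, month, day).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + (if month <= 2 { 1 } else { 0 });
    (year, month, day)
}
```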

Example:

0 ldate x Unix timestamp: %s

Operators

Comparison Operators

| Operator | Description | Example |
|---|---|---|
| = | Equal (default) | 0 long =0xcafebabe |
| ! | Not equal | 4 byte !0 |
| < | Less than | 8 long <100 |
| > | Greater than | 8 long >1000 |
| <= | Less than or equal | 8 long <=100 |
| >= | Greater than or equal | 8 long >=1000 |
| & | Bitwise AND | 4 byte &0x80 |
| ^ | Bitwise XOR | 4 byte ^0xff |
| ~ | Bitwise NOT | 4 byte ~0xff |
| x | Match any value | 4 byte x |
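The Rust sketch below evaluates a subset of these operators on already-decoded values. `Op` and `apply` are hypothetical; the sketch assumes signed 64-bit comparison (unsigned types such as uquad would compare as `u64`), interprets &mask as "all mask bits are set", consistent with the bit-7 example in the next subsection, and leaves out ^ and ~.

```rust
/// Subset of the comparison operators (^ and ~ omitted).
#[derive(Clone, Copy)]
enum Op {
    Eq,
    Ne,
    Lt,
    Gt,
    Le,
    Ge,
    AllBitsSet, // &mask
    Any,        // x
}

/// Hypothetical helper: compare the value read from the file with the rule's value.
fn apply(op: Op, file_value: i64, rule_value: i64) -> bool {
    match op {
        Op::Eq => file_value == rule_value,
        Op::Ne => file_value != rule_value,
        Op::Lt => file_value < rule_value,
        Op::Gt => file_value > rule_value,
        Op::Le => file_value <= rule_value,
        Op::Ge => file_value >= rule_value,
        Op::AllBitsSet => (file_value & rule_value) == rule_value,
        Op::Any => true,
    }
}
```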

Bitwise AND with Mask

Test specific bits:

# Check if bit 7 is set
4 byte &0x80 (compressed)

# Check if the lower nibble is 0x0f (all four low bits set)
4 byte &0x0f (all bits set)

Negation

Prefix the value with ! to match when the value read differs from it:

# Match if NOT equal to zero
4 long !0 (non-zero)

Values

Numeric Values

# Decimal
0 long 1234

# Hexadecimal
0 long 0x4d5a

# Octal
0 byte 0177
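A Rust sketch of parsing these three notations; `parse_magic_number` is a hypothetical helper that returns an `i128` so that both negative values and full 64-bit unsigned constants such as 0x8000000000000000 fit.

```rust
/// Hypothetical helper: parse decimal, 0x-prefixed hex, or 0-prefixed octal.
fn parse_magic_number(s: &str) -> Option<i128> {
    let (negative, digits) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s),
    };
    // Reject embedded signs, which from_str_radix would otherwise accept.
    if !digits.bytes().all(|c| c.is_ascii_alphanumeric()) {
        return None;
    }
    let magnitude = if let Some(hex) = digits.strip_prefix("0x").or_else(|| digits.strip_prefix("0X")) {
        i128::from_str_radix(hex, 16).ok()?
    } else if digits.len() > 1 && digits.starts_with('0') {
        i128::from_str_radix(&digits[1..], 8).ok()? // leading 0 means octal
    } else {
        digits.parse::<i128>().ok()?
    };
    Some(if negative { -magnitude } else { magnitude })
}
```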

String Values

# Plain string
0 string RIFF

# With escape sequences
0 string PK\x03\x04

# UTF-16 little-endian byte-order mark, written as hex escapes
0 string \xff\xfe

Any-Value Operator

The x operator matches unconditionally at the given offset. It is typically used in child rules to extract and format a value without testing it:

Example:

0 string PK ZIP archive
>4 leshort x version %d

The x value matches anything and %d formats the matched value.
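A minimal Rust sketch of the substitution step: the matched value replaces the first %d, %u, or %x in the message. `format_message` is hypothetical and handles far fewer specifiers than a full printf-style formatter (it also treats %u like %d); a second specifier is left untouched because a rule supplies only one value.

```rust
/// Hypothetical helper: insert `value` at the first %d/%u/%x; "%%" becomes "%".
fn format_message(message: &str, value: i64) -> String {
    let mut out = String::with_capacity(message.len());
    let mut chars = message.chars().peekable();
    let mut used = false; // only one value per rule
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            Some('%') => {
                chars.next();
                out.push('%');
            }
            Some(spec @ ('d' | 'u' | 'x')) if !used => {
                chars.next();
                used = true;
                match spec {
                    'x' => out.push_str(&format!("{:x}", value)),
                    _ => out.push_str(&value.to_string()),
                }
            }
            // Unknown or surplus specifier: emit '%' and keep the next char as text.
            _ => out.push('%'),
        }
    }
    out
}
```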


Nested Rules

Rules can be nested to create hierarchical matches. Deeper matches indicate more specific identification.

Indentation Levels

Use > prefix for nested rules:

0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
>5 byte 1 LSB
>5 byte 2 MSB

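Evaluation:

  1. Check offset 0 for the ELF magic
  2. If it matched, check offset 4 for the word size (32-bit or 64-bit)
  3. Also check offset 5 for endianness; the >5 rules are siblings of the >4 rules, so they run whenever the level-0 rule matched, regardless of the result at offset 4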
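The evaluation order can be sketched in Rust as a single pass over a flattened rule list: a rule at level n runs only if the most recent rule at level n-1 matched, and the first matching top-level rule ends the scan. `Rule` and `evaluate` are hypothetical, with each test reduced to a predicate over the file bytes.

```rust
/// A flattened rule: nesting level, a predicate over the file, and a message.
struct Rule {
    level: usize,
    test: fn(&[u8]) -> bool,
    message: &'static str,
}

/// Hypothetical helper: collect the messages of the first matching
/// top-level rule and of its matching descendants.
fn evaluate(rules: &[Rule], data: &[u8]) -> Vec<&'static str> {
    let mut messages = Vec::new();
    let mut matched_top = false;
    // A rule at level L is eligible only while L <= allowed.
    let mut allowed = 0;
    for rule in rules {
        if rule.level == 0 && matched_top {
            break; // first matching top-level rule wins
        }
        if rule.level > allowed {
            continue; // its parent did not match
        }
        if (rule.test)(data) {
            messages.push(rule.message);
            matched_top = true;
            allowed = rule.level + 1; // children of this rule may run
        } else {
            allowed = rule.level; // siblings may run, children may not
        }
    }
    messages
}
```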

Multiple Nesting Levels

0 string \x7fELF ELF
>4 byte 2 64-bit
>>5 byte 1 LSB
>>>16 leshort 2 (executable)
>>>16 leshort 3 (shared object)

Continuation Messages

Each matching child's message is normally appended after a space. Start the message with \b (backspace) to suppress that space:

0 string GIF8 GIF image data
>4 string 7a \b, version 87a
>4 string 9a \b, version 89a

Output for a GIF89a file: GIF image data, version 89a
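A Rust sketch of joining the messages of matched rules with this convention; `join_messages` is a hypothetical helper that looks for the two characters \b as written in the magic file, before any escape decoding.

```rust
/// Hypothetical helper: join matched messages, honoring a leading "\b".
fn join_messages(messages: &[&str]) -> String {
    let mut out = String::new();
    for msg in messages {
        if msg.is_empty() {
            continue;
        }
        if let Some(rest) = msg.strip_prefix("\\b") {
            out.push_str(rest); // \b: glue directly onto the previous text
        } else {
            if !out.is_empty() {
                out.push(' '); // default separator between messages
            }
            out.push_str(msg);
        }
    }
    out
}
```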


Examples

ELF Executable

# ELF (Executable and Linkable Format)
0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
>5 byte 1 LSB
>5 byte 2 MSB
>16 leshort 2 (executable)
>16 leshort 3 (shared object)

ZIP Archive

# ZIP archive
0 string PK\x03\x04 ZIP archive data
# The version field stores major*10+minor, so 20 means version 2.0
>4 leshort x \b, version %d needed to extract
>6 leshort &0x0001 \b, encrypted
>6 leshort &0x0008 \b, with data descriptor

JPEG Image

# JPEG
0 string \xff\xd8\xff JPEG image data
>3 byte 0xe0 \b, JFIF standard
>3 byte 0xe1 \b, Exif format

PDF Document

# PDF
0 string %PDF- PDF document
>5 string 1. \b, version 1.x
>5 string 2. \b, version 2.x

PE Executable

# DOS MZ executable with PE header
0 string MZ DOS executable
>0x3c lelong >0 (PE offset)
>(0x3c.l) string PE\0\0 PE executable

GZIP Compressed

# GZIP
0 string \x1f\x8b gzip compressed data
>2 byte 8 \b, deflated
>3 byte &0x01 \b, ASCII text
>3 byte &0x02 \b, with header CRC
>3 byte &0x04 \b, with extra field
>3 byte &0x08 \b, with original name
>3 byte &0x10 \b, with comment

PNG Image

# PNG
0 string \x89PNG\r\n\x1a\n PNG image data
>16 belong x \b, %d x
>20 belong x %d
# Offset 24 holds the bit depth; the color type is at offset 25
>25 byte 0 \b, grayscale
>25 byte 2 \b, RGB
>25 byte 3 \b, palette
>25 byte 4 \b, grayscale+alpha
>25 byte 6 \b, RGBA
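This Rust sketch extracts the IHDR fields the PNG rules test, from their positions in the PNG layout: an 8-byte signature, a 4-byte chunk length, the IHDR tag, then width (offset 16), height (20), bit depth (24), and color type (25), with integers big-endian. `png_info` is a hypothetical helper.

```rust
/// Hypothetical helper: (width, height, bit depth, color type) from a PNG header.
fn png_info(data: &[u8]) -> Option<(u32, u32, u8, u8)> {
    if !data.starts_with(b"\x89PNG\r\n\x1a\n") || data.get(12..16)? != b"IHDR" {
        return None;
    }
    let f = data.get(16..26)?; // width, height, bit depth, color type
    let width = u32::from_be_bytes([f[0], f[1], f[2], f[3]]);
    let height = u32::from_be_bytes([f[4], f[5], f[6], f[7]]);
    Some((width, height, f[8], f[9]))
}
```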

Best Practices

1. Order Rules by Specificity

Put more specific rules first:

# Good: Specific before general
0 string PK\x03\x04 ZIP archive
0 string PK (generic PK signature)

# Bad: General catches all
0 string PK (generic PK signature)
# Never reached: the rule above already matched
0 string PK\x03\x04 ZIP archive

2. Use Nested Rules for Details

# Good: Hierarchical structure
0 string \x7fELF ELF
>4 byte 2 64-bit
>>5 byte 1 LSB

# Bad: Flat rules, each tested on every file independently of the ELF check
0 string \x7fELF ELF
4 byte 2 64-bit
5 byte 1 LSB

3. Document Complex Rules

# JPEG with Exif metadata
# The Exif APP1 marker (0xFFE1) contains camera metadata
0 string \xff\xd8\xff JPEG image data
>3 byte 0xe1 \b, Exif format

4. Test Edge Cases

Consider:

  • Empty files
  • Truncated files
  • Minimum valid file size
  • Maximum offset values
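One way to cover the first three cases is to feed a rule every truncation of a known-good sample, from the empty file up to the full length, and record the shortest prefix that matches. `shortest_matching_prefix` is a hypothetical Rust helper; the check closure is where an out-of-bounds read would surface as a panic.

```rust
/// Hypothetical helper: try every prefix of `sample` (including the empty
/// file) and return the shortest length at which `check` reports a match.
fn shortest_matching_prefix(sample: &[u8], check: impl Fn(&[u8]) -> bool) -> Option<usize> {
    (0..=sample.len()).find(|&n| check(&sample[..n]))
}
```

With a 6-byte ZIP sample and a 4-byte signature check, the result is `Some(4)`, the minimum size at which the rule can match.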

5. Use Appropriate Types

# Good: Match exact size needed
0 leshort 0x5a4d DOS executable

# Bad: Over-reading
0 lelong x (reads 4 bytes when 2 needed)

6. Handle Endianness Explicitly

# Good: Explicit endianness
0 lelong 0xcafebabe (little-endian)
0 belong 0xcafebabe (big-endian)

# Risky: Native endianness
0 long 0xcafebabe (platform-dependent)

Supported Features

Currently Supported

  • Absolute offsets
  • Relative offsets
  • Indirect offsets (basic)
  • Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
  • String types (string, pstring)
  • Date and timestamp types (32-bit and 64-bit Unix timestamps)
  • Comparison operators (=, !, <, >, <=, >=)
  • Bitwise AND operator
  • Nested rules
  • Comments

Not Yet Supported

  • Regex patterns
  • Float types
  • 128-bit integer types
  • Use/name directives
  • Default rules

Recently Added

  • Pascal string type: pstring for length-prefixed strings
  • Date/timestamp types: date (32-bit) and qdate (64-bit) Unix timestamp types
  • Comparison operators: Full support for <, >, <=, >= operators
  • Strength modifiers: The !:strength directive for adjusting rule priority
  • 64-bit integers: quad type family (quad, uquad, lequad, ulequad, bequad, ubequad)

Troubleshooting

Rule Not Matching

  1. Check offset is correct (0-indexed)
  2. Verify endianness matches file format
  3. Test with hexdump -C file | head
  4. Ensure no conflicting rules

Unexpected Results

  1. Check rule order (first match wins)
  2. Verify nested rule levels
  3. Test with simpler rules first

Performance Issues

  1. Avoid unnecessary string searches
  2. Use specific offsets over searches
  3. Order rules by likelihood of match

See Also