Sanitizer Rule Engine#

The Sanitizer Rule Engine is a core component of opnDossier that redacts sensitive data from OPNsense configuration files. It implements a three-tier mode system (aggressive, moderate, and minimal) to balance privacy requirements against operational utility. The engine uses a two-phase detection strategy: field-name pattern matching, followed by value-content analysis. Together these phases cover passwords, API keys, IP addresses, MAC addresses, cryptographic material, and personally identifiable information.

The architecture is built around a Rule structure that combines field patterns (substring-matched against XML element names) with optional value detectors (content-based detection using regex patterns and validation functions). This design enables both explicit field-based redaction (e.g., any field named "password") and implicit content-based detection (e.g., detecting public IP addresses regardless of field name). The engine maintains referential integrity through a deterministic Mapper that ensures the same sensitive value is always replaced with the same redacted placeholder throughout a document.

The sanitizer is designed specifically for OPNsense firewall configurations, inspired by TKCERT's pfFocus tool for pfSense. It processes XML configuration files in a stream-based manner, preserving document structure while applying context-aware redaction to elements, attributes, character data, and comments.

Architecture#

Rule Structure#

A Rule is defined as a struct with seven fields:

Name (string): Unique identifier for the rule (e.g., "password", "public_ip")
Description (string): Human-readable explanation of what the rule redacts
Category (RuleCategory): Organizational category - one of CategoryCredentials, CategoryNetwork, CategoryIdentity, CategoryCrypto, or CategorySystem
Modes ([]Mode): Slice of sanitization modes that activate this rule
FieldPatterns ([]string): List of field name patterns that trigger redaction via substring matching
ValueDetector (func(value string) bool): Optional function to detect sensitive content by analyzing the actual value
Redactor (func(mapper *Mapper, fieldName, value string) string): Function that performs the actual redaction transformation
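
As a rough illustration of the field list above, the struct could look like the sketch below. The field names and function signatures come from the list; the underlying string types for Mode and RuleCategory, the category value, the empty Mapper placeholder, and the passwordRule helper are assumptions made only so the example compiles.

```go
package main

import "fmt"

// Mode and RuleCategory are modeled as strings here (an assumption).
type Mode string
type RuleCategory string

const (
	ModeAggressive Mode = "aggressive"
	ModeModerate   Mode = "moderate"
	ModeMinimal    Mode = "minimal"

	// The string value of the category is hypothetical.
	CategoryCredentials RuleCategory = "credentials"
)

// Mapper is an empty placeholder for the deterministic mapper.
type Mapper struct{}

// Rule mirrors the seven fields listed above.
type Rule struct {
	Name          string
	Description   string
	Category      RuleCategory
	Modes         []Mode
	FieldPatterns []string
	ValueDetector func(value string) bool
	Redactor      func(mapper *Mapper, fieldName, value string) string
}

// passwordRule is a hypothetical helper that builds a rule shaped like
// the documented "password" rule.
func passwordRule() Rule {
	return Rule{
		Name:          "password",
		Description:   "Redacts password fields",
		Category:      CategoryCredentials,
		Modes:         []Mode{ModeAggressive, ModeModerate, ModeMinimal},
		FieldPatterns: []string{"password", "passwd", "pass", "pwd"},
		Redactor: func(_ *Mapper, _, _ string) string {
			return "[REDACTED-PASSWORD]"
		},
	}
}

func main() {
	r := passwordRule()
	fmt.Println(r.Name, "->", r.Redactor(&Mapper{}, "password", "hunter2"))
}
```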

Three-Tier Mode System#

The engine defines three sanitization modes:

ModeAggressive ("aggressive"): Redacts all sensitive data for public sharing, including credentials, cryptographic material, network topology (IPs, subnets, endpoints), identifiers (usernames, emails, hostnames, cloud IDs), and public keys. Active rules: all 20 builtin rules
ModeModerate ("moderate"): Redacts most sensitive data but preserves some network structure. Active rules: 10 rules (excludes certificates, private IPs, hostnames, usernames)
ModeMinimal ("minimal"): Redacts only the most sensitive credentials. Active rules: 7 rules (passwords, secrets, PSKs, SNMP communities, private keys, SSH keys, authserver config)

Mode-based rule activation is checked with slices.Contains against each rule's Modes slice during evaluation.
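
The activation check can be sketched as follows. The three sample rules and their modes are taken from the catalog below; the trimmed-down rule type and the activeRules helper are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"slices"
)

type Mode string

const (
	ModeAggressive Mode = "aggressive"
	ModeModerate   Mode = "moderate"
	ModeMinimal    Mode = "minimal"
)

// rule keeps only the fields needed to show mode activation.
type rule struct {
	Name  string
	Modes []Mode
}

// sampleRules holds three rules with the modes documented for them.
var sampleRules = []rule{
	{Name: "password", Modes: []Mode{ModeAggressive, ModeModerate, ModeMinimal}},
	{Name: "public_ip", Modes: []Mode{ModeAggressive, ModeModerate}},
	{Name: "certificate", Modes: []Mode{ModeAggressive}},
}

// activeRules returns the names of rules whose Modes contain mode.
func activeRules(rules []rule, mode Mode) []string {
	var names []string
	for _, r := range rules {
		if slices.Contains(r.Modes, mode) {
			names = append(names, r.Name)
		}
	}
	return names
}

func main() {
	for _, m := range []Mode{ModeMinimal, ModeModerate, ModeAggressive} {
		fmt.Println(m, activeRules(sampleRules, m))
	}
}
```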

Rule Categories#

Rules are organized into five categories:

CategoryCredentials: Passwords, secrets, PSKs, SNMP communities, OTP seeds
CategoryNetwork: IP addresses (public/private), MAC addresses, hostnames, subnets, endpoints
CategoryIdentity: Usernames, emails, domains, cloud provider IDs
CategoryCrypto: Private keys, certificates, public keys
CategorySystem: SSH authorized keys, system configuration

Complete Rule Catalog#

System Rules (All Modes)#

authserver_config: Pseudonymizes sensitive system/authserver LDAP configuration values. Matches patterns ["system.authserver.name", "system.authserver.host", "ldap_port", "ldap_basedn", "ldap_authcn", "ldap_extended_query", "ldap_attr_user", "ldap_binddn", "ldap_bindpw", "ldap_sync_memberof_groups", "ldap_sync_default_groups"]. Redacts via mapper.MapAuthServerValue(field, value) → field-specific pseudonyms (e.g., authserver-001, ldap-001.example.invalid, 55001, BindPw-001-NotReal!). Classification: Credential for ldap_bindpw, non-credential for other LDAP values.

Credential Rules (All Modes)#

The following credential rules are active in all three modes:

password: Matches patterns ["password", "passwd", "pass", "pwd"], redacts as [REDACTED-PASSWORD]
secret: Matches ["secret", "token", "apikey", "api_key", "api-key", "accesskey", "secretkey", "authkey", "auth_key", "otp_seed", "otpseed"], redacts as [REDACTED-SECRET]
psk: Matches ["psk", "preshared", "pre-shared", "ipsecpsk"], redacts as [REDACTED-PSK]
snmp_community: Matches ["community", "rocommunity", "rwcommunity"], redacts as [REDACTED-SNMP-COMMUNITY]

Crypto Rules#

private_key (all modes): Matches ["privatekey", "private_key", "prv", "privkey", "key", "openvpn.tls", "openvpn-server.tls", "openvpn-client.tls", "openvpn.statickeys", "statickeys", "tls_crypt", "tls_auth"], uses IsPrivateKey detector, redacts as [REDACTED-PRIVATE-KEY]. Note: The pattern "key" requires an exact match to avoid false positives on compound names like "sshkey" or "apikey". OpenVPN path-anchored patterns (openvpn.tls, openvpn-server.tls, openvpn-client.tls, openvpn.statickeys) avoid false-positive collisions with non-OpenVPN <tls> elements in contexts like Suricata IDS and IPsec charon syslogs; see GOTCHAS.md §11.3.
certificate (aggressive only): Matches ["cert", "certificate", "crt"], uses IsCertificate detector, redacts as [REDACTED-CERTIFICATE]

System Rules#

ssh_authorized_keys (all modes): Matches ["authorizedkeys", "authorized_keys", "sshkey", "ssh_key"], redacts as [REDACTED-SSH-KEY]

Network Rules#

public_ip (aggressive + moderate): Uses IsPublicIP detector, redacts via mapper.MapPublicIP(value) → [REDACTED-PUBLIC-IP-N]
private_ip_aggressive (aggressive only): Uses composite detector IsPrivateIP && IsIPv4, redacts via mapper.MapPrivateIP(value, false) → 10.0.0.N
mac_address (aggressive + moderate): Matches ["mac"], uses IsMAC detector, redacts via mapper.MapMAC(value) → XX:XX:XX:XX:XX:NN
hostname (aggressive only): Matches ["hostname", "domain", "althostnames", "hostnames"], uses IsHostname detector with email exclusion, redacts via mapper.MapHostname(value) → host-NNN.example.com. Note: When hostname fields contain email addresses, delegates to email mapping.
endpoint (aggressive only): Matches ["endpoint", "tunneladdress"], redacts as [REDACTED-ENDPOINT]
ip_address_field (aggressive only): Matches ["ipaddr", "ipaddrv6", "from", "to"], uses intelligent detection to only redact when value is actually an IP address, redacts public IPs via mapper.MapPublicIP(value) → [REDACTED-PUBLIC-IP-N] and private IPs via mapper.MapPrivateIP(value, false) → 10.0.0.N
subnet_field (aggressive only): Matches ["subnet", "subnetv6"], uses IsSubnet detector to only redact when value is actually a CIDR subnet, redacts as [REDACTED-SUBNET]

Identity Rules#

email (aggressive + moderate): Matches ["email"], uses IsEmail detector, redacts via mapper.MapEmail(value) → userN@example.com
username (aggressive only): Matches ["username", "user", "login", "uid"], excludes system users (root, admin, nobody, daemon, www, www-data, opnsense, unbound, dhcpd, sshd, ntp, proxy), redacts via mapper.MapUsername(value) → user-NNN
cloud_identifier (aggressive only): Matches ["dns_cf_account_id", "dns_cf_zone_id", "account_id", "zone_id"], redacts as [REDACTED-CLOUD-ID]

Crypto Rules (Aggressive Only)#

public_key (aggressive only): Matches ["pubkey", "pub_key"], uses IsBase64 detector, redacts as [REDACTED-PUBLIC-KEY]

Field Pattern Matching#

FieldPatterns use case-insensitive substring matching, with one exception: the private_key pattern "key" requires an exact (case-insensitive) match. The matching algorithm:

Pre-lowercases field patterns at engine construction time using strings.ToLower
Converts field name to lowercase using strings.ToLower
Performs substring search using strings.Contains - pattern can appear anywhere in the field name
Returns true if match found (empty patterns always match)

Examples:

Pattern "password" matches: password, Password, userPassword, mypassword123 (passwd does not contain "password"; it is caught by the separate "passwd" pattern)
Pattern "token" matches: token, apiToken, auth_token, tokenValue
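
The matching steps above can be sketched as below. The function names lowerPatterns and matchesField are hypothetical, and the assumption that the exact-match rule for "key" compares against the whole field name is a guess at how the exception is applied.

```go
package main

import (
	"fmt"
	"strings"
)

// lowerPatterns lowercases patterns once, mirroring the
// construction-time step described above.
func lowerPatterns(patterns []string) []string {
	out := make([]string, len(patterns))
	for i, p := range patterns {
		out[i] = strings.ToLower(p)
	}
	return out
}

// matchesField reports whether any pre-lowercased pattern matches
// fieldName. "key" is compared exactly; everything else is a
// substring search.
func matchesField(patterns []string, fieldName string) bool {
	name := strings.ToLower(fieldName)
	for _, p := range patterns {
		if p == "key" {
			if name == p {
				return true
			}
			continue
		}
		if strings.Contains(name, p) {
			return true
		}
	}
	return false
}

func main() {
	pw := lowerPatterns([]string{"password", "passwd", "pass", "pwd"})
	fmt.Println(matchesField(pw, "userPassword"))           // true
	fmt.Println(matchesField(pw, "descr"))                  // false
	fmt.Println(matchesField([]string{"key"}, "sshkey"))    // false
	fmt.Println(matchesField([]string{"token"}, "apiToken")) // true
}
```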

Value Detection#

ValueDetectors are optional functions that analyze field values for sensitive content. The patterns.go file provides detector implementations using pre-compiled regex patterns:

IP Address Detectors#

IsIPv4: Regex pattern \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b for dotted decimal notation
IsIPv6: Matches full and compressed IPv6 formats
IsPrivateIP: Checks RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and IPv6 unique local (fc00::/7) using net.ParseIP and net.IPNet.Contains
IsPublicIP: Excludes private, loopback (127.0.0.0/8), and link-local (169.254.0.0/16) ranges
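
A minimal sketch of the private/public split, assuming only what the list above states: net.ParseIP, net.IPNet.Contains, the RFC1918 ranges, and fc00::/7. The lowercase function names are hypothetical stand-ins, and the use of IsLoopback and IsLinkLocalUnicast (which also cover the IPv6 equivalents) is a simplification that may differ from the real detectors, particularly for the IPv6 edge cases noted later.

```go
package main

import (
	"fmt"
	"net"
)

// privateNets holds the ranges named in the IsPrivateIP description.
var privateNets = mustParseCIDRs(
	"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fc00::/7",
)

func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		nets = append(nets, n)
	}
	return nets
}

// isPrivateIP reports whether value parses as an IP inside a private range.
func isPrivateIP(value string) bool {
	ip := net.ParseIP(value)
	if ip == nil {
		return false
	}
	for _, n := range privateNets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// isPublicIP reports whether value is an IP that is neither private,
// loopback, nor link-local.
func isPublicIP(value string) bool {
	ip := net.ParseIP(value)
	if ip == nil {
		return false
	}
	return !isPrivateIP(value) && !ip.IsLoopback() && !ip.IsLinkLocalUnicast()
}

func main() {
	for _, v := range []string{"192.168.1.100", "8.8.8.8", "127.0.0.1", "any"} {
		fmt.Printf("%-15s private=%v public=%v\n", v, isPrivateIP(v), isPublicIP(v))
	}
}
```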

Other Detectors#

IsMAC: Matches ([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}) (colon or hyphen separators)
IsEmail: Standard email pattern with @ and TLD
IsHostname: Requires dot, rejects IPs, matches FQDN pattern ^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
IsPrivateKey: Checks for PEM format with "PRIVATE KEY" label or OpenVPN static-key envelope
IsOpenVPNStaticKey: Detects OpenVPN static-key PEM envelope (-----BEGIN OpenVPN Static key V1-----) used for --tls-auth / --tls-crypt HMAC keys
IsCertificate: Checks for PEM with "CERTIFICATE" or base64 data (minimum 40 characters)
IsSubnet: Validates IPv4 or IPv6 CIDR notation using net.ParseCIDR
IsBase64: Detects base64-encoded content for public key detection
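
Two of these detectors are simple enough to sketch directly. The MAC expression is the one quoted above, with ^ and $ anchors added as an assumption so that only whole values match; the subnet check relies on net.ParseCIDR as described. The lowercase function names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// macPattern is pre-compiled once. The anchors are an assumption;
// the documented expression is shown without them.
var macPattern = regexp.MustCompile(`^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$`)

// isMAC reports whether value looks like a MAC address with colon
// or hyphen separators.
func isMAC(value string) bool {
	return macPattern.MatchString(value)
}

// isSubnet reports whether value is valid IPv4 or IPv6 CIDR notation.
func isSubnet(value string) bool {
	_, _, err := net.ParseCIDR(value)
	return err == nil
}

func main() {
	fmt.Println(isMAC("00:1A:2B:3C:4D:5E"))  // true
	fmt.Println(isMAC("lan"))                // false
	fmt.Println(isSubnet("192.168.1.0/24"))  // true
	fmt.Println(isSubnet("192.168.1.1"))     // false
}
```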

Redaction Logic#

ShouldRedactField#

The RuleEngine.ShouldRedactField method determines if a field should be redacted based on its name:

Iterates through all rules in the engine
Checks if rule is active for current mode
For active rules, checks each FieldPattern against field name
Returns (true, *Rule) on first match, (false, nil) otherwise

ShouldRedactValue#

The RuleEngine.ShouldRedactValue method implements a two-phase detection strategy:

Phase 1: Field-based check

Calls ShouldRedactField(fieldName) first
If field name matches a pattern, returns (true, *Rule) immediately

Phase 2: Value-based check

If Phase 1 fails, iterates through rules
For each active rule with a ValueDetector:
- Executes detector function on value
- Returns (true, *Rule) on first positive detection

This ordering means field-name matching has priority over value detection.
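
The two phases can be sketched as below. This is a simplified stand-in, not the engine's actual code: the Rule type is trimmed to the fields involved, the two demo rules and the newDemoEngine and looksLikePublicIP helpers are hypothetical, and the exact-match handling of "key" is left out.

```go
package main

import (
	"fmt"
	"net"
	"slices"
	"strings"
)

type Mode string

type Rule struct {
	Name          string
	Modes         []Mode
	FieldPatterns []string // assumed to be lowercased already
	ValueDetector func(string) bool
}

type Engine struct {
	mode  Mode
	rules []Rule
}

// ShouldRedactField returns the first active rule whose pattern
// appears in the lowercased field name.
func (e *Engine) ShouldRedactField(fieldName string) (bool, *Rule) {
	name := strings.ToLower(fieldName)
	for i := range e.rules {
		r := &e.rules[i]
		if !slices.Contains(r.Modes, e.mode) {
			continue
		}
		for _, p := range r.FieldPatterns {
			if strings.Contains(name, p) {
				return true, r
			}
		}
	}
	return false, nil
}

// ShouldRedactValue tries the field name first (phase 1) and only
// then runs value detectors (phase 2).
func (e *Engine) ShouldRedactValue(fieldName, value string) (bool, *Rule) {
	if ok, r := e.ShouldRedactField(fieldName); ok {
		return true, r
	}
	for i := range e.rules {
		r := &e.rules[i]
		if r.ValueDetector == nil || !slices.Contains(r.Modes, e.mode) {
			continue
		}
		if r.ValueDetector(value) {
			return true, r
		}
	}
	return false, nil
}

// looksLikePublicIP is a simplified stand-in for IsPublicIP.
func looksLikePublicIP(v string) bool {
	ip := net.ParseIP(v)
	return ip != nil && !ip.IsPrivate() && !ip.IsLoopback() && !ip.IsLinkLocalUnicast()
}

func newDemoEngine(mode Mode) *Engine {
	return &Engine{mode: mode, rules: []Rule{
		{Name: "password", Modes: []Mode{"aggressive", "moderate", "minimal"},
			FieldPatterns: []string{"password", "passwd", "pass", "pwd"}},
		{Name: "public_ip", Modes: []Mode{"aggressive", "moderate"},
			ValueDetector: looksLikePublicIP},
	}}
}

func main() {
	e := newDemoEngine("moderate")
	_, r := e.ShouldRedactValue("password", "8.8.8.8")
	fmt.Println(r.Name) // field name wins: password
	_, r = e.ShouldRedactValue("srcip", "8.8.8.8")
	fmt.Println(r.Name) // value detector: public_ip
}
```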

Redact#

The RuleEngine.Redact method applies the actual transformation:

Receives the matched rule, field name, and value
Calls the rule's custom Redactor function
Passes the engine's Mapper for deterministic value replacement

Deterministic Mapping#

The Mapper component ensures consistent, deterministic redaction throughout a document. Key principle: same input → same output.

Mapping Formats#

Each data type has a specific mapping format:

Public IPs: [REDACTED-PUBLIC-IP-1], [REDACTED-PUBLIC-IP-2], ...
Private IPs (no structure): 10.0.0.1, 10.0.0.2, 10.0.0.3, ...
Hostnames: host-001.example.com, host-002.example.com, ...
Usernames: user-001, user-002, ...
MACs: XX:XX:XX:XX:XX:01, XX:XX:XX:XX:XX:02, ...
Emails: user1@example.com, user2@example.com, ...
AuthServer fields:
- authserver.name: authserver-001, authserver-002, ...
- authserver.host: ldap-001.example.invalid, ldap-002.example.invalid, ...
- authserver.ldap_port: 55001, 55002, ...
- authserver.ldap_bindpw: BindPw-001-NotReal!, BindPw-002-NotReal!, ...
- Other authserver LDAP fields (ldap_basedn, ldap_binddn, ldap_authcn, ldap_extended_query, ldap_attr_user, ldap_sync_memberof_groups, ldap_sync_default_groups): field-specific sequential pseudonyms

Consistency Guarantee#

The mapper maintains hash tables for each data type. When the same value appears multiple times:

mapper.MapPublicIP("8.8.8.8") → "[REDACTED-PUBLIC-IP-1]"
mapper.MapPublicIP("8.8.8.8") → "[REDACTED-PUBLIC-IP-1]" // Same!
mapper.MapPublicIP("1.1.1.1") → "[REDACTED-PUBLIC-IP-2]" // Different

Private IP Structure Preservation#

Private IPs support two modes:

Without structure (preserveStructure=false):
- 192.168.1.100 → 10.0.0.1
- 172.16.5.20 → 10.0.0.2
With structure (preserveStructure=true):
- 192.168.1.100 → 192.168.X.1 (preserves first two octets)
- 172.16.5.20 → 172.16.X.2
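
A sketch that reproduces the two sets of outputs listed above, assuming a single shared counter and a lookup table keyed by the original address. The privateIPMapper type is hypothetical, and behavior once the counter exceeds 255 or for IPv6 input is not covered by the source and is not handled here.

```go
package main

import (
	"fmt"
	"strings"
)

// privateIPMapper is a hypothetical, single-goroutine stand-in for
// the private-IP part of the Mapper.
type privateIPMapper struct {
	seen map[string]string
	next int
}

func newPrivateIPMapper() *privateIPMapper {
	return &privateIPMapper{seen: make(map[string]string)}
}

// MapPrivateIP returns a stable replacement for value. With
// preserveStructure, the first two octets are kept and the third
// becomes a literal "X".
func (m *privateIPMapper) MapPrivateIP(value string, preserveStructure bool) string {
	if out, ok := m.seen[value]; ok {
		return out
	}
	m.next++
	out := fmt.Sprintf("10.0.0.%d", m.next)
	if preserveStructure {
		if parts := strings.Split(value, "."); len(parts) == 4 {
			out = fmt.Sprintf("%s.%s.X.%d", parts[0], parts[1], m.next)
		}
	}
	m.seen[value] = out
	return out
}

func main() {
	flat := newPrivateIPMapper()
	fmt.Println(flat.MapPrivateIP("192.168.1.100", false)) // 10.0.0.1
	fmt.Println(flat.MapPrivateIP("172.16.5.20", false))   // 10.0.0.2

	structured := newPrivateIPMapper()
	fmt.Println(structured.MapPrivateIP("192.168.1.100", true)) // 192.168.X.1
	fmt.Println(structured.MapPrivateIP("172.16.5.20", true))   // 172.16.X.2
}
```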

Thread Safety#

The Mapper uses sync.RWMutex for thread-safe concurrent access to mapping tables.
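
One common way to combine a lookup table with sync.RWMutex is a read-locked fast path followed by a write-locked re-check, as sketched below. Only the use of sync.RWMutex and the placeholder format come from this page; the locking pattern, the field names, and deriving the counter from the map size are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// Mapper here covers public IPs only; the real Mapper tracks more types.
type Mapper struct {
	mu        sync.RWMutex
	publicIPs map[string]string
}

func NewMapper() *Mapper {
	return &Mapper{publicIPs: make(map[string]string)}
}

// MapPublicIP returns the same placeholder for the same input, even
// when called from several goroutines at once.
func (m *Mapper) MapPublicIP(value string) string {
	// Fast path: most lookups hit an existing mapping.
	m.mu.RLock()
	out, ok := m.publicIPs[value]
	m.mu.RUnlock()
	if ok {
		return out
	}

	m.mu.Lock()
	defer m.mu.Unlock()
	// Re-check: another goroutine may have added the value between
	// releasing the read lock and taking the write lock.
	if existing, ok := m.publicIPs[value]; ok {
		return existing
	}
	out = fmt.Sprintf("[REDACTED-PUBLIC-IP-%d]", len(m.publicIPs)+1)
	m.publicIPs[value] = out
	return out
}

func main() {
	m := NewMapper()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.MapPublicIP("8.8.8.8")
		}()
	}
	wg.Wait()
	fmt.Println(m.MapPublicIP("8.8.8.8")) // [REDACTED-PUBLIC-IP-1]
	fmt.Println(m.MapPublicIP("1.1.1.1")) // [REDACTED-PUBLIC-IP-2]
}
```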

Sanitizer Application Flow#

The Sanitizer struct orchestrates redaction:

sanitizeValue calls engine.ShouldRedactValue
If redaction needed, calls engine.Redact
Tracks statistics (TotalFields, RedactedFields, SkippedFields, RedactionsByType)

XML Processing#

For XML sanitization, the sanitizer processes tokens as a stream:

Builds full element paths for context
Checks both paths and individual names
Sanitizes element content, attributes, and comments
Preserves XML structure
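
The token loop can be sketched with encoding/xml as below. This is a reduced illustration rather than the SanitizeXML implementation: it only rewrites character data, uses a fixed "[REDACTED]" placeholder, and joins path segments with dots (an assumption suggested by patterns such as "openvpn.tls"). The sanitizeXML and demoShouldRedact names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// sanitizeXML copies tokens from input to output, replacing character
// data when shouldRedact accepts the element path or name.
func sanitizeXML(input []byte, shouldRedact func(path, name string) bool) ([]byte, error) {
	dec := xml.NewDecoder(bytes.NewReader(input))
	var out bytes.Buffer
	enc := xml.NewEncoder(&out)
	var path []string // stack of open element names

	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			path = append(path, t.Name.Local)
		case xml.EndElement:
			if len(path) > 0 {
				path = path[:len(path)-1]
			}
		case xml.CharData:
			// Skip whitespace-only text between elements.
			if len(path) > 0 && len(bytes.TrimSpace(t)) > 0 {
				name := path[len(path)-1]
				if shouldRedact(strings.Join(path, "."), name) {
					tok = xml.CharData("[REDACTED]")
				}
			}
		}
		if err := enc.EncodeToken(tok); err != nil {
			return nil, err
		}
	}
	if err := enc.Flush(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

// demoShouldRedact flags any element whose name contains "password".
func demoShouldRedact(_ string, name string) bool {
	return strings.Contains(strings.ToLower(name), "password")
}

func main() {
	in := []byte(`<config><password>secret123</password><descr>lab</descr></config>`)
	out, err := sanitizeXML(in, demoShouldRedact)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```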

Struct Processing#

For struct sanitization, the sanitizer uses reflection:

Traverses fields recursively
Handles nested structs and slices
Applies rules to string fields
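
A reflection walk of this kind can be sketched as below. Unlike the SanitizeStruct call shown in the usage examples, this simplified version mutates the struct in place through a pointer; the sanitizeStruct, walk, and redactPasswords names and the Config and User types are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// sanitizeStruct walks the struct behind ptr and rewrites every
// settable string field with fn(fieldName, value).
func sanitizeStruct(ptr any, fn func(field, value string) string) {
	walk(reflect.ValueOf(ptr), "", fn)
}

func walk(v reflect.Value, name string, fn func(field, value string) string) {
	switch v.Kind() {
	case reflect.Pointer:
		if !v.IsNil() {
			walk(v.Elem(), name, fn)
		}
	case reflect.Struct:
		for i := 0; i < v.NumField(); i++ {
			// Unexported fields cannot be set via reflection.
			if f := v.Type().Field(i); f.IsExported() {
				walk(v.Field(i), f.Name, fn)
			}
		}
	case reflect.Slice:
		// Elements inherit the name of the slice field.
		for i := 0; i < v.Len(); i++ {
			walk(v.Index(i), name, fn)
		}
	case reflect.String:
		if v.CanSet() {
			v.SetString(fn(name, v.String()))
		}
	}
}

type User struct {
	Name     string
	Password string
}

type Config struct {
	AdminPassword string
	Users         []User
	Port          int
}

// redactPasswords is a toy rule: redact by field name only.
func redactPasswords(field, value string) string {
	if strings.Contains(strings.ToLower(field), "password") {
		return "[REDACTED-PASSWORD]"
	}
	return value
}

func main() {
	cfg := Config{
		AdminPassword: "secret123",
		Users:         []User{{Name: "alice", Password: "hunter2"}},
		Port:          443,
	}
	sanitizeStruct(&cfg, redactPasswords)
	fmt.Printf("%+v\n", cfg)
}
```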

Critical Implementation Gotchas#

1. Global Field Pattern Scanning and Exact Matching#

Problem: ShouldRedactField scans ALL rules' FieldPatterns globally. Adding a FieldPattern to any rule can break "should not match" test assertions for other rules. Additionally, the pattern "key" uses exact matching (case-insensitive) to prevent false positives on compound names like "sshkey", "apikey", "authkey".

Impact: When adding new field patterns, review all existing test cases that assert ShouldRedactField returns false. Be aware that most patterns use substring matching, but "key" requires an exact match.

2. Field-Name Priority in ShouldRedactValue#

Problem: ShouldRedactValue checks field-name rules first (via ShouldRedactField) and only then runs value-detector rules.

Impact: If a field matches by name, its value detector is never consulted. For rules that define both FieldPatterns and a ValueDetector, a field-name match triggers redaction immediately; the ValueDetector is used only on the value-only matching path.

3. Rule Ordering Contract (See GOTCHAS.md §19.1)#

Problem: ShouldRedactField returns on the first matching rule. Rule order determines precedence.

Critical Contracts:

authserver_config MUST precede password: Both match "ldap_bindpw" (authserver_config via exact field pattern, password via "pass" substring). authserver_config pseudonymizes; password flat-redacts to [REDACTED-PASSWORD]. If reordered, LDAP bind passwords silently switch from pseudonymized to flat-redacted with no error or warning.
email MUST precede hostname: Email addresses contain dots that match hostname patterns. The ordering ensures emails are mapped via MapEmail, not MapHostname.

Impact: Reordering rules in builtinRules() can silently change redaction behavior. See GOTCHAS.md §19.1 for full documentation.
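
Because a reorder fails silently, one option is a guard that pins the two contracts above. The sketch below checks positions in a list of rule names; the mustPrecede helper and the abbreviated rule order are hypothetical and do not reflect how the project's own tests are written.

```go
package main

import (
	"fmt"
	"slices"
)

// mustPrecede returns an error unless first appears before second
// in order.
func mustPrecede(order []string, first, second string) error {
	i, j := slices.Index(order, first), slices.Index(order, second)
	switch {
	case i < 0 || j < 0:
		return fmt.Errorf("rule %q or %q not found", first, second)
	case i > j:
		return fmt.Errorf("rule %q must precede %q", first, second)
	}
	return nil
}

func main() {
	// Abbreviated, illustrative ordering of rule names.
	order := []string{"authserver_config", "password", "email", "hostname"}

	contracts := [][2]string{
		{"authserver_config", "password"},
		{"email", "hostname"},
	}
	for _, c := range contracts {
		if err := mustPrecede(order, c[0], c[1]); err != nil {
			fmt.Println("ordering violated:", err)
			return
		}
	}
	fmt.Println("ordering contracts hold")
}
```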

4. System User Exemption#

System users are explicitly exempted in the username rule:

Exempted: root, admin, nobody, daemon, www, www-data, opnsense, unbound, dhcpd, sshd, ntp, proxy
These are never redacted even in aggressive mode
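
A sketch of a guarded username redactor built from the exemption list and the user-NNN format documented on this page. The usernameMapper type is hypothetical, and treating the exemption as a case-sensitive exact comparison is an assumption.

```go
package main

import "fmt"

// systemUsers is the exemption list from the username rule.
var systemUsers = map[string]bool{
	"root": true, "admin": true, "nobody": true, "daemon": true,
	"www": true, "www-data": true, "opnsense": true, "unbound": true,
	"dhcpd": true, "sshd": true, "ntp": true, "proxy": true,
}

// usernameMapper is a hypothetical stand-in for the username part
// of the Mapper.
type usernameMapper struct {
	seen map[string]string
}

func newUsernameMapper() *usernameMapper {
	return &usernameMapper{seen: make(map[string]string)}
}

// Map returns system users unchanged and a stable user-NNN
// pseudonym for everything else.
func (m *usernameMapper) Map(value string) string {
	if systemUsers[value] {
		return value
	}
	if out, ok := m.seen[value]; ok {
		return out
	}
	out := fmt.Sprintf("user-%03d", len(m.seen)+1)
	m.seen[value] = out
	return out
}

func main() {
	m := newUsernameMapper()
	for _, u := range []string{"root", "alice", "bob", "alice"} {
		fmt.Printf("%s -> %s\n", u, m.Map(u))
	}
}
```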

5. IPv6 Detection Limitations#

IPv6 detection has known edge cases:

Loopback ::1 is NOT detected
IPv4-mapped ::ffff:192.168.1.1 is NOT detected
All-zeros :: is NOT detected
Compressed forms like 2001:db8::1 ARE detected

6. Statistics Overcounting Prevention#

Problem: Guarded Redactors may return the original value unchanged (e.g., IsIP check to prevent false positives on "any", "lan").

Solution: Always compare before/after Redact() output and only increment RedactedFields if the value actually changed.
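
The before/after comparison can be sketched as below. The Stats field names match those listed under Sanitizer Application Flow; the applyRedaction and guardedIPRedactor helpers, and the choice to count an unchanged value as skipped, are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net"
)

type Stats struct {
	TotalFields      int
	RedactedFields   int
	SkippedFields    int
	RedactionsByType map[string]int
}

// guardedIPRedactor only rewrites values that parse as IP addresses,
// so keywords such as "any" or "lan" pass through unchanged.
func guardedIPRedactor(value string) string {
	if net.ParseIP(value) == nil {
		return value
	}
	return "[REDACTED-IP]"
}

// applyRedaction runs redact and counts a redaction only when the
// output differs from the input.
func applyRedaction(stats *Stats, ruleName, value string, redact func(string) string) string {
	stats.TotalFields++
	out := redact(value)
	if out == value {
		stats.SkippedFields++
		return value
	}
	stats.RedactedFields++
	stats.RedactionsByType[ruleName]++
	return out
}

func main() {
	stats := &Stats{RedactionsByType: make(map[string]int)}
	for _, v := range []string{"any", "8.8.8.8", "lan"} {
		fmt.Println(applyRedaction(stats, "ip_address_field", v, guardedIPRedactor))
	}
	fmt.Printf("redacted %d of %d fields\n", stats.RedactedFields, stats.TotalFields)
}
```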

7. Deterministic Mapping Behavior#

Each call to NewRuleEngine creates its own Mapper via NewMapper(), so counters start from 1 for every engine:

Mappings are deterministic within a session
First private IP → 10.0.0.1, second → 10.0.0.2
First hostname → host-001.example.com, second → host-002.example.com
Tests should assert exact expected values

Usage Examples#

Basic Sanitization#

import (
    "fmt"
    "log"

    "github.com/EvilBit-Labs/opnDossier/internal/sanitizer"
)

// Create sanitizer with aggressive mode
s := sanitizer.New(sanitizer.ModeAggressive)

// Sanitize XML configuration
input := []byte(`<config><password>secret123</password></config>`)
output, stats, err := s.SanitizeXML(input)
if err != nil {
    log.Fatalf("sanitize XML: %v", err)
}
// Output: <config><password>[REDACTED-PASSWORD]</password></config>
fmt.Printf("%s\n", output)

// Check statistics
fmt.Printf("Redacted %d of %d fields\n", stats.RedactedFields, stats.TotalFields)

Mode Comparison#

// Minimal mode - only credentials
minimal := sanitizer.New(sanitizer.ModeMinimal)
minOutput, _, _ := minimal.SanitizeXML(xmlData)

// Moderate mode - credentials + public network data
moderate := sanitizer.New(sanitizer.ModeModerate)
modOutput, _, _ := moderate.SanitizeXML(xmlData)

// Aggressive mode - everything
aggressive := sanitizer.New(sanitizer.ModeAggressive)
aggOutput, _, _ := aggressive.SanitizeXML(xmlData)

Custom Rule Engine#

// Create engine with custom mode
engine := sanitizer.NewRuleEngine(sanitizer.ModeModerate)

// Check if field should be redacted
shouldRedact, rule := engine.ShouldRedactField("password")
if shouldRedact {
    fmt.Printf("Field matched rule: %s\n", rule.Name)
}

// Check if value should be redacted
shouldRedact, rule = engine.ShouldRedactValue("srcip", "8.8.8.8")
if shouldRedact {
    redacted := engine.Redact(rule, "srcip", "8.8.8.8")
    fmt.Printf("Redacted to: %s\n", redacted)
}

Struct Sanitization#

type FirewallConfig struct {
    AdminPassword string
    AdminEmail string
    PublicIP string
}

config := FirewallConfig{
    AdminPassword: "secret123",
    AdminEmail: "admin@company.com",
    PublicIP: "203.0.113.1",
}

s := sanitizer.New(sanitizer.ModeAggressive)
sanitized, err := s.SanitizeStruct(config)
if err != nil {
    log.Fatalf("sanitize struct: %v", err)
}
// Result (fields of sanitized):
// AdminPassword: "[REDACTED-PASSWORD]"
// AdminEmail: "user1@example.com"
// PublicIP: "[REDACTED-PUBLIC-IP-1]"

Relevant Code Files#

File	Purpose	Key Components
internal/sanitizer/rules.go	Rule definitions and engine	Rule struct, RuleEngine, ShouldRedactField, ShouldRedactValue, builtinRules
internal/sanitizer/sanitizer.go	Main sanitization logic	Sanitizer struct, SanitizeXML, SanitizeStruct, sanitizeValue
internal/sanitizer/patterns.go	Value detection patterns	IsIPv4, IsIPv6, IsPrivateIP, IsPublicIP, IsMAC, IsEmail, IsHostname, IsPrivateKey, IsCertificate
internal/sanitizer/mapper.go	Deterministic value replacement	Mapper struct, MapPublicIP, MapPrivateIP, MapHostname, MapUsername, MapEmail, MapMAC, MapAuthServerValue
internal/sanitizer/rules_test.go	Rule engine tests	Field matching tests, value detection tests, mode behavior tests
internal/sanitizer/patterns_test.go	Pattern detection tests	IP detection tests, MAC tests, email tests, hostname tests, crypto material tests
internal/sanitizer/mapper_test.go	Mapper consistency tests	Deterministic mapping tests, structure preservation tests
internal/sanitizer/sanitizer_test.go	Integration tests	XML sanitization tests, struct sanitization tests, mode comparison tests

XML Processing: The sanitizer uses Go's encoding/xml package for stream-based token processing
Regular Expressions: Pattern detection relies on pre-compiled regex patterns for performance
Reflection: Struct sanitization uses reflect package to traverse and modify struct fields
Thread Safety: Mapper uses sync.RWMutex for concurrent-safe access
OPNsense Configuration: The sanitizer is designed specifically for OPNsense XML configuration format
Privacy Engineering: Tiered redaction modes trade privacy against operational utility