Sanitizer Rule Engine#
The Sanitizer Rule Engine is a core component of opnDossier that provides intelligent redaction of sensitive data from OPNsense configuration files. The engine implements a three-tier mode system (aggressive, moderate, and minimal) to balance privacy requirements with operational utility. At its core, the engine uses a dual-phase detection strategy: field-name pattern matching followed by value-content analysis, ensuring comprehensive coverage of sensitive information including passwords, API keys, IP addresses, MAC addresses, cryptographic material, and personally identifiable information.
The architecture is built around a Rule structure that combines field patterns (substring-matched against XML element names) with optional value detectors (content-based detection using regex patterns and validation functions). This design enables both explicit field-based redaction (e.g., any field named "password") and implicit content-based detection (e.g., detecting public IP addresses regardless of field name). The engine maintains referential integrity through a deterministic Mapper that ensures the same sensitive value is always replaced with the same redacted placeholder throughout a document.
The sanitizer is designed specifically for OPNsense firewall configurations, inspired by TKCERT's pfFocus tool for pfSense. It processes XML configuration files in a stream-based manner, preserving document structure while applying context-aware redaction to elements, attributes, character data, and comments.
Architecture#
Rule Structure#
A Rule is defined as a struct with seven fields:
- Name (string): Unique identifier for the rule (e.g., "password", "public_ip")
- Description (string): Human-readable explanation of what the rule redacts
- Category (RuleCategory): Organizational category, one of `CategoryCredentials`, `CategoryNetwork`, `CategoryIdentity`, `CategoryCrypto`, or `CategorySystem`
- Modes ([]Mode): Array of sanitization modes that activate this rule
- FieldPatterns ([]string): List of field name patterns that trigger redaction via substring matching
- ValueDetector (func(value string) bool): Optional function to detect sensitive content by analyzing the actual value
- Redactor (func(mapper *Mapper, fieldName, value string) string): Function that performs the actual redaction transformation
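The field list above could be declared roughly as follows. This is a sketch reconstructed from the list, not a copy of rules.go: the string-based underlying types of `Mode` and `RuleCategory`, the category's string value, the empty `Mapper` stub, and the `passwordRule` helper are all assumptions made for illustration.

```go
package main

import "fmt"

// Mode and RuleCategory are assumed to be string-based types.
type Mode string
type RuleCategory string

const (
	ModeAggressive Mode = "aggressive"
	ModeModerate   Mode = "moderate"
	ModeMinimal    Mode = "minimal"
)

// The string value here is an assumption for illustration.
const CategoryCredentials RuleCategory = "credentials"

// Mapper is an empty stand-in for the real deterministic mapper.
type Mapper struct{}

// Rule mirrors the seven documented fields.
type Rule struct {
	Name          string
	Description   string
	Category      RuleCategory
	Modes         []Mode
	FieldPatterns []string
	ValueDetector func(value string) bool // optional, may be nil
	Redactor      func(mapper *Mapper, fieldName, value string) string
}

// passwordRule (hypothetical helper) builds a rule resembling the
// documented "password" rule.
func passwordRule() Rule {
	return Rule{
		Name:          "password",
		Description:   "Redacts password fields",
		Category:      CategoryCredentials,
		Modes:         []Mode{ModeAggressive, ModeModerate, ModeMinimal},
		FieldPatterns: []string{"password", "passwd", "pass", "pwd"},
		Redactor: func(_ *Mapper, _, _ string) string {
			return "[REDACTED-PASSWORD]"
		},
	}
}

func main() {
	r := passwordRule()
	fmt.Println(r.Name, "->", r.Redactor(&Mapper{}, "password", "hunter2"))
}
```

Leaving `ValueDetector` nil expresses a rule that is triggered by field name only.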
Three-Tier Mode System#
The engine defines three sanitization modes:
- ModeAggressive ("aggressive"): Redacts all sensitive data for public sharing, including credentials, cryptographic material, network topology (IPs, subnets, endpoints), identifiers (usernames, emails, hostnames, cloud IDs), and public keys. Active rules: all 20 builtin rules
- ModeModerate ("moderate"): Redacts most sensitive data but preserves some network structure. Active rules: 10 rules (excludes certificates, private IPs, hostnames, usernames)
- ModeMinimal ("minimal"): Redacts only the most sensitive credentials. Active rules: 7 rules (passwords, secrets, PSKs, SNMP communities, private keys, SSH keys, authserver config)
Mode-based rule activation is checked using slices.Contains against each rule's Modes array during evaluation.
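A minimal sketch of that activation check, assuming `Mode` is a string type; the `isActive` helper name is hypothetical. It requires Go 1.21 or later for the standard `slices` package.

```go
package main

import (
	"fmt"
	"slices"
)

type Mode string

const (
	ModeAggressive Mode = "aggressive"
	ModeModerate   Mode = "moderate"
	ModeMinimal    Mode = "minimal"
)

// isActive reports whether a rule with the given Modes applies in the
// current mode.
func isActive(ruleModes []Mode, current Mode) bool {
	return slices.Contains(ruleModes, current)
}

func main() {
	// Per the catalog, public_ip is active in aggressive and moderate modes.
	publicIPModes := []Mode{ModeAggressive, ModeModerate}
	for _, m := range []Mode{ModeAggressive, ModeModerate, ModeMinimal} {
		fmt.Printf("public_ip active in %s: %v\n", m, isActive(publicIPModes, m))
	}
}
```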
Rule Categories#
Rules are organized into five categories:
- CategoryCredentials: Passwords, secrets, PSKs, SNMP communities, OTP seeds
- CategoryNetwork: IP addresses (public/private), MAC addresses, hostnames, subnets, endpoints
- CategoryIdentity: Usernames, emails, domains, cloud provider IDs
- CategoryCrypto: Private keys, certificates, public keys
- CategorySystem: SSH authorized keys, system configuration
Complete Rule Catalog#
System Rules (All Modes)#
- authserver_config: Pseudonymizes sensitive `system/authserver` LDAP configuration values. Matches patterns ["system.authserver.name", "system.authserver.host", "ldap_port", "ldap_basedn", "ldap_authcn", "ldap_extended_query", "ldap_attr_user", "ldap_binddn", "ldap_bindpw", "ldap_sync_memberof_groups", "ldap_sync_default_groups"]. Redacts via `mapper.MapAuthServerValue(field, value)` → field-specific pseudonyms (e.g., `authserver-001`, `ldap-001.example.invalid`, `55001`, `BindPw-001-NotReal!`). Classification: Credential for `ldap_bindpw`, non-credential for other LDAP values.
Credential Rules (All Modes)#
The following credential rules are active in all three modes:
- password: Matches patterns ["password", "passwd", "pass", "pwd"], redacts as `[REDACTED-PASSWORD]`
- secret: Matches ["secret", "token", "apikey", "api_key", "api-key", "accesskey", "secretkey", "authkey", "auth_key", "otp_seed", "otpseed"], redacts as `[REDACTED-SECRET]`
- psk: Matches ["psk", "preshared", "pre-shared", "ipsecpsk"], redacts as `[REDACTED-PSK]`
- snmp_community: Matches ["community", "rocommunity", "rwcommunity"], redacts as `[REDACTED-SNMP-COMMUNITY]`
Crypto Rules#
- private_key (all modes): Matches ["privatekey", "private_key", "prv", "privkey", "key", "openvpn.tls", "openvpn-server.tls", "openvpn-client.tls", "openvpn.statickeys", "statickeys", "tls_crypt", "tls_auth"], uses IsPrivateKey detector, redacts as `[REDACTED-PRIVATE-KEY]`. Note: The pattern "key" requires an exact match to avoid false positives on compound names like "sshkey" or "apikey". OpenVPN path-anchored patterns (openvpn.tls, openvpn-server.tls, openvpn-client.tls, openvpn.statickeys) avoid false-positive collisions with non-OpenVPN `<tls>` elements in contexts like Suricata IDS and IPsec charon syslogs; see GOTCHAS.md §11.3.
- certificate (aggressive only): Matches ["cert", "certificate", "crt"], uses IsCertificate detector, redacts as `[REDACTED-CERTIFICATE]`
System Rules#
- ssh_authorized_keys (all modes): Matches ["authorizedkeys", "authorized_keys", "sshkey", "ssh_key"], redacts as `[REDACTED-SSH-KEY]`
Network Rules#
- public_ip (aggressive + moderate): Uses IsPublicIP detector, redacts via `mapper.MapPublicIP(value)` → `[REDACTED-PUBLIC-IP-N]`
- private_ip_aggressive (aggressive only): Uses composite detector IsPrivateIP && IsIPv4, redacts via `mapper.MapPrivateIP(value, false)` → `10.0.0.N`
- mac_address (aggressive + moderate): Matches ["mac"], uses IsMAC detector, redacts via `mapper.MapMAC(value)` → `XX:XX:XX:XX:XX:NN`
- hostname (aggressive only): Matches ["hostname", "domain", "althostnames", "hostnames"], uses IsHostname detector with email exclusion, redacts via `mapper.MapHostname(value)` → `host-NNN.example.com`. Note: When hostname fields contain email addresses, delegates to email mapping.
- endpoint (aggressive only): Matches ["endpoint", "tunneladdress"], redacts as `[REDACTED-ENDPOINT]`
- ip_address_field (aggressive only): Matches ["ipaddr", "ipaddrv6", "from", "to"], uses intelligent detection to only redact when value is actually an IP address, redacts public IPs via `mapper.MapPublicIP(value)` → `[REDACTED-PUBLIC-IP-N]` and private IPs via `mapper.MapPrivateIP(value, false)` → `10.0.0.N`
- subnet_field (aggressive only): Matches ["subnet", "subnetv6"], uses IsSubnet detector to only redact when value is actually a CIDR subnet, redacts as `[REDACTED-SUBNET]`
Identity Rules#
- email (aggressive + moderate): Matches ["email"], uses IsEmail detector, redacts via `mapper.MapEmail(value)` → `userN@example.com`
- username (aggressive only): Matches ["username", "user", "login", "uid"], excludes system users (root, admin, nobody, daemon, www, www-data, opnsense, unbound, dhcpd, sshd, ntp, proxy), redacts via `mapper.MapUsername(value)` → `user-NNN`
- cloud_identifier (aggressive only): Matches ["dns_cf_account_id", "dns_cf_zone_id", "account_id", "zone_id"], redacts as `[REDACTED-CLOUD-ID]`
Crypto Rules (Aggressive Only)#
- public_key (aggressive only): Matches ["pubkey", "pub_key"], uses IsBase64 detector, redacts as `[REDACTED-PUBLIC-KEY]`
Field Pattern Matching#
FieldPatterns use case-insensitive substring matching. The matching algorithm:
- Pre-lowercases field patterns at engine construction time using `strings.ToLower`
- Converts field name to lowercase using `strings.ToLower`
- Performs substring search using `strings.Contains`; the pattern can appear anywhere in the field name
- Returns true if match found (empty patterns always match)
Examples:
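- Pattern "password" matches: `password`, `Password`, `userPassword`, `mypassword123` (`passwd` does not contain "password"; it is caught by the separate "passwd" pattern)
- Pattern "token" matches: `token`, `apiToken`, `auth_token`, `tokenValue`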
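The matching steps above can be sketched as follows. The helper names `lowerPatterns` and `matchesField` are hypothetical, and the sketch covers only the general substring path; it does not model the exact-match special case for the "key" pattern.

```go
package main

import (
	"fmt"
	"strings"
)

// lowerPatterns lowercases patterns once, mirroring the documented
// pre-lowercasing at engine construction time.
func lowerPatterns(patterns []string) []string {
	out := make([]string, len(patterns))
	for i, p := range patterns {
		out[i] = strings.ToLower(p)
	}
	return out
}

// matchesField reports whether any pre-lowercased pattern occurs anywhere
// in fieldName. An empty pattern always matches because
// strings.Contains(s, "") is true for every s.
func matchesField(fieldName string, lowered []string) bool {
	name := strings.ToLower(fieldName)
	for _, p := range lowered {
		if strings.Contains(name, p) {
			return true
		}
	}
	return false
}

func main() {
	pats := lowerPatterns([]string{"password"})
	for _, f := range []string{"userPassword", "mypassword123", "passwd", "hostname"} {
		fmt.Printf("%-14s -> %v\n", f, matchesField(f, pats))
	}
}
```

With only the "password" pattern loaded, `passwd` is not matched, which is why the password rule lists "passwd" as its own pattern.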
Value Detection#
ValueDetectors are optional functions that analyze field values for sensitive content. The patterns.go file provides detector implementations using pre-compiled regex patterns:
IP Address Detectors#
- IsIPv4: Regex pattern `\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b` for dotted decimal notation
- IsIPv6: Matches full and compressed IPv6 formats
- IsPrivateIP: Checks RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and IPv6 unique local (fc00::/7) using net.ParseIP and net.IPNet.Contains
- IsPublicIP: Excludes private, loopback (127.0.0.0/8), and link-local (169.254.0.0/16) ranges
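As an illustration of the range checks described for IsPrivateIP and IsPublicIP, here is a sketch built on `net.ParseIP` and `net.IPNet.Contains`. It covers only the ranges listed above; the lowercase function names and the way the range tables are organized are assumptions, and the real detectors in patterns.go may treat edge cases (notably IPv6) differently.

```go
package main

import (
	"fmt"
	"net"
)

// Ranges taken from the list above.
var (
	privateNets   = mustParseCIDRs("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fc00::/7")
	nonPublicNets = mustParseCIDRs("127.0.0.0/8", "169.254.0.0/16")
)

func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err) // static input; a failure here is a programming error
		}
		nets = append(nets, n)
	}
	return nets
}

func inAny(ip net.IP, nets []*net.IPNet) bool {
	for _, n := range nets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// isPrivateIP reports whether s parses as an IP inside a private range.
func isPrivateIP(s string) bool {
	ip := net.ParseIP(s)
	return ip != nil && inAny(ip, privateNets)
}

// isPublicIP reports whether s parses as an IP outside the private,
// loopback, and link-local ranges listed above.
func isPublicIP(s string) bool {
	ip := net.ParseIP(s)
	if ip == nil {
		return false
	}
	return !inAny(ip, privateNets) && !inAny(ip, nonPublicNets)
}

func main() {
	for _, s := range []string{"8.8.8.8", "192.168.1.100", "127.0.0.1", "fd00::1", "lan"} {
		fmt.Printf("%-14s private=%-5v public=%v\n", s, isPrivateIP(s), isPublicIP(s))
	}
}
```

Non-IP strings such as interface aliases fail `net.ParseIP` and are therefore classified as neither private nor public.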
Other Detectors#
- IsMAC: Matches `([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})` (colon or hyphen separators)
- IsEmail: Standard email pattern with @ and TLD
- IsHostname: Requires dot, rejects IPs, matches FQDN pattern `^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$`
- IsPrivateKey: Checks for PEM format with "PRIVATE KEY" label or OpenVPN static-key envelope
- IsOpenVPNStaticKey: Detects OpenVPN static-key PEM envelope (`-----BEGIN OpenVPN Static key V1-----`) used for --tls-auth / --tls-crypt HMAC keys
- IsCertificate: Checks for PEM with "CERTIFICATE" or base64 data (minimum 40 characters)
- IsSubnet: Validates IPv4 or IPv6 CIDR notation using `net.ParseCIDR`
- IsBase64: Detects base64-encoded content for public key detection
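Two of these detectors are simple enough to sketch directly. The MAC regex is the one quoted above, but anchoring it with `^` and `$` is an assumption of this sketch (the documented pattern is shown unanchored); the lowercase function names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// macRe is compiled once at package initialization. The ^...$ anchors are
// an assumption added here so that the whole value must be a MAC address.
var macRe = regexp.MustCompile(`^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$`)

// isMAC reports whether s looks like a colon- or hyphen-separated MAC.
func isMAC(s string) bool {
	return macRe.MatchString(s)
}

// isSubnet reports whether s is valid IPv4 or IPv6 CIDR notation.
func isSubnet(s string) bool {
	_, _, err := net.ParseCIDR(s)
	return err == nil
}

func main() {
	fmt.Println(isMAC("00:1A:2B:3C:4D:5E"), isMAC("not-a-mac"))
	fmt.Println(isSubnet("192.168.1.0/24"), isSubnet("192.168.1.1"))
}
```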
Redaction Logic#
ShouldRedactField#
The RuleEngine.ShouldRedactField method determines if a field should be redacted based on its name:
- Iterates through all rules in the engine
- Checks if rule is active for current mode
- For active rules, checks each FieldPattern against field name
- Returns `(true, *Rule)` on first match, `(false, nil)` otherwise
ShouldRedactValue#
The RuleEngine.ShouldRedactValue method implements a two-phase detection strategy:
Phase 1: Field-based check
- Calls `ShouldRedactField(fieldName)` first
- If field name matches a pattern, returns `(true, *Rule)` immediately
Phase 2: Value-based check
- If Phase 1 fails, iterates through rules
- For each active rule with a ValueDetector:
  - Executes detector function on value
  - Returns `(true, *Rule)` on first positive detection
This ordering means field-name matching has priority over value detection.
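The two phases can be sketched with a stripped-down engine. Everything here is illustrative: the `Engine` type, the reduced `Rule` struct, the `any_ip` rule, and the lowercase method names are hypothetical stand-ins, not the actual RuleEngine API. It requires Go 1.21 or later for `slices`.

```go
package main

import (
	"fmt"
	"net"
	"slices"
	"strings"
)

type Mode string

// Rule is reduced to the fields needed for matching.
type Rule struct {
	Name          string
	Modes         []Mode
	FieldPatterns []string // assumed to be lowercased already
	ValueDetector func(string) bool
}

type Engine struct {
	mode  Mode
	rules []Rule
}

// shouldRedactField returns the first active rule whose pattern occurs in
// the field name.
func (e *Engine) shouldRedactField(field string) (bool, *Rule) {
	name := strings.ToLower(field)
	for i := range e.rules {
		r := &e.rules[i]
		if !slices.Contains(r.Modes, e.mode) {
			continue
		}
		for _, p := range r.FieldPatterns {
			if strings.Contains(name, p) {
				return true, r
			}
		}
	}
	return false, nil
}

// shouldRedactValue tries field names first (phase 1) and falls back to
// value detectors (phase 2).
func (e *Engine) shouldRedactValue(field, value string) (bool, *Rule) {
	if ok, r := e.shouldRedactField(field); ok {
		return true, r
	}
	for i := range e.rules {
		r := &e.rules[i]
		if r.ValueDetector != nil && slices.Contains(r.Modes, e.mode) && r.ValueDetector(value) {
			return true, r
		}
	}
	return false, nil
}

// newDemoEngine builds a two-rule engine in moderate mode.
func newDemoEngine() *Engine {
	all := []Mode{"aggressive", "moderate", "minimal"}
	return &Engine{mode: "moderate", rules: []Rule{
		{Name: "password", Modes: all, FieldPatterns: []string{"password"}},
		{Name: "any_ip", Modes: []Mode{"aggressive", "moderate"},
			ValueDetector: func(v string) bool { return net.ParseIP(v) != nil }},
	}}
}

func main() {
	e := newDemoEngine()
	for _, c := range [][2]string{{"adminPassword", "8.8.8.8"}, {"srcip", "8.8.8.8"}, {"descr", "LAN uplink"}} {
		ok, r := e.shouldRedactValue(c[0], c[1])
		name := "-"
		if ok {
			name = r.Name
		}
		fmt.Printf("%s=%q -> %v (%s)\n", c[0], c[1], ok, name)
	}
}
```

In the first case the value is an IP address, yet the `password` rule wins because the field name matched in phase 1.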
Redact#
The engine.Redact method applies the actual transformation:
- Receives the matched rule, field name, and value
- Calls the rule's custom Redactor function
- Passes the engine's Mapper for deterministic value replacement
Deterministic Mapping#
The Mapper component ensures consistent, deterministic redaction throughout a document. Key principle: same input → same output.
Mapping Formats#
Each data type has a specific mapping format:
- Public IPs: `[REDACTED-PUBLIC-IP-1]`, `[REDACTED-PUBLIC-IP-2]`, ...
- Private IPs (no structure): `10.0.0.1`, `10.0.0.2`, `10.0.0.3`, ...
- Hostnames: `host-001.example.com`, `host-002.example.com`, ...
- Usernames: `user-001`, `user-002`, ...
- MACs: `XX:XX:XX:XX:XX:01`, `XX:XX:XX:XX:XX:02`, ...
- Emails: `user1@example.com`, `user2@example.com`, ...
- AuthServer fields:
  - `authserver.name`: `authserver-001`, `authserver-002`, ...
  - `authserver.host`: `ldap-001.example.invalid`, `ldap-002.example.invalid`, ...
  - `authserver.ldap_port`: `55001`, `55002`, ...
  - `authserver.ldap_bindpw`: `BindPw-001-NotReal!`, `BindPw-002-NotReal!`, ...
  - Other authserver LDAP fields (ldap_basedn, ldap_binddn, ldap_authcn, ldap_extended_query, ldap_attr_user, ldap_sync_memberof_groups, ldap_sync_default_groups): field-specific sequential pseudonyms
Consistency Guarantee#
The mapper maintains hash tables for each data type. When the same value appears multiple times:
mapper.MapPublicIP("8.8.8.8") → "[REDACTED-PUBLIC-IP-1]"
mapper.MapPublicIP("8.8.8.8") → "[REDACTED-PUBLIC-IP-1]" // Same!
mapper.MapPublicIP("1.1.1.1") → "[REDACTED-PUBLIC-IP-2]" // Different
Private IP Structure Preservation#
Private IPs support two modes:
- Without structure (`preserveStructure=false`):
  - `192.168.1.100` → `10.0.0.1`
  - `172.16.5.20` → `10.0.0.2`
- With structure (`preserveStructure=true`):
  - `192.168.1.100` → `192.168.X.1` (preserves first two octets)
  - `172.16.5.20` → `172.16.X.2`
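The two output shapes can be reproduced with a small sketch. The `privateIPMapper` type is hypothetical and omits locking; using a single shared counter for both modes is an assumption inferred from the examples above, and behavior for IPv6 input or for more than 254 addresses is not modeled.

```go
package main

import (
	"fmt"
	"net"
)

// privateIPMapper is a simplified, single-goroutine stand-in.
type privateIPMapper struct {
	seen map[string]string
}

func newPrivateIPMapper() *privateIPMapper {
	return &privateIPMapper{seen: make(map[string]string)}
}

// mapPrivateIP returns a stable replacement for ip. With preserveStructure
// set, the first two octets of an IPv4 address are kept and the third is
// replaced by "X".
func (m *privateIPMapper) mapPrivateIP(ip string, preserveStructure bool) string {
	if got, ok := m.seen[ip]; ok {
		return got
	}
	n := len(m.seen) + 1
	repl := fmt.Sprintf("10.0.0.%d", n)
	if preserveStructure {
		// To4 returns nil for non-IPv4 input, which keeps the default form.
		if p := net.ParseIP(ip).To4(); p != nil {
			repl = fmt.Sprintf("%d.%d.X.%d", p[0], p[1], n)
		}
	}
	m.seen[ip] = repl
	return repl
}

func main() {
	flat := newPrivateIPMapper()
	fmt.Println(flat.mapPrivateIP("192.168.1.100", false), flat.mapPrivateIP("172.16.5.20", false))

	structured := newPrivateIPMapper()
	fmt.Println(structured.mapPrivateIP("192.168.1.100", true), structured.mapPrivateIP("172.16.5.20", true))
}
```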
Thread Safety#
The Mapper uses sync.RWMutex for thread-safe concurrent access to mapping tables.
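A minimal sketch of how a map guarded by `sync.RWMutex` can provide both the consistency guarantee and concurrent safety. Only the public-IP table is shown; the field names and the read-then-upgrade locking pattern are assumptions of this sketch rather than a description of mapper.go.

```go
package main

import (
	"fmt"
	"sync"
)

// Mapper here holds a single table; the real one keeps one per data type.
type Mapper struct {
	mu        sync.RWMutex
	publicIPs map[string]string
}

func NewMapper() *Mapper {
	return &Mapper{publicIPs: make(map[string]string)}
}

// MapPublicIP returns the same placeholder for the same input every time.
func (m *Mapper) MapPublicIP(ip string) string {
	// Fast path: a read lock is enough for values already mapped.
	m.mu.RLock()
	got, ok := m.publicIPs[ip]
	m.mu.RUnlock()
	if ok {
		return got
	}

	m.mu.Lock()
	defer m.mu.Unlock()
	// Re-check: another goroutine may have inserted between the two locks.
	if got, ok := m.publicIPs[ip]; ok {
		return got
	}
	repl := fmt.Sprintf("[REDACTED-PUBLIC-IP-%d]", len(m.publicIPs)+1)
	m.publicIPs[ip] = repl
	return repl
}

func main() {
	m := NewMapper()
	fmt.Println(m.MapPublicIP("8.8.8.8"))
	fmt.Println(m.MapPublicIP("8.8.8.8"))
	fmt.Println(m.MapPublicIP("1.1.1.1"))
}
```

The re-check under the write lock matters: without it, two goroutines mapping the same new value could each allocate a different placeholder.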
Sanitizer Application Flow#
The Sanitizer struct orchestrates redaction:
- sanitizeValue calls engine.ShouldRedactValue
- If redaction needed, calls engine.Redact
- Tracks statistics (TotalFields, RedactedFields, SkippedFields, RedactionsByType)
XML Processing#
For XML sanitization, the sanitizer processes tokens as a stream:
- Builds full element paths for context
- Checks both paths and individual names
- Sanitizes element content, attributes, and comments
- Preserves XML structure
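The token loop might look roughly like the sketch below, which uses `encoding/xml`'s `Decoder.Token` and `Encoder.EncodeToken`. It is deliberately reduced: only element text is passed to the redaction callback, while attributes and comments are copied through unchanged. The `sanitizeXML` and `demoRedact` names, the callback signature, and the dot-joined path format are assumptions.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// sanitizeXML copies input token by token, passing non-blank character
// data through redact together with the element path and element name.
func sanitizeXML(input []byte, redact func(path, name, value string) string) ([]byte, error) {
	dec := xml.NewDecoder(bytes.NewReader(input))
	var out bytes.Buffer
	enc := xml.NewEncoder(&out)
	var stack []string // names of the currently open elements

	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			stack = append(stack, t.Name.Local)
		case xml.EndElement:
			if len(stack) > 0 {
				stack = stack[:len(stack)-1]
			}
		case xml.CharData:
			if len(stack) > 0 && len(bytes.TrimSpace(t)) > 0 {
				name := stack[len(stack)-1]
				tok = xml.CharData(redact(strings.Join(stack, "."), name, string(t)))
			}
		}
		if err := enc.EncodeToken(tok); err != nil {
			return nil, err
		}
	}
	if err := enc.Flush(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

// demoRedact is a toy callback that only knows about password fields.
func demoRedact(path, name, value string) string {
	if strings.Contains(strings.ToLower(name), "password") {
		return "[REDACTED-PASSWORD]"
	}
	return value
}

func main() {
	in := []byte(`<config><password>secret123</password><descr>lan</descr></config>`)
	out, err := sanitizeXML(in, demoRedact)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Because start and end tokens are re-encoded as they arrive, the element structure of the output mirrors the input without the document ever being loaded into a tree.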
Struct Processing#
For struct sanitization, the sanitizer uses reflection:
- Traverses fields recursively
- Handles nested structs and slices
- Applies rules to string fields
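A reflection walk of this kind can be sketched as follows. The sketch mutates the value in place through a pointer and handles only pointers, structs, slices, and strings; the function names, the callback signature, and the in-place design are assumptions and may differ from how SanitizeStruct returns its result.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

type redactFunc func(fieldName, value string) string

// walk visits v recursively and rewrites every settable string it finds,
// passing the nearest enclosing struct field name to redact.
func walk(v reflect.Value, name string, redact redactFunc) {
	switch v.Kind() {
	case reflect.Pointer:
		if !v.IsNil() {
			walk(v.Elem(), name, redact)
		}
	case reflect.Struct:
		for i := 0; i < v.NumField(); i++ {
			if f := v.Type().Field(i); f.IsExported() {
				walk(v.Field(i), f.Name, redact)
			}
		}
	case reflect.Slice:
		for i := 0; i < v.Len(); i++ {
			walk(v.Index(i), name, redact)
		}
	case reflect.String:
		if v.CanSet() {
			v.SetString(redact(name, v.String()))
		}
	}
}

// sanitizeInPlace expects a pointer so that the fields are settable.
func sanitizeInPlace(ptr any, redact redactFunc) {
	walk(reflect.ValueOf(ptr), "", redact)
}

type user struct{ Name, Password string }

type config struct {
	AdminPassword string
	Users         []user
}

// demoRedact is a toy callback that only knows about password fields.
func demoRedact(field, value string) string {
	if strings.Contains(strings.ToLower(field), "password") {
		return "[REDACTED-PASSWORD]"
	}
	return value
}

func main() {
	cfg := config{AdminPassword: "s3cret", Users: []user{{Name: "bob", Password: "pw"}}}
	sanitizeInPlace(&cfg, demoRedact)
	fmt.Printf("%+v\n", cfg)
}
```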
Critical Implementation Gotchas#
1. Global Field Pattern Scanning and Exact Matching#
Problem: ShouldRedactField scans ALL rules' FieldPatterns globally. Adding a FieldPattern to any rule can break "should not match" test assertions for other rules. Additionally, the pattern "key" uses exact matching (case-insensitive) to prevent false positives on compound names like "sshkey", "apikey", "authkey".
Impact: When adding new field patterns, review all existing test cases that assert ShouldRedactField returns false. Be aware that most patterns use substring matching, but "key" requires an exact match.
2. Field-Name Priority in ShouldRedactValue#
Problem: ShouldRedactValue checks field-name rules first (via ShouldRedactField) and consults value-detector rules only if no field name matches.
Impact: If a field matches by name, its value detector is never consulted. For rules with both FieldPatterns and a ValueDetector, a field-name match triggers redaction immediately; the ValueDetector applies only on the value-only matching path.
3. Rule Ordering Contract (See GOTCHAS.md §19.1)#
Problem: ShouldRedactField returns on the first matching rule. Rule order determines precedence.
Critical Contracts:
- `authserver_config` MUST precede `password`: Both match "ldap_bindpw" (authserver_config via exact field pattern, password via "pass" substring). authserver_config pseudonymizes; password flat-redacts to `[REDACTED-PASSWORD]`. If reordered, LDAP bind passwords silently switch from pseudonymized to flat-redacted with no error or warning.
- `email` MUST precede `hostname`: Email addresses contain dots that match hostname patterns. The ordering ensures emails are mapped via MapEmail, not MapHostname.
Impact: Reordering rules in builtinRules() can silently change redaction behavior. See GOTCHAS.md §19.1 for full documentation.
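One way to make these contracts fail loudly is an ordering check that a test could run against the rule list. The sketch below is an assumption about how such a guard might look; `precedes` is a hypothetical helper and the `order` slice is an invented excerpt, not the actual output of builtinRules(). It requires Go 1.21 or later for `slices`.

```go
package main

import (
	"fmt"
	"slices"
)

// precedes reports whether rule first appears before rule second in names.
// It returns false if either name is missing.
func precedes(names []string, first, second string) bool {
	i := slices.Index(names, first)
	j := slices.Index(names, second)
	return i >= 0 && j >= 0 && i < j
}

func main() {
	// Hypothetical excerpt of rule names in evaluation order.
	order := []string{"authserver_config", "password", "email", "hostname"}
	fmt.Println("authserver_config before password:", precedes(order, "authserver_config", "password"))
	fmt.Println("email before hostname:", precedes(order, "email", "hostname"))
}
```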
4. System User Exemption#
System users are explicitly exempted in the username rule:
- Exempted: root, admin, nobody, daemon, www, www-data, opnsense, unbound, dhcpd, sshd, ntp, proxy
- These are never redacted even in aggressive mode
5. IPv6 Detection Limitations#
IPv6 detection has known edge cases:
- Loopback `::1` is NOT detected
- IPv4-mapped `::ffff:192.168.1.1` is NOT detected
- All-zeros `::` is NOT detected
- Compressed forms like `2001:db8::1` ARE detected
6. Statistics Overcounting Prevention#
Problem: Guarded Redactors may return the original value unchanged (e.g., IsIP check to prevent false positives on "any", "lan").
Solution: Always compare before/after Redact() output and only increment RedactedFields if the value actually changed.
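The before/after comparison can be sketched as below. The `Stats` field names follow the ones listed for the sanitizer, but the `record` helper is hypothetical, and counting an unchanged value as skipped is an assumption of this sketch.

```go
package main

import "fmt"

// Stats uses the field names documented for the sanitizer.
type Stats struct {
	TotalFields      int
	RedactedFields   int
	SkippedFields    int
	RedactionsByType map[string]int
}

// record counts a field as redacted only when the redactor actually
// changed the value; a guarded redactor may return its input unchanged.
func record(stats *Stats, ruleName, before, after string) {
	stats.TotalFields++
	if before == after {
		stats.SkippedFields++ // assumption: unchanged values count as skipped
		return
	}
	stats.RedactedFields++
	stats.RedactionsByType[ruleName]++
}

func main() {
	stats := &Stats{RedactionsByType: make(map[string]int)}
	record(stats, "ip_address_field", "any", "any")          // guard left the alias untouched
	record(stats, "ip_address_field", "10.1.2.3", "10.0.0.1") // real redaction
	fmt.Printf("%+v\n", *stats)
}
```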
7. Deterministic Mapping Behavior#
Each call to NewRuleEngine creates a fresh NewMapper():
- Mappings are deterministic within a session
- First private IP → `10.0.0.1`, second → `10.0.0.2`
- First hostname → `host-001.example.com`, second → `host-002.example.com`
- Tests should assert exact expected values
Usage Examples#
Basic Sanitization#
import (
	"fmt"
	"log"
	"github.com/EvilBit-Labs/opnDossier/internal/sanitizer"
)
// Create sanitizer with aggressive mode
s := sanitizer.New(sanitizer.ModeAggressive)
// Sanitize XML configuration
input := []byte(`<config><password>secret123</password></config>`)
output, stats, err := s.SanitizeXML(input)
if err != nil {
	log.Fatalf("sanitize: %v", err)
}
// Output: <config><password>[REDACTED-PASSWORD]</password></config>
fmt.Println(string(output))
// Check statistics
fmt.Printf("Redacted %d of %d fields\n", stats.RedactedFields, stats.TotalFields)
Mode Comparison#
// Minimal mode - only credentials
minimal := sanitizer.New(sanitizer.ModeMinimal)
minOutput, _, _ := minimal.SanitizeXML(xmlData)
// Moderate mode - credentials + public network data
moderate := sanitizer.New(sanitizer.ModeModerate)
modOutput, _, _ := moderate.SanitizeXML(xmlData)
// Aggressive mode - everything
aggressive := sanitizer.New(sanitizer.ModeAggressive)
aggOutput, _, _ := aggressive.SanitizeXML(xmlData)
Custom Rule Engine#
// Create engine with custom mode
engine := sanitizer.NewRuleEngine(sanitizer.ModeModerate)
// Check if field should be redacted
shouldRedact, rule := engine.ShouldRedactField("password")
if shouldRedact {
	fmt.Printf("Field matched rule: %s\n", rule.Name)
}
// Check if value should be redacted
shouldRedact, rule = engine.ShouldRedactValue("srcip", "8.8.8.8")
if shouldRedact {
	redacted := engine.Redact(rule, "srcip", "8.8.8.8")
	fmt.Printf("Redacted to: %s\n", redacted)
}
Struct Sanitization#
type FirewallConfig struct {
	AdminPassword string
	AdminEmail    string
	PublicIP      string
}
config := FirewallConfig{
	AdminPassword: "secret123",
	AdminEmail:    "admin@company.com",
	PublicIP:      "203.0.113.1",
}
s := sanitizer.New(sanitizer.ModeAggressive)
sanitized, err := s.SanitizeStruct(config)
if err != nil {
	log.Fatalf("sanitize: %v", err)
}
fmt.Printf("%+v\n", sanitized)
// Result:
// AdminPassword: "[REDACTED-PASSWORD]"
// AdminEmail: "user1@example.com"
// PublicIP: "[REDACTED-PUBLIC-IP-1]"
Relevant Code Files#
| File | Purpose | Key Components |
|---|---|---|
| internal/sanitizer/rules.go | Rule definitions and engine | Rule struct, RuleEngine, ShouldRedactField, ShouldRedactValue, builtinRules |
| internal/sanitizer/sanitizer.go | Main sanitization logic | Sanitizer struct, SanitizeXML, SanitizeStruct, sanitizeValue |
| internal/sanitizer/patterns.go | Value detection patterns | IsIPv4, IsIPv6, IsPrivateIP, IsPublicIP, IsMAC, IsEmail, IsHostname, IsPrivateKey, IsCertificate |
| internal/sanitizer/mapper.go | Deterministic value replacement | Mapper struct, MapPublicIP, MapPrivateIP, MapHostname, MapUsername, MapEmail, MapMAC, MapAuthServerValue |
| internal/sanitizer/rules_test.go | Rule engine tests | Field matching tests, value detection tests, mode behavior tests |
| internal/sanitizer/patterns_test.go | Pattern detection tests | IP detection tests, MAC tests, email tests, hostname tests, crypto material tests |
| internal/sanitizer/mapper_test.go | Mapper consistency tests | Deterministic mapping tests, structure preservation tests |
| internal/sanitizer/sanitizer_test.go | Integration tests | XML sanitization tests, struct sanitization tests, mode comparison tests |
Related Topics#
- XML Processing: The sanitizer uses Go's encoding/xml package for stream-based token processing
- Regular Expressions: Pattern detection relies on pre-compiled regex patterns for performance
- Reflection: Struct sanitization uses reflect package to traverse and modify struct fields
- Thread Safety: Mapper uses sync.RWMutex for concurrent-safe access
- OPNsense Configuration: The sanitizer is designed specifically for OPNsense XML configuration format
- Privacy Engineering: Applies data minimization and pseudonymization through tiered redaction modes (this is not differential privacy in the formal sense)