Sanitizer Field Pattern Maintenance

Sanitizer Field Pattern Maintenance is a development maintenance pattern for the opnDossier project. It keeps credential detection rules synchronized across two code files when multi-device support is added. The pattern exists because the sanitizer matches raw XML element names, not CommonDevice field names. A device-specific credential element can therefore be silently missed unless its name is explicitly cataloged in the pattern lists.

The sanitizer uses a dual-phase detection strategy: field-name pattern matching via the FieldPatterns arrays in rule definitions (Phase 1), followed by value content analysis using detector functions (Phase 2). Pattern lists are maintained in two separate locations, internal/sanitizer/rules.go and internal/sanitizer/patterns.go, and both must cover OPNsense, pfSense, and future device types. If a credential element name is missing from rules.go, its value can leak when configuration files are sanitized for public sharing or support diagnostics. Keeping patterns.go in step keeps the keyword-based helpers consistent with the rules.

The Two-File Update Requirement

Core Maintenance Rule

When adding credential field patterns for new device types or newly discovered credential fields, both files must be updated simultaneously:

internal/sanitizer/rules.go: Update the relevant rule's FieldPatterns array (used by ShouldRedactField)
internal/sanitizer/patterns.go: Update the corresponding keyword slice, such as passwordKeywords (used by LooksLikePassword and related detection helpers)

Rationale: The sanitizer's field-name matching has priority over value detection. If a credential field's XML element name doesn't substring-match any existing pattern, it will only be caught if the value content triggers a ValueDetector—but most credential rules (password, secret, psk, snmp_community) rely solely on field-name patterns without value detectors.

Pattern Matching Mechanics

Field-Name Matching Algorithm

The fieldNameMatches function implements the pattern matching logic:

Default behavior: Case-insensitive substring matching via containsIgnoreCase
Exact match exception: Patterns in exactMatchPatterns (["key", "from", "to"]) require exact case-insensitive matches to prevent false positives on compound field names like sshkey, apikey, authkey
ASCII-only case folding: Uses custom toLower function instead of strings.ToLower for performance

Examples:

Pattern "password" matches: password, Password, userPassword, mypassword123
Pattern "bcrypt" would match: bcrypt-hash, bcrypt_hash, mybcrypt, bcryptPassword
Pattern "key" (exact match) matches only: key, Key, KEY (not sshkey or apikey)
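The matching rules above can be sketched in Go as follows. This is a minimal re-creation under stated assumptions, not the code in rules.go: exactMatchPatterns is modeled as a map, and toLower and containsIgnoreCase are simplified stand-ins that fold only ASCII letters.

```go
package main

import (
	"fmt"
	"strings"
)

// exactMatchPatterns mirrors the ["key", "from", "to"] list described above.
var exactMatchPatterns = map[string]bool{"key": true, "from": true, "to": true}

// toLower folds only ASCII A-Z; every other byte is copied unchanged.
func toLower(s string) string {
	b := []byte(s)
	for i, c := range b {
		if c >= 'A' && c <= 'Z' {
			b[i] = c + ('a' - 'A')
		}
	}
	return string(b)
}

// containsIgnoreCase reports whether substr occurs in s, ignoring ASCII case.
func containsIgnoreCase(s, substr string) bool {
	return strings.Contains(toLower(s), toLower(substr))
}

// fieldNameMatches applies whole-name matching for exactMatchPatterns and
// case-insensitive substring matching for every other pattern.
func fieldNameMatches(fieldName, pattern string) bool {
	p := toLower(pattern)
	if exactMatchPatterns[p] {
		return toLower(fieldName) == p
	}
	return containsIgnoreCase(fieldName, p)
}

func main() {
	fmt.Println(fieldNameMatches("userPassword", "password")) // true
	fmt.Println(fieldNameMatches("bcrypt_hash", "bcrypt"))    // true
	fmt.Println(fieldNameMatches("KEY", "key"))               // true
	fmt.Println(fieldNameMatches("sshkey", "key"))            // false
}
```

Because "key" is in the exact-match set, a compound name such as sshkey is left to the more specific "sshkey" pattern of the ssh_authorized_keys rule instead of being claimed by the bare key pattern.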

Detection Execution Flow

ShouldRedactValue coordinates the two-phase detection:

```text
Phase 1: ShouldRedactField(fieldName)
  ├─ Iterate all active rules
  ├─ For each rule's FieldPatterns
  │   └─ Call fieldNameMatches(fieldName, pattern)
  └─ Return true + Rule on first match

Phase 2: Value Detection (only if Phase 1 fails)
  ├─ Iterate all active rules with ValueDetector != nil
  ├─ Call ValueDetector(value)
  └─ Return true + Rule on first match
```

Critical behavior: If a field name matches any FieldPattern, the ValueDetector is never consulted for that field. This makes field-name patterns the primary detection mechanism.
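A minimal sketch of this two-phase flow follows. The Rule type, shouldRedactField, and shouldRedactValue here are simplified, hypothetical stand-ins for the real rule engine: matching is plain case-insensitive substring search (the exact-match exception is omitted), and the private_key detector only looks for the text PRIVATE KEY.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a simplified, hypothetical stand-in for the rule type in rules.go.
type Rule struct {
	Name          string
	FieldPatterns []string
	ValueDetector func(value string) bool // nil for field-name-only rules
}

// demoRules holds two illustrative rules, not the full catalog.
var demoRules = []Rule{
	{Name: "password", FieldPatterns: []string{"password", "passwd", "pass", "pwd"}},
	{
		Name:          "private_key",
		FieldPatterns: []string{"privatekey", "private_key", "prv"},
		ValueDetector: func(v string) bool { return strings.Contains(v, "PRIVATE KEY") },
	},
}

// shouldRedactField is Phase 1: the first rule with a matching pattern wins.
func shouldRedactField(rules []Rule, fieldName string) (bool, *Rule) {
	name := strings.ToLower(fieldName)
	for i := range rules {
		for _, p := range rules[i].FieldPatterns {
			if strings.Contains(name, strings.ToLower(p)) {
				return true, &rules[i]
			}
		}
	}
	return false, nil
}

// shouldRedactValue runs Phase 2 only when Phase 1 found no match.
func shouldRedactValue(rules []Rule, fieldName, value string) (bool, *Rule) {
	if ok, r := shouldRedactField(rules, fieldName); ok {
		return true, r // Phase 1 hit: no ValueDetector is consulted
	}
	for i := range rules {
		if d := rules[i].ValueDetector; d != nil && d(value) {
			return true, &rules[i]
		}
	}
	return false, nil
}

func main() {
	const pem = "-----BEGIN RSA PRIVATE KEY-----"
	cases := [][2]string{
		{"passwd", "hunter2"},
		{"passwd", pem},
		{"bcrypt-hash", "$2b$10$abcdef"},
		{"note", pem},
	}
	for _, c := range cases {
		ok, r := shouldRedactValue(demoRules, c[0], c[1])
		rule := "none"
		if r != nil {
			rule = r.Name
		}
		fmt.Printf("%-12s redact=%-5v rule=%s\n", c[0], ok, rule)
	}
}
```

The second passwd case shows the critical behavior: the value looks like a PEM private key, but the field name already matched the password rule, so no detector runs. The bcrypt-hash case matches no pattern and no detector, so it is not redacted at all.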

Credential Rules Catalog

Current FieldPatterns in rules.go

| Rule Name | FieldPatterns | Active Modes | Redacted Value |
|---|---|---|---|
| `password` | `["password", "passwd", "pass", "pwd"]` | All | `[REDACTED-PASSWORD]` |
| `secret` | `["secret", "token", "apikey", "api_key", "api-key", "accesskey", "secretkey", "authkey", "auth_key", "otp_seed", "otpseed"]` | All | `[REDACTED-SECRET]` |
| `psk` | `["psk", "preshared", "pre-shared", "ipsecpsk"]` | All | `[REDACTED-PSK]` |
| `snmp_community` | `["community", "rocommunity", "rwcommunity"]` | All | `[REDACTED-SNMP-COMMUNITY]` |
| `private_key` | `["privatekey", "private_key", "prv", "privkey", "key", "openvpn.tls", "openvpn-server.tls", "openvpn-client.tls", "openvpn.statickeys", "statickeys", "tls_crypt", "tls_auth"]` | All | `[REDACTED-PRIVATE-KEY]` |
| `ssh_authorized_keys` | `["authorizedkeys", "authorized_keys", "sshkey", "ssh_key"]` | All | `[REDACTED-SSH-KEY]` |

Current passwordKeywords in patterns.go

The passwordKeywords slice contains 16 keywords used by the LooksLikePassword function:

```go
var passwordKeywords = []string{
	"password", "passwd", "pass", "secret", "key",
	"token", "credential", "auth", "prv", "private", "bindpw",
	"bcrypt-hash", "sha512-hash",
	"statickeys", "tls_crypt", "tls_auth",
}
```

The "bindpw" keyword was added so that ldap_bindpw fields in authserver configurations are classified as credentials by the keyword-based checks.

The "statickeys", "tls_crypt", and "tls_auth" keywords correspond to OpenVPN HMAC key material documented in GOTCHAS.md §11.3.

Note: While LooksLikePassword exists and could be used as a ValueDetector, the current rule engine does not directly use it. The actual credential detection relies on the FieldPatterns in rules.go.
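Because nothing in the rule engine enforces the two-file requirement, a small drift check can help catch one list being updated without the other. The sketch below is hypothetical: missingFrom is an illustrative helper, and both lists are copied from this page rather than read from the sanitizer package.

```go
package main

import "fmt"

// Copies of the two lists documented above. A real check would read them from
// the sanitizer package instead of duplicating them.
var passwordFieldPatterns = []string{"password", "passwd", "pass", "pwd"}

var passwordKeywords = []string{
	"password", "passwd", "pass", "secret", "key",
	"token", "credential", "auth", "prv", "private", "bindpw",
	"bcrypt-hash", "sha512-hash",
	"statickeys", "tls_crypt", "tls_auth",
}

// missingFrom returns the entries of want that do not appear verbatim in have.
func missingFrom(want, have []string) []string {
	set := make(map[string]bool, len(have))
	for _, h := range have {
		set[h] = true
	}
	var missing []string
	for _, w := range want {
		if !set[w] {
			missing = append(missing, w)
		}
	}
	return missing
}

func main() {
	// Only one direction is checked: passwordKeywords also carries keywords
	// that belong to other rules (secret, token, prv, ...).
	fmt.Println("password FieldPatterns missing from passwordKeywords:",
		missingFrom(passwordFieldPatterns, passwordKeywords))
}
```

Against these documented lists it reports pwd, which appears in the password rule but not in passwordKeywords. Whether that difference matters depends on how LooksLikePassword is used, so the output is a prompt for review rather than proof of a bug.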

Device-Specific Credential Field Catalogs

OpenVPN Pattern Selection Strategy

OpenVPN's <tls> and <StaticKeys> elements hold --tls-auth / --tls-crypt HMAC key material that must be redacted. Instead of a bare tls pattern, the sanitizer uses path-anchored patterns (openvpn.tls, openvpn-server.tls, openvpn-client.tls, openvpn.statickeys) plus the statickeys alias, which avoids false-positive collisions with unrelated <tls> elements elsewhere in the schema:

Suricata IDS: opnsense.OPNsense.IDS.general.eveLog.tls.* wraps boolean enable/extended/sessionResumption configuration (not secrets)
IPsec strongSwan: opnsense.OPNsense.IPsec.charon.syslog.daemon.tls carries a log-level enum 0–5 (not a secret)

The bare substring tls would produce false positives on these non-credential paths. Path anchoring avoids this because the pattern is matched against the full lowercased element path, which includes the parent container (for example openvpn-server.tls), so only OpenVPN HMAC keys are redacted. The unambiguous aliases tls_crypt and tls_auth are safe as bare patterns because they never appear in non-OpenVPN contexts.

This strategy is documented in GOTCHAS.md §11.3 and verified by TestSanitizeXML_OpenVPN_TLS_NoFalsePositives in internal/sanitizer/sanitizer_test.go.
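To see why anchoring matters, the sketch below runs the OpenVPN patterns and a bare tls pattern against dotted element paths. The paths for the two OpenVPN elements are assumptions about how nested element names are rendered; the IDS and IPsec paths come from the list above. matchesAny is a hypothetical helper using plain lowercase substring search.

```go
package main

import (
	"fmt"
	"strings"
)

// openVPNPatterns lists the OpenVPN-related patterns named above.
var openVPNPatterns = []string{
	"openvpn.tls", "openvpn-server.tls", "openvpn-client.tls",
	"openvpn.statickeys", "statickeys", "tls_crypt", "tls_auth",
}

// matchesAny reports whether the lowercased dotted path contains any pattern.
func matchesAny(path string, patterns []string) bool {
	p := strings.ToLower(path)
	for _, pat := range patterns {
		if strings.Contains(p, pat) {
			return true
		}
	}
	return false
}

func main() {
	paths := []string{
		"opnsense.openvpn.openvpn-server.tls",              // assumed path of an OpenVPN server <tls>
		"opnsense.OPNsense.OpenVPN.StaticKeys",             // assumed path of the MVC <StaticKeys>
		"opnsense.OPNsense.IDS.general.eveLog.tls.enable",  // Suricata boolean, not a secret
		"opnsense.OPNsense.IPsec.charon.syslog.daemon.tls", // strongSwan log level, not a secret
	}
	for _, p := range paths {
		fmt.Printf("%-52s anchored=%-5v bare tls=%v\n",
			p, matchesAny(p, openVPNPatterns), matchesAny(p, []string{"tls"}))
	}
}
```

The bare tls pattern flags all four paths, while the anchored set flags only the two OpenVPN ones.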

ValueDetector: IsOpenVPNStaticKey

The private_key rule's ValueDetector includes IsOpenVPNStaticKey, which recognizes the PEM-style envelope used by OpenVPN static keys:

```text
-----BEGIN OpenVPN Static key V1-----
<hexadecimal key material>
-----END OpenVPN Static key V1-----
```

This detector complements the path-anchored FieldPatterns. The standard PEM detector (IsPrivateKey) looks for PRIVATE KEY labels and would miss OpenVPN's custom label. By chaining IsOpenVPNStaticKey into the IsPrivateKey function, the sanitizer catches OpenVPN HMAC keys through both the field-name path (Phase 1) and the value-content path (Phase 2).

See TestIsOpenVPNStaticKey and TestIsPrivateKey_OpenVPNStaticKey in internal/sanitizer/patterns_test.go for detector coverage.
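The chaining described above might look roughly like the following. isOpenVPNStaticKey and isPrivateKey are hypothetical re-creations, not the functions in patterns.go: the PEM check is reduced to a BEGIN marker plus a PRIVATE KEY label, and the static-key check requires the OpenVPN BEGIN marker followed later by its END marker. The key material in sampleStaticKey is a placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	staticKeyBegin = "-----BEGIN OpenVPN Static key V1-----"
	staticKeyEnd   = "-----END OpenVPN Static key V1-----"
)

// sampleStaticKey has the layout of an OpenVPN static key file; the key
// material is a short placeholder, not a real key.
const sampleStaticKey = `#
# 2048 bit OpenVPN static key
#
-----BEGIN OpenVPN Static key V1-----
6acef03f62675b4b1bbd03e53b187727
-----END OpenVPN Static key V1-----`

// isOpenVPNStaticKey reports whether value contains the OpenVPN BEGIN marker
// followed somewhere later by the matching END marker.
func isOpenVPNStaticKey(value string) bool {
	begin := strings.Index(value, staticKeyBegin)
	if begin < 0 {
		return false
	}
	return strings.Contains(value[begin+len(staticKeyBegin):], staticKeyEnd)
}

// isPrivateKey checks for a standard PEM PRIVATE KEY label first, then falls
// back to the OpenVPN static key envelope.
func isPrivateKey(value string) bool {
	if strings.Contains(value, "-----BEGIN") && strings.Contains(value, "PRIVATE KEY-----") {
		return true
	}
	return isOpenVPNStaticKey(value)
}

func main() {
	fmt.Println(isOpenVPNStaticKey(sampleStaticKey))                  // true
	fmt.Println(isPrivateKey(sampleStaticKey))                        // true
	fmt.Println(isPrivateKey("-----BEGIN CERTIFICATE-----\nMIIB...")) // false
}
```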

OPNsense XML Element Names

OPNsense uses <opnsense> as its root element. Common credential fields:

User passwords: <passwd> — matched by "passwd" pattern
SNMP: <community>, <rocommunity>, <rwcommunity> — matched by community patterns
VPN: <psk>, <ipsecpsk>, <preshared> — matched by psk patterns
Certificates: <prv> (private keys) — matched by "prv" pattern
SSH: <authorizedkeys> — matched by ssh_authorized_keys patterns
OTP: <otp_seed> — matched by "otp_seed" in secret rule
OpenVPN HMAC: <tls> (under <openvpn-server> / <openvpn-client>) — matched by path-anchored "openvpn-server.tls" / "openvpn-client.tls" patterns
OpenVPN MVC: <StaticKeys> (under <OpenVPN>) — matched by "openvpn.statickeys" and "statickeys" patterns

pfSense XML Element Differences

pfSense uses <pfsense> root element with these credential field differences:

User passwords: <bcrypt-hash>, <sha512-hash> — NOT matched by any current pattern
RADIUS: <radius_secret> — matched by "secret" pattern
Auth: <auth_pass> — matched by "pass" pattern
Certificates/keys: Same <prv> as OPNsense — matched by "prv" pattern

⚠️ Critical Gap Identified: pfSense's <bcrypt-hash> and <sha512-hash> elements do NOT substring-match any current FieldPattern in the password rule ("password", "passwd", "pass", "pwd"). These fields would be silently missed by the sanitizer unless:

A ValueDetector for password-like content exists (currently none on the password rule)
The patterns are explicitly added to rules.go

Real-World Example: The pfSense Password Field Gap

Problem Discovery

When adding pfSense support, the sanitizer's generic patterns covered OPNsense's <password> and <passwd> elements via the "pass" substring match, but completely missed pfSense's <bcrypt-hash> element because:

"bcrypt-hash" contains neither "password", "passwd", "pass", nor "pwd"
The password rule has no ValueDetector
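Phase 1 finds no match, and Phase 2 skips the password rule because it has no ValueDetector, so the hash is redacted only if some other rule's detector happens to match its value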

Solution

Add device-specific hash element patterns to BOTH files:

In internal/sanitizer/rules.go:

```go
{
	Name: "password",
	FieldPatterns: []string{
		"password", "passwd", "pass", "pwd",
		"bcrypt-hash", "bcrypt", // pfSense user passwords
		"sha512-hash", "sha512", // pfSense alternative hash format
	},
	// ...
}
```

In internal/sanitizer/patterns.go:

```go
var passwordKeywords = []string{
	"password", "passwd", "pass", "secret", "key", "token",
	"credential", "auth", "prv", "private",
	// ... other existing keywords ...
	"bcrypt", "sha512", "hash", // pfSense hash element detection
}
```

XML Element Name Discovery Workflow

When adding support for a new device type, follow this workflow to identify credential fields requiring pattern coverage:

Step 1: Examine Device Schema DTOs

Navigate to the pkg/schema/<device>/ directory

Search for credential-related struct tags in *.go files:

```shell
grep -r --include='*.go' 'xml:.*hash\|xml:.*pass\|xml:.*secret\|xml:.*key' pkg/schema/pfsense/
```

Catalog all XML element names from `xml:"element-name"` tags

Example from pfSense:

```go
type SystemUser struct {
	Name       string `xml:"name"`
	BcryptHash string `xml:"bcrypt-hash"` // ← Credential field!
	UID        string `xml:"uid"`
}
```
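Where grepping tags is too coarse, element names can be listed programmatically. This sketch uses the standard reflect package on the SystemUser example; xmlElementNames and looksLikeCredential are hypothetical helpers, the hint keywords are copied from the grep command, and only top-level fields are inspected (nested structs would need a recursive walk).

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// SystemUser copies the pfSense example above.
type SystemUser struct {
	Name       string `xml:"name"`
	BcryptHash string `xml:"bcrypt-hash"`
	UID        string `xml:"uid"`
}

// credentialHints are the keywords from the grep command in Step 1.
var credentialHints = []string{"hash", "pass", "secret", "key"}

// xmlElementNames returns the element name from each top-level field's xml tag.
// Options such as ",omitempty" are dropped, "a>b" paths keep their last element,
// and untagged fields or fields tagged "-" are skipped.
func xmlElementNames(v any) []string {
	t := reflect.TypeOf(v)
	if t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		name, _, _ := strings.Cut(t.Field(i).Tag.Get("xml"), ",")
		if j := strings.LastIndex(name, ">"); j >= 0 {
			name = name[j+1:]
		}
		if name == "" || name == "-" {
			continue
		}
		names = append(names, name)
	}
	return names
}

// looksLikeCredential reports whether an element name contains a hint keyword.
func looksLikeCredential(name string) bool {
	lower := strings.ToLower(name)
	for _, h := range credentialHints {
		if strings.Contains(lower, h) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range xmlElementNames(SystemUser{}) {
		fmt.Printf("%-12s credential=%v\n", name, looksLikeCredential(name))
	}
}
```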

Step 2: Cross-Reference Schema Documentation

Check pkg/schema/<device>/README.md for structural documentation
pfSense schema README contains 838 lines documenting 50+ configuration sections
Review docs/development/xml-structure-research.md for device comparison notes

Step 3: Verify Existing Pattern Coverage

For each discovered credential field XML element name:

Convert to lowercase
Check if it substring-matches any pattern in rules.go credential rules
If NO match found → add to maintenance backlog
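Step 3 can be automated with a small check like the one below. It is a sketch, not project tooling: catalog copies the substring patterns from the rules table (leaving out the exact-match key pattern and the dotted OpenVPN paths, since only bare element names are checked), matchingRule and backlog are hypothetical helpers, and the input is the pfSense element list from this page.

```go
package main

import (
	"fmt"
	"strings"
)

type rulePatterns struct {
	rule     string
	patterns []string
}

// catalog copies the substring FieldPatterns from the rules table above.
var catalog = []rulePatterns{
	{"password", []string{"password", "passwd", "pass", "pwd"}},
	{"secret", []string{"secret", "token", "apikey", "api_key", "api-key", "accesskey",
		"secretkey", "authkey", "auth_key", "otp_seed", "otpseed"}},
	{"psk", []string{"psk", "preshared", "pre-shared", "ipsecpsk"}},
	{"snmp_community", []string{"community", "rocommunity", "rwcommunity"}},
	{"private_key", []string{"privatekey", "private_key", "prv", "privkey",
		"statickeys", "tls_crypt", "tls_auth"}},
	{"ssh_authorized_keys", []string{"authorizedkeys", "authorized_keys", "sshkey", "ssh_key"}},
}

// matchingRule returns the first rule whose pattern is a substring of the
// lowercased element name, or "" when nothing matches.
func matchingRule(element string) string {
	name := strings.ToLower(element)
	for _, rp := range catalog {
		for _, p := range rp.patterns {
			if strings.Contains(name, p) {
				return rp.rule
			}
		}
	}
	return ""
}

// backlog returns the element names that no rule matches.
func backlog(elements []string) []string {
	var missing []string
	for _, el := range elements {
		if matchingRule(el) == "" {
			missing = append(missing, el)
		}
	}
	return missing
}

func main() {
	pfsense := []string{"bcrypt-hash", "sha512-hash", "radius_secret", "auth_pass", "prv"}
	for _, el := range pfsense {
		fmt.Printf("%-14s rule=%q\n", el, matchingRule(el))
	}
	fmt.Println("backlog:", backlog(pfsense))
}
```

On that input it reports bcrypt-hash and sha512-hash as unmatched, which is the gap described in the pfSense section.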

Step 4: Pattern Addition Checklist

For each unmatched credential field:

Determine which rule it belongs to (password, secret, psk, snmp_community, private_key, ssh_authorized_keys)
Add the shortest effective substring pattern to that rule's FieldPatterns array in rules.go
Add related keywords to the corresponding slice in patterns.go (e.g., passwordKeywords)
Update test cases in rules_test.go and patterns_test.go
Verify coverage with the detection method (see next section)

Detection Method: Verification Workflow

After updating pattern lists, verify coverage using this command-line detection method:

```shell
opndossier sanitize <config.xml> | grep -iE 'hash|secret|key|pass|community|token'
```

Interpretation:

All sensitive values redacted ([REDACTED-*] placeholders) → ✅ Complete coverage
Plain-text sensitive values visible → ❌ Missing pattern (field name printed in grep output indicates which pattern to add)

Example of missed field:

```xml
<bcrypt-hash>$2b$10$abcdef...</bcrypt-hash> <!-- Not redacted: "bcrypt-hash" needs a pattern -->
```

Alternative verification using test fixtures (the -run regex is quoted so the shell does not expand it):

```shell
go test -v ./internal/sanitizer -run 'TestSanitize.*PfSense'
```
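The grep check relies on reading its output by eye. A stricter, hypothetical alternative parses the sanitized XML with Go's encoding/xml decoder and reports every element whose name contains one of the grep keywords but whose text is not a [REDACTED-*] placeholder. findUnredacted and findUnredactedString are illustrative names, and sampleSanitized is invented output.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// sensitiveHints copies the keywords from the grep command above.
var sensitiveHints = []string{"hash", "secret", "key", "pass", "community", "token"}

// sampleSanitized is invented sanitizer output with one missed field.
const sampleSanitized = `<pfsense><system><user>
<name>admin</name>
<bcrypt-hash>$2b$10$abcdef</bcrypt-hash>
<password>[REDACTED-PASSWORD]</password>
</user></system></pfsense>`

func hasHint(name string) bool {
	lower := strings.ToLower(name)
	for _, h := range sensitiveHints {
		if strings.Contains(lower, h) {
			return true
		}
	}
	return false
}

// findUnredacted returns "element=value" for each hinted element whose
// non-empty text is not a [REDACTED-*] placeholder.
func findUnredacted(r io.Reader) ([]string, error) {
	dec := xml.NewDecoder(r)
	var stack, leaks []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return leaks, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			stack = append(stack, t.Name.Local)
		case xml.EndElement:
			if len(stack) > 0 {
				stack = stack[:len(stack)-1]
			}
		case xml.CharData:
			text := strings.TrimSpace(string(t))
			if text == "" || len(stack) == 0 || strings.HasPrefix(text, "[REDACTED-") {
				continue
			}
			if name := stack[len(stack)-1]; hasHint(name) {
				leaks = append(leaks, name+"="+text)
			}
		}
	}
}

func findUnredactedString(s string) ([]string, error) {
	return findUnredacted(strings.NewReader(s))
}

func main() {
	leaks, err := findUnredactedString(sampleSanitized)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, l := range leaks {
		fmt.Println("unredacted:", l)
	}
}
```

To check real output, a variant of main could pass os.Stdin to findUnredacted and be fed from opndossier sanitize through a pipe.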

Categories of Credential Fields Across Device Types

Universal Patterns (All Devices)

Private keys: prv, privatekey, private_key
Certificates: crt, cert, certificate (handled by crypto rules)
Generic passwords: password, passwd, pass

OPNsense-Specific

otp_seed — OTP seed values (covered by secret rule)
rocommunity, rwcommunity — SNMP (covered by snmp_community rule)

pfSense-Specific

bcrypt-hash, sha512-hash — User password hashes (requires explicit addition)
auth_pass — Authentication passwords (covered by "pass" pattern)
radius_secret — RADIUS shared secrets (covered by "secret" pattern)

Future Device Considerations

When adding support for devices like Cisco ASA, Juniper SRX, or Fortinet FortiGate:

Cisco: Look for <enable-password>, <secret>, <key-string>, <community-string>
Juniper: Check for <secret>, <encrypted-password>, <pre-shared-key>
Fortinet: Watch for <password>, <psksecret>, <private-key>

Audit each device's XML config schema for keywords: hash, password, secret, key, token, community, psk, auth, credential.

Maintenance Patterns and Best Practices

Pattern Selection Guidelines

Shortest effective substring: Prefer "bcrypt" over "bcrypt-hash" to catch variants (bcrypt_hash, mybcryptpass)
Avoid over-matching: Don't use "bc" (too generic) — balance coverage vs. false positives
Consider exact match exception: If a pattern like "hash" causes false positives on compound names, add it to exactMatchPatterns
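The trade-off between coverage and over-matching can be measured before a pattern is committed. The sketch below scores candidate patterns against sample credential names and sample non-credential names; score is a hypothetical helper, and the negative names are invented for illustration rather than taken from a real schema.

```go
package main

import (
	"fmt"
	"strings"
)

// samplePositives are credential field name variants; sampleNegatives are
// invented non-credential names used only for this illustration.
var (
	samplePositives = []string{"bcrypt-hash", "bcrypt_hash", "mybcryptpass"}
	sampleNegatives = []string{"backup", "bcc", "subcategory"}
)

// score counts positive names a lowercase pattern matches (hits) and negative
// names it wrongly matches (falsePositives), using substring search.
func score(pattern string, positives, negatives []string) (hits, falsePositives int) {
	for _, f := range positives {
		if strings.Contains(strings.ToLower(f), pattern) {
			hits++
		}
	}
	for _, f := range negatives {
		if strings.Contains(strings.ToLower(f), pattern) {
			falsePositives++
		}
	}
	return hits, falsePositives
}

func main() {
	for _, p := range []string{"bcrypt-hash", "bcrypt", "bc"} {
		hits, fps := score(p, samplePositives, sampleNegatives)
		fmt.Printf("%-12s hits=%d/%d falsePositives=%d\n", p, hits, len(samplePositives), fps)
	}
}
```

Here bcrypt catches all three variants with no false positives, while bcrypt-hash misses two variants and bc catches two unrelated names.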

Test Coverage Requirements

When adding patterns, update these test files:

internal/sanitizer/rules_test.go: Add test cases to TestShouldRedactField asserting the new pattern matches expected field names
internal/sanitizer/patterns_test.go: If adding to patterns.go keyword lists, add test cases for any new detector functions
Integration tests: Add device-specific XML fixtures to test end-to-end sanitization

Global Pattern Scanning Gotcha

⚠️ Critical: ShouldRedactField scans ALL rules' FieldPatterns globally. Adding a FieldPattern to any rule can break "should not match" assertions for other rules.

Example conflict:

Adding pattern "id" to cloud_identifier rule
Breaks test asserting "userid" should NOT match password rule
Solution: Review all existing false assertions in rules_test.go before adding broad patterns
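The conflict can be reproduced with a stripped-down global scan. firstMatch and the rule lists below are hypothetical: baseRules holds a subset of the real patterns, and rulesWithID adds the "id" pattern from the example above.

```go
package main

import (
	"fmt"
	"strings"
)

type rule struct {
	name     string
	patterns []string
}

// firstMatch scans every rule in order and returns the name of the first rule
// with a pattern that is a case-insensitive substring of field, or "".
func firstMatch(rules []rule, field string) string {
	f := strings.ToLower(field)
	for _, r := range rules {
		for _, p := range r.patterns {
			if strings.Contains(f, p) {
				return r.name
			}
		}
	}
	return ""
}

var baseRules = []rule{
	{"password", []string{"password", "passwd", "pass", "pwd"}},
	{"secret", []string{"secret", "token"}},
}

// rulesWithID copies baseRules and appends the broad "id" pattern.
var rulesWithID = append(append([]rule{}, baseRules...),
	rule{"cloud_identifier", []string{"id"}})

func main() {
	fmt.Printf("base rules:     %q\n", firstMatch(baseRules, "userid"))
	fmt.Printf("with id rule:   %q\n", firstMatch(rulesWithID, "userid"))
}
```

A "should not match" assertion for userid passes against baseRules but fails as soon as any rule anywhere gains a pattern that is a substring of userid.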

Implementation Checklist

When adding credential field patterns for multi-device support, consult these related references:

Sanitizer Rule Engine — Core architecture for dual-phase detection
Multi-Device Parser Registry — Self-registration pattern in pkg/parser/registry.go
CommonDevice Interface — Device-agnostic abstraction layer that sanitizer operates below
pfSense Schema Implementation — 838-line structural reference
GOTCHAS.md §11.3 — OpenVPN TLS pattern selection rationale and false-positive risks

Relevant Code Files

| File | Purpose |
|---|---|
| `internal/sanitizer/rules.go` | Rule definitions, FieldPatterns arrays, ShouldRedactField logic |
| `internal/sanitizer/patterns.go` | ValueDetector functions, passwordKeywords, LooksLikePassword |
| `internal/sanitizer/sanitizer.go` | XML stream processing, sanitization execution |
| `internal/sanitizer/rules_test.go` | Pattern matching test cases |
| `internal/sanitizer/patterns_test.go` | Value detector test cases |
| `pkg/schema/pfsense/README.md` | pfSense XML structure documentation (838 lines) |
| `docs/development/xml-structure-research.md` | Device schema comparison notes |