Registry Input Normalization Pattern#
Lead Section#
The Registry Input Normalization Pattern is a software engineering pattern that ensures consistent string normalization across all entry points of a registry system to prevent registration/lookup mismatches. The pattern requires that every method that accepts string keys—including registration, lookup, and alias resolution—applies identical normalization transformations before any map operations.
This pattern was discovered and formalized during the development of opnDossier's FormatRegistry in PR #434. During a Copilot-assisted code review, a subtle bug was identified: the Register() method normalized format names using strings.TrimSpace(strings.ToLower(...)), but the Get() and Canonical() methods initially only applied strings.ToLower(), omitting the whitespace trimming. This inconsistency meant that format strings with leading or trailing whitespace would register successfully but fail during lookup, creating silent failures in production.
The pattern is particularly critical for string-keyed map-based dispatch systems where keys may originate from multiple sources (configuration files, environment variables, user input, CLI arguments) that might contain inconsistent whitespace or casing. By enforcing identical normalization at all entry points, the pattern eliminates an entire class of hard-to-debug registration/lookup mismatches.
Problem Statement#
The Core Vulnerability#
String-keyed registries are inherently case-sensitive and whitespace-sensitive in most programming languages, including Go. When a registry system accepts string keys through multiple entry points (registration, lookup, alias resolution, validation), any inconsistency in how those strings are normalized creates opportunities for mismatches:
- If registration normalizes " Markdown " to "markdown" but lookup searches for " markdown " (only lowercased), the lookup fails despite valid registration
- If lookup normalizes but registration does not, a handler registered as "Markdown" cannot be retrieved via Get("markdown")
- If alias resolution normalizes differently than registration, aliases fail to resolve correctly
These failures are particularly insidious because they are silent: the registry doesn't crash, it simply returns "not found" errors for keys that logically should exist. Users and developers may not realize the root cause is whitespace or casing rather than an actual missing registration.
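The first failure mode can be reproduced with a plain map. The following is a minimal sketch, not opnDossier code: the names register, lookupLowerOnly, and lookupNormalized are hypothetical, and the handler is reduced to a string.

```go
package main

import (
	"fmt"
	"strings"
)

// handlers stands in for a registry's internal map (hypothetical).
var handlers = map[string]string{}

// register normalizes with both ToLower and TrimSpace before storing.
func register(name, handler string) {
	handlers[strings.TrimSpace(strings.ToLower(name))] = handler
}

// lookupLowerOnly reproduces the mismatch: it lowercases but does not trim.
func lookupLowerOnly(name string) (string, bool) {
	h, ok := handlers[strings.ToLower(name)]
	return h, ok
}

// lookupNormalized applies the same normalization as register.
func lookupNormalized(name string) (string, bool) {
	h, ok := handlers[strings.TrimSpace(strings.ToLower(name))]
	return h, ok
}

func main() {
	register(" Markdown ", "markdown-handler")

	_, ok := lookupLowerOnly(" Markdown ")
	fmt.Println("lower-only lookup found:", ok) // false: the key " markdown " is not in the map

	_, ok = lookupNormalized(" Markdown ")
	fmt.Println("normalized lookup found:", ok) // true: the key "markdown" matches
}
```

Neither call panics or logs anything, which is why the mismatch tends to surface only as an unexplained "not found" result.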
Real-World Trigger Sources#
String keys in registry systems often originate from sources that introduce inconsistent formatting:
- Configuration Files: YAML, TOML, or JSON files may contain accidental leading/trailing spaces in string values
- Environment Variables: Shell environments may introduce whitespace when variables are set or exported
- User Input: CLI arguments, web forms, or API requests may contain mixed casing or whitespace
- Code Refactoring: As codebases evolve, different developers may use different conventions for string literals
- Third-Party Integrations: External systems may pass format identifiers with unexpected formatting
In opnDossier's case, the FormatRegistry centralizes format dispatch for output formats (markdown, JSON, YAML, text, HTML). Format names flow through multiple code paths: CLI flags, configuration validation, file extension detection, and handler dispatch. Any of these paths could introduce whitespace from user input or configuration parsing.
The Pattern: Identical Normalization at All Entry Points#
Pattern Definition#
The Registry Input Normalization Pattern requires:
- Define a canonical normalization function that applies all necessary transformations (e.g., TrimSpace + ToLower)
- Apply the function at every write path (registration, including alias registration)
- Apply the function at every read path (lookup, alias resolution, validation, existence checks)
- Test all entry points with non-canonical inputs (mixed case, leading/trailing spaces, edge cases)
Normalization Function#
In opnDossier's implementation, the normalization is:
```go
key := strings.TrimSpace(strings.ToLower(format))
```
This applies two transformations in sequence:
- strings.ToLower: Maps every Unicode letter to its lower case, so matching is case-insensitive
- strings.TrimSpace: Removes leading and trailing white space as defined by Unicode (spaces, tabs, newlines, and similar characters)
The order does not change the result for ordinary format names: ToLower(TrimSpace(...)) yields the same key. What matters is that every entry point uses the same composition; opnDossier uses TrimSpace(ToLower(...)) throughout.
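One way to keep the composition identical everywhere is to give it a name. The sketch below assumes a helper called normalizeKey (a hypothetical name; the source only shows the inline expression) and compares it against the reversed composition on a few sample inputs.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeKey is a hypothetical helper wrapping the expression shown above.
func normalizeKey(s string) string {
	return strings.TrimSpace(strings.ToLower(s))
}

// normalizeKeyAlt applies the same two steps in the opposite order.
func normalizeKeyAlt(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

func main() {
	inputs := []string{" JSON ", "json\t", "\nMarkdown", "YML", "   "}
	for _, in := range inputs {
		a, b := normalizeKey(in), normalizeKeyAlt(in)
		fmt.Printf("%q -> %q (orders agree: %t)\n", in, a, a == b)
	}
}
```

Note that an all-whitespace input normalizes to the empty string, which a registry would typically want to reject rather than store.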
Entry Point Coverage#
In opnDossier's FormatRegistry, the pattern is applied at three entry points:
| Method | Purpose | Normalization Applied | Line Reference |
|---|---|---|---|
| Register(format, handler) | Registers a format handler at init-time | strings.TrimSpace(strings.ToLower(format)) | Line 58 |
| Get(format) | Retrieves a handler for validation and dispatch | strings.TrimSpace(strings.ToLower(format)) | Line 100 |
| Canonical(format) | Resolves aliases to canonical format names | strings.TrimSpace(strings.ToLower(format)) | Line 120 |
Additionally, alias registration within Register() applies the same normalization at line 73, ensuring aliases are stored under normalized keys as well.
Implementation Details: opnDossier's FormatRegistry#
The Bug Discovery#
During PR #434's Copilot-assisted code review, the normalization mismatch was discovered:
- Before the fix: Register() applied strings.TrimSpace(strings.ToLower(...)) but Get() and Canonical() only applied strings.ToLower()
- Impact: Inputs with leading/trailing whitespace (e.g., " json " from a config file) would register successfully as "json" but Get(" json ") would search for " json " (only lowercased), failing to find the handler
- Fix: Added strings.TrimSpace to both Get() and Canonical() to match Register()'s normalization
This is a textbook example of the pattern violation: partial normalization (only at some entry points) is as dangerous as no normalization, because it creates unpredictable behavior depending on which code path is taken.
FormatRegistry Structure#
The FormatRegistry struct maintains two maps:
```go
type FormatRegistry struct {
	mu       sync.RWMutex
	handlers map[string]FormatHandler
	aliases  map[string]string
}
```
- handlers: Maps normalized canonical format names to their handler implementations
- aliases: Maps normalized alias names to normalized canonical format names
- mu: Read-write mutex for thread-safe concurrent access
Both maps store normalized keys only. The normalization function acts as a gatekeeper ensuring no non-normalized strings ever enter the maps.
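The gatekeeper idea can also be pushed into the type system. The sketch below is an alternative design, not what opnDossier does (its maps are keyed by plain string): a hypothetical formatKey type whose intended constructor, newFormatKey, always normalizes.

```go
package main

import (
	"fmt"
	"strings"
)

// formatKey is a hypothetical type for strings that have already been normalized.
type formatKey string

// newFormatKey is the intended way to turn raw input into a formatKey.
func newFormatKey(raw string) formatKey {
	return formatKey(strings.TrimSpace(strings.ToLower(raw)))
}

// registry keys its map by formatKey instead of string.
type registry struct {
	handlers map[formatKey]string
}

func (r *registry) register(raw, handler string) {
	r.handlers[newFormatKey(raw)] = handler
}

func (r *registry) get(raw string) (string, bool) {
	h, ok := r.handlers[newFormatKey(raw)]
	return h, ok
}

func main() {
	r := &registry{handlers: map[formatKey]string{}}
	r.register(" JSON ", "json-handler")

	// Indexing r.handlers with a variable of type string does not compile,
	// so raw input has to pass through newFormatKey (or an explicit conversion).
	h, ok := r.get("json\n")
	fmt.Println(h, ok) // json-handler true
}
```

The protection is partial: untyped string constants and explicit formatKey(...) conversions still bypass the constructor, so tests on every entry point remain necessary.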
FormatHandler Interface#
The FormatHandler interface decouples format metadata from generation logic:
```go
type FormatHandler interface {
	FileExtension() string
	Aliases() []string
	Generate(g *HybridGenerator, data *common.CommonDevice, opts Options) (string, error)
	GenerateToWriter(g *HybridGenerator, w io.Writer, data *common.CommonDevice, opts Options) error
}
```
Each handler declares:
- Its canonical file extension (.md, .json, .yaml, .txt, .html)
- Alternative names via Aliases() (e.g., ["md"] for markdown, ["yml"] for YAML)
- Both in-memory (Generate) and streaming (GenerateToWriter) generation methods
The five built-in handlers are registered in the DefaultRegistry singleton:
```go
var DefaultRegistry = newDefaultRegistry()

func newDefaultRegistry() *FormatRegistry {
	r := NewFormatRegistry()
	r.Register("markdown", &markdownHandler{}) // aliases: ["md"]
	r.Register("json", &jsonHandler{})
	r.Register("yaml", &yamlHandler{}) // aliases: ["yml"]
	r.Register("text", &textHandler{}) // aliases: ["txt"]
	r.Register("html", &htmlHandler{}) // aliases: ["htm"]
	return r
}
```
Registration: database/sql Driver Pattern#
Register() follows the database/sql driver pattern, panicking on registration errors rather than returning them. It panics in any of the following cases:
- Nil handler
- Empty format name after normalization
- Duplicate canonical format name
- Format name conflicting with existing alias
- Duplicate alias name
- Alias conflicting with canonical format
This fail-fast approach ensures that registration errors are caught during initialization (typically in init() functions or at program startup) rather than silently failing at runtime.
Validation-then-commit pattern: Register() validates all aliases before mutating any state, ensuring a panic never leaves the registry in a partially registered state. Only after all validation passes does it commit changes to both the handlers and aliases maps.
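The validate-then-commit flow might look roughly like the sketch below. It is not the opnDossier implementation: the handler interface is cut down to Aliases(), the mutex is omitted, and the panic messages, staticHandler, and tryRegister are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// handler is a reduced stand-in for FormatHandler.
type handler interface {
	Aliases() []string
}

type staticHandler struct{ aliases []string }

func (s staticHandler) Aliases() []string { return s.aliases }

type registry struct {
	handlers map[string]handler
	aliases  map[string]string
}

func normalizeKey(s string) string {
	return strings.TrimSpace(strings.ToLower(s))
}

// Register validates the name and every alias first, and only then mutates the maps.
func (r *registry) Register(format string, h handler) {
	if h == nil {
		panic("registry: nil handler")
	}
	key := normalizeKey(format)
	if key == "" {
		panic("registry: empty format name")
	}
	if _, dup := r.handlers[key]; dup {
		panic("registry: duplicate format " + key)
	}
	if _, clash := r.aliases[key]; clash {
		panic("registry: format " + key + " conflicts with an existing alias")
	}

	// Validation phase: collect normalized aliases without writing anything.
	pending := make([]string, 0, len(h.Aliases()))
	for _, a := range h.Aliases() {
		alias := normalizeKey(a)
		if _, dup := r.aliases[alias]; dup {
			panic("registry: duplicate alias " + alias)
		}
		if _, clash := r.handlers[alias]; clash {
			panic("registry: alias " + alias + " conflicts with a canonical format")
		}
		pending = append(pending, alias)
	}

	// Commit phase: reached only if every check above passed.
	r.handlers[key] = h
	for _, alias := range pending {
		r.aliases[alias] = key
	}
}

// tryRegister reports whether Register panicked (hypothetical test helper).
func tryRegister(r *registry, format string, h handler) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	r.Register(format, h)
	return false
}

func main() {
	r := &registry{handlers: map[string]handler{}, aliases: map[string]string{}}
	r.Register("yaml", staticHandler{aliases: []string{"yml"}})

	// "md" is fine, but " YML " collides with the alias above, so nothing is committed.
	panicked := tryRegister(r, "markdown", staticHandler{aliases: []string{"md", " YML "}})
	_, mdLeaked := r.aliases["md"]
	fmt.Println(panicked, len(r.handlers), mdLeaked) // true 1 false
}
```

Because the alias check runs on normalized keys, " YML " is detected as a duplicate of "yml" rather than slipping in as a distinct key.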
Lookup: Two-Step Resolution#
Get() performs a two-step lookup:
- First checks the handlers map for canonical format names
- Then checks the aliases map and resolves to the canonical handler
If neither lookup succeeds, Get() returns ErrUnsupportedFormat.
Example: Get("yml") → checks handlers["yml"] (not found) → checks aliases["yml"] → resolves to "yaml" → returns handlers["yaml"]
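The two-step lookup can be sketched as follows. This is an approximation under stated assumptions: handlers are plain strings, the error message of ErrUnsupportedFormat is made up, and the maps are filled by hand instead of through Register().

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// ErrUnsupportedFormat mirrors the sentinel named above; its message is an assumption.
var ErrUnsupportedFormat = errors.New("unsupported format")

type registry struct {
	mu       sync.RWMutex
	handlers map[string]string // canonical name -> handler (a string in this sketch)
	aliases  map[string]string // alias -> canonical name
}

// Get normalizes the input, then tries the handlers map and the aliases map in turn.
func (r *registry) Get(format string) (string, error) {
	key := strings.TrimSpace(strings.ToLower(format))

	r.mu.RLock()
	defer r.mu.RUnlock()

	// Step 1: canonical names.
	if h, ok := r.handlers[key]; ok {
		return h, nil
	}
	// Step 2: aliases, resolved to the canonical handler.
	if canonical, ok := r.aliases[key]; ok {
		if h, ok := r.handlers[canonical]; ok {
			return h, nil
		}
	}
	return "", ErrUnsupportedFormat
}

func main() {
	r := &registry{
		handlers: map[string]string{"yaml": "yaml-handler"},
		aliases:  map[string]string{"yml": "yaml"},
	}

	h, err := r.Get(" YML ")
	fmt.Println(h, err) // yaml-handler <nil>

	_, err = r.Get("toml")
	fmt.Println(errors.Is(err, ErrUnsupportedFormat)) // true
}
```

Returning a sentinel error lets callers distinguish an unknown format from other failures with errors.Is.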
Canonicalization: Alias Resolution#
Canonical() resolves aliases to canonical format names:
- Returns the input if it's already a canonical name
- Resolves via the aliases map
- Returns the normalized input with ok=false for unrecognized formats
This is a more permissive interface than Get(): it returns the normalized string even for invalid formats, allowing callers to decide how to handle unrecognized inputs. Get() remains the authoritative validation gate.
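A sketch of this behavior, assuming a (string, bool) return signature inferred from the ok=false wording above, with hand-filled maps and no locking:

```go
package main

import (
	"fmt"
	"strings"
)

type registry struct {
	handlers map[string]struct{} // set of canonical names
	aliases  map[string]string   // alias -> canonical name
}

// Canonical returns the canonical name and true, or the normalized input and false.
func (r *registry) Canonical(format string) (string, bool) {
	key := strings.TrimSpace(strings.ToLower(format))
	if _, ok := r.handlers[key]; ok {
		return key, true
	}
	if canonical, ok := r.aliases[key]; ok {
		return canonical, true
	}
	return key, false
}

func main() {
	r := &registry{
		handlers: map[string]struct{}{"yaml": {}, "markdown": {}},
		aliases:  map[string]string{"yml": "yaml", "md": "markdown"},
	}
	for _, in := range []string{" YML ", "Markdown", " TOML "} {
		name, ok := r.Canonical(in)
		fmt.Printf("%q -> %q, ok=%t\n", in, name, ok)
	}
}
```

For " TOML " the caller receives "toml" with ok=false and can decide whether to report an error, fall back to a default, or pass the name on to Get() for authoritative validation.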
Test Coverage and Validation#
Comprehensive Test Suite#
PR #434 added 76 registry test cases achieving 100% coverage of the registry module. The test suite explicitly covers normalization edge cases to prevent regression of the original bug.
Critical Test Scenarios#
Tests should validate normalization across all entry points:
- Case insensitivity: Get("JSON"), Get("json"), Get("JsOn") all return the same handler
- Whitespace tolerance: Get(" json "), Get("json\t"), Get("\njson") all succeed
- Alias resolution with normalization: Get(" MD ") resolves to the markdown handler
- Canonical resolution with normalization: Canonical(" YML ") returns "yaml"
- Registration with non-canonical input: Register(" JSON ", handler) stores under "json"
The test suite confirms that aliases resolve to the same handler instance as their canonical name, validating pointer equality:
```go
mdHandler, _ := reg.Get("md")
markdownHandler, _ := reg.Get("markdown")
// Both interface values must wrap the same handler instance.
if mdHandler != markdownHandler {
	t.Error("Alias should resolve to same handler instance")
}
```
Regression Guard#
The test suite serves as a regression guard for the normalization pattern. If a future refactoring removes TrimSpace from any entry point, tests with whitespace-padded inputs will fail immediately.
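The guard can be illustrated with a table of non-canonical inputs run against a lookup function. The sketch below is not taken from registry_test.go: getFixed, getLowerOnly, and failingInputs are hypothetical names, and it runs as a plain program so that the pre-fix and post-fix behavior can be compared side by side.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// getFunc is the shape of a lookup entry point in this sketch.
type getFunc func(format string) (string, error)

var errUnsupported = errors.New("unsupported format")

var handlers = map[string]string{"json": "json-handler"}

func lookup(key string) (string, error) {
	if h, ok := handlers[key]; ok {
		return h, nil
	}
	return "", errUnsupported
}

// getFixed trims and lowercases, matching registration.
func getFixed(format string) (string, error) {
	return lookup(strings.TrimSpace(strings.ToLower(format)))
}

// getLowerOnly reproduces the pre-fix behavior: no trimming.
func getLowerOnly(format string) (string, error) {
	return lookup(strings.ToLower(format))
}

// failingInputs runs the table and returns every input that did not resolve.
func failingInputs(get getFunc) []string {
	cases := []string{"json", "JSON", "JsOn", " json ", "json\t", "\njson"}
	var failed []string
	for _, in := range cases {
		if h, err := get(in); err != nil || h != "json-handler" {
			failed = append(failed, in)
		}
	}
	return failed
}

func main() {
	fmt.Printf("fixed:      %q\n", failingInputs(getFixed))
	fmt.Printf("lower-only: %q\n", failingInputs(getLowerOnly))
}
```

In a real test file the same table would typically drive t.Run subtests, one per input, so a failure names the exact string that no longer resolves.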
Cross-Package Integration#
After PR #434, the FormatRegistry replaced scattered format constants and switch statements across 8+ locations in the opnDossier codebase. Key integration points include:
CLI Validation and Completion#
cmd/convert.go uses DefaultRegistry.Get() for format validation, ensuring invalid formats are caught before processing begins. Shell completion in cmd/shared_flags.go derives format names from DefaultRegistry.ValidFormats(), ensuring CLI autocomplete matches valid registry entries.
Configuration Validation#
internal/converter/options.go delegates format validation to the registry: the Format.Validate() method calls DefaultRegistry.Get() to verify format strings from configuration files.
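A delegation of this kind might be shaped like the sketch below. Everything here is an assumption made for illustration: Format is taken to be a string-backed type, registryGet stands in for DefaultRegistry.Get(), and the error wording is invented.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errUnsupportedFormat = errors.New("unsupported format")

// registryGet stands in for DefaultRegistry.Get in this sketch.
func registryGet(format string) (string, error) {
	known := map[string]string{"json": "json-handler", "yaml": "yaml-handler"}
	if h, ok := known[strings.TrimSpace(strings.ToLower(format))]; ok {
		return h, nil
	}
	return "", errUnsupportedFormat
}

// Format is assumed to be a string-backed config type.
type Format string

// Validate asks the registry instead of keeping its own list of valid names.
func (f Format) Validate() error {
	if _, err := registryGet(string(f)); err != nil {
		return fmt.Errorf("invalid format %q: %w", string(f), err)
	}
	return nil
}

func main() {
	fmt.Println(Format(" JSON ").Validate()) // <nil>
	fmt.Println(Format("toml").Validate())   // invalid format "toml": unsupported format
}
```

Because validation goes through the registry, a padded value such as " JSON " from a config file is accepted without the validator needing its own normalization logic.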
Processor Alias Resolution#
internal/processor/processor.go uses DefaultRegistry.Canonical() for alias resolution, ensuring aliases like md, yml, txt, and htm work consistently across all code paths.
Handler Dispatch#
internal/converter/hybrid_generator.go uses the registry for handler dispatch, replacing hardcoded switch statements with DefaultRegistry.Get() calls.
All of these integration points benefit from the normalization pattern: format strings from user input, config files, or CLI arguments are normalized consistently, preventing silent failures.
Generalization and Best Practices#
When to Apply This Pattern#
The Registry Input Normalization Pattern is essential when:
- String-keyed registries or maps are used for dispatch, lookup, or validation
- Multiple entry points accept string keys (registration, lookup, validation, etc.)
- String keys originate from external sources (user input, config files, environment variables)
- Case sensitivity or whitespace differences are not semantically meaningful
Pattern Checklist#
To implement the pattern correctly:
- Define a single canonical normalization function (e.g., normalize(s) = TrimSpace(ToLower(s)))
- Identify all entry points that accept string keys
- Apply the normalization function at every entry point before map operations
- Store only normalized keys in maps; never store raw input
- Write tests for each entry point with mixed-case, whitespace-padded, and edge-case inputs
- Document the normalization function and pattern in code comments and architecture docs
Anti-Patterns to Avoid#
❌ Partial normalization: Normalizing at some entry points but not others
❌ Inconsistent normalization: Using different normalization logic at different entry points
❌ Late normalization: Storing raw keys and normalizing only during lookup
❌ Silent fallbacks: Returning defaults or empty results instead of errors for non-normalized mismatches
❌ Untested edge cases: Failing to test whitespace, mixed case, and Unicode edge cases
Performance Considerations#
Normalization adds a small overhead: strings.ToLower scans the string and may allocate a lowercased copy, while strings.TrimSpace returns a substring of its input without copying. However:
- For init-time registration, this is negligible
- For lookup paths, keys are short format names, so the cost is small next to the work the selected handler then does
- Caching normalized keys at the call site can eliminate repeated normalization
In opnDossier's implementation, normalization happens once per Get() call. Since format dispatch typically occurs once per report generation (not in tight loops), the overhead is acceptable.
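The allocation cost of the two calls can be measured directly with testing.AllocsPerRun from the standard library. The sketch below is a rough measurement aid, not a benchmark from opnDossier; trimAllocs, lowerAllocs, and sink are hypothetical names, and exact counts may vary between Go versions.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps results alive so the compiler cannot discard the calls.
var sink string

// trimAllocs reports average heap allocations per strings.TrimSpace call.
func trimAllocs(s string) float64 {
	return testing.AllocsPerRun(1000, func() { sink = strings.TrimSpace(s) })
}

// lowerAllocs reports average heap allocations per strings.ToLower call.
func lowerAllocs(s string) float64 {
	return testing.AllocsPerRun(1000, func() { sink = strings.ToLower(s) })
}

func main() {
	fmt.Println("TrimSpace(\" json \") allocs:", trimAllocs(" json "))
	fmt.Println("ToLower(\"json\") allocs:    ", lowerAllocs("json"))
	fmt.Println("ToLower(\"JSON\") allocs:    ", lowerAllocs("JSON"))
}
```

The expected pattern is that only the call that actually has to change characters allocates, which is why already-canonical keys are the cheapest case on the lookup path.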
Related Patterns and Concepts#
Registry Pattern#
The Registry pattern, described in Martin Fowler's Patterns of Enterprise Application Architecture and closely related to Service Locator, provides a well-known object through which other code looks up services or handlers by name. The Input Normalization Pattern is a refinement that addresses string-key consistency in registry implementations.
Canonical Key Pattern#
The Canonical Key Pattern generalizes beyond registries: any map-based lookup system should define a canonical key representation and normalize all inputs to that representation. Examples include:
- URL routing (normalizing paths with/without trailing slashes)
- HTTP header maps (case-insensitive header names)
- Database table/column name lookups (case sensitivity varies by DBMS)
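Go's standard library applies the same idea to HTTP headers: http.Header canonicalizes keys on both write and read, and http.CanonicalHeaderKey exposes the canonical form. The helpers headerValue and canonical below are hypothetical wrappers added only for the example.

```go
package main

import (
	"fmt"
	"net/http"
)

// canonical returns the canonical form of a header name.
func canonical(key string) string {
	return http.CanonicalHeaderKey(key)
}

// headerValue stores a value under one spelling and reads it back under another.
func headerValue(setKey, getKey, value string) string {
	h := http.Header{}
	h.Set(setKey, value) // Set canonicalizes the key before storing
	return h.Get(getKey) // Get canonicalizes the key before looking it up
}

func main() {
	fmt.Println(canonical("content-type")) // Content-Type
	fmt.Println(headerValue("content-type", "CONTENT-TYPE", "application/json")) // application/json
}
```

Indexing the underlying map directly (h["content-type"]) skips canonicalization, which is the header-map equivalent of bypassing a registry's normalized entry points.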
Fail-Fast Initialization (database/sql Pattern)#
The database/sql driver pattern panics on duplicate registration during initialization rather than returning errors. This ensures misconfiguration is caught at program startup, not during production traffic. opnDossier's Register() follows this pattern.
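The standard library behavior can be observed with a throwaway driver. In the sketch below, stubDriver and registerPanics are hypothetical names; sql.Register itself is the real API and panics when the same name is registered twice or the driver is nil.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubDriver is a do-nothing driver used only to exercise sql.Register.
type stubDriver struct{}

func (stubDriver) Open(name string) (driver.Conn, error) {
	return nil, errors.New("stub driver: not implemented")
}

// registerPanics reports whether sql.Register panicked for the given name.
func registerPanics(name string) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	sql.Register(name, stubDriver{})
	return false
}

func main() {
	fmt.Println(registerPanics("sketch-stub")) // false: first registration succeeds
	fmt.Println(registerPanics("sketch-stub")) // true: duplicate name panics
}
```

Recovering from the panic is done here only to print the result; in the fail-fast style the panic is normally left to abort startup.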
Validation-Then-Commit Pattern#
opnDossier's Register() validates all aliases before mutating any state, ensuring panics never leave the registry partially registered. This is a variant of transaction semantics: either the entire registration succeeds or nothing is committed.
Relevant Code Files#
| File Path | Description | Key Lines |
|---|---|---|
| internal/converter/registry.go | FormatRegistry implementation, FormatHandler interface, normalization pattern | Register: 50-92, Get: 96-111, Canonical: 116-131 |
| internal/converter/registry_test.go | 76 test cases with 100% coverage, normalization edge cases | Lines 1-500+ |
| internal/converter/options.go | Format validation delegating to DefaultRegistry.Get() | Lines 8-35 |
| internal/converter/hybrid_generator.go | Handler dispatch via registry | N/A |
| cmd/convert.go | CLI format validation using DefaultRegistry.Get() | N/A |
| cmd/shared_flags.go | Shell completion deriving from DefaultRegistry.ValidFormats() | N/A |
| internal/processor/processor.go | Alias resolution via DefaultRegistry.Canonical() | N/A |
See Also#
- Refactoring Patterns (opnDossier Knowledge Base) — Registry-Based Dispatch Pattern section
- Technical Deep-Dive: Architecture, Internals & Roadmap — FormatRegistry Pattern section
- Architecture Review Findings — Converter Pattern Consolidation (PR #434)
- PR #434: Introduce FormatRegistry
- AGENTS.md §5.9b — Implementation guidance on the FormatRegistry pattern
- Go database/sql.Register() documentation — Panic-on-duplicate pattern inspiration