Registry Input Normalization Pattern#

Lead Section#

The Registry Input Normalization Pattern is a software engineering pattern that ensures consistent string normalization across all entry points of a registry system to prevent registration/lookup mismatches. The pattern requires every method that accepts string keys (registration, lookup, and alias resolution) to apply identical normalization transformations before any map operation.

This pattern was discovered and formalized during the development of opnDossier's FormatRegistry in PR #434. During a Copilot-assisted code review, a subtle bug was identified: the Register() method normalized format names using strings.TrimSpace(strings.ToLower(...)), but the Get() and Canonical() methods initially only applied strings.ToLower(), omitting the whitespace trimming. This inconsistency meant that format strings with leading or trailing whitespace would register successfully but fail during lookup, creating silent failures in production.

The pattern is particularly critical for string-keyed map-based dispatch systems where keys may originate from multiple sources (configuration files, environment variables, user input, CLI arguments) that might contain inconsistent whitespace or casing. By enforcing identical normalization at all entry points, the pattern eliminates an entire class of hard-to-debug registration/lookup mismatches.

Problem Statement#

The Core Vulnerability#

Go compares string map keys byte for byte, as do most programming languages, so string-keyed registries are case-sensitive and whitespace-sensitive by default. When a registry system accepts string keys through multiple entry points (registration, lookup, alias resolution, validation), any inconsistency in how those strings are normalized creates opportunities for mismatches:

If registration normalizes " Markdown " to "markdown" but lookup searches for " markdown " (only lowercased), the lookup fails despite valid registration
If lookup normalizes but registration does not, a handler registered as "Markdown" cannot be retrieved via Get("markdown")
If alias resolution normalizes differently than registration, aliases fail to resolve correctly

These failures are particularly insidious because they are silent: the registry doesn't crash, it simply returns "not found" errors for keys that logically should exist. Users and developers may not realize the root cause is whitespace or casing rather than an actual missing registration.
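
The first mismatch above can be reproduced in a few lines. This is a minimal sketch, not opnDossier's code: `buggyRegistry` and its string-valued handlers are hypothetical stand-ins chosen to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
)

// buggyRegistry is a hypothetical registry whose write path and read path
// normalize keys differently.
type buggyRegistry struct {
	handlers map[string]string
}

// Register trims and lowercases the key before storing it.
func (r *buggyRegistry) Register(name, handler string) {
	r.handlers[strings.TrimSpace(strings.ToLower(name))] = handler
}

// Get only lowercases the key, so whitespace-padded input never matches.
func (r *buggyRegistry) Get(name string) (string, bool) {
	h, ok := r.handlers[strings.ToLower(name)]
	return h, ok
}

func main() {
	r := &buggyRegistry{handlers: map[string]string{}}
	r.Register(" Markdown ", "markdown-handler") // stored under "markdown"

	_, ok := r.Get("markdown")
	fmt.Println(ok) // true: the lookup key equals the stored key

	_, ok = r.Get(" markdown ")
	fmt.Println(ok) // false: the lookup key still carries the spaces
}
```

The registry never reports an inconsistency; the second lookup simply looks like a missing registration.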

Real-World Trigger Sources#

String keys in registry systems often originate from sources that introduce inconsistent formatting:

Configuration Files: YAML, TOML, or JSON files may contain accidental leading/trailing spaces in string values
Environment Variables: Shell environments may introduce whitespace when variables are set or exported
User Input: CLI arguments, web forms, or API requests may contain mixed casing or whitespace
Code Refactoring: As codebases evolve, different developers may use different conventions for string literals
Third-Party Integrations: External systems may pass format identifiers with unexpected formatting

In opnDossier's case, the FormatRegistry centralizes format dispatch for output formats (markdown, JSON, YAML, text, HTML). Format names flow through multiple code paths: CLI flags, configuration validation, file extension detection, and handler dispatch. Any of these paths could introduce whitespace from user input or configuration parsing.

The Pattern: Identical Normalization at All Entry Points#

Pattern Definition#

The Registry Input Normalization Pattern requires:

Define a canonical normalization function that applies all necessary transformations (e.g., TrimSpace + ToLower)
Apply the function at every write path (registration, including alias registration)
Apply the function at every read path (lookup, alias resolution, validation, existence checks)
Test all entry points with non-canonical inputs (mixed case, leading/trailing spaces, edge cases)

Normalization Function#

In opnDossier's implementation, the normalization is:

key := strings.TrimSpace(strings.ToLower(format))

This applies two transformations in sequence:

strings.ToLower: Converts all Unicode characters to lowercase, ensuring case-insensitive matching
strings.TrimSpace: Removes leading and trailing whitespace (spaces, tabs, newlines)

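The order does not affect the result for ordinary inputs: ToLower(TrimSpace(...)) produces the same key as TrimSpace(ToLower(...)). What matters is that every entry point applies the same composition; opnDossier uses TrimSpace(ToLower(...)) throughout.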
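
One way to keep the composition identical everywhere is to give it a name. The sketch below assumes a helper called `normalizeKey`; that name is hypothetical, and the source only shows the expression written inline.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeKey is a hypothetical helper that wraps the normalization
// expression so every entry point can call the same function.
func normalizeKey(s string) string {
	return strings.TrimSpace(strings.ToLower(s))
}

func main() {
	// Print each raw input next to its normalized key.
	for _, in := range []string{"json", " JSON ", "Json\t", "\nYML"} {
		fmt.Printf("%q -> %q\n", in, normalizeKey(in))
	}
}
```

With a single helper, a change to the normalization rule (for example, adding Unicode normalization) happens in one place instead of at each call site.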

Entry Point Coverage#

In opnDossier's FormatRegistry, the pattern is applied at three entry points:

Method	Purpose	Normalization Applied	Line Reference
`Register(format, handler)`	Registers a format handler at init-time	`strings.TrimSpace(strings.ToLower(format))`	Line 58
`Get(format)`	Retrieves a handler for validation and dispatch	`strings.TrimSpace(strings.ToLower(format))`	Line 100
`Canonical(format)`	Resolves aliases to canonical format names	`strings.TrimSpace(strings.ToLower(format))`	Line 120

Additionally, alias registration within Register() applies the same normalization at line 73, ensuring aliases are stored under normalized keys as well.

Implementation Details: opnDossier's FormatRegistry#

The Bug Discovery#

During PR #434's Copilot-assisted code review, the normalization mismatch was discovered:

Before the fix: Register() applied strings.TrimSpace(strings.ToLower(...)) but Get() and Canonical() only applied strings.ToLower()
Impact: A format name with leading/trailing whitespace (e.g., " json " from a config file) was stored under "json" at registration, but Get(" json ") looked up the lowercased, untrimmed key " json " and failed to find the handler
Fix: Added strings.TrimSpace to both Get() and Canonical() to match Register()'s normalization

This is a textbook example of the pattern violation: partial normalization (only at some entry points) is as dangerous as no normalization, because it creates unpredictable behavior depending on which code path is taken.

FormatRegistry Structure#

The FormatRegistry struct maintains two maps:

type FormatRegistry struct {
    mu sync.RWMutex
    handlers map[string]FormatHandler
    aliases map[string]string
}

handlers: Maps normalized canonical format names to their handler implementations
aliases: Maps normalized alias names to normalized canonical format names
mu: Read-write mutex for thread-safe concurrent access

Both maps store normalized keys only. The normalization function acts as a gatekeeper ensuring no non-normalized strings ever enter the maps.

FormatHandler Interface#

The FormatHandler interface decouples format metadata from generation logic:

type FormatHandler interface {
    FileExtension() string
    Aliases() []string
    Generate(g *HybridGenerator, data *common.CommonDevice, opts Options) (string, error)
    GenerateToWriter(g *HybridGenerator, w io.Writer, data *common.CommonDevice, opts Options) error
}

Each handler declares:

Its canonical file extension (.md, .json, .yaml, .txt, .html)
Alternative names via Aliases() (e.g., ["md"] for markdown, ["yml"] for YAML)
Both in-memory (Generate) and streaming (GenerateToWriter) generation methods

The five built-in handlers are registered in the DefaultRegistry singleton:

var DefaultRegistry = newDefaultRegistry()

func newDefaultRegistry() *FormatRegistry {
    r := NewFormatRegistry()
    r.Register("markdown", &markdownHandler{}) // aliases: ["md"]
    r.Register("json", &jsonHandler{})
    r.Register("yaml", &yamlHandler{}) // aliases: ["yml"]
    r.Register("text", &textHandler{}) // aliases: ["txt"]
    r.Register("html", &htmlHandler{}) // aliases: ["htm"]
    return r
}

Registration: database/sql Driver Pattern#

Register() follows the database/sql driver pattern, panicking on registration errors rather than returning errors:

Panic conditions:

This fail-fast approach ensures that registration errors are caught during initialization (typically in init() functions or at program startup) rather than silently failing at runtime.

Validation-then-commit pattern: Register() validates all aliases before mutating any state, ensuring a panic never leaves the registry in a partially registered state. Only after all validation passes does it commit changes to both the handlers and aliases maps.
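
The validate-then-commit flow can be sketched as follows. This is an illustration under stated assumptions, not the opnDossier source: `Handler` is a reduced stand-in for FormatHandler, `normalizeKey` and `taken` are hypothetical helpers, and the specific panic conditions (empty name, nil handler, duplicate name, conflicting alias) are assumed rather than taken from the original.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Handler is a reduced stand-in for FormatHandler (assumption).
type Handler interface {
	Aliases() []string
}

type Registry struct {
	mu       sync.RWMutex
	handlers map[string]Handler
	aliases  map[string]string
}

func NewRegistry() *Registry {
	return &Registry{handlers: map[string]Handler{}, aliases: map[string]string{}}
}

func normalizeKey(s string) string { return strings.TrimSpace(strings.ToLower(s)) }

// taken reports whether key is already used as a name or an alias.
// Callers must hold r.mu.
func (r *Registry) taken(key string) bool {
	_, isHandler := r.handlers[key]
	_, isAlias := r.aliases[key]
	return isHandler || isAlias
}

// Register panics on invalid input. The conditions checked are assumptions.
func (r *Registry) Register(name string, h Handler) {
	key := normalizeKey(name)
	if key == "" || h == nil {
		panic("registry: empty name or nil handler")
	}
	r.mu.Lock()
	defer r.mu.Unlock()

	// Validation phase: nothing is written to either map here.
	if r.taken(key) {
		panic("registry: duplicate format " + key)
	}
	seen := map[string]bool{key: true}
	normalized := make([]string, 0, len(h.Aliases()))
	for _, a := range h.Aliases() {
		ak := normalizeKey(a)
		if ak == "" || seen[ak] || r.taken(ak) {
			panic("registry: invalid or conflicting alias " + ak)
		}
		seen[ak] = true
		normalized = append(normalized, ak)
	}

	// Commit phase: reached only when every check has passed.
	r.handlers[key] = h
	for _, ak := range normalized {
		r.aliases[ak] = key
	}
}

// stub is a test handler with a fixed alias list.
type stub struct{ aliases []string }

func (s stub) Aliases() []string { return s.aliases }

func main() {
	r := NewRegistry()
	r.Register(" YAML ", stub{aliases: []string{"yml"}})

	func() {
		defer func() { fmt.Println("recovered:", recover()) }()
		// " YML " normalizes to "yml", which is already an alias of yaml.
		r.Register("text", stub{aliases: []string{"txt", " YML "}})
	}()

	_, leaked := r.handlers["text"]
	fmt.Println("text registered after panic:", leaked)
}
```

Because the alias conflict is found before either map is written, the failed registration leaves no "text" handler and no "txt" alias behind.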

Lookup: Two-Step Resolution#

Get() performs a two-step lookup:

First checks the handlers map for canonical format names
Then checks the aliases map and resolves to the canonical handler

If neither lookup succeeds, Get() returns ErrUnsupportedFormat.

Example: Get("yml") → checks handlers["yml"] (not found) → checks aliases["yml"] → resolves to "yaml" → returns handlers["yaml"]
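
The two-step lookup might look like the sketch below. The `Registry` and `Handler` types are simplified stand-ins, and wrapping ErrUnsupportedFormat with the offending input via `fmt.Errorf` is an assumption about error style, not something the source confirms.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// ErrUnsupportedFormat mirrors the sentinel error named in the text.
var ErrUnsupportedFormat = errors.New("unsupported format")

// Handler is a reduced stand-in for FormatHandler (assumption).
type Handler interface {
	FileExtension() string
}

type Registry struct {
	mu       sync.RWMutex
	handlers map[string]Handler
	aliases  map[string]string
}

// Get normalizes the input, then tries canonical names before aliases.
func (r *Registry) Get(format string) (Handler, error) {
	key := strings.TrimSpace(strings.ToLower(format))

	r.mu.RLock()
	defer r.mu.RUnlock()

	// Step 1: the key is itself a canonical name.
	if h, ok := r.handlers[key]; ok {
		return h, nil
	}
	// Step 2: the key is an alias pointing at a canonical name.
	if canonical, ok := r.aliases[key]; ok {
		if h, ok := r.handlers[canonical]; ok {
			return h, nil
		}
	}
	return nil, fmt.Errorf("%w: %q", ErrUnsupportedFormat, format)
}

// ext is a trivial handler used only for this example.
type ext string

func (e ext) FileExtension() string { return string(e) }

func main() {
	r := &Registry{
		handlers: map[string]Handler{"yaml": ext(".yaml")},
		aliases:  map[string]string{"yml": "yaml"},
	}

	h, err := r.Get(" YML ")
	fmt.Println(h.FileExtension(), err)

	_, err = r.Get("xml")
	fmt.Println(errors.Is(err, ErrUnsupportedFormat))
}
```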

Canonicalization: Alias Resolution#

Canonical() resolves aliases to canonical format names and returns the normalized input when no alias matches.

This is a more permissive interface than Get(): it returns the normalized string even for invalid formats, allowing callers to decide how to handle unrecognized inputs. Get() remains the authoritative validation gate.
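
A minimal sketch of that behavior, assuming a registry reduced to its alias map (the real method's body is not shown in the source):

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Registry is reduced to the alias map for this illustration (assumption).
type Registry struct {
	mu      sync.RWMutex
	aliases map[string]string
}

// Canonical returns the canonical name for an alias, or the normalized
// input itself when no alias matches. It never returns an error.
func (r *Registry) Canonical(format string) string {
	key := strings.TrimSpace(strings.ToLower(format))

	r.mu.RLock()
	defer r.mu.RUnlock()

	if canonical, ok := r.aliases[key]; ok {
		return canonical
	}
	return key
}

func main() {
	r := &Registry{aliases: map[string]string{"yml": "yaml", "md": "markdown"}}
	fmt.Println(r.Canonical(" YML ")) // alias resolved
	fmt.Println(r.Canonical(" XML ")) // unknown: normalized input comes back
}
```

Since an unknown format comes back as a plain normalized string, callers that need validation still have to go through Get().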

Test Coverage and Validation#

Comprehensive Test Suite#

PR #434 added 76 registry test cases achieving 100% coverage of the registry module. The test suite explicitly covers normalization edge cases to prevent regression of the original bug.

Critical Test Scenarios#

Tests should validate normalization across all entry points:

Case insensitivity: Get("JSON"), Get("json"), Get("JsOn") all return the same handler
Whitespace tolerance: Get(" json "), Get("json\t"), Get("\njson") all succeed
Alias resolution with normalization: Get(" MD ") resolves to the markdown handler
Canonical resolution with normalization: Canonical(" YML ") returns "yaml"
Registration with non-canonical input: Register(" JSON ", handler) stores under "json"
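
The lookup scenarios above lend themselves to a table-driven check. The sketch below is not the opnDossier test suite: it uses a hypothetical string-valued `registry` and collects failures in a slice so it runs as a plain program; in a real `_test.go` file each row would typically become a `t.Run` subtest.

```go
package main

import (
	"fmt"
	"strings"
)

// registry is a simplified stand-in: handlers are plain strings (assumption).
type registry struct {
	handlers map[string]string // canonical name -> handler ID
	aliases  map[string]string // alias -> canonical name
}

func normalizeKey(s string) string { return strings.TrimSpace(strings.ToLower(s)) }

func (r *registry) Canonical(name string) string {
	key := normalizeKey(name)
	if c, ok := r.aliases[key]; ok {
		return c
	}
	return key
}

func (r *registry) Get(name string) (string, bool) {
	h, ok := r.handlers[r.Canonical(name)]
	return h, ok
}

func newTestRegistry() *registry {
	return &registry{
		handlers: map[string]string{
			"json":     "json-handler",
			"markdown": "markdown-handler",
			"yaml":     "yaml-handler",
		},
		aliases: map[string]string{"md": "markdown", "yml": "yaml"},
	}
}

// cases pairs each non-canonical input with the canonical name it must reach.
var cases = []struct{ input, canonical string }{
	{"JSON", "json"}, {"json", "json"}, {"JsOn", "json"},
	{" json ", "json"}, {"json\t", "json"}, {"\njson", "json"},
	{" MD ", "markdown"}, {" YML ", "yaml"},
}

// runCases exercises both read paths and returns a message per failure.
func runCases(r *registry) []string {
	var failures []string
	for _, c := range cases {
		if got := r.Canonical(c.input); got != c.canonical {
			failures = append(failures,
				fmt.Sprintf("Canonical(%q) = %q, want %q", c.input, got, c.canonical))
		}
		want := r.handlers[c.canonical]
		if got, ok := r.Get(c.input); !ok || got != want {
			failures = append(failures,
				fmt.Sprintf("Get(%q) = %q, %v; want %q", c.input, got, ok, want))
		}
	}
	return failures
}

func main() {
	failures := runCases(newTestRegistry())
	for _, f := range failures {
		fmt.Println("FAIL:", f)
	}
	fmt.Println("failures:", len(failures))
}
```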

The test suite confirms that aliases resolve to the same handler instance as their canonical name, validating pointer equality:

mdHandler, _ := reg.Get("md")
markdownHandler, _ := reg.Get("markdown")
if mdHandler != markdownHandler {
    t.Error("Alias should resolve to same handler instance")
}

Regression Guard#

The test suite serves as a regression guard for the normalization pattern. If a future refactoring removes TrimSpace from any entry point, tests with whitespace-padded inputs will fail immediately.

Cross-Package Integration#

After PR #434, the FormatRegistry replaced scattered format constants and switch statements across 8+ locations in the opnDossier codebase. Key integration points include:

CLI Validation and Completion#

cmd/convert.go uses DefaultRegistry.Get() for format validation, ensuring invalid formats are caught before processing begins. Shell completion in cmd/shared_flags.go derives format names from DefaultRegistry.ValidFormats(), ensuring CLI autocomplete matches valid registry entries.

Configuration Validation#

internal/converter/options.go delegates format validation to the registry: the Format.Validate() method calls DefaultRegistry.Get() to verify format strings from configuration files.

Processor Alias Resolution#

internal/processor/processor.go uses DefaultRegistry.Canonical() for alias resolution, ensuring aliases like md, yml, txt, and htm work consistently across all code paths.

Handler Dispatch#

internal/converter/hybrid_generator.go uses the registry for handler dispatch, replacing hardcoded switch statements with DefaultRegistry.Get() calls.

All of these integration points benefit from the normalization pattern: format strings from user input, config files, or CLI arguments are normalized consistently, preventing silent failures.

Generalization and Best Practices#

When to Apply This Pattern#

The Registry Input Normalization Pattern is essential when:

String-keyed registries or maps are used for dispatch, lookup, or validation
Multiple entry points accept string keys (registration, lookup, validation, etc.)
String keys originate from external sources (user input, config files, environment variables)
Case sensitivity or whitespace differences are not semantically meaningful

Pattern Checklist#

To implement the pattern correctly:

Define a single canonical normalization function (e.g., normalize(s) = TrimSpace(ToLower(s)))
Identify all entry points that accept string keys
Apply the normalization function at every entry point before map operations
Store only normalized keys in maps; never store raw input
Write tests for each entry point with mixed-case, whitespace-padded, and edge-case inputs
Document the normalization function and pattern in code comments and architecture docs
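
One possible way to enforce the "store only normalized keys" item is to make the map key a distinct type that can only be built through the normalization function. This is a design sketch, not something the source describes; `key`, `newKey`, and the generic `Registry[V]` are all hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// key wraps a normalized string. Using a struct rather than a named string
// type means a raw string literal cannot be used as a map key by accident.
type key struct{ s string }

// newKey is the single place where normalization happens.
func newKey(raw string) key {
	return key{s: strings.TrimSpace(strings.ToLower(raw))}
}

// Registry is a generic string-keyed registry whose map accepts only key values.
type Registry[V any] struct {
	entries map[key]V
}

func NewRegistry[V any]() *Registry[V] {
	return &Registry[V]{entries: make(map[key]V)}
}

// Register stores v under the normalized form of name.
func (r *Registry[V]) Register(name string, v V) {
	r.entries[newKey(name)] = v
}

// Get looks v up under the normalized form of name.
func (r *Registry[V]) Get(name string) (V, bool) {
	v, ok := r.entries[newKey(name)]
	return v, ok
}

func main() {
	r := NewRegistry[int]()
	r.Register(" JSON ", 1)

	v, ok := r.Get("json\n")
	fmt.Println(v, ok)
}
```

Code in the same package can still construct a `key` directly, so the guarantee is strongest when the type lives in its own package with `newKey` as the only exported constructor.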

Anti-Patterns to Avoid#

❌ Partial normalization: Normalizing at some entry points but not others
❌ Inconsistent normalization: Using different normalization logic at different entry points
❌ Late normalization: Storing raw keys and normalizing only during lookup
❌ Silent fallbacks: Returning defaults or empty results instead of errors for non-normalized mismatches
❌ Untested edge cases: Failing to test whitespace, mixed case, and Unicode edge cases

Performance Considerations#

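Normalization adds a small performance overhead: strings.ToLower may allocate a new string when the input contains uppercase characters, while strings.TrimSpace returns a substring of its input without copying. However:

For init-time registration, this is negligible
For lookup paths, the cost of normalizing a short key is typically small compared with the work that follows the lookup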
Caching normalized keys at the call site can eliminate repeated normalization

In opnDossier's implementation, normalization happens once per Get() call. Since format dispatch typically occurs once per report generation (not in tight loops), the overhead is acceptable.

Related Patterns and Concepts#

Registry Pattern#

The Registry pattern, described in Martin Fowler's Patterns of Enterprise Application Architecture and closely related to Service Locator, provides a centralized directory for looking up services or handlers by name. The Input Normalization Pattern is a refinement that addresses string-key consistency in registry implementations.

Canonical Key Pattern#

The Canonical Key Pattern generalizes beyond registries: any map-based lookup system should define a canonical key representation and normalize all inputs to that representation. Examples include:

URL routing (normalizing paths with/without trailing slashes)
HTTP header maps (case-insensitive header names)
Database table/column name lookups (case sensitivity varies by DBMS)
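
Go's standard library applies the same idea to HTTP headers: `http.Header` canonicalizes the name in both `Set` and `Get`, so the write path and the read path agree on the key. The short program below illustrates this; `headerRoundTrip` is a hypothetical helper written only for the example.

```go
package main

import (
	"fmt"
	"net/http"
)

// headerRoundTrip stores value under setKey and reads it back via getKey.
// Both Set and Get canonicalize the header name before touching the map.
func headerRoundTrip(setKey, getKey, value string) string {
	h := http.Header{}
	h.Set(setKey, value)
	return h.Get(getKey)
}

// canonical exposes the canonical form used as the stored map key.
func canonical(name string) string {
	return http.CanonicalHeaderKey(name)
}

func main() {
	fmt.Println(headerRoundTrip("content-type", "CONTENT-TYPE", "application/json"))
	fmt.Println(canonical("x-request-id"))
}
```

Indexing the underlying map directly with a non-canonical name bypasses this normalization, which is the same class of mismatch the pattern guards against.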

Fail-Fast Initialization (database/sql Pattern)#

The database/sql driver pattern panics on duplicate registration during initialization rather than returning errors. This ensures misconfiguration is caught at program startup, not during production traffic. opnDossier's Register() follows this pattern.

Validation-Then-Commit Pattern#

opnDossier's Register() validates all aliases before mutating any state, ensuring panics never leave the registry partially registered. This is a variant of transaction semantics: either the entire registration succeeds or nothing is committed.

Relevant Code Files#

File Path	Description	Key Lines
`internal/converter/registry.go`	FormatRegistry implementation, FormatHandler interface, normalization pattern	Register: 50-92, Get: 96-111, Canonical: 116-131
`internal/converter/registry_test.go`	76 test cases with 100% coverage, normalization edge cases	Lines 1-500+
`internal/converter/options.go`	Format validation delegating to DefaultRegistry.Get()	Lines 8-35
`internal/converter/hybrid_generator.go`	Handler dispatch via registry	N/A
`cmd/convert.go`	CLI format validation using DefaultRegistry.Get()	N/A
`cmd/shared_flags.go`	Shell completion deriving from DefaultRegistry.ValidFormats()	N/A
`internal/processor/processor.go`	Alias resolution via DefaultRegistry.Canonical()	N/A
