Type: Topic
Status: Published
Created: Mar 4, 2026 by Dosu Bot
Updated: Mar 4, 2026 by Dosu Bot

Documentation Orphan Cleanup#

Lead Section#

Documentation Orphan Cleanup is a maintenance pattern for identifying and removing orphaned documentation files: files that exist in a documentation directory but are not referenced in the documentation site's navigation configuration. The pattern emerged in the opnDossier project, where successive documentation restructurings left files in the docs/ directory that the MkDocs configuration did not reference.

Orphaned documentation files accumulate when documentation is reorganized, renamed, or consolidated without removing previous versions. These files consume repository space, can confuse contributors, may appear in search results despite being outdated, and add a maintenance burden through stale information. The cleanup pattern detects orphans by comparing the navigation configuration against filesystem content, classifies each orphan's purpose and value, and then removes, consolidates, or integrates each file into the active documentation structure.

In the opnDossier case, orphaned files included duplicate configuration documentation, a near-identical contributing guide, and a valuable security assurance document that was accidentally omitted from navigation. The pattern also addresses a diagram rendering gotcha: ASCII art diagrams should be converted to Mermaid format so that they render properly in documentation systems that use the pymdownx.superfences extension.

Context: Documentation Reorganization in opnDossier#

Multiple Restructuring Events#

The opnDossier project underwent multiple major documentation reorganizations between August 2025 and February 2026:

  1. Project Migration (PR #43, August 2025): Complete namespace change from opnFocus to opnDossier, affecting all import paths, environment variables, and documentation references
  2. Template System Removal (PR #187, January 2026): Transition from template-based to programmatic documentation generation via MarkdownBuilder
  3. Documentation Accuracy Fix (PR #267, February 2026): Comprehensive rewrite to remove fabricated content, update outdated information, and restructure sections

Each reorganization event created orphaned files through different mechanisms:

  • Directory renames: Moving from docs/dev-guide/ to docs/development/ without removing the old directory
  • File consolidation: Merging multiple documents into unified pages, leaving originals unreferenced
  • Namespace migrations: Updating file paths and names while old versions remained in place
  • Content rewrites: Creating new simplified user-facing guides while comprehensive references remained orphaned

The mkdocs.yml navigation defines 8 main sections, a landing page plus seven sections that together list 31 documentation pages:

  1. Home - Landing page
  2. User Guide (5 pages) - Installation, usage, configuration
  3. Examples (7 pages) - Practical usage scenarios
  4. Data Model & Integration (4 pages) - Model reference and export formats
  5. Development (7 pages) - Architecture, standards, API
  6. Security (3 pages) - Compliance, validation, vulnerability scanning
  7. Reference (4 pages) - API docs, compliance standards
  8. Research & Internal (1 page) - Theme usage

Files not referenced in this structure are considered orphaned, even if they exist on the filesystem.
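The per-section page counts above can be checked mechanically once the nav has been parsed. The following is a minimal sketch, assuming the nav is already loaded as nested Python lists and dicts; `count_nav_pages` and the abbreviated `example_nav` are hypothetical names and data used only for illustration.

```python
def count_nav_pages(nav):
    """Count .md pages under each top-level nav section."""

    def count(node):
        # A leaf is a path string; sections are lists of one-key dicts.
        if isinstance(node, str):
            return 1 if node.endswith(".md") else 0
        if isinstance(node, list):
            return sum(count(item) for item in node)
        if isinstance(node, dict):
            return sum(count(value) for value in node.values())
        return 0

    totals = {}
    for entry in nav:
        if isinstance(entry, dict):
            for title, value in entry.items():
                totals[title] = count(value)
        else:
            # Bare path entry with no title, e.g. "- index.md"
            totals[str(entry)] = count(entry)
    return totals


# Hypothetical, abbreviated nav (not the real opnDossier configuration)
example_nav = [
    {"Home": "index.md"},
    {"Security": [
        {"Compliance": "security/compliance.md"},
        {"Validation": "security/validation.md"},
    ]},
]
totals = count_nav_pages(example_nav)
print(totals, sum(totals.values()))
```

Comparing the grand total against the number of Markdown files on disk gives a quick first signal that orphans exist before running a full comparison.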

Orphaned Files Identified#

Root-Level Configuration File#

File: docs/configuration.md (629 lines, created January 30, 2026)

Status: Orphaned comprehensive reference document

Navigation Reference: Not in mkdocs.yml navigation

Content: Comprehensive configuration reference with detailed tables for all environment variables, CLI flags, common patterns, and troubleshooting

Assessment: This appears to be legacy content superseded by docs/user-guide/configuration.md (216 lines, updated February 19, 2026). The user guide version is simplified and more accessible, while the root-level version is an exhaustive reference. The February 19 documentation rewrite created the simpler version but left the comprehensive original in place.

Recommendation: Evaluate whether comprehensive reference content should be preserved by moving it to docs/user-guide/configuration-reference.md (which is referenced in navigation but doesn't exist) or removed as truly deprecated.

Dev-Guide Directory#

Directory: docs/dev-guide/

Files: contributing.md (215 lines)

Status: Near-duplicate of content in docs/development/

Navigation Reference: Not in mkdocs.yml; only docs/development/contributing.md is referenced

Content Comparison: The two contributing.md files are 99% identical (created February 28 and March 1, 2026), differing only in:

  • Relative path references (line 76): ../development/standards.md vs standards.md
  • One line about emoji usage policy (line 192) that is missing from the development version
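A similarity figure and a list of differing lines like the ones above can be produced with Python's standard difflib module. This is a minimal sketch, not the method used in the original audit; `compare_docs` is a hypothetical helper and the two sample texts are invented stand-ins for the real files.

```python
import difflib


def compare_docs(text_a, text_b, name_a="a", name_b="b"):
    """Return (line-level similarity ratio, unified diff lines)."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    # autojunk=False keeps frequent lines (e.g. blanks) from being ignored
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    diff = list(difflib.unified_diff(
        lines_a, lines_b, fromfile=name_a, tofile=name_b, lineterm="", n=0))
    return matcher.ratio(), diff


# Invented sample content standing in for the two contributing guides
old = "# Contributing\nSee ../development/standards.md\nAvoid emoji.\n"
new = "# Contributing\nSee standards.md\n"
ratio, diff = compare_docs(
    old, new, "dev-guide/contributing.md", "development/contributing.md")
print(f"similarity: {ratio:.0%}")
print("\n".join(diff))
```

A high ratio with only path-related differences is a strong hint that one copy can be deleted; any line unique to the copy being removed should be reviewed first.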

Assessment: Clear duplication resulting from directory consolidation. The docs/dev-guide/ directory appears to be orphaned content from a previous organization scheme.

Recommendation: Delete docs/dev-guide/contributing.md as the canonical version is docs/development/contributing.md.

Security Assurance Document#

File: docs/security/security-assurance.md (151 lines)

Status: Valuable content accidentally omitted from navigation

Navigation Reference: Not in mkdocs.yml Security section, but linked from root SECURITY.md

Content: Formal security assurance case following NIST IR 7608 model, including:

  • 7 explicit security requirements (SR-1 through SR-7)
  • Threat model with 3 actor profiles and 7 attack vectors
  • Trust boundaries diagram (ASCII art, lines 47-76)
  • Saltzer & Schroeder security principles with implementation mappings
  • CWE/SANS Top 25 countermeasures with Go-specific details
  • Supply chain security measures

Assessment: This is unique, valuable content NOT duplicated elsewhere. It serves a distinct audience (security architects, auditors, formal reviewers) compared to operational security policy in SECURITY.md or user-facing validation documentation in docs/security/validation.md.

Recommendation: Add to mkdocs.yml navigation under Security section AND convert ASCII diagram to Mermaid format for proper rendering.

Templates vs Data Model Directory#

Context: The background mentions docs/templates/ containing duplicates of docs/data-model/ files.

Finding: The docs/data-model/ directory does not exist in the current repository. Data model documentation is organized under docs/templates/:

  • model-reference.md
  • index.md
  • examples/json-export.md
  • examples/yaml-processing.md

Navigation Reference: All these files are properly referenced in mkdocs.yml under "Data Model & Integration" section.

Assessment: This appears to be a documentation structure decision rather than an orphan scenario. The namespace "templates" is intentional for model/export documentation.

Detection Pattern#

Algorithmic Approach#

The detection algorithm compares navigation configuration against filesystem content:

from pathlib import Path

import yaml


class NavLoader(yaml.SafeLoader):
    """SafeLoader that tolerates python-specific tags in mkdocs.yml."""


# mkdocs.yml may contain tags such as !!python/name (used by the
# pymdownx.superfences config), which plain yaml.safe_load rejects.
NavLoader.add_multi_constructor(
    'tag:yaml.org,2002:python/', lambda loader, suffix, node: None)


def find_orphaned_docs(mkdocs_path='mkdocs.yml', docs_dir='docs'):
    """Identify documentation files not referenced in MkDocs navigation."""

    # Load mkdocs.yml configuration
    with open(mkdocs_path, 'r', encoding='utf-8') as f:
        config = yaml.load(f, Loader=NavLoader)

    # Extract all nav entries recursively
    referenced_files = set()

    def extract_nav_files(nav):
        """Recursively extract file paths from navigation structure."""
        if isinstance(nav, str):
            # Covers bare entries ("- index.md") and titled ones ("- Home: index.md")
            if nav.endswith('.md'):
                referenced_files.add(nav)
        elif isinstance(nav, list):
            for item in nav:
                extract_nav_files(item)
        elif isinstance(nav, dict):
            for value in nav.values():
                extract_nav_files(value)

    extract_nav_files(config.get('nav', []))

    # Find all markdown files in docs/ directory, as POSIX-style
    # paths relative to docs/ (the form used in mkdocs.yml)
    docs_path = Path(docs_dir)
    all_docs = {
        md_file.relative_to(docs_path).as_posix()
        for md_file in docs_path.rglob('*.md')
    }

    # Identify orphans
    orphaned = all_docs - referenced_files

    return {
        'referenced': sorted(referenced_files),
        'all_files': sorted(all_docs),
        'orphaned': sorted(orphaned)
    }

# Usage
results = find_orphaned_docs()
print(f"Total documentation files: {len(results['all_files'])}")
print(f"Referenced in navigation: {len(results['referenced'])}")
print(f"Orphaned files: {len(results['orphaned'])}")
print("\nOrphaned files:")
for orphan in results['orphaned']:
    print(f" - {orphan}")

Manual Verification Steps#

Automated detection may produce false positives. Manual verification should check:

  1. Indirect References: Is the file linked from other documentation pages even if not in navigation?
  2. Special Purpose Files: Is this a README, template, or include file used by other docs?
  3. External References: Is the file linked from repository root files (README.md, SECURITY.md, CONTRIBUTING.md)?
  4. Build Artifacts: Is this file generated during build rather than source content?

For example, docs/security/security-assurance.md is orphaned from MkDocs navigation but linked from root SECURITY.md, indicating it has value beyond navigation structure.
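The indirect and external reference checks can be partly automated by scanning Markdown files for links that resolve to the candidate orphan. The sketch below is one possible approach under simple assumptions: it only recognizes inline `[text](target)` links, and `find_inbound_links` is a hypothetical helper, not part of any existing tool.

```python
import re
from pathlib import Path

# Inline Markdown link target, stopping at ")", whitespace, or a "#" anchor
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s#]+)")


def find_inbound_links(target, repo_root="."):
    """List Markdown files under repo_root that link to target.

    target is a path relative to repo_root, e.g.
    "docs/security/security-assurance.md".
    """
    root = Path(repo_root).resolve()
    target_path = (root / target).resolve()
    referrers = []
    for md_file in sorted(root.rglob("*.md")):
        if md_file == target_path:
            continue
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for href in LINK_RE.findall(text):
            if "://" in href or href.startswith("mailto:"):
                continue  # external link, cannot point at a local file
            # Relative links resolve against the linking file's directory
            if (md_file.parent / href).resolve() == target_path:
                referrers.append(md_file.relative_to(root).as_posix())
                break
    return referrers
```

A non-empty result means the file is still reachable by readers even though it is missing from the navigation, which argues for integrating it rather than deleting it.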

Diagram Rendering Gotcha: ASCII Art vs Mermaid#

The Problem#

Modern documentation systems using MkDocs Material with the pymdownx.superfences extension are optimized for diagram-as-code formats like Mermaid, not ASCII art. ASCII diagrams face several rendering issues:

  1. Font Dependency: Require monospace fonts with consistent character widths
  2. Responsive Layout: Break on mobile devices or narrow viewports
  3. Accessibility: Screen readers cannot interpret ASCII art semantically
  4. Version Control: Large text-based diagrams create noisy diffs
  5. Maintenance: Manual spacing adjustments for any content changes

Example: ASCII Art Trust Boundary Diagram#

The security assurance document contains an ASCII trust boundary diagram:

+------------------------------------------------------------------+
|                            Untrusted                             |
|  +------------------+    +-------------------+                   |
|  |  config.xml      |    |  CLI Arguments    |                   |
|  |  (any content)   |    |  (paths, flags)   |                   |
|  +--------+---------+    +--------+----------+                   |
|           |                       |                              |
+-----------+-----------------------+------------------------------+
            |                       |
   =========|=======================|============ Trust Boundary ====
            |                       |
+-----------v-----------------------v------------------------------+
|                            opnDossier                            |
|  +----------------+  +----------------+  +--------------+        |
|  |  XML Parser    |  | Schema Mapping |  |  Report Gen  |        |

This 30-line ASCII diagram requires careful character alignment and breaks if fonts change.

Mermaid Conversion#

opnDossier's mkdocs.yml configures Mermaid support via pymdownx.superfences:

markdown_extensions:
  - pymdownx.superfences:
      custom_fences:
        - name: mermaid
          class: mermaid
          format: !!python/name:pymdownx.superfences.fence_code_format

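The same trust boundary diagram can be expressed as a Mermaid flowchart, with the untrusted inputs and the opnDossier components as nodes grouped into subgraphs on either side of the trust boundary.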
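As an illustration of what such a conversion might produce, the following sketch generates Mermaid flowchart source from the node labels visible in the ASCII excerpt. It is not the diagram from the opnDossier repository: `mermaid_trust_boundary` is a hypothetical helper, and drawing each untrusted input as an edge into the opnDossier subgraph is an assumption about how the boundary crossing is best shown.

```python
def mermaid_trust_boundary(untrusted, trusted, trusted_label="opnDossier"):
    """Build Mermaid flowchart source with one subgraph per trust zone."""
    lines = ["flowchart TD", '    subgraph untrusted["Untrusted"]']
    for i, label in enumerate(untrusted):
        lines.append(f'        u{i}["{label}"]')
    lines.append("    end")
    lines.append(f'    subgraph trusted["{trusted_label}"]')
    for i, label in enumerate(trusted):
        lines.append(f'        t{i}["{label}"]')
    lines.append("    end")
    # Assumption: every untrusted input crosses the boundary into the tool
    for i in range(len(untrusted)):
        lines.append(f"    u{i} -->|crosses trust boundary| trusted")
    return "\n".join(lines)


source = mermaid_trust_boundary(
    untrusted=["config.xml (any content)", "CLI Arguments (paths, flags)"],
    trusted=["XML Parser", "Schema Mapping", "Report Gen"],
)
print(source)
```

The printed text can be pasted into a fenced block named mermaid in the Markdown source, which the custom fence configured above hands to Mermaid for rendering.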

Mermaid Advantages#

The opnDossier architecture documentation (docs/development/architecture.md) demonstrates extensive Mermaid usage with multiple diagram types.

Mermaid provides:

  1. SVG Rendering: Scales cleanly at any resolution and viewport width
  2. Semantic Structure: Labels remain real text, and diagrams can carry an accessible title and description for screen readers
  3. Automatic Layout: Engine handles node positioning and routing
  4. Version Control: Plain text diffs show logical changes
  5. Multiple Diagram Types: Flowcharts, sequence, class, state, ER diagrams
  6. Styling Support: Custom colors, fonts, and themes

Cleanup Procedures#

Phase 1: Audit and Classification#

Objective: Document all orphaned files with context

For each orphaned file:

  1. Record Metadata:

    • File path and size
    • Creation/modification dates from git history
    • Last commit author and message
  2. Analyze Content:

    • What does this file document?
    • Is content duplicated elsewhere?
    • What audience does it serve?
  3. Check References:

    • Search codebase for file references
    • Check root repository files (README, CONTRIBUTING, SECURITY)
    • Look for links in other documentation
  4. Classify:

    • Delete: Duplicates or outdated content
    • Integrate: Valuable content to add to navigation
    • Consolidate: Merge with existing documentation
    • Review: Uncertain cases requiring maintainer decision
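The metadata step can be scripted against git history. The sketch below assumes git is on the PATH and the script runs inside the repository; `parse_log_line`, `last_commit`, and `added_on` are hypothetical helper names, while the format placeholders (%an, %ad, %s, %x09) are standard git log options.

```python
import subprocess

FIELD_SEP = "\t"  # matches %x09 in the git format string below


def parse_log_line(line):
    """Split one 'author<TAB>date<TAB>subject' line into a dict."""
    author, date, subject = line.rstrip("\n").split(FIELD_SEP, 2)
    return {"author": author, "date": date, "subject": subject}


def _git(args, repo_root):
    """Run a git command and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], cwd=repo_root,
        capture_output=True, text=True, check=True)
    return result.stdout


def last_commit(path, repo_root="."):
    """Author, date, and subject of the most recent commit touching path."""
    out = _git(["log", "-1", "--date=short",
                "--format=%an%x09%ad%x09%s", "--", path], repo_root)
    return parse_log_line(out) if out.strip() else None


def added_on(path, repo_root="."):
    """Date of the commit that first added path (following renames)."""
    out = _git(["log", "--follow", "--diff-filter=A", "--date=short",
                "--format=%ad", "--", path], repo_root)
    dates = out.split()
    return dates[-1] if dates else None
```

Calling `last_commit("docs/configuration.md")` from the repository root would return the fields needed for the audit record, or None if the path has no history.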

Phase 2: Execution#

For Files to Delete:

# Document the reason for removal
git rm docs/dev-guide/contributing.md

# Use conventional commit format
git commit -s -m "docs: remove orphaned contributing guide duplicate

The docs/dev-guide/contributing.md file was a near-duplicate of
docs/development/contributing.md (referenced in mkdocs.yml). Only
the canonical development/ version is needed.

Refs: #issue-number"

For Files to Integrate into Navigation:

  1. Review content for accuracy and completeness
  2. Convert any ASCII diagrams to Mermaid
  3. Update internal links if paths changed
  4. Add entry to appropriate section in mkdocs.yml
  5. Test local documentation build

Example mkdocs.yml addition:

- Security:
    - Compliance: security/compliance.md
    - Validation: security/validation.md
    - Vulnerability Scanning: security/vulnerability-scanning.md
    - Security Assurance: security/security-assurance.md # Added

For Files to Consolidate:

  1. Compare content between duplicate versions
  2. Identify unique sections in each version
  3. Merge into single authoritative document
  4. Update or create navigation entry
  5. Remove superseded versions
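Identifying the sections unique to each version can start from a comparison of headings. This is a rough sketch that assumes ATX-style headings (lines starting with "#") and skips fenced code blocks; `headings` and `unique_sections` are hypothetical helpers, and the sample texts are invented.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
FENCE = "`" * 3  # opening/closing marker of a fenced code block


def headings(markdown_text):
    """Return heading titles in document order, ignoring fenced code."""
    found = []
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            found.append(match.group(2))
    return found


def unique_sections(text_a, text_b):
    """Return (headings only in a, headings only in b)."""
    a, b = headings(text_a), headings(text_b)
    return [h for h in a if h not in b], [h for h in b if h not in a]


# Invented sample content for two versions of a configuration page
reference = "# Configuration\n## Environment Variables\n## Troubleshooting\n"
guide = "# Configuration\n## Environment Variables\n"
only_in_reference, only_in_guide = unique_sections(reference, guide)
print(only_in_reference, only_in_guide)
```

Headings that appear in only one version mark the content that needs a decision before the superseded file is removed.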

Phase 3: Verification#

Build Validation:

# Strict mode turns warnings (such as broken references) into build failures
mkdocs build --strict

# Preview the site locally and watch the console for warnings
mkdocs serve

Navigation Testing:

  1. Verify all nav entries point to existing files
  2. Check no file is referenced twice
  3. Test key navigation paths manually
  4. Verify internal links between docs work
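The first two checks lend themselves to automation. The following sketch assumes the nav has already been parsed into nested lists and dicts; `iter_nav_paths` and `check_nav` are hypothetical helper names.

```python
from collections import Counter
from pathlib import Path


def iter_nav_paths(nav):
    """Yield every .md path in a parsed MkDocs nav structure."""
    if isinstance(nav, str):
        if nav.endswith(".md"):
            yield nav
    elif isinstance(nav, list):
        for item in nav:
            yield from iter_nav_paths(item)
    elif isinstance(nav, dict):
        for value in nav.values():
            yield from iter_nav_paths(value)


def check_nav(nav, docs_dir="docs"):
    """Report nav entries that point nowhere or appear more than once."""
    paths = list(iter_nav_paths(nav))
    docs = Path(docs_dir)
    missing = sorted({p for p in paths if not (docs / p).is_file()})
    duplicates = sorted(p for p, n in Counter(paths).items() if n > 1)
    return {"missing": missing, "duplicates": duplicates}
```

A nav entry such as user-guide/configuration-reference.md that has no file behind it would show up under "missing", the mirror image of an orphaned file.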

Link Checking:

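Lint the Markdown sources and check internal references with a dedicated link checker:

# Lint Markdown style (quote the glob so markdownlint expands it recursively)
markdownlint 'docs/**/*.md'

# Check for broken links (also add this to the CI pipeline)
find docs -name '*.md' -print0 | xargs -0 -n1 markdown-link-check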

Phase 4: Documentation#

Update project documentation standards to prevent future orphans:

  1. Update CONTRIBUTING.md: Document the requirement to update mkdocs.yml when adding/moving/removing docs
  2. Pre-commit Hook: Add check that warns about docs/ files not in mkdocs.yml
  3. CI Validation: Add workflow step to detect orphaned documentation
  4. Documentation Standards: Include navigation sync in review checklist

Prevention Strategies#

Pre-Commit Hook#

Add to .pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
      - id: check-docs-navigation
        name: Check documentation navigation sync
        entry: python scripts/check-docs-nav.py
        language: python
        additional_dependencies: [pyyaml]
        files: '^(docs/.*\.md|mkdocs\.yml)$'
        pass_filenames: false

Script scripts/check-docs-nav.py:

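#!/usr/bin/env python3
"""Pre-commit hook to detect orphaned documentation files."""

import sys
from pathlib import Path

import yaml

class NavLoader(yaml.SafeLoader):
    """SafeLoader that ignores python-specific tags such as !!python/name."""

NavLoader.add_multi_constructor(
    'tag:yaml.org,2002:python/', lambda loader, suffix, node: None)

def nav_files(nav):
    """Yield every .md path referenced in a MkDocs nav structure."""
    if isinstance(nav, str):
        if nav.endswith('.md'):
            yield nav
    elif isinstance(nav, list):
        for item in nav:
            yield from nav_files(item)
    elif isinstance(nav, dict):
        for value in nav.values():
            yield from nav_files(value)

def find_orphaned_docs(mkdocs_path='mkdocs.yml', docs_dir='docs'):
    """Return docs-relative paths of markdown files missing from the nav."""
    with open(mkdocs_path, encoding='utf-8') as f:
        config = yaml.load(f, Loader=NavLoader)
    referenced = set(nav_files(config.get('nav', [])))
    docs_path = Path(docs_dir)
    all_docs = {p.relative_to(docs_path).as_posix() for p in docs_path.rglob('*.md')}
    return sorted(all_docs - referenced)

def main():
    orphaned = find_orphaned_docs()
    if orphaned:
        print("ERROR: Orphaned documentation files detected:")
        for orphan in orphaned:
            print(f" - docs/{orphan}")
        print("\nEither add these files to mkdocs.yml navigation")
        print("or remove them if they are no longer needed.")
        return 1  # Non-zero exit code makes pre-commit fail
    return 0

if __name__ == '__main__':
    sys.exit(main())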

CI/CD Integration#

Add to GitHub Actions workflow:

name: Documentation

on: [push, pull_request]

jobs:
  check-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install mkdocs-material pyyaml

      - name: Check for orphaned docs
        run: python scripts/check-docs-nav.py

      - name: Build documentation
        run: mkdocs build --strict

      - name: Check for broken links
        run: |
          npm install -g markdown-link-check
          # xargs exits non-zero if any file has a broken link, failing the step
          find docs -name '*.md' -print0 | xargs -0 -n1 markdown-link-check

Development Workflow#

When Adding Documentation:

  1. Create markdown file in appropriate docs/ subdirectory
  2. Add entry to mkdocs.yml navigation in same commit
  3. Run mkdocs serve locally to verify rendering
  4. Include documentation changes in PR description

When Moving/Renaming Documentation:

  1. Update file path in docs/ directory
  2. Update mkdocs.yml navigation references
  3. Search for and update any internal links to the file
  4. Commit file move and navigation update together

When Removing Documentation:

  1. Remove file from docs/ directory
  2. Remove entry from mkdocs.yml navigation
  3. Search for and remove/update any references to the file
  4. Document removal reason in commit message

Documentation Structure Patterns#

  • Directory-per-Section: Organize by user journey (user-guide/, development/, security/)
  • Progressive Disclosure: Simple guides with links to comprehensive references
  • Navigation Hierarchy: Match navigation structure to mental models, not filesystem structure

MkDocs Material Features#

The opnDossier mkdocs.yml makes extensive use of Material theme features.

Automation Tools#

Relevant Code Files#

File Path | Purpose | Lines
mkdocs.yml | MkDocs configuration and navigation structure | 133
docs/configuration.md | Orphaned comprehensive configuration reference | 629
docs/user-guide/configuration.md | Active simplified configuration guide | 216
docs/dev-guide/contributing.md | Orphaned duplicate contributing guide | 215
docs/development/contributing.md | Active canonical contributing guide | 214
docs/security/security-assurance.md | Orphaned valuable security assurance case (needs integration) | 151
docs/development/architecture.md | Examples of Mermaid diagram usage | 450+
.pre-commit-config.yaml | Pre-commit hooks for documentation linting | 84

  • Documentation Standards: opnDossier's comprehensive documentation guidelines
  • MkDocs Material Documentation: Theme and extension documentation
  • PyMdown Extensions: Markdown extensions including superfences for Mermaid
  • Mermaid Documentation: Diagram-as-code syntax and examples
  • Documentation Versioning: Managing multiple documentation versions with mike
  • Static Site Generation: Build-time documentation processing and optimization
  • Information Architecture: Organizing documentation for discoverability and usability