Documents
Functional Requirements Specification
Functional Requirements Specification
Type
Topic
Status
Published
Created
Mar 4, 2026
Updated
Mar 5, 2026
Created by
Dosu Bot
Updated by
Dosu Bot

Functional Requirements Specification - DBSurveyor#

Lead Section#

The Functional Requirements Specification for DBSurveyor defines 24 comprehensive requirements (F000-F023) organized into seven distinct categories that govern the design, implementation, and operation of a security-first, dual-binary database surveying toolchain. These requirements specify the complete operational envelope for surveying unknown database systems while maintaining strict security, privacy, and offline-operation guarantees.

DBSurveyor operates as a two-part toolchain with separate collector (dbsurveyor-collect) and postprocessor (dbsurveyor) binaries, enabling flexible workflows where data collection occurs in secure environments and documentation generation happens offline. The requirements encompass all aspects of functionality from system architecture and authentication through metadata collection, data profiling, core processing, reporting, and advanced Pro features.

The specification prioritizes critical security guarantees including offline-only operation, zero telemetry, no credentials in outputs, and airgap compatibility, making it suitable for red/blue team operators, DBAs, data analysts, developers, security analysts, and compliance officers working with sensitive database environments.

System Architecture Requirements (F000-F002)#

F000: Dual-Binary Architecture#

F000 mandates a dual-binary architecture with independent collector (dbsurveyor-collect) and postprocessor (dbsurveyor) executables. This separation enables:

The architecture is implemented through three main crates: dbsurveyor-core (shared library), dbsurveyor-collect (collection binary), and dbsurveyor (documentation binary).

F001: Multi-Database Connection Support#

F001 requires multi-database connection support via Rust traits for PostgreSQL, MySQL, SQLite, SQL Server, and MongoDB. Implementation includes:

F002: Feature Flag-Based Driver Selection#

F002 specifies feature flag-based database driver selection to minimize binary size and dependencies. This enables:

Authentication and Security Requirements (F003-F005)#

F003: Multiple Authentication Methods#

F003 mandates multiple authentication methods including username/password, certificates, and environment variables. This provides flexibility for various operational scenarios while supporting:

F004: AES-GCM Encryption#

F004 requires AES-GCM encryption of output files with random nonces and embedded KDF parameters. Implementation details:

F005: Zero Telemetry Operation#

F005 mandates zero telemetry operation with no credential storage in outputs or logs. Security guarantees:

Metadata Collection Requirements (F006-F010)#

F006: Comprehensive Schema Discovery#

F006 requires comprehensive schema discovery and enumeration across all supported database engines. The system collects:

F007: Table Structure Analysis#

F007 mandates table structure analysis including columns, data types, constraints, indexes, and foreign keys. This provides complete structural understanding of each table entity including:

  • Column names, types, nullability, and default values
  • Primary key constraints
  • Unique constraints
  • Foreign key relationships and referential integrity rules
  • Index definitions for performance optimization

F008: Database-Specific Object Extraction#

F008 requires database-specific object extraction including views, stored procedures, functions, triggers, and user-defined types. The system captures:

F009: Incremental Collection with Change Detection#

F009 specifies incremental collection with change detection and delta updates for large databases. This enables efficient re-surveying of large database systems by:

  • Detecting schema changes since last collection
  • Updating only modified objects
  • Reducing collection time for databases with thousands of tables
  • Minimizing database load during repeated surveys

F010: Connection Pooling and Graceful Degradation#

F010 requires connection pooling, timeout handling, and graceful degradation on partial failures. Implementation includes:

Data Profiling and Analysis Requirements (F011-F013)#

F011: Statistical Analysis#

F011 requires statistical analysis including row counts, table sizes, and column value distributions. This provides quantitative understanding of:

  • Table row counts and approximate sizes
  • Column cardinality (distinct value counts)
  • Value distribution patterns
  • Data volume metrics for capacity planning

F012: Data Quality Metrics#

F012 mandates data quality metrics including completeness, consistency, and uniqueness with configurable thresholds. Analysis includes:

  • Completeness: percentage of non-null values
  • Consistency: adherence to expected patterns
  • Uniqueness: duplicate detection
  • Configurable quality thresholds for automated reporting

F013: Configurable Sample Data Extraction#

F013 requires configurable sample data extraction with privacy controls and pattern-based redaction. Features include:

Core Processing Requirements (F014-F016)#

F014: Portable JSON Output Format#

F014 mandates generation of .dbsurveyor.json output that is fully portable and structured with "format_version": "1.0" specification. The output includes:

F015: Postprocessor for Report Generation#

F015 requires a postprocessor that loads .dbsurveyor.json and generates markdown/JSON reports. Implementation:

F016: SQL Reconstruction Mode#

F016 specifies SQL reconstruction mode that outputs CREATE TABLE definitions based on discovered schema. This enables:

  • SQL DDL script generation
  • Schema replication in different environments
  • Documentation of database structure as executable SQL
  • Database migration assistance

Reporting and Visualization Requirements (F017-F019)#

F017: Report Mode#

F017 mandates report mode that renders a browsable table-of-contents style markdown document. Output includes:

  • Markdown output
  • Hierarchical table of contents
  • Browsable navigation structure
  • Human-readable schema documentation

F018: Diagram Mode (Pro)#

F018 specifies diagram mode (Pro feature) that generates Mermaid.js or D2 visual schema diagrams. Visualization includes:

  • Mermaid ERD diagrams
  • D2 diagram format support
  • Entity-relationship visualizations
  • Table relationship mapping

F019: Classify Mode (Pro)#

F019 requires classify mode (Pro feature) that tags likely PII/PCI fields based on regex or naming heuristics. Classification includes:

  • PII (Personally Identifiable Information) detection
  • PCI (Payment Card Industry) data identification
  • Regex-based pattern matching
  • Naming convention heuristics
  • Automated compliance reporting

Pro Features and Advanced Capabilities (F020-F023)#

F020: HTML Output (Pro)#

F020 mandates HTML output (Pro feature) with standalone styled HTML report generation with search/filter. Features:

  • HTML output
  • Standalone single-file distribution
  • Built-in search and filtering
  • Styled, interactive documentation

F021: Offline-Only Operation#

F021 requires that all features must function without network access. This guarantees:

F022: Pluggable Database Engine Support#

F022 specifies support for pluggable DB engines via a Rust trait-based adapter system. The DatabaseAdapter trait defines the interface for all database implementations, including a sample_table(TableRef, &SamplingConfig) method for per-table data sampling. Adapters may be:

F023: Additional Advanced Capabilities#

F023 encompasses additional capabilities:

  • Data Redaction: Allow user to redact sensitive sample values in postprocessor
  • NoSQL Support: MVP must include at least one NoSQL database with schema-like inspection support (MongoDB)
  • Throttling: Support configurable throttling rate in collector to reduce detection risk and avoid slow logs
  • Compression and Encryption: Support optional output compression (e.g., gzip) and encryption (e.g., AES-GCM) via CLI flags

Requirements Priority Levels#

The specification defines three priority levels:

High Priority (F000-F007, F014, F015, F021-F023)#

Core architectural, security, and basic collection requirements essential for MVP:

  • System architecture foundation
  • Security guarantees
  • Basic metadata collection
  • Core processing and output
  • Offline operation
  • Pluggable engine support

Medium Priority (F013, F016-F019)#

Enhanced functionality for practical operational use:

  • Sample data extraction with privacy
  • SQL reconstruction
  • Report and diagram generation
  • PII/PCI classification

Low Priority (F020, Advanced Diagrams)#

Advanced features for premium/Pro offering:

  • Standalone HTML reports
  • Advanced visualization options

Performance Requirements#

The specification includes measurable performance targets:

Security Best Practices#

The requirements enforce comprehensive security best practices:

Implementation Architecture#

The requirements are implemented through comprehensive architectural components:

Workspace Structure#

  • dbsurveyor/ - Main postprocessor binary
  • dbsurveyor-core/ - Shared library and core modules
  • dbsurveyor-collect/ - Collection binary
  • Cargo.toml - Workspace configuration

Core Modules#

  • adapters - Database-specific implementations
  • error - Comprehensive error hierarchy
  • models - Unified data structures
  • security - Encryption and credential management

Testing Strategy#

Comprehensive test organization:

  • Unit tests
  • Integration tests (PostgreSQL, MySQL, SQLite, MongoDB)
  • Security tests (credential protection, encryption validation, offline operation)
  • Performance tests

Configuration Management#

Configuration loaded from multiple sources with clear precedence:

  1. Command line arguments (highest priority)
  2. Environment variables
  3. Project configuration (.dbsurveyor.toml)
  4. User configuration (~/.config/dbsurveyor/config.toml)
  5. Default values (lowest priority)

Development Milestones#

The requirements support a phased development approach:

  • v0.1 (Collector MVP): Core collection functionality (F000-F007, F014)
  • v0.2 (Postprocessor MVP): Basic reporting (F015-F017)
  • v0.3 (Pro Features): Advanced capabilities (F018-F023)
  • v1.0 (Production Release): Complete specification compliance

Relevant Code Files#

The functional requirements are implemented across the following key code files in the DBSurveyor repository:

File PathDescriptionRelated Requirements
project_specs/requirements.mdComplete functional requirements specification documentF000-F023 (All)
Cargo.tomlWorkspace configuration with feature flags for database adaptersF000, F002
dbsurveyor-collect/src/main.rsCollector binary with CLI, connection testing, and schema collection orchestrationF000, F003, F006-F013
dbsurveyor-collect/Cargo.tomlFeature flags for database drivers (postgresql, mysql, sqlite, mongodb, mssql)F002
dbsurveyor/src/main.rsPostprocessor binary with subcommands for generate, analyze, SQL DDL reconstruction, and validationF000, F015-F020
dbsurveyor-core/src/lib.rsCore library module organization and public API exportsF000, F001
dbsurveyor-core/Cargo.tomlFeature flags for database adapters, compression, and encryptionF002, F004, F023
dbsurveyor-core/src/models.rsUnified data structures (DatabaseSchema, Table, Column, UnifiedDataType, Index, etc.)F001, F006-F008, F014
dbsurveyor-core/src/error.rsComprehensive error hierarchy with credential sanitizationF005, F010
dbsurveyor-core/src/adapters/mod.rsDatabaseAdapter trait definition and create_adapter factory with feature-gated instantiationF001, F022
dbsurveyor-core/src/adapters/postgres/PostgreSQL adapter with connection pooling, schema collection, type mapping, sampling, views, routines, triggersF001, F003, F006-F010
dbsurveyor-core/src/adapters/mysql/MySQL adapter with connection handling, schema queries, type conversion, samplingF001, F003, F006-F010
dbsurveyor-core/src/adapters/sqlite/SQLite adapter with connection management, schema introspection, type mapping, samplingF001, F003, F006-F010
dbsurveyor-core/src/security/mod.rsSecurity module with credential protection and connection string parsingF003, F005
dbsurveyor-core/src/security/credentials.rsZeroizing credential container implementationF005
dbsurveyor-core/src/security/encryption.rsAES-GCM-256 encryption with Argon2id key derivationF004, F023
dbsurveyor-core/src/quality/Quality analysis module with completeness, consistency, uniqueness, and anomaly detectionF011-F013
dbsurveyor-core/src/validation.rsJSON schema validation for output filesF014
  • Database Schema Discovery
  • Database Reverse Engineering
  • SQL Generation and DDL Extraction
  • Data Classification and PII Detection
  • Security and Encryption Standards
  • Offline-First Architecture Patterns
  • Database Adapter Design Patterns
Functional Requirements Specification | Dosu