Functional Requirements Specification - DBSurveyor#
Lead Section#
The Functional Requirements Specification for DBSurveyor defines 24 comprehensive requirements (F000-F023) organized into seven distinct categories that govern the design, implementation, and operation of a security-first, dual-binary database surveying toolchain. These requirements specify the complete operational envelope for surveying unknown database systems while maintaining strict security, privacy, and offline-operation guarantees.
DBSurveyor operates as a two-part toolchain with separate collector (dbsurveyor-collect) and postprocessor (dbsurveyor) binaries, enabling flexible workflows where data collection occurs in secure environments and documentation generation happens offline. The requirements encompass all aspects of functionality from system architecture and authentication through metadata collection, data profiling, core processing, reporting, and advanced Pro features.
The specification prioritizes critical security guarantees including offline-only operation, zero telemetry, no credentials in outputs, and airgap compatibility, making it suitable for red/blue team operators, DBAs, data analysts, developers, security analysts, and compliance officers working with sensitive database environments.
System Architecture Requirements (F000-F002)#
F000: Dual-Binary Architecture#
F000 mandates a dual-binary architecture with independent collector (dbsurveyor-collect) and postprocessor (dbsurveyor) executables. This separation enables:
- Collection operations in secure/restricted environments
- Offline documentation generation without database access
- Independent versioning and deployment of collection vs. reporting capabilities
- Airgap deployment scenarios where collection and documentation occur in different security zones
The architecture is implemented through three main crates: dbsurveyor-core (shared library), dbsurveyor-collect (collection binary), and dbsurveyor (documentation binary).
F001: Multi-Database Connection Support#
F001 requires multi-database connection support via Rust traits for PostgreSQL, MySQL, SQLite, SQL Server, and MongoDB. Implementation includes:
- Database adapters implementing a common trait for unified access
- Factory pattern providing database-agnostic instantiation
- UnifiedDataType mapping for cross-database type normalization
- Support for both SQL (relational) and NoSQL (MongoDB) database paradigms
- DatabaseAdapter trait includes a
sample_table(TableRef, &SamplingConfig)method for per-table data sampling
F002: Feature Flag-Based Driver Selection#
F002 specifies feature flag-based database driver selection to minimize binary size and dependencies. This enables:
- Default build including only PostgreSQL and SQLite drivers
- Full build with all databases (PostgreSQL, MySQL, SQLite, MongoDB, MSSQL)
- Custom builds with only required database support
- Reduced attack surface through minimal dependency inclusion
Authentication and Security Requirements (F003-F005)#
F003: Multiple Authentication Methods#
F003 mandates multiple authentication methods including username/password, certificates, and environment variables. This provides flexibility for various operational scenarios while supporting:
- Username/password authentication
- TLS/SSL certificate-based authentication
- Environment variable-based credential passing
- Connection string sanitization in all error messages and logs
F004: AES-GCM Encryption#
F004 requires AES-GCM encryption of output files with random nonces and embedded KDF parameters. Implementation details:
- AES-GCM-256 encryption with 256-bit key, 96-bit nonce, 128-bit authentication tag
- Argon2id key derivation with 64 MiB memory cost, 3 iterations, 4 parallel threads
- Cryptographically secure random nonces that are never reused
- Authentication tags to prevent tampering
F005: Zero Telemetry Operation#
F005 mandates zero telemetry operation with no credential storage in outputs or logs. Security guarantees:
- NO NETWORK CALLS beyond the collector's connection to the target database
- NO TELEMETRY: Zero data collection, usage tracking, or external reporting
- NO CREDENTIALS IN OUTPUTS: Database credentials never stored, logged, or included in any output files
- Credentials automatically zeroed using the
zeroizecrate - Credential structures never serialized to disk
Metadata Collection Requirements (F006-F010)#
F006: Comprehensive Schema Discovery#
F006 requires comprehensive schema discovery and enumeration across all supported database engines. The system collects:
- Tables with column definitions and data types
- Views, indexes, and constraints
- Collection metadata including timestamps and schema versions
F007: Table Structure Analysis#
F007 mandates table structure analysis including columns, data types, constraints, indexes, and foreign keys. This provides complete structural understanding of each table entity including:
- Column names, types, nullability, and default values
- Primary key constraints
- Unique constraints
- Foreign key relationships and referential integrity rules
- Index definitions for performance optimization
F008: Database-Specific Object Extraction#
F008 requires database-specific object extraction including views, stored procedures, functions, triggers, and user-defined types. The system captures:
- Database procedures and functions
- Triggers and custom types
- Engine-specific objects that extend base SQL functionality
F009: Incremental Collection with Change Detection#
F009 specifies incremental collection with change detection and delta updates for large databases. This enables efficient re-surveying of large database systems by:
- Detecting schema changes since last collection
- Updating only modified objects
- Reducing collection time for databases with thousands of tables
- Minimizing database load during repeated surveys
F010: Connection Pooling and Graceful Degradation#
F010 requires connection pooling, timeout handling, and graceful degradation on partial failures. Implementation includes:
- Connection pooling with security-focused defaults: 30s connect timeout, 30s query timeout, max 10 connections, read-only mode
- Comprehensive error hierarchy with connection errors, collection errors, configuration errors, encryption errors, and I/O errors
- Partial collection success when some objects fail
- Detailed error reporting without credential leakage
Data Profiling and Analysis Requirements (F011-F013)#
F011: Statistical Analysis#
F011 requires statistical analysis including row counts, table sizes, and column value distributions. This provides quantitative understanding of:
- Table row counts and approximate sizes
- Column cardinality (distinct value counts)
- Value distribution patterns
- Data volume metrics for capacity planning
F012: Data Quality Metrics#
F012 mandates data quality metrics including completeness, consistency, and uniqueness with configurable thresholds. Analysis includes:
- Completeness: percentage of non-null values
- Consistency: adherence to expected patterns
- Uniqueness: duplicate detection
- Configurable quality thresholds for automated reporting
F013: Configurable Sample Data Extraction#
F013 requires configurable sample data extraction with privacy controls and pattern-based redaction. Features include:
- Optional data samples via pagination and chunk processing
- Configurable sample size limits
- Pattern-based redaction for sensitive data
- Privacy-aware sampling strategies
Core Processing Requirements (F014-F016)#
F014: Portable JSON Output Format#
F014 mandates generation of .dbsurveyor.json output that is fully portable and structured with "format_version": "1.0" specification. The output includes:
- Unified data structures including DatabaseSchema with tables, views, indexes, constraints, procedures, functions, triggers, custom types, samples, and collection metadata
- Format version for backward compatibility
- Complete serialization of discovered schema
- Platform-independent JSON encoding
- TableSample includes an optional
sample_statusfield of typeSampleStatus(enum values:Complete,PartialRetry,Skipped) for tracking sampling outcomes, which is omitted from JSON when not set for backward compatibility
F015: Postprocessor for Report Generation#
F015 requires a postprocessor that loads .dbsurveyor.json and generates markdown/JSON reports. Implementation:
- Schema analysis feeds template context to Askama template engine for multiple output formats
- Independent operation without database connectivity
- Multiple output format support
- Template-driven document generation
F016: SQL Reconstruction Mode#
F016 specifies SQL reconstruction mode that outputs CREATE TABLE definitions based on discovered schema. This enables:
- SQL DDL script generation
- Schema replication in different environments
- Documentation of database structure as executable SQL
- Database migration assistance
Reporting and Visualization Requirements (F017-F019)#
F017: Report Mode#
F017 mandates report mode that renders a browsable table-of-contents style markdown document. Output includes:
- Markdown output
- Hierarchical table of contents
- Browsable navigation structure
- Human-readable schema documentation
F018: Diagram Mode (Pro)#
F018 specifies diagram mode (Pro feature) that generates Mermaid.js or D2 visual schema diagrams. Visualization includes:
- Mermaid ERD diagrams
- D2 diagram format support
- Entity-relationship visualizations
- Table relationship mapping
F019: Classify Mode (Pro)#
F019 requires classify mode (Pro feature) that tags likely PII/PCI fields based on regex or naming heuristics. Classification includes:
- PII (Personally Identifiable Information) detection
- PCI (Payment Card Industry) data identification
- Regex-based pattern matching
- Naming convention heuristics
- Automated compliance reporting
Pro Features and Advanced Capabilities (F020-F023)#
F020: HTML Output (Pro)#
- HTML output
- Standalone single-file distribution
- Built-in search and filtering
- Styled, interactive documentation
F021: Offline-Only Operation#
F021 requires that all features must function without network access. This guarantees:
- AIRGAP COMPATIBLE: Full functionality in air-gapped environments
- Complete separation of collection phase from documentation generation
- No external dependencies during operation
- Security-critical deployment compatibility
F022: Pluggable Database Engine Support#
F022 specifies support for pluggable DB engines via a Rust trait-based adapter system. The DatabaseAdapter trait defines the interface for all database implementations, including a sample_table(TableRef, &SamplingConfig) method for per-table data sampling. Adapters may be:
- Compiled-in via feature flags
- Loaded as WASM modules
- Executed as stdio child-process plugins using stable JSON contract
- OutputGenerator trait with implementations for each format
F023: Additional Advanced Capabilities#
F023 encompasses additional capabilities:
- Data Redaction: Allow user to redact sensitive sample values in postprocessor
- NoSQL Support: MVP must include at least one NoSQL database with schema-like inspection support (MongoDB)
- Throttling: Support configurable throttling rate in collector to reduce detection risk and avoid slow logs
- Compression and Encryption: Support optional output compression (e.g., gzip) and encryption (e.g., AES-GCM) via CLI flags
Requirements Priority Levels#
The specification defines three priority levels:
High Priority (F000-F007, F014, F015, F021-F023)#
Core architectural, security, and basic collection requirements essential for MVP:
- System architecture foundation
- Security guarantees
- Basic metadata collection
- Core processing and output
- Offline operation
- Pluggable engine support
Medium Priority (F013, F016-F019)#
Enhanced functionality for practical operational use:
- Sample data extraction with privacy
- SQL reconstruction
- Report and diagram generation
- PII/PCI classification
Low Priority (F020, Advanced Diagrams)#
Advanced features for premium/Pro offering:
- Standalone HTML reports
- Advanced visualization options
Performance Requirements#
The specification includes measurable performance targets:
- CLI startup < 100ms
- Completion < 10s for databases with < 1000 tables
- Streaming and batch processing for memory efficiency with pagination for large tables and chunk processing
- Async/await with Tokio runtime enables concurrent operations with connection pooling
Security Best Practices#
The requirements enforce comprehensive security best practices:
- Use read-only database users with SELECT permissions only
- All database operations are read-only (SELECT/DESCRIBE only, no INSERT, UPDATE, DELETE, or DDL)
- All queries use parameterized statements to prevent SQL injection
- Use strong encryption passwords (minimum 12 characters, mix of character types)
- Set restrictive permissions on encrypted files (chmod 600)
- Use secure deletion (shred) when done with sensitive data
- TLS/SSL certificate validation is enforced for encrypted connections
- Avoid credentials in command history using environment variables
Implementation Architecture#
The requirements are implemented through comprehensive architectural components:
Workspace Structure#
dbsurveyor/- Main postprocessor binarydbsurveyor-core/- Shared library and core modulesdbsurveyor-collect/- Collection binaryCargo.toml- Workspace configuration
Core Modules#
adapters- Database-specific implementationserror- Comprehensive error hierarchymodels- Unified data structuressecurity- Encryption and credential management
Testing Strategy#
Comprehensive test organization:
- Unit tests
- Integration tests (PostgreSQL, MySQL, SQLite, MongoDB)
- Security tests (credential protection, encryption validation, offline operation)
- Performance tests
Configuration Management#
Configuration loaded from multiple sources with clear precedence:
- Command line arguments (highest priority)
- Environment variables
- Project configuration (.dbsurveyor.toml)
- User configuration (~/.config/dbsurveyor/config.toml)
- Default values (lowest priority)
Development Milestones#
The requirements support a phased development approach:
- v0.1 (Collector MVP): Core collection functionality (F000-F007, F014)
- v0.2 (Postprocessor MVP): Basic reporting (F015-F017)
- v0.3 (Pro Features): Advanced capabilities (F018-F023)
- v1.0 (Production Release): Complete specification compliance
Relevant Code Files#
The functional requirements are implemented across the following key code files in the DBSurveyor repository:
Related Topics#
- Database Schema Discovery
- Database Reverse Engineering
- SQL Generation and DDL Extraction
- Data Classification and PII Detection
- Security and Encryption Standards
- Offline-First Architecture Patterns
- Database Adapter Design Patterns