Project Roadmap And Milestone Strategy
The Project Roadmap and Milestone Strategy for DBSurveyor defines a version-based milestone naming strategy that guides the development of this security-focused, offline-first database analysis toolchain. The strategy employs semantic versioning with contextual descriptions to organize feature development into distinct capability phases, enabling systematic progression from minimum viable products to a production-ready system. This approach supports the project's dual-binary architecture consisting of separate collector (dbsurveyor-collect) and postprocessor (dbsurveyor) components designed for offline workflows and security-conscious operations.
The roadmap follows a progressive milestone structure: v0.1 (Collector MVP) establishes core collection infrastructure, v0.2 (Postprocessor MVP) adds documentation generation capabilities, v0.3 (Pro Features) introduces advanced visualization and analysis tools, and v1.0 (Production Release) delivers cross-platform distribution with comprehensive hardening. The project currently targets v1.0 with SQL database coverage spanning PostgreSQL, MySQL, and SQLite, deferring NoSQL and enterprise database support to post-v1.0 releases.
This milestone strategy reflects a deliberate prioritization of security, offline operation, and core functionality over feature breadth, aligning with the project's non-negotiable guarantees of zero telemetry, no credential exposure in outputs, and complete air-gap compatibility. The phased approach enables incremental delivery while maintaining strict quality standards including zero-warning compilation, 70%+ test coverage, and comprehensive security scanning at every stage.
Milestone Phases
DBSurveyor's development roadmap is organized into four primary milestone phases, each building distinct capabilities that progressively advance the project toward production readiness. The version-based milestone naming strategy uses semantic versioning combined with descriptive names to communicate both the technical version and the functional focus of each release.
v0.1 - Collector MVP
The Collector MVP milestone establishes the foundational infrastructure for database connectivity and metadata extraction. This phase focuses on building the core collection capabilities that enable DBSurveyor to connect to multiple database engines and extract comprehensive schema information.
Key Features:
- Database connectivity and metadata extraction across multiple engine types
- Multi-engine support for PostgreSQL, MySQL, SQLite, and MongoDB
- Basic schema collection functionality including tables, columns, indexes, and constraints
- Structured output generation in standardized JSON format with schema validation
v0.2 - Postprocessor MVP
The Postprocessor MVP milestone introduces analysis and documentation generation capabilities built on the metadata collected in v0.1. This phase completes the dual-binary architecture by implementing the postprocessor component that transforms raw metadata into actionable documentation.
Key Features:
- Documentation generation from collected metadata in multiple formats
- Markdown and HTML report generation with customizable templates
- SQL DDL reconstruction capabilities to recreate schema definitions
- Privacy controls and redaction features for sensitive data protection
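DDL reconstruction is essentially a pure transformation from collected metadata back into SQL. The minimal sketch below illustrates the idea using hypothetical `Table` and `Column` types, which are not DBSurveyor's actual structures. It handles only column types, nullability, and a single-column primary key. Identifiers are quoted ANSI-style with double quotes, as PostgreSQL and SQLite expect; MySQL would need backticks unless its ANSI_QUOTES mode is enabled.

```rust
/// Hypothetical column metadata, loosely modeled on what a collector records.
pub struct Column {
    pub name: String,
    pub data_type: String,
    pub nullable: bool,
}

/// Hypothetical table metadata with an optional single-column primary key.
pub struct Table {
    pub name: String,
    pub columns: Vec<Column>,
    pub primary_key: Option<String>,
}

/// Quote an identifier ANSI-style, doubling any embedded double quotes.
fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Rebuild a CREATE TABLE statement from collected metadata.
pub fn to_create_table(table: &Table) -> String {
    let mut lines: Vec<String> = table
        .columns
        .iter()
        .map(|c| {
            let null = if c.nullable { "" } else { " NOT NULL" };
            format!("    {} {}{}", quote_ident(&c.name), c.data_type, null)
        })
        .collect();
    if let Some(pk) = &table.primary_key {
        lines.push(format!("    PRIMARY KEY ({})", quote_ident(pk)));
    }
    format!(
        "CREATE TABLE {} (\n{}\n);",
        quote_ident(&table.name),
        lines.join(",\n")
    )
}
```

Keeping reconstruction as a function of metadata alone, with no database connection, fits the offline postprocessor role.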
v0.3 - Pro Features
The Pro Features milestone extends the platform with advanced analysis and visualization capabilities designed for professional use cases requiring enhanced schema understanding and compliance workflows.
Key Features:
- Advanced schema diagramming using Mermaid and D2 diagram formats
- Data classification and compliance reporting for regulatory requirements
- Interactive HTML exports with search and filtering capabilities
- Plugin system architecture enabling extensibility and custom integrations
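Mermaid diagrams are plain text, so schema diagramming can start as simple string generation. The sketch below emits Mermaid `erDiagram` source with one entity block per table and one relationship line per foreign key. The `ErdTable` and `ForeignKey` types are hypothetical. The sketch also assumes every foreign key is "one parent, zero or more children" (`||--o{`); a real generator would derive cardinality from nullability and unique constraints.

```rust
/// Hypothetical table description: name plus (data type, column name) pairs.
pub struct ErdTable {
    pub name: String,
    pub columns: Vec<(String, String)>,
}

/// Hypothetical foreign key: `child.column` references `parent`.
pub struct ForeignKey {
    pub child: String,
    pub column: String,
    pub parent: String,
}

/// Render tables and foreign keys as Mermaid `erDiagram` source text.
pub fn to_mermaid(tables: &[ErdTable], fks: &[ForeignKey]) -> String {
    let mut out = String::from("erDiagram\n");
    for t in tables {
        // Entity block: `name {` ... `}` with one `type column` line per attribute.
        out.push_str(&format!("    {} {{\n", t.name));
        for (ty, col) in &t.columns {
            out.push_str(&format!("        {} {}\n", ty, col));
        }
        out.push_str("    }\n");
    }
    for fk in fks {
        // One parent row relates to zero or more child rows; the FK column is the label.
        out.push_str(&format!(
            "    {} ||--o{{ {} : \"{}\"\n",
            fk.parent, fk.child, fk.column
        ));
    }
    out
}
```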
v1.0 - Production Release
The Production Release milestone represents the first production-ready version with comprehensive hardening, optimization, and distribution infrastructure. This phase prioritizes stability, performance, and ease of deployment across multiple platforms.
Key Features:
- Cross-platform packaging and distribution for Linux, macOS, and Windows
- Comprehensive documentation including user guides, API references, and examples
- Security hardening and audit completion with formal security review
- Performance optimization and tuning to meet established benchmarks
v1.0 Success Criteria:
The v1.0 release has clearly defined must-have features that establish the minimum requirements for production deployment:
- PostgreSQL adapter with 100% feature completion including all schema objects and metadata
- MySQL and SQLite adapters with core schema collection capabilities
- Multi-database selection via flag-driven filters with glob pattern support (--all-databases, --include-system-databases, --exclude-databases)
- Partial-failure behavior with machine-actionable failure metadata for automation workflows
- Automation-friendly exit codes with default success on partial success and optional strict mode
- Postprocessor generating offline Markdown documentation and SQL DDL reconstruction
- AES-GCM encryption with Argon2id key derivation for sensitive output protection
- 70% test coverage minimum with testcontainers integration for comprehensive testing
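The selection and exit-code criteria interact, so a small model helps make them concrete. The sketch below rests on several assumptions:

- Exclude patterns are matched against database names using a glob that supports only `*` and `?`.
- The system-database list is a hand-picked set of PostgreSQL and MySQL catalog names.
- The exit codes are 0 for success (including partial success by default), 1 when nothing was collected, and 2 for partial failure in strict mode. These values are illustrative, not DBSurveyor's documented codes.

```rust
/// Match `name` against a glob supporting `*` (any run of chars) and `?` (one char).
pub fn glob_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0, 0);
    let (mut star, mut mark) = (None, 0);
    while ni < n.len() {
        if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if pi < p.len() && p[pi] == '*' {
            star = Some(pi);
            mark = ni;
            pi += 1;
        } else if let Some(s) = star {
            // Backtrack: let the most recent `*` absorb one more character.
            pi = s + 1;
            mark += 1;
            ni = mark;
        } else {
            return false;
        }
    }
    // Any remaining pattern must be stars, which match the empty string.
    p[pi..].iter().all(|&c| c == '*')
}

/// System catalogs assumed for illustration (PostgreSQL and MySQL names).
const SYSTEM_DATABASES: &[&str] = &[
    "template0", "template1", "information_schema", "mysql", "performance_schema", "sys",
];

/// Apply the system-database and exclude-pattern filters, preserving input order.
pub fn select_databases(all: &[&str], include_system: bool, excludes: &[&str]) -> Vec<String> {
    all.iter()
        .copied()
        .filter(|db| include_system || !SYSTEM_DATABASES.contains(db))
        .filter(|db| !excludes.iter().any(|pat| glob_match(pat, db)))
        .map(|db| db.to_string())
        .collect()
}

/// Outcome counts for a multi-database run.
pub struct RunSummary {
    pub succeeded: usize,
    pub failed: usize,
}

/// Hypothetical exit-code policy: partial success is success unless `strict` is set.
pub fn exit_code(summary: &RunSummary, strict: bool) -> i32 {
    match (summary.succeeded, summary.failed) {
        (_, 0) => 0,
        (0, _) => 1,
        _ if strict => 2,
        _ => 0,
    }
}
```

Separating "what was selected" from "how the run ended" lets automation read the per-database failure metadata while relying on a coarse exit status.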
Post-v1.0 Roadmap
The phased milestone approach extends beyond v1.0 with additional capability expansions planned for future releases:
v1.5 - NoSQL Expansion (7-9 months total):
- MongoDB adapter with full NoSQL schema collection support
- Document-oriented database analysis capabilities
- NoSQL-specific documentation and diagram generation
v2.0 - Enterprise Databases (10-13 months total):
- SQL Server adapter with comprehensive T-SQL support
- Oracle adapter for enterprise database environments
- Mature plugin system architecture for third-party extensions
Deferred Features:
The following features are intentionally deferred to post-v1.0 releases to maintain focus on core SQL database functionality:
- Pro-tier visual diagrams and interactive HTML reports
- Advanced PII detection with machine learning
- Data quality metrics and anomaly detection
Implementation Status
The project tracks implementation progress through clear categorization of completed features and work in development, providing transparency into the current state of each milestone phase.
Completed Features
Core Infrastructure:
The foundational architecture and security infrastructure have been fully implemented, establishing the project's dual-binary design pattern and operational guarantees. The project followed a security-first approach from its initial setup. The dual-binary architecture, comprising collector and postprocessor components, was implemented in PR #42 (merged September 1, 2025), separating data collection from analysis to support offline workflows. Offline-only operation with no telemetry is built in as a non-negotiable guarantee.
Database Support:
Multi-database adapter support has been progressively implemented with focus on SQL databases for the v1.0 milestone. The PostgreSQL adapter with comprehensive schema collection capabilities was completed first, establishing the adapter interface pattern. SQLite adapter support followed, validating the adapter abstraction across different SQL dialects. In January 2026, PR #51 introduced unified database adapters with feature flag architecture supporting 6 engines (PostgreSQL, MySQL, SQLite, SQL Server, Oracle, and MongoDB), establishing the foundation for selective compilation and modular engine support.
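The adapter interface and feature-flag architecture can be illustrated with a trait plus a compile-time check of enabled engines. In the sketch below, `DatabaseAdapter`, `MockAdapter`, `compiled_engines`, and `check_engine` are hypothetical names. The Cargo feature names are assumed to mirror the collector variant names. In a real build, the driver modules would sit behind `#[cfg(feature = "...")]` so excluded drivers are not compiled at all; `cfg!` is used here only to report what was enabled.

```rust
/// Hypothetical adapter interface; DBSurveyor's real trait may differ.
pub trait DatabaseAdapter {
    /// Engine identifier, e.g. "postgresql".
    fn engine(&self) -> &'static str;
    /// Table names discovered in the target database.
    fn list_tables(&self) -> Result<Vec<String>, String>;
}

/// Stand-in adapter used only to illustrate the pattern.
pub struct MockAdapter {
    pub tables: Vec<String>,
}

impl DatabaseAdapter for MockAdapter {
    fn engine(&self) -> &'static str {
        "mock"
    }
    fn list_tables(&self) -> Result<Vec<String>, String> {
        Ok(self.tables.clone())
    }
}

/// Engines compiled into this binary, decided at build time by Cargo features.
pub fn compiled_engines() -> Vec<&'static str> {
    let mut engines = Vec::new();
    if cfg!(feature = "postgresql") { engines.push("postgresql"); }
    if cfg!(feature = "mysql") { engines.push("mysql"); }
    if cfg!(feature = "sqlite") { engines.push("sqlite"); }
    if cfg!(feature = "mongodb") { engines.push("mongodb"); }
    if cfg!(feature = "mssql") { engines.push("mssql"); }
    engines
}

/// Map a connection URL scheme to an engine, refusing engines not compiled in.
pub fn check_engine(url: &str) -> Result<&'static str, String> {
    let scheme = url.split("://").next().unwrap_or("");
    let engine = match scheme {
        "postgres" | "postgresql" => "postgresql",
        "mysql" => "mysql",
        "sqlite" => "sqlite",
        "mongodb" => "mongodb",
        "mssql" | "sqlserver" => "mssql",
        other => return Err(format!("unknown scheme: {other}")),
    };
    if compiled_engines().contains(&engine) {
        Ok(engine)
    } else {
        Err(format!("{engine} support not compiled into this binary"))
    }
}
```

Failing early with a clear message when a variant lacks a driver is friendlier than a connection error deep in the collection run.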
Security Features:
Comprehensive security features protect sensitive data throughout the collection and storage lifecycle. AES-GCM encryption with Argon2id key derivation provides authenticated encryption for output files. Credential sanitization ensures no credentials appear in output files, implementing comprehensive pattern-based redaction. Memory-safe credential handling using the zeroize library prevents credential leakage through memory inspection.
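One concrete piece of credential sanitization is masking the password inside a connection URL before it can reach logs or output files. The sketch below covers only the `scheme://user:password@host/...` form. It assumes that reserved characters in passwords are percent-encoded, as URL syntax requires. `redact_url` is a hypothetical helper, not DBSurveyor's pattern-based redaction engine, and it does not demonstrate zeroize-style memory clearing.

```rust
/// Replace the password in a `scheme://user:password@host/...` URL with "****".
/// URLs without a userinfo password are returned unchanged.
pub fn redact_url(url: &str) -> String {
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let rest_start = scheme_end + 3;
    let rest = &url[rest_start..];
    // The authority ends at the first '/', and userinfo ends at its last '@'.
    let authority_end = rest.find('/').unwrap_or(rest.len());
    let authority = &rest[..authority_end];
    let Some(at) = authority.rfind('@') else {
        return url.to_string();
    };
    let userinfo = &authority[..at];
    match userinfo.find(':') {
        Some(colon) => format!(
            "{}{}:****{}",
            &url[..rest_start],
            &userinfo[..colon],
            &rest[at..]
        ),
        None => url.to_string(),
    }
}
```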
Output and Infrastructure:
The output format and development infrastructure have been standardized to support automation and quality assurance. JSON Schema validation for all outputs using format version 1.0 ensures consistent, machine-readable metadata. Zstandard compression support reduces output file sizes by 60-80%. The CI/CD pipeline with comprehensive security scanning including CodeQL, Syft, and Grype was established early in development.
Additional capabilities have been delivered in 2026:
- Core schema discovery and data sampling engine (merged January 29, 2026) provides comprehensive introspection
- Data quality metrics with configurable thresholds (merged January 30, 2026) enables completeness, uniqueness, and consistency analysis
- GoReleaser v2 migration with multi-variant builds (merged February 2026) introduced per-database variant distributions and removed 15 redundant configuration files
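Completeness and uniqueness reduce to simple ratios over sampled column values. The sketch below assumes completeness is defined as the non-null share and uniqueness as distinct non-null values divided by non-null values. The threshold parameters are hypothetical, and the consistency metric is not modeled.

```rust
use std::collections::HashSet;

/// Hypothetical per-column quality metrics computed over sampled values.
#[derive(Debug, PartialEq)]
pub struct ColumnQuality {
    pub completeness: f64, // share of values that are non-null
    pub uniqueness: f64,   // distinct non-null values / non-null values
}

pub fn column_quality(values: &[Option<&str>]) -> ColumnQuality {
    let non_null: Vec<&str> = values.iter().flatten().copied().collect();
    let distinct: HashSet<&str> = non_null.iter().copied().collect();
    // Empty denominators yield 0.0 rather than NaN.
    let ratio = |num: usize, den: usize| if den == 0 { 0.0 } else { num as f64 / den as f64 };
    ColumnQuality {
        completeness: ratio(non_null.len(), values.len()),
        uniqueness: ratio(distinct.len(), non_null.len()),
    }
}

/// Names of metrics that fall below the configured minimums.
pub fn violations(q: &ColumnQuality, min_completeness: f64, min_uniqueness: f64) -> Vec<&'static str> {
    let mut out = Vec::new();
    if q.completeness < min_completeness {
        out.push("completeness");
    }
    if q.uniqueness < min_uniqueness {
        out.push("uniqueness");
    }
    out
}
```

For the sample `[Some("a"), Some("b"), Some("a"), None]`, completeness is 3/4 and uniqueness is 2/3.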
In Development
Current development work focuses on completing v1.0 milestone requirements:
- MySQL, MongoDB, and SQL Server adapters for additional database engine coverage
- Advanced HTML report generation (placeholder implementation)
- SQL DDL reconstruction for schema recreation (placeholder implementation)
- Mermaid ERD diagram generation for visual schema representation (placeholder implementation)
- Multi-database collection with partial-failure handling (planned)
Feature Prioritization
DBSurveyor employs a formal feature priority matrix to guide development resource allocation and milestone planning. Features are categorized into High, Medium, and Low priority tiers based on their criticality to core functionality, security requirements, and user value.
High Priority Features
High priority features (F000-F007, F014-F015, F021-F023) represent the essential capabilities that define DBSurveyor's core value proposition and enable basic operational functionality. These features receive priority scheduling and blocking status for milestone completion:
- Core dual-binary architecture: Separation of collection and postprocessing for offline workflows
- Database survey and connectivity: Multi-engine connection capabilities with secure credential handling
- Portable output generation: Standardized JSON format with schema validation
- Offline operation mode: Zero network dependencies after installation
- Pluggable database engines: Modular adapter system with feature flag compilation
- Throttling and compression capabilities: Performance optimization and storage efficiency
Medium Priority Features
Medium priority features (F013, F016-F019) extend the platform with valuable analysis and documentation capabilities that enhance but do not define core functionality:
- SQL reconstruction: DDL generation from collected metadata
- Report and diagram generation modes: Multiple output format support
- Pro features foundation: Advanced visualization and analysis infrastructure
- Data sampling with privacy controls: Representative data extraction with redaction
Low Priority Features
Low priority features (F020) provide enhanced user experience and advanced capabilities that can be deferred to later milestones without impacting core functionality:
- HTML output generation: Pro-tier feature for interactive reports
- Advanced visualizations: Enhanced diagram types and customization options
- Enhanced privacy features: Machine learning-based PII detection and classification
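The ID ranges in the three tiers above can be encoded as a lookup, which also shows that F008 through F012 are not assigned a tier in this section. The sketch below uses a hypothetical `priority` function and copies the ranges exactly as listed. It returns `None` for unlisted IDs rather than guessing a tier.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Priority {
    High,
    Medium,
    Low,
}

/// Map a feature ID such as "F014" to its tier using the ranges listed above.
pub fn priority(feature_id: &str) -> Option<Priority> {
    let n: u32 = feature_id.strip_prefix('F')?.parse().ok()?;
    match n {
        0..=7 | 14..=15 | 21..=23 => Some(Priority::High),
        13 | 16..=19 => Some(Priority::Medium),
        20 => Some(Priority::Low),
        _ => None, // e.g. F008-F012 have no tier in this section
    }
}
```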
Development Strategy
The development strategy for DBSurveyor balances rapid iteration with rigorous quality standards through a single-maintainer model, automated tooling, and phased timeline planning.
Timeline and Phases
The phased milestone approach establishes a clear timeline with defined effort estimates for each major release:
- v0.5 (Phase 1): PostgreSQL Foundation - 2-3 months dedicated to establishing comprehensive PostgreSQL adapter support with testcontainers integration
- v1.0 (Phase 2): SQL Database Coverage - 5-6 months total (13-19 weeks of development effort) to complete MySQL and SQLite adapters alongside core postprocessor functionality
- v1.5 (Phase 3): NoSQL Expansion - 7-9 months total including MongoDB adapter and document-oriented database analysis capabilities
- v2.0 (Phase 4): Enterprise Databases - 10-13 months total encompassing SQL Server and Oracle adapters plus mature plugin architecture
This timeline reflects a conservative estimate accounting for comprehensive testing, security review, and documentation requirements at each phase.
Team Structure
DBSurveyor operates under a single-maintainer model with UncleSp1d3r serving as the primary maintainer. This organizational structure provides several strategic advantages:
- Streamlined decision-making: Technical decisions require no multi-approval processes, enabling rapid response to design challenges
- Direct push access: Maintainer has direct commit access for rapid iteration on urgent fixes and feature development
- Optimized development cycles: Immediate feedback loops without coordination overhead accelerate development velocity
While this model optimizes for speed and decisiveness, it concentrates project knowledge and decision-making authority in a single person, creating potential continuity risks that are mitigated through comprehensive documentation and automated quality gates.
Quality Standards
The project enforces strict quality requirements that apply to all code before merge, ensuring consistent quality across the codebase:
Rust Quality Gate:
Zero-warning compilation is enforced via cargo clippy -- -D warnings, treating all clippy warnings as compilation errors. This prevents warning debt accumulation and maintains code quality standards.
Additional Quality Requirements:
- Formatting validation: cargo fmt --check ensures consistent code style across all Rust source files
- Test suite passage: The complete test suite must pass with no failures, maintaining a minimum of 70% coverage for the v1.0 milestone
- Security scans: CodeQL static analysis, Syft SBOM generation, and Grype vulnerability scanning run on every pull request
- License compliance: FOSSA validation ensures all dependencies meet license requirements
These automated quality gates run in CI/CD pipelines, providing fast feedback on quality violations without manual review overhead.
Code Review Process
DBSurveyor uses CodeRabbit.ai as the primary code review tool rather than traditional human review for most changes. CodeRabbit provides:
- Automated line-by-line code analysis: Every changed line receives automated review feedback
- Conversational review feedback: Natural language comments explain issues and suggest improvements
- Security analysis: Identifies potential security vulnerabilities and suggests mitigations
- Best practice enforcement: Validates adherence to Rust idioms and project conventions
Notably, GitHub Copilot automatic reviews are explicitly disabled in favor of CodeRabbit's more comprehensive analysis capabilities.
Performance Requirements
DBSurveyor establishes quantitative performance targets to ensure responsive operation and efficient resource utilization across various deployment scenarios.
General Performance Targets
- CLI startup time: < 100ms from invocation to ready state, ensuring minimal user wait time
- Collection speed: < 10 seconds for databases with < 1000 tables, enabling rapid schema capture
- Output file sizes: < 10MB when possible, facilitating efficient storage and transfer
- Postprocessor speed: < 500ms on small/medium databases, providing near-instant documentation generation
- Memory usage: < 1GB for typical workloads, supporting deployment on resource-constrained systems
Benchmark Targets for Milestone Planning
The project defines specific benchmark targets that inform milestone completion criteria and performance regression testing:
- Single database (100 tables): < 5 seconds total collection time
- Multi-database (10 databases, 50 tables each): < 30 seconds for complete collection across all databases
- Large database (1000 tables): < 60 seconds for comprehensive schema capture
These benchmarks account for network latency, query execution time, and metadata serialization overhead under typical production conditions.
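A performance regression check can encode these targets as data and compare measured durations against them. The sketch below is a hypothetical std-only harness. The budget names are invented, and the closure stands in for a real collection run (for example, against a testcontainers database). A dedicated benchmarking crate would give more stable measurements than a single timed run.

```rust
use std::time::{Duration, Instant};

/// One benchmark target from the milestone criteria.
pub struct Budget {
    pub name: &'static str,
    pub limit: Duration,
}

/// The three targets listed above, expressed as strict upper bounds.
pub fn budgets() -> Vec<Budget> {
    vec![
        Budget { name: "single_db_100_tables", limit: Duration::from_secs(5) },
        Budget { name: "multi_db_10x50_tables", limit: Duration::from_secs(30) },
        Budget { name: "large_db_1000_tables", limit: Duration::from_secs(60) },
    ]
}

/// Time a closure once and report the elapsed time and whether it beat the budget.
pub fn within_budget<F: FnOnce()>(budget: &Budget, work: F) -> (Duration, bool) {
    let start = Instant::now();
    work();
    let elapsed = start.elapsed();
    (elapsed, elapsed < budget.limit)
}
```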
Security Architecture
Security is architected as a set of non-negotiable guarantees rather than configurable options, simplifying the security model and eliminating misconfiguration risks.
Critical Security Guarantees
DBSurveyor provides four fundamental security guarantees that define its security posture:
- Offline-Only Operation: Zero network calls after initial installation, preventing data exfiltration and eliminating dependency on external services
- No Telemetry: Absolutely no data collection, usage tracking, or analytics of any kind
- No Credentials in Outputs: Database credentials never appear in any output files, regardless of format or encryption status
- Airgap Compatibility: Full functionality in air-gapped environments without internet access or external dependencies
Additional Security Requirements
Beyond the core guarantees, additional security mechanisms protect data throughout its lifecycle:
- AES-GCM authenticated encryption: Authenticated encryption with random nonces prevents tampering and ensures confidentiality
- Argon2id key derivation: Memory-hard key derivation (256-bit keys, 64 MiB memory, 3 iterations) resists brute-force attacks
- Secure memory handling: Zeroing on deallocation using the zeroize library prevents credential recovery from memory dumps
- Pattern-based sensitive data redaction: Configurable patterns identify and redact PII, API keys, and other sensitive data
- Restrictive file permissions: Output files created with 0600 permissions (owner read/write only) prevent unauthorized access
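Embedding the KDF parameters with the ciphertext lets a decryptor re-derive the key without out-of-band configuration. The sketch below shows one possible fixed-size header layout; it is not DBSurveyor's actual format. The magic bytes, the Argon2 parallelism of 1, and the 16-byte salt are assumptions. The memory cost is stored in KiB, the unit Argon2 uses, so 64 MiB becomes 65,536. The actual Argon2id derivation and AES-GCM encryption are omitted because they require external crates such as argon2 and aes-gcm.

```rust
use std::convert::TryInto;

/// Hypothetical header for a `.dbsurveyor.enc` file.
#[derive(Debug, PartialEq)]
pub struct EncHeader {
    pub m_cost_kib: u32, // Argon2 memory cost in KiB (64 MiB = 65_536 KiB)
    pub t_cost: u32,     // Argon2 iterations
    pub p_cost: u32,     // Argon2 parallelism (value is an assumption)
    pub salt: [u8; 16],  // per-file random salt (length is an assumption)
    pub nonce: [u8; 12], // 96-bit AES-GCM nonce, random per encryption
}

const MAGIC: &[u8; 4] = b"DBSV"; // hypothetical magic bytes
pub const HEADER_LEN: usize = 4 + 3 * 4 + 16 + 12;

impl EncHeader {
    /// Parameters from the requirements: 64 MiB memory, 3 iterations.
    pub fn defaults(salt: [u8; 16], nonce: [u8; 12]) -> Self {
        EncHeader { m_cost_kib: 64 * 1024, t_cost: 3, p_cost: 1, salt, nonce }
    }

    /// Serialize as magic, three little-endian u32 costs, salt, nonce.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(HEADER_LEN);
        out.extend_from_slice(MAGIC);
        for v in [self.m_cost_kib, self.t_cost, self.p_cost] {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out.extend_from_slice(&self.salt);
        out.extend_from_slice(&self.nonce);
        out
    }

    /// Parse a header, rejecting short buffers or unknown magic bytes.
    pub fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() < HEADER_LEN || &buf[..4] != MAGIC {
            return None;
        }
        let u32_at = |i: usize| u32::from_le_bytes(buf[i..i + 4].try_into().unwrap());
        Some(EncHeader {
            m_cost_kib: u32_at(4),
            t_cost: u32_at(8),
            p_cost: u32_at(12),
            salt: buf[16..32].try_into().ok()?,
            nonce: buf[32..44].try_into().ok()?,
        })
    }
}
```

A reasonable design choice, not confirmed by the source, is to pass these header bytes to AES-GCM as associated data. That way, tampering with the stored KDF parameters causes authentication to fail instead of silently deriving a different key.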
Technical Stack
The technical stack emphasizes security, performance, and cross-platform compatibility through careful dependency selection and build configuration.
Language and Runtime
DBSurveyor is built with the Rust 1.93.1 toolchain and declares a Minimum Supported Rust Version (MSRV) of 1.77. Rust provides memory safety without garbage collection, zero-cost abstractions, and strong type safety that align with the project's security-first approach.
Key Dependencies
Command-Line Interface:
- clap v4+: Type-safe argument parsing with derive macros for maintainable CLI definitions
Async Runtime:
- tokio: Industry-standard async runtime providing efficient concurrent I/O for database operations
Database Drivers:
- sqlx: Pure Rust SQL database driver for PostgreSQL, MySQL, and SQLite with compile-time query checking
- tiberius: Async TDS (Tabular Data Stream) implementation for SQL Server connectivity
- mongodb: Official MongoDB driver for Rust with full async support
Cryptography:
- aes-gcm: AES-GCM authenticated encryption implementation
- ring: Cryptographic primitives for key derivation and random number generation
Compression and Serialization:
- zstd: Zstandard compression bindings achieving 60-80% size reduction
- serde ecosystem: Serialization framework with JSON, YAML, and custom format support
Testing:
- cargo-nextest: Next-generation test runner with improved parallelization and output
- testcontainers: Docker-based integration testing with real database instances
Build System and Distribution
The project uses GoReleaser v2 with cargo-zigbuild for cross-platform build and distribution. The GoReleaser v2 migration completed in February 2026 fully replaced cargo-dist with a multi-variant build strategy:
- Multi-variant builds: 7 distinct binaries per platform (1 postprocessor + 6 collector variants: all, postgresql, mysql, sqlite, mongodb, mssql)
- Cross-compilation: cargo-zigbuild handles cross-compilation for all targets without manual toolchain installation
- Target platforms: 6 targets covering Linux (x86_64 gnu/musl, aarch64), macOS (x86_64, aarch64 Apple Silicon), Windows (x86_64)
- Security signing: Cosign keyless signing on all release artifacts using GitHub Actions OIDC
- SBOM generation: Syft generates Software Bill of Materials for each archive
This architecture enables users to download only the database drivers they need, reducing binary size and dependency footprint. Each variant includes the shared postprocessor binary alongside the variant-specific collector binary with compression and encryption support.
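The variant and target counts multiply into the release matrix: 7 binaries × 6 targets = 42 artifacts. The sketch below enumerates that matrix. The binary naming pattern is hypothetical, and so are the exact target triples for "aarch64 Linux" and Windows; the gnu ABI is assumed for both because the text does not name the ABI.

```rust
/// Collector variants as listed above.
const COLLECTOR_VARIANTS: [&str; 6] = ["all", "postgresql", "mysql", "sqlite", "mongodb", "mssql"];

/// Six targets; the aarch64 Linux and Windows triples are assumptions.
const TARGETS: [&str; 6] = [
    "x86_64-unknown-linux-gnu",
    "x86_64-unknown-linux-musl",
    "aarch64-unknown-linux-gnu",
    "x86_64-apple-darwin",
    "aarch64-apple-darwin",
    "x86_64-pc-windows-gnu",
];

/// Enumerate (binary name, target) pairs: one postprocessor plus six collectors per target.
pub fn build_matrix() -> Vec<(String, &'static str)> {
    let mut binaries = vec!["dbsurveyor".to_string()];
    binaries.extend(COLLECTOR_VARIANTS.iter().map(|v| format!("dbsurveyor-collect-{v}")));
    let mut matrix = Vec::new();
    for target in TARGETS {
        for bin in &binaries {
            matrix.push((bin.clone(), target));
        }
    }
    matrix
}
```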
Release Engineering
DBSurveyor's release engineering process emphasizes automation, reproducibility, and security through standardized CI/CD pipelines and output format specifications.
CI/CD Pipeline
The project implements the EvilBit Labs Pipeline Standard using GitHub Actions with just task automation for consistent local and CI execution:
```bash
just test           # Run all tests (unit + integration)
just lint           # Run cargo clippy with strict warnings
just format         # Code formatting validation
just build-release  # Cross-platform builds
just package        # Distribution packaging
```
The just command runner provides a standardized interface for common development tasks, ensuring identical behavior between developer workstations and CI environments. This consistency reduces "works on my machine" issues and streamlines contributor onboarding.
CI/CD Pipeline Features:
The pipeline incorporates comprehensive automation and security scanning:
- Semantic versioning with Release Please: Automated changelog generation and version bumping based on conventional commit messages
- Signed releases with Cosign keyless signing: Cryptographic signatures using GitHub Actions OIDC enable supply chain verification without key management
- SBOM generation with Syft: Software Bill of Materials for all archives supporting security auditing and compliance
- Automated dependency updates via Renovate: Continuous dependency monitoring with automated pull requests for updates
- Security scanning suite: CodeQL static analysis, Syft SBOM generation, and Grype vulnerability scanning on every build
- Coverage reporting: Codecov integration tracks test coverage trends and enforces minimum coverage requirements
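Release Please derives version bumps from Conventional Commits messages. The simplified model below illustrates the rule:

- `feat` produces a minor bump.
- `fix` produces a patch bump.
- A `!` after the type, or a `BREAKING CHANGE:` footer, produces a major bump.
- The release takes the largest bump among its commits.

This is a hypothetical sketch, not Release Please's implementation. Its configuration options (such as bump-minor-pre-major) can change how breaking changes are handled before 1.0.

```rust
#[derive(Debug, PartialEq, PartialOrd, Clone, Copy)]
pub enum Bump {
    NoRelease,
    Patch,
    Minor,
    Major,
}

/// Classify one commit message by its Conventional Commits header and footers.
pub fn classify(message: &str) -> Bump {
    let header = message.lines().next().unwrap_or("");
    let Some((kind, _)) = header.split_once(':') else {
        return Bump::NoRelease;
    };
    if kind.ends_with('!') || message.contains("BREAKING CHANGE:") {
        return Bump::Major;
    }
    // Strip an optional scope such as `feat(collector)`.
    let base = kind.split('(').next().unwrap_or(kind);
    match base {
        "feat" => Bump::Minor,
        "fix" => Bump::Patch,
        _ => Bump::NoRelease,
    }
}

/// The release bump is the largest bump among all commits since the last release.
pub fn release_bump(messages: &[&str]) -> Bump {
    messages
        .iter()
        .map(|m| classify(m))
        .fold(Bump::NoRelease, |acc, b| if b > acc { b } else { acc })
}

/// Apply a bump to a (major, minor, patch) version.
pub fn next_version(current: (u64, u64, u64), bump: Bump) -> (u64, u64, u64) {
    let (major, minor, patch) = current;
    match bump {
        Bump::Major => (major + 1, 0, 0),
        Bump::Minor => (major, minor + 1, 0),
        Bump::Patch => (major, minor, patch + 1),
        Bump::NoRelease => current,
    }
}
```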
The GoReleaser v2 implementation builds 42 binaries per release (7 variants × 6 targets) and distributes them as Linux packages (deb, rpm, apk) and through the EvilBit-Labs Homebrew tap.
Output Format Strategy
DBSurveyor defines standardized output formats with explicit versioning ("format_version": "1.0") to support format evolution while maintaining backward compatibility:
- .dbsurveyor.json: Uncompressed JSON metadata in human-readable format for easy inspection and scripting
- .dbsurveyor.json.zst: Zstandard compressed format achieving 60-80% size reduction for efficient storage and transfer
- .dbsurveyor.enc: AES-GCM encrypted format with embedded KDF parameters, enabling secure storage of sensitive metadata
The format versioning strategy allows tools to detect and handle format changes gracefully, preventing silent data corruption or misinterpretation.
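A reader can combine the file extension with the declared format_version to decide how to handle an input, or to refuse it. The sketch below assumes a compatibility policy in which any 1.x version is readable and other major versions are rejected; the source only states that the current version is "1.0". `OutputKind`, `output_kind`, and `check_format_version` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum OutputKind {
    Json,
    JsonZstd,
    Encrypted,
}

/// Classify an output file by the extensions listed above.
pub fn output_kind(path: &str) -> Option<OutputKind> {
    if path.ends_with(".dbsurveyor.json") {
        Some(OutputKind::Json)
    } else if path.ends_with(".dbsurveyor.json.zst") {
        Some(OutputKind::JsonZstd)
    } else if path.ends_with(".dbsurveyor.enc") {
        Some(OutputKind::Encrypted)
    } else {
        None
    }
}

/// Major version this reader understands (assumed policy: same major = compatible).
const SUPPORTED_MAJOR: u32 = 1;

/// Parse "MAJOR.MINOR" and reject unsupported majors instead of guessing.
pub fn check_format_version(version: &str) -> Result<(u32, u32), String> {
    let (major, minor) = version
        .split_once('.')
        .ok_or_else(|| format!("malformed format_version: {}", version))?;
    let major: u32 = major.parse().map_err(|_| format!("bad major in {}", version))?;
    let minor: u32 = minor.parse().map_err(|_| format!("bad minor in {}", version))?;
    if major == SUPPORTED_MAJOR {
        Ok((major, minor))
    } else {
        Err(format!(
            "unsupported format_version {}; this reader handles {}.x",
            version, SUPPORTED_MAJOR
        ))
    }
}
```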
Standards Compliance
DBSurveyor achieves full compliance with all EvilBit Labs organizational standards, demonstrating alignment with enterprise-grade development practices:
- Pipeline Standard: GitHub Actions CI/CD with just task automation and reproducible builds
- Security Standard: No telemetry, credential protection, secure defaults, and comprehensive threat modeling
- Documentation Standard: User guides, API documentation, architecture decision records, and inline code documentation
- Testing Standard: >80% code coverage target (70% minimum for v1.0), integration tests with testcontainers, property-based testing
- Release Standard: Semantic versioning, signed releases with Cosign, SBOM generation, automated changelogs
- Offline Standard: Complete air-gap operation with zero network dependencies post-installation
- Cross-Platform Standard: First-class support for Linux, macOS, and Windows with consistent behavior across platforms
The project documentation reports no deviations from these standards, reflecting full adherence to organizational requirements.
Future Development Guidance
The project maintains explicit guidance for future feature development to ensure consistency with organizational standards:
HTTP Client Requirements:
If future features require HTTP client functionality, developers must use OpenAPI Generator for Rust client code generation rather than hand-written HTTP clients. This approach aligns with EvilBit Labs standards for type-safe, well-documented API clients and ensures API contract enforcement at compile time.
This guidance reflects lessons learned across the organization and prevents common pitfalls of hand-written HTTP clients, such as incomplete error handling, undocumented edge cases, and undetected API drift.
Relevant Code Files
The following files contain the authoritative documentation and implementation of the roadmap and milestone strategy:
| File Path | Description |
|---|---|
| project_specs/requirements.md | Complete milestone structure (v0.1-v1.0), feature requirements, priority matrix, performance targets, and compliance standards |
| CHANGELOG.md | Implementation history tracking completed features, in-development work, and version progression |
| CONTRIBUTORS.md | Development approach, team structure, quality standards, code review process, and contributor guidelines |
| .github/workflows/ | CI/CD pipeline implementation with security scanning, testing, and release automation |
| Justfile | Task automation recipes for build, test, format, lint, and release operations |
| .goreleaser.yml | Cross-platform release configuration for binary distribution |
Related Topics
- Security-First Database Tooling: Architectural patterns for building database analysis tools with non-negotiable security guarantees (offline-only operation, zero telemetry, credential protection) for use in security-sensitive and air-gapped environments
- Progressive Milestone Planning: Version-based roadmap strategies that structure feature development through distinct capability phases (MVP → Documentation → Pro Features → Production) rather than incremental feature iteration
- Dual-Binary Architecture Pattern: Design pattern of splitting functionality into separate collector and postprocessor binaries to enable offline workflows, process isolation, and air-gapped operation for database analysis tools
- Feature Flag Architecture: Modular compilation strategies using Rust feature flags to enable selective database engine inclusion and reduce binary size for specific deployment scenarios
- Multi-Database Collection: Patterns for surveying multiple databases with partial-failure handling, machine-actionable failure metadata, and automation-friendly exit codes
- Performance Benchmarking for Database Tools: Establishing quantitative performance targets and regression testing strategies for database metadata collection and analysis tools