JSON Schema V1.0 Design And Validation Architecture#

Lead Section#

The JSON Schema V1.0 Design and Validation Architecture is the foundational specification for the .dbsurveyor.json format, which provides a standardized, validated representation of database schemas across PostgreSQL, MySQL, SQLite, and MongoDB. Based on the Frictionless Data Table Schema specification, the architecture implements a security-first design that prohibits credential fields, filters connection strings, and enforces comprehensive validation rules.

The architecture is built on JSON Schema draft 2020-12 with schema identifier https://dbsurveyor.dev/schemas/v1.0/dbsurveyor-schema.json, formally defined in a 254-line JSON Schema definition file. The implementation uses Rust data structures with serde serialization and provides comprehensive validation through the jsonschema crate with embedded schema definitions.

Version 1.0 is stable and fully supported, designed for evolution through additive changes with version detection, migration support, and a defined deprecation path.

Schema Structure and Design#

Root Schema Object#

Every .dbsurveyor.json file contains required fields: format_version, database_info, tables, and collection_metadata. Optional fields include views, indexes, constraints, procedures, functions, triggers, custom types, and data samples.

The root structure is defined in the DatabaseSchema struct with complete serde serialization support.

Database Information Layer#

The database_info object contains essential metadata: name, version, size_bytes, encoding, collation, owner, and access_level.

Access Levels:

Full: Complete schema access with all permissions
Limited: Partial access due to permission restrictions
None: No schema access available

Collection Status can be:

"Success" for successful collection
Failure object with detailed error context
Skipped indicator with explanation for system databases

Table and Column Structure#

Tables are defined with comprehensive metadata: name, schema, columns array, primary keys, foreign keys, indexes, and comment. The Table struct implements this specification.

Each Column specifies:

name: Column identifier
data_type: Using the unified type system
is_nullable: NULL constraint status
is_primary_key: Primary key membership
is_auto_increment: Auto-increment/serial property
ordinal_position: Column order in table
comment: Optional description

Unified Data Type System#

Type Architecture#

The UnifiedDataType enum maps database-specific types to a common representation using an enum-with-struct-variants pattern.

Type Definitions#

Basic Types (null parameters):

Boolean: True/false values
Date: Date-only values
Json: JSON/JSONB data
Uuid: UUID/GUID values

String Types:

String { max_length: Option<u32> }: Character strings with optional length limit

Numeric Types:

Date/Time Types:

DateTime { with_timezone: bool }: Date and time with optional timezone
Time { with_timezone: bool }: Time-only with optional timezone

Complex Types:

Array { element_type: Box<UnifiedDataType> }: Arrays with recursive element type specification
Binary { max_length: Option<u32> }: Binary data with optional length limit
Custom { type_name: String }: Database-specific custom types

Database-Specific Type Mapping#

PostgreSQL (type mapping implementation):

MySQL (type mapping implementation):

SQLite (type mapping implementation):

MongoDB (BSON type mapping implementation):

Constraints and Relationships#

Primary Key Structure#

Primary keys are defined with name and array of columns.

Foreign Key Specification#

Foreign keys include name, columns array, referenced_table, referenced_schema, referenced_columns, and referential actions.

Referential Actions:

Cascade: Delete/update cascades to referencing rows
SetNull: Set referencing columns to NULL
SetDefault: Set referencing columns to default values
Restrict: Prevent if references exist
NoAction: No action taken

Check Constraints#

Check constraints specify name, constraint_type, definition (constraint expression), and enforced boolean.

Index Definitions#

Indexes include name, table_name, schema, columns array (with order and nulls_order), is_unique, is_primary, index_type, and optional comment.

Index Types:

Security Constraints and Validation#

Security-First Design Principles#

The architecture implements strict security rules: no credential fields (field names cannot contain password, secret, token), no connection strings (database URLs automatically filtered), and no sensitive patterns (credential patterns rejected). All error messages are sanitized to prevent credential leakage.

Multi-Layer Security Implementation#

Field Name Validation (helpers.rs implementation):

Value-Based Credential Detection (validation.rs implementation):

Connection String Redaction (error.rs implementation):

redact_database_url() masks passwords in connection strings (e.g., postgres://user:secret@localhost/db becomes postgres://user:****@localhost/db)

Secure Credential Storage (credentials.rs implementation):

Read-Only Mode Enforcement (connection configuration):

Timeout Configuration:

Connection Pool Limits (PostgreSQL connection pool):

Validation Architecture#

Validation Implementation#

The validation module (628 lines) implements comprehensive validation.

Key Functions:

The module embeds the JSON Schema definition (lines 73-323) in the SCHEMA_V1_0 constant, eliminating external file dependencies at runtime.

Validation Process#

All output files are validated against DBSurveyor JSON Schema v1.0 through format version checking, required field validation, data type validation, security validation, and relationship validation.

Error Structure#

When validation fails, DBSurveyor provides detailed error information: path (location in schema), message (error description), value (actual value found), and suggestion (how to fix).

Warnings#

Warnings are included in collection metadata for non-critical issues like large table collection times, unsupported custom types, and temporary table detection.

Testing#

Comprehensive test coverage (906+ lines) validates various scenarios, including:

Valid schemas across all supported databases
Invalid data types and missing required fields
Unsupported format versions
Security violations: tests confirm column names like "password_hash" are allowed (legitimate metadata) while actual credential values are blocked

Multi-Database Support#

Server-Level Collections#

For server-level collections, the format supports multiple databases through a server_info object containing server_type, version, host, port, total_databases, collected_databases, system_databases_excluded, connection_user, has_superuser_privileges, and collection_mode.

The collection_mode field specifies MultiDatabase with discovered, collected, and failed counts.

Database Access Management#

Collection process respects database access permissions through access level indicators (Full, Limited, None) and can report collection failures with detailed error context or skip reasons for system databases.

Data Sampling#

Optional data samples provide insight into actual data with table_name, schema_name, rows array, sample_size, total_rows, sampling_strategy, collected_at timestamp, and warnings.

Sampling Strategies:

SamplingConfig defines sensitive patterns (password, email, SSN fields) to warn about sampling sensitive data.

Collection Metadata and Versioning#

Collection Metadata#

Every schema file includes metadata: collected_at (ISO 8601 timestamp), collection_duration_ms (milliseconds), collector_version, warnings array, and collector_options (settings used during collection). Implemented in CollectionMetadata struct.

Version Management#

Current format version 1.0 is stable and fully supported. Schema designed for evolution through additive changes (new optional fields), version detection, migration support, and deprecation path where old fields are marked before removal.

Integration Architecture#

Tool Integration#

Schema format integrates with all DBSurveyor tools:

dbsurveyor-collect: generates schema files with validation before writing
dbsurveyor: processes and validates schemas using validate_and_parse_schema()
dbsurveyor-docs: generates documentation
dbsurveyor-validate: standalone validation

Module Architecture#

dbsurveyor-core/src/lib.rs exports validation functions and core models, making them available to both collection and postprocessing binaries.

External Tool Compatibility#

Compatible with JSON Schema validators (jsonschema, ajv), data analysis tools (pandas, jq), documentation generators (Docusaurus, MkDocs), and CI/CD pipelines (GitHub Actions, GitLab CI).

Best Practices#

Schema Collection#

Performance Optimization#

For large databases (1000+ tables): use compression for file size reduction, consider sampling for development environments, utilize connection pooling, enable parallel collection for multiple databases, and selectively sample only essential tables.

Relevant Code Files#

File	Description	Lines	URL
`dbsurveyor-core/schemas/dbsurveyor-schema-v1.0.json`	JSON Schema definition with complete type and constraint specifications	254	View
`dbsurveyor-core/src/validation.rs`	Validation implementation with embedded schema, security checks, and error handling	628	View
`dbsurveyor-core/src/models.rs`	Core data structures: DatabaseSchema, Table, Column, UnifiedDataType	574	View
`dbsurveyor-core/src/adapters/postgres/type_mapping.rs`	PostgreSQL type mapping to UnifiedDataType	191	View
`dbsurveyor-core/src/adapters/mysql/type_mapping.rs`	MySQL type mapping to UnifiedDataType	196	View
`dbsurveyor-core/src/adapters/sqlite/type_mapping.rs`	SQLite type mapping with affinity rules	136	View
`dbsurveyor-core/src/adapters/mongodb/type_mapping.rs`	MongoDB BSON type mapping to UnifiedDataType	130	View
`dbsurveyor-core/src/security/credentials.rs`	Secure credential storage with automatic memory zeroing	102	View
`dbsurveyor-core/src/security/connection.rs`	Connection string parsing with secure credential separation	~150	View
`dbsurveyor-core/src/adapters/helpers.rs`	Validation patterns for sensitive fields and connection strings	~106	View
`dbsurveyor-core/src/error.rs`	Error types with connection string redaction	254	View
`dbsurveyor-core/src/validation/tests.rs`	Comprehensive validation test suite	906+	View
`dbsurveyor-collect/src/main.rs`	Collection binary with validator initialization and output validation	~415+	View
`dbsurveyor/src/main.rs`	Postprocessor binary with schema validation and parsing	~311+	View

Frictionless Data Table Schema specification: Foundation for the .dbsurveyor.json format
JSON Schema draft 2020-12: Specification standard used
DBSurveyor Architecture: Overall system design and adapter pattern
Database-specific adapters: PostgreSQL, MySQL, SQLite, MongoDB implementations
Output formats and compression: JSON, YAML, encrypted, and compressed output options