JSON Schema V1.0 Design And Validation Architecture#
Lead Section#
The JSON Schema V1.0 Design and Validation Architecture is the foundational specification for the .dbsurveyor.json format, which provides a standardized, validated representation of database schemas across PostgreSQL, MySQL, SQLite, and MongoDB. Based on the Frictionless Data Table Schema specification, the architecture implements a security-first design that prohibits credential fields, filters connection strings, and enforces comprehensive validation rules.
The architecture is built on JSON Schema draft 2020-12 with schema identifier https://dbsurveyor.dev/schemas/v1.0/dbsurveyor-schema.json, formally defined in a 254-line JSON Schema definition file. The implementation uses Rust data structures with serde serialization and provides comprehensive validation through the jsonschema crate with embedded schema definitions.
Version 1.0 is stable and fully supported, designed for evolution through additive changes with version detection, migration support, and a defined deprecation path.
Schema Structure and Design#
Root Schema Object#
Every .dbsurveyor.json file contains required fields: format_version, database_info, tables, and collection_metadata. Optional fields include views, indexes, constraints, procedures, functions, triggers, custom types, and data samples.
The root structure is defined in the DatabaseSchema struct with complete serde serialization support.
Database Information Layer#
The database_info object contains essential metadata: name, version, size_bytes, encoding, collation, owner, and access_level.
Access Levels:
Full: Complete schema access with all permissionsLimited: Partial access due to permission restrictionsNone: No schema access available
Collection Status can be:
"Success"for successful collection- Failure object with detailed error context
- Skipped indicator with explanation for system databases
Table and Column Structure#
Tables are defined with comprehensive metadata: name, schema, columns array, primary keys, foreign keys, indexes, and comment. The Table struct implements this specification.
Each Column specifies:
name: Column identifierdata_type: Using the unified type systemis_nullable: NULL constraint statusis_primary_key: Primary key membershipis_auto_increment: Auto-increment/serial propertyordinal_position: Column order in tablecomment: Optional description
Unified Data Type System#
Type Architecture#
The UnifiedDataType enum maps database-specific types to a common representation using an enum-with-struct-variants pattern.
Type Definitions#
Basic Types (null parameters):
Boolean: True/false valuesDate: Date-only valuesJson: JSON/JSONB dataUuid: UUID/GUID values
String Types:
String { max_length: Option<u32> }: Character strings with optional length limit
Numeric Types:
Integer { bits: u8, signed: bool }: Supports 8, 16, 24, 32, 64, 128-bit integersFloat { precision: Option<u8> }: Floating point with 1-53 bit precision
Date/Time Types:
DateTime { with_timezone: bool }: Date and time with optional timezoneTime { with_timezone: bool }: Time-only with optional timezone
Complex Types:
Array { element_type: Box<UnifiedDataType> }: Arrays with recursive element type specificationBinary { max_length: Option<u32> }: Binary data with optional length limitCustom { type_name: String }: Database-specific custom types
Database-Specific Type Mapping#
PostgreSQL (type mapping implementation):
- VARCHAR, TEXT → String
- SMALLINT (16-bit), INTEGER (32-bit), BIGINT (64-bit) → Integer
- REAL (24-bit precision), DOUBLE PRECISION (53-bit precision) → Float
- NUMERIC/DECIMAL: scale 0 → Integer, otherwise → Float
- TIMESTAMP, TIMESTAMPTZ → DateTime
- BYTEA → Binary, JSON/JSONB → Json, UUID → Uuid
- Arrays recursively mapped → Array
- ENUM, DOMAIN, INET, geometric types → Custom
MySQL (type mapping implementation):
- VARCHAR, TEXT variants → String
- TINYINT, SMALLINT, MEDIUMINT (24-bit), INT, BIGINT → Integer
- UNSIGNED modifier sets signed: false
- TINYINT(1) → Boolean (MySQL convention)
- FLOAT (24-bit), DOUBLE (53-bit) → Float
- DATETIME (no TZ), TIMESTAMP (with TZ) → DateTime
- BINARY, BLOB variants → Binary, JSON → Json
- BIT(1) → Boolean, BIT(n) → Binary
- ENUM, SET, geometry types → Custom
SQLite (type mapping implementation):
- Uses type affinity rules: INT pattern → INTEGER, CHAR/TEXT → TEXT, BLOB → BLOB, REAL/FLOAT → REAL
- TINYINT (8-bit), SMALLINT (16-bit), MEDIUMINT (24-bit), INTEGER (32-bit), BIGINT (64-bit)
- TEXT types with optional length → String
- FLOAT (24-bit), DOUBLE/REAL (53-bit) → Float
- All DateTime types without timezone
- Empty type → Binary (BLOB affinity)
MongoDB (BSON type mapping implementation):
- Bson::String → String
- Bson::Int32 (32-bit), Bson::Int64 (64-bit) → Integer
- Bson::Double → Float (53-bit precision)
- Bson::Decimal128 → Float (128-bit precision)
- Bson::DateTime, Bson::Timestamp → DateTime (always with_timezone: true, UTC)
- Bson::Binary → Binary
- Bson::ObjectId → String with max_length: 24
- Embedded documents → Json, Arrays → Array with recursive element type inference
- Regex, JavaScript, Symbol, MinKey, MaxKey → Custom
Constraints and Relationships#
Primary Key Structure#
Primary keys are defined with name and array of columns.
Foreign Key Specification#
Referential Actions:
Cascade: Delete/update cascades to referencing rowsSetNull: Set referencing columns to NULLSetDefault: Set referencing columns to default valuesRestrict: Prevent if references existNoAction: No action taken
Check Constraints#
Index Definitions#
Index Types:
btree: Balanced tree indexhash: Hash indexgin: Generalized inverted indexgist: Generalized search treespgist: Space-partitioned GiST
Security Constraints and Validation#
Security-First Design Principles#
The architecture implements strict security rules: no credential fields (field names cannot contain password, secret, token), no connection strings (database URLs automatically filtered), and no sensitive patterns (credential patterns rejected). All error messages are sanitized to prevent credential leakage.
Multi-Layer Security Implementation#
Field Name Validation (helpers.rs implementation):
- Pre-compiled regex patterns detect sensitive field names: password, passwd, pwd, email, ssn, credit_card, api_key, secret_key, token, auth_token, bearer
- Connection string pattern detection for postgres://, mysql://, mongodb://, mssql:// with credentials
Value-Based Credential Detection (validation.rs implementation):
validate_no_credentials_recursive()checks for credential patterns like password=, pwd=, secret=, token=, key=, api_key=validate_no_connection_strings_recursive()detects connection strings with credentials- Important: Column names like "password_hash" or "api_secret" are allowed as legitimate schema metadata - only actual credential values are blocked
Connection String Redaction (error.rs implementation):
redact_database_url()masks passwords in connection strings (e.g.,postgres://user:secret@localhost/dbbecomespostgres://user:****@localhost/db)
Secure Credential Storage (credentials.rs implementation):
Credentialsstruct usesZeroizingcontainers to automatically clear sensitive data from memory on drop- Connection parsing separates credentials into secure containers, never storing passwords in plain String
Read-Only Mode Enforcement (connection configuration):
- Default read_only: true in ConnectionConfig
- PostgreSQL enforces via
SET default_transaction_read_only = onin connection pool after_connect hook
Timeout Configuration:
- Connection-level: 30s connect timeout, 30s query timeout, 600s idle timeout, 3600s max lifetime
- Session-level: statement_timeout, lock_timeout (30s), idle_in_transaction_session_timeout (60s)
Connection Pool Limits (PostgreSQL connection pool):
Data Validation Limits#
String lengths: 255 characters for names, 1000 for comments. Arrays: maximum 1000 items. Object nesting: limited to 10 levels. Schema files: capped at 100MB.
Type and Relationship Validation#
Encryption and Compression#
AES-GCM-256 authenticated encryption with Argon2id key derivation (64 MiB memory, 3 iterations) and random 96-bit nonces. Zstandard compression provides 60-80% size reduction for large schemas.
Validation Architecture#
Validation Implementation#
The validation module (628 lines) implements comprehensive validation.
Key Functions:
initialize_schema_validator()compiles the embedded JSON Schema and caches it in OnceLockvalidate_schema_output()performs JSON Schema validation, format version checking, and security validationvalidate_and_parse_schema()combines JSON parsing, validation, and deserialization
The module embeds the JSON Schema definition (lines 73-323) in the SCHEMA_V1_0 constant, eliminating external file dependencies at runtime.
Validation Process#
Error Structure#
Warnings#
Testing#
Comprehensive test coverage (906+ lines) validates various scenarios, including:
- Valid schemas across all supported databases
- Invalid data types and missing required fields
- Unsupported format versions
- Security violations: tests confirm column names like "password_hash" are allowed (legitimate metadata) while actual credential values are blocked
Multi-Database Support#
Server-Level Collections#
The collection_mode field specifies MultiDatabase with discovered, collected, and failed counts.
Database Access Management#
Data Sampling#
Sampling Strategies:
{"MostRecent": {"limit": 10}}: Latest N rows{"Random": {"limit": 100}}: Random sample of N rows"None": No sampling
Collection Metadata and Versioning#
Collection Metadata#
Every schema file includes metadata: collected_at (ISO 8601 timestamp), collection_duration_ms (milliseconds), collector_version, warnings array, and collector_options (settings used during collection). Implemented in CollectionMetadata struct.
Version Management#
Current format version 1.0 is stable and fully supported. Schema designed for evolution through additive changes (new optional fields), version detection, migration support, and deprecation path where old fields are marked before removal.
Integration Architecture#
Tool Integration#
Schema format integrates with all DBSurveyor tools:
- dbsurveyor-collect: generates schema files with validation before writing
- dbsurveyor: processes and validates schemas using
validate_and_parse_schema() - dbsurveyor-docs: generates documentation
- dbsurveyor-validate: standalone validation
Module Architecture#
dbsurveyor-core/src/lib.rs exports validation functions and core models, making them available to both collection and postprocessing binaries.
External Tool Compatibility#
Best Practices#
Schema Collection#
- Use descriptive names: Avoid generic names like "db1", "test"
- Include comments: Add meaningful descriptions for tables and columns
- Sample strategically: Use sampling for large tables to avoid huge files
- Validate early: Check schemas immediately after collection
Performance Optimization#
Relevant Code Files#
| File | Description | Lines | URL |
|---|---|---|---|
dbsurveyor-core/schemas/dbsurveyor-schema-v1.0.json | JSON Schema definition with complete type and constraint specifications | 254 | View |
dbsurveyor-core/src/validation.rs | Validation implementation with embedded schema, security checks, and error handling | 628 | View |
dbsurveyor-core/src/models.rs | Core data structures: DatabaseSchema, Table, Column, UnifiedDataType | 574 | View |
dbsurveyor-core/src/adapters/postgres/type_mapping.rs | PostgreSQL type mapping to UnifiedDataType | 191 | View |
dbsurveyor-core/src/adapters/mysql/type_mapping.rs | MySQL type mapping to UnifiedDataType | 196 | View |
dbsurveyor-core/src/adapters/sqlite/type_mapping.rs | SQLite type mapping with affinity rules | 136 | View |
dbsurveyor-core/src/adapters/mongodb/type_mapping.rs | MongoDB BSON type mapping to UnifiedDataType | 130 | View |
dbsurveyor-core/src/security/credentials.rs | Secure credential storage with automatic memory zeroing | 102 | View |
dbsurveyor-core/src/security/connection.rs | Connection string parsing with secure credential separation | ~150 | View |
dbsurveyor-core/src/adapters/helpers.rs | Validation patterns for sensitive fields and connection strings | ~106 | View |
dbsurveyor-core/src/error.rs | Error types with connection string redaction | 254 | View |
dbsurveyor-core/src/validation/tests.rs | Comprehensive validation test suite | 906+ | View |
dbsurveyor-collect/src/main.rs | Collection binary with validator initialization and output validation | ~415+ | View |
dbsurveyor/src/main.rs | Postprocessor binary with schema validation and parsing | ~311+ | View |
Related Topics#
- Frictionless Data Table Schema specification: Foundation for the .dbsurveyor.json format
- JSON Schema draft 2020-12: Specification standard used
- DBSurveyor Architecture: Overall system design and adapter pattern
- Database-specific adapters: PostgreSQL, MySQL, SQLite, MongoDB implementations
- Output formats and compression: JSON, YAML, encrypted, and compressed output options