Documents
JSON Schema V1.0 Design And Validation Architecture
JSON Schema V1.0 Design And Validation Architecture
Type
Topic
Status
Published
Created
Mar 4, 2026
Updated
Mar 4, 2026
Created by
Dosu Bot
Updated by
Dosu Bot

JSON Schema V1.0 Design And Validation Architecture#

Lead Section#

The JSON Schema V1.0 Design and Validation Architecture is the foundational specification for the .dbsurveyor.json format, which provides a standardized, validated representation of database schemas across PostgreSQL, MySQL, SQLite, and MongoDB. Based on the Frictionless Data Table Schema specification, the architecture implements a security-first design that prohibits credential fields, filters connection strings, and enforces comprehensive validation rules.

The architecture is built on JSON Schema draft 2020-12 with schema identifier https://dbsurveyor.dev/schemas/v1.0/dbsurveyor-schema.json, formally defined in a 254-line JSON Schema definition file. The implementation uses Rust data structures with serde serialization and provides comprehensive validation through the jsonschema crate with embedded schema definitions.

Version 1.0 is stable and fully supported, designed for evolution through additive changes with version detection, migration support, and a defined deprecation path.

Schema Structure and Design#

Root Schema Object#

Every .dbsurveyor.json file contains required fields: format_version, database_info, tables, and collection_metadata. Optional fields include views, indexes, constraints, procedures, functions, triggers, custom types, and data samples.

The root structure is defined in the DatabaseSchema struct with complete serde serialization support.

Database Information Layer#

The database_info object contains essential metadata: name, version, size_bytes, encoding, collation, owner, and access_level.

Access Levels:

Collection Status can be:

  • "Success" for successful collection
  • Failure object with detailed error context
  • Skipped indicator with explanation for system databases

Table and Column Structure#

Tables are defined with comprehensive metadata: name, schema, columns array, primary keys, foreign keys, indexes, and comment. The Table struct implements this specification.

Each Column specifies:

  • name: Column identifier
  • data_type: Using the unified type system
  • is_nullable: NULL constraint status
  • is_primary_key: Primary key membership
  • is_auto_increment: Auto-increment/serial property
  • ordinal_position: Column order in table
  • comment: Optional description

Unified Data Type System#

Type Architecture#

The UnifiedDataType enum maps database-specific types to a common representation using an enum-with-struct-variants pattern.

Type Definitions#

Basic Types (null parameters):

  • Boolean: True/false values
  • Date: Date-only values
  • Json: JSON/JSONB data
  • Uuid: UUID/GUID values

String Types:

Numeric Types:

Date/Time Types:

Complex Types:

Database-Specific Type Mapping#

PostgreSQL (type mapping implementation):

MySQL (type mapping implementation):

SQLite (type mapping implementation):

MongoDB (BSON type mapping implementation):

Constraints and Relationships#

Primary Key Structure#

Primary keys are defined with name and array of columns.

Foreign Key Specification#

Foreign keys include name, columns array, referenced_table, referenced_schema, referenced_columns, and referential actions.

Referential Actions:

Check Constraints#

Check constraints specify name, constraint_type, definition (constraint expression), and enforced boolean.

Index Definitions#

Indexes include name, table_name, schema, columns array (with order and nulls_order), is_unique, is_primary, index_type, and optional comment.

Index Types:

Security Constraints and Validation#

Security-First Design Principles#

The architecture implements strict security rules: no credential fields (field names cannot contain password, secret, token), no connection strings (database URLs automatically filtered), and no sensitive patterns (credential patterns rejected). All error messages are sanitized to prevent credential leakage.

Multi-Layer Security Implementation#

Field Name Validation (helpers.rs implementation):

Value-Based Credential Detection (validation.rs implementation):

Connection String Redaction (error.rs implementation):

Secure Credential Storage (credentials.rs implementation):

Read-Only Mode Enforcement (connection configuration):

Timeout Configuration:

Connection Pool Limits (PostgreSQL connection pool):

Data Validation Limits#

String lengths: 255 characters for names, 1000 for comments. Arrays: maximum 1000 items. Object nesting: limited to 10 levels. Schema files: capped at 100MB.

Type and Relationship Validation#

All mandatory fields must be present, values must match declared types, foreign keys must reference valid tables, and index columns must exist in the referenced table.

Encryption and Compression#

AES-GCM-256 authenticated encryption with Argon2id key derivation (64 MiB memory, 3 iterations) and random 96-bit nonces. Zstandard compression provides 60-80% size reduction for large schemas.

Validation Architecture#

Validation Implementation#

The validation module (628 lines) implements comprehensive validation.

Key Functions:

The module embeds the JSON Schema definition (lines 73-323) in the SCHEMA_V1_0 constant, eliminating external file dependencies at runtime.

Validation Process#

All output files are validated against DBSurveyor JSON Schema v1.0 through format version checking, required field validation, data type validation, security validation, and relationship validation.

Error Structure#

When validation fails, DBSurveyor provides detailed error information: path (location in schema), message (error description), value (actual value found), and suggestion (how to fix).

Warnings#

Warnings are included in collection metadata for non-critical issues like large table collection times, unsupported custom types, and temporary table detection.

Testing#

Comprehensive test coverage (906+ lines) validates various scenarios, including:

Multi-Database Support#

Server-Level Collections#

For server-level collections, the format supports multiple databases through a server_info object containing server_type, version, host, port, total_databases, collected_databases, system_databases_excluded, connection_user, has_superuser_privileges, and collection_mode.

The collection_mode field specifies MultiDatabase with discovered, collected, and failed counts.

Database Access Management#

Collection process respects database access permissions through access level indicators (Full, Limited, None) and can report collection failures with detailed error context or skip reasons for system databases.

Data Sampling#

Optional data samples provide insight into actual data with table_name, schema_name, rows array, sample_size, total_rows, sampling_strategy, collected_at timestamp, and warnings.

Sampling Strategies:

SamplingConfig defines sensitive patterns (password, email, SSN fields) to warn about sampling sensitive data.

Collection Metadata and Versioning#

Collection Metadata#

Every schema file includes metadata: collected_at (ISO 8601 timestamp), collection_duration_ms (milliseconds), collector_version, warnings array, and collector_options (settings used during collection). Implemented in CollectionMetadata struct.

Version Management#

Current format version 1.0 is stable and fully supported. Schema designed for evolution through additive changes (new optional fields), version detection, migration support, and deprecation path where old fields are marked before removal.

Integration Architecture#

Tool Integration#

Schema format integrates with all DBSurveyor tools:

Module Architecture#

dbsurveyor-core/src/lib.rs exports validation functions and core models, making them available to both collection and postprocessing binaries.

External Tool Compatibility#

Compatible with JSON Schema validators (jsonschema, ajv), data analysis tools (pandas, jq), documentation generators (Docusaurus, MkDocs), and CI/CD pipelines (GitHub Actions, GitLab CI).

Best Practices#

Schema Collection#

  1. Use descriptive names: Avoid generic names like "db1", "test"
  2. Include comments: Add meaningful descriptions for tables and columns
  3. Sample strategically: Use sampling for large tables to avoid huge files
  4. Validate early: Check schemas immediately after collection

Performance Optimization#

For large databases (1000+ tables): use compression for file size reduction, consider sampling for development environments, utilize connection pooling, enable parallel collection for multiple databases, and selectively sample only essential tables.

Relevant Code Files#

FileDescriptionLinesURL
dbsurveyor-core/schemas/dbsurveyor-schema-v1.0.jsonJSON Schema definition with complete type and constraint specifications254View
dbsurveyor-core/src/validation.rsValidation implementation with embedded schema, security checks, and error handling628View
dbsurveyor-core/src/models.rsCore data structures: DatabaseSchema, Table, Column, UnifiedDataType574View
dbsurveyor-core/src/adapters/postgres/type_mapping.rsPostgreSQL type mapping to UnifiedDataType191View
dbsurveyor-core/src/adapters/mysql/type_mapping.rsMySQL type mapping to UnifiedDataType196View
dbsurveyor-core/src/adapters/sqlite/type_mapping.rsSQLite type mapping with affinity rules136View
dbsurveyor-core/src/adapters/mongodb/type_mapping.rsMongoDB BSON type mapping to UnifiedDataType130View
dbsurveyor-core/src/security/credentials.rsSecure credential storage with automatic memory zeroing102View
dbsurveyor-core/src/security/connection.rsConnection string parsing with secure credential separation~150View
dbsurveyor-core/src/adapters/helpers.rsValidation patterns for sensitive fields and connection strings~106View
dbsurveyor-core/src/error.rsError types with connection string redaction254View
dbsurveyor-core/src/validation/tests.rsComprehensive validation test suite906+View
dbsurveyor-collect/src/main.rsCollection binary with validator initialization and output validation~415+View
dbsurveyor/src/main.rsPostprocessor binary with schema validation and parsing~311+View
JSON Schema V1.0 Design And Validation Architecture | Dosu