Zerobyte Startup Orchestration
Overview
Zerobyte is a TypeScript/Bun backup automation application built on Restic. At startup it runs a three-phase bootstrap sequence that applies the database schema (including the authentication tables), migrates application data, and then starts the runtime services. How an error is handled depends on how essential the failing component is. A database that cannot be migrated stops startup. An unreachable volume or a failed maintenance migration is only logged, so the rest of the service can still start, and volumes are left to a scheduled remount job for further retries.
This document provides a detailed overview of the startup orchestration process, covering automatic database schema initialization, authentication setup, eager database loading, enhanced error handling, and retry logic.
Startup Orchestration Flow
Entry Point and Bootstrap Trigger
The application starts from app/server.ts, which serves as the main entry point. Zerobyte uses a Nitro plugin system to trigger the bootstrap process automatically during server initialization. The bootstrap plugin is defined in app/server/plugins/bootstrap.ts and invokes bootstrapApplication(), which orchestrates the three-phase initialization sequence.
Three-Phase Bootstrap Sequence
The bootstrap process executes three critical phases in strict order to ensure proper initialization:
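The ordering guarantee can be sketched as a plain loop. This is a minimal illustration, not bootstrapApplication() itself. The Phase type and the runBootstrap helper are hypothetical, and the phases are modeled as synchronous functions even though the real ones may be async. The sketch shows the property the sequence relies on: a phase starts only after the previous one has returned, and an exception stops every later phase.

```typescript
// Hypothetical phase descriptor; the real phase functions may be async.
type Phase = { name: string; run: () => void };

// Runs phases strictly in order. There is deliberately no try/catch: an error
// thrown by one phase propagates to the caller and the later phases never run.
function runBootstrap(phases: Phase[], log: (msg: string) => void = () => {}): string[] {
  const completed: string[] = [];
  for (const phase of phases) {
    log(`starting ${phase.name}`);
    phase.run();
    completed.push(phase.name);
  }
  return completed;
}

// Usage with no-op stand-ins for the three phases described below.
const completedPhases = runBootstrap([
  { name: "runDbMigrations", run: () => {} },
  { name: "runMigrations", run: () => {} },
  { name: "startup", run: () => {} },
]);
console.log(completedPhases.join(" -> ")); // runDbMigrations -> runMigrations -> startup
```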
Phase 1: Database Schema Migrations
The first phase, executed by runDbMigrations(), initializes the SQLite database and applies schema migrations:
- Database Directory Creation: Automatically creates the database directory if it doesn't exist
- Legacy Database Migration: Automatically renames the legacy ironmount.db database to zerobyte.db
- Schema Migration Execution: Runs Drizzle ORM schema migrations from the app/drizzle directory
- Foreign Key Constraints: Enables SQLite foreign key constraints to enforce referential integrity
Database schema migrations execute synchronously and fail fast—any failure terminates the bootstrap process, as the application cannot function without a properly initialized database.
Phase 2: Application-Level Data Migrations
The second phase, managed by runMigrations(), handles application-level data transformations and schema updates:
- Fresh Install Detection: Checks whether this is a fresh installation (no users exist) and, if so, records every migration as completed without running it
- Sequential Execution: For existing installations, executes pending migrations sequentially
- Dependency Validation: Validates migration dependencies to ensure correct execution order
- Checkpoint Tracking: Records completion checkpoints in appMetadataTable to prevent re-execution on restart
- Error Classification: Distinguishes between critical migrations (which halt startup) and maintenance migrations (which continue despite errors)
The migration registry includes transformations such as snapshot retagging, Restic password isolation, organization assignment, path name concatenation, and backup include paths separation (splitting literal paths from glob patterns).
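The sketch below shows how checkpoints, fresh-install detection, and dependency validation interact. It is minimal and assumes an in-memory Set in place of appMetadataTable. The Migration interface, runAppMigrations, and the IDs m1 and m2 are hypothetical. The real runner's structure and error handling (covered under Enhanced Error Handling) may differ.

```typescript
// Hypothetical migration descriptor; the real registry format may differ.
interface Migration {
  id: string;
  dependsOn?: string[];
  run: () => void;
}

// `checkpoints` stands in for the completion records kept in appMetadataTable.
function runAppMigrations(migrations: Migration[], checkpoints: Set<string>, isFreshInstall: boolean): string[] {
  const executed: string[] = [];
  for (const migration of migrations) {
    if (checkpoints.has(migration.id)) continue; // completed on an earlier start
    if (isFreshInstall) {
      // Nothing to transform yet: record the checkpoint without running the migration.
      checkpoints.add(migration.id);
      continue;
    }
    for (const dep of migration.dependsOn ?? []) {
      if (!checkpoints.has(dep)) {
        throw new Error(`Migration ${migration.id} requires ${dep}, which has not completed`);
      }
    }
    migration.run();
    checkpoints.add(migration.id);
    executed.push(migration.id);
  }
  return executed;
}

// Usage: "m1" already ran on a previous start, so only "m2" executes.
const checkpointStore = new Set<string>(["m1"]);
const executedNow = runAppMigrations(
  [
    { id: "m1", run: () => {} },
    { id: "m2", dependsOn: ["m1"], run: () => {} },
  ],
  checkpointStore,
  false,
);
console.log(executedNow.join(",")); // m2
```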
Phase 3: Application Startup
The final phase, executed by startup(), performs runtime initialization:
- Configuration Schema Updates: Updates all volumes, repositories, and notification destinations to the latest configuration schema
- Scheduler Initialization: Starts the cron-based scheduler and clears any existing scheduled tasks
- Provisioning Sync: When the PROVISIONING_PATH environment variable is set, syncs repositories and volumes from the specified JSON configuration file. The provisioning service resolves secrets using the env:// and file:// protocols and marks resources as managed in the UI
- Volume Auto-Remounting: Re-mounts volumes that were previously mounted or have auto-remount enabled
- Backup State Recovery: Marks in-progress backups as "warning" status (indicating the application was restarted mid-backup)
- Job Scheduling: Schedules five periodic background jobs for system maintenance and backup execution
Restic Integration
Restic Instance and Dependency Injection
Zerobyte uses Restic for backup operations through a dependency injection pattern that improves testability and modularity. The restic instance is created in app/server/core/restic.ts using the createRestic() function from the @zerobyte/core/restic/server package.
Dependency Injection Pattern: Rather than importing restic utilities directly, the application defines a ResticDeps interface that provides:
- Secret Resolution: a resolveSecret() function for decrypting encrypted configuration values
- Organization Password Retrieval: a getOrganizationResticPassword() function for fetching organization-specific Restic encryption passwords from the database
- Path Configuration: the resticCacheDir, resticPassFile, and defaultExcludes constants
- Hostname Configuration: an optional hostname field for setting the Restic hostname
This dependency injection approach allows the restic module to operate without direct database or configuration dependencies, making it easier to test and reuse across different contexts.
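The pattern can be illustrated with a hypothetical factory. The ResticDeps field names below come from the list above, but their signatures are assumptions, and createResticSketch is not the real createRestic() from @zerobyte/core/restic/server. RESTIC_PASSWORD, RESTIC_CACHE_DIR, and the backup --exclude and --host flags are standard restic features. This document does not show how Zerobyte actually wires them.

```typescript
// Field names follow the ResticDeps description above; the signatures are
// assumptions (synchronous here for brevity, the real ones may be async).
interface ResticDeps {
  resolveSecret: (encrypted: string) => string;
  getOrganizationResticPassword: (organizationId: string) => string;
  resticCacheDir: string;
  resticPassFile: string; // not used in this sketch
  defaultExcludes: string[];
  hostname?: string;
}

// Hypothetical factory (not the real createRestic()): everything environment-specific
// arrives through `deps`, so this code never touches the database or global config.
function createResticSketch(deps: ResticDeps) {
  return {
    // RESTIC_PASSWORD and RESTIC_CACHE_DIR are standard restic environment variables.
    env(organizationId: string): Record<string, string> {
      return {
        RESTIC_PASSWORD: deps.getOrganizationResticPassword(organizationId),
        RESTIC_CACHE_DIR: deps.resticCacheDir,
      };
    },
    // Decrypts a stored configuration value (e.g. backend credentials).
    secret(encrypted: string): string {
      return deps.resolveSecret(encrypted);
    },
    // `restic backup` arguments with the default excludes and an optional --host.
    backupArgs(path: string): string[] {
      const args = ["backup", path];
      for (const pattern of deps.defaultExcludes) args.push("--exclude", pattern);
      if (deps.hostname) args.push("--host", deps.hostname);
      return args;
    },
  };
}

// In tests, plain fakes replace the database and the secret store.
const restic = createResticSketch({
  resolveSecret: (value) => value.replace(/^enc:/, ""),
  getOrganizationResticPassword: (org) => `password-for-${org}`,
  resticCacheDir: "/tmp/restic-cache",
  resticPassFile: "/tmp/restic-pass",
  defaultExcludes: [".cache"],
  hostname: "zerobyte",
});
console.log(restic.backupArgs("/data").join(" ")); // backup /data --exclude .cache --host zerobyte
```

Because the factory only sees deps, a test can swap in fakes without a database. That is the testability benefit described above.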
Shared Package Architecture: Restic functionality is implemented in the @zerobyte/core package (located in packages/core/), which is a separate workspace package shared across the application. This package includes:
- Restic Commands: Backup, restore, snapshots, forget, check, and other Restic operations
- Helper Functions: buildEnv(), addCommonArgs(), buildRepoUrl(), cleanupTemporaryKeys(), and validateCustomResticParams()
- DTOs and Schemas: Type definitions and validation schemas for Restic operations
- Utilities: Common utilities like spawn, logger, sanitize, JSON parsing, and path manipulation
The core package provides both server-side (@zerobyte/core/restic/server) and shared (@zerobyte/core/restic) exports, enabling different parts of the application to import only what they need.
Automatic Database Schema Initialization
Schema Definition
The database schema is defined in app/server/db/schema.ts using Drizzle ORM with SQLite as the backend. The schema includes tables for:
- Authentication: usersTable, sessionsTable, account, verification
- Multi-tenancy: organization, member, invitation
- SSO: ssoProvider for OIDC/SAML provider configuration
- Backup Resources: volumesTable, repositoriesTable, backupSchedulesTable (stores backup configurations, including includePaths for explicit directory and file paths and includePatterns for glob-based patterns)
- Notifications: notificationDestinationsTable
- Metadata: appMetadataTable for migration checkpoints and system state
- Provisioning: provisioningId columns in repositoriesTable and volumesTable track operator-managed resources
SSO Tables
The ssoProvider table stores organization-scoped SSO provider configurations:
- Provider Identity: the providerId, issuer, and domain fields identify the SSO provider
- Organization Scoping: Each provider is linked to an organizationId with cascade deletion
- Configuration Storage: the oidcConfig and samlConfig JSON fields store protocol-specific settings
- Auto-Linking: the autoLinkMatchingEmails boolean flag enables automatic account linking for matching email addresses
- User Association: an optional userId reference tracks which user created the provider
The ssoProvider table is created during the Phase 1 schema migrations. Provider configurations stored in it are therefore available as soon as Phase 1 completes.
Backup Schedules Table
The backupSchedulesTable stores backup configurations with distinct fields for path inclusion:
- Include Paths: The includePaths field (added in migration 20260320172926_hesitant_naoko) stores a JSON array of explicit directory and file paths to include in backups (e.g., "/home/user/documents", "/var/log/app.log")
- Include Patterns: The includePatterns field stores glob-based patterns for flexible matching (e.g., "*.txt", "/var/log/*", "!/tmp/**")
- Separation Purpose: Keeping the two apart prevents special characters in literal paths from being expanded as glob patterns during backup execution
- Automatic Migration: Existing backup schedules are migrated automatically by the 00005-split-backup-include-paths data migration. It analyzes existing includePatterns entries, moves literal paths to the new includePaths field, and keeps pattern-based entries in includePatterns
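One plausible way to separate literal paths from patterns is a metacharacter check. This heuristic is an assumption for illustration and is not taken from 00005-split-backup-include-paths. The function name splitIncludeEntries is hypothetical.

```typescript
// Hypothetical classification rule: an entry is a glob pattern if it is negated
// with "!" or contains *, ? or [. The real migration may use a different rule.
const GLOB_METACHARACTERS = /[*?\[]/;

function splitIncludeEntries(entries: string[]): { includePaths: string[]; includePatterns: string[] } {
  const includePaths: string[] = [];
  const includePatterns: string[] = [];
  for (const entry of entries) {
    if (entry.startsWith("!") || GLOB_METACHARACTERS.test(entry)) {
      includePatterns.push(entry);
    } else {
      includePaths.push(entry);
    }
  }
  return { includePaths, includePatterns };
}

const split = splitIncludeEntries(["/home/user/documents", "/var/log/app.log", "*.txt", "/var/log/*", "!/tmp/**"]);
console.log(split.includePaths.join(" | ")); // /home/user/documents | /var/log/app.log
console.log(split.includePatterns.join(" | ")); // *.txt | /var/log/* | !/tmp/**
```

The heuristic also shows why storing the two kinds separately helps. A real directory named "/photos [2020]" would be misclassified as a pattern here. A value stored in includePaths never has to be guessed about.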
Resource Identifier Type Safety
Volumes, repositories, and backup schedules use a ShortId branded type for their shortId fields to improve type safety:
- Database Layer: The schema stores shortId fields as text columns in SQLite
- TypeScript Layer: At compile time, these fields are typed as ShortId (a branded type) rather than plain strings
- Runtime Validation: The isShortId() function validates that strings match the expected format (/^[A-Za-z0-9_-]+$/)
- Type Conversion: The asShortId() function converts validated strings (such as route parameters) to the ShortId branded type
- Type Safety Benefits: This branded type system prevents accidental mixing of different identifier types and provides compile-time validation
This is a compile-time type safety improvement that doesn't change runtime behavior or database storage—shortId values remain text fields in the database.
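Here is a minimal branded-type sketch that uses the format regex given above. Three details are assumptions made for illustration: the brand property name, the fact that asShortId throws on invalid input, and the mountVolumeStub function.

```typescript
// A branded string: an ordinary string at runtime, a distinct type at compile time.
// The `__brand` property name is an assumption; any unique marker works.
type ShortId = string & { readonly __brand: "ShortId" };

// Format taken from the documentation above.
const SHORT_ID_PATTERN = /^[A-Za-z0-9_-]+$/;

function isShortId(value: string): value is ShortId {
  return SHORT_ID_PATTERN.test(value);
}

// Assumption: invalid input throws; the real asShortId() may handle it differently.
function asShortId(value: string): ShortId {
  if (!isShortId(value)) {
    throw new Error(`Invalid shortId: ${JSON.stringify(value)}`);
  }
  return value;
}

// Hypothetical service method that only accepts validated identifiers.
function mountVolumeStub(id: ShortId): string {
  return `mounting ${id}`;
}

const volumeId = asShortId("vol_01-abc"); // e.g. from a route parameter
console.log(mountVolumeStub(volumeId)); // mounting vol_01-abc
// mountVolumeStub("vol_01-abc"); // compile error: string is not assignable to ShortId
```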
Automatic Initialization Features
The database initialization process is fully automatic and requires no manual intervention:
- Directory Creation: The database directory is created automatically if it doesn't exist
- Legacy Migration: Existing ironmount.db databases are automatically renamed to zerobyte.db
- Schema Application: All schema migrations from the app/drizzle directory are applied automatically
- Constraint Enforcement: Foreign key constraints are enabled automatically to maintain referential integrity
This automatic initialization ensures that the application can start cleanly on first run or after updates without requiring manual database setup or migration execution.
Authentication Initialization
Authentication System Setup
Zerobyte uses better-auth for authentication, supporting both traditional email/password authentication and Single Sign-On (SSO) via OIDC and SAML protocols. The authentication system is configured with several security features:
- Derived Secret: The authentication secret is derived from the APP_SECRET environment variable via HKDF (HMAC-based Key Derivation Function)
- Secure Cookies: Cookie security follows the BASE_URL protocol; HTTPS URLs enable secure-only cookies
- Database Persistence: Uses the Drizzle adapter to store authentication data in SQLite
- Multi-Factor Authentication: Supports two-factor authentication with backup codes
- SSO Support: Integrates OIDC and SAML providers with organization-scoped configuration and automatic account linking
Authentication Configuration
The authentication system is configured with the following features during startup:
Base URL Configuration: The baseURL is configured as an object with allowedHosts (including the base URL host and all trusted origins) and protocol set to "auto" to support both HTTP and HTTPS deployments.
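The sketch below shows how these values could be derived from BASE_URL. buildAuthUrlSettings and its return shape are hypothetical, and trusted origins are assumed to arrive as a list of full origin URLs. The sketch computes only three things described in this section: the allowedHosts list, the "auto" protocol, and the secure-cookie decision. It does not call better-auth.

```typescript
// Hypothetical helper that derives the values described above from BASE_URL and
// a list of trusted origins. It only builds plain data; how the result is passed
// to better-auth is not shown, and the result shape is an assumption.
interface BaseUrlSettings {
  allowedHosts: string[];
  protocol: "auto";
}

function buildAuthUrlSettings(
  baseUrl: string,
  trustedOrigins: string[],
): { baseURL: BaseUrlSettings; secureCookies: boolean } {
  const base = new URL(baseUrl);
  // URL.host includes the port when one is present, e.g. "10.0.0.5:8080".
  const hosts = new Set<string>([base.host]);
  for (const origin of trustedOrigins) hosts.add(new URL(origin).host);
  return {
    baseURL: { allowedHosts: Array.from(hosts), protocol: "auto" },
    // Secure-only cookies are tied to the BASE_URL protocol.
    secureCookies: base.protocol === "https:",
  };
}

const authUrls = buildAuthUrlSettings("https://backup.example.com", ["http://10.0.0.5:8080"]);
console.log(authUrls.baseURL.allowedHosts.join(", "), authUrls.secureCookies); // backup.example.com, 10.0.0.5:8080 true
```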
SSO Error Handling: SSO authentication errors are routed through the /api/v1/auth/login-error endpoint, which maps Better Auth error messages to standardized error codes using mapAuthErrorToCode() before redirecting to the client-side login page with the appropriate error code parameter. Account linking errors (including "account not linked", "unable to link account", and "SSO account linking is not permitted for users outside this organization") are mapped to the ACCOUNT_LINK_REQUIRED error code, which displays the message: "SSO sign-in was blocked because this email already belongs to another user in this instance. Contact your administrator to resolve the account conflict."
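The mapping can be sketched as a message-to-code lookup. classifyAuthError is a hypothetical stand-in for mapAuthErrorToCode(), not its source. The three messages and ACCOUNT_LINK_REQUIRED come from the paragraph above. Case-insensitive substring matching, the UNKNOWN fallback, and the /login?error= redirect format are assumptions.

```typescript
// Hypothetical stand-in for mapAuthErrorToCode(), not its actual source. The
// ACCOUNT_LINK_REQUIRED code and the three messages come from the text above;
// case-insensitive substring matching and the "UNKNOWN" fallback are assumptions.
type LoginErrorCode = "ACCOUNT_LINK_REQUIRED" | "UNKNOWN";

const ACCOUNT_LINK_MESSAGES = [
  "account not linked",
  "unable to link account",
  "sso account linking is not permitted for users outside this organization",
];

function classifyAuthError(message: string | undefined): LoginErrorCode {
  const normalized = (message ?? "").toLowerCase();
  return ACCOUNT_LINK_MESSAGES.some((m) => normalized.includes(m)) ? "ACCOUNT_LINK_REQUIRED" : "UNKNOWN";
}

// The /login path and "error" query parameter are illustrative only.
function loginErrorRedirect(message: string | undefined): string {
  return `/login?error=${encodeURIComponent(classifyAuthError(message))}`;
}

console.log(loginErrorRedirect("Account not linked")); // /login?error=ACCOUNT_LINK_REQUIRED
```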
Trusted Provider Linking: The authentication system uses Better Auth's native account.accountLinking.trustedProviders option with the ssoIntegration.resolveTrustedProviders callback (from app/server/modules/sso/sso.integration.ts) to enable automatic account linking. This callback queries the database to identify SSO providers with autoLinkMatchingEmails enabled for the organization, returning their provider IDs as trusted providers for automatic linking. Account linking is strictly organization-scoped—users cannot auto-link from one organization to another via SSO, even with matching email addresses. Existing users with personal organizations cannot auto-link to SSO organizations without explicit invitations, enforcing strict organization boundaries during authentication.
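The core of the trusted-provider lookup is a filter over provider rows. This sketch replaces the database query with an in-memory array. SsoProviderRow and trustedProvidersFor are hypothetical names. Only the autoLinkMatchingEmails and organizationId columns come from the schema described earlier.

```typescript
// Hypothetical row shape with only the ssoProvider columns this check needs.
interface SsoProviderRow {
  providerId: string;
  organizationId: string;
  autoLinkMatchingEmails: boolean;
}

// In-memory version of the lookup: the real callback queries the database.
// Filtering on organizationId is what keeps auto-linking organization-scoped.
function trustedProvidersFor(rows: SsoProviderRow[], organizationId: string): string[] {
  return rows
    .filter((row) => row.organizationId === organizationId && row.autoLinkMatchingEmails)
    .map((row) => row.providerId);
}

const providerRows: SsoProviderRow[] = [
  { providerId: "acme-oidc", organizationId: "org-acme", autoLinkMatchingEmails: true },
  { providerId: "acme-saml", organizationId: "org-acme", autoLinkMatchingEmails: false },
  { providerId: "other-oidc", organizationId: "org-other", autoLinkMatchingEmails: true },
];
console.log(trustedProvidersFor(providerRows, "org-acme").join(",")); // acme-oidc
```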
SSO Integration Architecture: The authentication system integrates with SSO through a dedicated module architecture. SSO functionality is implemented in app/server/modules/sso and integrated into the authentication layer through the ssoIntegration interface in app/server/lib/auth.ts. This interface provides the Better Auth SSO plugin, validation middlewares, callback detection, user creation handlers, organization membership resolution, and trusted provider resolution.
Testing Infrastructure: In test environments (NODE_ENV === "test"), the Better Auth testUtils() plugin is conditionally loaded to provide testing utilities. This plugin is not included in production builds.
Authentication Middleware Hooks
Multiple middleware hooks customize the authentication flow:
- validateSsoProviderId: Validates SSO provider identifiers to prevent the use of reserved provider IDs
- validateSsoCallbackUrls: Validates SSO callback URLs during authentication to prevent redirect attacks
- ensureOnlyOneUser: Blocks new user registration unless explicitly enabled, supporting single-user deployments
- convertLegacyUserOnFirstLogin: Automatically migrates users from the legacy authentication system on first login
- requireSsoInvitation: Enforces invite-only access for SSO authentication by checking for valid invitations before user creation
SSO Account Linking Validation Hook
A dedicated database hook (account.create.before) validates SSO account linking before account creation:
- Cross-Organization Link Protection: During SSO authentication, the hook calls canLinkSsoAccount() to validate that the user is permitted to link an SSO account
- Existing Credential Account Detection: If the user has an existing credential (email/password) account, SSO account linking is blocked to prevent cross-organization account conflicts
- Invitation-Based Exception: Users without existing credential accounts can link SSO accounts when they have valid pending invitations to the organization
- Security-First Enforcement: This validation prevents users with existing accounts from linking to SSO organizations via email matching, even when they hold valid invitations
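The linking rules above reduce to a small decision function. decideSsoLink and its two boolean inputs are hypothetical simplifications of canLinkSsoAccount(). The real function derives these facts from the database and may check more conditions.

```typescript
// Hypothetical inputs; the real canLinkSsoAccount() derives them from the database
// and may consider more than these two facts.
interface LinkCheckInput {
  hasCredentialAccount: boolean; // the user already has an email/password account
  hasValidInvitation: boolean; // pending, unexpired invitation to the provider's organization
}

type LinkDecision = { allowed: true } | { allowed: false; reason: string };

function decideSsoLink(input: LinkCheckInput): LinkDecision {
  // A credential account blocks linking even when an invitation exists.
  if (input.hasCredentialAccount) {
    return { allowed: false, reason: "user has an existing credential account" };
  }
  if (input.hasValidInvitation) {
    return { allowed: true };
  }
  return { allowed: false, reason: "no valid invitation to this organization" };
}

console.log(decideSsoLink({ hasCredentialAccount: true, hasValidInvitation: true }).allowed); // false
```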
SSO-specific middlewares (such as validateSsoProviderId and validateSsoCallbackUrls) are integrated through the ssoIntegration.beforeMiddlewares interface, which registers them as authentication hooks. The authentication system integrates with the SSO module through the ssoIntegration interface, which provides beforeMiddlewares (for SSO-specific validation middleware), isSsoCallback (to detect SSO authentication flows), onUserCreate (to handle SSO-specific user creation logic), resolveOrgMembership (to assign users to SSO organizations via invitations), canLinkSsoAccount (to validate cross-organization account linking during SSO authentication), and resolveTrustedProviders (to enable automatic account linking for trusted SSO providers).
User Creation and Organization Provisioning
The user creation flow includes automatic organization provisioning with support for both traditional and SSO authentication paths:
Traditional Authentication Path
For email/password authentication, the user creation flow:
- Admin Assignment: The first user automatically receives admin privileges
- Username Generation: If no username is provided, generates a unique username using UUID
SSO Authentication Path
For SSO authentication, the flow includes additional steps:
- Invitation Validation: Before user creation, requireSsoInvitation (invoked through ssoIntegration.onUserCreate) checks for a valid, non-expired invitation matching the user's email and the SSO provider's organization
- Account Linking Validation: Before creating an SSO account, the system calls canLinkSsoAccount() to check whether the user may link the account. This check blocks SSO account linking for users with existing credential accounts, even if they have valid pending invitations, to prevent cross-organization account conflicts
- Automatic Flags: SSO users have hasDownloadedResticPassword automatically set to true because they don't need password setup
- Account Linking Restrictions: When an SSO provider has autoLinkMatchingEmails enabled, Better Auth can automatically link accounts with matching email addresses within the same organization. Cross-organization auto-linking is prevented: users cannot link from their existing personal organization to an SSO organization without an explicit invitation, maintaining strict organization boundaries
Organization Assignment
Organization provisioning occurs during session creation and enforces strict organization isolation during SSO authentication:
SSO Authentication Path (when an SSO provider matches the user's email domain):
- SSO Provider Detection: The ssoIntegration.resolveOrgMembership method determines whether the authentication came through an SSO provider by examining the provider ID in the authentication context
- Existing SSO Membership Check: If the user already has membership in the SSO provider's organization (from accepting a past invitation), proceeds with that organization
- Invitation Requirement: If the user does NOT have existing membership in the SSO organization, requires a valid pending invitation to join
- Invitation Rejection: Users without valid invitations are blocked from accessing the SSO organization, even if they have existing accounts or memberships in other organizations
Non-SSO Authentication Path (traditional email/password or no SSO provider match):
- Existing Membership Check: If the user already has an organization membership, ensureDefaultOrg uses that organization
- Default Organization Creation: For users without existing memberships, creates a new personal organization with:
  - A generated organization slug based on the user's email prefix
  - A unique Restic encryption password per organization, encrypted with APP_SECRET for secure storage
  - Owner role assignment for the user
This authentication flow enforces strict organization boundaries by preventing users with existing accounts or personal organizations from auto-linking to SSO organizations without explicit invitations. Additionally, users with existing credential accounts are blocked from linking SSO accounts even when they hold valid invitations, preventing cross-organization account conflicts. This approach provides both cryptographic isolation between tenant backup repositories and security isolation preventing unauthorized cross-organization access through email matching.
Session Management
Session management integrates with the organization context system:
- Sessions are created with an activeOrganizationId that links users to their organization workspace
- Organization provisioning happens automatically during session creation through the ssoIntegration.resolveOrgMembership method (for SSO authentication) or ensureDefaultOrg (for traditional authentication)
- AsyncLocalStorage propagates the organization ID throughout the request lifecycle
- Request context enables multi-tenant isolation without explicit parameter passing
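Here is a minimal AsyncLocalStorage sketch showing how an organization ID can reach service code without being passed as a parameter. AsyncLocalStorage comes from Node.js node:async_hooks, which Bun also provides. withOrganization, currentOrganizationId, and volumeFilter are hypothetical. The document's own helper for this is withContext, whose exact signature is not shown here.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical request context; only the organization ID is tracked here.
interface RequestContext {
  organizationId: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Runs fn (and any async work it starts) with organizationId in scope.
function withOrganization<T>(organizationId: string, fn: () => T): T {
  return requestContext.run({ organizationId }, fn);
}

// Called deep inside services instead of threading an organizationId parameter.
function currentOrganizationId(): string {
  const ctx = requestContext.getStore();
  if (!ctx) throw new Error("No organization context for this call");
  return ctx.organizationId;
}

// Stand-in for a data-access helper that scopes every query to the caller's organization.
function volumeFilter(): { organizationId: string } {
  return { organizationId: currentOrganizationId() };
}

console.log(withOrganization("org-a", () => volumeFilter().organizationId)); // org-a
```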
Organization Deletion and Session Cleanup: When an organization is deleted, the system automatically maintains session integrity by reassigning affected users' activeOrganizationId references. The cleanup process:
- Identifies Affected Users: Finds all users who are members of organizations being deleted (excluding the user being deleted)
- Gathers Alternative Memberships: Queries all organization memberships for affected users to identify potential fallback organizations
- Selects Fallback Organizations: For each affected user, selects the first available organization membership that is not being deleted as their new active organization
- Updates Sessions: Updates all sessions for affected users whose activeOrganizationId points to a deleted organization, reassigning them to their fallback organization (or null if no alternatives exist)
- Transactional Consistency: Wraps the entire operation in a Drizzle ORM transaction to ensure atomicity: either all updates succeed together or none are applied. The transaction uses a synchronous callback (rather than an async one), executing queries with .sync() (e.g., tx.query.member.findMany({...}).sync()) and mutations with .run() (e.g., tx.delete(organization).where(...).run())
This process ensures sessions never point to deleted organizations, preventing application errors from orphaned references and maintaining consistent session state.
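The fallback selection can be expressed as a pure function that plans the session updates. This sketch rests on several assumptions. The row shapes and planSessionReassignments are hypothetical. The step that excludes the user being deleted is omitted. The real code performs its reads and writes inside one synchronous Drizzle transaction rather than on arrays.

```typescript
// Hypothetical in-memory shapes; the real code reads and writes these rows inside
// one synchronous Drizzle transaction.
interface Membership { userId: string; organizationId: string }
interface Session { id: string; userId: string; activeOrganizationId: string | null }
interface SessionUpdate { sessionId: string; activeOrganizationId: string | null }

function planSessionReassignments(
  sessions: Session[],
  memberships: Membership[],
  deletedOrgIds: Set<string>,
): SessionUpdate[] {
  const updates: SessionUpdate[] = [];
  for (const session of sessions) {
    const active = session.activeOrganizationId;
    if (active === null || !deletedOrgIds.has(active)) continue; // unaffected session
    // First membership of this user in an organization that survives the deletion.
    const fallback = memberships.find(
      (m) => m.userId === session.userId && !deletedOrgIds.has(m.organizationId),
    );
    updates.push({ sessionId: session.id, activeOrganizationId: fallback ? fallback.organizationId : null });
  }
  return updates;
}

const sessionUpdates = planSessionReassignments(
  [
    { id: "s1", userId: "u1", activeOrganizationId: "org-x" },
    { id: "s2", userId: "u2", activeOrganizationId: "org-x" },
    { id: "s3", userId: "u1", activeOrganizationId: "org-y" },
  ],
  [
    { userId: "u1", organizationId: "org-x" },
    { userId: "u1", organizationId: "org-y" },
    { userId: "u2", organizationId: "org-x" },
  ],
  new Set(["org-x"]),
);
console.log(JSON.stringify(sessionUpdates));
// [{"sessionId":"s1","activeOrganizationId":"org-y"},{"sessionId":"s2","activeOrganizationId":null}]
```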
SSO Configuration and Startup
SSO provider configuration is managed through a dedicated SSO module:
- Module Architecture: SSO functionality is implemented as a fully separate module in app/server/modules/sso with its own controller, service, and DTO files. The module integrates with the authentication system through the ssoIntegration interface in app/server/lib/auth.ts rather than being embedded within the auth module
- Route Registration: SSO routes are mounted under /api/v1/auth and handled by a dedicated ssoController, registered separately in app/server/app.ts via .route("/api/v1/auth", ssoController)
- Public Provider Endpoints: The /auth/sso-providers endpoint returns all configured SSO providers with their organization slugs, enabling the login page to display available SSO authentication options
- Provider Settings: Admin users can manage SSO providers through the /auth/sso-settings endpoint, which returns provider configurations, pending invitations, and auto-linking settings
- Service Layer: The ssoService handles all SSO-related database operations and business logic
- Organization Isolation: Each SSO provider is scoped to a specific organization through the organizationId foreign key, ensuring proper multi-tenant isolation
- Database Initialization: SSO provider tables are created during Phase 1 database schema migrations and are immediately available for authentication after bootstrap completes
SSO Invitation System
The SSO invitation system provides access control for SSO-based authentication:
- Invitation Requirement: The requireSsoInvitation middleware enforces that users must have a valid, non-expired invitation before they can complete SSO sign-in for an organization
- Invitation Validation: Validates that the invitation matches the user's email, is in "pending" status, belongs to the SSO provider's organization, and has not expired
- Credential Account Restriction: Valid invitations allow new SSO users to access the organization, but do NOT override the restriction for users with existing credential accounts. The system blocks SSO account linking for existing credential account holders even when they have valid invitations, enforcing a security-first approach to cross-organization access control
- Automatic Acceptance: When a user successfully signs in with SSO (and passes all validation checks), their invitation status is automatically updated to "accepted" and an organization membership is created
- Invitation Management: Admin users can delete pending invitations through the /auth/sso-invitations/:invitationId endpoint
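The four invitation checks can be sketched as one predicate. The Invitation shape and isValidSsoInvitation are hypothetical, and comparing emails case-insensitively is an assumption. The current time is passed as a parameter so the expiry check can be tested.

```typescript
// Hypothetical invitation shape with the fields the checks above need.
interface Invitation {
  email: string;
  organizationId: string;
  status: string; // e.g. "pending" or "accepted"
  expiresAt: Date;
}

// The four documented checks: email, pending status, provider organization, expiry.
// Case-insensitive email comparison is an assumption.
function isValidSsoInvitation(
  inv: Invitation,
  userEmail: string,
  providerOrganizationId: string,
  now: Date = new Date(),
): boolean {
  return (
    inv.email.toLowerCase() === userEmail.toLowerCase() &&
    inv.status === "pending" &&
    inv.organizationId === providerOrganizationId &&
    inv.expiresAt.getTime() > now.getTime()
  );
}

const invite: Invitation = {
  email: "Alice@example.com",
  organizationId: "org-acme",
  status: "pending",
  expiresAt: new Date("2030-01-01T00:00:00Z"),
};
console.log(isValidSsoInvitation(invite, "alice@example.com", "org-acme", new Date("2029-06-01T00:00:00Z"))); // true
```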
SSO Provider Management
Admin users can manage SSO providers through the following endpoints and rules:
- Provider Deletion: The /auth/sso-providers/:providerId endpoint allows admin users to delete SSO providers, which also removes all associated accounts
- Auto-Linking Configuration: The /auth/sso-providers/:providerId/auto-linking endpoint enables or disables automatic account linking by email for trusted SSO providers
- Organization Scoping: All SSO provider management operations are scoped to the admin user's active organization to ensure proper multi-tenant isolation
- Provider Limits: The system enforces per-user provider limits—admin users can register up to 10 SSO providers, while non-admin users cannot register providers
Admin User Management
The admin user management interface provides visibility and control over user accounts:
- User Listing: The /auth/admin-users endpoint returns a list of all users with their roles, ban status, and linked authentication accounts
- Account Linking Information: For each user, the endpoint includes details about linked accounts (both traditional email/password and SSO accounts), identified by provider ID
- Account Unlinking: Admin users can unlink specific authentication accounts through the /auth/admin-users/:userId/accounts/:accountId endpoint
- Last Account Protection: The system prevents deletion of a user's last authentication account to ensure users retain access to their accounts
Organization Switching
The application sidebar includes an organization switcher that enables users to navigate between organizations:
- Multi-Organization Access: Users can be members of multiple organizations with different roles (owner, admin, member)
- Visual Organization List: The OrganizationSwitcher component displays all organizations the user has access to with organization logos or initials
- Active Organization Context: The switcher indicates the currently active organization and allows switching between organizations
- Session Persistence: Organization selection is persisted in the user's session through the activeOrganizationId field
Eager Database Loading
Provisioning System
The provisioning system allows operators to manage repositories and volumes through a JSON configuration file instead of the UI. Resources are synced at startup and appear as "managed" entries in the normal UI.
Provisioning File Configuration: When the PROVISIONING_PATH environment variable is set, the application reads the JSON configuration file during Phase 3 startup and synchronizes the defined resources with the database.
Secret Resolution: The provisioning service resolves secret references using two protocols:
- env://VAR_NAME: Resolves the environment variable at sync time
- file://SECRET_NAME: Reads the secret from /run/secrets/SECRET_NAME (Docker secrets pattern)
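The sketch below resolves these references with the environment and a file reader injected. In a real process they would be process.env and a wrapper around readFileSync(path, "utf8") from node:fs. resolveProvisioningSecret is hypothetical. Three behaviors are assumptions the text does not specify: rejecting names that could escape /run/secrets, stripping one trailing newline, and passing through values that have no scheme.

```typescript
// Hypothetical resolver for the two reference schemes above. The environment and
// file reader are injected so it can be exercised without a real /run/secrets.
type ReadTextFile = (path: string) => string;

function resolveProvisioningSecret(
  value: string,
  env: Record<string, string | undefined>,
  readTextFile: ReadTextFile,
): string {
  if (value.startsWith("env://")) {
    const name = value.slice("env://".length);
    const resolved = env[name];
    if (resolved === undefined) throw new Error(`Environment variable ${name} is not set`);
    return resolved;
  }
  if (value.startsWith("file://")) {
    const name = value.slice("file://".length);
    // Assumption: reject names that could escape /run/secrets.
    if (name === "" || name === "." || name === ".." || name.includes("/")) {
      throw new Error(`Invalid secret name: ${name}`);
    }
    // Assumption: strip one trailing newline, which secret files often end with.
    return readTextFile(`/run/secrets/${name}`).replace(/\r?\n$/, "");
  }
  return value; // Assumption: values without a scheme are used verbatim.
}

const fakeSecrets: Record<string, string> = { "/run/secrets/db_password": "s3cret\n" };
const readFake: ReadTextFile = (path) => {
  const content = fakeSecrets[path];
  if (content === undefined) throw new Error(`No such file: ${path}`);
  return content;
};
console.log(resolveProvisioningSecret("file://db_password", {}, readFake)); // s3cret
console.log(resolveProvisioningSecret("env://S3_KEY", { S3_KEY: "abc" }, readFake)); // abc
```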
Database Integration: Provisioned resources are stored with a provisioningId field that identifies them as operator-managed. The schema includes unique constraints on (organization_id, provisioning_id) pairs to prevent duplicates.
Resource Management: Provisioned resources appear alongside manually created resources in the UI with "Managed" badges. They are marked read-only to prevent accidental modifications through the UI. Each provisioned resource must specify an existing organizationId for proper multi-tenant isolation.
Sync Behavior: The provisioning sync is atomic—partial failures don't leave inconsistent state. Resources can be marked for deletion by setting "delete": true in the configuration file.
Configuration Schema Updates
During startup, Zerobyte ensures all entities have the latest configuration schema. This process:
- Updates volumes, repositories, and notification destinations to match current schema definitions
- Identifies entities by their shortId field (typed as the ShortId branded type) when calling service methods
- Uses withContext to set the organization context for multi-tenancy support
- Processes each entity independently: failures for individual entities are logged but don't halt startup
- Ensures backward compatibility as configuration schemas evolve
The system calls volumeService.updateVolume(volume.shortId, volume) and repositoriesService.updateRepository(repo.shortId, {}), where volume.shortId and repo.shortId are ShortId branded types that provide compile-time type safety when passing identifiers to service methods.
Volume Auto-Remounting
To restore the pre-shutdown state, Zerobyte automatically remounts volumes during startup:
- Target Selection: Remounts volumes with status "mounted" OR volumes with autoRemount: true and status "error"
- Backend Support: Handles NFS, SMB, WebDAV, SFTP, and local directory backends
- ShortId Identification: Volumes are identified and remounted by their shortId field (typed as the ShortId branded type) via volumeService.mountVolume(volume.shortId)
- Non-Blocking: Each volume mount is wrapped in a .catch() handler, so a mount failure neither prevents application startup nor affects other volumes
- One-Time Attempt: Each volume receives exactly one mount attempt during startup; continuous retry is handled by the scheduled VolumeAutoRemountJob
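The selection rule and the per-volume error isolation can be sketched as follows. VolumeRow, shouldRemountOnStartup, and remountOnStartup are hypothetical. The mount function is injected in place of volumeService.mountVolume(), and running the attempts concurrently is an assumption.

```typescript
// Hypothetical volume row with only the fields the startup rule needs.
interface VolumeRow {
  shortId: string;
  name: string;
  status: string; // e.g. "mounted", "unmounted", "error"
  autoRemount: boolean;
}

// The documented rule: restore mounted volumes, and retry failed ones only when
// auto-remount is enabled.
function shouldRemountOnStartup(v: VolumeRow): boolean {
  return v.status === "mounted" || (v.autoRemount && v.status === "error");
}

// One attempt per selected volume. Each attempt has its own .catch(), so a
// failure is logged without rejecting the whole batch. Attempts run concurrently
// here; the real startup code may mount volumes one at a time.
async function remountOnStartup(
  volumes: VolumeRow[],
  mountVolume: (shortId: string) => Promise<void>,
  log: (msg: string) => void,
): Promise<void> {
  await Promise.all(
    volumes.filter(shouldRemountOnStartup).map((v) =>
      mountVolume(v.shortId).catch((err: unknown) => {
        log(`Failed to remount volume ${v.name}: ${String(err)}`);
      }),
    ),
  );
}

const startupVolumes: VolumeRow[] = [
  { shortId: "a", name: "nas", status: "mounted", autoRemount: false },
  { shortId: "b", name: "usb", status: "error", autoRemount: false },
  { shortId: "c", name: "sftp-box", status: "error", autoRemount: true },
  { shortId: "d", name: "old", status: "unmounted", autoRemount: true },
];
console.log(startupVolumes.filter(shouldRemountOnStartup).map((v) => v.shortId).join(",")); // a,c
```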
Backup State Recovery
Zerobyte detects and handles interrupted backups during startup:
- Identifies backups that were in "running" state when the application shut down
- Marks these backups as "warning" status with explanatory messages
- Prevents stuck backup states from blocking new backup execution
- Provides visibility into interrupted operations through the UI
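As a pure function, the recovery step might look like the sketch below. The row shape, field names, and message text are assumptions. The real code more likely issues a single database update for all rows still marked running.

```typescript
// Hypothetical row shape; field names and the message text are assumptions.
interface BackupRunRow {
  id: string;
  status: string; // e.g. "running", "success", "error", "warning"
  message: string | null;
}

// A backup still marked "running" at startup was interrupted when the application
// stopped, so it is downgraded to "warning" with an explanation; other rows are untouched.
function recoverInterruptedBackups(rows: BackupRunRow[]): BackupRunRow[] {
  return rows.map((row) =>
    row.status === "running"
      ? { ...row, status: "warning", message: "Backup was interrupted by an application restart" }
      : row,
  );
}

const recovered = recoverInterruptedBackups([
  { id: "b1", status: "running", message: null },
  { id: "b2", status: "success", message: null },
]);
console.log(recovered.map((r) => `${r.id}:${r.status}`).join(" ")); // b1:warning b2:success
```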
Enhanced Error Handling
Zerobyte catches and logs errors throughout the startup process. Whether an error stops startup or is isolated depends on how essential the failing component is.
Database Migration Error Handling
Database schema migrations use a fail-fast approach:
- Migrations execute synchronously without catching errors
- Any failure bubbles up to the bootstrap error handler
- Bootstrap failures terminate the application, as it cannot function without a properly initialized database
This design ensures database consistency—the application never starts with a partially migrated schema.
Application Migration Error Classification
Application-level migrations use a checkpoint-based system with multiple error handling strategies:
Checkpoint System: Each migration is tracked in appMetadataTable to prevent re-execution on restart, ensuring idempotency.
Dependency Validation: Migrations can declare dependencies on other migrations. Missing dependencies cause immediate termination with a clear error message, preventing out-of-order execution.
Critical vs. Maintenance Migrations:
- Critical Migrations: Halt the application with process.exit(1) on failure. These migrations are essential for data integrity.
- Maintenance Migrations: Log errors but allow startup to continue. These migrations improve the system but aren't essential for operation.
Partial Success Tracking: Individual entity migration failures are collected and logged with detailed information, allowing administrators to identify which specific entities failed and why.
Fresh Install Detection: New installations skip migration execution and auto-checkpoint all migrations, as there's no existing data to transform.
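Partial success tracking and the critical/maintenance split can be sketched as two small functions. Everything here is hypothetical: the names, the types, and the retag-snapshots ID. The sketch also assumes that any entity failure counts as a failure of the migration. The caller would translate "halt" into process.exit(1).

```typescript
// Hypothetical types; the real migration registry may carry more metadata.
type MigrationKind = "critical" | "maintenance";

interface EntityFailure {
  entityType: string;
  entityId: string;
  error: string;
}

// Applies a migration to each entity independently and collects failures
// instead of stopping at the first one (partial success tracking).
function migrateEntities<T>(
  entities: T[],
  describe: (entity: T) => { entityType: string; entityId: string },
  apply: (entity: T) => void,
): EntityFailure[] {
  const failures: EntityFailure[] = [];
  for (const entity of entities) {
    try {
      apply(entity);
    } catch (err) {
      failures.push({ ...describe(entity), error: err instanceof Error ? err.message : String(err) });
    }
  }
  return failures;
}

// Assumption: any entity failure counts as a failure of the migration. The caller
// maps "halt" to process.exit(1), as described for critical migrations.
function decideAfterMigration(
  id: string,
  kind: MigrationKind,
  failures: EntityFailure[],
  log: (msg: string) => void,
): "continue" | "halt" {
  for (const f of failures) log(`[${id}] ${f.entityType} ${f.entityId}: ${f.error}`);
  if (failures.length === 0) return "continue";
  return kind === "critical" ? "halt" : "continue";
}

const failures = migrateEntities(
  [{ id: "r1" }, { id: "r2" }],
  (repo) => ({ entityType: "repository", entityId: repo.id }),
  (repo) => {
    if (repo.id === "r2") throw new Error("config missing password");
  },
);
// Prints the r2 failure line, then: continue
console.log(decideAfterMigration("retag-snapshots", "maintenance", failures, console.log));
```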
Volume Mount Error Handling
Volume mount operations implement layered error handling to maximize availability:
Error Isolation: Each volume mount is wrapped in a .catch() handler that logs the error but doesn't halt startup or affect other volumes.
Non-Blocking Failures: Volume mount failures don't prevent application startup or other volumes from mounting. The application remains available even when some storage backends are unreachable.
Status Persistence: Volume mount status is persisted to the database with error messages, providing visibility through the UI and enabling targeted retry attempts.
Timeout Protection
Operations are protected by timeouts to prevent indefinite hangs:
Volume Backend Operations:
- A timeout wrapper with a 5-second default timeout
- Mount, unmount, and health check operations are wrapped with withTimeout()
- Timeout errors are caught and converted to status "error" with appropriate error messages
- SFTP uses a 10-second timeout (2x standard) for higher latency connections
- Rclone mount operations use the centralized zbConfig.serverIdleTimeout configuration (via the SERVER_IDLE_TIMEOUT environment variable, default 300 seconds), making timeout behavior consistent with repository operations and dynamically configurable
Repository Operations:
- Repository initialization timeouts are derived from the `SERVER_IDLE_TIMEOUT` environment variable (in seconds, multiplied by 1000 to get milliseconds) instead of fixed values
- The internal spawn utility uses `safeExec` (replacing the earlier `exec`), which provides more robust timeout and error detection
- Timeout conditions are explicitly detected with a `timedOut` flag, enabling clearer error messages ("Command timed out before completing")
- Tying repository initialization timeouts to this server setting keeps timeout behavior predictable and consistent across the application
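The repository-side behavior can be sketched with Node's `child_process.spawn`. This is not the actual `safeExec`, and the helper names `idleTimeoutMs`, `safeExecSketch` and `describeResult` are hypothetical. The sketch illustrates only two points from the text. First, the timeout is `SERVER_IDLE_TIMEOUT` seconds times 1000. Second, a dedicated `timedOut` flag separates a timeout kill from an ordinary non-zero exit.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: SERVER_IDLE_TIMEOUT is in seconds (default 300); timers need milliseconds.
function idleTimeoutMs(env: Record<string, string | undefined> = process.env): number {
  const seconds = Number(env["SERVER_IDLE_TIMEOUT"] ?? 300);
  return Number.isFinite(seconds) && seconds > 0 ? seconds * 1000 : 300 * 1000;
}

interface ExecResult {
  exitCode: number | null; // null when the process was ended by a signal
  stdout: string;
  stderr: string;
  timedOut: boolean;
}

// Hypothetical stand-in for safeExec: resolves even on non-zero exit and
// records whether the process was killed because the timeout elapsed.
function safeExecSketch(cmd: string, args: string[], timeoutMs: number): Promise<ExecResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args);
    let stdout = "";
    let stderr = "";
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true; // remember why the process ended before killing it
      child.kill("SIGTERM");
    }, timeoutMs);

    child.stdout.on("data", (chunk: Buffer) => {
      stdout += chunk.toString();
    });
    child.stderr.on("data", (chunk: Buffer) => {
      stderr += chunk.toString();
    });
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err); // e.g. the binary could not be started
    });
    child.on("close", (exitCode) => {
      clearTimeout(timer);
      resolve({ exitCode, stdout, stderr, timedOut });
    });
  });
}

function describeResult(result: ExecResult): string {
  if (result.timedOut) return "Command timed out before completing";
  return result.exitCode === 0 ? "Command succeeded" : `Command failed with exit code ${result.exitCode}`;
}
```

The flag is set before the process is killed. The caller can therefore report the timeout explicitly instead of surfacing a bare signal or a `null` exit code.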
Backend Fallback Strategies#
Some volume backends implement intelligent fallback mechanisms:
- NFS: Falls back to mounting with the `-i` flag (interrupt flag) if the initial mount fails, allowing recovery from some mount failures
- WebDAV: Implements the same `-i` flag fallback
- WebDAV Error Classification: Provides detailed error detection for connection refused, authentication failures, and configuration errors
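Here is a sketch of the retry-with-fallback pattern and of WebDAV error classification. `runMount` is a hypothetical callback that executes a mount command. Several details are assumptions, not the real command lines or error texts: prepending `-i` to the argument list, rethrowing the first error when both attempts fail, and the message substrings used for classification.

```typescript
// Hypothetical: runMount executes a mount command with the given arguments.
type RunMount = (args: string[]) => Promise<void>;

async function mountWithFallback(runMount: RunMount, args: string[]): Promise<"primary" | "fallback"> {
  try {
    await runMount(args);
    return "primary";
  } catch (firstError) {
    try {
      // Retry once with the -i flag; putting it first is an assumption.
      await runMount(["-i", ...args]);
      return "fallback";
    } catch {
      // Surface the original failure, usually the more informative one.
      throw firstError;
    }
  }
}

type WebdavErrorKind = "connection_refused" | "auth_failed" | "config_error" | "unknown";

// Substring checks are illustrative; real mount error texts may differ.
function classifyWebdavError(message: string): WebdavErrorKind {
  const m = message.toLowerCase();
  if (m.includes("connection refused") || m.includes("econnrefused")) return "connection_refused";
  if (m.includes("401") || m.includes("unauthorized") || m.includes("authentication")) return "auth_failed";
  if (m.includes("invalid") || m.includes("bad option") || m.includes("no such file")) return "config_error";
  return "unknown";
}
```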
Pre-Mount Health Checks#
Volume backends perform health checks before mounting:
- If already mounted, returns immediately with success (avoiding duplicate operations)
- If in error state, attempts unmount before remounting (clearing stale state)
- Prevents duplicate mount attempts and resource conflicts
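The pre-mount check comes down to a small decision on the current state. The `VolumeBackend` interface and `ensureMounted` below are hypothetical. Logging an unmount failure and continuing, rather than aborting, is also an assumption: it keeps a stuck mount point from blocking the retry.

```typescript
// Hypothetical backend interface; real Zerobyte backends may expose different methods.
type MountState = "mounted" | "unmounted" | "error";

interface VolumeBackend {
  checkHealth(): Promise<MountState>;
  mount(): Promise<void>;
  unmount(): Promise<void>;
}

async function ensureMounted(backend: VolumeBackend): Promise<"already-mounted" | "mounted"> {
  const state = await backend.checkHealth();
  if (state === "mounted") {
    return "already-mounted"; // nothing to do; avoids a duplicate mount
  }
  if (state === "error") {
    // Clear stale state before trying again; a failed unmount is logged, not fatal.
    await backend.unmount().catch((err) => console.warn("Unmount before remount failed:", err));
  }
  await backend.mount();
  return "mounted";
}
```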
Structured Logging#
Error logging throughout the startup process provides detailed diagnostic information:
- Migration failures include detailed entity information (entity type, ID, error message)
- Volume mount errors include volume name and error message
- Critical failures receive formatted multi-line messages with visual separators, clear explanations, resolution guidance, and support channels
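One way to produce the formatted multi-line critical message described above is sketched here. The field names (`title`, `explanation`, `resolution`, `support`) and the `=` rule used as the visual separator are assumptions. The sketch does not reproduce the wording or layout Zerobyte actually prints.

```typescript
// Hypothetical fields mirroring the parts named in the text.
interface CriticalFailure {
  title: string;
  explanation: string;
  resolution: string[]; // ordered resolution steps
  support: string; // where to ask for help
}

function formatCriticalFailure(f: CriticalFailure, width = 60): string {
  const rule = "=".repeat(width); // visual separator
  return [
    rule,
    `CRITICAL: ${f.title}`,
    rule,
    f.explanation,
    "",
    "How to resolve:",
    ...f.resolution.map((step, i) => `  ${i + 1}. ${step}`),
    "",
    `Need help? ${f.support}`,
    rule,
  ].join("\n");
}
```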
Retry Logic and Reliability Improvements#
Startup Volume Remount#
During startup, volumes are auto-remounted with the following characteristics:
- Target Volumes: Remounts volumes with status "mounted" (to restore the pre-shutdown state) OR volumes with `autoRemount: true` and status "error" (to attempt recovery)
- One-Time Attempt: Each volume receives exactly one mount attempt during startup
- Error Isolation: Each volume mount is wrapped in `.catch()` to prevent cascading failures
- Non-Blocking: Mount failures don't prevent application startup
Startup remounting provides immediate recovery for transient failures, while persistent failures are handled by scheduled retry jobs.
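The startup target selection reduces to a single predicate. This sketch assumes a `StoredVolume` shape with the `status` and `autoRemount` fields named in the text; `shouldRemountOnStartup` and `remountOnStartup` are hypothetical names. Mounting one volume after another is a simplification. A caller that must not delay startup could start `remountOnStartup` without awaiting it.

```typescript
// Hypothetical volume shape; field names follow the text.
interface StoredVolume {
  name: string;
  status: "mounted" | "unmounted" | "error";
  autoRemount: boolean;
}

function shouldRemountOnStartup(v: StoredVolume): boolean {
  // Restore volumes that were mounted before shutdown...
  if (v.status === "mounted") return true;
  // ...and give failed volumes with auto-remount enabled one recovery attempt.
  return v.status === "error" && v.autoRemount;
}

async function remountOnStartup(
  volumes: StoredVolume[],
  mount: (v: StoredVolume) => Promise<void>,
): Promise<void> {
  for (const v of volumes.filter(shouldRemountOnStartup)) {
    // Exactly one attempt per volume; a failure is logged and the loop continues.
    await mount(v).catch((err) => console.error(`Startup remount of ${v.name} failed:`, err));
  }
}
```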
VolumeAutoRemountJob#
The VolumeAutoRemountJob provides continuous retry capability:
- Schedule: Executes every 5 minutes after startup
- Target Selection: Only processes volumes with status "error" and `autoRemount: true`
- ShortId Usage: Calls `volumeService.mountVolume(volume.shortId)` to remount volumes, where `volume.shortId` is a `ShortId` branded type
- Per-Volume Error Handling: Each volume mount is wrapped in try-catch to prevent one failure from affecting others
- Job-Level Error Handling: The scheduler wraps all job execution in error handlers, logging failures without stopping the scheduler
- Fixed Interval Strategy: Uses a 5-minute fixed interval rather than exponential backoff, prioritizing consistent recovery attempts over reducing load
- No Retry Limits: Auto-remount jobs continue indefinitely; volumes remain eligible for remounting as long as they're in error state with `autoRemount` enabled
This design ensures volumes automatically recover when storage backends become available again, without requiring manual intervention.
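Here is a sketch of one `VolumeAutoRemountJob` run. The `ShortId` brand field, `VolumeServiceLike`, `listVolumes` and `runVolumeAutoRemount` are assumptions. Three things come from the text: the filter (status "error" and `autoRemount: true`), the per-volume try-catch, and the call to `mountVolume(volume.shortId)`.

```typescript
// A branded string type, as the text describes for ShortId; the brand field name is an assumption.
type ShortId = string & { readonly __brand: "ShortId" };
const asShortId = (s: string): ShortId => s as ShortId;

interface VolumeRow {
  shortId: ShortId;
  name: string;
  status: "mounted" | "unmounted" | "error";
  autoRemount: boolean;
}

// Hypothetical subset of the volume service used by the job.
interface VolumeServiceLike {
  listVolumes(): Promise<VolumeRow[]>;
  mountVolume(id: ShortId): Promise<void>;
}

async function runVolumeAutoRemount(service: VolumeServiceLike): Promise<{ attempted: number; failed: number }> {
  const candidates = (await service.listVolumes()).filter(
    (v) => v.status === "error" && v.autoRemount,
  );
  let failed = 0;
  for (const volume of candidates) {
    try {
      await service.mountVolume(volume.shortId);
    } catch (err) {
      failed++; // one failure must not stop the remaining volumes
      console.error(`Auto-remount failed for ${volume.name}:`, err);
    }
  }
  return { attempted: candidates.length, failed };
}

// Scheduling sketch (not executed here), fixed 5-minute interval, no retry cap:
// setInterval(() => void runVolumeAutoRemount(service).catch(console.error), 5 * 60 * 1000);
```

Scheduling with a plain `setInterval` simplifies away the cron scheduler. Because there is no attempt counter, a volume is retried for as long as it stays in the error state.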
Scheduler Initialization#
The cron-based scheduler is initialized during startup:
- Starts the scheduler engine
- Clears any existing scheduled tasks to prevent duplicates (important for hot reload scenarios)
- Provides the foundation for all periodic background jobs
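Below is a minimal interval-based stand-in for the scheduler's initialization and its job-level error handling. The real scheduler is cron-based and its API is not shown in the source. `startScheduler`, `scheduleJob`, `stopScheduler` and the `globalThis` registry key are therefore hypothetical. Keeping the registry on `globalThis` lets a re-evaluated module, as in hot reload, find and clear timers left by its previous instance.

```typescript
// Interval-based stand-in for the cron scheduler; all names are hypothetical.
type SketchTimer = ReturnType<typeof setInterval>;
type ScheduledJob = { name: string; run: () => Promise<void> | void };

// Timers live on globalThis so a re-evaluated module (hot reload) can still find and clear them.
const store = globalThis as typeof globalThis & { __schedulerSketchTimers?: Map<string, SketchTimer> };
const timers: Map<string, SketchTimer> = (store.__schedulerSketchTimers ??= new Map<string, SketchTimer>());

function stopScheduler(): void {
  for (const t of timers.values()) clearInterval(t);
  timers.clear();
}

function startScheduler(): void {
  stopScheduler(); // start from a clean slate so no job is registered twice
}

function scheduleJob(job: ScheduledJob, intervalMs: number): void {
  const previous = timers.get(job.name);
  if (previous !== undefined) clearInterval(previous); // re-registration replaces the old timer
  const timer = setInterval(() => {
    // Job-level error handling: log the failure and keep the scheduler running.
    Promise.resolve()
      .then(() => job.run())
      .catch((err) => console.error(`Job ${job.name} failed:`, err));
  }, intervalMs);
  timers.set(job.name, timer);
}
```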
Background Job Scheduling#
Five periodic jobs are scheduled to maintain system health and execute backups:
| Job | Schedule | Purpose |
|---|---|---|
| CleanupDanglingMountsJob | Hourly | Cleanup of stale volume mounts |
| VolumeHealthCheckJob | Every 30 minutes | Verify volume accessibility and update status |
| RepositoryHealthCheckJob | Daily at 12:50 PM | Check repository integrity and availability |
| BackupExecutionJob | Every minute | Check for scheduled backups to run and execute them |
| VolumeAutoRemountJob | Every 5 minutes | Attempt to remount failed volumes with auto-remount enabled |
These jobs run continuously, providing system maintenance, health monitoring, and backup execution without manual intervention.
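The schedules in the table can be written as five-field cron expressions (minute, hour, day of month, month, day of week). The expressions below translate the table; they are not taken from Zerobyte's source. "Hourly" is assumed to mean minute 0, and "12:50 PM" is read in the server's local time. The small matcher supports only the `*`, `*/n` and single-number forms used here.

```typescript
// Translation of the job table into cron syntax (assumption, not Zerobyte's literal config).
const JOB_SCHEDULES = {
  CleanupDanglingMountsJob: "0 * * * *", // hourly, at minute 0
  VolumeHealthCheckJob: "*/30 * * * *", // every 30 minutes
  RepositoryHealthCheckJob: "50 12 * * *", // daily at 12:50
  BackupExecutionJob: "* * * * *", // every minute
  VolumeAutoRemountJob: "*/5 * * * *", // every 5 minutes
} as const;

// Matches one field; `lowest` is the field's minimum (0 for minutes, 1 for day of month),
// because cron steps count from the start of the field's range.
function fieldMatches(field: string, value: number, lowest: number): boolean {
  if (field === "*") return true;
  if (field.startsWith("*/")) {
    const step = Number(field.slice(2));
    return (value - lowest) % step === 0;
  }
  return Number(field) === value;
}

// Ignores the day-of-month/day-of-week OR rule, which none of these schedules need.
function cronMatches(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields as [string, string, string, string, string];
  return (
    fieldMatches(minute, date.getMinutes(), 0) &&
    fieldMatches(hour, date.getHours(), 0) &&
    fieldMatches(dayOfMonth, date.getDate(), 1) &&
    fieldMatches(month, date.getMonth() + 1, 1) &&
    fieldMatches(dayOfWeek, date.getDay(), 0)
  );
}
```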
Graceful Degradation Philosophy#
Zerobyte is designed to maximize availability through graceful degradation:
Application starts even when:
- Some volumes fail to mount
- Non-critical migrations error
- Configuration schema updates fail for individual entities
Individual failures don't cascade:
- Volume mount failures are isolated per-volume
- Background jobs continue operating despite individual job failures
- Each entity (volume, repository, notification) is updated independently
This architecture ensures that partial failures don't result in total system unavailability. The application remains functional and continues attempting recovery automatically.
Graceful Shutdown#
Shutdown handling ensures clean termination:
- Stops the scheduler, canceling all cron jobs
- Unmounts all mounted volumes (NFS, SMB, WebDAV, SFTP)
- Logs unmount status for each volume
Graceful shutdown prevents resource leaks and ensures volumes are properly unmounted before the application terminates.
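The shutdown sequence can be sketched as follows. The sketch assumes a hypothetical `stopScheduler` callback and volumes that expose an `unmount()` method. Unmounts run concurrently and each outcome is logged, so one busy mount point does not stop the others from being detached.

```typescript
// Hypothetical hooks standing in for the real scheduler and volume backends.
interface MountedVolume {
  name: string;
  unmount: () => Promise<void>;
}

async function gracefulShutdown(
  stopScheduler: () => void,
  volumes: MountedVolume[],
): Promise<{ unmounted: string[]; failed: string[] }> {
  stopScheduler(); // no new jobs may start while volumes are being detached

  // Turn every unmount into a non-rejecting outcome so all of them are attempted.
  const outcomes = await Promise.all(
    volumes.map((v) =>
      v.unmount().then(
        () => ({ name: v.name, ok: true as const }),
        (reason: unknown) => ({ name: v.name, ok: false as const, reason }),
      ),
    ),
  );

  const unmounted: string[] = [];
  const failed: string[] = [];
  for (const o of outcomes) {
    if (o.ok) {
      unmounted.push(o.name);
      console.info(`Unmounted volume ${o.name}`);
    } else {
      failed.push(o.name);
      console.error(`Failed to unmount volume ${o.name}:`, o.reason);
    }
  }
  return { unmounted, failed };
}
```

In a long-running server, this function would typically be called from a `SIGTERM`/`SIGINT` handler registered with `process.once`. How Zerobyte wires this up is not described here.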
Configuration and Constants#
Environment Variables#
Configuration is loaded from environment variables and validated with the Zod schema validation library:
| Variable | Required | Default | Purpose |
|---|---|---|---|
| APP_SECRET | Yes | None | Master encryption key for database secrets and authentication |
| BASE_URL | Yes | None | Base URL for the application; determines cookie security mode (HTTPS → secure cookies) |
| ZEROBYTE_DATABASE_URL | No | /var/lib/zerobyte/data/zerobyte.db | SQLite database file path |
| RESTIC_CACHE_DIR | No | /var/lib/zerobyte/restic/cache | Restic cache directory |
| PORT | No | 4096 | HTTP server port |
| PROVISIONING_PATH | No | None | Path to a JSON file containing repository and volume provisioning configuration |
| SERVER_IDLE_TIMEOUT | No | 300 | Server idle timeout in seconds; also used for repository initialization and rclone mount timeouts |
| TRUST_PROXY | No | false | When set to "true", trusts existing X-Forwarded-For headers from reverse proxies to determine the client IP address. When "false" (the default), uses the direct connection IP, ignoring any incoming X-Forwarded-For header or replacing it with its own. Important for deployments behind reverse proxies that need accurate client IP tracking |
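Here is a dependency-free sketch of loading and validating the variables in the table. Zerobyte uses Zod for this step. The plain-TypeScript checks below only mimic what such a schema would enforce, and `loadConfig`, `AppConfig` and the helper names are hypothetical. Defaults come from the table and from the `SERVER_IDLE_TIMEOUT` default (300 seconds) given under Timeout Protection. Deriving `secureCookies` from an `https:` `BASE_URL` follows the table's description.

```typescript
// Hypothetical config shape; defaults follow the documented values.
interface AppConfig {
  appSecret: string;
  baseUrl: URL;
  secureCookies: boolean;
  databaseUrl: string;
  resticCacheDir: string;
  port: number;
  provisioningPath?: string;
  trustProxy: boolean;
  serverIdleTimeoutSeconds: number;
}

type Env = Record<string, string | undefined>;

function required(env: Env, key: string): string {
  const value = env[key];
  if (!value) throw new Error(`Missing required environment variable ${key}`);
  return value;
}

function positiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) throw new Error(`${key} must be a positive integer, got "${raw}"`);
  return n;
}

function loadConfig(env: Env): AppConfig {
  const baseUrl = new URL(required(env, "BASE_URL")); // throws on a malformed URL
  return {
    appSecret: required(env, "APP_SECRET"),
    baseUrl,
    secureCookies: baseUrl.protocol === "https:", // HTTPS base URL → secure cookies
    databaseUrl: env["ZEROBYTE_DATABASE_URL"] ?? "/var/lib/zerobyte/data/zerobyte.db",
    resticCacheDir: env["RESTIC_CACHE_DIR"] ?? "/var/lib/zerobyte/restic/cache",
    port: positiveInt(env, "PORT", 4096),
    provisioningPath: env["PROVISIONING_PATH"] || undefined,
    trustProxy: env["TRUST_PROXY"] === "true", // anything else means false
    serverIdleTimeoutSeconds: positiveInt(env, "SERVER_IDLE_TIMEOUT", 300),
  };
}
```

Validating everything once at startup means a missing `APP_SECRET` or malformed `BASE_URL` fails immediately with a clear message. Otherwise the problem would surface later as an obscure runtime error.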
System Constants#
System constants are defined for:
- Volume Mount Paths: Locations where remote volumes are mounted in the filesystem
- Repository Storage Locations: Paths for storing backup repositories
- Restic Configuration Paths: Locations for Restic configuration files
- Timeout Values: Default timeout of 5 seconds for volume backend operations, 10 seconds for SFTP
These constants ensure consistent behavior across the application and provide a single source of truth for system-wide configuration.
Summary#
Zerobyte's startup orchestration system provides a robust foundation for reliable backup automation through:
- Structured Initialization: Three-phase bootstrap sequence ensures proper ordering of database, authentication, and runtime initialization
- Automatic Schema Management: Database schema and data migrations are applied automatically with checkpoint tracking
- Secure Authentication: Authentication system with derived secrets, multi-factor support, SSO integration (OIDC/SAML), and automatic organization provisioning
- Eager Loading: Configuration updates, volume remounting, and backup state recovery restore system state immediately on startup
- Comprehensive Error Handling: Layered error handling with fail-fast for critical failures and graceful degradation for non-critical operations
- Continuous Retry: Automatic retry mechanisms for volume mounting with fixed intervals and no retry limits
- Graceful Degradation: Application remains available even when individual components fail, with isolated error handling preventing cascading failures
This design ensures Zerobyte can start reliably, recover from failures automatically, and maintain service availability even in degraded conditions.