Zerobyte Startup Orchestration#

Overview#

Zerobyte is a TypeScript/Bun backup automation application built on Restic. Its startup follows a three-phase bootstrap sequence that initializes the database, authentication system, and runtime environment in a fixed order. Error handling and retry mechanisms keep the service available when individual components fail.

This document provides a detailed overview of the startup orchestration process, covering automatic database schema initialization, authentication setup, eager database loading, enhanced error handling, and retry logic.

Startup Orchestration Flow#

Entry Point and Bootstrap Trigger#

The application starts from app/server.ts, which serves as the main entry point. Zerobyte uses a Nitro plugin system to trigger the bootstrap process automatically during server initialization. The bootstrap plugin is defined in app/server/plugins/bootstrap.ts and invokes bootstrapApplication(), which orchestrates the three-phase initialization sequence.

Three-Phase Bootstrap Sequence#

The bootstrap process executes three critical phases in strict order to ensure proper initialization:

Phase 1: Database Schema Migrations#

The first phase, executed by runDbMigrations(), initializes the SQLite database and applies schema migrations:

Database Directory Creation: Automatically creates the database directory if it doesn't exist
Legacy Database Migration: Handles automatic renaming from the legacy ironmount.db to zerobyte.db
Schema Migration Execution: Runs Drizzle ORM schema migrations from the app/drizzle directory
Foreign Key Constraints: Automatically enables foreign key constraints for referential integrity

Database schema migrations execute synchronously and fail fast—any failure terminates the bootstrap process, as the application cannot function without a properly initialized database.

Phase 2: Application-Level Data Migrations#

The second phase, managed by runMigrations(), handles application-level data transformations and schema updates:

Fresh Install Detection: Checks if this is a fresh installation (no users exist) and auto-checkpoints all migrations without execution
Sequential Execution: For existing installations, executes pending migrations sequentially
Dependency Validation: Validates migration dependencies to ensure correct execution order
Checkpoint Tracking: Records completion checkpoints in appMetadataTable to prevent re-execution on restart
Error Classification: Distinguishes between critical migrations (which halt startup) and maintenance migrations (which continue despite errors)

The migration registry includes transformations such as snapshot retagging, Restic password isolation, organization assignment, path name concatenation, and backup include paths separation (splitting literal paths from glob patterns).
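The checkpoint, fresh-install, and dependency rules described for this phase can be sketched as a small runner. This is a minimal illustration, not the actual runMigrations() implementation: the Migration shape, the in-memory CheckpointStore (standing in for rows in appMetadataTable), and the function name runPendingMigrations are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real registry may differ.
type Migration = {
  id: string;
  dependsOn?: string[];
  run: () => void;
};

// In-memory stand-in for checkpoint rows kept in appMetadataTable.
type CheckpointStore = Set<string>;

function runPendingMigrations(
  migrations: Migration[],
  checkpoints: CheckpointStore,
  isFreshInstall: boolean,
): string[] {
  const executed: string[] = [];
  for (const migration of migrations) {
    // Already applied on an earlier start: skip.
    if (checkpoints.has(migration.id)) continue;

    if (isFreshInstall) {
      // No existing data to transform: record the checkpoint without running.
      checkpoints.add(migration.id);
      continue;
    }

    // Refuse to run out of order: every dependency must be checkpointed first.
    const missing = (migration.dependsOn ?? []).filter((dep) => !checkpoints.has(dep));
    if (missing.length > 0) {
      throw new Error(`Migration ${migration.id} is missing dependencies: ${missing.join(", ")}`);
    }

    migration.run();
    checkpoints.add(migration.id);
    executed.push(migration.id);
  }
  return executed;
}
```

Calling the runner twice with the same checkpoint store executes nothing the second time, which is the property that makes restarts safe.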

Phase 3: Application Initialization#

The final phase performs runtime initialization and conditionally starts the agent runtime:

Agent Runtime Initialization: When the ENABLE_LOCAL_AGENT environment variable is set to "true", the bootstrap process starts the agent runtime by calling startAgentRuntime() and spawnLocalAgent() before full application startup. The system includes a waitForAgentReady verification step to ensure the local agent is ready before proceeding
Application Startup: The startup() function performs runtime initialization:
- Configuration Schema Updates: Updates all volumes, repositories, and notification destinations to the latest configuration schema
- Scheduler Initialization: Starts the cron-based scheduler and clears any existing scheduled tasks
- Provisioning Sync: When the PROVISIONING_PATH environment variable is set, syncs repositories and volumes from the specified JSON configuration file. The provisioning service resolves secrets using env:// and file:// protocols and marks resources as managed in the UI
- Volume Auto-Remounting: Re-mounts volumes that were previously mounted or have auto-remount enabled (only when local agent mode is disabled; with local agent mode enabled, the agent handles remounting)
- Backup State Recovery: Marks in-progress backups as "warning" status (indicating the application was restarted mid-backup)
- Job Scheduling: Schedules five periodic background jobs for system maintenance and backup execution
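The strict ordering of the three phases can be sketched as a sequential loop over injected phase functions. This is a simplified sketch under stated assumptions: the BootstrapDeps shape and buildPhases helper are hypothetical, startLocalAgent is a made-up wrapper standing in for the startAgentRuntime(), spawnLocalAgent(), and waitForAgentReady steps, and the real bootstrapApplication() signature may differ.

```typescript
// Hypothetical dependency bundle; the real functions are imported directly.
type BootstrapDeps = {
  runDbMigrations: () => Promise<void> | void;
  runMigrations: () => Promise<void> | void;
  startLocalAgent: () => Promise<void> | void; // assumed wrapper around the agent steps
  startup: () => Promise<void> | void;
};

type Phase = { name: string; run: () => Promise<void> | void };

function buildPhases(deps: BootstrapDeps, enableLocalAgent: boolean): Phase[] {
  return [
    { name: "database schema migrations", run: deps.runDbMigrations },
    { name: "application data migrations", run: deps.runMigrations },
    {
      name: "application initialization",
      run: async () => {
        // The agent runtime is started first, and only when enabled.
        if (enableLocalAgent) await deps.startLocalAgent();
        await deps.startup();
      },
    },
  ];
}

async function bootstrapApplication(
  phases: Phase[],
  log: (message: string) => void = console.log,
): Promise<void> {
  for (const phase of phases) {
    log(`bootstrap: starting ${phase.name}`);
    // A throw or rejection here ends the loop, so later phases never run.
    await phase.run();
  }
}
```

In a real entry point the flag would come from the environment, for example `process.env.ENABLE_LOCAL_AGENT === "true"`.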

Restic Integration#

Restic Instance and Dependency Injection#

Zerobyte uses Restic for backup operations through a dependency injection pattern that improves testability and modularity. The restic instance is created in app/server/core/restic.ts using the createRestic() function from the @zerobyte/core/restic/server package.

Dependency Injection Pattern: Rather than importing restic utilities directly, the application defines a ResticDeps interface that provides:

Secret Resolution: resolveSecret() function for decrypting encrypted configuration values
Organization Password Retrieval: getOrganizationResticPassword() function for fetching organization-specific Restic encryption passwords from the database
Path Configuration: resticCacheDir, resticPassFile, and defaultExcludes constants
Hostname Configuration: Optional hostname field for setting the Restic hostname

This dependency injection approach allows the restic module to operate without direct database or configuration dependencies, making it easier to test and reuse across different contexts.
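To make the pattern concrete, the following sketch shows what a ResticDeps-style interface and a factory that consumes it could look like. The field names come from the list above, but every signature here is an assumption, and createResticSketch is a hypothetical stand-in that only assembles command-line arguments; it is not the createRestic() function from @zerobyte/core/restic/server.

```typescript
// Assumed signatures; only the member names are taken from the description.
interface ResticDeps {
  resolveSecret: (encrypted: string) => Promise<string>;
  getOrganizationResticPassword: (organizationId: string) => Promise<string>;
  resticCacheDir: string;
  resticPassFile: string;
  defaultExcludes: string[];
  hostname?: string;
}

// Hypothetical factory: the module sees only the injected deps,
// never the database or the application configuration.
function createResticSketch(deps: ResticDeps) {
  return {
    buildBackupArgs(sourcePath: string): string[] {
      const args = ["backup", sourcePath, "--cache-dir", deps.resticCacheDir];
      for (const pattern of deps.defaultExcludes) {
        args.push("--exclude", pattern);
      }
      // The hostname is optional, so the flag is added only when set.
      if (deps.hostname) args.push("--host", deps.hostname);
      return args;
    },
    passwordFor(organizationId: string): Promise<string> {
      return deps.getOrganizationResticPassword(organizationId);
    },
  };
}
```

In a test, the deps object can be a plain literal with in-memory functions, which is the testability benefit the pattern is meant to provide.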

Shared Package Architecture: Restic functionality is implemented in the @zerobyte/core package (located in packages/core/), which is a separate workspace package shared across the application. This package includes:

Restic Commands: Backup, restore, snapshots, forget, check, and other Restic operations
Helper Functions: buildEnv(), addCommonArgs(), buildRepoUrl(), cleanupTemporaryKeys(), and validateCustomResticParams()
DTOs and Schemas: Type definitions and validation schemas for Restic operations
Utilities: Common utilities like spawn, logger, sanitize, JSON parsing, and path manipulation

The core package provides both server-side (@zerobyte/core/restic/server) and shared (@zerobyte/core/restic) exports, enabling different parts of the application to import only what they need.

Automatic Database Schema Initialization#

Schema Definition#

The database schema is defined in app/server/db/schema.ts using Drizzle ORM with SQLite as the backend. The schema includes tables for:

Authentication: usersTable, sessionsTable, account, verification
Multi-tenancy: organization, member, invitation
SSO: ssoProvider for OIDC/SAML provider configuration
Backup Resources: volumesTable, repositoriesTable, backupSchedulesTable (stores backup configurations including includePaths for explicit directory and file paths, and includePatterns for glob-based patterns)
Notifications: notificationDestinationsTable
Metadata: appMetadataTable for migration checkpoints and system state
Provisioning: provisioningId columns in repositoriesTable and volumesTable track operator-managed resources

SSO Tables#

The ssoProvider table stores organization-scoped SSO provider configurations:

Provider Identity: providerId, issuer, and domain fields identify the SSO provider
Organization Scoping: Each provider is linked to an organizationId with cascade deletion
Configuration Storage: oidcConfig and samlConfig JSON fields store protocol-specific settings
Auto-Linking: autoLinkMatchingEmails boolean flag enables automatic account linking for matching email addresses
User Association: Optional userId reference tracks which user created the provider

SSO providers are initialized during database schema migrations and become available immediately after the Phase 1 bootstrap completes.

Backup Schedules Table#

The backupSchedulesTable stores backup configurations with distinct fields for path inclusion:

Include Paths: The includePaths field (added in migration 20260320172926_hesitant_naoko) stores a JSON array of explicit directory and file paths to include in backups (e.g., "/home/user/documents", "/var/log/app.log")
Include Patterns: The includePatterns field stores glob-based patterns for flexible matching (e.g., "*.txt", "/var/log/*", "!/tmp/**")
Separation Purpose: This separation prevents issues with special characters in literal paths being incorrectly expanded as glob patterns during backup execution
Automatic Migration: Existing backup schedules are automatically migrated by the 00005-split-backup-include-paths data migration, which analyzes existing includePatterns entries and moves literal paths to the new includePaths field while preserving pattern-based entries in includePatterns
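The split performed by the data migration can be illustrated with a simple classifier. The heuristic below (an entry is a pattern if it starts with "!" or contains a glob metacharacter) is an assumption made for this sketch; the actual 00005-split-backup-include-paths migration may use different rules, and a literal path that genuinely contains such a character would need extra handling.

```typescript
// Characters treated as glob syntax in this sketch (an assumption).
const GLOB_CHARS = /[*?[\]{}]/;

function splitIncludeEntries(entries: string[]): {
  includePaths: string[];
  includePatterns: string[];
} {
  const includePaths: string[] = [];
  const includePatterns: string[] = [];
  for (const entry of entries) {
    // Negations ("!...") and entries with glob characters stay patterns;
    // everything else is moved to the literal path list.
    const isPattern = entry.startsWith("!") || GLOB_CHARS.test(entry);
    (isPattern ? includePatterns : includePaths).push(entry);
  }
  return { includePaths, includePatterns };
}
```

Applied to the examples above, "/home/user/documents" and "/var/log/app.log" would land in includePaths, while "*.txt", "/var/log/*", and "!/tmp/**" would remain in includePatterns.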

Resource Identifier Type Safety#

Volumes, repositories, and backup schedules use a ShortId branded type for their shortId fields to improve type safety:

Database Layer: The schema stores shortId fields as text columns in SQLite
TypeScript Layer: At compile time, these fields are typed as ShortId (a branded type) rather than plain strings
Type Conversion: The asShortId() function converts validated strings (such as route parameters) to the ShortId branded type
Type Safety Benefits: This branded type system prevents accidental mixing of different identifier types and provides compile-time validation

This is a compile-time type safety improvement that doesn't change runtime behavior or database storage—shortId values remain text fields in the database.
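A branded type of this kind is commonly written as an intersection with a phantom property. The sketch below shows the general technique; the exact brand shape used by Zerobyte is not stated here, so the `__brand` field and the describeVolume helper are assumptions for illustration.

```typescript
// The brand exists only in the type system; at runtime the value is a plain string.
type ShortId = string & { readonly __brand: "ShortId" };

// Converts an already-validated string to the branded type.
// This sketch performs a pure cast and no runtime validation.
function asShortId(value: string): ShortId {
  return value as ShortId;
}

// Hypothetical service-style function that accepts only branded identifiers.
function describeVolume(shortId: ShortId): string {
  return `volume:${shortId}`;
}

const fromRoute = asShortId("abc123");
const description = describeVolume(fromRoute);

// describeVolume("abc123");
// The line above would not compile: a plain string is not assignable to ShortId.
```

Because the brand is erased at compile time, the stored value and the generated JavaScript are identical to the unbranded version.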

Automatic Initialization Features#

The database initialization process is fully automatic and requires no manual intervention:

Directory Creation: The database directory is created automatically if it doesn't exist
Legacy Migration: Existing ironmount.db databases are automatically renamed to zerobyte.db
Schema Application: All schema migrations from the app/drizzle directory are applied automatically
Constraint Enforcement: Foreign key constraints are enabled automatically to maintain referential integrity

This automatic initialization ensures that the application can start cleanly on first run or after updates without requiring manual database setup or migration execution.

Authentication Initialization#

Authentication System Setup#

Zerobyte uses better-auth for authentication, supporting both traditional email/password authentication and Single Sign-On (SSO) via OIDC and SAML protocols. The authentication system is configured with several security features:

Derived Secret: Authentication secret derived from APP_SECRET environment variable via HKDF (HMAC-based Key Derivation Function)
Secure Cookies: Cookie security configured based on the BASE_URL protocol—HTTPS URLs enable secure-only cookies
Database Persistence: Uses Drizzle adapter to store authentication data in SQLite
Multi-Factor Authentication: Supports two-factor authentication with backup codes
SSO Support: Integrates OIDC and SAML providers with organization-scoped configuration and automatic account linking
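The first two items can be sketched with Node's built-in crypto module. This is a minimal sketch, assuming SHA-256, an empty salt, a 32-byte output, and a made-up info label ("auth-secret-sketch"); none of these parameters are confirmed by this document, and the function names are hypothetical.

```typescript
import { hkdfSync } from "node:crypto";

// Derives a purpose-specific secret from APP_SECRET so the raw value
// is never used directly as the authentication secret.
// Digest, salt, info label, and length are assumptions for this sketch.
function deriveAuthSecret(appSecret: string): string {
  const derived = hkdfSync("sha256", appSecret, "", "auth-secret-sketch", 32);
  return Buffer.from(derived).toString("hex");
}

// Secure-only cookies are enabled when the base URL uses HTTPS.
function useSecureCookies(baseUrl: string): boolean {
  return new URL(baseUrl).protocol === "https:";
}
```

The derivation is deterministic, so the same APP_SECRET yields the same authentication secret across restarts, while a different info label would yield an unrelated key for another purpose.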

Authentication Configuration#

The authentication system is configured with the following features during startup:

Base URL Configuration: The baseURL is configured as an object with allowedHosts (including the base URL host and all trusted origins) and protocol set to "auto" to support both HTTP and HTTPS deployments.

SSO Error Handling: SSO authentication errors are routed through the /api/v1/auth/login-error endpoint, which maps Better Auth error messages to standardized error codes using mapAuthErrorToCode() before redirecting to the client-side login page with the appropriate error code parameter. Account linking errors (including "account not linked", "unable to link account", and "SSO account linking is not permitted for users outside this organization") are mapped to the ACCOUNT_LINK_REQUIRED error code, which displays the message: "SSO sign-in was blocked because this email already belongs to another user in this instance. Contact your administrator to resolve the account conflict."
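The mapping step can be sketched as a lookup over known message fragments. The three account-linking messages and the ACCOUNT_LINK_REQUIRED code come from the description above; the fallback code UNKNOWN_ERROR, the "/login" path, the "error" query parameter, and the function name mapAuthErrorToCodeSketch are assumptions for illustration only.

```typescript
type LoginErrorCode = "ACCOUNT_LINK_REQUIRED" | "UNKNOWN_ERROR"; // second value is hypothetical

const ACCOUNT_LINK_MESSAGES = [
  "account not linked",
  "unable to link account",
  "sso account linking is not permitted for users outside this organization",
];

function mapAuthErrorToCodeSketch(message: string): LoginErrorCode {
  const normalized = message.toLowerCase();
  // Any of the known account-linking messages collapses to one code.
  if (ACCOUNT_LINK_MESSAGES.some((fragment) => normalized.includes(fragment))) {
    return "ACCOUNT_LINK_REQUIRED";
  }
  return "UNKNOWN_ERROR";
}

// Builds the client-side redirect target; path and parameter name are assumed.
function buildLoginRedirect(message: string): string {
  const params = new URLSearchParams({ error: mapAuthErrorToCodeSketch(message) });
  return `/login?${params.toString()}`;
}
```

Mapping to a fixed code keeps provider-specific error text out of the URL and lets the login page choose the user-facing message.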

Trusted Provider Linking: The authentication system uses Better Auth's native account.accountLinking.trustedProviders option with the ssoIntegration.resolveTrustedProviders callback (from app/server/modules/sso/sso.integration.ts) to enable automatic account linking. This callback queries the database to identify SSO providers with autoLinkMatchingEmails enabled for the organization, returning their provider IDs as trusted providers for automatic linking. Account linking is strictly organization-scoped—users cannot auto-link from one organization to another via SSO, even with matching email addresses. Existing users with personal organizations cannot auto-link to SSO organizations without explicit invitations, enforcing strict organization boundaries during authentication.

SSO Integration Architecture: The authentication system integrates with SSO through a dedicated module architecture. SSO functionality is implemented in app/server/modules/sso and integrated into the authentication layer through the ssoIntegration interface in app/server/lib/auth.ts. This interface provides the Better Auth SSO plugin, validation middlewares, callback detection, user creation handlers, organization membership resolution, and trusted provider resolution.

Testing Infrastructure: In test environments (NODE_ENV === "test"), the Better Auth testUtils() plugin is conditionally loaded to provide testing utilities. This plugin is not included in production builds.

Authentication Middleware Hooks#

Multiple middleware hooks customize the authentication flow:

validateSsoProviderId: Validates SSO provider identifiers to prevent the use of reserved provider IDs
validateSsoCallbackUrls: Validates SSO callback URLs during authentication to prevent redirect attacks
ensureOnlyOneUser: Blocks new user registration unless explicitly enabled, supporting single-user deployments
convertLegacyUserOnFirstLogin: Automatically migrates users from the legacy authentication system on first login
requireSsoInvitation: Enforces invite-only access for SSO authentication by checking for valid invitations before user creation

SSO Account Linking Validation Hook#

A dedicated database hook (account.create.before) validates SSO account linking before account creation:

Cross-Organization Link Protection: During SSO authentication, the hook calls canLinkSsoAccount() to validate that the user is permitted to link an SSO account
Existing Credential Account Detection: If the user has an existing credential (email/password) account, SSO account linking is blocked to prevent cross-organization account conflicts
Invitation-Based Exception: Users without existing credential accounts can link SSO accounts when they have valid pending invitations to the organization
Security-First Enforcement: This validation prevents users with existing accounts from linking to SSO organizations via email matching, even when they hold valid invitations

SSO-specific middlewares (such as validateSsoProviderId and validateSsoCallbackUrls) are registered as authentication hooks through ssoIntegration.beforeMiddlewares. The ssoIntegration interface provides the following members to the authentication system:

beforeMiddlewares: SSO-specific validation middleware
isSsoCallback: Detects SSO authentication flows
onUserCreate: Handles SSO-specific user creation logic
resolveOrgMembership: Assigns users to SSO organizations via invitations
canLinkSsoAccount: Validates cross-organization account linking during SSO authentication
resolveTrustedProviders: Enables automatic account linking for trusted SSO providers
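The decision made by the account linking hook can be reduced to a small predicate. This sketch only encodes the two rules stated above; the input shape and the name canLinkSsoAccountSketch are hypothetical, and the outcome for a user with neither a credential account nor an invitation (denied here) is an assumption.

```typescript
// Hypothetical input; the real hook would look these facts up in the database.
type LinkCheckInput = {
  hasCredentialAccount: boolean;
  hasPendingInvitation: boolean;
};

function canLinkSsoAccountSketch(input: LinkCheckInput): boolean {
  // Rule 1: an existing email/password account always blocks linking,
  // even when a valid invitation exists.
  if (input.hasCredentialAccount) return false;
  // Rule 2: without a credential account, a valid pending invitation allows it.
  // Denying the remaining case is an assumption of this sketch.
  return input.hasPendingInvitation;
}
```

The order of the checks matters: testing the credential account first is what makes the invitation unable to override the restriction.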

User Creation and Organization Provisioning#

The user creation flow includes automatic organization provisioning with support for both traditional and SSO authentication paths:

Traditional Authentication Path#

For email/password authentication, the user creation flow:

Admin Assignment: The first user automatically receives admin privileges
Username Generation: If no username is provided, generates a unique username using UUID

SSO Authentication Path#

For SSO authentication, the flow includes additional steps:

Invitation Validation: Before user creation, requireSsoInvitation (invoked through ssoIntegration.onUserCreate) checks for a valid, non-expired invitation matching the user's email and SSO provider's organization
Account Linking Validation: Before creating an SSO account, the system validates whether the user is permitted to link the account by calling canLinkSsoAccount(). This validation blocks SSO account linking for users with existing credential accounts, even if they have valid pending invitations, to prevent cross-organization account conflicts
Automatic Flags: SSO users have hasDownloadedResticPassword automatically set to true since they don't need password setup
Account Linking Restrictions: When an SSO provider has autoLinkMatchingEmails enabled, Better Auth can automatically link accounts with matching email addresses within the same organization. However, cross-organization auto-linking is prevented—users cannot link from their existing personal organization to an SSO organization without an explicit invitation, maintaining strict organization boundaries

Organization Assignment#

Organization provisioning occurs during session creation and enforces strict organization isolation during SSO authentication:

SSO Authentication Path (when an SSO provider matches the user's email domain):

SSO Provider Detection: The ssoIntegration.resolveOrgMembership method identifies if the authentication is via an SSO provider by examining the provider ID from the authentication context
Existing SSO Membership Check: If the user already has membership in the SSO provider's organization (from accepting a past invitation), proceeds with that organization
Invitation Requirement: If the user does NOT have existing membership in the SSO organization, requires a valid pending invitation to join
Invitation Rejection: Users without valid invitations are blocked from accessing the SSO organization, even if they have existing accounts or memberships in other organizations

Non-SSO Authentication Path (traditional email/password or no SSO provider match):

Existing Membership Check: If the user already has an organization membership, ensureDefaultOrg uses that organization
Default Organization Creation: For users without existing memberships, creates a new personal organization with:
- Generated organization slug based on the user's email prefix
- Unique Restic encryption password per organization, encrypted with APP_SECRET for secure storage
- Owner role assignment for the user

This authentication flow enforces strict organization boundaries by preventing users with existing accounts or personal organizations from auto-linking to SSO organizations without explicit invitations. Additionally, users with existing credential accounts are blocked from linking SSO accounts even when they hold valid invitations, preventing cross-organization account conflicts. This approach provides both cryptographic isolation between tenant backup repositories and security isolation preventing unauthorized cross-organization access through email matching.

Session Management#

Session management integrates with the organization context system:

Sessions are created with an activeOrganizationId that links users to their organization workspace
Organization provisioning happens automatically during session creation through the ssoIntegration.resolveOrgMembership method (for SSO authentication) or ensureDefaultOrg (for traditional authentication)
AsyncLocalStorage propagates organization ID throughout the request lifecycle
Request context enables multi-tenant isolation without explicit parameter passing
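The propagation mechanism can be sketched with Node's AsyncLocalStorage. The withContext name appears elsewhere in this document, but its signature here is an assumption, as are the RequestContext shape and the getOrganizationId and listVolumeNames helpers.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

type RequestContext = { organizationId: string };

const contextStorage = new AsyncLocalStorage<RequestContext>();

// Runs fn with the given context active for its whole call tree,
// including any asynchronous work it starts.
function withContext<T>(context: RequestContext, fn: () => T): T {
  return contextStorage.run(context, fn);
}

function getOrganizationId(): string {
  const context = contextStorage.getStore();
  if (!context) throw new Error("No organization context is active");
  return context.organizationId;
}

// Hypothetical data access function: note that it takes no organization
// parameter, yet it can only see rows for the active organization.
const volumeRows = [
  { organizationId: "org-a", name: "photos" },
  { organizationId: "org-b", name: "archive" },
];

function listVolumeNames(): string[] {
  const organizationId = getOrganizationId();
  return volumeRows.filter((row) => row.organizationId === organizationId).map((row) => row.name);
}
```

A request handler would wrap its work in `withContext({ organizationId }, handler)`, after which every function further down the call chain can read the organization without it being threaded through arguments.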

Organization Deletion and Session Cleanup: When an organization is deleted, the system automatically maintains session integrity by reassigning affected users' activeOrganizationId references. The cleanup process:

Identifies Affected Users: Finds all users who are members of organizations being deleted (excluding the user being deleted)
Gathers Alternative Memberships: Queries all organization memberships for affected users to identify potential fallback organizations
Selects Fallback Organizations: For each affected user, selects the first available organization membership that is not being deleted as their new active organization
Updates Sessions: Updates all sessions for affected users whose activeOrganizationId points to a deleted organization, reassigning them to their fallback organization (or null if no alternatives exist)
Transactional Consistency: Wraps the entire operation in a Drizzle ORM transaction to ensure atomicity—either all updates succeed together or none are applied. The transaction uses a synchronous callback pattern (rather than async), executing queries with .sync() (e.g., tx.query.member.findMany({...}).sync()) and mutations with .run() (e.g., tx.delete(organization).where(...).run())

This process ensures sessions never point to deleted organizations, preventing application errors from orphaned references and maintaining consistent session state.
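The fallback selection can be illustrated as a pure function over in-memory data. This is a sketch of the selection rule only, assuming simplified Membership and Session shapes; the real implementation runs these steps as queries inside a synchronous Drizzle transaction, which is not reproduced here.

```typescript
// Simplified shapes for illustration.
type Membership = { userId: string; organizationId: string };
type Session = { id: string; userId: string; activeOrganizationId: string | null };

function reassignSessions(
  sessions: Session[],
  memberships: Membership[],
  deletedOrgIds: Set<string>,
): Session[] {
  return sessions.map((session) => {
    const active = session.activeOrganizationId;
    // Sessions that do not point at a deleted organization are left alone.
    if (active === null || !deletedOrgIds.has(active)) return session;

    // First membership of this user that survives the deletion, if any.
    const fallback = memberships.find(
      (m) => m.userId === session.userId && !deletedOrgIds.has(m.organizationId),
    );
    return { ...session, activeOrganizationId: fallback ? fallback.organizationId : null };
  });
}
```

A user who belongs only to the deleted organization ends up with a null activeOrganizationId rather than a dangling reference.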

SSO Configuration and Startup#

SSO provider configuration is managed through a dedicated SSO module:

Module Architecture: SSO functionality is implemented as a fully separate module in app/server/modules/sso with its own controller, service, and DTO files. The module integrates with the authentication system through the ssoIntegration interface in app/server/lib/auth.ts rather than being embedded within the auth module
Route Registration: SSO routes are mounted under /api/v1/auth and handled by a dedicated ssoController registered separately in app/server/app.ts via .route("/api/v1/auth", ssoController)
Public Provider Endpoints: The /auth/sso-providers endpoint returns all configured SSO providers with their organization slugs, enabling the login page to display available SSO authentication options
Provider Settings: Admin users can manage SSO providers through the /auth/sso-settings endpoint, which returns provider configurations, pending invitations, and auto-linking settings
Service Layer: The ssoService handles all SSO-related database operations and business logic
Organization Isolation: Each SSO provider is scoped to a specific organization through the organizationId foreign key, ensuring proper multi-tenant isolation
Database Initialization: SSO provider tables are created during Phase 1 database schema migrations and are immediately available for authentication after bootstrap completes

SSO Invitation System#

The SSO invitation system provides access control for SSO-based authentication:

Invitation Requirement: The requireSsoInvitation middleware enforces that users must have a valid, non-expired invitation before they can complete SSO sign-in for an organization
Invitation Validation: Validates that the invitation matches the user's email, is in "pending" status, belongs to the SSO provider's organization, and has not expired
Credential Account Restriction: Valid invitations allow new SSO users to access the organization, but do NOT override the restriction for users with existing credential accounts. The system blocks SSO account linking for existing credential account holders even when they have valid invitations, enforcing a security-first approach to cross-organization access control
Automatic Acceptance: When a user successfully signs in with SSO (and passes all validation checks), their invitation status is automatically updated to "accepted" and an organization membership is created
Invitation Management: Admin users can delete pending invitations through the /auth/sso-invitations/:invitationId endpoint
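The four conditions listed under Invitation Validation can be sketched as a single predicate. The Invitation shape below is a simplified assumption, the function name isInvitationValid is hypothetical, and comparing email addresses case-insensitively is a choice made for this sketch rather than confirmed behavior.

```typescript
// Simplified invitation record for illustration.
type Invitation = {
  email: string;
  status: string; // "pending" and "accepted" are the values named in this document
  organizationId: string;
  expiresAt: Date;
};

function isInvitationValid(
  invitation: Invitation,
  userEmail: string,
  providerOrganizationId: string,
  now: Date = new Date(),
): boolean {
  return (
    // Case-insensitive email comparison is an assumption of this sketch.
    invitation.email.toLowerCase() === userEmail.toLowerCase() &&
    invitation.status === "pending" &&
    invitation.organizationId === providerOrganizationId &&
    invitation.expiresAt.getTime() > now.getTime()
  );
}
```

Passing `now` as a parameter keeps the expiry check deterministic in tests.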

SSO Provider Management#

Admin users have access to comprehensive SSO provider management capabilities:

Provider Deletion: The /auth/sso-providers/:providerId endpoint allows admin users to delete SSO providers, which also removes all associated accounts
Auto-Linking Configuration: The /auth/sso-providers/:providerId/auto-linking endpoint enables or disables automatic account linking by email for trusted SSO providers
Organization Scoping: All SSO provider management operations are scoped to the admin user's active organization to ensure proper multi-tenant isolation
Provider Limits: The system enforces per-user provider limits—admin users can register up to 10 SSO providers, while non-admin users cannot register providers

Admin User Management#

The admin user management interface provides visibility and control over user accounts:

User Listing: The /auth/admin-users endpoint returns a list of all users with their roles, ban status, and linked authentication accounts
Account Linking Information: For each user, the endpoint includes details about linked accounts (both traditional email/password and SSO accounts), identified by provider ID
Account Unlinking: Admin users can unlink specific authentication accounts through the /auth/admin-users/:userId/accounts/:accountId endpoint
Last Account Protection: The system prevents deletion of a user's last authentication account to ensure users retain access to their accounts

Organization Switching#

The application sidebar includes an organization switcher that enables users to navigate between organizations:

Multi-Organization Access: Users can be members of multiple organizations with different roles (owner, admin, member)
Visual Organization List: The OrganizationSwitcher component displays all organizations the user has access to with organization logos or initials
Active Organization Context: The switcher indicates the currently active organization and allows switching between organizations
Session Persistence: Organization selection is persisted in the user's session through the activeOrganizationId field

Eager Database Loading#

Provisioning System#

The provisioning system allows operators to manage repositories and volumes through a JSON configuration file instead of the UI. Resources are synced at startup and appear as "managed" entries in the normal UI.

Provisioning File Configuration: When the PROVISIONING_PATH environment variable is set, the application reads the JSON configuration file during Phase 3 startup and synchronizes the defined resources with the database.

Secret Resolution: The provisioning service resolves secret references using two protocols:

env://VAR_NAME: Resolves environment variables at sync time
file://SECRET_NAME: Reads secrets from /run/secrets/SECRET_NAME (Docker secrets pattern)
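A resolver for these two protocols could look like the sketch below. The prefixes and the /run/secrets location come from the description above; the function name resolveSecretRef, the injectable secretsDir and env parameters, the error behavior for missing variables, the rejection of path separators in secret names, and the trimming of trailing whitespace are all assumptions.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

function resolveSecretRef(
  value: string,
  secretsDir: string = "/run/secrets",
  env: Record<string, string | undefined> = process.env,
): string {
  if (value.startsWith("env://")) {
    const name = value.slice("env://".length);
    const resolved = env[name];
    // Failing loudly on a missing variable is an assumption of this sketch.
    if (resolved === undefined) throw new Error(`Environment variable ${name} is not set`);
    return resolved;
  }
  if (value.startsWith("file://")) {
    const name = value.slice("file://".length);
    // Keep lookups inside the secrets directory.
    if (name.includes("/") || name.includes("..")) {
      throw new Error(`Invalid secret name: ${name}`);
    }
    // Secret files often end with a newline, so trailing whitespace is trimmed.
    return readFileSync(join(secretsDir, name), "utf8").trimEnd();
  }
  // Values without a known prefix are returned unchanged.
  return value;
}
```

For example, `resolveSecretRef("env://S3_SECRET_KEY")` would return the value of a hypothetical S3_SECRET_KEY variable at sync time.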

Database Integration: Provisioned resources are stored with a provisioningId field that identifies them as operator-managed. The schema includes unique constraints on (organization_id, provisioning_id) pairs to prevent duplicates.

Resource Management: Provisioned resources appear alongside manually created resources in the UI with "Managed" badges. They are marked read-only to prevent accidental modifications through the UI. Each provisioned resource must specify an existing organizationId for proper multi-tenant isolation.

Sync Behavior: The provisioning sync is atomic—partial failures don't leave inconsistent state. Resources can be marked for deletion by setting "delete": true in the configuration file.

Configuration Schema Updates#

During startup, Zerobyte ensures all entities have the latest configuration schema. This process:

Updates volumes, repositories, and notification destinations to match current schema definitions
Identifies entities using their shortId field (typed as ShortId branded type) when calling service methods
Uses withContext to set organization context for multi-tenancy support
Processes each entity independently—failures for individual entities are logged but don't halt startup
Ensures backward compatibility as configuration schemas evolve

The system calls volumeService.updateVolume(volume.shortId, volume) and repositoriesService.updateRepository(repo.shortId, {}), where volume.shortId and repo.shortId are ShortId branded types that provide compile-time type safety when passing identifiers to service methods.

Volume Auto-Remounting#

To restore the pre-shutdown state, Zerobyte automatically remounts volumes during startup:

Conditional Execution: Auto-remounting only happens when local agent mode is disabled (!config.flags.enableLocalAgent). When local agent mode is enabled, volume remounting is handled by the agent itself, not during controller startup
Target Selection: Remounts volumes with status "mounted", plus volumes with autoRemount: true and status "error". Only volumes whose agentId matches LOCAL_AGENT_ID are selected
Backend Support: Handles NFS, SMB, WebDAV, SFTP, and local directory backends
ShortId Identification: Volumes are identified and remounted using their shortId field (typed as ShortId branded type) by calling volumeService.mountVolume(volume.shortId)
Non-Blocking: Each volume mount is wrapped in a .catch() handler—mount failures don't prevent application startup or affect other volumes
One-Time Attempt: Each volume receives exactly one mount attempt during startup; continuous retry is handled by the scheduled VolumeAutoRemountJob
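The selection and isolation rules above can be sketched as follows. The Volume shape is simplified, the "unmounted" status value is an assumption, and mountVolume is injected as a stub in place of volumeService.mountVolume; the function names are hypothetical.

```typescript
// Simplified volume record; "unmounted" is an assumed status value.
type Volume = {
  shortId: string;
  status: "mounted" | "unmounted" | "error";
  autoRemount: boolean;
  agentId: string;
};

function selectVolumesToRemount(volumes: Volume[], localAgentId: string): Volume[] {
  return volumes.filter(
    (v) =>
      v.agentId === localAgentId &&
      (v.status === "mounted" || (v.autoRemount && v.status === "error")),
  );
}

async function remountVolumes(
  volumes: Volume[],
  mountVolume: (shortId: string) => Promise<void>,
  enableLocalAgent: boolean,
  localAgentId: string,
  logError: (message: string) => void = console.error,
): Promise<void> {
  // With local agent mode enabled, the agent owns remounting.
  if (enableLocalAgent) return;

  await Promise.all(
    selectVolumesToRemount(volumes, localAgentId).map((volume) =>
      // One attempt per volume; a failure is logged and does not reject the batch.
      mountVolume(volume.shortId).catch((err) =>
        logError(`Failed to remount ${volume.shortId}: ${String(err)}`),
      ),
    ),
  );
}
```

Because each promise carries its own `.catch()`, `Promise.all` never rejects here, so one unreachable backend cannot block startup or the other mounts.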

Backup State Recovery#

Zerobyte detects and handles interrupted backups during startup:

Identifies backups that were in "running" state when the application shut down
Marks these backups as "warning" status with explanatory messages
Prevents stuck backup states from blocking new backup execution
Provides visibility into interrupted operations through the UI
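As a rough illustration, the recovery step amounts to rewriting the status of any run still marked "running". The BackupRun shape, the function name, and the message text below are assumptions; only the "running" and "warning" status values come from the description above.

```typescript
// Simplified record; the real table has more fields.
type BackupRun = { id: string; status: string; message?: string };

function recoverInterruptedBackups(runs: BackupRun[]): BackupRun[] {
  return runs.map((run) =>
    run.status === "running"
      ? {
          ...run,
          status: "warning",
          // Hypothetical wording for the explanatory message.
          message: "Backup was interrupted by an application restart",
        }
      : run,
  );
}
```

Runs in any other state pass through unchanged, so the step is safe to repeat on every start.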

Enhanced Error Handling#

Zerobyte handles errors at each stage of the startup process so that failures are caught, logged, and then either halt startup or are isolated, depending on their severity.

Database Migration Error Handling#

Database schema migrations use a fail-fast approach:

Migrations execute synchronously without catching errors
Any failure bubbles up to the bootstrap error handler
Bootstrap failures terminate the application, as it cannot function without a properly initialized database

This design ensures database consistency—the application never starts with a partially migrated schema.

Application Migration Error Classification#

Application-level migrations use a checkpoint-based system with several error handling strategies:

Checkpoint System: Each migration is tracked in appMetadataTable to prevent re-execution on restart, ensuring idempotency.

Dependency Validation: Migrations can declare dependencies on other migrations. Missing dependencies cause immediate termination with a clear error message, preventing out-of-order execution.

Critical vs. Maintenance Migrations:

Critical Migrations: Halt the application with process.exit(1) on failure. These migrations are essential for data integrity.
Maintenance Migrations: Log errors but allow startup to continue. These migrations improve the system but aren't essential for operation.

Partial Success Tracking: Individual entity migration failures are collected and logged with detailed information, allowing administrators to identify which specific entities failed and why.

Fresh Install Detection: New installations skip migration execution and auto-checkpoint all migrations, as there's no existing data to transform.
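The difference between critical and maintenance migrations, together with per-entity failure tracking, can be sketched as below. All shapes and names are hypothetical, and the exit function is injected so the sketch can be exercised without terminating the process; in production it would be process.exit. Whether a failed maintenance migration is checkpointed is not stated in this document, so this sketch simply leaves it out of the completed list.

```typescript
type MigrationKind = "critical" | "maintenance";
type EntityFailure = { entity: string; reason: string };

// Hypothetical shape: run() returns the entities it could not migrate.
type ClassifiedMigration = {
  id: string;
  kind: MigrationKind;
  run: () => EntityFailure[];
};

function runWithClassification(
  migrations: ClassifiedMigration[],
  exit: (code: number) => void,
  logError: (message: string) => void = console.error,
): string[] {
  const completed: string[] = [];
  for (const migration of migrations) {
    let failures: EntityFailure[];
    try {
      failures = migration.run();
    } catch (err) {
      // A thrown error is treated as a failure of the whole migration.
      failures = [{ entity: "(migration)", reason: String(err) }];
    }

    if (failures.length === 0) {
      completed.push(migration.id);
      continue;
    }

    // Report every failed entity so administrators can see what and why.
    for (const failure of failures) {
      logError(`${migration.id}: ${failure.entity} failed: ${failure.reason}`);
    }

    if (migration.kind === "critical") {
      exit(1); // startup must not continue past a failed critical migration
      return completed;
    }
    // Maintenance migration: errors are logged and the loop continues.
  }
  return completed;
}
```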

Volume Mount Error Handling#

Volume mount operations implement layered error handling to maximize availability:

Error Isolation: Each volume mount is wrapped in a .catch() handler that logs the error but doesn't halt startup or affect other volumes.

Non-Blocking Failures: Volume mount failures don't prevent application startup or other volumes from mounting. The application remains available even when some storage backends are unreachable.

Status Persistence: Volume mount status is persisted to the database with error messages, providing visibility through the UI and enabling targeted retry attempts.

Timeout Protection#

Operations are protected by timeouts to prevent indefinite hangs:

Volume Backend Operations:

A timeout protection wrapper applies a 5-second default timeout
Mount, unmount, and health check operations wrapped with withTimeout()
Timeout errors are caught and converted to status "error" with appropriate error messages
SFTP uses a 10-second timeout (2x standard) for higher latency connections
Rclone mount operations use the centralized zbConfig.serverIdleTimeout setting (set via the SERVER_IDLE_TIMEOUT environment variable, default 300 seconds), so their timeout matches repository operations and can be changed through configuration
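
A timeout wrapper of this kind is commonly built on `Promise.race`. The sketch below is a generic version under that assumption; Zerobyte's own withTimeout() may have a different signature, and `mountWithStatus` is a hypothetical helper showing how a timeout becomes an "error" status.

```typescript
// Generic sketch; the signature of Zerobyte's withTimeout() is an assumption.
function withTimeout<T>(operation: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([operation, timeout]).finally(() => clearTimeout(timer));
}

const DEFAULT_TIMEOUT_MS = 5_000; // 5-second default from the text
const SFTP_TIMEOUT_MS = DEFAULT_TIMEOUT_MS * 2; // SFTP uses twice the default

// Hypothetical helper: converts a timeout (or any failure) into an "error" status.
async function mountWithStatus(mount: () => Promise<void>, ms: number = DEFAULT_TIMEOUT_MS) {
  try {
    await withTimeout(mount(), ms, "mount");
    return { status: "mounted" as const };
  } catch (err) {
    return {
      status: "error" as const,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}
```

Note that `Promise.race` only stops waiting; it does not cancel the underlying operation, so a real backend would also need to terminate the hung command.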

Repository Operations:

Repository initialization timeouts are derived from the SERVER_IDLE_TIMEOUT environment variable (a value in seconds, multiplied by 1000 to get milliseconds) rather than from fixed values
The internal spawn utility uses safeExec instead of exec, which provides more robust timeout and error detection
Timeouts are detected explicitly through a timedOut flag, which allows a clearer error message ("Command timed out before completing")
Tying repository initialization timeouts to the server setting keeps timeout behavior predictable and consistent across the application
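
The seconds-to-milliseconds conversion and the timedOut check could look roughly like this. Only the 300-second default, the multiplication by 1000, and the quoted message come from the text; the helper names, the `ExecResult` shape, and the fallback to the default on invalid input are assumptions.

```typescript
const DEFAULT_SERVER_IDLE_TIMEOUT_SECONDS = 300;

// Hypothetical helper: turns the raw SERVER_IDLE_TIMEOUT value into milliseconds.
// Falling back to the default on invalid input is an assumption of this sketch.
function resolveTimeoutMs(raw: string | undefined): number {
  const seconds = Number(raw);
  const valid = Number.isFinite(seconds) && seconds > 0;
  return (valid ? seconds : DEFAULT_SERVER_IDLE_TIMEOUT_SECONDS) * 1000;
}

// Hypothetical result shape for a spawned command.
type ExecResult = { exitCode: number | null; stderr: string; timedOut: boolean };

function describeFailure(result: ExecResult): string {
  // Check the explicit flag first so a timeout is never reported as a generic failure.
  if (result.timedOut) return "Command timed out before completing";
  return result.stderr.trim() || `Command failed with exit code ${result.exitCode}`;
}
```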

Backend Fallback Strategies#

Some volume backends implement intelligent fallback mechanisms:

NFS: Falls back to mount with the -i flag (internal-only, which skips the external mount helper) if the initial mount fails, allowing recovery from some mount failures
WebDAV: Implements the same -i flag fallback
WebDAV Error Classification: Provides detailed error detection for connection refused, authentication failures, and configuration errors
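
The retry-with-flag pattern can be sketched as below. The `Run` callback and `mountWithFallback` are hypothetical names; the sketch only shows the control flow and does not reproduce the actual mount arguments used by the backends.

```typescript
// Hypothetical runner: executes the mount command with the given arguments
// and rejects when the command fails.
type Run = (args: string[]) => Promise<void>;

async function mountWithFallback(
  run: Run,
  baseArgs: string[],
): Promise<"primary" | "fallback"> {
  try {
    await run(baseArgs);
    return "primary";
  } catch {
    // Second attempt adds -i, as described for the NFS and WebDAV backends.
    // If this also fails, the error propagates to the caller.
    await run(["-i", ...baseArgs]);
    return "fallback";
  }
}
```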

Pre-Mount Health Checks#

Volume backends perform health checks before mounting:

If already mounted, returns immediately with success (avoiding duplicate operations)
If in error state, attempts unmount before remounting (clearing stale state)
Prevents duplicate mount attempts and resource conflicts
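
As a rough sketch, the pre-mount check amounts to a small state machine. The `Backend` interface and `ensureMounted` function are hypothetical, and treating a failed cleanup unmount as non-fatal is an assumption of this example.

```typescript
type Status = "mounted" | "unmounted" | "error";

// Hypothetical backend interface for illustration.
type Backend = {
  checkHealth(): Promise<Status>;
  mount(): Promise<void>;
  unmount(): Promise<void>;
};

async function ensureMounted(backend: Backend): Promise<"already-mounted" | "mounted"> {
  const status = await backend.checkHealth();

  // Already healthy: return without touching the mount.
  if (status === "mounted") return "already-mounted";

  if (status === "error") {
    // Clear stale state first. Assumption: a failed cleanup does not block the retry.
    await backend.unmount().catch(() => {});
  }

  await backend.mount();
  return "mounted";
}
```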

Structured Logging#

Error logging throughout the startup process provides detailed diagnostic information:

Migration failures include detailed entity information (entity type, ID, error message)
Volume mount errors include volume name and error message
Critical failures receive formatted multi-line messages with visual separators, clear explanations, resolution guidance, and support channels

Retry Logic and Reliability Improvements#

Startup Volume Remount#

During startup, volumes are auto-remounted with the following characteristics:

Target Volumes: Remounts volumes with status "mounted" (to restore pre-shutdown state) OR volumes with autoRemount: true and status "error" (to attempt recovery)
One-Time Attempt: Each volume receives exactly one mount attempt during startup
Error Isolation: Each volume mount is wrapped in .catch() to prevent cascading failures
Non-Blocking: Mount failures don't prevent application startup

Startup remounting provides immediate recovery for transient failures, while persistent failures are handled by scheduled retry jobs.
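
The startup target selection reduces to a single predicate. The `Volume` shape, the `"unmounted"` status value, and the function name below are illustrative assumptions.

```typescript
// Hypothetical volume shape; only the fields needed for selection are shown.
type Volume = {
  name: string;
  status: "mounted" | "unmounted" | "error";
  autoRemount: boolean;
};

function selectStartupRemountTargets(volumes: Volume[]): Volume[] {
  return volumes.filter(
    (v) =>
      v.status === "mounted" || // restore pre-shutdown state
      (v.status === "error" && v.autoRemount), // attempt recovery
  );
}
```

A volume in the "error" state with autoRemount disabled is skipped, as is any volume that was not mounted before shutdown.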

VolumeAutoRemountJob#

The VolumeAutoRemountJob provides continuous retry capability:

Schedule: Executes every 5 minutes after startup
Target Selection: Only processes volumes with status "error" and autoRemount: true
ShortId Usage: Calls volumeService.mountVolume(volume.shortId) to remount volumes, where volume.shortId has the branded type ShortId
Per-Volume Error Handling: Each volume mount is wrapped in try-catch to prevent one failure from affecting others
Job-Level Error Handling: The scheduler wraps all job execution in error handlers, logging failures without stopping the scheduler
Fixed Interval Strategy: Uses a 5-minute fixed interval rather than exponential backoff, prioritizing consistent recovery attempts over reducing load
No Retry Limits: Auto-remount jobs continue indefinitely—volumes remain eligible for remounting as long as they're in error state with autoRemount enabled

This design ensures volumes automatically recover when storage backends become available again, without requiring manual intervention.
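
One pass of such a job might look like the sketch below. The branded `ShortId` definition, the `VolumeService` interface, and `runAutoRemount` are assumptions for illustration; only the call volumeService.mountVolume(volume.shortId) and the selection rule come from the text.

```typescript
// Branded string type; the brand property name is hypothetical.
type ShortId = string & { readonly __brand: "ShortId" };

type Volume = {
  shortId: ShortId;
  status: "mounted" | "unmounted" | "error";
  autoRemount: boolean;
};

type VolumeService = { mountVolume(id: ShortId): Promise<void> };

// One job run: returns the ids that still failed so they can be logged.
async function runAutoRemount(
  volumes: Volume[],
  volumeService: VolumeService,
): Promise<ShortId[]> {
  const failed: ShortId[] = [];
  const targets = volumes.filter((v) => v.status === "error" && v.autoRemount);

  for (const volume of targets) {
    try {
      await volumeService.mountVolume(volume.shortId);
    } catch (err) {
      // One failing volume must not stop the rest; it stays eligible next run.
      failed.push(volume.shortId);
      console.error(`Auto-remount failed for ${volume.shortId}:`, err);
    }
  }
  return failed;
}
```

Because there is no retry counter, a volume that keeps failing is simply picked up again on the next 5-minute run.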

Scheduler Initialization#

The cron-based scheduler is initialized during startup:

Starts the scheduler engine
Clears any existing scheduled tasks to prevent duplicates (important for hot reload scenarios)
Provides the foundation for all periodic background jobs

Background Job Scheduling#

Up to five periodic jobs are scheduled to maintain system health and execute backups (CleanupDanglingMountsJob is scheduled conditionally):

Job	Schedule	Purpose
CleanupDanglingMountsJob	Hourly (conditional)	Cleans up stale volume mounts. Scheduled only when local agent mode is disabled (`!config.flags.enableLocalAgent`); when it is enabled, volume cleanup is agent-owned and the controller does not schedule this job. Cleanup is limited to volumes whose `agentId` matches `LOCAL_AGENT_ID`.
VolumeHealthCheckJob	Every 30 minutes	Verify volume accessibility and update status
RepositoryHealthCheckJob	Daily at 12:50 PM	Check repository integrity and availability
BackupExecutionJob	Every minute	Check for scheduled backups to run and execute them
VolumeAutoRemountJob	Every 5 minutes	Attempt to remount failed volumes with auto-remount enabled

These jobs run continuously, providing system maintenance, health monitoring, and backup execution without manual intervention.
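
For reference, the schedules in the table map onto standard five-field cron expressions as sketched below. The expressions and the `buildJobSchedule` helper are assumptions derived from the table, not the application's actual registration code.

```typescript
// Hypothetical job descriptor; the real scheduler API may differ.
type JobSpec = { name: string; cron: string };

function buildJobSchedule(enableLocalAgent: boolean): JobSpec[] {
  const jobs: JobSpec[] = [
    { name: "VolumeHealthCheckJob", cron: "*/30 * * * *" }, // every 30 minutes
    { name: "RepositoryHealthCheckJob", cron: "50 12 * * *" }, // daily at 12:50
    { name: "BackupExecutionJob", cron: "* * * * *" }, // every minute
    { name: "VolumeAutoRemountJob", cron: "*/5 * * * *" }, // every 5 minutes
  ];

  if (!enableLocalAgent) {
    // Cleanup is agent-owned when the local agent is enabled,
    // so the controller only schedules it when the agent is off.
    jobs.unshift({ name: "CleanupDanglingMountsJob", cron: "0 * * * *" }); // hourly
  }
  return jobs;
}
```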

Graceful Degradation Philosophy#

Zerobyte is designed to maximize availability through graceful degradation:

Application starts even when:

Some volumes fail to mount
Non-critical migrations error
Configuration schema updates fail for individual entities

Individual failures don't cascade:

Volume mount failures are isolated per-volume
Background jobs continue operating despite individual job failures
Each entity (volume, repository, notification) is updated independently

This architecture ensures that partial failures don't result in total system unavailability. The application remains functional and continues attempting recovery automatically.

Graceful Shutdown#

Shutdown handling ensures clean termination:

Stops the scheduler, canceling all cron jobs
Conditionally unmounts volumes (only when local agent mode is DISABLED):
- When local agent mode is DISABLED (!config.flags.enableLocalAgent), the system unmounts all mounted volumes (NFS, SMB, WebDAV, SFTP) with agentId matching LOCAL_AGENT_ID
- When local agent mode is ENABLED, the controller leaves agent-owned volumes mounted during shutdown
- Logs unmount status for each volume
Stops the application runtime by calling stopApplicationRuntime(), which stops the agent runtime

Graceful shutdown prevents resource leaks and ensures the local agent is stopped cleanly. Volume unmounting is conditional: when local agent mode is active, agent-owned volumes stay mounted across controller shutdown.
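
The shutdown order can be sketched as follows. The `ShutdownDeps` interface and the `shutdown` function are hypothetical; only the ordering, the local-agent condition, and the stopApplicationRuntime() call come from the description above.

```typescript
type Volume = { name: string; status: string; agentId: string };

// Hypothetical dependencies, injected so the sequence is easy to follow.
type ShutdownDeps = {
  enableLocalAgent: boolean;
  localAgentId: string;
  stopScheduler(): Promise<void>;
  listVolumes(): Promise<Volume[]>;
  unmount(volume: Volume): Promise<void>;
  stopApplicationRuntime(): Promise<void>;
};

async function shutdown(deps: ShutdownDeps): Promise<string[]> {
  const steps: string[] = [];

  await deps.stopScheduler(); // cancels all cron jobs
  steps.push("scheduler-stopped");

  if (!deps.enableLocalAgent) {
    const volumes = (await deps.listVolumes()).filter(
      (v) => v.status === "mounted" && v.agentId === deps.localAgentId,
    );
    for (const volume of volumes) {
      try {
        await deps.unmount(volume);
        steps.push(`unmounted:${volume.name}`);
      } catch {
        // A failed unmount is recorded but does not block the remaining steps.
        steps.push(`unmount-failed:${volume.name}`);
      }
      console.log(steps[steps.length - 1]);
    }
  }

  await deps.stopApplicationRuntime(); // stops the agent runtime
  steps.push("runtime-stopped");
  return steps;
}
```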

Configuration and Constants#

Environment Variables#

Configuration is loaded from environment variables and validated with the Zod library:

Variable	Required	Default	Purpose
`APP_SECRET`	Yes	—	Master encryption key for database secrets and authentication
`BASE_URL`	Yes	—	Base URL for the application; determines cookie security mode (HTTPS → secure cookies)
`ZEROBYTE_DATABASE_URL`	No	`/var/lib/zerobyte/data/zerobyte.db`	SQLite database file path
`RESTIC_CACHE_DIR`	No	`/var/lib/zerobyte/restic/cache`	Restic cache directory
`PORT`	No	`4096`	HTTP server port
`PROVISIONING_PATH`	No	—	Path to JSON file containing repository and volume provisioning configuration
`TRUST_PROXY`	No	`false`	When set to `"true"`, trusts existing `X-Forwarded-For` headers from reverse proxies for client IP address determination. When `"false"` (default), uses the direct connection IP and ignores or sets its own `X-Forwarded-For` header. Important for deployments behind reverse proxies where accurate client IP tracking is needed
`ENABLE_LOCAL_AGENT`	No	`false`	When set to `"true"`, starts the local agent runtime during application bootstrap
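
To make the defaults concrete, the sketch below parses the variables from the table with plain TypeScript. It is a dependency-free stand-in for the Zod schema, not the schema itself: the `AppConfig` shape, field names, and `parseConfig` function are hypothetical, and deriving `secureCookies` from an `https://` prefix is a simplified reading of the BASE_URL row.

```typescript
// Hypothetical config shape; field names are illustrative.
type AppConfig = {
  appSecret: string;
  baseUrl: string;
  secureCookies: boolean;
  databaseUrl: string;
  resticCacheDir: string;
  port: number;
  provisioningPath?: string;
  trustProxy: boolean;
  enableLocalAgent: boolean;
};

function parseConfig(env: Record<string, string | undefined>): AppConfig {
  const required = (key: string): string => {
    const value = env[key];
    if (!value) throw new Error(`Missing required environment variable: ${key}`);
    return value;
  };

  const port = Number(env["PORT"] ?? "4096");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error("PORT must be a positive integer");
  }

  const baseUrl = required("BASE_URL");
  return {
    appSecret: required("APP_SECRET"),
    baseUrl,
    secureCookies: baseUrl.startsWith("https://"), // HTTPS implies secure cookies
    databaseUrl: env["ZEROBYTE_DATABASE_URL"] ?? "/var/lib/zerobyte/data/zerobyte.db",
    resticCacheDir: env["RESTIC_CACHE_DIR"] ?? "/var/lib/zerobyte/restic/cache",
    port,
    provisioningPath: env["PROVISIONING_PATH"],
    trustProxy: env["TRUST_PROXY"] === "true",
    enableLocalAgent: env["ENABLE_LOCAL_AGENT"] === "true",
  };
}
```

Failing at parse time when a required variable is missing keeps misconfiguration from surfacing later as a harder-to-diagnose runtime error.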

System Constants#

System constants are defined for:

Volume Mount Paths: Locations where remote volumes are mounted in the filesystem
Repository Storage Locations: Paths for storing backup repositories
Restic Configuration Paths: Locations for Restic configuration files
Timeout Values: Default timeout of 5 seconds for most operations, 10 seconds for SFTP

These constants ensure consistent behavior across the application and provide a single source of truth for system-wide configuration.

Summary#

Zerobyte's startup orchestration system provides a robust foundation for reliable backup automation through:

Structured Initialization: Three-phase bootstrap sequence ensures proper ordering of database, migration, and runtime initialization
Automatic Schema Management: Database schema and data migrations are applied automatically with checkpoint tracking
Secure Authentication: Authentication system with derived secrets, multi-factor support, SSO integration (OIDC/SAML), and automatic organization provisioning
Eager Loading: Configuration updates, volume remounting, and backup state recovery restore system state immediately on startup
Comprehensive Error Handling: Layered error handling with fail-fast for critical failures and graceful degradation for non-critical operations
Continuous Retry: Automatic retry mechanisms for volume mounting with fixed intervals and no retry limits
Graceful Degradation: Application remains available even when individual components fail, with isolated error handling preventing cascading failures

This design ensures Zerobyte can start reliably, recover from failures automatically, and maintain service availability even in degraded conditions.