Workflow Recovery#
Workflow recovery is the mechanism by which DBOS re-executes workflows that were left in a PENDING state, typically because a process crashed or restarted before the workflow completed. Recovery re-invokes the workflow with its original inputs; steps whose results are already recorded in the system database are not executed again, so the workflow effectively resumes from its last completed step.
Startup Recovery#
During DBOS._launch(), DBOS queries the system database for all PENDING workflows belonging to the current executor ID and application version, then submits a startup_recovery_thread to the background thread pool. This thread retries recovery for each pending workflow until it succeeds or encounters a non-retryable error.
This path only runs for local deployments, that is, when neither a Conductor key nor the DBOS Cloud flag is set.
The public API recover_pending_workflows(executor_ids) provides the same capability on demand and returns WorkflowHandle objects for each recovered workflow.
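The startup selection described above can be pictured with a small, self-contained sketch. It does not use the DBOS library: the row layout, the function name select_startup_recovery, and the parameters conductor_key and is_cloud are all hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Dict, List, Optional


def select_startup_recovery(
    rows: List[Dict[str, Any]],
    executor_id: str,
    app_version: str,
    conductor_key: Optional[str] = None,
    is_cloud: bool = False,
) -> List[str]:
    """Return the IDs of workflows the startup path would try to recover."""
    # Startup recovery is skipped unless the deployment is purely local.
    if conductor_key is not None or is_cloud:
        return []
    # Only PENDING workflows owned by this executor and application version qualify.
    return [
        row["workflow_id"]
        for row in rows
        if row["status"] == "PENDING"
        and row["executor_id"] == executor_id
        and row["application_version"] == app_version
    ]
```

Filtering on both executor ID and application version keeps a process from re-running workflows that belong to another executor or were started by a different build of the code.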
Core Recovery Execution: execute_workflow_by_id#
All recovery paths converge on execute_workflow_by_id() in _core.py. Its steps:
- Fetch status from DB: retrieves the stored workflow_status row, including name, inputs, class_name, config_name, and queue_name.
- Deserialize inputs: deserializes the original arguments; if deserialization fails, immediately marks the workflow ERROR rather than leaving it stuck in PENDING.
- Look up the workflow function: resolves status["name"] against dbos._registry.workflow_info_map; raises DBOSWorkflowFunctionNotFoundError if not found.
- Inject class context: prepends the correct class instance or class object to args (see below).
- Dispatch: calls start_workflow (or its async variant) with is_recovery=True.
For queued workflows, _recover_workflow() first clears the queue assignment and returns a handle without re-dispatching.
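The steps of execute_workflow_by_id can be modeled with a toy version that runs without DBOS. This is a sketch under stated assumptions, not the library's code: inputs are serialized as JSON here (the real serialization format is not described in this page), the status table and registry are plain dictionaries, the dispatcher runs the function inline instead of returning a handle, and class context injection is left out for brevity.

```python
import json
from typing import Any, Callable, Dict


class WorkflowFunctionNotFound(Exception):
    """Stand-in for DBOSWorkflowFunctionNotFoundError."""


# Hypothetical in-memory stand-ins for the system database and the registry.
status_table: Dict[str, Dict[str, Any]] = {}
workflow_info_map: Dict[str, Callable[..., Any]] = {}


def start_workflow(func: Callable[..., Any], *args: Any, is_recovery: bool = False, **kwargs: Any) -> Any:
    # Toy dispatcher: runs the function inline and returns its result.
    return func(*args, **kwargs)


def execute_workflow_by_id(workflow_id: str) -> Any:
    status = status_table[workflow_id]  # fetch status from the "DB"
    try:
        inputs = json.loads(status["inputs"])  # deserialize inputs (JSON is an assumption)
    except ValueError:
        status["status"] = "ERROR"  # do not leave the workflow stuck in PENDING
        raise
    func = workflow_info_map.get(status["name"])  # look up the workflow function
    if func is None:
        raise WorkflowFunctionNotFound(status["name"])
    # Dispatch, flagging the run as a recovery.
    return start_workflow(func, *inputs["args"], is_recovery=True, **inputs["kwargs"])
```

Marking the workflow ERROR on a deserialization failure matters because a row that can never be deserialized would otherwise be retried on every restart.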
Instance Lookup via Composite Key (class_name/config_name)#
Workflows belonging to a DBOSConfiguredInstance store both class_name and config_name in the workflow_status table. During recovery, execute_workflow_by_id constructs the composite key "{class_name}/{config_name}" and looks it up in dbos._registry.instance_info_map. If the key is absent, DBOSWorkflowFunctionNotFoundError is raised.
For non-configured class methods (static/class methods without a config_name), the class object is resolved from dbos._registry.class_info_map by class_name alone, and is only prepended to args when the function type is not DBOSFuncType.Static.
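A minimal sketch of this lookup, independent of DBOS, might look as follows. FuncType is a stand-in for DBOSFuncType, FunctionNotFound stands in for DBOSWorkflowFunctionNotFoundError, and resolve_context is a hypothetical helper; how a missing class_info_map entry is handled is not described here, so the sketch simply indexes the dictionary.

```python
from enum import Enum
from typing import Any, Dict, List, Optional


class FuncType(Enum):
    """Stand-in for DBOSFuncType; only Static is treated specially below."""
    Static = "static"
    Class = "class"
    Instance = "instance"


class FunctionNotFound(Exception):
    """Stand-in for DBOSWorkflowFunctionNotFoundError."""


def resolve_context(
    class_name: Optional[str],
    config_name: Optional[str],
    func_type: FuncType,
    instance_info_map: Dict[str, Any],
    class_info_map: Dict[str, type],
) -> List[Any]:
    """Return the values that would be prepended to the workflow's args."""
    if config_name is not None:
        # Configured instance: look up by the composite key.
        key = f"{class_name}/{config_name}"
        if key not in instance_info_map:
            raise FunctionNotFound(key)
        return [instance_info_map[key]]
    if class_name is not None:
        # Non-configured class: resolve by class name alone.
        cls = class_info_map[class_name]
        # Static methods take no implicit first argument.
        return [] if func_type is FuncType.Static else [cls]
    return []
```

Because the key combines the class name with the configuration name, two instances of the same class can coexist as long as their config_name values differ.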
Automatic Instance Registration#
Instances are registered automatically when a DBOSConfiguredInstance subclass is instantiated. The chain is:
- DBOSConfiguredInstance.__init__ calls DBOS.register_instance(self).
- DBOSRegistry.register_instance computes the composite key and stores the instance in instance_info_map.
A warning is logged if registration happens after DBOS.launch(), since recovery may already be in progress. Registering two different objects under the same key raises an exception.
All DBOSConfiguredInstance objects must be created before DBOS.launch() is called.
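The registration chain can be illustrated with a small stand-alone model. The Registry and ConfiguredInstance classes below are hypothetical stand-ins for DBOSRegistry and DBOSConfiguredInstance; deriving the class part of the key from type(inst).__name__ and raising ValueError on a duplicate are assumptions made for the sketch.

```python
import logging
from typing import Dict

logger = logging.getLogger("recovery_sketch")


class Registry:
    """Toy registry holding configured instances by composite key."""

    def __init__(self) -> None:
        self.instance_info_map: Dict[str, object] = {}
        self.launched = False  # set to True once launch has happened

    def register_instance(self, inst: "ConfiguredInstance") -> None:
        key = f"{type(inst).__name__}/{inst.config_name}"
        if self.launched:
            # Late registration is allowed but flagged, since recovery may already be running.
            logger.warning("Instance %s registered after launch", key)
        existing = self.instance_info_map.get(key)
        if existing is not None and existing is not inst:
            # Two different objects under one key would make recovery ambiguous.
            raise ValueError(f"Duplicate instance registration for {key}")
        self.instance_info_map[key] = inst


registry = Registry()


class ConfiguredInstance:
    """Registers itself on construction, mirroring the chain described above."""

    def __init__(self, config_name: str) -> None:
        self.config_name = config_name
        registry.register_instance(self)
```

In this model, constructing every instance before setting launched to True avoids the warning path entirely, which is the reason for the ordering requirement.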
Retry Logic for Unresolved Functions#
startup_recovery_thread uses a retry loop with a key distinction:
| Exception | Behavior |
|---|---|
| DBOSWorkflowFunctionNotFoundError | Log a warning with the workflow ID and exception message, sleep 1 s and retry; the function may be registered soon by another thread finishing initialization |
| Any other exception | Log the error and remove the workflow from the retry list (no further attempts) |
This retry behavior accommodates late registration — for example, when a class instance is constructed on a background thread shortly after launch().
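The retry policy in the table can be sketched as a plain loop. This is an illustration of the described behavior rather than the actual startup_recovery_thread: FunctionNotFound stands in for DBOSWorkflowFunctionNotFoundError, and the recover callback, the retry_delay parameter, and the log parameter are hypothetical.

```python
import time
from typing import Callable, List


class FunctionNotFound(Exception):
    """Stand-in for DBOSWorkflowFunctionNotFoundError."""


def startup_recovery(
    pending_ids: List[str],
    recover: Callable[[str], None],
    retry_delay: float = 1.0,
    log: Callable[[str], None] = print,
) -> None:
    """Recover each pending workflow, retrying only unresolved functions."""
    remaining = list(pending_ids)
    while remaining:
        for wf_id in list(remaining):
            try:
                recover(wf_id)
                remaining.remove(wf_id)  # recovered; no further attempts needed
            except FunctionNotFound as exc:
                # The function may be registered shortly; keep the workflow and retry later.
                log(f"warning: workflow {wf_id} not yet recoverable: {exc}")
                time.sleep(retry_delay)
            except Exception as exc:
                # Any other failure is treated as non-retryable.
                log(f"error: giving up on workflow {wf_id}: {exc}")
                remaining.remove(wf_id)
```

Passing retry_delay=0 and a list's append method as log makes the loop easy to exercise in a test without waiting or printing.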
Once a workflow exceeds its max_recovery_attempts (default: 100, configurable via @DBOS.workflow(max_recovery_attempts=N)), its status is set to MAX_RECOVERY_ATTEMPTS_EXCEEDED and it will no longer be picked up for recovery.
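The attempt limit can be pictured as a counter checked on each recovery. The sketch below is only a model: the recovery_attempts field, the helper name note_recovery_attempt, and the exact comparison used to decide when the limit is exceeded are assumptions, not details taken from the DBOS source.

```python
from typing import Any, Dict

DEFAULT_MAX_RECOVERY_ATTEMPTS = 100  # default stated in the text


def note_recovery_attempt(
    status: Dict[str, Any],
    max_recovery_attempts: int = DEFAULT_MAX_RECOVERY_ATTEMPTS,
) -> bool:
    """Count one recovery attempt; return True if the workflow may still run."""
    status["recovery_attempts"] = status.get("recovery_attempts", 0) + 1
    if status["recovery_attempts"] > max_recovery_attempts:
        # Past the limit: take the workflow out of the PENDING pool for good.
        status["status"] = "MAX_RECOVERY_ATTEMPTS_EXCEEDED"
        return False
    return True
```

A cap of this kind keeps a workflow that crashes its process on every run from being recovered indefinitely.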
Key Files#
| File | Purpose |
|---|---|
| dbos/_recovery.py | startup_recovery_thread, recover_pending_workflows, _recover_workflow |
| dbos/_core.py | execute_workflow_by_id (core recovery dispatch) |
| dbos/_dbos.py | DBOSRegistry.register_instance, DBOSConfiguredInstance.__init__ |
| dbos/_error.py | DBOSRecoveryError, DBOSWorkflowFunctionNotFoundError, MaxRecoveryAttemptsExceededError |