Type: Topic
Status: Published
Created: Apr 20, 2026 by Dosu Bot
Updated: Apr 20, 2026 by Dosu Bot

Workflow Recovery

Workflow recovery is the mechanism by which DBOS re-executes workflows that were left in a PENDING state, typically because a process crashed or restarted before the workflow completed. Recovery resumes the workflow from its last committed step: the results of steps that already completed are recorded in the system database and are reused rather than recomputed, which makes the replay idempotent.
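The replay idea can be illustrated with a toy model. This is a minimal sketch, not DBOS code: the step_outputs dictionary stands in for the system database, and WorkflowRun, run_step, and the example steps are hypothetical names invented for the illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for the system database: maps
# (workflow_id, step_index) to the recorded output of that step.
step_outputs: Dict[Tuple[str, int], Any] = {}
calls: List[str] = []  # names of step bodies that actually executed


class WorkflowRun:
    """Toy execution context that checkpoints each step's output."""

    def __init__(self, workflow_id: str) -> None:
        self.workflow_id = workflow_id
        self.next_step = 0

    def run_step(self, fn: Callable[..., Any], *args: Any) -> Any:
        key = (self.workflow_id, self.next_step)
        self.next_step += 1
        if key in step_outputs:
            # Completed in an earlier attempt: reuse the recorded output.
            return step_outputs[key]
        result = fn(*args)
        calls.append(fn.__name__)
        step_outputs[key] = result  # checkpoint before moving on
        return result


def reserve(item: str) -> str:
    return f"reserved:{item}"


def charge(reservation: str) -> str:
    return f"charged:{reservation}"


def order_workflow(run: WorkflowRun, item: str, crash: bool = False) -> str:
    reservation = run.run_step(reserve, item)
    if crash:
        raise RuntimeError("simulated crash after the first step")
    return run.run_step(charge, reservation)


# First attempt crashes after step 0 has been checkpointed.
try:
    order_workflow(WorkflowRun("wf-1"), "book", crash=True)
except RuntimeError:
    pass

# Recovery re-invokes the workflow with the same ID and inputs.
result = order_workflow(WorkflowRun("wf-1"), "book")
```

In this model the recovered run does not execute reserve a second time; it picks up the checkpointed value and continues with charge.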


Startup Recovery

During DBOS._launch(), DBOS queries the system database for all PENDING workflows belonging to the current executor ID and application version, then submits a startup_recovery_thread to the background thread pool. This thread retries recovery for each pending workflow until it succeeds or encounters a non-retryable error.

This path only runs for local deployments, that is, when neither a Conductor key nor the DBOS Cloud flag is set.

The public API recover_pending_workflows(executor_ids) provides the same capability on demand and returns WorkflowHandle objects for each recovered workflow.
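The selection made at startup can be sketched as a simple filter. The WorkflowRow dataclass and select_pending function below are hypothetical and only model the three criteria named above (PENDING status, executor ID, application version); the real query runs against the system database.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkflowRow:
    # Hypothetical subset of the columns in a workflow_status row.
    workflow_id: str
    status: str
    executor_id: str
    app_version: str


def select_pending(
    rows: List[WorkflowRow], executor_id: str, app_version: str
) -> List[WorkflowRow]:
    """Return the rows that startup recovery would pick up in this model."""
    return [
        r
        for r in rows
        if r.status == "PENDING"
        and r.executor_id == executor_id
        and r.app_version == app_version
    ]


rows = [
    WorkflowRow("wf-a", "PENDING", "local", "v2"),
    WorkflowRow("wf-b", "SUCCESS", "local", "v2"),   # already finished
    WorkflowRow("wf-c", "PENDING", "other", "v2"),   # different executor
    WorkflowRow("wf-d", "PENDING", "local", "v1"),   # older app version
]
to_recover = select_pending(rows, executor_id="local", app_version="v2")
```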


Core Recovery Execution: execute_workflow_by_id

All recovery paths converge on execute_workflow_by_id() in _core.py. Its steps:

  1. Fetch status from DB: retrieves the stored workflow_status row, including name, inputs, class_name, config_name, and queue_name.
  2. Deserialize inputs: deserializes the original arguments; if deserialization fails, immediately marks the workflow ERROR rather than leaving it stuck in PENDING.
  3. Look up the workflow function: resolves status["name"] against dbos._registry.workflow_info_map; raises DBOSWorkflowFunctionNotFoundError if not found.
  4. Inject class context: prepends the correct class instance or class object to args (see below).
  5. Dispatch: calls start_workflow (or its async variant) with is_recovery=True.
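The steps above can be sketched with plain dictionaries. This is a simplified model under stated assumptions, not the code in _core.py: status_table and workflow_registry stand in for the system database and workflow_info_map, JSON is assumed as the serialization format, the function is called directly instead of going through start_workflow, and re-raising after marking ERROR is a choice made for this sketch. Step 4 is omitted here.

```python
import json
from typing import Any, Callable, Dict


class WorkflowFunctionNotFound(Exception):
    """Stand-in for DBOSWorkflowFunctionNotFoundError."""


def greet(name: str) -> str:
    return f"hello {name}"


# Hypothetical stand-ins for workflow_info_map and the workflow_status table.
workflow_registry: Dict[str, Callable[..., Any]] = {"greet": greet}
status_table: Dict[str, Dict[str, Any]] = {
    "wf-ok": {"name": "greet", "status": "PENDING",
              "inputs": '{"args": ["Ada"], "kwargs": {}}'},
    "wf-bad": {"name": "greet", "status": "PENDING",
               "inputs": "{not valid json"},
    "wf-missing": {"name": "unregistered", "status": "PENDING",
                   "inputs": '{"args": [], "kwargs": {}}'},
}


def execute_by_id(workflow_id: str) -> Any:
    # 1. Fetch the stored status row.
    status = status_table[workflow_id]

    # 2. Deserialize inputs; on failure mark ERROR instead of leaving PENDING.
    try:
        inputs = json.loads(status["inputs"])
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        status["status"] = "ERROR"
        raise

    # 3. Resolve the workflow function by its registered name.
    func = workflow_registry.get(status["name"])
    if func is None:
        raise WorkflowFunctionNotFound(status["name"])

    # 5. Dispatch (the real code calls start_workflow with is_recovery=True).
    return func(*inputs["args"], **inputs["kwargs"])
```

Note the difference between the two failure modes in this model: a row with undecodable inputs is moved to ERROR permanently, while a row whose function is not yet registered stays PENDING so that a later attempt can still succeed.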

Queued workflows are handled differently: _recover_workflow() clears their queue assignment and returns a handle without re-dispatching them.


Instance Lookup via Composite Key (class_name/config_name)

Workflows belonging to a DBOSConfiguredInstance store both class_name and config_name in the workflow_status table. During recovery, execute_workflow_by_id constructs the composite key "{class_name}/{config_name}" and looks it up in dbos._registry.instance_info_map. If the key is absent, DBOSWorkflowFunctionNotFoundError is raised.

For non-configured class methods (static/class methods without a config_name), the class object is resolved from dbos._registry.class_info_map by class_name alone, and is only prepended to args when the function type is not DBOSFuncType.Static.
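The lookup rules in this section can be sketched as one function. The FuncType enum, the two module-level maps, and inject_context are hypothetical stand-ins for DBOSFuncType and the registry maps; raising when a class name is missing from class_info_map is an assumption of this sketch, since the text above only specifies the error for a missing instance key.

```python
from enum import Enum
from typing import Any, Dict, Optional, Tuple


class FuncType(Enum):
    # Simplified stand-in for DBOSFuncType.
    STATIC = "static"
    CLASS = "class"
    INSTANCE = "instance"


class FunctionNotFound(Exception):
    """Stand-in for DBOSWorkflowFunctionNotFoundError."""


instance_info_map: Dict[str, object] = {}
class_info_map: Dict[str, type] = {}


def inject_context(
    class_name: Optional[str],
    config_name: Optional[str],
    func_type: FuncType,
    args: Tuple[Any, ...],
) -> Tuple[Any, ...]:
    """Prepend the instance or class object that the workflow expects."""
    if config_name is not None:
        # Configured instance: look up by the composite key.
        key = f"{class_name}/{config_name}"
        if key not in instance_info_map:
            raise FunctionNotFound(f"no registered instance for {key}")
        return (instance_info_map[key],) + args
    if class_name is not None:
        # Non-configured class: resolve by class name alone.
        if class_name not in class_info_map:
            raise FunctionNotFound(f"no registered class {class_name}")
        if func_type is not FuncType.STATIC:
            return (class_info_map[class_name],) + args
    # Static methods and plain functions receive args unchanged.
    return args
```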


Automatic Instance Registration

Instances are registered automatically when a DBOSConfiguredInstance subclass is instantiated. The chain is:

  1. DBOSConfiguredInstance.__init__ calls DBOS.register_instance(self).
  2. DBOSRegistry.register_instance computes the composite key and stores the instance in instance_info_map.

A warning is logged if registration happens after DBOS.launch(), since recovery may already be in progress. Registering two different objects under the same key raises an exception.

All DBOSConfiguredInstance objects must be created before DBOS.launch() is called.
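A toy version of the registration chain is shown below. It is a sketch under assumptions rather than the code in _dbos.py: Registry, ConfiguredInstance, and Mailer are hypothetical, the key is built from the class's __name__, ValueError stands in for whatever exception DBOS raises, and the launched flag is set by hand.

```python
import logging
from typing import Dict

logger = logging.getLogger("registry_sketch")


class Registry:
    """Simplified stand-in for DBOSRegistry."""

    def __init__(self) -> None:
        self.instance_info_map: Dict[str, object] = {}
        self.launched = False  # flipped to True once launch has happened

    def register_instance(self, inst: "ConfiguredInstance") -> None:
        key = f"{type(inst).__name__}/{inst.config_name}"
        existing = self.instance_info_map.get(key)
        if existing is not None and existing is not inst:
            # Two different objects must not share a composite key.
            raise ValueError(f"duplicate instance registration for {key}")
        if self.launched:
            logger.warning(
                "instance %s registered after launch; recovery may have started",
                key,
            )
        self.instance_info_map[key] = inst


registry = Registry()


class ConfiguredInstance:
    """Simplified stand-in for DBOSConfiguredInstance."""

    def __init__(self, config_name: str) -> None:
        self.config_name = config_name
        registry.register_instance(self)  # registration happens on construction


class Mailer(ConfiguredInstance):
    pass


primary = Mailer("primary")  # registered under "Mailer/primary"
```

Because registration is a side effect of construction, creating the instances at module import time, before launch, is the simplest way to make sure recovery can find them.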


Retry Logic for Unresolved Functions

startup_recovery_thread uses a retry loop that treats two kinds of failure differently:

| Exception | Behavior |
| --- | --- |
| DBOSWorkflowFunctionNotFoundError | Log a warning with the workflow ID and exception message, sleep 1 s, and retry; the function may be registered soon by another thread finishing initialization |
| Any other exception | Log the error and remove the workflow from the retry list (no further attempts) |

This retry behavior accommodates late registration — for example, when a class instance is constructed on a background thread shortly after launch().
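The two behaviors can be sketched as a loop over the pending list. This is an illustrative model, not the body of startup_recovery_thread: recovery_loop, its parameters, and the FunctionNotFound class are hypothetical, and the real thread presumably also has a shutdown condition that is left out here.

```python
import logging
import time
from typing import Callable, List

logger = logging.getLogger("recovery_sketch")


class FunctionNotFound(Exception):
    """Stand-in for DBOSWorkflowFunctionNotFoundError."""


def recovery_loop(
    pending: List[str],
    recover: Callable[[str], None],
    retry_delay: float = 1.0,
) -> List[str]:
    """Recover every workflow ID in `pending`; return the IDs given up on."""
    remaining = list(pending)
    abandoned: List[str] = []
    while remaining:
        for wf_id in list(remaining):
            try:
                recover(wf_id)
                remaining.remove(wf_id)
            except FunctionNotFound as exc:
                # Retryable: the function may be registered shortly.
                logger.warning("workflow %s not recoverable yet: %s", wf_id, exc)
            except Exception as exc:
                # Non-retryable: log and drop from the retry list.
                logger.error("recovery of workflow %s failed: %s", wf_id, exc)
                remaining.remove(wf_id)
                abandoned.append(wf_id)
        if remaining:
            time.sleep(retry_delay)
    return abandoned
```

In this model a workflow whose function never gets registered keeps the loop alive indefinitely, which is one reason late registration is worth avoiding.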

Once a workflow exceeds its max_recovery_attempts (default: 100, configurable via @DBOS.workflow(max_recovery_attempts=N)), its status is set to MAX_RECOVERY_ATTEMPTS_EXCEEDED and it will no longer be picked up for recovery.
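The attempt limit can be modeled with a counter on the status row. The function name, the recovery_attempts field, and the exact comparison used below are assumptions made for illustration; the real bookkeeping lives in the system database and may count attempts slightly differently.

```python
from typing import Any, Dict


def begin_recovery_attempt(
    row: Dict[str, Any], max_recovery_attempts: int = 100
) -> bool:
    """Count one recovery attempt; return False once the limit is exceeded."""
    row["recovery_attempts"] += 1
    if row["recovery_attempts"] > max_recovery_attempts:
        # Terminal state: later recovery passes skip this workflow.
        row["status"] = "MAX_RECOVERY_ATTEMPTS_EXCEEDED"
        return False
    return True


# With a limit of 2, the third attempt is rejected.
row = {"status": "PENDING", "recovery_attempts": 0}
results = [begin_recovery_attempt(row, max_recovery_attempts=2) for _ in range(3)]
```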


Key Files

| File | Purpose |
| --- | --- |
| dbos/_recovery.py | startup_recovery_thread, recover_pending_workflows, _recover_workflow |
| dbos/_core.py | execute_workflow_by_id (core recovery dispatch) |
| dbos/_dbos.py | DBOSRegistry.register_instance, DBOSConfiguredInstance.__init__ |
| dbos/_error.py | DBOSRecoveryError, DBOSWorkflowFunctionNotFoundError, MaxRecoveryAttemptsExceededError |