How can I use Docling's REST API to convert a PDF from a URL to Markdown with OpenAI-generated image descriptions, and is base64 encoding required for URL sources?

Example: Convert PDF from URL to Markdown with AI-generated image descriptions#

Specify the PDF's URL in your JSON payload using the sources array with kind: "http". To control how image alt text is generated in the Markdown output, use the image_alt_mode option:

static (default): Alt text is always "Image".
caption: Uses the image's caption text as alt text (if available).
description: Uses the AI-generated description as alt text (if available).

When using description, enable image description generation by setting do_picture_description: true and configuring either picture_description_preset (recommended), picture_description_custom_config, or (deprecated) picture_description_api.

Note: Using picture_description_custom_config requires administrator permission. Administrators must set the environment variable DOCLING_SERVE_ALLOW_CUSTOM_PICTURE_DESCRIPTION_CONFIG=true to enable custom picture description configurations. By default, this is disabled (false), and attempting to use custom configurations will result in a 422 error response. Verify with your Docling Serve administrator that custom configurations are enabled before using this feature.

Option	Type	Description
`to_formats`	list	Output formats. Allowed values: `md`, `json`, `yaml`, `html`, `html_split_page`, `text`, `doctags`, `vtt`. Example: `["md"]`.
`do_picture_description`	bool	If enabled, generates descriptions for pictures in the document. Optional; defaults to `false`.
`picture_description_preset`	str or null	Preset ID for picture description. Recommended for most users.
`picture_description_custom_config`	dict or null	Custom configuration for picture description (advanced). Requires administrator permission - see note above.
`picture_description_api`	dict or null	DEPRECATED: API details for using a vision-language model in the picture description. Please migrate to `picture_description_preset` or `picture_description_custom_config`.
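`picture_description_area_threshold`	float	Minimum share of the page area, expressed as a fraction (e.g. `0.05` for 5%), that a picture must cover to be processed by the description model. A value of `0` processes every picture.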
`image_alt_mode`	str	Controls how alt text is generated for images: `static`, `caption`, or `description`.
`chart_extraction_model`	str or null	Enables chart data extraction using the specified model. Optional, defaults to `null` (disabled). Valid values: `"granite-vision"` or `"granite-vision-v4"`.
`callbacks`	list	Optional list of callback/webhook specifications (`CallbackSpec`, from `docling_jobkit.datamodel.callback`) to invoke during or after document processing. Available on both conversion (`ConvertDocumentsRequest`) and chunking (`BaseChunkDocumentsRequest`) requests. Defaults to `[]`.

{
  "options": {
    "to_formats": ["md"],
    "do_picture_description": true,
    "picture_description_preset": "granite_vision",
    "picture_description_area_threshold": 0,
    "image_alt_mode": "description",
    "chart_extraction_model": "granite-vision"
  },
  "sources": [
    { "kind": "http", "url": "https://example.com/your.pdf" }
  ],
  "callbacks": []
}

In this payload, picture_description_preset may be "granite_vision", "default", or another preset ID, and chart_extraction_model may be set to "granite-vision-v4" to select the GV4 model. For advanced use, replace the preset with picture_description_custom_config (see the usage docs). For legacy compatibility, the deprecated picture_description_api field is still accepted.

Example with custom config and classification filters:
{
  "options": {
    "to_formats": ["md"],
    "do_picture_description": true,
    "picture_description_custom_config": {
      "api_key": "sk-...",
      "model": "gpt-4o",
      "classification_allow": ["Photo"],
      "classification_min_confidence": 0.8
    },
    "image_alt_mode": "description"
  },
  "sources": [
    { "kind": "http", "url": "https://example.com/your.pdf" }
  ],
  "callbacks": []
}
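
The payloads above can also be assembled in code. The following is a minimal Python sketch, not an official client: build_convert_payload and post_json are hypothetical helper names, and the endpoint URL in the final comment is an assumption about a local deployment that you should replace with your server's conversion endpoint.

```python
import json
import urllib.request


def build_convert_payload(pdf_url, preset="granite_vision", alt_mode="description"):
    """Build a conversion request body for a PDF fetched by URL (hypothetical helper)."""
    if alt_mode not in {"static", "caption", "description"}:
        raise ValueError(f"unsupported image_alt_mode: {alt_mode!r}")
    options = {"to_formats": ["md"], "image_alt_mode": alt_mode}
    if alt_mode == "description":
        # Descriptions must be generated before they can be used as alt text.
        options["do_picture_description"] = True
        options["picture_description_preset"] = preset
    # URL sources carry the plain URL; no base64 encoding is involved.
    return {"options": options, "sources": [{"kind": "http", "url": pdf_url}]}


def post_json(endpoint, payload, timeout=300):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_convert_payload("https://example.com/your.pdf")
print(json.dumps(payload, indent=2))
# To send it (endpoint is an assumption; adjust to your deployment):
# result = post_json("http://localhost:5001/v1/convert/source", payload)
```

Building the options in one place keeps image_alt_mode and do_picture_description consistent, so a request never asks for description alt text without enabling description generation.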

Chart Extraction Models#

Use chart_extraction_model to extract data from bar, pie, and line charts:

"granite-vision": Uses the original Granite Vision model to extract chart data as a table.
"granite-vision-v4": Uses the Granite Vision 4.0 model (ibm-granite/granite-4.0-3b-vision); supports CSV, Python code, and summary outputs for charts.

To disable chart extraction, omit the field or set it to null.

Alt Text Behavior#

If image_alt_mode is set to description and an AI-generated description is available, it will be used as the alt text in the Markdown image tag (e.g., ![AI-generated description](...)).
If no description is available, it falls back to "Image".
If set to caption, the image's caption will be used as alt text if present; otherwise, it falls back to "Image".
The default (static) always uses "Image" as alt text.
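
The fallback rules above can be summarized as a small decision function. This is an illustrative Python model of the documented behavior, not Docling's implementation; resolve_alt_text and markdown_image are hypothetical names.

```python
def resolve_alt_text(mode, caption=None, description=None):
    """Return the alt text for one image under the given image_alt_mode."""
    if mode not in {"static", "caption", "description"}:
        raise ValueError(f"unsupported image_alt_mode: {mode!r}")
    if mode == "description" and description:
        return description
    if mode == "caption" and caption:
        return caption
    # static mode, or the requested text is missing: use the fixed fallback.
    return "Image"


def markdown_image(alt_text, target):
    """Format a Markdown image tag."""
    return f"![{alt_text}]({target})"


print(markdown_image(resolve_alt_text("description", description="A bar chart of sales"), "image_1.png"))
print(markdown_image(resolve_alt_text("caption"), "image_2.png"))
```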

Picture Classification Filters#

When using picture_description_custom_config, you can control which images are described based on Docling's built-in picture classifier. These filters help optimize API usage and costs by selectively describing only certain types of images or images where the classifier is confident:

classification_allow: List of picture classification labels (e.g., ["Photo", "Chart"]). Only images whose predicted class is in this allow-list will be described.
classification_deny: List of picture classification labels. Images whose predicted class is in this deny-list will not be described.
classification_min_confidence: Minimum classification confidence (0.0-1.0). Only images where the classifier meets or exceeds this confidence threshold will be described.

These filters work with Docling's picture classifier, which categorizes images into types such as Photo, Chart, Diagram, and others. By combining these filters, you can fine-tune which images receive AI-generated descriptions based on their type and the classifier's confidence.
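
To make the interaction of the three filters concrete, here is a hedged Python sketch. It assumes that all configured filters must pass for an image to be described; how Docling actually combines them is not stated above, and should_describe is a hypothetical name.

```python
def should_describe(label, confidence, allow=None, deny=None, min_confidence=0.0):
    """Decide whether a classified picture would be sent for description.

    Assumption: every configured filter must pass (logical AND).
    """
    if allow is not None and label not in allow:
        return False  # not on the allow-list
    if deny is not None and label in deny:
        return False  # explicitly denied
    # The classifier must meet or exceed the confidence threshold.
    return confidence >= min_confidence


# Mirrors the custom config example: only confident photos are described.
for label, confidence in [("Photo", 0.93), ("Photo", 0.55), ("Chart", 0.97)]:
    decision = should_describe(label, confidence, allow=["Photo"], min_confidence=0.8)
    print(label, confidence, decision)
```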

Legacy Source Array#

You can also use the legacy http_sources array:

{
  "options": {
    ...
    "image_alt_mode": "caption"
  },
  "http_sources": [
    { "url": "https://example.com/your.pdf" }
  ]
}

Base64 Encoding#

Base64 encoding is not required for URL sources (kind: "http"); Docling fetches the file directly.
It is required only for direct file uploads (kind: "file").
If the URL requires authentication, you may include HTTP headers with the source.
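
The difference between the two source kinds can be sketched in Python as follows. The field names headers, filename, and base64_string are assumptions about the request schema and should be checked against your Docling Serve version; http_source and file_source are hypothetical helpers.

```python
import base64


def http_source(url, headers=None):
    """URL source: the server downloads the file, so nothing is encoded."""
    source = {"kind": "http", "url": url}
    if headers:
        # Assumed field name for optional request headers (e.g. authentication).
        source["headers"] = dict(headers)
    return source


def file_source(filename, content):
    """Direct upload: the raw bytes travel inside the JSON body as base64 text."""
    return {
        "kind": "file",
        "filename": filename,  # assumed field name
        "base64_string": base64.b64encode(content).decode("ascii"),  # assumed field name
    }


print(http_source("https://example.com/your.pdf", {"Authorization": "Bearer <token>"}))
print(file_source("your.pdf", b"%PDF-1.7 example bytes"))
```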

Task Status Polling and Error Handling#

When using asynchronous conversion endpoints, you can poll the task status endpoint to track conversion progress. The task status response provides information about the current state of your conversion task.

Docling-serve includes automatic zombie task reconciliation and watchdog-based failure detection that prevents infinite polling loops. When an RQ job expires or is lost (due to Redis eviction, worker restart, etc.) while metadata persists, the system automatically detects this condition and reconciles the state, marking orphaned tasks as FAILURE with a descriptive error message.

Task Status Response#

The task status response includes:

task_id: The unique identifier for your task
task_status: Current status (PENDING, STARTED, SUCCESS, FAILURE)
task_position: Queue position (if still queued)
task_meta: Processing metadata (document counts, progress)
error_message: Descriptive error information when tasks fail (available when task_status is FAILURE)

Task Status Resolution#

The system resolves task status using a three-step process designed to prevent stale RQ statuses from overwriting watchdog-published terminal states:

1. Redis terminal-state gate: Redis is checked first for terminal states (SUCCESS, FAILURE, CANCELLED). If a terminal state is found in Redis, it is returned immediately without consulting RQ. This prevents stale RQ STARTED statuses (which can persist up to 4 hours after a worker kill) from overwriting watchdog-published FAILURE states.
2. RQ authoritative check: RQ is consulted only for tasks not already in a terminal state. RQ provides the authoritative status for actively running jobs. This step returns a Task object when the job is non-PENDING, None for PENDING jobs, or a special marker when the job is not found (NoSuchJobError).
3. Redis fallback: Reached only when RQ has no useful answer (e.g., the job is PENDING or has expired). This step reads the same Redis key as step 1 but uses it for a different purpose: reconciling jobs that have disappeared from RQ and cross-checking stale statuses.
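
The ordering of the three steps can be illustrated with a simplified Python model. This is a sketch of the logic described above, not the docling-serve source: statuses are plain strings instead of Task objects, and resolve_status and JOB_NOT_FOUND are hypothetical names.

```python
TERMINAL_STATES = {"SUCCESS", "FAILURE", "CANCELLED"}
JOB_NOT_FOUND = object()  # stands in for the "job not found" marker


def resolve_status(redis_status, rq_result):
    """Resolve a task status.

    redis_status: status stored in Redis, or None if absent.
    rq_result: a status string for non-PENDING jobs, None for PENDING jobs,
               or JOB_NOT_FOUND when RQ no longer knows the job.
    """
    # Step 1: a terminal state in Redis wins; RQ is not consulted.
    if redis_status in TERMINAL_STATES:
        return redis_status
    # Step 2: RQ is authoritative for jobs it is actively tracking.
    if rq_result is not None and rq_result is not JOB_NOT_FOUND:
        return rq_result
    # Step 3: Redis fallback; a vanished job with live metadata is orphaned.
    if rq_result is JOB_NOT_FOUND and redis_status in {"PENDING", "STARTED"}:
        return "FAILURE"
    return redis_status or "PENDING"


# A watchdog-published FAILURE is not overwritten by a stale RQ STARTED.
print(resolve_status("FAILURE", "STARTED"))
# A job that disappeared from RQ while marked STARTED is reconciled.
print(resolve_status("STARTED", JOB_NOT_FOUND))
```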

Watchdog Failure Detection#

Docling-serve includes a watchdog mechanism, built from the docling-jobkit worker heartbeat thread and the orchestrator watchdog task, that monitors worker health:

Heartbeat monitoring: Workers send periodic heartbeat signals to indicate they are actively processing tasks
Automatic failure marking: When a worker's heartbeat expires beyond the grace period, the watchdog automatically publishes a FAILURE state to Redis
Terminal state protection: Watchdog-published FAILURE states are authoritative and protected from being overwritten by stale RQ statuses through the terminal-state gate (step 1 above)

This ensures that tasks whose workers have died or become unresponsive are promptly marked as failed, even if the RQ job metadata still shows STARTED.
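
As a rough illustration of heartbeat-based failure marking, consider the Python sketch below. The data layout, the 30-second grace period, and the function name expire_stale_tasks are all assumptions made for the example; the real watchdog publishes its state to Redis rather than mutating a dictionary.

```python
def expire_stale_tasks(heartbeats, statuses, now, grace_period=30.0):
    """Mark STARTED tasks as FAILURE when their last heartbeat is too old.

    heartbeats: task_id -> timestamp of the last heartbeat (seconds)
    statuses:   task_id -> current status string (updated in place)
    Returns the list of task ids that were marked as failed.
    """
    failed = []
    for task_id, last_seen in heartbeats.items():
        expired = now - last_seen > grace_period
        if expired and statuses.get(task_id) == "STARTED":
            statuses[task_id] = "FAILURE"
            failed.append(task_id)
    return failed


statuses = {"task-a": "STARTED", "task-b": "STARTED"}
heartbeats = {"task-a": 100.0, "task-b": 125.0}
# At t=140, task-a has been silent for 40 s and task-b for 15 s.
print(expire_stale_tasks(heartbeats, statuses, now=140.0))
print(statuses)
```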

Automatic Zombie Task Detection#

Docling-serve automatically detects and reconciles zombie/orphaned tasks:

Automatic reconciliation: When the system detects that an RQ job has expired but status metadata shows PENDING or STARTED, it automatically marks the task as FAILURE
Orphaned task error messages: Tasks orphaned due to RQ job expiry receive a descriptive error_message such as: "Task orphaned: RQ job expired while status was STARTED. Likely caused by worker restart or Redis eviction."
Background cleanup: A background reaper process runs periodically (default: every 5 minutes) to remove completed tasks from memory that are older than a threshold (default: 1 hour)
TTL alignment: Task metadata TTL matches the result TTL at 4 hours, reducing the window where a task shows SUCCESS but the result returns 404
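
The reconciliation and cleanup rules above can be modeled in a few lines of Python. This is a simplified sketch under assumed data structures, not the actual implementation; reconcile_orphan and reap_completed are hypothetical names, and only the error message text and the 1-hour default come from the description above.

```python
TERMINAL_STATES = {"SUCCESS", "FAILURE", "CANCELLED"}


def reconcile_orphan(status, rq_job_exists):
    """Return (status, error_message) after checking for an expired RQ job."""
    if not rq_job_exists and status in {"PENDING", "STARTED"}:
        message = (
            f"Task orphaned: RQ job expired while status was {status}. "
            "Likely caused by worker restart or Redis eviction."
        )
        return "FAILURE", message
    return status, None


def reap_completed(tasks, now, max_age=3600.0):
    """Drop completed tasks older than max_age seconds.

    tasks: task_id -> (status, finished_at or None)
    """
    kept = {}
    for task_id, (status, finished_at) in tasks.items():
        is_old = finished_at is not None and now - finished_at > max_age
        if status in TERMINAL_STATES and is_old:
            continue  # completed long ago: remove from memory
        kept[task_id] = (status, finished_at)
    return kept


print(reconcile_orphan("STARTED", rq_job_exists=False))
print(reap_completed({"old": ("SUCCESS", 0.0), "running": ("STARTED", None)}, now=4000.0))
```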

Error Message Field#

The error_message field provides detailed diagnostic information when tasks fail, helping you understand what went wrong:

Watchdog failures: Tasks marked as FAILURE by the watchdog mechanism when worker heartbeats expire, indicating the worker died or became unresponsive during processing
Task orphaning: Automatically detected when RQ jobs expire while tasks show PENDING or STARTED status, with error messages indicating the likely cause (worker restart or Redis eviction)
Infrastructure issues: Details about worker restarts, Redis connection problems, or job TTL expiry
General failures: Descriptive error messages for task processing failures

Best Practices for Async Conversions#

Poll for status: Regularly check the task status endpoint to monitor conversion progress
Check error messages: When a task returns FAILURE status, inspect the error_message field for diagnostic details, including watchdog-detected worker failures
Terminal state reliability: Once a task reaches a terminal state (SUCCESS, FAILURE, CANCELLED), the status is authoritative and will not change, even if stale data exists in RQ
Implement timeouts: Set reasonable polling timeouts to avoid indefinite waiting
Retry logic: For infrastructure-related failures (indicated in error_message), consider implementing retry logic
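
These practices can be combined into a polling loop with a timeout. The Python sketch below is one possible shape, not an official client: poll_until_done and describe_outcome are hypothetical names, and fetch_status is any callable you supply that returns the decoded status response for a task id. Simulated responses are used here so the example runs without a server.

```python
import time

TERMINAL_STATES = {"SUCCESS", "FAILURE", "CANCELLED"}


def poll_until_done(fetch_status, task_id, timeout=600.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the task reaches a terminal state or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        response = fetch_status(task_id)
        if response.get("task_status") in TERMINAL_STATES:
            return response  # terminal states are authoritative
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still running after {timeout} seconds")
        sleep(interval)


def describe_outcome(response):
    """Summarize a terminal response, surfacing error_message on failure."""
    if response.get("task_status") == "FAILURE":
        return "failed: " + (response.get("error_message") or "no error_message provided")
    return response.get("task_status", "UNKNOWN").lower()


# Simulated status responses stand in for real HTTP calls.
_responses = iter([
    {"task_id": "abc123", "task_status": "PENDING", "task_position": 1},
    {"task_id": "abc123", "task_status": "STARTED"},
    {"task_id": "abc123", "task_status": "SUCCESS"},
])
final = poll_until_done(lambda task_id: next(_responses), "abc123",
                        sleep=lambda seconds: None)
print(describe_outcome(final))
```

Passing sleep and clock as parameters keeps the loop testable; in production the defaults apply. A caller that wants retries can inspect error_message on FAILURE and resubmit when it points to an infrastructure problem such as an orphaned task.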

Example: Polling Task Status#

After submitting a conversion request, poll the status endpoint:

GET /v1/status/poll/{task_id}

Response when successful:

{
  "task_id": "abc123",
  "task_status": "SUCCESS",
  "task_meta": {
    "num_docs": 1,
    "num_processed": 1,
    "num_succeeded": 1,
    "num_failed": 0
  }
}

Response when failed:

{
  "task_id": "abc123",
  "task_status": "FAILURE",
  "error_message": "Task orphaned: RQ job expired while status was STARTED. Likely caused by worker restart or Redis eviction.",
  "task_meta": {
    "num_docs": 1,
    "num_processed": 0,
    "num_succeeded": 0,
    "num_failed": 1
  }
}

Summary#

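To control image alt text in the Markdown output, set image_alt_mode to static, caption, or description in your API payload. For AI-generated alt text, use description, set do_picture_description to true, and configure a picture description preset or, if your administrator has enabled it, a custom config (for example, one that points to an OpenAI model). Base64 encoding is needed only for direct file uploads, not for URLs.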

For asynchronous conversions, poll the task status endpoint to monitor progress. The system uses a three-step status resolution process that protects terminal states from being overwritten by stale RQ data. Watchdog-based failure detection automatically marks tasks as failed when worker heartbeats expire, and zombie task reconciliation handles orphaned tasks. Check the error_message field when tasks fail to diagnose issues such as watchdog-detected worker failures, task orphaning, or infrastructure problems.