Example: Convert PDF from URL to Markdown with AI-generated image descriptions#
Specify the PDF's URL in your JSON payload using the sources array with kind: "http". To control how image alt text is generated in the Markdown output, use the image_alt_mode option:
- static (default): Alt text is always "Image".
- caption: Uses the image's caption text as alt text (if available).
- description: Uses the AI-generated description as alt text (if available).
When using description, enable image description generation by setting do_picture_description: true and configuring either picture_description_preset (recommended), picture_description_custom_config, or (deprecated) picture_description_api.
Note: Using picture_description_custom_config requires administrator permission. Administrators must set the environment variable DOCLING_SERVE_ALLOW_CUSTOM_PICTURE_DESCRIPTION_CONFIG=true to enable custom picture description configurations. By default, this is disabled (false), and attempting to use custom configurations results in a 422 error response. Verify with your Docling Serve administrator that custom configurations are enabled before using this feature.
| Option | Type | Description |
|---|---|---|
| to_formats | list | Output formats. Allowed values: md, json, yaml, html, html_split_page, text, doctags, vtt. Example: ["md"]. |
| do_picture_description | bool | If enabled, describe pictures in documents. Boolean. Optional, defaults to false. |
| picture_description_preset | str or null | Preset ID for picture description. Recommended for most users. |
| picture_description_custom_config | dict or null | Custom configuration for picture description (advanced). Requires administrator permission - see note above. |
| picture_description_api | dict or null | DEPRECATED: API details for using a vision-language model in the picture description. Please migrate to picture_description_preset or picture_description_custom_config. |
| picture_description_area_threshold | float | Minimum percentage of the area for a picture to be processed with the models. |
| image_alt_mode | str | Controls how alt text is generated for images: static, caption, or description. |
| chart_extraction_model | str or null | Enables chart data extraction using the specified model. Optional, defaults to null (disabled). Valid values: "granite-vision" or "granite-vision-v4". |
| callbacks | list | Optional list of callback/webhook specifications (CallbackSpec, from docling_jobkit.datamodel.callback) to invoke during or after document processing. Available on both conversion (ConvertDocumentsRequest) and chunking (BaseChunkDocumentsRequest) requests. Defaults to []. |
{
  "options": {
    "to_formats": ["md"],
    "do_picture_description": true,
    "picture_description_preset": "granite_vision", // or "default" or another preset
    // "picture_description_custom_config": { ... }, // advanced, see usage docs
    // "picture_description_api": { ... }, // deprecated
    "picture_description_area_threshold": 0,
    "image_alt_mode": "description",
    "chart_extraction_model": "granite-vision" // or "granite-vision-v4" for GV4 model
  },
  "sources": [
    { "kind": "http", "url": "https://example.com/your.pdf" }
  ],
  "callbacks": []
}
// For legacy compatibility, you may still use:
// "picture_description_api": { ... } (deprecated)
// Example with custom config and classification filters:
{
  "options": {
    "to_formats": ["md"],
    "do_picture_description": true,
    "picture_description_custom_config": {
      "api_key": "sk-...",
      "model": "gpt-4o",
      "classification_allow": ["Photo"],
      "classification_min_confidence": 0.8
    },
    "image_alt_mode": "description"
  },
  "sources": [
    { "kind": "http", "url": "https://example.com/your.pdf" }
  ],
  "callbacks": []
}
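The payloads above can also be assembled in code. The following is a minimal Python sketch, not part of Docling Serve: build_convert_payload is a hypothetical helper that covers only a few of the options shown in the first example. Standard JSON has no comment syntax, so the // annotations used in the examples above do not appear in the serialized body.

```python
import json

# Values documented for image_alt_mode.
ALT_MODES = {"static", "caption", "description"}


def build_convert_payload(pdf_url, image_alt_mode="static", preset=None):
    """Hypothetical helper: build a request body shaped like the examples above."""
    if image_alt_mode not in ALT_MODES:
        raise ValueError(f"unsupported image_alt_mode: {image_alt_mode!r}")

    options = {"to_formats": ["md"], "image_alt_mode": image_alt_mode}
    if image_alt_mode == "description":
        # Description alt text requires picture description to be enabled.
        options["do_picture_description"] = True
        if preset is not None:
            options["picture_description_preset"] = preset

    return {
        "options": options,
        "sources": [{"kind": "http", "url": pdf_url}],
    }


payload = build_convert_payload(
    "https://example.com/your.pdf",
    image_alt_mode="description",
    preset="granite_vision",
)
body = json.dumps(payload, indent=2)  # plain JSON, without // comments
print(body)
```

The resulting string can be sent as the request body with any HTTP client; the conversion endpoint path depends on your Docling Serve deployment and is not assumed here.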
Chart Extraction Models#
Use chart_extraction_model to extract data from bar, pie, and line charts:
- "granite-vision": Uses the original Granite Vision model to extract chart data as a table.
- "granite-vision-v4": Uses the Granite Vision 4.0 model (ibm-granite/granite-4.0-3b-vision); supports CSV, Python code, and summary outputs for charts.
To disable chart extraction, omit the field or set it to null.
Alt Text Behavior#
- If image_alt_mode is set to description and an AI-generated description is available, it is used as the alt text in the Markdown image tag.
- If no description is available, it falls back to "Image".
- If set to caption, the image's caption is used as alt text if present; otherwise, it falls back to "Image".
- The default (static) always uses "Image" as alt text.
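The fallback rules above can be expressed as a small function. This is an illustrative Python sketch of the documented behavior, not Docling's implementation; resolve_alt_text and markdown_image are hypothetical names, and the image path is a placeholder.

```python
def resolve_alt_text(mode, caption=None, description=None):
    """Pick alt text following the documented fallback rules."""
    if mode == "description" and description:
        return description
    if mode == "caption" and caption:
        return caption
    # "static", or nothing usable for the requested mode.
    return "Image"


def markdown_image(alt, src):
    """Render a Markdown image tag: ![alt](src)."""
    return f"![{alt}]({src})"


print(markdown_image(resolve_alt_text("description", description="A bar chart"), "image.png"))
print(markdown_image(resolve_alt_text("description"), "image.png"))
print(markdown_image(resolve_alt_text("caption", caption="Figure 1"), "image.png"))
```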
Picture Classification Filters#
When using picture_description_custom_config, you can control which images are described based on Docling's built-in picture classifier. These filters help optimize API usage and costs by selectively describing only certain types of images or images where the classifier is confident:
- classification_allow: List of picture classification labels (e.g., ["Photo", "Chart"]). Only images whose predicted class is in this allow-list will be described.
- classification_deny: List of picture classification labels. Images whose predicted class is in this deny-list will not be described.
- classification_min_confidence: Minimum classification confidence (0.0-1.0). Only images where the classifier meets or exceeds this confidence threshold will be described.
These filters work with Docling's picture classifier, which categorizes images into types such as Photo, Chart, Diagram, and others. By combining these filters, you can fine-tune which images receive AI-generated descriptions based on their type and the classifier's confidence.
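To make the filter semantics concrete, here is a minimal Python sketch of how the three settings could combine. It is an illustration only: should_describe is a hypothetical function, and the order of checks and the treatment of an empty or missing list as "no restriction" are assumptions, not confirmed Docling behavior.

```python
def should_describe(label, confidence, allow=None, deny=None, min_confidence=0.0):
    """Decide whether a classified picture would be sent for description."""
    if confidence < min_confidence:
        return False  # classifier is not confident enough
    if deny and label in deny:
        return False  # explicitly excluded class
    if allow and label not in allow:
        return False  # allow-list present and class not in it
    return True


# Mirrors the custom-config example: only confident Photo predictions pass.
print(should_describe("Photo", 0.92, allow=["Photo"], min_confidence=0.8))
print(should_describe("Photo", 0.55, allow=["Photo"], min_confidence=0.8))
print(should_describe("Chart", 0.97, allow=["Photo"], min_confidence=0.8))
```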
Legacy Source Array#
You can also use the legacy http_sources array:
{
  "options": {
    ...
    "image_alt_mode": "caption"
  },
  "http_sources": [
    { "url": "https://example.com/your.pdf" }
  ]
}
Base64 Encoding#
- Not required for URL sources (kind: "http"). Docling will fetch the file directly.
- Required only for direct file uploads (kind: "file").
- You may include HTTP headers if the URL requires authentication.
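The difference between the two source kinds can be sketched in Python as follows. This is a hedged illustration: http_source and file_source are hypothetical helpers, and the field names headers, filename, and base64_string are assumptions about the request schema that should be checked against your Docling Serve version.

```python
import base64


def http_source(url, headers=None):
    """URL source: no Base64 needed, the server fetches the file itself."""
    source = {"kind": "http", "url": url}
    if headers:
        source["headers"] = dict(headers)  # assumed field name
    return source


def file_source(filename, data):
    """Direct upload: the file bytes are sent Base64-encoded."""
    return {
        "kind": "file",
        "filename": filename,  # assumed field name
        "base64_string": base64.b64encode(data).decode("ascii"),  # assumed field name
    }


print(http_source("https://example.com/your.pdf", {"Authorization": "Bearer TOKEN"}))
print(file_source("your.pdf", b"%PDF-1.7"))
```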
Task Status Polling and Error Handling#
When using asynchronous conversion endpoints, you can poll the task status endpoint to track conversion progress. The task status response provides information about the current state of your conversion task.
Docling-serve includes automatic zombie task reconciliation and watchdog-based failure detection that prevents infinite polling loops. When an RQ job expires or is lost (due to Redis eviction, worker restart, etc.) while metadata persists, the system automatically detects this condition and reconciles the state, marking orphaned tasks as FAILURE with a descriptive error message.
Task Status Response#
The task status response includes:
- task_id: The unique identifier for your task
- task_status: Current status (PENDING, STARTED, SUCCESS, FAILURE)
- task_position: Queue position (if still queued)
- task_meta: Processing metadata (document counts, progress)
- error_message: Descriptive error information when tasks fail (available when task_status is FAILURE)
Task Status Resolution#
The system resolves task status using a three-step process designed to prevent stale RQ statuses from overwriting watchdog-published terminal states:
1. Redis terminal-state gate: Redis is checked first for terminal states (SUCCESS, FAILURE, CANCELLED). If a terminal state is found in Redis, it is returned immediately without consulting RQ. This prevents stale RQ STARTED statuses (which can persist up to 4 hours after a worker kill) from overwriting watchdog-published FAILURE states.
2. RQ authoritative check: RQ is consulted only for tasks not already in a terminal state. RQ provides the authoritative status for actively running jobs. This step returns a Task object when the job is non-PENDING, None for PENDING jobs, or a special marker when the job is not found (NoSuchJobError).
3. Redis fallback: Reached only when RQ had no useful answer (e.g., job is PENDING or has expired). This step handles job-gone reconciliation and stale-status cross-checks using the same Redis key as step 1, which serves a different role here.
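The three steps can be modeled with plain dictionaries standing in for Redis and RQ. The sketch below is a simplified illustration of the order of checks described above, not the docling-serve code: resolve_status, rq_lookup, and JOB_GONE are hypothetical names, and statuses are plain strings rather than Task objects.

```python
TERMINAL = {"SUCCESS", "FAILURE", "CANCELLED"}
JOB_GONE = object()  # stand-in for the "job not found" marker


def rq_lookup(rq_jobs, task_id):
    """Status for non-PENDING jobs, None for PENDING, JOB_GONE if missing."""
    if task_id not in rq_jobs:
        return JOB_GONE
    status = rq_jobs[task_id]
    return None if status == "PENDING" else status


def resolve_status(task_id, redis_meta, rq_jobs):
    stored = redis_meta.get(task_id)

    # Step 1: terminal-state gate. A terminal state in Redis wins outright.
    if stored in TERMINAL:
        return stored

    # Step 2: RQ is authoritative for jobs it still knows as non-PENDING.
    rq_answer = rq_lookup(rq_jobs, task_id)
    if rq_answer is not None and rq_answer is not JOB_GONE:
        return rq_answer

    # Step 3: Redis fallback, including job-gone reconciliation.
    if rq_answer is JOB_GONE:
        if stored in {"PENDING", "STARTED"}:
            redis_meta[task_id] = "FAILURE"  # orphaned task
            return "FAILURE"
        return None  # task unknown to both stores
    return stored or "PENDING"


redis_meta = {"t1": "FAILURE", "t2": "STARTED", "t3": "STARTED"}
rq_jobs = {"t1": "STARTED", "t2": "STARTED"}  # t3's job has expired
for task in ("t1", "t2", "t3"):
    print(task, resolve_status(task, redis_meta, rq_jobs))
```

In this model, t1 stays FAILURE even though RQ still reports STARTED, which is the situation the terminal-state gate is meant to handle.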
Watchdog Failure Detection#
Docling-serve includes a watchdog mechanism (part of the docling-jobkit worker heartbeat thread and orchestrator watchdog task) that monitors worker health:
- Heartbeat monitoring: Workers send periodic heartbeat signals to indicate they are actively processing tasks
- Automatic failure marking: When a worker's heartbeat expires beyond the grace period, the watchdog automatically publishes a FAILURE state to Redis
- Terminal state protection: Watchdog-published FAILURE states are authoritative and protected from being overwritten by stale RQ statuses through the terminal-state gate (step 1 above)
This ensures that tasks whose workers have died or become unresponsive are promptly marked as failed, even if the RQ job metadata still shows STARTED.
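The heartbeat check can be pictured with a small in-memory model. This Python sketch is only an illustration of the idea: the function names, the data layout, and the interpretation of "expired beyond the grace period" as heartbeat TTL plus grace period are all assumptions, and the numeric values are arbitrary.

```python
TERMINAL = {"SUCCESS", "FAILURE", "CANCELLED"}


def find_dead_workers(last_heartbeat, now, heartbeat_ttl, grace_period):
    """Workers whose last heartbeat is older than TTL plus grace period."""
    limit = heartbeat_ttl + grace_period
    return sorted(w for w, seen in last_heartbeat.items() if now - seen > limit)


def publish_failures(task_states, worker_tasks, dead_workers):
    """Mark each dead worker's task as FAILURE unless it is already terminal."""
    failed = []
    for worker in dead_workers:
        task_id = worker_tasks.get(worker)
        if task_id is not None and task_states.get(task_id) not in TERMINAL:
            task_states[task_id] = "FAILURE"
            failed.append(task_id)
    return failed


heartbeats = {"worker-a": 100.0, "worker-b": 40.0}  # last-seen timestamps
states = {"task-1": "STARTED", "task-2": "STARTED"}
assignments = {"worker-a": "task-1", "worker-b": "task-2"}

dead = find_dead_workers(heartbeats, now=110.0, heartbeat_ttl=30.0, grace_period=15.0)
publish_failures(states, assignments, dead)
print(dead, states)
```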
Automatic Zombie Task Detection#
Docling-serve automatically detects and reconciles zombie/orphaned tasks:
- Automatic reconciliation: When the system detects that an RQ job has expired but status metadata shows PENDING or STARTED, it automatically marks the task as FAILURE
- Orphaned task error messages: Tasks orphaned due to RQ job expiry receive a descriptive error_message such as: "Task orphaned: RQ job expired while status was STARTED. Likely caused by worker restart or Redis eviction."
- Background cleanup: A background reaper process runs periodically (default: every 5 minutes) to remove completed tasks from memory that are older than a threshold (default: 1 hour)
- TTL alignment: Task metadata TTL matches the result TTL at 4 hours, reducing the window where a task shows SUCCESS but the result returns 404
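As a rough picture of the background cleanup, the following Python sketch removes finished tasks older than the documented one-hour default from an in-memory table. reap_completed and the (status, finished_at) layout are hypothetical, and treating every terminal state as "completed" is an assumption.

```python
TERMINAL = {"SUCCESS", "FAILURE", "CANCELLED"}
MAX_AGE_SECONDS = 60 * 60  # documented default threshold: 1 hour


def reap_completed(tasks, now, max_age=MAX_AGE_SECONDS):
    """tasks maps task_id -> (status, finished_at). Returns the ids removed."""
    stale = [
        task_id
        for task_id, (status, finished_at) in tasks.items()
        if status in TERMINAL and now - finished_at > max_age
    ]
    for task_id in stale:
        del tasks[task_id]
    return stale


tasks = {
    "old-done": ("SUCCESS", 0.0),
    "recent-done": ("SUCCESS", 3000.0),
    "old-running": ("STARTED", 0.0),
}
removed = reap_completed(tasks, now=4000.0)
print(removed, sorted(tasks))
```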
Error Message Field#
The error_message field provides detailed diagnostic information when tasks fail, helping you understand what went wrong:
- Watchdog failures: Tasks marked as FAILURE by the watchdog mechanism when worker heartbeats expire, indicating the worker died or became unresponsive during processing
- Task orphaning: Automatically detected when RQ jobs expire while tasks show PENDING or STARTED status, with error messages indicating the likely cause (worker restart or Redis eviction)
- Infrastructure issues: Details about worker restarts, Redis connection problems, or job TTL expiry
- General failures: Descriptive error messages for task processing failures
Best Practices for Async Conversions#
- Poll for status: Regularly check the task status endpoint to monitor conversion progress
- Check error messages: When a task returns FAILURE status, inspect the error_message field for diagnostic details, including watchdog-detected worker failures
- Terminal state reliability: Once a task reaches a terminal state (SUCCESS, FAILURE, CANCELLED), the status is authoritative and will not change, even if stale data exists in RQ
- Implement timeouts: Set reasonable polling timeouts to avoid indefinite waiting
- Retry logic: For infrastructure-related failures (indicated in error_message), consider implementing retry logic
Example: Polling Task Status#
After submitting a conversion request, poll the status endpoint:
GET /api/v1/tasks/{task_id}/status
Response when successful:
{
  "task_id": "abc123",
  "task_status": "SUCCESS",
  "task_meta": {
    "num_docs": 1,
    "num_processed": 1,
    "num_succeeded": 1,
    "num_failed": 0
  }
}
Response when failed:
{
  "task_id": "abc123",
  "task_status": "FAILURE",
  "error_message": "Task orphaned: RQ job expired while status was STARTED. Likely caused by worker restart or Redis eviction.",
  "task_meta": {
    "num_docs": 1,
    "num_processed": 0,
    "num_succeeded": 0,
    "num_failed": 1
  }
}
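A polling loop with a timeout, as recommended in the best practices, might look like the Python sketch below. It is a minimal illustration under stated assumptions: poll_until_terminal is a hypothetical helper, and fetch_status is a caller-supplied function that performs the GET request against the status endpoint and returns the parsed JSON. Canned responses stand in for the server here so the example runs offline.

```python
import time

TERMINAL = {"SUCCESS", "FAILURE", "CANCELLED"}


def poll_until_terminal(fetch_status, task_id, timeout=300.0, interval=2.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the task reaches a terminal state or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status(task_id)
        if status["task_status"] in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"task {task_id} still {status['task_status']} after {timeout}s"
            )
        sleep(interval)


# Canned responses standing in for the status endpoint.
responses = iter([
    {"task_id": "abc123", "task_status": "PENDING"},
    {"task_id": "abc123", "task_status": "STARTED"},
    {"task_id": "abc123", "task_status": "FAILURE",
     "error_message": "Task orphaned: RQ job expired while status was STARTED. "
                      "Likely caused by worker restart or Redis eviction."},
])

result = poll_until_terminal(lambda task_id: next(responses), "abc123",
                             sleep=lambda seconds: None)
if result["task_status"] == "FAILURE":
    # error_message carries the diagnostic details described above.
    print("failed:", result.get("error_message"))
```

Passing sleep and clock as parameters keeps the loop easy to test; in real use the defaults apply and only fetch_status needs to be supplied.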
Summary#
To control image alt text in the Markdown output, set image_alt_mode to static, caption, or description in your API payload. For AI-generated alt text, use description, set do_picture_description: true, and configure picture_description_preset (recommended) or picture_description_custom_config. Base64 encoding is only needed for direct file uploads, not for URLs.
For asynchronous conversions, poll the task status endpoint to monitor progress. The system uses a three-step status resolution process that protects terminal states from being overwritten by stale RQ data. Watchdog-based failure detection automatically marks tasks as failed when worker heartbeats expire, and zombie task reconciliation handles orphaned tasks. Check the error_message field when tasks fail to diagnose issues such as watchdog-detected worker failures, task orphaning, or infrastructure problems.