To enable GPU acceleration in Docling (v2.12.0 or later), you can configure it either in your Python code or via environment variables:
In Python code:
```python
from docling.datamodel.accelerator_options import AcceleratorOptions, AcceleratorDevice
from docling.datamodel.pipeline_options import ThreadedPdfPipelineOptions

pipeline_options = ThreadedPdfPipelineOptions(
    accelerator_options=AcceleratorOptions(
        device=AcceleratorDevice.CUDA,  # run models on the GPU
    ),
    # Batch sizes for the individual pipeline stages; larger values
    # improve GPU utilization at the cost of more VRAM.
    ocr_batch_size=4,
    layout_batch_size=64,
    table_batch_size=4,
)
```
Via environment variable:
Set DOCLING_DEVICE=cuda (or cuda:0, cuda:1 for specific GPUs).
Supported devices:
- cuda (NVIDIA GPUs)
- mps (Apple Silicon)
- xpu (Intel GPUs)
- auto (automatic detection; the default)
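With auto, the first available backend wins. A rough sketch of that fallback order, with availability passed in explicitly so it stays self-contained (illustrative only; Docling's actual detection logic may differ):

```python
def pick_device(cuda: bool, mps: bool, xpu: bool) -> str:
    """Illustrative fallback for DOCLING_DEVICE=auto:
    prefer cuda, then mps, then xpu, else fall back to cpu.
    Sketch only; not Docling's actual implementation.
    """
    if cuda:
        return "cuda"
    if mps:
        return "mps"
    if xpu:
        return "xpu"
    return "cpu"

print(pick_device(cuda=False, mps=True, xpu=False))  # mps
```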
Docker usage:
Use CUDA-enabled images such as ghcr.io/docling-project/docling-serve-cu124, cu126, or cu128. The cu128 image supports both linux/amd64 and linux/arm64 platforms, enabling GPU acceleration on ARM64 systems with NVIDIA GPUs (such as NVIDIA Jetson devices or ARM-based cloud instances with GPU support). For Docker, ensure you have the NVIDIA Container Toolkit installed.
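Putting this together, a typical container invocation might look like the following (the port mapping is an assumption based on docling-serve's default; adjust to your setup):

```shell
# Run the CUDA 12.8 image with GPU access.
# --gpus all requires the NVIDIA Container Toolkit.
docker run --rm --gpus all -p 5001:5001 \
  ghcr.io/docling-project/docling-serve-cu128
```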
Optional: Flash Attention 2
For Ampere+ NVIDIA GPUs, you can enable Flash Attention 2 for improved speed and memory usage by setting cuda_use_flash_attention2=True in AcceleratorOptions or DOCLING_CUDA_USE_FLASH_ATTENTION2=true as an environment variable.
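When toggling this via the environment, the variable acts as a boolean flag. A minimal sketch of reading such a flag (hypothetical helper for illustration; Docling's own parsing may accept different spellings):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a variable like DOCLING_CUDA_USE_FLASH_ATTENTION2=true
    as a boolean. Hypothetical helper, not Docling code.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}

os.environ["DOCLING_CUDA_USE_FLASH_ATTENTION2"] = "true"
print(env_flag("DOCLING_CUDA_USE_FLASH_ATTENTION2"))  # True
```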
Requirements:
- An NVIDIA GPU
- Correct drivers installed
- (For Docker) NVIDIA Container Toolkit