To enable GPU acceleration in Docling (v2.12.0 or later), you can configure it either in your Python code or via environment variables:
In Python code:
```python
from docling.datamodel.accelerator_options import AcceleratorOptions, AcceleratorDevice
from docling.datamodel.pipeline_options import ThreadedPdfPipelineOptions

pipeline_options = ThreadedPdfPipelineOptions(
    accelerator_options=AcceleratorOptions(
        device=AcceleratorDevice.CUDA,  # run models on the GPU
    ),
    # Batch sizes for the individual pipeline stages; larger values
    # improve GPU utilization at the cost of more VRAM.
    ocr_batch_size=4,
    layout_batch_size=64,
    table_batch_size=4,
)
```
Via environment variable:
Set DOCLING_DEVICE=cuda (or cuda:0, cuda:1 for specific GPUs).
Supported devices:
- cuda (NVIDIA GPUs)
- mps (Apple Silicon)
- xpu (Intel GPUs)
- auto (automatic detection; the default)
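With auto, the first available backend wins. A rough sketch of that fallback order, with availability passed in explicitly so it stays self-contained (illustrative only; Docling's actual detection logic may differ):

```python
def pick_device(cuda: bool, mps: bool, xpu: bool) -> str:
    """Illustrative fallback for DOCLING_DEVICE=auto:
    prefer cuda, then mps, then xpu, else fall back to cpu.
    Sketch only; not Docling's actual implementation.
    """
    if cuda:
        return "cuda"
    if mps:
        return "mps"
    if xpu:
        return "xpu"
    return "cpu"

print(pick_device(cuda=False, mps=True, xpu=False))  # mps
```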
Docker usage:
Use CUDA-enabled images such as ghcr.io/docling-project/docling-serve-cu124, cu126, or cu128. The cu128 image supports both linux/amd64 and linux/arm64 platforms, enabling GPU acceleration on ARM64 systems with NVIDIA GPUs (such as NVIDIA Jetson devices or ARM-based cloud instances with GPU support). For Docker, ensure you have the NVIDIA Container Toolkit installed.
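Putting this together, a typical container invocation might look like the following (the port mapping is an assumption based on docling-serve's default; adjust to your setup):

```shell
# Run the CUDA 12.8 image with GPU access.
# --gpus all requires the NVIDIA Container Toolkit.
docker run --rm --gpus all -p 5001:5001 \
  ghcr.io/docling-project/docling-serve-cu128
```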
Optional: Flash Attention 2
For Ampere+ NVIDIA GPUs, you can enable Flash Attention 2 for improved speed and memory usage by setting cuda_use_flash_attention2=True in AcceleratorOptions or DOCLING_CUDA_USE_FLASH_ATTENTION2=true as an environment variable.
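When toggling this via the environment, the variable acts as a boolean flag. A minimal sketch of reading such a flag (hypothetical helper for illustration; Docling's own parsing may accept different spellings):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a variable like DOCLING_CUDA_USE_FLASH_ATTENTION2=true
    as a boolean. Hypothetical helper, not Docling code.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}

os.environ["DOCLING_CUDA_USE_FLASH_ATTENTION2"] = "true"
print(env_flag("DOCLING_CUDA_USE_FLASH_ATTENTION2"))  # True
```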
Requirements:
- An NVIDIA GPU
- Correct drivers installed
- (For Docker) NVIDIA Container Toolkit