How can I enable and use GPU acceleration with Docling?
Type: Answer
Status: Published
Created: Feb 17, 2026
Updated: Feb 23, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

To enable GPU acceleration in Docling (v2.12.0 or later), you can configure it either in your Python code or via environment variables:

In Python code:

from docling.datamodel.accelerator_options import AcceleratorOptions, AcceleratorDevice
from docling.datamodel.pipeline_options import ThreadedPdfPipelineOptions

pipeline_options = ThreadedPdfPipelineOptions(
    # Run the pipeline's models on an NVIDIA GPU.
    accelerator_options=AcceleratorOptions(
        device=AcceleratorDevice.CUDA,
    ),
    # Per-stage batch sizes; tune these to fit the available GPU memory.
    ocr_batch_size=4,
    layout_batch_size=64,
    table_batch_size=4,
)
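
The options object above only takes effect once it is passed to a converter. The following is a minimal sketch of that wiring, not a definitive implementation. It assumes a Docling release that ships `ThreadedStandardPdfPipeline` under `docling.pipeline.threaded_standard_pdf_pipeline`; the function names `build_gpu_converter` and `convert_pdf_to_markdown` are hypothetical helpers introduced for illustration.

```python
def build_gpu_converter(layout_batch_size: int = 64):
    """Return a DocumentConverter whose PDF pipeline is configured for CUDA.

    Docling is imported lazily so this module still loads where it is absent.
    """
    from docling.datamodel.accelerator_options import (
        AcceleratorDevice,
        AcceleratorOptions,
    )
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import ThreadedPdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption
    # Assumption: this module path matches the installed Docling version.
    from docling.pipeline.threaded_standard_pdf_pipeline import (
        ThreadedStandardPdfPipeline,
    )

    pipeline_options = ThreadedPdfPipelineOptions(
        accelerator_options=AcceleratorOptions(device=AcceleratorDevice.CUDA),
        ocr_batch_size=4,
        layout_batch_size=layout_batch_size,
        table_batch_size=4,
    )
    # Attach the options to the PDF input format only.
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_cls=ThreadedStandardPdfPipeline,
                pipeline_options=pipeline_options,
            )
        }
    )


def convert_pdf_to_markdown(path: str) -> str:
    """Convert one PDF with the GPU-configured converter and return Markdown."""
    converter = build_gpu_converter()
    result = converter.convert(path)
    return result.document.export_to_markdown()
```

Calling `convert_pdf_to_markdown` with the path to a local PDF should then run the conversion with the CUDA settings applied.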

Via environment variable:
Set DOCLING_DEVICE=cuda to use the default GPU, or DOCLING_DEVICE=cuda:0, DOCLING_DEVICE=cuda:1, and so on to select a specific GPU.
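
If you prefer to set the variable from a Python launcher rather than the shell, a small sketch such as the following may help. The helper `set_docling_device` is hypothetical, and accepting `cpu` as a value is an assumption, since it is not in the device list of this answer.

```python
import os
import re

# Device strings named in this answer, plus "cpu" (assumed valid as the
# non-GPU fallback). "cuda" may carry an optional GPU index, e.g. "cuda:1".
_DEVICE_PATTERN = re.compile(r"^(auto|cpu|mps|xpu|cuda(:\d+)?)$")


def set_docling_device(device: str) -> str:
    """Validate `device` and export it as DOCLING_DEVICE for this process."""
    normalized = device.strip().lower()
    if not _DEVICE_PATTERN.match(normalized):
        raise ValueError(f"unsupported device string: {device!r}")
    os.environ["DOCLING_DEVICE"] = normalized
    return normalized


# Call this before creating the Docling converter so the setting is picked up.
set_docling_device("cuda:1")
```

Validating the string up front turns a typo such as `cuda1` into an immediate error rather than a setting that may be rejected later.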

Supported devices:

  • cuda (NVIDIA GPUs)
  • mps (Apple Silicon)
  • xpu (Intel GPUs)
  • auto (automatic detection, default)
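
To see which of these devices is usable on a given machine, you can probe PyTorch directly. This is a rough sketch of what automatic detection might look like, not Docling's actual `auto` logic; the function name `detect_device` and the priority order (cuda, then mps, then xpu) are assumptions.

```python
def detect_device() -> str:
    """Return "cuda", "mps", "xpu", or "cpu" based on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so no GPU backend to probe

    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds may lack the mps or xpu modules, so look them up safely.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


print(f"Best available device: {detect_device()}")
```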

Docker usage:
Use one of the CUDA-enabled images: ghcr.io/docling-project/docling-serve-cu124, docling-serve-cu126, or docling-serve-cu128. The cu128 image supports both linux/amd64 and linux/arm64, which enables GPU acceleration on ARM64 systems with NVIDIA GPUs (such as NVIDIA Jetson devices or ARM-based cloud instances with GPUs). The Docker host must have the NVIDIA Container Toolkit installed.
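
As an illustration, the command line for starting one of these images with GPU access could be assembled as follows. This is a sketch under stated assumptions: the container port 5001, the use of the image's default tag, and the helper name `docker_run_command` are not taken from this answer.

```python
import shlex

REGISTRY = "ghcr.io/docling-project"


def docker_run_command(cuda_tag: str = "cu128", host_port: int = 5001) -> list[str]:
    """Build a `docker run` argv that exposes all host GPUs to docling-serve."""
    if cuda_tag not in {"cu124", "cu126", "cu128"}:
        raise ValueError(f"unknown CUDA image variant: {cuda_tag!r}")
    return [
        "docker", "run", "--rm",
        "--gpus", "all",  # needs the NVIDIA Container Toolkit on the host
        "-p", f"{host_port}:5001",  # assumption: the server listens on 5001
        f"{REGISTRY}/docling-serve-{cuda_tag}",
    ]


# Print the command for review instead of executing it.
print(shlex.join(docker_run_command()))
```

Passing the returned list to `subprocess.run` would start the container; printing it first makes it easy to check the flags.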

Optional: Flash Attention 2
For NVIDIA GPUs of the Ampere generation or newer, you can enable Flash Attention 2 to improve speed and reduce memory usage. Either set cuda_use_flash_attention2=True in AcceleratorOptions, or set the environment variable DOCLING_CUDA_USE_FLASH_ATTENTION2=true.
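
A sketch of enabling the option only when the GPU qualifies is shown below. It relies on Ampere GPUs reporting CUDA compute capability 8.0 or higher; the helper names are hypothetical, and the sketch assumes PyTorch is installed alongside Docling.

```python
def is_ampere_or_newer(capability: tuple[int, int]) -> bool:
    """Ampere and later GPUs report CUDA compute capability 8.0 or higher."""
    return capability >= (8, 0)


def build_accelerator_options():
    """Return CUDA AcceleratorOptions, enabling Flash Attention 2 if supported."""
    # Imported lazily so the capability check above works without these packages.
    import torch
    from docling.datamodel.accelerator_options import (
        AcceleratorDevice,
        AcceleratorOptions,
    )

    # Inspect GPU 0 only; adjust the index on multi-GPU hosts.
    use_fa2 = torch.cuda.is_available() and is_ampere_or_newer(
        torch.cuda.get_device_capability(0)
    )
    return AcceleratorOptions(
        device=AcceleratorDevice.CUDA,
        cuda_use_flash_attention2=use_fa2,
    )
```

Gating the flag on the reported capability keeps the same script usable on older GPUs, where the option stays off.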

Requirements (for CUDA):

  • An NVIDIA GPU
  • A compatible NVIDIA driver installed
  • (For Docker) NVIDIA Container Toolkit