How can I enable and use GPU acceleration with Docling?
Type: Answer
Status: Published
Created: Feb 17, 2026
Updated: Jun 1, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

To enable GPU acceleration in Docling (v2.12.0 or later), you can configure it either in your Python code or via environment variables:

In Python code:

from docling.datamodel.accelerator_options import AcceleratorOptions, AcceleratorDevice
from docling.datamodel.pipeline_options import ThreadedPdfPipelineOptions

pipeline_options = ThreadedPdfPipelineOptions(
    # Run model inference on a CUDA GPU
    accelerator_options=AcceleratorOptions(
        device=AcceleratorDevice.CUDA,
    ),
    # Per-stage batch sizes; larger values use more GPU memory
    ocr_batch_size=4,
    layout_batch_size=64,
    table_batch_size=4,
)
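
The options above only take effect once they are passed to a converter. A minimal sketch of that wiring follows, assuming the installed Docling version exposes ThreadedStandardPdfPipeline under docling.pipeline.threaded_standard_pdf_pipeline. The helper name build_gpu_converter and the input path in the usage comment are hypothetical.

```python
def build_gpu_converter(device: str = "cuda"):
    """Return a DocumentConverter whose PDF pipeline runs on `device`."""
    # Imported inside the function so that merely defining the helper
    # does not require Docling to be installed.
    from docling.datamodel.accelerator_options import AcceleratorOptions
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import ThreadedPdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption
    # Assumption: module path as found in recent Docling releases.
    from docling.pipeline.threaded_standard_pdf_pipeline import (
        ThreadedStandardPdfPipeline,
    )

    pipeline_options = ThreadedPdfPipelineOptions(
        accelerator_options=AcceleratorOptions(device=device),
        ocr_batch_size=4,
        layout_batch_size=64,
        table_batch_size=4,
    )
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_cls=ThreadedStandardPdfPipeline,
                pipeline_options=pipeline_options,
            )
        }
    )


# Usage (hypothetical input file):
# converter = build_gpu_converter("cuda:0")
# result = converter.convert("report.pdf")
# print(result.document.export_to_markdown())
```

The batch sizes are the values from the example above. They are a starting point and may need lowering on GPUs with less memory.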

Via environment variable:
Set DOCLING_DEVICE=cuda (or cuda:0, cuda:1 for specific GPUs).
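
The same variable can also be set from Python, provided this happens before Docling builds its accelerator options. The sketch below shows one way to compose the value. The helper name set_docling_device is hypothetical, and restricting the index suffix to cuda follows the examples given above.

```python
import os
from typing import Optional


def set_docling_device(device: str = "cuda", index: Optional[int] = None) -> str:
    """Set DOCLING_DEVICE for this process and return the value written."""
    if index is not None:
        # The source only shows indexed names for CUDA (cuda:0, cuda:1).
        if device != "cuda":
            raise ValueError("a GPU index is only used with 'cuda'")
        if index < 0:
            raise ValueError("GPU index must be non-negative")
        device = f"cuda:{index}"
    os.environ["DOCLING_DEVICE"] = device
    return device


# Pin Docling to the second NVIDIA GPU.
set_docling_device("cuda", 1)
```

Setting the variable in the shell or in the container environment has the same effect and keeps device selection out of the application code.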

Supported devices:

  • cuda (NVIDIA GPUs; AMD GPUs also appear as cuda when PyTorch is a ROCm build)
  • mps (Apple Silicon)
  • xpu (Intel GPUs)
  • cpu (no GPU acceleration)
  • auto (automatic detection, default)
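
To see which of these devices a machine would offer, PyTorch can be probed directly. The sketch below is not Docling's own detection code. The function name detect_device and the probe order are assumptions, and it falls back to cpu when PyTorch is not installed.

```python
def detect_device() -> str:
    """Return the first accelerator PyTorch reports as available."""
    try:
        import torch
    except ImportError:
        # Without PyTorch no accelerator can be probed.
        return "cpu"

    # True for NVIDIA GPUs, and also for ROCm builds of PyTorch.
    if torch.cuda.is_available():
        return "cuda"

    # Apple Silicon backend; guarded for older PyTorch versions.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    # Intel GPU backend; only present in newer PyTorch versions.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"

    return "cpu"


print(detect_device())
```

If this prints cpu on a machine with a GPU, the usual causes are a missing driver or a CPU-only PyTorch build.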

Docker usage:

NVIDIA GPUs:
Use CUDA-enabled images such as ghcr.io/docling-project/docling-serve-cu128 (CUDA 12.8) or ghcr.io/docling-project/docling-serve-cu130 (CUDA 13.0). Both images are published for linux/amd64 and linux/arm64, so GPU acceleration also works on ARM64 systems with NVIDIA GPUs (such as NVIDIA Jetson devices or ARM-based cloud instances with GPU support). The Docker host must have the NVIDIA Container Toolkit installed.

AMD GPUs:
Use ROCm-enabled images for AMD GPU support. Two ROCm variants are available:

  • ghcr.io/docling-project/docling-serve-rocm (ROCm 6.3)
  • ghcr.io/docling-project/docling-serve-rocm72 (ROCm 7.2)

Example Docker Compose configuration for AMD GPUs:

services:
  docling-serve:
    image: ghcr.io/docling-project/docling-serve-rocm72:main
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    environment:
      ## Enable experimental Flash/Mem-Efficient attention kernels on AMD GPU (aotriton)
      TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL: "1"
      ## Optional: for older cards
      # HSA_OVERRIDE_GFX_VERSION: "11.0.0"
      # HSA_ENABLE_SDMA: "0"

See docs/deploy-examples/compose-amd.yaml for a complete working example.
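
The Compose file maps /dev/kfd and /dev/dri into the container, so both must exist on the host. A small preflight check, sketched here with a hypothetical helper name, can catch a missing ROCm driver before the container starts.

```python
import os
from typing import Iterable, List

# Device nodes mapped in the Compose example above.
ROCM_DEVICE_NODES = ("/dev/kfd", "/dev/dri")


def missing_device_nodes(paths: Iterable[str] = ROCM_DEVICE_NODES) -> List[str]:
    """Return the paths from `paths` that do not exist on this host."""
    return [path for path in paths if not os.path.exists(path)]


missing = missing_device_nodes()
if missing:
    print("Missing device nodes:", ", ".join(missing))
else:
    print("All ROCm device nodes are present.")
```

The check only confirms that the nodes exist. The user running Docker may also need permission to access them.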

Optional: Flash Attention 2
On NVIDIA GPUs of the Ampere generation or newer, you can enable Flash Attention 2 for higher speed and lower memory usage. Set cuda_use_flash_attention2=True in AcceleratorOptions, or set the environment variable DOCLING_CUDA_USE_FLASH_ATTENTION2=true.
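
Ampere corresponds to CUDA compute capability 8.0, which gives a simple test for whether the option applies. The sketch below sets the environment variable only when that test passes. Both function names are hypothetical, PyTorch is assumed for the probe, and only GPU 0 is checked.

```python
import os
from typing import Tuple


def is_ampere_or_newer(capability: Tuple[int, int]) -> bool:
    """Ampere GPUs report CUDA compute capability 8.0 or higher."""
    return capability >= (8, 0)


def enable_flash_attention2_if_supported() -> bool:
    """Set DOCLING_CUDA_USE_FLASH_ATTENTION2 when GPU 0 qualifies."""
    try:
        import torch
    except ImportError:
        return False
    # ROCm builds also answer through torch.cuda; skip them here because
    # the option described above targets NVIDIA GPUs.
    if getattr(torch.version, "hip", None):
        return False
    if not torch.cuda.is_available():
        return False
    if not is_ampere_or_newer(torch.cuda.get_device_capability(0)):
        return False
    os.environ["DOCLING_CUDA_USE_FLASH_ATTENTION2"] = "true"
    return True


print(is_ampere_or_newer((8, 6)))  # True: an Ampere-generation capability
print(is_ampere_or_newer((7, 5)))  # False: a Turing-generation capability
```

As with DOCLING_DEVICE, the variable has to be set before Docling creates its accelerator options.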

Requirements:

  • A compatible GPU (NVIDIA, AMD, Intel, or Apple Silicon)
  • Correct drivers installed
  • (For Docker with NVIDIA) NVIDIA Container Toolkit