Handling Models in Docling Serve

When enabling steps in Docling Serve that require extra models (such as picture classification, picture description, table detection, code recognition, formula extraction, or vision-language modules), you must ensure those models are available in the runtime environment. The standard container image includes only the default models. Any additional models must be downloaded and made available before use. If required models are missing, Docling Serve will raise runtime errors rather than downloading them automatically.

With Docling v2.56.x and later, EasyOCR models are now included in the default set of models downloaded by the auto-ocr feature. You no longer need to explicitly request EasyOCR models unless you are customizing the download set.

Model Storage Location

Docling Serve loads models from the directory specified by the DOCLING_SERVE_ARTIFACTS_PATH environment variable. This path must be consistent across model download and runtime. When running with multiple workers or reload enabled, you must use the environment variable rather than the CLI argument for configuration [source].
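
As a quick sanity check before starting the service, the artifacts directory can be inspected from Python. This is a minimal sketch, not part of Docling Serve: the helper name `check_artifacts_path` is hypothetical, and it only verifies that the variable points to a non-empty directory, not that any particular model is present.

```python
import os
from pathlib import Path


def check_artifacts_path(env=None):
    """Return (ok, message) for the directory named by DOCLING_SERVE_ARTIFACTS_PATH.

    Hypothetical helper: it checks only that the directory exists and is
    non-empty, not that specific models were downloaded into it.
    """
    env = os.environ if env is None else env
    value = env.get("DOCLING_SERVE_ARTIFACTS_PATH")
    if not value:
        return False, "DOCLING_SERVE_ARTIFACTS_PATH is unset or empty"
    path = Path(value)
    if not path.is_dir():
        return False, f"{path} is not a directory"
    entries = list(path.iterdir())
    if not entries:
        return False, f"{path} exists but contains no files"
    return True, f"{path} contains {len(entries)} entries"
```

Calling `check_artifacts_path()` with no arguments reads the current process environment, so running it inside the container shows what the service itself would see.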

Approaches for Making Extra Models Available

There are several ways to ensure required models are present:

1. Disable Local Models (Trigger Auto-Download)

You can configure the container to download all models at startup by clearing the artifacts path:

docker run -d --gpus all -p 5001:5001 --name docling-serve \
  -e DOCLING_SERVE_ARTIFACTS_PATH="" \
  -e DOCLING_SERVE_ENABLE_UI=true \
  quay.io/docling-project/docling-serve

This approach is simple for local development but not recommended for production, as it increases startup time and depends on network availability.

With auto-ocr enabled, EasyOCR models are now included in the default auto-download set.

2. Build a Custom Image with Pre-Downloaded Models

You can create a new image that includes the required models:

FROM quay.io/docling-project/docling-serve
RUN docling-tools models download smolvlm easyocr granite_chart_extraction granite_chart_extraction_v4 tableformerv2

This method is suitable for production, as it ensures all models are present in the image and avoids runtime downloads.

3. Update the Entrypoint to Download Models Before Startup

You can override the entrypoint to download models before starting the service:

podman run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=true \
  quay.io/docling-project/docling-serve \
  -- sh -c 'docling-tools models download smolvlm easyocr granite_chart_extraction granite_chart_extraction_v4 tableformerv2 && exec docling-serve run'

This is useful for environments where you want to keep the base image unchanged but still automate model preparation.

4. Mount a Volume with Pre-Downloaded Models

Download models locally and mount them into the container:

# Download the models locally
docling-tools models download --all -o models
# Or specify models explicitly, e.g.:
docling-tools models download smolvlm easyocr granite_chart_extraction granite_chart_extraction_v4 tableformerv2 -o models

# Start the container with the local models folder
podman run -p 5001:5001 \
  -v ./models:/opt/app-root/src/models \
  -e DOCLING_SERVE_ARTIFACTS_PATH="/opt/app-root/src/models" \
  -e DOCLING_SERVE_ENABLE_UI=true \
  quay.io/docling-project/docling-serve

This approach is robust for both local and production deployments, especially when using persistent storage.
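
When the download step is scripted rather than typed by hand, it can help to assemble the command from a list of model names. The sketch below is illustrative only: `build_download_command` is a hypothetical helper, and it uses just the `docling-tools models download` arguments that appear in the commands above (`--all`, explicit model names, and `-o`).

```python
import shlex


def build_download_command(models=(), output_dir=None, download_all=False):
    """Assemble a `docling-tools models download` invocation as an argument list.

    Hypothetical helper: it only builds the argument list and never runs it.
    """
    cmd = ["docling-tools", "models", "download"]
    # Either request every model or list the wanted ones explicitly.
    cmd += ["--all"] if download_all else list(models)
    if output_dir is not None:
        cmd += ["-o", output_dir]
    return cmd


command = build_download_command(["smolvlm", "easyocr"], output_dir="models")
print(shlex.join(command))  # docling-tools models download smolvlm easyocr -o models
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where `docling-tools` is installed.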

Kubernetes/Cluster Deployments

For Kubernetes or OpenShift clusters, the recommended approach is to use a PersistentVolumeClaim (PVC) for model storage, a Kubernetes Job to download models, and mount the volume into the deployment. This ensures models persist across pod restarts and scale-out scenarios [source].

Example: PersistentVolumeClaim

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: docling-model-cache-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi

Example: Model Download Job

apiVersion: batch/v1
kind: Job
metadata:
  name: docling-model-cache-load
spec:
  template:
    spec:
      containers:
        - name: loader
          image: ghcr.io/docling-project/docling-serve-cpu:main
          command:
            - docling-tools
            - models
            - download
            - '--output-dir=/modelcache'
            - 'layout'
            - 'tableformer'
            - 'tableformerv2'
            - 'code_formula'
            - 'picture_classifier'
            - 'smolvlm'
            - 'granite_vision'
            - 'granite_chart_extraction'
            - 'granite_chart_extraction_v4'
            - 'easyocr' # EasyOCR is included by default in auto-ocr, but can be specified explicitly if needed
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
      restartPolicy: Never

Example: Deployment with Mounted Volume

spec:
  template:
    spec:
      containers:
        - name: api
          env:
            - name: DOCLING_SERVE_ARTIFACTS_PATH
              value: '/modelcache'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc

The value of DOCLING_SERVE_ARTIFACTS_PATH must match the mount path where models are stored.
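
The matching rule can be checked mechanically once a manifest has been loaded into Python data structures. The following is a minimal sketch, assuming the container spec is already available as a dictionary (for example from a YAML parser); the function name `artifacts_path_matches_mount` is hypothetical.

```python
import posixpath


def artifacts_path_matches_mount(container):
    """Return True if DOCLING_SERVE_ARTIFACTS_PATH equals one of the container's mount paths."""
    env = {item["name"]: item.get("value") for item in container.get("env", [])}
    artifacts = env.get("DOCLING_SERVE_ARTIFACTS_PATH")
    if not artifacts:
        return False
    # Normalize so that "/modelcache/" and "/modelcache" compare equal.
    mounts = {
        posixpath.normpath(mount["mountPath"])
        for mount in container.get("volumeMounts", [])
    }
    return posixpath.normpath(artifacts) in mounts


# Container spec mirroring the Deployment example above.
container = {
    "name": "api",
    "env": [{"name": "DOCLING_SERVE_ARTIFACTS_PATH", "value": "/modelcache"}],
    "volumeMounts": [{"name": "docling-model-cache", "mountPath": "/modelcache"}],
}
print(artifacts_path_matches_mount(container))  # True
```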

Health and Readiness Probes

Docling Serve provides endpoints for health and readiness checks, which are particularly important when model loading is enabled:

- GET /ready (and alias /readyz): Returns 200 when the pod is ready to serve traffic, or 503 otherwise.
  - For LocalOrchestrator: gates on warm_up_caches() completion (ML model loading).
  - For RQOrchestrator: checks Redis connectivity; model readiness is reported immediately, because models are loaded in worker pods, not the API pod.
- GET /health (and alias /livez): Lightweight liveness check that returns 200 when the application is running.

When deploying to Kubernetes or OpenShift, configure probes to prevent traffic before models are loaded:

- startupProbe: Use /ready to allow slow model loading without killing the pod
- readinessProbe: Use /ready to gate traffic on actual readiness
- livenessProbe: Use /health or /livez for lightweight checks without dependency verification

This is especially important when load_models_at_boot=true.
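
Outside Kubernetes, for example in a local script or a CI job, the same gating can be approximated by polling /ready until it returns 200. This is a sketch under stated assumptions: the helper names `fetch_status` and `wait_until_ready` are hypothetical, and the base URL is whatever address the service is published on (port 5001 in the container commands above).

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=2.0):
    """Return the HTTP status code of a GET request, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the service is not ready yet
    except OSError:
        return None  # connection refused, DNS failure, timeout, ...


def wait_until_ready(base_url, attempts=30, period=10.0, fetch=fetch_status, sleep=time.sleep):
    """Poll GET /ready; return the attempt number that saw 200, or None if none did."""
    for attempt in range(1, attempts + 1):
        if fetch(f"{base_url}/ready") == 200:
            return attempt
        if attempt < attempts:
            sleep(period)
    return None
```

A call such as `wait_until_ready("http://localhost:5001")` blocks until the service reports ready or the attempts are exhausted. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.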

Example: Probe Configuration in Deployment

spec:
  template:
    spec:
      containers:
        - name: api
          startupProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 30
            timeoutSeconds: 2
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            periodSeconds: 5
            timeoutSeconds: 2
            successThreshold: 1
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 10
            timeoutSeconds: 2
            successThreshold: 1
            failureThreshold: 3
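
To judge whether the startup probe leaves enough time for model loading, the approximate time budget can be computed from its settings. This is a rough estimate based on general Kubernetes probe semantics (an initial delay, then up to failureThreshold failed checks spaced periodSeconds apart); it ignores per-request timeouts and scheduling jitter, and the function name is hypothetical.

```python
def startup_budget_seconds(initial_delay, period, failure_threshold):
    """Approximate seconds a container has to pass its startup probe before being restarted."""
    return initial_delay + period * failure_threshold


# Values taken from the startupProbe in the example above.
print(startup_budget_seconds(initial_delay=5, period=10, failure_threshold=30))  # 305
```

If loading the selected models takes longer than this on the target hardware, raising failureThreshold or periodSeconds extends the budget.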

Local Docker Execution

For local Docker or Podman execution, you can use any of the approaches above. Mounting a local directory with pre-downloaded models is the most reliable for repeated runs and avoids network dependencies. EasyOCR models are included by default in auto-ocr workflows.

Troubleshooting and Best Practices

- If a required model is missing from the artifacts path, Docling Serve will raise a runtime error.
- Always ensure the value of DOCLING_SERVE_ARTIFACTS_PATH matches the directory where models are stored and mounted.
- For multi-worker or reload scenarios, use the environment variable, not the CLI argument, to set the artifacts path.
- For production and cluster environments, prefer persistent storage and pre-loading models via a dedicated job.
- EasyOCR models are now included by default in auto-ocr; explicit inclusion is only needed for custom workflows.
- Use the /ready endpoint for startupProbe and readinessProbe to prevent traffic before models are loaded and dependencies are available.
- For chart extraction, users can choose between 'granite_chart_extraction' (GraniteVision 3.3: ibm-granite/granite-vision-3.3-2b-chart2csv-preview) and 'granite_chart_extraction_v4' (GraniteVision 4.0: ibm-granite/granite-4.0-3b-vision). The V4 model is the default when do_chart_extraction=True.
- For table structure extraction, users can choose between 'tableformer' (TableFormer V1), 'tableformerv2' (TableFormer V2), and 'granite_vision_table' (Granite Vision VLM). The Granite Vision model uses VLM technology for OTSL-based table structure recognition.

For more details and YAML manifest examples, see the pre-loading models documentation and deployment documentation.