To mount a PVC in docling-serve on OpenShift for EasyOCR models, you need to edit the Deployment manifest directly (there is no Operator-specific CRD for this). Add the PVC as a volume and mount it in the container, then set the DOCLING_SERVE_ARTIFACTS_PATH environment variable to the mount path. Here is an example snippet:
spec:
  template:
    spec:
      containers:
        - name: api
          env:
            - name: DOCLING_SERVE_ARTIFACTS_PATH
              value: '/modelcache'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
          startupProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 30
            timeoutSeconds: 2
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            periodSeconds: 5
            timeoutSeconds: 2
            successThreshold: 1
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 30
            timeoutSeconds: 2
            failureThreshold: 3
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
Your PVC should be defined like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: docling-model-cache-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
Ensure your EasyOCR models are present in /modelcache (or your chosen path) and the directory structure matches what docling expects. If a required model is missing, docling-serve will raise a runtime error. It's recommended to preload models into the PVC using a Kubernetes Job before starting docling-serve. For more details, see the official docling-serve documentation.
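The preloading step could look roughly like the Job below. Treat it as a sketch under stated assumptions rather than a verified manifest: the Job name, the container image and tag, and the list of model groups are illustrative, and it assumes the docling-tools command from docling is available in the image you use. It reuses the docling-model-cache-pvc claim and the /modelcache path from the examples above.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: docling-model-cache-load            # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: loader
          # Assumption: substitute the same image your Deployment runs.
          image: ghcr.io/docling-project/docling-serve-cpu:main
          command:
            - docling-tools
            - models
            - download
            - --output-dir=/modelcache      # must match DOCLING_SERVE_ARTIFACTS_PATH
            # Assumed model groups; list the ones your pipeline actually needs.
            - layout
            - tableformer
            - easyocr
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
```

Because the PVC is ReadWriteOnce, it is simplest to let the Job finish (oc get jobs) before starting the docling-serve Deployment, so the two pods do not compete for the volume.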
Health probe configuration
The deployment manifest includes health probes to ensure docling-serve is fully ready before accepting traffic:
- startupProbe (/ready): Allows sufficient time for model loading during pod startup. With failureThreshold: 30 and periodSeconds: 10, the pod has up to 5 minutes to complete initialization, which is critical when preloading large EasyOCR models from the PVC. This prevents Kubernetes from killing the pod during the extended startup time required for model loading.
- readinessProbe (/ready): Gates traffic on actual readiness. The /ready endpoint returns 200 only after model loading completes (when load_models_at_boot is enabled). This eliminates timeout errors during rollouts by ensuring the pod doesn't receive requests until models are fully loaded.
- livenessProbe (/health): Lightweight liveness check that verifies the API is responsive without checking model or dependency status. Used to detect and restart crashed pods.
Metrics endpoint configuration
docling-serve supports serving Prometheus metrics on a separate port from the main API via the DOCLING_SERVE_METRICS_PORT environment variable. When set, this starts a dedicated HTTP server on the specified port that serves the /metrics endpoint. This is useful for production deployments where you want to expose metrics on a different port with separate network policies.
Example deployment configuration:
spec:
  template:
    spec:
      containers:
        - name: api
          env:
            - name: DOCLING_SERVE_METRICS_PORT
              value: "9090"
          ports:
            - name: http
              containerPort: 5000
              protocol: TCP
            - name: metrics
              containerPort: 9090
              protocol: TCP
With this configuration, the main API remains accessible on port 5000 while Prometheus can scrape metrics from port 9090. This allows you to apply different network policies or service configurations for API traffic versus monitoring traffic.
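To make both ports reachable inside the cluster, a Service along the following lines could sit in front of the pod. This is a minimal sketch: the Service name and the app: docling-serve selector label are assumptions and must match the labels on your own pod template. The targetPort values refer to the named container ports (http and metrics) from the Deployment snippet.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: docling-serve            # hypothetical name
spec:
  selector:
    app: docling-serve           # assumed pod label; adjust to your Deployment
  ports:
    - name: http
      port: 5000
      targetPort: http           # named container port for the main API
      protocol: TCP
    - name: metrics
      port: 9090
      targetPort: metrics        # named container port for Prometheus scraping
      protocol: TCP
```

Keeping the two ports named separately lets a NetworkPolicy or a monitoring resource target the metrics port without also opening the API port.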
Troubleshooting tips:
- Ensure your PVC is Bound and healthy (oc get pvc).
- Inspect pod events and logs for mount errors (oc describe pod ...).
- Confirm the mount path in volumeMounts matches DOCLING_SERVE_ARTIFACTS_PATH.
- Check permissions on the PVC (use fsGroup or an initContainer to set permissions if needed).
- Make sure the PVC and pod are in the same namespace.
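For the permissions tip, a pod-level securityContext is one way to set fsGroup. The fragment below is only a sketch: the GID 1001 is a hypothetical value, and on OpenShift the security context constraints applied to the pod may assign an fsGroup automatically or reject values outside the range allowed for the namespace, so check what your cluster permits before setting it explicitly.

```yaml
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1001            # hypothetical GID; must be allowed by the pod's SCC
      containers:
        - name: api
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
```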