Azure Container Apps is the better option for running Docling.
- `docling-serve` is designed as a containerized service and runs natively in Azure Container Apps.
- Container Apps allow easy model management via persistent volumes or custom images with pre-downloaded models. Environment variable configuration (such as `DOCLING_SERVE_ARTIFACTS_PATH`) works seamlessly.
- GPU acceleration is possible using CUDA-enabled images.
- In contrast, Azure Function App (Windows Premium) is not designed for containerized workloads, has limited local storage (making model file management difficult), and does not support GPU acceleration. Function Apps also have timeout and memory constraints, which are problematic for resource-intensive workloads like Docling.
Deployment tip: Build a custom image with pre-downloaded models, or mount them via persistent storage, to avoid startup delays. Set `UVICORN_WORKERS=1` for stability and monitor memory usage for long-running sessions.
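The custom-image approach can be sketched as a Dockerfile like the one below. This is a sketch, not a verified recipe: the base image name/tag and the `docling-tools models download` output flag are assumptions, so check them against the current Docling documentation before building.

```dockerfile
# Sketch: bake model artifacts into the image so cold starts don't download them.
# Assumed base image name/tag; verify against the docling-serve releases.
FROM ghcr.io/docling-project/docling-serve:latest

# Pre-download model artifacts at build time (flag assumed; see docling-tools docs).
RUN docling-tools models download -o /opt/docling-models

# Point docling-serve at the baked-in artifacts.
ENV DOCLING_SERVE_ARTIFACTS_PATH=/opt/docling-models

# Single worker for stability, per the tip above.
ENV UVICORN_WORKERS=1
```

The trade-off is a larger image in exchange for predictable startup time, which also makes the startup probe budget below easier to tune.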
Health probe configuration: Configure probes to account for model loading so that traffic is only routed once the service is ready:
- startupProbe and readinessProbe: Use the `/ready` endpoint, which returns 200 only when model loading completes (LocalOrchestrator) or Redis is reachable (RQOrchestrator). This prevents traffic from being routed before the service can handle requests. The `/readyz` alias is also available for Kubernetes-convention compatibility.
- livenessProbe: Use the `/health` or `/livez` endpoint for lightweight checks that don't gate on dependencies.
- Using `/ready` for readiness checks specifically addresses startup delay issues by ensuring pods aren't marked ready until models are loaded and dependencies are available.
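In Azure Container Apps, these probes go in the container spec of the app's template. A minimal sketch follows; the image reference is a placeholder, port 5001 is assumed to be docling-serve's listen port, and the timing values are illustrative rather than tuned:

```yaml
# Sketch of a Container Apps container spec (template section of the app resource).
containers:
  - name: docling-serve
    image: <your-image>          # placeholder; use your registry image
    probes:
      - type: Startup
        httpGet:
          path: /ready
          port: 5001             # assumed docling-serve port; adjust if overridden
        periodSeconds: 5
        failureThreshold: 30     # generous budget for model loading
      - type: Readiness
        httpGet:
          path: /ready
          port: 5001
        periodSeconds: 10
      - type: Liveness
        httpGet:
          path: /health          # lightweight check; doesn't gate on dependencies
          port: 5001
        periodSeconds: 30
```

The high `failureThreshold` on the startup probe is what absorbs slow first-time model loading; once startup succeeds, the readiness probe takes over routing decisions.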
For more details, see the referenced model management guide and GPU support documentation.