To set up and run docling-serve on a MacBook Pro using Docker, follow these steps:
- Install Docker: Use the Apple Silicon (ARM64) version if you have an M1/M2 MacBook Pro.
- Download Required Models Locally:

      pip install docling-tools
      docling-tools models download --all -o models

  This creates a `models` directory with all necessary models.
- Increase Docker Memory Allocation: In Docker Desktop, go to Preferences > Resources > Memory and allocate at least 8GB for stability.
- Run the Docker Container:

      docker run -p 5001:5001 \
        -v $(pwd)/models:/opt/app-root/src/models \
        -e DOCLING_SERVE_ARTIFACTS_PATH="/opt/app-root/src/models" \
        -e DOCLING_SERVE_ENABLE_UI=true \
        -e UVICORN_WORKERS=1 \
        quay.io/docling-project/docling-serve

  - `UVICORN_WORKERS=1` is important for stability on macOS.
  - To use more CPU threads for a single document, add `-e DOCLING_NUM_THREADS=4` (or up to your CPU core count).
- Access the UI: Open http://localhost:5001 in your browser.
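Before opening the browser, it can help to confirm that the container is actually accepting connections. The following is a minimal Python sketch, assuming only that the server is published on `localhost:5001` as in the `docker run` command above. The function name `wait_for_port` and the timeout values are illustrative and not part of docling-serve.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 5001,
                  timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves the port is open,
            # not that every model has finished loading.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` right after starting the container returns `True` as soon as port 5001 is reachable. If it returns `False`, the container logs are the next place to look.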
Performance:
- On a modern MacBook Pro (M1/M2, 12 cores, 32GB RAM), expect to process about 5 PDFs (127 pages, 6.2MB total) in roughly 10 minutes (CPU-only). Increasing CPU cores, RAM, or worker count does not significantly improve throughput due to Python's concurrency limitations.
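For rough capacity planning, the figure above works out to about 12.7 pages per minute. The sketch below scales that single data point linearly. This is an assumption: real throughput varies with page complexity, OCR, and hardware. The function name and defaults are illustrative.

```python
# Reference measurement quoted above: 127 pages in roughly 10 minutes (CPU-only).
REF_PAGES = 127
REF_MINUTES = 10.0


def estimate_minutes(pages: int, ref_pages: int = REF_PAGES,
                     ref_minutes: float = REF_MINUTES) -> float:
    """Linearly extrapolate processing time from one reference measurement."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    pages_per_minute = ref_pages / ref_minutes
    return pages / pages_per_minute
```

Under this linear assumption, `estimate_minutes(1000)` comes out to about 79 minutes, which is a quick way to judge whether a workload is realistic for a single laptop.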
Stability and Troubleshooting:
- Use ARM64 images and ensure all models are present to avoid runtime errors.
- Memory management has been improved through the integration of mimalloc (a high-performance memory allocator), which significantly reduces memory growth during document processing. While earlier versions experienced memory leaks, this optimization addresses those concerns. Still, monitor memory usage for long-running sessions and restart the container if you observe unusual resource consumption.

  Memory Debugging Endpoints:

  Docling Serve provides built-in memory debugging endpoints to help diagnose memory issues in long-running sessions or containerized environments:

  - `/v1/memory/stats` - Returns memory statistics including RSS, anonymous memory, file-backed memory, slab memory, and cgroup total memory (in MB)
  - `/v1/memory/counts` - Returns garbage collection statistics, object counts, asyncio task counts, and the top 20 most common object types

  These endpoints are controlled by the `DOCLING_SERVE_ENABLE_MANAGEMENT_ENDPOINTS` environment variable (defaults to `false`). When disabled, these endpoints return a 403 Forbidden error.

  To enable the memory debugging endpoints, add the environment variable when running Docker:

      docker run -p 5001:5001 \
        -v $(pwd)/models:/opt/app-root/src/models \
        -e DOCLING_SERVE_ARTIFACTS_PATH="/opt/app-root/src/models" \
        -e DOCLING_SERVE_ENABLE_UI=true \
        -e DOCLING_SERVE_ENABLE_MANAGEMENT_ENDPOINTS=true \
        -e UVICORN_WORKERS=1 \
        quay.io/docling-project/docling-serve

  Access the endpoints using curl:

      curl http://localhost:5001/v1/memory/stats
      curl http://localhost:5001/v1/memory/counts

- For file download issues in the UI, set `DOCLING_SERVE_SCRATCH_PATH` and optionally `GRADIO_TEMP_DIR` to a writable volume. Adjust Gradio cache settings if you want downloads to remain available longer.
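For long-running sessions, the memory debugging endpoints described above can also be queried from a script. The following is a minimal Python sketch, assuming the endpoints return JSON and the server is reachable at `http://localhost:5001`. The helper names `fetch_memory_report` and `describe` are hypothetical, and because the response schema is not specified here, the body is printed as-is rather than parsed into named fields.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:5001"  # matches the -p 5001:5001 mapping used above


def fetch_memory_report(path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Fetch one management endpoint and return (status_code, parsed_body_or_None)."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Assumption: a successful response carries a JSON body.
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # 403 is the documented response when management endpoints are disabled.
        return err.code, None


def describe(status: int, body) -> str:
    """Turn a (status, body) pair into a short human-readable report."""
    if status == 403:
        return "403 Forbidden: set DOCLING_SERVE_ENABLE_MANAGEMENT_ENDPOINTS=true"
    if status != 200 or body is None:
        return f"unexpected HTTP status {status}"
    # Schema is not assumed; dump whatever the server returned.
    return json.dumps(body, indent=2, sort_keys=True)
```

A typical call is `print(describe(*fetch_memory_report("/v1/memory/stats")))`. Running it periodically and comparing the output over time is one way to spot the memory growth that would justify a container restart.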
For large workloads or high throughput, docling-serve is designed for distributed/cloud setups rather than a single MacBook Pro.