How can I set up and run docling-serve on a MacBook Pro using Docker, and what performance and stability considerations should I be aware of?
Type: Answer
Status: Published
Created: Feb 2, 2026
Updated: Feb 24, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

To set up and run docling-serve on a MacBook Pro using Docker, follow these steps:

  1. Install Docker Desktop: Use the Apple Silicon (ARM64) build if your MacBook Pro has an M-series chip (M1, M2, or later).

  2. Download Required Models Locally:

    pip install docling  # the docling-tools CLI is installed with the docling package
    docling-tools models download --all -o models
    

    This creates a models directory with all necessary models.

  3. Increase Docker Memory Allocation: In Docker Desktop, go to Settings (Preferences in older versions) > Resources > Memory and allocate at least 8 GB for stability.

  4. Run the Docker Container:

    docker run -p 5001:5001 \
      -v "$(pwd)/models:/opt/app-root/src/models" \
      -e DOCLING_SERVE_ARTIFACTS_PATH="/opt/app-root/src/models" \
      -e DOCLING_SERVE_ENABLE_UI=true \
      -e UVICORN_WORKERS=1 \
      quay.io/docling-project/docling-serve
    
    • UVICORN_WORKERS=1 is important for stability on macOS.
    • To use more CPU threads for a single document, add -e DOCLING_NUM_THREADS=4 (or up to your CPU core count).
  5. Access the UI: Open http://localhost:5001/ui in your browser.
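
Before running the container in step 4, it can help to confirm that the models directory is populated and to see how much memory Docker actually has. The following is a minimal sketch and not part of docling-serve: the helper names `models_ready` and `bytes_to_gib` are hypothetical. The only Docker feature it relies on is `docker info --format '{{.MemTotal}}'`, which prints the memory available to Docker in bytes.

```shell
#!/usr/bin/env bash
# Pre-flight helpers (hypothetical names, not part of docling-serve).

# Succeeds only if the directory exists and contains at least one entry.
models_ready() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Converts a byte count to GiB with one decimal place.
bytes_to_gib() {
  awk -v bytes="$1" 'BEGIN { printf "%.1f\n", bytes / (1024 * 1024 * 1024) }'
}

# Example usage (requires a running Docker daemon):
#   models_ready ./models || echo "models directory is missing or empty" >&2
#   bytes_to_gib "$(docker info --format '{{.MemTotal}}')"
```

The value Docker reports may be somewhat lower than the figure set in Docker Desktop, so treat it as an approximate check rather than an exact comparison.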

Performance:

  • On a modern MacBook Pro (M1/M2, 12 cores, 32 GB RAM), expect CPU-only processing of about 5 PDFs (127 pages, 6.2 MB total) to take roughly 10 minutes. Adding CPU cores, RAM, or workers does not significantly improve throughput because of Python's concurrency limitations.
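
To put the benchmark above in per-page terms, the arithmetic can be sketched as follows. The function names are illustrative, and the figures are the approximate ones quoted above, so the results are rough estimates only.

```shell
#!/usr/bin/env bash
# Rough throughput arithmetic for the benchmark quoted above.

# Pages processed per minute, one decimal place.
pages_per_minute() {
  awk -v pages="$1" -v minutes="$2" 'BEGIN { printf "%.1f\n", pages / minutes }'
}

# Average seconds spent per page, one decimal place.
seconds_per_page() {
  awk -v pages="$1" -v minutes="$2" 'BEGIN { printf "%.1f\n", minutes * 60 / pages }'
}

pages_per_minute 127 10   # prints 12.7
seconds_per_page 127 10   # prints 4.7
```

At roughly 4.7 seconds per page, a 500-page batch would take on the order of 40 minutes on comparable hardware, assuming throughput stays constant.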

Stability and Troubleshooting:

  • Use ARM64 images and ensure all models are present to avoid runtime errors.

  • Memory management has improved with the integration of mimalloc, a high-performance memory allocator, which significantly reduces memory growth during document processing. Earlier versions experienced memory leaks, which this change addresses. Even so, monitor memory usage during long-running sessions and restart the container if you observe unusual resource consumption.

    Memory Debugging Endpoints:

    Docling Serve provides built-in memory debugging endpoints to help diagnose memory issues in long-running sessions or containerized environments:

    • /v1/memory/stats - Returns memory statistics including RSS, anonymous memory, file-backed memory, slab memory, and cgroup total memory (in MB)
    • /v1/memory/counts - Returns garbage collection statistics, object counts, asyncio task counts, and the top 20 most common object types

    These endpoints are controlled by the DOCLING_SERVE_ENABLE_MANAGEMENT_ENDPOINTS environment variable (defaults to false). When disabled, these endpoints return a 403 Forbidden error.

    To enable the memory debugging endpoints, add the environment variable when running Docker:

    docker run -p 5001:5001 \
      -v "$(pwd)/models:/opt/app-root/src/models" \
      -e DOCLING_SERVE_ARTIFACTS_PATH="/opt/app-root/src/models" \
      -e DOCLING_SERVE_ENABLE_UI=true \
      -e DOCLING_SERVE_ENABLE_MANAGEMENT_ENDPOINTS=true \
      -e UVICORN_WORKERS=1 \
      quay.io/docling-project/docling-serve
    

    Access the endpoints using curl:

    curl http://localhost:5001/v1/memory/stats
    curl http://localhost:5001/v1/memory/counts
    
  • For file download issues in the UI, set DOCLING_SERVE_SCRATCH_PATH and optionally GRADIO_TEMP_DIR to a writable volume. Adjust Gradio cache settings if you want downloads to remain available longer.
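
The scratch-path setting above can be sketched as a bind mount. This is an illustration under assumptions: the host directory `scratch`, the container path `/opt/app-root/src/scratch`, and the wrapper name `run_with_scratch` are hypothetical choices, not values taken from the docling-serve documentation. Here `GRADIO_TEMP_DIR` simply points at the same writable mount.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: start docling-serve with a writable scratch volume.
run_with_scratch() {
  local scratch_dir="${1:-$(pwd)/scratch}"               # host directory (assumed name)
  local container_scratch="/opt/app-root/src/scratch"    # assumed container path

  # Create the host directory so the bind mount points at something writable.
  mkdir -p "$scratch_dir"

  docker run -p 5001:5001 \
    -v "$(pwd)/models:/opt/app-root/src/models" \
    -v "$scratch_dir:$container_scratch" \
    -e DOCLING_SERVE_ARTIFACTS_PATH="/opt/app-root/src/models" \
    -e DOCLING_SERVE_SCRATCH_PATH="$container_scratch" \
    -e GRADIO_TEMP_DIR="$container_scratch" \
    -e DOCLING_SERVE_ENABLE_UI=true \
    -e UVICORN_WORKERS=1 \
    quay.io/docling-project/docling-serve
}
```

Calling `run_with_scratch` with no argument uses `./scratch`; pass a different host path as the first argument if needed.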

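For long-running sessions, the /v1/memory/stats endpoint described earlier in this section can be sampled periodically so that memory growth is visible over time. The following is a minimal sketch, assuming the container was started with DOCLING_SERVE_ENABLE_MANAGEMENT_ENDPOINTS=true. The helper names `stamp_lines` and `sample_memory` are hypothetical, and the response body is logged as-is because its exact JSON layout is not assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for logging memory statistics over time.

# Prefixes each line read from stdin with a UTC timestamp.
stamp_lines() {
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
  done
}

# Fetches /v1/memory/stats once; -f makes curl report HTTP errors such as 403.
sample_memory() {
  local base_url="${1:-http://localhost:5001}"
  curl -fsS "$base_url/v1/memory/stats" | stamp_lines
}

# Example: sample every 60 seconds and append to a log file.
#   while true; do sample_memory >> memory.log; sleep 60; done
```

If the logged RSS value keeps climbing across many samples, that is a reasonable signal to restart the container.
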
For large workloads or high throughput, docling-serve is designed for distributed/cloud setups rather than a single MacBook Pro.