Docling has built-in parallel processing support via configurable settings. Here's how to use it and what each parameter means:
## Enabling Parallel Processing
To process multiple documents in parallel:

```python
from docling.datamodel.settings import settings
from docling.document_converter import DocumentConverter

settings.perf.doc_batch_concurrency = 10  # process 10 docs in parallel
settings.perf.doc_batch_size = 10

converter = DocumentConverter()

# Use convert_all() for parallel processing
for result in converter.convert_all(list_of_documents):
    ...  # process each result as it completes
```
For large PDFs (splitting pages across workers):

```python
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.settings import settings

pipeline_options = PdfPipelineOptions()
pipeline_options.page_chunk_size = 50     # split into 50-page chunks
settings.perf.doc_batch_concurrency = 10  # process chunks in parallel
```
## Parameter Breakdown
| Parameter | Description |
|---|---|
| `page_chunk_size` | Splits a large PDF into chunks of N pages each. A 500-page PDF with `page_chunk_size=50` becomes 10 separate 50-page "mini-documents." Primarily helps with memory management by avoiding loading the entire document at once. |
| `doc_batch_concurrency` | How many documents (or chunks) are processed in parallel at the same time. Controls the degree of parallelism. |
| `doc_batch_size` | How many documents are grouped into each batch submitted to the pipeline. Controls batching granularity. |
## How They Work Together
- `page_chunk_size` splits a large PDF into N-page chunks.
- `doc_batch_size` groups those chunks into batches.
- `doc_batch_concurrency` runs multiple batches/chunks simultaneously.
Example: a 500-page PDF with `page_chunk_size=50` and `doc_batch_concurrency=10` splits into 10 chunks that all process in parallel, with results streaming back as each chunk completes.
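To make that arithmetic concrete, here is a small standalone sketch (plain Python, no Docling required; the function name `plan_chunks` is illustrative, not part of the Docling API) that computes how many chunks a document yields and how many parallel "waves" are needed to process them:

```python
import math

def plan_chunks(total_pages: int, page_chunk_size: int, concurrency: int):
    """Return (number of chunks, number of parallel processing waves)."""
    chunks = math.ceil(total_pages / page_chunk_size)
    waves = math.ceil(chunks / concurrency)
    return chunks, waves

# A 500-page PDF in 50-page chunks with concurrency 10:
# all 10 chunks fit in a single parallel wave.
print(plan_chunks(500, 50, 10))   # (10, 1)

# A 1000-page PDF with the same settings needs two waves.
print(plan_chunks(1000, 50, 10))  # (20, 2)
```

The takeaway: raising `doc_batch_concurrency` only helps until it matches the number of chunks; beyond that, extra concurrency is idle capacity.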
Important: when using `page_chunk_size`, you must use `convert_all()` instead of `convert()`; the latter only returns the first chunk.
## For docling-serve (REST API)
- `UVICORN_WORKERS`: configures the number of worker processes
- `DOCLING_NUM_THREADS`: configures CPU threads per document
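As a sketch of how these would be used, assuming a typical environment-variable-based deployment (the specific values of 4 workers and 8 threads are illustrative, not recommendations):

```shell
# Illustrative values: 4 Uvicorn worker processes,
# each allowed 8 CPU threads per document
export UVICORN_WORKERS=4
export DOCLING_NUM_THREADS=8
docling-serve run
```

Total CPU demand is roughly workers × threads, so size these against the cores available on the host.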