Docling has built-in parallel processing support via configurable settings. Here's how to use it and what each parameter means:
## Enabling Parallel Processing
To process multiple documents in parallel:

```python
from docling.datamodel.settings import settings
from docling.document_converter import DocumentConverter

settings.perf.doc_batch_concurrency = 10  # process 10 docs in parallel
settings.perf.doc_batch_size = 10

converter = DocumentConverter()

# Use convert_all() for parallel processing
for result in converter.convert_all(list_of_documents):
    ...  # process each result as it completes
```
For large PDFs (splitting pages across workers):

```python
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.settings import settings

pipeline_options = PdfPipelineOptions()
pipeline_options.page_chunk_size = 50     # split into 50-page chunks
settings.perf.doc_batch_concurrency = 10  # process chunks in parallel
```
## Parameter Breakdown
| Parameter | Description |
|---|---|
| `page_chunk_size` | Splits a large PDF into chunks of N pages each. A 500-page PDF with `page_chunk_size=50` becomes 10 separate 50-page "mini-documents." Primarily helps with memory management by avoiding loading the entire document at once. |
| `doc_batch_concurrency` | How many documents (or chunks) are processed in parallel at the same time. Controls the degree of parallelism. |
| `doc_batch_size` | How many documents are grouped into each batch submitted to the pipeline. Controls batching granularity. |
## How They Work Together
- `page_chunk_size` splits a large PDF into N-page chunks.
- `doc_batch_size` groups those chunks into batches.
- `doc_batch_concurrency` runs multiple batches/chunks simultaneously.
Example: a 500-page PDF with `page_chunk_size=50` and `doc_batch_concurrency=10` splits into 10 chunks that all process in parallel, with results streaming back as each chunk completes.
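To make that arithmetic concrete, here is a small standalone sketch (plain Python, no Docling required; the function name `plan_chunks` is illustrative, not part of the Docling API) that computes how many chunks a document yields and how many parallel "waves" are needed to process them:

```python
import math

def plan_chunks(total_pages: int, page_chunk_size: int, concurrency: int):
    """Return (number of chunks, number of parallel processing waves)."""
    chunks = math.ceil(total_pages / page_chunk_size)
    waves = math.ceil(chunks / concurrency)
    return chunks, waves

# A 500-page PDF in 50-page chunks with concurrency 10:
# all 10 chunks fit in a single parallel wave.
print(plan_chunks(500, 50, 10))   # (10, 1)

# A 1000-page PDF with the same settings needs two waves.
print(plan_chunks(1000, 50, 10))  # (20, 2)
```

The takeaway: raising `doc_batch_concurrency` only helps until it matches the number of chunks; beyond that, extra concurrency is idle capacity.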
Important: when using `page_chunk_size`, you must use `convert_all()` instead of `convert()`; the latter only returns the first chunk.
## For docling-serve (REST API)
- `UVICORN_WORKERS`: configures the number of worker processes
- `DOCLING_NUM_THREADS`: configures CPU threads per document
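As a sketch of how these would be used, assuming a typical environment-variable-based deployment (the specific values of 4 workers and 8 threads are illustrative, not recommendations):

```shell
# Illustrative values: 4 Uvicorn worker processes,
# each allowed 8 CPU threads per document
export UVICORN_WORKERS=4
export DOCLING_NUM_THREADS=8
docling-serve run
```

Total CPU demand is roughly workers × threads, so size these against the cores available on the host.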