Yes, you can use a custom OCR model in Docling, but you must configure it according to the OCR engine you choose (EasyOCR, Tesseract, RapidOCR, or KServe v2). Each engine has its own options class where you can specify custom model or binary paths:
- EasyOCR: Set `model_storage_directory` in `EasyOcrOptions` to your custom model directory.

  ```python
  from docling.datamodel.pipeline_options import PdfPipelineOptions, EasyOcrOptions

  pipeline_options = PdfPipelineOptions(
      do_ocr=True,
      ocr_options=EasyOcrOptions(
          lang=["en", "ru"],
          model_storage_directory="/path/to/your/easyocr/models",
      ),
  )
  ```

- Tesseract: In `TesseractCliOcrOptions`, set `tesseract_cmd` to your custom Tesseract binary and `path` to your custom tessdata directory (it is passed to Tesseract as `--tessdata-dir`).

  ```python
  from docling.datamodel.pipeline_options import PdfPipelineOptions, TesseractCliOcrOptions

  pipeline_options = PdfPipelineOptions(
      do_ocr=True,
      ocr_options=TesseractCliOcrOptions(
          lang=["eng", "rus"],
          tesseract_cmd="/custom/path/to/tesseract",  # custom binary
          path="/custom/path/to/tessdata",  # custom language data directory
      ),
  )
  ```

- RapidOCR: Set `det_model_path`, `cls_model_path`, `rec_model_path`, etc., in `RapidOcrOptions` to your custom model files.
- KServe v2 (`KserveV2OcrOptions`, kind: `"kserve_v2_ocr"`): Connects to a remote KServe v2-compatible inference server (such as NVIDIA Triton) via gRPC or HTTP to perform OCR, without requiring local model downloads.

  ```python
  from docling.datamodel.pipeline_options import PdfPipelineOptions, KserveV2OcrOptions

  pipeline_options = PdfPipelineOptions()
  pipeline_options.do_ocr = True
  pipeline_options.ocr_options = KserveV2OcrOptions(
      url="localhost:8001",  # gRPC endpoint (host:port)
      model_name="ocr",  # model name registered on the server
      transport="grpc",  # "grpc" (default) or "http"
      lang=["english"],
      scale=2.0,
  )
  ```

  Key configuration fields: `url` (required, the endpoint), `transport` (`"grpc"` or `"http"`), `model_name`, `lang`, `scale`, and optionally TLS/auth settings (`grpc_use_tls`, `grpc_metadata`, `headers`, `timeout`).
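The RapidOCR entry above has no example, so here is a minimal sketch of one way to wire it up. The file names `det.onnx`, `cls.onnx`, and `rec.onnx` and both helper functions are hypothetical; only the `RapidOcrOptions` field names come from the list above. The path check is kept separate so a missing file fails early with a clear message.

```python
from pathlib import Path

# Hypothetical file names: replace them with the names of your own model files.
MODEL_FILES = {
    "det_model_path": "det.onnx",
    "cls_model_path": "cls.onnx",
    "rec_model_path": "rec.onnx",
}


def rapidocr_model_paths(model_dir: str) -> dict[str, str]:
    """Map RapidOcrOptions field names to files in model_dir, failing early if one is missing."""
    base = Path(model_dir)
    paths = {}
    for field, filename in MODEL_FILES.items():
        candidate = base / filename
        if not candidate.is_file():
            raise FileNotFoundError(f"{field}: expected a model file at {candidate}")
        paths[field] = str(candidate)
    return paths


def build_pipeline_options(model_dir: str):
    """Build PdfPipelineOptions that point RapidOCR at the files found in model_dir."""
    # Imported lazily so the path check above can be used without Docling installed.
    from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions

    return PdfPipelineOptions(
        do_ocr=True,
        ocr_options=RapidOcrOptions(**rapidocr_model_paths(model_dir)),
    )
```

Calling `build_pipeline_options("/path/to/your/rapidocr/models")` would then give you a `pipeline_options` object to use like the ones in the other examples.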
You then pass `pipeline_options` to your `DocumentConverter` as usual.
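For completeness, a sketch of that last step, assuming PDF input and the EasyOCR options from earlier. The helper name `build_converter`, the model directory, and `document.pdf` are placeholders.

```python
def build_converter(model_dir: str):
    """Return a DocumentConverter whose PDF pipeline uses EasyOCR models from model_dir."""
    # Imported here so that importing this module does not pull in Docling's heavy dependencies.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import EasyOcrOptions, PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions(
        do_ocr=True,
        ocr_options=EasyOcrOptions(lang=["en"], model_storage_directory=model_dir),
    )
    # Pipeline options are attached per input format, here for PDFs only.
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
    )


if __name__ == "__main__":
    converter = build_converter("/path/to/your/easyocr/models")  # placeholder directory
    result = converter.convert("document.pdf")  # placeholder input file
    print(result.document.export_to_markdown())
```

Any of the other engine-specific option objects can be swapped in for `EasyOcrOptions` without changing the converter setup.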
If you want to use a completely custom OCR engine beyond the built-in options above, you can implement a plugin following Docling's plugin system. See this example and discussion for details.
For all engines, if your custom model files are not in the default locations, make sure your directory structure matches what the engine expects. If you want Docling to use local models for offline use, set the `DOCLING_SERVE_ARTIFACTS_PATH` environment variable to the parent directory containing all model subfolders. See this guide for details.
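As a rough illustration of the offline setup, the sketch below checks that the artifacts directory exists, sets the variable for the current process, and lists the model subfolders it found. The helper name `use_local_artifacts` is hypothetical, and the expected subfolder names depend on the engine. Setting the variable from Python only affects the current process and its children, and only if this runs before Docling reads it, so for a long-running service you would normally set it in the environment that launches the service.

```python
import os
from pathlib import Path


def use_local_artifacts(artifacts_dir: str) -> list[str]:
    """Point DOCLING_SERVE_ARTIFACTS_PATH at artifacts_dir and return its model subfolders."""
    root = Path(artifacts_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"artifacts directory not found: {root}")
    # Applies to this process and to anything it starts afterwards.
    os.environ["DOCLING_SERVE_ARTIFACTS_PATH"] = str(root)
    # Listing the subfolders makes it easy to compare against what the engine expects.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```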
You cannot set a generic "custom OCR model path" directly in `PdfPipelineOptions`; you must use the engine-specific options as shown above.