What is the recommended architecture and implementation for processing a mixed PDF bundle (digital + scanned pages) for an enterprise financial payment verification system (like MRT Jakarta's AI-Augmented Payment Flow), with constraints of no GPU and limited cost?
Type: Answer
Status: Published
Created: May 23, 2026
Updated: May 23, 2026
Created by: Dosu Bot
Updated by: Dosu Bot
Recommended Architecture for Mixed PDF Processing (No GPU, Enterprise Financial)
Requester uploads PDF bundle
↓
PyMuPDF split by keyword (< 1 second)
↓
Docling extract per logical document (CPU, ~6s/scanned page)
↓
Structured JSON/Markdown per document type
↓
AI Agent (OpenAI) for business logic verification
↓
Recommendation → Human validation
Docling automatically handles mixed PDFs: it distinguishes digital from scanned content page by page. Digital pages use native text extraction, and scanned pages trigger OCR automatically. No manual per-page configuration is needed.
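Split the PDF bundle first using simple rule-based keyword detection rather than AI classifiers. Financial and government documents follow consistent formats, so a fixed keyword near the top of a page is a reliable marker for where a new document starts.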
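The keyword-based split can be sketched as below. This is a minimal illustration, not a definitive implementation: the keyword map, the 300-character header window, and the function names `find_segments` and `split_bundle` are assumptions made for this example. It also assumes each logical document begins on a page with a text layer; a document whose first page is a scan has no extractable text and would be attached to the preceding segment.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword -> document type map; replace with the bundle's real headers.
BOUNDARY_KEYWORDS: Dict[str, str] = {
    "INVOICE": "invoice",
    "PURCHASE ORDER": "purchase_order",
    "BERITA ACARA": "handover_report",
}


def find_segments(page_texts: List[str]) -> List[Tuple[str, int, int]]:
    """Return (doc_type, first_page, last_page) ranges, 0-based and inclusive."""
    segments: List[list] = []
    for page_no, text in enumerate(page_texts):
        head = text[:300].upper()  # only look near the top of the page
        doc_type = next((t for kw, t in BOUNDARY_KEYWORDS.items() if kw in head), None)
        if doc_type is not None and (not segments or segments[-1][0] != doc_type):
            # A keyword for a different type starts a new logical document.
            segments.append([doc_type, page_no, page_no])
        elif segments:
            # No keyword (e.g. a scanned page) or a repeated header: extend the current one.
            segments[-1][2] = page_no
        else:
            # Leading pages before any keyword is seen.
            segments.append(["unknown", page_no, page_no])
    return [tuple(s) for s in segments]


def split_bundle(pdf_path: str, out_dir: str) -> List[str]:
    """Write one PDF per detected segment and return the output paths."""
    import pymupdf  # PyMuPDF; older releases expose the same API as `fitz`

    bundle = pymupdf.open(pdf_path)
    texts = [page.get_text() for page in bundle]
    outputs = []
    for i, (doc_type, first, last) in enumerate(find_segments(texts)):
        part = pymupdf.open()  # new, empty PDF
        part.insert_pdf(bundle, from_page=first, to_page=last)
        out_path = f"{out_dir}/{i:02d}_{doc_type}.pdf"
        part.save(out_path)
        part.close()
        outputs.append(out_path)
    bundle.close()
    return outputs
```

Keeping the boundary logic in `find_segments`, separate from file I/O, makes the rules easy to unit-test against plain strings. Two consecutive documents of the same type would be merged by this sketch, so a real rule set would likely also key on a document number.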
There is no need to combine PyMuPDF, EasyOCR, Tesseract, and a VLM for extraction. Docling handles OCR and native text internally. The VLM pipeline is not needed and is too costly without a GPU.
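Since Docling covers both native text and OCR, the extraction step can be a single call per split document. The sketch below uses Docling's default `DocumentConverter`; the function name `extract_document` and the shape of the returned record are assumptions for illustration, and the `converter` parameter exists only so one converter can be reused (or swapped for a stub in tests).

```python
from typing import Any, Dict, Optional


def extract_document(
    pdf_path: str, doc_type: str, converter: Optional[Any] = None
) -> Dict[str, str]:
    """Convert one logical document to Markdown and tag it with its type."""
    if converter is None:
        # Imported lazily so the module loads even where Docling is not installed.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()  # default pipeline
    result = converter.convert(pdf_path)
    return {
        "doc_type": doc_type,
        "source": pdf_path,
        "markdown": result.document.export_to_markdown(),
    }
```

In a batch run it is usually better to create one `DocumentConverter` up front and pass it to every call, because constructing a converter loads its models. The resulting records are the structured per-document input that the downstream verification agent consumes.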