Using Docling in Python on WSL2

What is Docling?

Docling is a free, open-source document conversion library (MIT License, copyright IBM) for GenAI applications. It supports PDF, DOCX, PPTX, XLSX, HTML, Markdown, LaTeX, images, and audio/video, with capabilities including:

Advanced layout and table extraction via deep learning models
OCR for scanned documents
Chunking optimized for RAG (Retrieval-Augmented Generation)
Export to Markdown, JSON, HTML, and plain text
Official integrations with LlamaIndex and LangChain

1. Prerequisites & WSL2 Setup

Enable WSL2 (PowerShell as Administrator)

wsl --install
wsl --set-default-version 2

Restart when prompted. `wsl --install` sets up Ubuntu by default; if no distribution is present afterwards, install Ubuntu from the Microsoft Store.

Install dependencies (Ubuntu)

sudo apt update && sudo apt upgrade -y
sudo apt install -y python3 python3-pip python3-venv \
    tesseract-ocr libtesseract-dev libleptonica-dev \
    pkg-config ffmpeg build-essential

Docling requires Python 3.9 or later; confirm the installed version with `python3 --version`.

Create a virtual environment

python3 -m venv ~/docling-env
source ~/docling-env/bin/activate
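
Before installing anything, it can help to confirm that the interpreter meets the version requirement and that the virtual environment is actually active. The following is a minimal sketch using only the standard library; `check_environment` is an illustrative name, not part of Docling, and the 3.9 floor is simply the requirement stated in this guide.

```python
import sys


def check_environment(min_version=(3, 9)):
    """Report whether the interpreter is new enough and runs inside a venv."""
    return {
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
        # The 3.9 floor is taken from the requirement stated in this guide
        "python_ok": sys.version_info[:2] >= min_version,
        # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `in_venv` is reported as `False`, the `source ~/docling-env/bin/activate` step probably did not run in the current shell.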

2. Installation

# Basic
pip install docling

# With optional extras
pip install "docling[tesserocr]" # High-quality OCR
pip install "docling[vlm]" # Vision-Language Models
pip install "docling[htmlrender]" # Headless HTML rendering
pip install "docling[asr]" # Audio/video transcription
pip install "docling[tesserocr,vlm,htmlrender,asr]" # Full install
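
To see which optional backends ended up importable after installing extras, a small probe along these lines may be useful. The module names in `OPTIONAL_MODULES` are assumptions about what each extra pulls in (for example, that `docling[asr]` provides a module named `whisper`); adjust them to match your installation. `probe_modules` is an illustrative helper, not a Docling function.

```python
from importlib.util import find_spec

# Assumed import names; these are guesses about what each extra installs
OPTIONAL_MODULES = {
    "core": "docling",
    "tesserocr": "tesserocr",
    "vlm": "transformers",
    "asr": "whisper",
}


def probe_modules(modules=None):
    """Return {label: bool} telling whether each top-level module can be found."""
    modules = OPTIONAL_MODULES if modules is None else modules
    return {label: find_spec(name) is not None for label, name in modules.items()}


if __name__ == "__main__":
    for label, available in probe_modules().items():
        print(f"{label:<10} {'available' if available else 'missing'}")
```

The probe only checks that a module can be located; it does not import it, so it will not catch a broken native library.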

3. Basic Usage

Simple PDF conversion

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("documento.pdf")

# Export to Markdown
print(result.document.export_to_markdown())

Export to multiple formats

md = result.document.export_to_markdown()
text = result.document.export_to_text()
html = result.document.export_to_html()

result.document.save_as_markdown("output.md")
result.document.save_as_json("output.json")
result.document.save_as_html("output.html")

Convert multiple documents

sources = ["doc1.pdf", "doc2.docx", "presentation.pptx"]
results = converter.convert_all(sources)

for res in results:
    print(res.document.export_to_markdown()[:500])
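
A batch often contains a missing or unreadable file. The sketch below filters out paths that do not exist, then asks `convert_all` not to raise on the first failure and checks each result's status instead. `existing_sources` and `convert_batch` are illustrative helper names; `raises_on_error` and `ConversionStatus` are assumed to behave as in recent Docling releases, so verify them against your installed version.

```python
from pathlib import Path


def existing_sources(paths):
    """Split paths into (found, missing) so missing files never reach the converter."""
    found, missing = [], []
    for p in paths:
        (found if Path(p).is_file() else missing).append(str(p))
    return found, missing


def convert_batch(paths):
    """Convert every existing file and return {path: markdown} for the successes."""
    # Imported lazily so the path helper above works without Docling installed
    from docling.datamodel.base_models import ConversionStatus
    from docling.document_converter import DocumentConverter

    found, missing = existing_sources(paths)
    for path in missing:
        print(f"skipped (not found): {path}")

    converter = DocumentConverter()
    outputs = {}
    # raises_on_error=False: failed documents come back with a non-success status
    for res in converter.convert_all(found, raises_on_error=False):
        if res.status == ConversionStatus.SUCCESS:
            outputs[str(res.input.file)] = res.document.export_to_markdown()
        else:
            print(f"failed ({res.status}): {res.input.file}")
    return outputs
```

Calling `convert_batch(["doc1.pdf", "doc2.docx", "presentation.pptx"])` would then return Markdown only for the files that converted cleanly.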

4. Advanced Usage

OCR Configuration

from docling.datamodel.pipeline_options import PdfPipelineOptions, EasyOcrOptions
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.ocr_options = EasyOcrOptions(
    lang=["en", "pt"],
    use_gpu=True,
    confidence_threshold=0.6,
)

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
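
On WSL2, GPU passthrough is not always configured, so hard-coding `use_gpu=True` may be optimistic. One option is to detect CUDA at runtime and pass the result into the OCR options. This is only a sketch: `cuda_available` and `build_ocr_converter` are hypothetical helper names, and it assumes PyTorch is the library that reports GPU availability.

```python
def cuda_available():
    """Return True only if PyTorch can be imported and sees a CUDA device."""
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch missing, or its native libraries failed to load
        return False
    return bool(torch.cuda.is_available())


def build_ocr_converter(languages=("en", "pt")):
    """Build a DocumentConverter whose EasyOCR settings follow the detected hardware."""
    # Imported lazily so cuda_available() can be used on its own
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import EasyOcrOptions, PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = True
    pipeline_options.ocr_options = EasyOcrOptions(
        lang=list(languages),
        use_gpu=cuda_available(),  # falls back to CPU when CUDA is not visible
    )
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
    )
```

Running `cuda_available()` by itself is also a quick way to check whether the WSL2 GPU driver setup is working.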

Chunking for RAG

from docling.chunking import HybridChunker

chunker = HybridChunker(max_tokens=512)
chunks = list(chunker.chunk(dl_doc=result.document))

for chunk in chunks:
    enriched_text = chunker.contextualize(chunk=chunk)
    print(enriched_text[:200])
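
For a RAG pipeline, the chunks usually need to be turned into plain records before embedding or indexing. A possible shape for that step is sketched below. `chunks_to_records` is an illustrative helper, and it assumes each chunk exposes its section headings as `chunk.meta.headings`; the helper tolerates that attribute being absent.

```python
def chunks_to_records(chunker, document):
    """Turn Docling chunks into plain dicts ready for an embedding or indexing step."""
    records = []
    for index, chunk in enumerate(chunker.chunk(dl_doc=document)):
        meta = getattr(chunk, "meta", None)
        records.append(
            {
                "id": index,
                # Context-enriched text, produced the same way as in the loop above
                "text": chunker.contextualize(chunk=chunk),
                # Assumption: meta.headings holds the section headings, if any
                "headings": list(getattr(meta, "headings", None) or []),
            }
        )
    return records
```

With the objects from the example above, `records = chunks_to_records(chunker, result.document)` yields one dict per chunk.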

LangChain Integration

pip install langchain-docling

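from pathlib import Path
from langchain_docling import DoclingLoader
# DoclingLoader expects file paths or URLs, so expand the directory into files first
files = [str(p) for p in sorted(Path("documentos").glob("*.pdf"))]
loader = DoclingLoader(file_path=files)
docs = loader.load()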

LlamaIndex Integration

pip install llama-index-readers-docling llama-index-node-parser-docling

from llama_index.readers.docling import DoclingReader
from llama_index.node_parser.docling import DoclingNodeParser

reader = DoclingReader()
docs = reader.load_data(file_path="documento.pdf")
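
The snippet above imports `DoclingNodeParser` without using it. A sketch of how the two pieces might fit together follows. It assumes that the node parser needs the reader to emit Docling's JSON export (selected here with `DoclingReader.ExportType.JSON`) rather than the default Markdown; `load_docling_nodes` is an illustrative name.

```python
def load_docling_nodes(file_path):
    """Read a document with DoclingReader and split it into LlamaIndex nodes."""
    # Imported lazily; both packages come from the pip command above
    from llama_index.node_parser.docling import DoclingNodeParser
    from llama_index.readers.docling import DoclingReader

    # Assumption: DoclingNodeParser works on the JSON export, not on Markdown
    reader = DoclingReader(export_type=DoclingReader.ExportType.JSON)
    documents = reader.load_data(file_path=file_path)

    node_parser = DoclingNodeParser()
    return node_parser.get_nodes_from_documents(documents)
```

The resulting nodes from `load_docling_nodes("documento.pdf")` can then be passed to a LlamaIndex index of your choice.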

5. Integration with an AI Agent (e.g., Hermes Agent)

Option A: Python script as a skill

#!/usr/bin/env python3
"""Convert a document with Docling and print the result as JSON."""
import json
import sys

from docling.document_converter import DocumentConverter


def convert_document(file_path: str, output_format: str = "markdown") -> dict:
    converter = DocumentConverter()
    result = converter.convert(file_path)
    exporters = {
        "markdown": result.document.export_to_markdown,
        "text": result.document.export_to_text,
        "html": result.document.export_to_html,
    }
    # Fall back to Markdown for unknown formats and report the format actually used
    if output_format not in exporters:
        output_format = "markdown"
    content = exporters[output_format]()
    return {"status": "success", "format": output_format, "content": content}


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("Usage: python docling_convert.py <file> [markdown|text|html]")
    file_path = sys.argv[1]
    fmt = sys.argv[2] if len(sys.argv) > 2 else "markdown"
    print(json.dumps(convert_document(file_path, fmt), ensure_ascii=False, indent=2))

Usage: python docling_convert.py documento.pdf markdown
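
From the agent's side, the skill can be invoked as a child process and its JSON output decoded. The sketch below shows one way to do that with the standard library. `run_docling_skill` and `parse_skill_output` are hypothetical helper names, the 600-second timeout is an arbitrary assumption, and the script is assumed to be saved as `docling_convert.py` in the working directory.

```python
import json
import subprocess
import sys


def parse_skill_output(stdout):
    """Decode the JSON printed by the script and return its content, or raise."""
    payload = json.loads(stdout)
    if payload.get("status") != "success":
        raise RuntimeError(f"conversion failed: {payload}")
    return payload["content"]


def run_docling_skill(file_path, fmt="markdown", script="docling_convert.py", timeout=600):
    """Run the conversion script in a child process and return the converted text."""
    completed = subprocess.run(
        [sys.executable, script, file_path, fmt],
        capture_output=True,
        text=True,
        timeout=timeout,  # assumed upper bound; large PDFs may need more
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_skill_output(completed.stdout)
```

Using `sys.executable` keeps the child process inside the same virtual environment as the caller.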

Option B: docling-serve as an HTTP API

pip install "docling-serve[ui]"
docling-serve run --host 0.0.0.0 --port 5001

import requests

# Convert via file upload
with open("documento.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:5001/v1/convert/file",
        files={"files": ("documento.pdf", f, "application/pdf")},
        data={"to_formats": "md"},
    )
print(response.json())

# Convert via URL
response = requests.post(
    "http://localhost:5001/v1/convert/source",
    json={
        "options": {"to_formats": ["md"]},
        "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2408.09869"}],
    },
)
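
The responses above still need to be unpacked before an agent can use them. The sketch below wraps the URL request with a timeout and an HTTP status check, then extracts the Markdown. The response shape (`status`, `document.md_content`, `errors`) is an assumption about the docling-serve payload and should be checked against the server's own API docs; `extract_markdown` and `convert_url` are illustrative names, and the 300-second timeout is arbitrary.

```python
def extract_markdown(payload):
    """Pull the Markdown text out of a docling-serve response body."""
    # Assumed shape: {"status": ..., "document": {"md_content": ...}, "errors": [...]}
    status = payload.get("status")
    if status not in ("success", "partial_success"):
        raise RuntimeError(f"conversion status {status!r}: {payload.get('errors')}")
    return payload["document"]["md_content"]


def convert_url(url, base_url="http://localhost:5001", timeout=300):
    """POST a URL to docling-serve and return the converted Markdown."""
    import requests  # imported lazily so extract_markdown has no dependencies

    response = requests.post(
        f"{base_url}/v1/convert/source",
        json={
            "options": {"to_formats": ["md"]},
            "sources": [{"kind": "http", "url": url}],
        },
        timeout=timeout,
    )
    response.raise_for_status()  # surface HTTP-level errors before parsing
    return extract_markdown(response.json())
```

For example, `convert_url("https://arxiv.org/pdf/2408.09869")` would return the Markdown for the same paper used in the request above.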

6. Troubleshooting on WSL2

Issue	Solution
Docling freezes on WSL2	Use `PyPdfiumDocumentBackend`
PyTorch/CUDA errors	Pin the PyTorch version: `pip install torch==2.5.1`
Memory leak on successive conversions	Set `pipeline_options.generate_parsed_pages = False`; call `gc.collect()` after each conversion
Slow filesystem performance	Always work in native Linux paths (`~/docs/`), not `/mnt/c/`
WSL2 memory limits	Set `memory=8GB` under the `[wsl2]` section of `%USERPROFILE%\.wslconfig`, then run `wsl --shutdown` to apply it
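
The memory and filesystem rows can be combined into one conversion loop. The following is a sketch under stated assumptions: `is_windows_mount` and `convert_many` are hypothetical helpers, and the mount check simply looks for the `/mnt/<drive letter>/` pattern that WSL2 uses by default.

```python
import gc
from pathlib import PurePosixPath


def is_windows_mount(path):
    """True for WSL paths under /mnt/<drive letter>/, such as /mnt/c/Users/..."""
    parts = PurePosixPath(path).parts
    return (
        len(parts) >= 3
        and parts[0] == "/"
        and parts[1] == "mnt"
        and len(parts[2]) == 1
        and parts[2].isalpha()
    )


def convert_many(paths):
    """Yield (path, markdown) pairs one at a time, freeing memory in between."""
    # Imported lazily so is_windows_mount() works without Docling installed
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions()
    pipeline_options.generate_parsed_pages = False  # setting from the table above
    converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
    )

    for path in paths:
        if is_windows_mount(path):
            print(f"warning: {path} is on a Windows drive; expect slow I/O")
        result = converter.convert(path)
        yield path, result.document.export_to_markdown()
        del result
        gc.collect()  # reclaim memory before the next document
```

Because `convert_many` is a generator, only one converted document is held in memory at a time as long as the caller does not collect the results into a list.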

Fix for Docling freezing

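from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption

# Use the pypdfium2-based backend instead of the default PDF backend
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(backend=PyPdfiumDocumentBackend)}
)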

Note: Official Windows/WSL2 support is limited. The Docling team recommends native Linux for production use.