Local AI Pipeline for Technical PDF Processing (PMS/IDC)#
A student can build this system using Docling + ChromaDB + Ollama + LangChain + Streamlit, all running locally with no cloud API calls. The full pipeline is:
PDF (PMS/IDC)
↓ [Docling + OCR + TableFormer]
Structured Text (JSON/Markdown)
↓ [HybridChunker]
Chunks
↓ [Sentence-Transformers]
Embeddings (vectors)
↓ [ChromaDB]
Vector Database
↓ [LangChain + Ollama LLM]
Intelligent Answers / Datasheets
↓ [Streamlit]
Web UI
Requirements#
- Python 3.10+ (Docling does NOT work on Python 3.7/3.8/3.9: it raises a `SyntaxError` on the walrus operator `:=`)
- Windows users should use PowerShell with a virtual environment
Installation#
cd C:\your\project\folder
py -3.11 -m venv venv
.\venv\Scripts\activate
pip install docling langchain-docling langchain-community langchain-chroma chromadb sentence-transformers streamlit torch
⚠️ Always activate the venv before running scripts:
.\venv\Scripts\activate
Step 1 — Extract PDF to JSON/Markdown (Docling)#
Docling runs 100% locally. No data is sent externally by default.
from docling.document_converter import DocumentConverter
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat
from docling.document_converter import PdfFormatOption
from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from pathlib import Path
pdf_options = PdfPipelineOptions()
pdf_options.do_ocr = True
pdf_options.do_table_structure = True
pdf_options.generate_page_images = False # saves RAM
pdf_options.generate_picture_images = False
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            backend=PyPdfiumDocumentBackend,
            pipeline_options=pdf_options
        )
    }
)
# Process pages 1–10 only (to avoid std::bad_alloc on low-RAM machines)
result = converter.convert("documents/PMS.pdf", page_range=(1, 10))
Path("outputs").mkdir(exist_ok=True)
result.document.save_as_json("outputs/PMS.json")
md = result.document.export_to_markdown()
Path("outputs/PMS.md").write_text(md, encoding="utf-8")
⚠️ On machines with only 8GB RAM, processing all pages at once causes `std::bad_alloc` errors. Use `page_range` to process in batches of 10 pages and use `PyPdfiumDocumentBackend` to reduce memory usage. A sketch of this batching loop follows.
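Building on the converter configured above, a loop over `page_range` windows keeps memory bounded. This is a minimal sketch; the 10-page window, page count, and output file names are illustrative:

# Process a long PDF in 10-page windows to keep memory bounded (sketch).
from pathlib import Path

BATCH = 10
TOTAL_PAGES = 74  # adjust to your document's page count

Path("outputs").mkdir(exist_ok=True)
for start in range(1, TOTAL_PAGES + 1, BATCH):
    end = min(start + BATCH - 1, TOTAL_PAGES)
    result = converter.convert("documents/PMS.pdf", page_range=(start, end))
    md = result.document.export_to_markdown()
    Path(f"outputs/PMS_p{start:03d}-{end:03d}.md").write_text(md, encoding="utf-8")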
Step 2 — Chunking (split document into smart pieces)#
The LLM cannot read 74 pages at once. HybridChunker splits the document into ~512-token pieces while keeping tables intact and preserving section context.
from docling.chunking import HybridChunker
chunker = HybridChunker(max_tokens=512)
chunks = list(chunker.chunk(dl_doc=result.document))
enriched_texts = [chunker.contextualize(chunk=c) for c in chunks]
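A quick sanity check on the result, printing the chunk count and a preview of the first contextualized chunk:

# Inspect the chunking output (illustrative).
print(f"{len(chunks)} chunks produced")
print(enriched_texts[0][:500])  # first 500 characters of the first enriched chunk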
Step 3–4 — Embeddings + Vector Database (ChromaDB)#
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain.schema import Document
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
langchain_docs = [
    Document(page_content=text, metadata={"source": "PMS.pdf"})
    for text in enriched_texts
]
vectorstore = Chroma.from_documents(langchain_docs, embeddings, persist_directory="./chroma_db")
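Before wiring up the LLM, you can verify retrieval directly against the store; the query text below is just an example:

# Check that semantically relevant chunks come back from ChromaDB.
hits = vectorstore.similarity_search("pipe class A1 material", k=3)
for doc in hits:
    print(doc.metadata["source"], "->", doc.page_content[:120])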
Step 5 — RAG with Local LLM (Ollama)#
RAG (Retrieval-Augmented Generation) works as follows:
- You ask a question
- The system finds the relevant chunks from your documents
- Those chunks + your question are sent to the LLM
- The LLM answers based on your actual document content
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA
# Install Ollama from https://ollama.ai, then: ollama pull phi3:mini
llm = Ollama(model="phi3:mini") # lightweight for 8GB RAM
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True
)
result = qa_chain.invoke({"query": "What is the material for pipe class A1?"})
print(result['result'])
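Because `return_source_documents=True`, the chunks behind the answer are returned as well, which is useful for citing the relevant sections:

# Show which chunks the answer was grounded in.
for doc in result["source_documents"]:
    print(doc.metadata.get("source"), "->", doc.page_content[:100])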
Step 6 — Streamlit Web UI#
streamlit run app.py
# Opens at http://localhost:8501
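The command above only runs the app; a minimal `app.py` could look like the sketch below. It assumes the `chroma_db/` folder built in Steps 3–4 already exists and that Ollama is running locally with `phi3:mini` pulled; the page title and prompt strings are illustrative.

# app.py: minimal sketch, assumes chroma_db/ exists and an Ollama server is running
import streamlit as st
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

@st.cache_resource  # build the chain once per session instead of on every rerun
def load_chain():
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
    llm = Ollama(model="phi3:mini")
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
        return_source_documents=True,
    )

st.title("PMS/IDC Document Assistant")
question = st.text_input("Ask a question about your documents")

if question:
    with st.spinner("Searching documents..."):
        result = load_chain().invoke({"query": question})
    st.write(result["result"])
    with st.expander("Source chunks"):
        for doc in result["source_documents"]:
            st.markdown(f"**{doc.metadata.get('source', 'unknown')}**")
            st.text(doc.page_content[:300])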
Recommended LLM models (local, via Ollama)#
| Model | RAM needed | Use case |
|---|---|---|
| mistral | ~6 GB | Best quality, needs more RAM |
| phi3:mini | ~3 GB | Good balance for 8GB RAM |
| tinyllama | ~2 GB | Very low RAM, lower quality |
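To switch models, pull the new model with `ollama pull <name>` and change the model name where the LLM is created in Step 5, for example:

# Swap in a larger model: run `ollama pull mistral` first, then point the chain at it.
llm = Ollama(model="mistral")  # ~6 GB RAM; keep phi3:mini or tinyllama on tighter machines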
Privacy#
Docling is fully local by default. All OCR, layout detection, and table extraction run on your machine. Remote services are only activated if you explicitly set enable_remote_services=True, which you should never do for confidential documents.
To run completely offline, pre-download all models:
pip install docling-tools
docling-tools models download --all
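Once everything is cached, you can additionally force the Hugging Face libraries into offline mode so later runs never touch the network. The commented `artifacts_path` line for pointing Docling at the pre-downloaded models is an assumption to verify against your installed Docling version:

# Offline sketch: assumes the embedding model is already in the local HF cache
# and the Docling models were downloaded with docling-tools (see above).
import os

os.environ["HF_HUB_OFFLINE"] = "1"        # block Hugging Face Hub downloads
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # same for the transformers library

# Assumption: recent Docling versions accept artifacts_path on PdfPipelineOptions
# to locate pre-downloaded models; verify before relying on it.
# pdf_options = PdfPipelineOptions(artifacts_path="C:/path/to/docling-models")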
Project Folder Structure#
project/
├── venv/
├── documents/ # Your PMS.pdf, IDC.pdf files
├── outputs/ # Extracted JSON and Markdown
├── chroma_db/ # Vector database
├── test_extraction.py
├── rag_pipeline.py
└── app.py