Project Structure and Development Setup
Type: Document
Status: Published
Created: Jan 23, 2026
Updated: Mar 30, 2026
Updated by: Dosu Bot

Project Directory Layout

The insideLLMs project is organized to support modular development, clear separation of concerns, and ease of extension. The key directories are:

insideLLMs/
├── insideLLMs/ # Main Python package and CLI entrypoint
│ ├── models/ # Model implementations (OpenAI, Anthropic, Gemini, Cohere, HuggingFace, local, etc.)
│ ├── probes/ # Probe implementations (logic, bias, attack, code, etc.)
│ ├── nlp/ # NLP utilities (dependencies.py, feature_extraction.py, similarity.py, etc.)
│ ├── cli/ # CLI package (parser + command modules)
│ │ ├── __init__.py # Main entrypoint
│ │ └── commands/ # Individual CLI commands (harness, run, report, attest, etc.)
│ ├── runtime/ # Execution/runtime package
│ │ └── runner.py # Probe execution engine (canonical path)
│ ├── registry.py # Plugin registry system for models, probes, datasets
│ ├── caching.py # Unified caching infrastructure
│ ├── types.py # Type definitions
│ ├── exceptions.py # Exception hierarchy
│ └── ... # Additional core modules (infra, templates, etc.)
├── tests/ # Pytest suite and shared fixtures (test_*.py, conftest.py)
├── examples/ # Runnable scripts and example configs (example_quickstart.py, harness.yaml)
├── data/ # Datasets and assets for examples and experiments
├── ci/ # Deterministic harness inputs for CI diff-gating (harness.yaml, harness_dataset.jsonl)
├── docs/, wiki/ # Documentation and planning notes
├── benchmarks/ # Benchmark assets and run artifacts
└── compliance_intelligence/ # Multi-agent AML/KYC demo (LangGraph, separate scope)

See: AGENTS.md, CONTRIBUTING.md, README.md

Directory Purposes

  • insideLLMs/: Core library code, CLI, and extension points.
  • insideLLMs/cli/: CLI package with main entrypoint and command modules (harness, run, report, attest, etc.).
  • insideLLMs/runtime/: Execution/runtime package with probe runner and workflow orchestration.
  • insideLLMs/nlp/: NLP-specific utilities (resource management, feature extraction, similarity metrics).
  • tests/: All test code, organized by module, with fixtures in conftest.py.
  • examples/: Quickstart scripts and configuration files to demonstrate usage.
  • data/: Example datasets for running probes and experiments.
  • ci/: Minimal configs and datasets for CI-based behavioral diff-gating.
  • benchmarks/: Benchmark definitions and run outputs.
  • compliance_intelligence/: Multi-agent AML/KYC demo (LangGraph, separate scope).
  • docs/, wiki/: User and developer documentation.

Rationale Behind Refactoring

Recent refactoring focused on improving modularity, maintainability, and extensibility. Key changes include:

  • Lazy loading for optional and heavy dependencies (e.g., HuggingFace, local model backends) to reduce startup overhead and make optional features truly optional.
  • Expanded registries for models and probes, supporting new providers (Gemini, Cohere, LlamaCpp, Ollama, VLLM, OpenRouter) and new probe types (code, instruction following, jailbreak, judge-based evaluation, etc.).
  • Clarified CLI and reporting workflows to make behavioral diff-gating and artifact management more robust and deterministic.
  • Improved test coverage and artifact isolation, with coverage enforced at 95% and new tests for all major subsystems.
  • Pipeline/middleware architecture (ongoing) to standardize execution, enable composable middleware (retry, rate limiting, caching, cost tracking), and make batch/async execution explicit.
  • Artifact management improvements, including new .gitignore entries for local and benchmark run outputs.
  • Module reorganization as part of branch synthesis work to improve code organization: CLI moved to insideLLMs/cli/, runner consolidated into insideLLMs/runtime/, and caching unified into insideLLMs/caching.py.
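
The lazy-loading idea from the first bullet can be sketched as follows. This is a minimal illustration, not the project's actual mechanism: `LazyModule` is a hypothetical helper, and the standard-library `json` module stands in for a heavy optional backend.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Hypothetical proxy that imports the wrapped module on first attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def _load(self) -> ModuleType:
        if self._module is None:
            try:
                self._module = importlib.import_module(self._name)
            except ImportError as exc:
                # Fail only when the optional feature is actually used.
                raise ImportError(
                    f"Optional dependency '{self._name}' is not installed."
                ) from exc
        return self._module

    def __getattr__(self, attr: str) -> Any:
        # Called only for names not found on the proxy itself, i.e. module members.
        return getattr(self._load(), attr)


# Stand-in for a heavy optional dependency; nothing is imported at this point.
json_backend = LazyModule("json")
```

Creating the proxy costs nothing at startup. The import happens on the first call such as `json_backend.dumps(...)`, and a missing package only raises at that point.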

These changes make it easier to add new models, probes, and datasets, and to maintain and extend the system as new LLM providers and evaluation techniques emerge.
See: ARCHITECTURE.md, PR #8, PR #9, PR #12, PR #26
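
To make the pipeline/middleware bullet concrete, here is a rough sketch of composable middleware around a model call. All names (`Handler`, `with_retry`, `with_cache`, `build_pipeline`) are assumptions made for illustration. The work is described above as ongoing, so the real interfaces may look quite different.

```python
import time
from typing import Callable

# A handler takes a prompt and returns a completion (hypothetical shape).
Handler = Callable[[str], str]
# A middleware wraps one handler and returns another.
Middleware = Callable[[Handler], Handler]


def with_retry(max_attempts: int, delay_s: float = 0.0) -> Middleware:
    """Retry the wrapped handler when it raises RuntimeError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def middleware(next_handler: Handler) -> Handler:
        def handler(prompt: str) -> str:
            for attempt in range(1, max_attempts + 1):
                try:
                    return next_handler(prompt)
                except RuntimeError:
                    if attempt == max_attempts:
                        raise  # Out of attempts: surface the last error.
                    time.sleep(delay_s)
            raise AssertionError("unreachable")

        return handler

    return middleware


def with_cache() -> Middleware:
    """Memoize completions by prompt in an in-memory dict."""

    def middleware(next_handler: Handler) -> Handler:
        cache: dict[str, str] = {}

        def handler(prompt: str) -> str:
            if prompt not in cache:
                cache[prompt] = next_handler(prompt)
            return cache[prompt]

        return handler

    return middleware


def build_pipeline(model_call: Handler, middlewares: list[Middleware]) -> Handler:
    """Compose middlewares so the first one in the list is outermost."""
    handler = model_call
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler
```

With `build_pipeline(model_call, [with_cache(), with_retry(max_attempts=3)])`, a cache hit returns immediately and only cache misses reach the retry layer, so the list order doubles as the execution order.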

Where to find things after the reorganization:

  • Core logic is in insideLLMs/, with models/ and probes/ as the main extension points.
  • CLI commands are in insideLLMs/cli/commands/ (harness, run, report, attest, sign, verify, trend, etc.).
  • Runner and orchestration logic is in insideLLMs/runtime/runner.py (canonical path; old insideLLMs/runner.py has been removed).
  • Caching is unified in insideLLMs/caching.py (old insideLLMs/caching_unified.py was renamed).
  • NLP utilities are in insideLLMs/nlp/ (e.g., dependencies.py for resource management, feature_extraction.py for vectorization, similarity.py for text similarity).
  • Registries for models, probes, and datasets are managed in insideLLMs/registry.py.
  • Type definitions are in insideLLMs/types.py.
  • Examples and quickstart configs are in examples/.
  • Tests are in tests/, organized by module.
  • Datasets for experiments are in data/.
  • Documentation is in docs/ and the GitHub Wiki.

To add a new model or probe, create a new file in the appropriate subdirectory, export it in the module’s __init__.py, register it in registry.py, and add tests in tests/ (see CONTRIBUTING.md).
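
The registration step can be pictured with a toy registry. This is a self-contained sketch of the general pattern only: `Registry`, `EchoProbe`, and the `register`/`create` methods are hypothetical and are not the API of insideLLMs/registry.py, so check that module and CONTRIBUTING.md for the real calls.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class Registry(Generic[T]):
    """Toy name-to-factory registry (illustrative, not the insideLLMs API)."""

    def __init__(self, kind: str) -> None:
        self._kind = kind
        self._factories: dict[str, Callable[..., T]] = {}

    def register(self, name: str, factory: Callable[..., T]) -> None:
        if name in self._factories:
            raise ValueError(f"{self._kind} '{name}' is already registered")
        self._factories[name] = factory

    def create(self, name: str, **kwargs) -> T:
        try:
            factory = self._factories[name]
        except KeyError:
            known = ", ".join(sorted(self._factories)) or "none"
            raise KeyError(f"Unknown {self._kind} '{name}' (known: {known})") from None
        return factory(**kwargs)


class EchoProbe:
    """Hypothetical probe used only to demonstrate registration."""

    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix

    def run(self, text: str) -> str:
        return f"{self.prefix}{text}"


probe_registry: Registry[EchoProbe] = Registry("probe")
probe_registry.register("echo", EchoProbe)
```

A lookup such as `probe_registry.create("echo", prefix="> ")` loosely mirrors how a config entry with a `type` and `args` could be resolved to an instance.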

Import paths: Always use canonical paths for new code. Import from insideLLMs.runtime.runner (not insideLLMs.runner), insideLLMs.caching (not insideLLMs.caching_unified), etc. See docs/IMPORT_PATHS.md for the full migration matrix.
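
As a quick way to spot leftover legacy imports, a small checker along these lines may help. It is a sketch covering only the two renames named above. `find_legacy_imports` is a hypothetical helper, and docs/IMPORT_PATHS.md remains the authoritative list.

```python
import re

# Legacy module path -> canonical path, limited to the two cases named above.
CANONICAL_PATHS = {
    "insideLLMs.runner": "insideLLMs.runtime.runner",
    "insideLLMs.caching_unified": "insideLLMs.caching",
}

# Captures the dotted module name in "import x.y" and "from x.y import z".
_IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([\w.]+)")


def find_legacy_imports(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, legacy path, canonical path) for each legacy import."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _IMPORT_RE.match(line)
        if not match:
            continue
        module = match.group(1)
        for legacy, canonical in CANONICAL_PATHS.items():
            # Match the module itself or any submodule, but not look-alike prefixes.
            if module == legacy or module.startswith(legacy + "."):
                findings.append((lineno, legacy, canonical))
    return findings
```

The check is deliberately simple: it only inspects the module named directly after `import` or `from`, so a form like `from insideLLMs import runner` is not detected.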

Development Environment Setup

The project requires Python 3.10 or higher.

Create and activate a virtual environment, then install the project in editable mode with all optional dependencies:

python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -e ".[all]"

Alternatively, you can install only specific extras:

pip install -e ".[nlp]"
pip install -e ".[visualization]"
pip install -e ".[dev]"

See: README.md, Getting Started Wiki

Install pre-commit hooks to enforce code quality:

pre-commit install

Run insidellms doctor as a post-install sanity check. The command reports optional dependency gaps as warnings: in a [dev]-only environment you may see nltk- or pydantic-related warnings, but the command will not fail outright.

The Makefile defaults to PYTHON=python3. If your interpreter is exposed as python instead, override the variable for any target with make <target> PYTHON=python:

make check-fast PYTHON=python

Running Tests

Run the test suite with:

pytest

For coverage:

pytest --cov=insideLLMs --cov-report=term

To skip slow or integration tests:

pytest -m "not slow and not integration"

Tests that create run artifacts should use an isolated root directory, e.g.:

INSIDELLMS_RUN_ROOT=.tmp/insidellms_runs pytest
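
Inside a test helper, the same isolation could be approximated as shown below. This is a sketch under assumptions: `resolve_run_root` is a hypothetical function, and the way insideLLMs itself reads INSIDELLMS_RUN_ROOT is not described here.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_run_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory where test run artifacts should be written.

    Uses INSIDELLMS_RUN_ROOT when set, otherwise a fresh temporary directory,
    so artifacts never land in the working tree by accident.
    """
    env = os.environ if env is None else env
    configured = env.get("INSIDELLMS_RUN_ROOT")
    if configured:
        root = Path(configured)
        root.mkdir(parents=True, exist_ok=True)
    else:
        root = Path(tempfile.mkdtemp(prefix="insidellms_runs_"))
    return root
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.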

Coverage is enforced in CI (minimum 95%). The test suite uses pytest and pytest-asyncio, with markers for slow and integration tests.
See: CONTRIBUTING.md, AGENTS.md

Contributing

Follow the conventional commit style (feat(scope): ..., fix: ..., test: ..., docs: ..., chore: ...). Keep commits atomic and PRs focused. Add or adjust tests for any behavior changes. PRs should follow the template: clear description, linked issue (if any), test notes, and screenshots for UI/report changes.
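
The commit subject format can be checked mechanically. The sketch below accepts only the five types listed above; `parse_commit_subject` is a hypothetical helper, and the project may allow further types or enforce the convention with different tooling.

```python
import re
from typing import Optional

# Types named in the guideline above; the project may accept others.
COMMIT_TYPES = ("feat", "fix", "test", "docs", "chore")

_SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional "(scope)"
    r": (?P<summary>\S.*)$"  # colon, one space, non-empty summary
)


def parse_commit_subject(subject: str) -> Optional[dict[str, Optional[str]]]:
    """Split a commit subject into type, scope, and summary, or return None."""
    match = _SUBJECT_RE.match(subject)
    return match.groupdict() if match else None
```

For example, `parse_commit_subject("feat(cli): add trend command")` yields the type `feat`, the scope `cli`, and the summary text, while a subject such as `Updated stuff` returns `None`.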

Coding style guidelines:

  • Python 3.10, 4-space indentation, Ruff formatting (line length 100)
  • Type hints on public APIs
  • Explicit config surfaces (often via dataclasses)
  • snake_case for modules/functions, PascalCase for types
  • Sorted imports via Ruff
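
A short illustration of these conventions together: type hints on the public surface, an explicit dataclass config, snake_case functions, and a PascalCase type. `RetryConfig` and `backoff_schedule` are invented for this example and do not exist in insideLLMs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryConfig:
    """Hypothetical config surface, shown only to illustrate the style."""

    max_attempts: int = 3
    backoff_seconds: float = 0.5

    def __post_init__(self) -> None:
        # Validate eagerly so misconfiguration fails at construction time.
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        if self.backoff_seconds < 0:
            raise ValueError("backoff_seconds must be non-negative")


def backoff_schedule(config: RetryConfig) -> list[float]:
    """Return the delay before each retry, doubling after every attempt."""
    return [config.backoff_seconds * (2**i) for i in range(config.max_attempts - 1)]
```

Freezing the dataclass keeps the configuration immutable once validated, which makes it safe to share between components.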

Never commit credentials. Configure providers via environment variables (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, CO_API_KEY/COHERE_API_KEY, HUGGINGFACEHUB_API_TOKEN). For vulnerability reports, follow SECURITY.md (do not open public issues for security findings).
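
Before a run, it can be useful to see which provider credentials are present without printing their values. The sketch below uses the variable names listed above. The provider labels and the `missing_providers` helper are assumptions for illustration, with each variable grouped under a provider by its name alone.

```python
import os
from typing import Mapping, Optional

# Provider label -> accepted environment variable names (labels are illustrative).
PROVIDER_KEY_VARS = {
    "openai": ("OPENAI_API_KEY",),
    "anthropic": ("ANTHROPIC_API_KEY",),
    "google": ("GOOGLE_API_KEY",),
    "cohere": ("CO_API_KEY", "COHERE_API_KEY"),
    "huggingface": ("HUGGINGFACEHUB_API_TOKEN",),
}


def missing_providers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return provider labels for which no accepted variable is set and non-empty."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, names in PROVIDER_KEY_VARS.items()
        if not any(env.get(name) for name in names)
    ]
```

The helper reports names only and never echoes key values, which keeps secrets out of logs.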

For more details, see CONTRIBUTING.md and AGENTS.md.

Example: Running the Harness

Create a config file (e.g., harness.yaml):

models:
  - type: openai
    args:
      model_name: gpt-4o
probes:
  - type: logic
    args: {}
dataset:
  format: jsonl
  path: data/questions.jsonl
max_examples: 20
output_dir: results

Run the harness and generate a report:

insidellms harness harness.yaml --run-dir ./runs/candidate
insidellms report ./runs/candidate

This produces records.jsonl, summary.json, and report.html for analysis and comparison.
See: README.md, Getting Started Wiki
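
For a first look at a finished run, the records file can be read with the standard library alone. This sketch assumes only that records.jsonl holds one JSON object per line. It makes no assumption about field names, and `iter_jsonl` and `count_records` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Union


def iter_jsonl(path: Union[str, Path]) -> Iterator[dict[str, Any]]:
    """Yield one parsed object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_records(run_dir: Union[str, Path]) -> int:
    """Count the records written to <run_dir>/records.jsonl."""
    return sum(1 for _ in iter_jsonl(Path(run_dir) / "records.jsonl"))
```

Calling `count_records("./runs/candidate")` after the harness command above gives a quick sanity check that the run produced output before opening report.html.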