SQL-to-IPC Detection Architecture
Type: External
Status: Published
Created: Mar 8, 2026
Updated: Apr 3, 2026
Updated by: Dosu Bot

AI summary
This document, "DaemonEye Spec -- SQL-to-IPC Detection Architecture," defines the model for translating SQL detection rules into instructions for collectors to produce security alerts.
Key Concepts:

  • Purpose: To standardize the SQL to IPC to storage to Alert path, allowing collectors (e.g., procmond) and the daemoneye-agent to evolve independently.
  • Mental Model (Virtual Database): Collectors stream event data as "records" via IPC into virtual schemas (logical, read-only tables). The agent exposes a global catalog of these tables.
  • Detection Flow: The Agent loads SQL rules, translates them into DetectionTasks for Collectors with pushdown optimization, stores events, and generates alerts.
  • Specialty Collectors: YARA integration, supplemental rule data for eBPF/ETW/EndpointSecurity.
  • Dynamic Reactive Pipeline: Non-DAG reactive system where SQL JOINs trigger cascading analysis.
  • Performance Bounds for Regex: Strict bounds including 10ms per-pattern latency, 100ms compilation timeout, 1MB memory limit per pattern.

1) Purpose & Non-Goals#

Purpose. Define a precise model for how SQL rules are translated into collector instructions and evaluated to produce alerts. This spec standardizes the SQL to IPC to storage to Alert path so collectors and the agent evolve independently without breaking rules.

Non-Goals#

  • Building a general RDBMS. The agent executes a constrained, read-only SQL dialect for detections.
  • Requiring kernel hooks. Baseline collectors may be user-mode (e.g., sysinfo), with advanced sources added by tier (eBPF/ETW/EndpointSecurity).

2) Mental Model (Virtual Database)#

Each collector contributes a virtual schema---logical, read-only tables backed by event streams. The agent exposes a global catalog (namespaced) used by rules. Examples:

  • processes.* (e.g., processes.snapshots, processes.events.exec)
  • network.* (e.g., network.connections)
  • kernel.* (e.g., kernel.image_loads)

Collectors "insert" records into these tables by streaming IPC messages to the agent. The agent persists events in a lightweight embedded store and runs validated SQL rules over the persisted store only when the entire rule cannot be pushed down to the collector.

Audit Trail: The collector is responsible for logging matches in the append-only audit log using the collector-core framework, independent of the agent's further analysis.

Key Architectural Principles#

  • Logical Views: SQL rules are treated as logical views, not materialized tables
  • Pushdown Optimization: Simple predicates and projections are pushed to collectors
  • Operator Pipeline: Complex operations (joins, aggregations) execute in the agent's operator pipeline
  • Bounded Memory: All operations use cardinality caps and time windows to prevent unbounded growth

3) End-to-End Flow#

The end-to-end flow operates on a host where a Collector (procmond or others) communicates via IPC (protobuf + framing) to the daemoneye-agent. The agent persists data and evaluates rules using the Operator Pipeline over redb, then sends alerts to Alert Sinks (stdout/syslog/webhook). Rule Packs / SQL Rules feed into the agent.

Two Layers#

  1. Stream Layer (IPC) --- Low-latency filtering; collectors push only the needed records.
  2. Operator Pipeline Layer --- SQL parsed into logical plans, mapped onto scan/query operators over redb. This enables joins, grouping, time windows, and dedupe.

4) Query Lifecycle & Pushdown#

The query lifecycle proceeds as follows:

  1. Operator/Rule Pack loads/enables SQL rules
  2. Agent parses SQL to AST, validates dialect and safety
  3. Agent plans query against virtual catalog
  4. Agent derives DetectionTasks with pushdown for Collector
  5. Collector streams matching records via IPC
  6. Agent appends events to tables and executes operator pipeline
  7. Agent generates alerts with dedupe and metadata

Pushdown Rules#

Agent derives from the AST:

  • Tables/aliases - which collectors to engage
  • Columns - projection list for each collector
  • Predicates - pre-filter on collector side when feasible
  • Windows/Aggregations/Joins - kept for the agent's operator pipeline

If a collector cannot push down a predicate, it sends its minimal projection and the agent filters on ingest.
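As a sketch, the derivation above amounts to partitioning a rule's WHERE conjuncts against the operator set the collector advertised in its descriptor. The types and function names here are illustrative, not the real planner API:

```rust
use std::collections::HashSet;

/// One conjunct of a rule's WHERE clause, already extracted from the AST.
/// (Illustrative shape; the real planner works on sqlparser AST nodes.)
#[derive(Debug, Clone, PartialEq)]
struct Predicate {
    column: String,
    op: String,
    value: String,
}

/// Split predicates into those the collector advertised support for
/// (shipped in the DetectionTask) and the residue the agent must
/// evaluate itself on ingest.
fn split_pushdown(
    preds: &[Predicate],
    supported_ops: &HashSet<&str>,
) -> (Vec<Predicate>, Vec<Predicate>) {
    preds
        .iter()
        .cloned()
        .partition(|p| supported_ops.contains(p.op.as_str()))
}
```

The advertised capability set acts as the upper bound: anything outside it falls through to agent-side filtering rather than failing the rule.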

4.1) Collector Schema Contract (Virtual Catalog)#

Purpose#

Standardize how collectors announce their tables, columns, and pushdown capabilities so the agent can plan rules deterministically.

Descriptor (sent at startup & on change)#

{
  "collector_id": "<name>@<platform-arch>",
  "version": "semver",
  "tables": [
    {
      "name": "<domain>.<table>",
      "columns": { "col": "type[?]", "collection_time": "datetime" },
      "keys": {
        "primary": ["collection_time", "seq"],
        "secondary": ["pid", "ppid", "name"]
      },
      "join_hints": [
        { "left": "ppid", "right_table": "processes.snapshots", "right": "pid", "relation": "parent_of" }
      ],
      "pushdown": {
        "predicates": ["=", "!=", ">", "<", ">=", "<=", "IN", "LIKE", "REGEXP"],
        "project": true,
        "order_by": ["collection_time"],
        "limits": { "max_rate": 200000 }
      }
    }
  ]
}

Agent Behavior#

  • Validates schema, registers/updates global catalog
  • Unknown optional columns are ignored; missing required columns reject the table
  • Capabilities become the upper bound for pushdown during planning

Evolution#

  • Additive columns allowed anytime
  • Breaking changes require version bump and a deprecation grace period

Health & Metrics#

Collectors MUST expose: send rate, drops, queue depth, last ACK seq_no, and schema version.
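A minimal sketch of that heartbeat payload, with a trivial health gate the agent might apply before trusting a stream. Field names follow the bullet above; the wire format is protobuf (not shown), and the queue-depth threshold is an illustrative assumption:

```rust
/// Health fields every collector must expose in its heartbeat.
struct CollectorHealth {
    send_rate_per_sec: u64,
    drops: u64,
    queue_depth: u64,
    last_ack_seq_no: u64,
    schema_version: String,
}

impl CollectorHealth {
    /// Illustrative degradation check: any drop, or a queue deeper than
    /// the configured bound, marks the stream as degraded.
    fn is_degraded(&self, max_queue_depth: u64) -> bool {
        self.drops > 0 || self.queue_depth > max_queue_depth
    }
}
```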

4.2) Minimum Contracts by Collector Type#

Processes Collector (procmond)#

Table: processes.snapshots
Required columns: pid, ppid?, name, executable_path?, command_line?, start_time?, executable_hash?, collection_time
Secondary keys (min): pid, ppid, name, executable_hash
Join hints: ppid -> processes.snapshots.pid (parent_of)
Pushdown (min): =, !=, IN, LIKE (including % wildcards), REGEXP, projection; optional > >= < <= on numeric/time

Network Collector#

Table: network.connections
Required columns: pid?, proto(tcp,udp), src_ip, src_port, dst_ip, dst_port, state?, bytes_sent?, bytes_recv?, collection_time

Image/Module Load Collector#

Table: kernel.image_loads
Required columns: pid?, image_path, image_hash?, collection_time

File System Collector (optional baseline)#

Table: fs.events
Required columns: pid?, op(create,write,read,delete,rename), path, size?, collection_time

Auth/Security Events Collector#

Table: security.auth
Required columns: user, host?, event(logon,logoff,lock,unlock,auth_fail), status(success,fail)?, pid?, collection_time

General Requirements (All Collectors)#

  • Clock: timestamps MUST be monotonic per host; include clock_skew_ms in heartbeat
  • Sequence: seq_no strictly increasing per task_id
  • Types: adhere to documented scalar types; ? denotes optional
  • Backpressure: respect agent credits; cap local buffers; report drops with counters
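The backpressure requirement can be sketched as a credit window on the collector's send path: the agent grants credits, each StreamRecord consumes one, and sends are refused (and counted as drops) once credits run out rather than buffering without bound. Names are illustrative, not the collector-core API:

```rust
/// Credit-based flow-control window for a collector's send path.
struct SendWindow {
    credits: u64,
    drops: u64,
}

impl SendWindow {
    fn new() -> Self {
        Self { credits: 0, drops: 0 }
    }

    /// Agent grants `n` more sends.
    fn grant(&mut self, n: u64) {
        self.credits += n;
    }

    /// Consume one credit per record; refuse and count a drop when exhausted.
    fn try_send(&mut self) -> bool {
        if self.credits > 0 {
            self.credits -= 1;
            true
        } else {
            self.drops += 1;
            false
        }
    }
}
```

The drop counter feeds directly into the health metrics collectors must expose.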

Regex Implementation Requirements#

All collectors MUST implement regex pattern matching with:

  • Standard regex engine (e.g., Rust regex crate, RE2, PCRE)
  • Case-sensitive and case-insensitive matching via (?i) flag
  • Configurable per-pattern latency bounds
  • Bounded compilation and execution to prevent DoS
  • Compilation caching with LRU eviction
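The compilation cache in the last bullet can be sketched as a capacity-bounded map with least-recently-used eviction. The capacity mirrors the `cache_size: 1000` default below; the stubbed "compiled" string stands in for a real compiled regex, and none of these names are the real API:

```rust
use std::collections::{HashMap, VecDeque};

/// Pattern-compilation cache with LRU eviction (sketch).
struct PatternCache {
    cap: usize,
    map: HashMap<String, String>, // pattern -> compiled form (stubbed)
    order: VecDeque<String>,      // least-recently-used at the front
}

impl PatternCache {
    fn new(cap: usize) -> Self {
        Self { cap, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get_or_compile(&mut self, pattern: &str) -> &str {
        if self.map.contains_key(pattern) {
            // Cache hit: refresh recency.
            self.order.retain(|p| p.as_str() != pattern);
        } else {
            // Cache miss: evict the least-recently-used entry at capacity.
            if self.map.len() >= self.cap {
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
            self.map
                .insert(pattern.to_string(), format!("compiled({})", pattern));
        }
        self.order.push_back(pattern.to_string());
        self.map.get(pattern).unwrap().as_str()
    }
}
```

In a real collector the compile step would also enforce the compilation timeout and per-pattern memory limit from the configuration below.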

Configurable Performance Bounds#

Default Configuration:

  • Per-Pattern Latency: 10ms
  • Compilation Timeout: 100ms
  • Memory Limit: 1MB per pattern

detection:
  regex:
    max_pattern_latency_ms: 10
    compilation_timeout_ms: 100
    max_memory_per_pattern: 1048576
    enable_precompilation: true
    cache_size: 1000
    monitoring:
      enable_metrics: true
      alert_threshold_ms: 5

Pattern Complexity Guidelines#

Acceptable:

  • Simple patterns: basic literals, character classes, quantifiers
  • Moderate patterns: alternation with limited branches
  • Complex patterns: backreferences, lookahead/lookbehind (with execution limits)

Avoid: exponential-complexity patterns, catastrophic backtracking, excessive alternation, multiple lookaheads.

4.7) Complex Pattern Matching (YARA Integration)#

YARA Integration Architecture#

YARA rules represent a fundamentally different detection paradigm from SQL-based key-value predicates. The architecture handles YARA through the "agent-only operations" principle with specialized collectors.

YARA Engine Collector#

{
  "collector_id": "yara-engine@linux-x86_64",
  "tables": [{
    "name": "yara.scan_results",
    "columns": {
      "file_path": "string", "rule_name": "string",
      "matches": "json", "scan_time": "datetime", "file_hash": "hex256"
    },
    "pushdown": { "predicates": ["=", "!=", "IN"], "project": true }
  }]
}

Hybrid Detection Patterns#

-- Complex rule combining simple filters + YARA results
SELECT p.pid, p.name, f.path, y.rule_name, y.matches
FROM processes.snapshots p
JOIN fs.events f ON f.pid = p.pid
JOIN yara.scan_results y ON y.file_path = f.path
WHERE p.name = 'rundll32.exe'
  AND f.path LIKE '/tmp/%'
  AND y.rule_name = 'suspicious_behavior'
  AND f.collection_time > datetime('now', '-30 minutes');

Performance Considerations#

  • YARA scanning happens per-file in the collector
  • Only scan results (not file content) stream to agent
  • Agent stores lightweight YARA results in redb
  • Collector can implement sampling for high-volume files

4.8) Supplemental Rule Data for Specialty Collectors#

Specialty collectors receive supplemental rule data alongside standard SQL pushdown. This includes YARA rules, eBPF programs, network analysis filters, and platform-specific rule data.

pub enum SupplementalRuleData {
    YaraRules { version: String, rules: Vec<YaraRule>, scan_options: YaraScanOptions },
    NetworkAnalysis { version: String, analysis_type: NetworkAnalysisType, filters: Vec<NetworkFilter>, patterns: Vec<NetworkPattern>, config: NetworkAnalysisConfig },
    PlatformSpecific { platform: String, rule_type: String, data: serde_json::Value },
}

pub trait SpecialtyRuleEngine {
    type RuleData;
    type ExecutionResult;
    async fn load_rules(&mut self, rules: Self::RuleData) -> Result<(), EngineError>;
    async fn execute_rules(&self, data: &[u8]) -> Result<Vec<Self::ExecutionResult>, EngineError>;
    async fn cleanup(&mut self) -> Result<(), EngineError>;
}

4.9) Dynamic Reactive Pipeline (Non-DAG Architecture)#

The SQL-to-IPC detection engine operates as a reactive pipeline where:

  • Initial Triggers - Base collectors detect events
  • JOIN-Driven Collection - SQL JOINs trigger on-demand collection of the other side of the join
  • Cascading Analysis - Results from one collector trigger additional analysis
  • Dynamic JOINs - Agent orchestrates cross-collector correlation
  • Feedback Loops - Results can trigger deeper analysis of the same data

Agent Orchestration Logic#

pub struct ReactiveOrchestrator {
    collectors: HashMap<String, Box<dyn Collector>>,
    trigger_rules: Vec<TriggerRule>,
    analysis_queue: VecDeque<AnalysisTask>,
}

pub enum AnalysisType {
    PeAnalysis { file_path: String },
    YaraScanning { file_path: String, rules: Vec<String> },
    MemoryAnalysis { pid: u32, regions: Vec<MemoryRegion> },
    NetworkDeepPacket { connection_id: String },
}

4.10) Automatic JOIN Triggers (Implicit Correlation)#

SQL rules can specify automatic JOIN triggers using special syntax:

-- Automatic network socket collection for processes
SELECT p.pid, p.name, n.dst_ip, n.dst_port, n.protocol
FROM processes.snapshots p
AUTO JOIN network.connections n ON n.pid = p.pid
WHERE p.name = 'suspicious.exe'
  AND n.dst_port > 1024;

-- Conditional Auto-JOIN with WHEN
SELECT p.pid, p.name, n.dst_ip, n.dst_port
FROM processes.snapshots p
AUTO JOIN network.connections n ON n.pid = p.pid
  WHEN p.name LIKE '%suspicious%' OR p.cpu_usage > 80.0
WHERE p.name = 'suspicious.exe';

The sqlparser crate is extended with a custom DaemonEye dialect that maintains SQLite compatibility while adding AUTO JOIN and WHEN clause support.

9) Storage & Execution Model#

Why Not a Full RDBMS?#

Embedding a full SQL engine such as SQLite would bring far more machinery than the constrained, read-only detection dialect needs, and does not align with the zero-network design. redb, the chosen store, is a key-value store, not a relational engine, so relational semantics come from the agent itself.

Chosen Approach: Operator Pipeline#

  • Parse SQL using sqlparser (SQLite dialect)
  • Translate AST to internal logical plan to chain of operators (scan, filter, project, join, aggregate)
  • Execute operators directly against redb
  • Support streaming ingestion with append-only tables + time-indexed keys

Smart Joins (No Mandatory Time Window)#

Joins are supported without requiring time windows, using bounded state.

Physical Strategies:

  1. Index Nested-Loop (INLJ) - Default, selective. Build from the smaller/filtered input, probe via secondary index. O(n log m).
  2. Bounded Symmetric Hash Join (SHJ) - For concurrent streams. Hash both sides in bounded LRU maps. Cardinality budget (e.g., max 100k keys).
  3. Materialized Relation Cache (MRC) - Parent/child fast path. Compact map (pid -> {ppid, parent_name, start_time}). Single keyed lookup.

Bounding Without Time Windows:

  • Cardinality caps per join with configurable memory budgets
  • Automatic fallback from SHJ to INLJ when caps are reached
  • Optional spill to a KV-backed scratch area
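The MRC fast path described above can be sketched as a compact pid-keyed map maintained on ingest, so a parent-name predicate resolves with a single keyed lookup instead of a second scan. Struct and field names are illustrative:

```rust
use std::collections::HashMap;

/// Facts cached per child pid for the parent/child fast path.
struct ParentFacts {
    ppid: u32,
    parent_name: String,
}

/// Materialized Relation Cache sketch: pid -> parent facts.
struct Mrc {
    map: HashMap<u32, ParentFacts>,
}

impl Mrc {
    fn new() -> Self {
        Self { map: HashMap::new() }
    }

    /// Updated on every ingested process record.
    fn on_ingest(&mut self, pid: u32, ppid: u32, parent_name: &str) {
        self.map
            .insert(pid, ParentFacts { ppid, parent_name: parent_name.to_string() });
    }

    /// One keyed lookup replaces a join probe for parent-name predicates.
    fn parent_name(&self, pid: u32) -> Option<&str> {
        self.map.get(&pid).map(|f| f.parent_name.as_str())
    }
}
```

A production version would also bound the map (the 30m `mrc_window` tunable below) and rebuild it from recent events after a crash.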

Write-Through & Persistence Semantics#

  • Ingest Transaction: On every IPC StreamRecord: validate/normalize, append to the redb base table, update secondary indexes, ACK to the collector after commit.
  • No-Join Path: Evaluate the predicate on the ingested row, compute the dedupe key, emit the alert immediately.
  • Join Path: Execute the join strategy over redb plus in-memory state, dedupe, and emit the alert.
  • Aggregation Path: Maintain in-memory hash aggregates, flush snapshots for recovery, emit an alert when HAVING is satisfied.
  • Crash Safety: The ingest commit is atomic; aggregation state is restored from snapshots; the MRC is rebuilt from recent events.
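The rule-scoped dedupe key mentioned above can be sketched as a hash over the rule id plus the fields that identify "the same finding", bucketed by time so repeats inside a window collapse into one alert. The field choice and window semantics here are assumptions for illustration:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Rule-scoped dedupe key: identical findings inside the same time bucket
/// hash to the same key, so only one alert is emitted per bucket.
fn dedupe_key(rule_id: &str, pid: u32, name: &str, ts_ms: u64, window_ms: u64) -> u64 {
    let mut h = DefaultHasher::new();
    // Integer division buckets timestamps into fixed windows.
    (rule_id, pid, name, ts_ms / window_ms).hash(&mut h);
    h.finish()
}
```

Scoping the key by rule id is what prevents one noisy rule's dedupe state from suppressing a different rule's alerts on the same process.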

Materialization Policy#

Core Principle: Treat every rule as a logical view. Store the parts (base tables + minimal derived state), not the fully realized join results.
What We Persist:

  • Base event tables per collector plus secondary indexes
  • Alert log with rule ID, dedupe key, pointers to source rows
  • Operator state (aggregation snapshots, MRC)
    What We Do Not Persist: Materialized join outputs (computed on-demand, used for alerts, then discarded).

redb Performance Playbook#

Physical Layout:

  • Partitions: One base table per source per time bucket (daily/hourly)
  • Primary Key: (ts_ms: u64, seq: u32) padded to 16 bytes, strictly increasing
  • Secondary Indexes: one per advertised secondary key (e.g., pid, ppid, name, and a hash index over executable_hash)

Query Planning: Resolve time first (select partitions), choose the driving index by selectivity, then intersect posting lists set-wise.

Writer Architecture: Single writer thread with group commit (N=2,000 records or T=5-10ms). Pipeline: IPC -> lock-free MPSC -> ring buffer -> writer. MVCC snapshots serve readers.

Key Encoding:
    Key Encoding:
#[repr(C, packed)]
struct RowKey { ts_ms: u64, seq: u32, pad: u32 }

#[repr(C, packed)]
struct NameIdxKey { name_hash128_hi: u64, name_hash128_lo: u64, ts_ms: u64, seq: u32 }
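For keys like RowKey above to scan in time order inside a byte-ordered store, the fields must be serialized big-endian, most-significant field first. A sketch of that encoding (the exact layout, including the trailing pad, is an assumption mirroring the 16-byte RowKey):

```rust
/// Encode (ts_ms, seq) as a 16-byte key whose raw-byte ordering matches
/// the logical (ts_ms, seq) ordering: big-endian, ts_ms first.
fn row_key_bytes(ts_ms: u64, seq: u32) -> [u8; 16] {
    let mut k = [0u8; 16];
    k[..8].copy_from_slice(&ts_ms.to_be_bytes());
    k[8..12].copy_from_slice(&seq.to_be_bytes());
    // Last 4 bytes stay zero, matching RowKey's padded 16-byte layout.
    k
}
```

Little-endian encoding would break range scans, since byte order would no longer agree with numeric order.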

Tunables (Sane Defaults):

  • partition_kind = hourly if > ~2M events/day; else daily
  • writer_batch_records = 2,000, writer_batch_ms = 7
  • posting_cache_bytes = 128MiB
  • mrc_window = 30m
  • join_cardinality_cap = 100k keys, rows_per_key_cap = 64

Rule Lifecycle & Management#

  • Rule Packs: Collections of related SQL rules with metadata
  • Hot Reloading: Rules can be added/removed/updated without agent restart
  • Rule States: Draft -> Active -> Disabled -> Archived
  • Versioning: Rule changes create new versions with deprecation periods

Error Handling & Recovery#

  • Connection Loss: Agent marks tables as unavailable
  • Schema Mismatch: Version incompatibility handling
  • Rate Limiting: Backpressure triggers
  • Automatic Reconnection: Exponential backoff with jitter

Monitoring & Observability#

  • Rule execution latency, match rates, error counts
  • IPC message rates, queue depths
  • redb write latency, index maintenance, compaction stats
  • Health, metrics, debug, and catalog endpoints

11) IPC Message Contracts#

DetectionTask (Agent -> Collector)#

{
  "task_id": "t_9482",
  "rule_id": "r_process_name_exact",
  "table": "processes.snapshots",
  "project": ["pid", "name", "executable_path", "cpu_usage", "collection_time"],
  "where": { "and": [{ "eq": ["name", "minidump.exe"] }, { "gt": ["cpu_usage", 0.0] }] },
  "rate_limit": { "per_sec": 25000 },
  "ttl_ms": 600000
}

StreamRecord (Collector -> Agent)#

Envelope: { seq_no, task_id, table, checksum, record }. Record is a typed row matching the advertised schema.

Persistence & Query (Agent-Side)#

  • Write path: append-only into redb tables with periodic checkpoints
  • Read path: prepared, read-only statements with timeouts and memory quotas
  • Dedupe: rule-scoped dedupe key to avoid alert storms

12) SQL Dialect (Constrained)#

We do not embed SQLite. We only use the SQLite dialect for rule authoring and parsing. The agent parses rules with sqlparser, pushes down projections/predicates to collectors, and executes remaining logic against the operator pipeline over redb.

  • Allowed Statements: SELECT only
  • Banned Functions: load_extension, readfile, system, random, printf
  • Allowed Functions: Basic aggregations, string ops, date/time helpers
  • Security: AST validation enforces constraints before execution
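As a toy illustration of that validation gate: SELECT-only plus a deny list of the banned functions above. The real agent validates the parsed AST via sqlparser, not raw text; this sketch only shows the policy, not the mechanism:

```rust
/// Toy rule gate: accept only SELECT statements that call none of the
/// banned functions. (String matching stands in for AST validation.)
fn rule_allowed(sql: &str) -> bool {
    let lower = sql.trim_start().to_lowercase();
    let banned = ["load_extension", "readfile", "system", "random", "printf"];
    lower.starts_with("select")
        && !banned.iter().any(|b| lower.contains(&format!("{}(", b)))
}
```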

13) Examples#

-- Simple Name Match (pushdown heavy)
SELECT pid, name, executable_path, collection_time
FROM processes.snapshots WHERE name = 'minidump.exe';

-- Regex Pattern Matching
SELECT pid, name, executable_path, collection_time
FROM processes.snapshots WHERE name REGEXP '(?i)malware|trojan|virus|backdoor';

-- Parent/Child Relationship (join in operator pipeline)
SELECT c.pid, c.name, p.pid AS ppid, p.name AS parent
FROM processes.snapshots c
LEFT JOIN processes.snapshots p ON p.pid = c.ppid
WHERE c.name = 'rundll32.exe' AND p.name = 'winword.exe';

-- Aggregation / Rare Events
SELECT name, COUNT(*) AS launches
FROM processes.snapshots
WHERE collection_time BETWEEN :t_start AND :t_end
GROUP BY name HAVING launches > 50;

14) Performance & Backpressure#

  • Targets: <100ms/rule eval, >1k records/sec sustained write per agent
  • Channels: bounded MPSC with credit-based flow control
  • Rate Caps: agent can issue rate_limit hints per task
  • Overload Policy: drop the oldest buffered records per task before applying global backoff

15) Reliability & Security#

  • Framing: length-delimited protobuf, CRC32, monotonically increasing seq_no
  • AuthZ: per-collector identity; table-level ACLs in the catalog
  • Sandbox: read-only query engine, allow-listed functions, strict limits
  • Tamper-evidence: audit ledger chained with cryptographic integrity

16) Testing & Validation#

  • AST fuzzing of SQL parser/validator and pushdown planner
  • Golden tests: SQL to DetectionTask JSON snapshots
  • Soak tests: synthetic high-rate streams with backpressure assertions
  • E2E: collector to agent to redb to alert sinks

17) Versioning & Compatibility#

  • Contracts: catalog@vN, detection_task@vN, stream_record@vN
  • Forward compat: agent tolerates unknown optional fields
  • Deprecation: announce in catalog.changes feed

18) Open Items#

  • Implement operator pipeline with redb as backend
  • Optimize join and aggregation strategies
  • Sliding/windowed aggregations helper primitives
  • Query plan caching and statistics collection

19) TL;DR (Operator-Facing)#

Collectors act like table providers. The agent turns SQL rules into simple tasks so collectors only ship what matters. The agent then compiles SQL into an operator pipeline over redb for joins/aggregations, and emits deduped alerts.
Key Benefits:

  • Performance: Pushdown reduces network traffic; operator pipeline handles complex logic efficiently
  • Scalability: Bounded memory usage prevents resource exhaustion
  • Flexibility: SQL-based rules are familiar to security operators
  • Reliability: ACK-after-commit semantics ensure data consistency
  • Debugging: Alert records contain pointers to reconstruct join results