Query Pipeline and SQL Dialect

Overview#

DaemonEye implements a sophisticated SQL-to-IPC Translation pipeline that allows operators to write complex SQL detection rules while maintaining strict security boundaries and optimal performance. This document explains how the query pipeline works and the limitations of the supported SQL dialect.

Query Pipeline Architecture#

DaemonEye's query processing follows a two-phase approach:

Phase 1: SQL-to-IPC Translation#

SQL Parsing: User-written SQL detection rules are parsed using the sqlparser crate
AST Analysis: The Abstract Syntax Tree is analyzed to extract collection requirements
Task Generation: Simple protobuf collection tasks are generated for procmond
Overcollection Strategy: procmond may collect more data than strictly needed to ensure comprehensive detection

Phase 2: Data Collection & Analysis#

Process Collection: procmond executes the protobuf tasks to collect process data
Data Storage: Collected data is stored in the redb event store
SQL Execution: The original SQL rule is executed against the collected data
Alert Generation: Detection results trigger alert generation and delivery

Supported SQL Dialect#

DaemonEye supports a restricted SQL dialect optimized for process monitoring and security. The dialect is based on SQLite syntax with specific limitations and extensions.

Allowed SQL Constructs#

Basic Queries#

-- Simple SELECT queries
SELECT * FROM processes WHERE name = 'suspicious-process';

-- Aggregations
SELECT COUNT(*) as process_count, name
FROM processes
GROUP BY name
HAVING COUNT(*) > 10;

-- Joins (when applicable)
SELECT p.name, p.pid, s.start_time
FROM processes p
JOIN scans s ON p.scan_id = s.id;

Supported Functions#

String Functions (useful for process data analysis):

-- String length analysis
SELECT name, LENGTH(command_line) as cmd_length
FROM processes WHERE LENGTH(command_line) > 100;

-- Substring extraction
SELECT name, SUBSTR(executable_path, 1, 10) as path_prefix
FROM processes WHERE executable_path IS NOT NULL;

-- Pattern matching
SELECT * FROM processes
WHERE name LIKE '%suspicious%'
   OR executable_path LIKE '/tmp/%';

-- String search
SELECT * FROM processes WHERE INSTR(command_line, 'malicious') > 0;

Encoding Functions (useful for hash analysis):

-- Hexadecimal encoding/decoding
SELECT name, HEX(executable_hash) as hash_hex
FROM processes WHERE executable_hash IS NOT NULL;

-- Binary data analysis
SELECT name, UNHEX(executable_hash) as hash_binary
FROM processes WHERE LENGTH(executable_hash) = 64;

Mathematical Functions:

SELECT name, cpu_usage, memory_usage
FROM processes
WHERE cpu_usage > 50.0 OR memory_usage > 1073741824;

Banned SQL Constructs#

Security-Critical Functions#

load_extension() - SQLite extension loading
eval() - Code evaluation
exec() - Command execution
system() - System calls
shell() - Shell execution

File System Operations#

readfile() - File reading
writefile() - File writing
edit() - File editing

Complex Pattern Matching#

glob() - Glob patterns
regexp() - Regular expressions (performance concerns)
match() - Pattern matching

Mathematical Functions (Not Applicable)#

abs(), random(), randomblob()

Formatting Functions (Not Applicable)#

quote(), printf(), format(), char(), unicode(), soundex(), difference()

Process Data Schema#

CREATE TABLE processes (
    id INTEGER PRIMARY KEY,
    scan_id INTEGER NOT NULL,
    collection_time INTEGER NOT NULL,
    pid INTEGER NOT NULL,
    ppid INTEGER,
    name TEXT NOT NULL,
    executable_path TEXT,
    command_line TEXT,
    start_time INTEGER,
    cpu_usage REAL,
    memory_usage INTEGER,
    status TEXT,
    executable_hash TEXT, -- SHA-256 hash in hex format
    hash_algorithm TEXT, -- Usually 'sha256'
    user_id INTEGER,
    group_id INTEGER,
    accessible BOOLEAN,
    file_exists BOOLEAN,
    environment_vars TEXT, -- JSON string
    metadata TEXT, -- JSON string
    platform_data TEXT -- JSON string
);

Example Detection Rules#

Basic Process Monitoring#

SELECT pid, name, executable_path, command_line
FROM processes
WHERE name LIKE '%suspicious%'
   OR name LIKE '%malware%'
   OR name LIKE '%backdoor%';

Resource Usage Analysis#

SELECT pid, name, cpu_usage, memory_usage, command_line
FROM processes
WHERE cpu_usage > 80.0
   OR memory_usage > 2147483648
ORDER BY memory_usage DESC;

Hash-Based Detection#

SELECT pid, name, executable_path, executable_hash
FROM processes
WHERE executable_hash IN (
    'a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef',
    'f1e2d3c4b5a6978012345678901234567890abcdef1234567890abcdef'
);

Command Line Analysis#

SELECT pid, name, command_line
FROM processes
WHERE command_line LIKE '%nc -l%'
   OR command_line LIKE '%wget%'
   OR command_line LIKE '%curl%'
   OR command_line LIKE '%base64%'
   OR LENGTH(command_line) > 1000;

Path-Based Detection#

SELECT pid, name, executable_path
FROM processes
WHERE executable_path LIKE '/tmp/%'
   OR executable_path LIKE '/var/tmp/%'
   OR executable_path LIKE '/dev/shm/%'
   OR executable_path LIKE '%.exe'
   OR executable_path IS NULL;

Performance Considerations#

Query Optimization#

Indexing: Time-based indexes are automatically created for efficient querying
Batch Processing: Large result sets are processed in batches to prevent memory issues
Query Timeouts: All queries have configurable timeouts to prevent system hangs

Resource Limits#

Memory Usage: Queries are limited to prevent excessive memory consumption
CPU Usage: Complex queries are throttled to maintain system performance
Result Size: Large result sets are paginated to prevent memory exhaustion

Security Considerations#

SQL Injection Prevention#

AST Validation: All SQL is parsed and validated before execution
Prepared Statements: All queries use parameterized statements
Function Whitelist: Only approved functions are allowed
Sandboxed Execution: Queries run in read-only database connections

Data Privacy#

Field Masking: Sensitive fields can be masked in logs and exports
Command Line Redaction: Command lines can be redacted for privacy
Access Control: Database access is restricted by component

Best Practices#

Writing Effective Detection Rules#

Use Specific Patterns: Avoid overly broad patterns that generate false positives
Leverage Hash Detection: Use executable hashes for precise malware detection
Combine Multiple Criteria: Use multiple conditions to reduce false positives
Test Thoroughly: Validate rules against known good and bad processes

Performance Optimization#

Use Indexes: Leverage time-based and field-based indexes
Limit Result Sets: Use LIMIT clauses for large queries
Avoid Complex Joins: Keep queries simple and focused
Monitor Resource Usage: Watch for queries that consume excessive resources

Security Guidelines#

Validate Input: Always validate user-provided SQL fragments
Use Parameterized Queries: Never concatenate user input into SQL
Review Function Usage: Ensure only approved functions are used
Monitor Query Performance: Watch for queries that might indicate attacks

Troubleshooting#

Common Issues#

Query Syntax Errors: Check SQL syntax against supported dialect. Ensure all functions are in the allowed list. Verify table and column names.
Performance Issues: Add appropriate indexes. Simplify complex queries. Use LIMIT clauses for large result sets.
Security Violations: Review banned function usage. Check for SQL injection attempts. Validate input parameters.

Debugging Queries#

-- Use EXPLAIN to understand query execution
EXPLAIN SELECT * FROM processes WHERE name LIKE '%test%';

-- Check query performance
SELECT COUNT(*) as total_processes FROM processes;
SELECT COUNT(*) as recent_processes FROM processes
WHERE collection_time > (strftime('%s', 'now') - 3600) * 1000;