procmond Process Collection Architecture#

The procmond daemon now uses an actor-based architecture for process event collection. It supports both coordinated operation with a broker (agent) and standalone mode. The refactoring improves reliability, extensibility, and crash recovery through message passing, coordinated startup, dynamic backpressure handling, and configuration hot-reload.

Dual-Mode Operation#

procmond can operate in two modes, determined at startup:

Actor Mode (Broker-Coordinated): If the DAEMONEYE_BROKER_SOCKET environment variable is set, procmond starts in actor mode. It initializes an EventBusConnector for crash-recoverable event delivery, coordinates startup with the agent, and dynamically adjusts collection intervals in response to backpressure. Passing the --standalone CLI flag or setting PROCMOND_STANDALONE=1 skips agent coordination, so collection starts immediately instead of waiting for the agent.
Standalone Mode: If no broker socket is configured, procmond falls back to standalone operation using the legacy ProcessEventSource with the collector-core framework.
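
The mode decision described above can be sketched as a pure function. This is a minimal illustration, not procmond's actual startup code: `RunMode` and `select_mode` are hypothetical names, and the sketch assumes that any value of DAEMONEYE_BROKER_SOCKET counts as "set".

```rust
/// Hypothetical outcome of the startup decision; procmond's real types may differ.
#[derive(Debug, PartialEq, Eq)]
enum RunMode {
    /// Broker socket set: wait for the agent's BeginMonitoring signal.
    ActorCoordinated,
    /// Broker socket set, but standalone start was requested: begin immediately.
    ActorImmediateStart,
    /// No broker socket: legacy ProcessEventSource with collector-core.
    LegacyStandalone,
}

/// Decide the run mode from the broker socket variable, the --standalone flag,
/// and the PROCMOND_STANDALONE variable (only the value "1" enables it here).
fn select_mode(
    broker_socket: Option<&str>,
    standalone_flag: bool,
    standalone_env: Option<&str>,
) -> RunMode {
    let standalone = standalone_flag || standalone_env == Some("1");
    match broker_socket {
        None => RunMode::LegacyStandalone,
        Some(_) if standalone => RunMode::ActorImmediateStart,
        Some(_) => RunMode::ActorCoordinated,
    }
}
```

In a real binary the first argument could come from `std::env::var("DAEMONEYE_BROKER_SOCKET").ok().as_deref()`. Keeping the decision in a pure function makes every combination easy to unit test.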

Actor Pattern: ProcmondMonitorCollector#

In actor mode, process collection is managed by ProcmondMonitorCollector, which runs as an actor in a dedicated task. Communication is performed via a bounded mpsc channel (capacity: 100), ensuring sequential message processing and consistent state management. The actor exposes a typed interface (ActorHandle) for sending messages, supporting request/response patterns via oneshot channels.

Actor Messages#

The actor processes the following message types, typically received via the RpcServiceHandler (which subscribes to control topics and forwards RPC requests):

HealthCheck: Returns detailed health and state information.
UpdateConfig: Atomically updates configuration at the next cycle boundary (supports hot-reload for selected parameters).
GracefulShutdown: Initiates a coordinated shutdown, completing the current cycle before stopping.
BeginMonitoring: Signals the collector to start monitoring after agent startup coordination.
AdjustInterval: Dynamically adjusts the collection interval in response to backpressure signals.

These messages are triggered either by the agent (via RPC) or by internal system events (such as backpressure).
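
To make the message flow concrete, here is a dependency-free sketch of the actor pattern. It is an assumption-laden stand-in: it uses std threads and `std::sync::mpsc` instead of an async task with oneshot channels, and the names `Message`, `State`, `Health`, `spawn_actor`, and `health_check` are illustrative rather than procmond's real API. Configuration changes are applied immediately here, whereas the real collector defers them to a cycle boundary.

```rust
use std::sync::mpsc::{self, Receiver, SyncSender};
use std::thread;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State { WaitingForAgent, Running }

#[derive(Debug, Clone, PartialEq)]
struct Health { state: State, interval: Duration }

/// Illustrative message set mirroring the list above.
#[derive(Debug)]
enum Message {
    HealthCheck { reply: SyncSender<Health> }, // request/response via a reply channel
    UpdateConfig { interval: Duration },
    GracefulShutdown,
    BeginMonitoring,
    AdjustInterval(Duration),
}

/// The actor owns its state and handles one message at a time, in order.
fn run_actor(rx: Receiver<Message>, mut interval: Duration) {
    let mut state = State::WaitingForAgent;
    for msg in rx {
        match msg {
            Message::HealthCheck { reply } => {
                // The requester may have gone away; ignore a failed reply.
                let _ = reply.send(Health { state, interval });
            }
            Message::UpdateConfig { interval: new } => interval = new,
            Message::AdjustInterval(new) => interval = new,
            Message::BeginMonitoring => state = State::Running,
            Message::GracefulShutdown => break,
        }
    }
}

/// Spawn the actor behind a bounded channel (capacity 100, as in the text).
fn spawn_actor(interval: Duration) -> (SyncSender<Message>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::sync_channel(100);
    let join = thread::spawn(move || run_actor(rx, interval));
    (tx, join)
}

/// Send a HealthCheck and wait for the reply; None if the actor has stopped.
fn health_check(tx: &SyncSender<Message>) -> Option<Health> {
    let (reply_tx, reply_rx) = mpsc::sync_channel(1);
    tx.send(Message::HealthCheck { reply: reply_tx }).ok()?;
    reply_rx.recv().ok()
}
```

Because only the actor thread touches `state` and `interval`, no locking is needed; callers observe state solely through messages.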

Startup Coordination#

On startup, the actor subscribes to control topics and waits for a BeginMonitoring command from the agent before starting collection. This ensures all collectors are ready and the agent has completed privilege dropping. The subscription is established after registration succeeds to prevent race conditions where the agent's broadcast could arrive before the collector is listening. If the subscription fails or if the --standalone flag or PROCMOND_STANDALONE=1 environment variable is set, procmond falls back to immediate-start standalone mode without agent coordination.

EventBusConnector Integration#

In actor mode, events are published to the broker via EventBusConnector, which uses a write-ahead log (WAL) for crash-recoverable delivery. If the broker is unavailable, events are buffered and replayed when connectivity is restored. The connector also provides backpressure signals to the actor, triggering dynamic interval adjustment (1.5x slowdown during backpressure).
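
The buffer-and-replay behavior can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: a `VecDeque` stands in for the on-disk WAL, `FakeBroker` simulates broker availability, and `Connector` is a hypothetical type, not the real EventBusConnector.

```rust
use std::collections::VecDeque;

/// Stand-in for the broker; `up` simulates connectivity.
struct FakeBroker { up: bool, delivered: Vec<String> }

impl FakeBroker {
    fn publish(&mut self, event: &str) -> Result<(), ()> {
        if self.up {
            self.delivered.push(event.to_string());
            Ok(())
        } else {
            Err(())
        }
    }
}

/// In-memory stand-in for the write-ahead log: every event is appended
/// before any delivery attempt and removed only after a successful publish.
struct Connector { pending: VecDeque<String> }

impl Connector {
    fn new() -> Self {
        Connector { pending: VecDeque::new() }
    }

    /// Log the event first, then try to deliver everything outstanding.
    fn publish(&mut self, broker: &mut FakeBroker, event: &str) {
        self.pending.push_back(event.to_string());
        self.flush(broker);
    }

    /// Replay pending events in order; stop at the first failure.
    /// Returns how many events were delivered.
    fn flush(&mut self, broker: &mut FakeBroker) -> usize {
        let mut sent = 0;
        while let Some(front) = self.pending.front().cloned() {
            if broker.publish(&front).is_err() {
                break;
            }
            self.pending.pop_front();
            sent += 1;
        }
        sent
    }
}
```

Removing an entry only after the broker accepts it is what keeps an event from being lost if delivery fails partway through a replay.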

With the introduction of registration and RPC services, the architecture now uses a shared EventBusConnector (wrapped in Arc<RwLock<...>>) for control-plane messages (registration, heartbeat, RPC), while the collector itself owns a separate connector for process event publishing. This separation allows for future enhancements where the connector supports both event and generic message publishing.

Registration and RPC Service Architecture#

procmond now includes two key control-plane components:

RegistrationManager: Handles the full registration lifecycle with the daemoneye-agent, including:
- Initial registration on startup (with retries and exponential backoff)
- Periodic heartbeat publishing to a dedicated topic (with health and metrics)
- Graceful deregistration on shutdown (notifying the agent and updating state)
- State machine tracking (Unregistered, Registering, Registered, Deregistering, Failed)
- Statistics tracking for registration attempts, heartbeats, and failures
The RegistrationManager runs a background heartbeat task that only publishes when registered. Heartbeat messages include health status, buffer usage, and connection state. Deregistration is triggered during graceful shutdown, ensuring the agent is notified.

RpcServiceHandler: Subscribes to the collector's control topic and handles incoming RPC requests from the agent, including:
- HealthCheck: Returns detailed health and component status
- UpdateConfig: Applies configuration changes at runtime (with validation)
- GracefulShutdown: Initiates a coordinated shutdown of the collector
- Error handling, timeout management, and statistics tracking for all RPC operations
The RpcServiceHandler parses incoming requests, dispatches them to the actor, and prepares responses (to be published when generic message support is available in the connector). All control-plane operations (health, config, shutdown) are now driven via RPC, enabling remote management and monitoring.

These components are integrated into the main actor-mode startup flow, sharing the event bus connector for control messages and coordinating shutdown and heartbeat behavior. This design enables robust, observable, and remotely managed operation in agent-coordinated deployments.
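
The registration lifecycle lends itself to a short sketch. The state names below come from the list above, but everything else is assumed for illustration: the helper names, the 1-second base delay, the 30-second cap, and the doubling factor are hypothetical, and the retry loop records delays instead of sleeping.

```rust
use std::time::Duration;

/// States named in the text; the transitions used here are an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegistrationState { Unregistered, Registering, Registered, Deregistering, Failed }

/// Exponential backoff: base * 2^attempt, capped. All values are illustrative.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}

/// Try to register up to `max_attempts` times. Returns the final state and
/// the delays a real implementation would sleep after each failed attempt.
fn register_with_retries(
    max_attempts: u32,
    mut try_register: impl FnMut(u32) -> bool,
) -> (RegistrationState, Vec<Duration>) {
    let mut delays = Vec::new();
    for attempt in 0..max_attempts {
        if try_register(attempt) {
            return (RegistrationState::Registered, delays);
        }
        delays.push(backoff_delay(attempt, Duration::from_secs(1), Duration::from_secs(30)));
    }
    (RegistrationState::Failed, delays)
}

/// Heartbeats are only published while registered.
fn should_send_heartbeat(state: RegistrationState) -> bool {
    state == RegistrationState::Registered
}
```

Capping the delay bounds how long a collector waits between attempts when the agent is slow to come up.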

Dynamic Interval Adjustment (Backpressure)#

A background task monitors backpressure signals from the event bus. When backpressure is activated, the collection interval is increased by 1.5x to reduce event throughput. When backpressure is released, the interval is restored to its original value. This mechanism helps prevent event loss and overload during downstream congestion.
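
As a worked example of the 1.5x rule, the sketch below tracks a base interval and the currently effective one. `IntervalController` and its method names are hypothetical. Deriving the slowed interval from the base, rather than from the current value, is an assumption that keeps repeated activation signals from compounding.

```rust
use std::time::Duration;

/// Hypothetical helper tracking the configured and effective intervals.
struct IntervalController {
    base: Duration,
    current: Duration,
}

impl IntervalController {
    fn new(base: Duration) -> Self {
        IntervalController { base, current: base }
    }

    /// Backpressure activated: slow collection to 1.5x the base interval.
    fn on_backpressure_activated(&mut self) {
        self.current = self.base.mul_f64(1.5);
    }

    /// Backpressure released: restore the original interval.
    fn on_backpressure_released(&mut self) {
        self.current = self.base;
    }
}
```

With the default 30-second interval, activation yields a 45-second interval, and release returns it to 30 seconds.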

Configuration Hot-Reload#

Selected configuration parameters can be updated at runtime via ActorHandle::update_config(). Hot-reloadable settings include:

collection_interval (frequency)
max_events_in_flight (backpressure limit)
Lifecycle detection thresholds

Other parameters, such as excluded PIDs or event-driven mode, require a restart to take effect.
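
One way to picture hot-reload at a cycle boundary is a pending slot that the collector drains when a cycle starts. The sketch below is illustrative only: `Config`, `Collector`, and `begin_cycle` are hypothetical, and rejecting a change to a restart-only field (rather than silently ignoring it) is an assumption.

```rust
use std::time::Duration;

/// Illustrative subset of the configuration.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    collection_interval: Duration, // hot-reloadable
    max_events_in_flight: usize,   // hot-reloadable
    excluded_pids: Vec<u32>,       // requires a restart
}

struct Collector {
    active: Config,
    pending: Option<Config>,
}

impl Collector {
    /// Stage a new configuration; it takes effect at the next cycle boundary.
    fn update_config(&mut self, new: Config) -> Result<(), String> {
        if new.excluded_pids != self.active.excluded_pids {
            return Err("excluded_pids cannot be changed without a restart".to_string());
        }
        self.pending = Some(new);
        Ok(())
    }

    /// Called at the start of each collection cycle: swap in any staged
    /// configuration in one step, then return the configuration to use.
    fn begin_cycle(&mut self) -> &Config {
        if let Some(new) = self.pending.take() {
            self.active = new;
        }
        &self.active
    }
}
```

Swapping the whole struct at the boundary means a cycle never runs with a mix of old and new values.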

CLI Options#

The CLI exposes configuration parameters for both modes:

--database <path>: Path to the process event database (default: /var/lib/daemoneye/processes.db)
--log-level <level>: Logging verbosity (info, debug, etc.; default: info)
--interval <seconds>: Collection interval in seconds (minimum: 5, maximum: 3600; default: 30)
--max-processes <n>: Maximum number of processes to collect per cycle (0 for unlimited; default: 0)
--enhanced-metadata: Enable collection of enhanced process metadata (requires privileges)
--compute-hashes: Enable computation of executable hashes for collected processes
--standalone: Start monitoring immediately without waiting for an agent BeginMonitoring signal (also: PROCMOND_STANDALONE=1)
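
The documented bounds for --interval can be expressed as a small validation helper. This is a sketch, not procmond's argument parser: `parse_interval` is a hypothetical function, and it assumes out-of-range values are rejected rather than clamped.

```rust
const MIN_INTERVAL_SECS: u64 = 5;
const MAX_INTERVAL_SECS: u64 = 3600;
const DEFAULT_INTERVAL_SECS: u64 = 30;

/// Validate an optional --interval value against the documented bounds.
/// A missing argument yields the default of 30 seconds.
fn parse_interval(arg: Option<&str>) -> Result<u64, String> {
    let raw = match arg {
        Some(raw) => raw,
        None => return Ok(DEFAULT_INTERVAL_SECS),
    };
    let secs: u64 = raw
        .trim()
        .parse()
        .map_err(|e| format!("invalid interval {:?}: {}", raw, e))?;
    if (MIN_INTERVAL_SECS..=MAX_INTERVAL_SECS).contains(&secs) {
        Ok(secs)
    } else {
        Err(format!(
            "interval {} is out of range ({}..={} seconds)",
            secs, MIN_INTERVAL_SECS, MAX_INTERVAL_SECS
        ))
    }
}
```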

Standalone Mode: ProcessEventSource#

If no broker is configured, procmond uses the legacy ProcessEventSource with collector-core. This mode provides basic process event collection and lifecycle management, but does not support coordinated startup, crash-recoverable event delivery, or dynamic backpressure.

Testing Strategies#

Testing for the actor-based architecture includes:

Actor Message Handling: Unit and integration tests for all actor message types, including health checks, configuration updates, shutdown, and interval adjustment.
Startup Coordination: Tests for agent-coordinated startup, ensuring collection does not begin until signaled.
EventBusConnector and WAL: Tests for crash recovery, event replay, and backpressure signaling.
Configuration Hot-Reload: Tests for atomic configuration updates at cycle boundaries and correct application of hot-reloadable parameters.
Dual-Mode Operation: Tests for correct fallback to standalone mode when no broker socket is configured.
CLI Tests: Validation of argument parsing and mapping to configuration parameters.

Example actor message test:

#[tokio::test]
async fn test_actor_health_check() {
    let db_manager = create_test_database().await;
    let config = ProcmondMonitorConfig::default();
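    // Keep the collector bound so it is not dropped while the handle is in use.
    let (_collector, handle) = create_collector_with_channel(db_manager, config).unwrap();
    // No BeginMonitoring has been sent yet, so the collector should still be waiting.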
    let health = handle.health_check().await.unwrap();
    assert_eq!(health.state, CollectorState::WaitingForAgent);
}

Example backpressure adjustment test:

#[tokio::test]
async fn test_backpressure_adjustment() {
    let db_manager = create_test_database().await;
    let config = ProcmondMonitorConfig::default();
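    // Keep the collector bound so it is not dropped while the handle is in use.
    let (_collector, handle) = create_collector_with_channel(db_manager, config).unwrap();
    // 45 s corresponds to the 30 s CLI default slowed by the 1.5x backpressure factor.
    let new_interval = Duration::from_secs(45);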
    handle.adjust_interval(new_interval).unwrap();
    // Verify collector interval updated
}

For implementation details, see main.rs and monitor_collector.rs.

This architecture provides robust, coordinated, and crash-recoverable process event collection for DaemonEye, supporting both agent-managed and standalone deployments.