System Health Monitoring
Overview
Implement a system health monitoring dashboard with service status checks, diagnostic links, and cache stampede prevention. This ticket delivers the system health monitoring flow from spec:50650885-e043-4e99-960b-672342fc4139/c565e255-83e7-4d16-a4ec-d45011fa5cad.
Scope
Included:
- SystemHealthController with health check logic
- System health dashboard view
- Service health checks (PostgreSQL, Redis, MinIO, Sidekiq)
- Redis lock for cache stampede prevention
- SystemHealthCardComponent for service status display
- Diagnostic links to Sidekiq dashboard and logs
- Sidebar navigation link to system health
Excluded:
- Historical health data (no database persistence)
- Alerting or notifications (out of scope)
- Automated remediation (out of scope)
- Health check API endpoints (web UI only)
Acceptance Criteria
SystemHealthController
- Controller created: app/controllers/system_health_controller.rb
- Action: `index` displays health dashboard
- Authorization: requires authentication, uses CanCanCan
- Health checks cached with 1-minute TTL
- Redis lock prevents cache stampede (only one request runs checks)
- Concurrent requests return cached value or "checking" status
Service Health Checks
PostgreSQL:
- Check: `ActiveRecord::Base.connection.execute("SELECT 1")`
- Returns: status (healthy/unhealthy), latency (ms)
- Logs errors on failure
Redis:
- Check: `Redis.current.ping`
- Returns: status (healthy/unhealthy), latency (ms)
- Logs errors on failure
MinIO:
- Check: `ActiveStorage::Blob.service.exist?("health_check")`
- Returns: status (healthy/unhealthy), latency (ms)
- Logs errors on failure
Sidekiq:
- Check: `Sidekiq::Stats.new`
- Returns: status (healthy/unhealthy), workers count, queues count, enqueued jobs
- Logs errors on failure
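The four checks share one shape: run a call, time it, and turn any exception into an unhealthy result with a logged error. A minimal sketch of that shape follows. The names `HealthResult` and `run_health_check` are hypothetical and not part of the spec, and the service call is passed in as a block so the sketch runs without Rails, Redis, or Sidekiq loaded.

```ruby
# Hypothetical result object; the spec only requires status, latency, and error details.
HealthResult = Struct.new(:service, :status, :latency_ms, :error, keyword_init: true)

# Runs the given block as a health check, measuring latency in milliseconds.
# Any StandardError is logged (if a logger is supplied) and reported as unhealthy.
def run_health_check(service, logger: nil)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  HealthResult.new(service: service, status: :healthy, latency_ms: (elapsed * 1000).round(1))
rescue StandardError => e
  logger&.error("#{service} health check failed: #{e.class}: #{e.message}")
  HealthResult.new(service: service, status: :unhealthy, error: e.message)
end

# Stand-in blocks; in the app these would wrap the real calls listed above,
# e.g. run_health_check("PostgreSQL") { ActiveRecord::Base.connection.execute("SELECT 1") }
healthy = run_health_check("PostgreSQL") { :query_ok }
failing = run_health_check("Redis") { raise IOError, "connection refused" }
```

Using a monotonic clock keeps the latency figure unaffected by wall-clock adjustments.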
Redis Lock Implementation
- Lock key: `system_health_check_lock`
- Lock acquired with: `Redis.current.set(lock_key, "locked", nx: true, ex: 10)`
- Lock expires after 10 seconds (prevents deadlock)
- If lock not acquired, wait 100ms and return cached value
- If no cached value, return "checking" status for all services
- Lock released after health checks complete
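The lock and cache rules above can be sketched as a single fetch method. This is an illustration under stated assumptions, not the planned implementation: `FakeRedis` and `MemoryCache` are in-memory stand-ins so the example runs without Redis or Rails, and `fetch_health` and the cache key are hypothetical names. The Redis client is passed in as a parameter because `Redis.current` was removed in redis-rb 5.0; adjust to however the app obtains its client.

```ruby
# In-memory stand-in for the two Redis commands the flow relies on (hypothetical).
class FakeRedis
  def initialize
    @data = {}
    @mutex = Mutex.new
  end

  # Mimics SET key value [NX] [EX seconds]: true when written, false when NX blocks it.
  def set(key, value, nx: false, ex: nil)
    @mutex.synchronize do
      entry = @data[key]
      live = entry && (entry[:expires_at].nil? || entry[:expires_at] > Time.now)
      next false if nx && live

      @data[key] = { value: value, expires_at: ex && Time.now + ex }
      true
    end
  end

  def del(key)
    @mutex.synchronize { @data.delete(key) ? 1 : 0 }
  end
end

# In-memory stand-in for a TTL cache (hypothetical).
class MemoryCache
  def initialize
    @store = {}
  end

  def read(key)
    entry = @store[key]
    entry[:value] if entry && entry[:expires_at] > Time.now
  end

  def write(key, value, expires_in:)
    @store[key] = { value: value, expires_at: Time.now + expires_in }
    value
  end
end

LOCK_KEY = "system_health_check_lock"
CACHE_KEY = "system_health_results" # hypothetical cache key
SERVICES = %w[PostgreSQL Redis MinIO Sidekiq].freeze

# Yields to run the real checks only when this caller wins the lock.
def fetch_health(redis:, cache:, wait: 0.1)
  cached = cache.read(CACHE_KEY)
  return cached if cached

  if redis.set(LOCK_KEY, "locked", nx: true, ex: 10)
    begin
      cache.write(CACHE_KEY, yield, expires_in: 60) # 1-minute TTL
    ensure
      redis.del(LOCK_KEY) # release even if a check raises
    end
  else
    sleep(wait) # give the lock holder a moment to fill the cache
    cache.read(CACHE_KEY) || SERVICES.to_h { |name| [name, { status: "checking" }] }
  end
end
```

One design caveat: releasing with a plain delete can remove a lock that has already expired and been re-acquired by another request. With a 10-second expiry and fast checks this may be acceptable, but storing a unique token and deleting only on a match is a common way to tighten it.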
System Health Dashboard
- Grid layout with 4 service cards (one per service)
- Each card shows:
  - Service name (PostgreSQL, Redis, MinIO, Sidekiq)
  - Status indicator (green checkmark or red X)
  - Latency (if healthy)
  - Error message (if unhealthy)
  - Additional metrics (for Sidekiq: workers, queues, enqueued)
- Diagnostic links section:
  - Link to Sidekiq dashboard (existing `/sidekiq` route)
  - Link to Rails logs (informational, view in terminal)
- Manual refresh button (optional, cache auto-expires)
- Skeleton loader shown while loading
- Empty state if health checks fail to run
Navigation
- Sidebar link added: "System Health" or "Monitoring"
- Link visible to all authenticated users
- Link highlights when active (current page)
Components
- `SystemHealthCardComponent` created with:
  - Service name display
  - Status badge (success/danger)
  - Status icon (check-circle/x-circle)
  - Latency display (if available)
  - Error message display (if unhealthy)
- Components use plain Bootstrap HTML and utility classes directly (Railsboot abstraction layer has been removed)
- Component tested with component specs
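To illustrate what "plain Bootstrap HTML" could look like for the card, here is a rough sketch using only Ruby's standard ERB library. It is a stand-in rather than the real component: an actual `SystemHealthCardComponent` would subclass `ViewComponent::Base` and keep its template in a separate file, and the class name, markup, and label text below are assumptions. The status icon is omitted because the icon library is not named in this ticket.

```ruby
require "erb"

# Plain-Ruby stand-in for the ViewComponent (hypothetical name and markup).
class SystemHealthCardSketch
  TEMPLATE = ERB.new(<<~HTML, trim_mode: "-")
    <div class="card">
      <div class="card-body">
        <h5 class="card-title"><%= h(name) %></h5>
        <span class="badge <%= badge_class %>"><%= healthy? ? "Healthy" : "Unhealthy" %></span>
        <%- if healthy? && latency_ms -%>
        <p class="text-body-secondary mb-0"><%= latency_ms %> ms</p>
        <%- elsif error -%>
        <p class="text-danger mb-0"><%= h(error) %></p>
        <%- end -%>
      </div>
    </div>
  HTML

  attr_reader :name, :latency_ms, :error

  def initialize(name:, status:, latency_ms: nil, error: nil)
    @name = name
    @status = status
    @latency_ms = latency_ms
    @error = error
  end

  def healthy?
    @status == :healthy
  end

  # Bootstrap badge classes for the success/danger states.
  def badge_class
    healthy? ? "text-bg-success" : "text-bg-danger"
  end

  # Escapes user-visible strings, as Rails templates do by default.
  def h(text)
    ERB::Util.html_escape(text)
  end

  def render
    TEMPLATE.result(binding)
  end
end

puts SystemHealthCardSketch.new(name: "Redis", status: :healthy, latency_ms: 1.2).render
```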
Testing
- Request spec: GET /system_health, verify response
- Request spec: Verify caching works (second request uses cache)
- Request spec: Verify Redis lock prevents concurrent checks
- System test: View system health page, verify all services shown
- System test: Mock service failure, verify error displayed
- Component test for SystemHealthCardComponent
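For the concurrency case, the core assertion is that only one of several simultaneous callers gets to run the checks. The sketch below shows that assertion with threads and an in-memory lock store. `FakeLockStore` is a hypothetical stand-in that ignores expiry; a real request spec would exercise the controller against the test Redis instance instead.

```ruby
# Minimal in-memory lock mimicking SET NX semantics (expiry ignored; test stand-in only).
class FakeLockStore
  def initialize
    @held = {}
    @mutex = Mutex.new
  end

  def set(key, value, nx: false, ex: nil)
    @mutex.synchronize do
      next false if nx && @held.key?(key)

      @held[key] = value
      true
    end
  end
end

store = FakeLockStore.new
check_runs = Queue.new # thread-safe collector of "ran the checks" events

threads = Array.new(10) do
  Thread.new do
    if store.set("system_health_check_lock", "locked", nx: true, ex: 10)
      check_runs << Thread.current.object_id
      sleep 0.05 # simulate slow checks while the lock is held
    end
  end
end
threads.each(&:join)

puts "health checks ran #{check_runs.size} time(s)"
```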
Technical References
- Core Flows: spec:50650885-e043-4e99-960b-672342fc4139/c565e255-83e7-4d16-a4ec-d45011fa5cad (Flow 4: System Health Monitoring)
- Tech Plan: spec:50650885-e043-4e99-960b-672342fc4139/f3c30678-d7af-45ab-a95b-0d0714906b9e (SystemHealthController, Redis Lock, Component Architecture)
Dependencies
Requires:
- ticket:50650885-e043-4e99-960b-672342fc4139/[UI Components & Loading States] - Needs SystemHealthCardComponent
Blocks:
- None (other features can be implemented in parallel)
Implementation Notes
- Create new controller: app/controllers/system_health_controller.rb
- Create new view: app/views/system_health/index.html.erb
- Add route: `resources :system_health, only: [:index]` (plural `resources`; a singular `resource` does not define an index action)
- Update sidebar partial: file/views/partials/_sidebar.html.erb
- Test Redis lock with concurrent requests (use threads in test)
- Ensure health checks work in air-gapped environment
- Consider adding timeout to health checks (5 seconds) to prevent hanging
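One way the suggested 5-second timeout could be applied is with Ruby's standard `Timeout` module, sketched below. The helper name `with_check_timeout` and the shape of the returned hash are assumptions for illustration. `Timeout.timeout` interrupts the block from another thread, so client-level connect and read timeouts, where the client supports them, are often a safer first choice.

```ruby
require "timeout"

# Runs the block, returning its value, or an unhealthy result if it exceeds the limit.
def with_check_timeout(seconds = 5)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  { status: "unhealthy", error: "health check timed out after #{seconds}s" }
end

fast = with_check_timeout { { status: "healthy" } }
slow = with_check_timeout(0.05) { sleep 1 } # short limit so the example finishes quickly
```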
Component Development Guidelines:
The Railsboot component abstraction layer has been fully removed from the codebase (PR #706). All components must now follow these patterns:
- Use ViewComponent framework for reusable logic and organization
- Render Bootstrap HTML and utility classes directly in component templates (no abstraction layer)
- Reference Bootstrap documentation for component patterns
- Use plain Bootstrap utility classes (e.g., `card`, `card-body`, `badge`, `text-bg-success`, `text-body-secondary`)
- Do not create abstraction layers; use Bootstrap classes directly in ERB templates
- Refer to AGENTS.md for frontend development patterns and layout guidelines
- Refer to GOTCHAS.md for accessibility considerations and common pitfalls
Architectural Note:
The Railsboot component abstraction layer was removed in PR #706. All views and components now use plain ERB templates with Bootstrap utility classes. For example, SystemHealthCardComponent renders direct Bootstrap markup (<div class="card">) instead of using wrapper components like Railsboot::CardComponent.
Estimated Effort
1-2 days (controller + health checks + Redis lock + view + tests)