# Three-Tier Testing Architecture in CipherSwarm
CipherSwarm employs a comprehensive three-tier testing architecture to ensure reliability, security, and maintainability throughout its development lifecycle. The tiers are: backend (unit/integration), frontend (component/UI), and end-to-end (E2E) tests. Each tier is organized for clarity, coverage, and automation, with integration into continuous integration (CI) workflows.
## Test Organization
**Foreign Key Cascade and Nullify Strategy:**

- CipherSwarm enforces referential integrity using database-level `on_delete: :cascade` and `on_delete: :nullify` rules for foreign keys, especially for ephemeral tables (telemetry, statuses) and parent-child relationships.
- This approach ensures that child records are automatically deleted or nullified when their parent is removed, regardless of whether deletion occurs via Rails or directly in the database. It prevents orphaned rows and foreign key violations, particularly when using bulk deletes (`delete_all`) or DB-level cascades that bypass Rails callbacks.
- Ephemeral child tables (e.g., `DeviceStatus`, `AgentError`, `HashcatBenchmark`) are set to cascade with their parent. Join tables and parent-child resources (e.g., `ProjectUser`, `HashItem`, `HashList`) also use cascade rules for consistency and defense-in-depth.
- When a table has multiple foreign keys to the same parent, the `column:` option is always specified explicitly in migration helpers to avoid ambiguity.
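A hedged migration sketch of these rules (the class name, tables, second FK column, and Rails version are illustrative, not CipherSwarm's actual migrations):

```ruby
# Illustrative migration sketch: cascade for an ephemeral child table,
# plus explicit column: options when two FKs point at the same parent.
class AddEphemeralForeignKeys < ActiveRecord::Migration[7.1]
  def change
    # Ephemeral telemetry rows are deleted with their parent status row.
    add_foreign_key :device_statuses, :hashcat_statuses,
                    column: :hashcat_status_id, on_delete: :cascade

    # Hypothetical second FK to the same parent: column: disambiguates,
    # and the secondary reference is nullified rather than cascaded.
    add_foreign_key :agent_errors, :agents,
                    column: :agent_id, on_delete: :cascade
    add_foreign_key :agent_errors, :agents,
                    column: :reported_by_agent_id, on_delete: :nullify
  end
end
```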
**DB Constraint Testing Best Practices:**

- When testing FK cascade/nullify behavior, always use `delete` (not `destroy`) to ensure that Rails callbacks do not mask missing or incorrect constraints. This verifies that the database itself enforces the intended referential integrity.
- Model and job specs include explicit tests for cascade deletion and nullification, such as verifying that `DeviceStatus` records are removed when their parent `HashcatStatus` is deleted by a cleanup job, and that `HashcatBenchmark` records are removed when their parent `Agent` is deleted.
- Tests verify correct uniqueness constraints. For example, the `HashcatBenchmark` model uses `(agent_id, hash_type, device)` as the natural unique key, replacing the previous `(agent_id, benchmark_date, hash_type)` index that included a mutable timestamp and prevented proper deduplication. This demonstrates the importance of testing that uniqueness constraints match the domain model's natural keys.
- Migrations that tighten constraints on existing data use deduplication patterns such as `DELETE ... WHERE id NOT IN (SELECT DISTINCT ON ...)` to clean up duplicates before applying stricter unique indexes. The `ChangeHashcatBenchmarksUniqueIndex` migration demonstrates this pattern, using `DISTINCT ON (agent_id, hash_type, device)` to select the latest row (by `benchmark_date DESC, id DESC`) before applying the new unique index.
- This strategy is documented in both migration comments and developer guidance to ensure consistent application and testing of FK rules across the codebase.
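The `DISTINCT ON` deduplication semantics can be illustrated in plain Ruby: for each natural key, keep the latest row and collect the rest for deletion. This is a sketch of the behavior, not the migration's actual SQL; the `Row` struct and `duplicate_ids` helper are illustrative.

```ruby
# Plain-Ruby model of DELETE ... WHERE id NOT IN (SELECT DISTINCT ON ...):
# per (agent_id, hash_type, device) key, keep the latest row
# (benchmark_date DESC, id DESC) and mark all other ids as duplicates.
Row = Struct.new(:id, :agent_id, :hash_type, :device, :benchmark_date, keyword_init: true)

def duplicate_ids(rows)
  kept = rows
    .group_by { |r| [r.agent_id, r.hash_type, r.device] }
    .map { |_key, group| group.max_by { |r| [r.benchmark_date, r.id] } }
  rows.map(&:id) - kept.map(&:id)
end

rows = [
  Row.new(id: 1, agent_id: 7, hash_type: 0, device: 1, benchmark_date: 100),
  Row.new(id: 2, agent_id: 7, hash_type: 0, device: 1, benchmark_date: 200), # latest -> kept
  Row.new(id: 3, agent_id: 7, hash_type: 0, device: 2, benchmark_date: 100), # unique key -> kept
]
duplicate_ids(rows) # => [1]
```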
### Backend Tests
Backend tests include model, service object, component, and request specs. The suite now includes:
- Expanded request specs for API endpoints, authorization, caching, and Turbo Stream UI updates (spec/requests/)
- Component specs for all major UI components (spec/components/)
- Service specs for business logic and system health (spec/services/)
- Job specs with comprehensive coverage of atomic operations, race conditions, error rollback, partial failure recovery, and idempotency patterns (spec/jobs/)
- Coverage verification specs (spec/coverage/coverage_verification_spec.rb) that enforce the presence of tests for all controllers, components, services, and system flows
- Deployment validation specs (spec/deployment/air_gapped_checklist_spec.rb) that verify air-gapped readiness, asset pipeline integrity, and offline documentation
- Performance specs (spec/performance/page_load_performance_spec.rb) that enforce page load and query count SLAs
- OpenAPI specification validation using vacuum to ensure RSwag-generated API documentation conforms to OpenAPI 3.0 standards
These tests ensure correctness of business logic, UI rendering, API behavior, and system health, including campaign progress bars, error modals, Turbo Stream updates, and real-time monitoring. The backend suite also verifies database constraint and cascade/nullify behavior, as well as coverage and deployment requirements.
**Authorization Testing Best Practices:**
Authorization tests verify that access control rules are properly enforced across controllers and concerns. When writing authorization tests, follow these best practices:
- **Use correct HTTP status codes based on the failure type:**
    - HTTP 401 Unauthorized: for authentication failures (user not logged in, invalid credentials, no session)
    - HTTP 403 Forbidden: for authorization failures (user is authenticated but lacks permission to access the resource)
    - When `CanCan::AccessDenied` is raised, the application returns HTTP 403 Forbidden. Tests must expect the `:forbidden` (403) status, not `:unauthorized` (401)
- **Test Turbo Frame-aware error handling:**
    - Authorization tests should verify that `turbo_frame_request?` is checked when handling `CanCan::AccessDenied`
    - When a request is made via Turbo Frame (the `Turbo-Frame` header is present), the response should render the `errors/_not_authorized_frame.html.erb` partial within the frame to prevent perpetual "Loading..." states
    - Verify that the response includes the HTTP 403 Forbidden status, the frame ID from `request.headers["Turbo-Frame"]`, and appropriate error messaging using the i18n key `errors.not_authorized`
    - This pattern ensures that Turbo Frame requests receive a properly formed `<turbo-frame>` tag with error content, rather than a full-page template that would cause stuck "Loading..." indicators
- **Test Downloadable concern authorization:**
    - The Downloadable concern (`app/controllers/concerns/downloadable.rb`) enforces authorization on all three actions: `download`, `view_file`, and `view_file_content`. Each action calls `authorize!` to verify CanCanCan permissions before proceeding
    - Tests should verify that unauthorized access attempts return HTTP 403 Forbidden for all three methods
    - Test coverage should include: project members with access (expect success), users outside the project (expect `:forbidden`), sensitive vs. public resources (based on ability definitions), admin access (expect success), and unauthenticated users (expect redirect to login)
    - When testing Turbo Frame requests to Downloadable actions, verify that the response renders the `errors/_not_authorized_frame.html.erb` partial with HTTP 403 status
    - Example resources using Downloadable: WordLists, RuleLists, MaskLists
These patterns ensure that authorization is consistently tested and that error responses provide appropriate feedback to users in both full-page and Turbo Frame contexts.
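The 401-versus-403 distinction can be reduced to a small decision helper (illustrative only; the real application derives these statuses from authentication and CanCanCan checks, not from a standalone method):

```ruby
# Illustrative only: maps an access-check outcome to the HTTP status that
# authorization tests should expect. In CipherSwarm the real behavior comes
# from session authentication plus a rescue handler for CanCan::AccessDenied.
def expected_status(authenticated:, authorized:)
  return 401 unless authenticated # no session / invalid credentials
  return 403 unless authorized    # logged in, but lacks permission
  200
end

expected_status(authenticated: false, authorized: false) # => 401
expected_status(authenticated: true,  authorized: false) # => 403
expected_status(authenticated: true,  authorized: true)  # => 200
```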
**Atomic Operation and Job Idempotency Testing:**
The test suite includes comprehensive coverage of atomic lock patterns and job idempotency, particularly in ProcessHashListJob (~60 RSpec examples). These tests verify:
- Atomic lock behavior using `UPDATE ... WHERE processed = false` to prevent duplicate processing when `after_commit` fires multiple times
- Race condition handling when multiple jobs attempt to claim the same work
- Error rollback and retry scenarios, ensuring the `processed` flag is reset on failure
- Partial failure recovery, where incomplete hash item insertions are cleaned up before retry
- Idempotent batch processing with graceful handling of record deletion during processing
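The atomic claim can be modeled in plain Ruby: only the caller that flips the flag proceeds, so duplicate job runs become no-ops. This is a sketch of the semantics; the real job issues a single SQL `UPDATE` whose affected-row count decides the winner, rather than using an in-process mutex.

```ruby
# Plain-Ruby model of the "UPDATE ... WHERE processed = false" claim pattern.
# In SQL, only the caller whose UPDATE reports one affected row proceeds;
# here a mutex-guarded flag plays the same role for illustration.
class HashListRecord
  def initialize
    @processed = false
    @lock = Mutex.new
  end

  # Returns true for exactly one claimant, mirroring `rows_affected == 1`.
  def claim!
    @lock.synchronize do
      return false if @processed
      @processed = true
      true
    end
  end
end

record  = HashListRecord.new
threads = 4.times.map { Thread.new { record.claim! } } # duplicate enqueues
results = threads.map(&:value)
results.count(true) # => 1 (all other runs are no-ops)
```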
The benchmark submission endpoint demonstrates idempotent upsert patterns using `upsert_all` with `unique_by: [:agent_id, :hash_type, :device]`. This ensures that re-submissions update existing rows instead of creating duplicates, and allows multi-batch submissions to accumulate rows without duplication. Tests verify partial-success scenarios where invalid entries are filtered out before upsert, and all-invalid payloads return 422 without changing agent state. The pattern includes pre-validation filtering via `build_valid_benchmark_records` to ensure only valid data reaches the database operation.
This testing pattern demonstrates best practices for background jobs and API endpoints that require exactly-once processing semantics and is recommended for similar atomic operation scenarios across the codebase.
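The idempotent upsert semantics can be sketched in plain Ruby as a store keyed by the natural unique key (a model of the behavior, not the actual `upsert_all` implementation; the `BenchmarkStore` class is illustrative):

```ruby
# Plain-Ruby model of upsert-by-natural-key: rows are keyed by
# [agent_id, hash_type, device], so re-submissions overwrite in place
# instead of creating duplicate rows.
class BenchmarkStore
  def initialize
    @rows = {}
  end

  def upsert_all(records)
    records.each do |rec|
      key = rec.values_at(:agent_id, :hash_type, :device)
      @rows[key] = rec # insert or overwrite: idempotent by natural key
    end
  end

  def count
    @rows.size
  end

  def speed_for(key)
    @rows[key][:speed]
  end
end

store = BenchmarkStore.new
store.upsert_all([
  { agent_id: 1, hash_type: 0, device: 1, speed: 100 },
  { agent_id: 1, hash_type: 0, device: 2, speed: 90 },
])
store.upsert_all([{ agent_id: 1, hash_type: 0, device: 1, speed: 120 }]) # re-submit
store.count                # => 2 (no duplicate row created)
store.speed_for([1, 0, 1]) # => 120 (existing row updated in place)
```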
**OpenAPI Documentation Testing:**
CipherSwarm uses RSwag to generate OpenAPI 3.0 documentation from request specs, and vacuum to lint the generated specification. This validation is enforced in the CI pipeline via the lint_api GitHub Actions job, which runs after scan_ruby and validates that:
- The swagger/v1/swagger.json file conforms to OpenAPI 3.0 standards
- API documentation stays synchronized with implementation
- RSwag specs are properly formatted for OpenAPI 3.0 requestBody generation
- PRs must pass vacuum linting before merging
The vacuum linter validates the generated swagger.json file and reports quality scores, preventing API documentation drift. When writing RSwag specs, use the request_body_json helper (defined inside the HTTP method block: post, put, patch) to specify request bodies. The helper accepts schema:, required:, description:, and examples: parameters. The custom ruleset in vacuum-ruleset.yaml disables rules that conflict with Rails conventions (snake_case properties, underscore paths).
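A hedged sketch of how an RSwag spec might call the helper (the endpoint path, schema reference, and description are hypothetical; only the call placement and keyword parameters come from the helper's documented interface):

```ruby
# Illustrative RSwag spec fragment -- path and schema names are invented.
# request_body_json must be called inside the HTTP method block, per the
# polyfill's path-level guard.
path "/api/v1/client/agents/{id}/benchmarks" do
  post "Submit benchmark results" do
    consumes "application/json"

    request_body_json schema: { "$ref" => "#/components/schemas/BenchmarkSubmission" },
                      required: true,
                      description: "Benchmark rows keyed by agent/hash_type/device"

    response "204", "accepted" do
      run_test!
    end
  end
end
```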
### System Tests (UI/UX)
System tests now cover all major user workflows, including:
- Agent fleet monitoring (spec/system/agent_fleet_monitoring_spec.rb)
- Campaign progress monitoring (spec/system/campaign_progress_monitoring_spec.rb)
- Task management and detail investigation (spec/system/tasks_spec.rb, spec/system/task_detail_investigation_spec.rb)
- Error investigation and modal flows (spec/system/error_investigation_spec.rb)
- System health dashboard and diagnostics (spec/system/system_health_spec.rb)
- Campaign creation workflow (spec/system/campaign_creation_workflow_spec.rb)
- Loading and feedback patterns (spec/system/loading_feedback_patterns_spec.rb)
System tests use the Page Object Pattern for maintainability and include accessibility/ARIA checks. The suite is validated by coverage verification specs to ensure all critical flows are tested. System tests are run locally and can be excluded from CI as needed (see skip: ENV["CI"].present?).
### Frontend Tests
Frontend tests now include:
- JavaScript unit/component tests using Vitest (spec/javascript/), covering all Stimulus controllers (e.g., health_refresh_controller.test.js, tabs_controller.test.js, toast_controller.test.js, select_controller.test.js)
- Coverage verification specs to ensure all controllers have corresponding JS tests
- Playwright E2E tests for full browser workflows (frontend/e2e/)
Vitest tests are run with just test-js or yarn test:js and are integrated into CI. The test suite ensures all UI logic and controller behaviors are covered, including Turbo Stream and real-time update patterns.
**Tom Select Integration Testing:**
The select_controller.js (app/javascript/controllers/select_controller.js) provides searchable dropdown functionality using the Tom Select library. The controller is tested with Vitest (spec/javascript/controllers/select_controller.test.js, 8 tests) covering:
- Initialization and TomSelect instance creation
- Configuration values (`allowEmpty`, `maxOptions`) set via data attributes
- Disconnect cleanup and duplicate connect prevention
- Custom attribute handling (e.g., `data-select-allow-empty-value`, `data-select-max-options-value`)
- Error handling for initialization failures with retry prevention
This controller is used for the hash type dropdown, displaying hashcat mode ID with names (e.g., "0 - MD5") with searchable filtering. The test suite demonstrates best practices for testing Stimulus controllers that wrap third-party JavaScript libraries, including mocking external dependencies (TomSelect) and verifying lifecycle behavior (connect/disconnect).
### End-to-End (E2E) Tests
E2E tests use Playwright to drive the application through real browser sessions against a Dockerized backend. The suite is validated by coverage verification specs to ensure all critical user workflows are covered, including authentication, dashboard, campaign management, agent monitoring, system health, and error handling. E2E tests are run against a seeded test environment and are tracked in the coverage verification and test coverage plan.
## Test Coverage
CipherSwarm targets 100% coverage of user-visible workflows, all user roles (admin, project admin, user), and major browsers (Chromium, Firefox, WebKit) across desktop and mobile viewports. The test plan is divided into phases, prioritizing authentication, dashboard, campaign creation, attack configuration, and resource management first, followed by advanced features, integration, and performance validation. With the addition of campaign progress monitoring, error modals, and recent cracks, coverage now includes real-time progress bars, ETA summaries, error handling flows, and accessibility/ARIA labeling for all campaign and attack monitoring features. Component and system specs ensure all UI states and edge cases are tested. Coverage gaps and priorities are tracked in the E2E test coverage plan.
## Test Execution Commands
### Backend
To run backend tests (RSpec):

```shell
just test
```

To run all tests (RSpec + JavaScript):

```shell
just test-all
```

To run only JavaScript tests:

```shell
just test-js
```

To run only component specs (including CampaignProgressComponent and ErrorModalComponent):

```shell
bundle exec rspec spec/components/
```

To run only system tests (including campaign progress monitoring, agent fleet monitoring, task management, error investigation, system health, campaign creation, and loading/feedback patterns):

```shell
bundle exec rspec spec/system/
```

To run only request specs (including API, authorization, caching, and Turbo Stream updates):

```shell
bundle exec rspec spec/requests/
```

To run coverage verification, deployment, and performance specs:

```shell
bundle exec rspec spec/coverage/ spec/deployment/ spec/performance/
```

To lint the OpenAPI specification locally:

```shell
just lint-api
```

This command validates that the generated OpenAPI specification (swagger/v1/swagger.json) complies with OpenAPI 3.0 standards using vacuum. The same validation is automatically enforced in CI via the lint_api job. The custom ruleset in vacuum-ruleset.yaml disables rules that conflict with Rails conventions (snake_case properties, underscore paths).

In CI, tests are executed with:

```shell
bin/bundle exec rspec --profile 10 --format RspecJunitFormatter --out /tmp/test-results/rspec.xml --format progress
```
See the justfile for additional quality and test commands.
### Frontend and E2E
Frontend and E2E tests are run using Playwright and Vitest. Example commands:
To run Playwright E2E tests:

```shell
npx playwright test
```

To run JavaScript unit/component tests (Stimulus controllers, UI logic):

```shell
yarn test:js
```

or

```shell
just test-js
```

To run only a specific JS controller test:

```shell
yarn vitest run spec/javascript/controllers/health_refresh_controller.test.js
```
Tests can be targeted to specific files or directories as needed. Vitest configuration is in vitest.config.js. Coverage verification specs ensure all controllers and E2E flows are tested.
## Test Fixtures
CipherSwarm uses FactoryBot for generating test data and DatabaseCleaner for maintaining a clean state between tests.
Test factories and model specs are updated to reflect the new database-level foreign key on_delete: :cascade and on_delete: :nullify rules. Factories for ephemeral and parent-child tables (e.g., DeviceStatus, AgentError, HashcatBenchmark, HashItem, ProjectUser) now match the DB schema, and tests verify correct cascade/nullify behavior. This ensures that test data and cleanup accurately represent production referential integrity.
The test suite includes comprehensive coverage of model concerns, such as Agent::Benchmarking. The benchmarking? method determines when an agent is actively running initial benchmarks based on pending state, recent activity, and absence of benchmark records. The last_benchmarks method now returns all current benchmarks without date filtering, since the unique index on (agent_id, hash_type, device) ensures only one benchmark exists per agent/hash_type/device combination. This change reflects the migration from mutable timestamp-based uniqueness to immutable natural keys.
**Agent Benchmark Control Testing:**

The agent API includes a required `benchmarks_needed` boolean field that controls whether agents run benchmarks during startup and configuration reload workflows. This represents a significant behavior change: agents can skip benchmark execution when the server already has valid benchmark results on file. Tests should verify:
- Configuration mapping: Tests in `lib/agentClient_test.go` verify that the `benchmarks_needed` field is correctly mapped from the API response to the agent's internal configuration struct. The `TestMapConfiguration` suite includes test cases with `benchmarks_needed` set to both `true` and `false`, ensuring proper handling in all pointer combinations (nil, non-nil, mixed).
- Agent startup behavior: When `benchmarks_needed=false`, agents should log "Server reports valid benchmarks on file, skipping benchmark run", set the `BenchmarksSubmitted` flag to true, and skip the benchmark execution entirely. When `benchmarks_needed=true`, agents should proceed with normal benchmark execution.
- Configuration reload behavior: During server-initiated reloads, agents should check the `benchmarks_needed` flag before re-running benchmarks. When `false`, agents should skip the re-run and log "Server reports valid benchmarks on file, skipping benchmark re-run".
- Benchmark manager integration: Tests should verify that the benchmark manager's `UpdateBenchmarks` method is only called when `benchmarks_needed=true`, and that the agent state is updated correctly in both paths.
This pattern ensures that agents respect server-side benchmark cache validity and avoid unnecessary benchmark execution, improving startup time and reducing resource usage when valid benchmarks already exist.
**Array Length Validation Testing for DoS Prevention:**
CipherSwarm implements array length validations to prevent denial-of-service (DoS) attacks through unbounded array payloads. These validations are tested comprehensively in model specs and backed by both Rails model validations and OpenAPI schema constraints (maxItems/minItems) for defense in depth. The test suite includes 11 new specs covering:
- Agent model (`spec/models/agent_spec.rb`): 3 tests for the `devices` array validation
    - Accepts up to 64 devices
    - Rejects more than 64 devices with the error message "must have at most 64 entries"
    - Accepts empty arrays
    - Implementation: the custom validation `devices_length_within_limit` enforces a maximum of 64 entries
- HashcatStatus model (`spec/models/hashcat_status_spec.rb`): 8 tests for multiple array length validations
    - Fixed-length arrays (exact length required since hashcat always emits exactly 2 values): the `progress`, `recovered_hashes`, and `recovered_salts` arrays must each have exactly 2 entries
    - Variable-length array: the `device_statuses` array is limited to a maximum of 64 entries
    - Tests verify both valid cases (correct length) and invalid cases (too few, too many, or nil values)
    - Implementation: the custom validations `array_lengths_within_limits` and `device_statuses_count_within_limit`
These array length constraints are mirrored in the OpenAPI specification (spec/swagger_helper.rb and swagger/v1/swagger.json) with maxItems: 64 for variable-length arrays and minItems: 2, maxItems: 2 for fixed-length arrays. This dual-layer approach ensures API clients receive schema validation errors before reaching the Rails layer, while model validations provide a second line of defense against malformed data. The test suite demonstrates best practices for security-focused validation testing, ensuring both data integrity and DoS prevention.
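The length rules can be sketched as a plain-Ruby check (limits taken from the validations above; the helper name and error wording are illustrative, not the app's actual validation code):

```ruby
# Plain-Ruby sketch of the array length checks: fixed-length pairs for
# hashcat output arrays, and a 64-entry cap on device_statuses.
MAX_DEVICES = 64
FIXED_PAIR  = 2 # hashcat always emits exactly two values

def array_length_errors(progress:, device_statuses:)
  errors = []
  unless progress.is_a?(Array) && progress.length == FIXED_PAIR
    errors << "progress must have exactly #{FIXED_PAIR} entries"
  end
  if device_statuses.length > MAX_DEVICES
    errors << "device_statuses must have at most #{MAX_DEVICES} entries"
  end
  errors
end

array_length_errors(progress: [10, 100], device_statuses: Array.new(64) { {} })
# => [] (valid: exact pair, device count at the limit)
array_length_errors(progress: [10], device_statuses: Array.new(65) { {} })
# => both errors: wrong fixed length and oversized variable-length array
```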
**Test Data Isolation Best Practices:**
To prevent test mutation and flakiness, tests should generate fresh test data for each test invocation rather than sharing mutable global state. The benchmark cache tests (lib/benchmark/cache_test.go) demonstrate this pattern:
- Problem: Global variables containing test data (e.g., `sampleBenchmarkResults`) can be mutated during test execution, causing state to leak between subtests and leading to flaky test failures.
- Solution: Replace global test data with helper functions that return fresh copies for each invocation. For example, `newSampleBenchmarkResults()` returns a new slice of benchmark results each time it's called, preventing the `Submitted` flag from leaking across subtests.
- Pattern: Use helper functions prefixed with `new` (e.g., `newSampleBenchmarkResults()`, `newTestConfig()`) to create fresh test fixtures. These functions should be called at the start of each test or subtest to ensure isolation.
This pattern is particularly important when testing stateful operations such as cache submission, where flags or fields may be modified during test execution. Applying this pattern prevents inter-test dependencies and ensures that tests pass consistently regardless of execution order.
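The same pattern rendered in Ruby (names are illustrative; the original example is the Go benchmark cache tests): a shared constant leaks mutations between tests, while a helper method returns a fresh object per call.

```ruby
# Anti-pattern: a mutable global shared across tests.
SHARED_RESULTS = [{ hash_type: 0, submitted: false }]

# Pattern: a helper returning a fresh copy for each invocation,
# analogous to Go's newSampleBenchmarkResults().
def new_sample_results
  [{ hash_type: 0, submitted: false }]
end

# "Test 1" marks its data as submitted:
shared = SHARED_RESULTS
shared.first[:submitted] = true

# "Test 2" sees leaked state from the global, but not from the helper:
SHARED_RESULTS.first[:submitted]   # => true  (leaked between tests)
new_sample_results.first[:submitted] # => false (isolated fresh fixture)
```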
Component specs use Pagy::Offset for pagination testing (e.g., Pagy::Offset.new(count: items.size, page: 1, limit: 10)) to match the offset-based pagination implementation used in production.
For E2E tests, the environment is seeded with known users, projects, and resources to ensure deterministic results. Test data management includes standardized roles, predictable project/campaign/resource data, and mechanisms to switch between mock and real data.
**System Test Helpers for Tom Select:**
The Page Object Pattern includes a helper method for Tom Select dropdowns: tom_select_fill_and_choose(select_id, text) (defined in spec/support/page_objects/base_page.rb). This helper interacts with Tom Select dropdowns by clicking to open, typing to filter, and selecting a match. The helper requires the dropdown_input plugin and is used in system tests for hash list creation and other forms with searchable dropdowns (e.g., hash_list_form_page.rb). This demonstrates best practices for abstracting JavaScript component interactions in system tests using the Page Object Pattern.
## System Test Refactoring
System test refactoring in CipherSwarm focuses on improving maintainability, reliability, and coverage. Refactoring efforts include migrating to SSR session-based authentication for realistic E2E flows, consolidating duplicate test files, enhancing test isolation, and expanding negative and edge case coverage. The test coverage plan documents ongoing and planned improvements, including filling gaps in authentication, user management, access control, and real-time features.
## RSwag 3.0.0.pre Migration
CipherSwarm upgraded to rswag 3.0.0.pre to support OpenAPI 3.0 native features. This pre-release version required custom polyfills and compatibility bridges:
- `request_body_json` helper: A polyfill implemented in `spec/support/rswag_polyfills.rb` to provide a clean DSL for defining request bodies in RSwag specs. Call it inside HTTP method blocks (post, put, patch) with `schema:`, `required:`, `description:`, and `examples:` parameters. The helper wraps rswag's internal `consumes` + `parameter in: :body` mechanism; the formatter converts this to valid OpenAPI 3.0 `requestBody` output. The helper includes a guard that raises an error if called at the path level (outside an HTTP method block), preventing misuse.
- `LetFallbackHash` bridge class: Maintains compatibility with rswag 2.x parameter resolution behavior. rswag 3.x resolves parameters via `params.fetch(name)` from `request_params`, while 2.x resolved directly from `let` blocks via `example.send(name)`. The bridge falls back to `let` blocks when a parameter is not found in `request_params`, ensuring existing specs continue to work. Implemented in `spec/support/rswag_polyfills.rb`.
- `openapi_helper.rb` compatibility shim: A new file (`spec/openapi_helper.rb`) provides backward compatibility for rswag 3.0.0.pre, which loads `openapi_helper` by default. This shim delegates to `spec/swagger_helper.rb` to maintain compatibility with existing specs and CI scripts.
- Validation changes: The `openapi_strict_schema_validation` option has been removed. It is replaced by `openapi_no_additional_properties` (enabled) and `openapi_all_properties_required` (enabled) in `spec/swagger_helper.rb`. The vacuum linter (`just lint-api`) provides document-level OpenAPI validation.
- Version guard: A version guard in `spec/support/rswag_polyfills.rb` checks that rswag-specs is exactly version 3.0.0.pre. The guard ensures the monkey-patch fails loudly if rswag is upgraded, prompting removal of the polyfills and verification of native support.
The polyfills are implemented in spec/support/rswag_polyfills.rb (loaded by spec/swagger_helper.rb) and are intended as temporary bridges until stable rswag 3.x releases include native support for the DSL patterns used in CipherSwarm's API specs. The version guard ensures these polyfills are only used with the pre-release version and will be removed when rswag 3.0.0 stable is released.
## CI Workflow Integration
CipherSwarm uses GitHub Actions for CI. The workflow (.github/workflows/CI.yml) runs on pull requests and pushes to main and develop branches. It sets up the environment, installs dependencies, prepares the database, and executes backend and frontend tests. The CI pipeline includes a dedicated lint_api job that validates the OpenAPI specification using vacuum, ensuring that the RSwag-generated swagger.json conforms to OpenAPI 3.0 standards and preventing API documentation drift. Artifacts such as test results and screenshots from failed system tests are uploaded for debugging. Coverage results are sent to Code Climate if configured. The CI pipeline enforces quality gates, requiring all critical path tests to pass before release (CI workflow).
## Importance of Testing in the Development Lifecycle
Testing is central to CipherSwarm’s development process. It ensures feature correctness, security, and reliability, and supports rapid iteration without regressions. The architecture emphasizes test isolation, reproducibility, and coverage of real-world workflows. Automated tests in CI provide immediate feedback, enforce quality standards, and enable safe, efficient releases. Regular refactoring and maintenance of the test suite are prioritized to keep pace with evolving features and to ensure long-term project health.
For further details on test plans, priorities, and implementation strategies, see the E2E test coverage plan.