# Three-Tier Testing Architecture in CipherSwarm
CipherSwarm employs a comprehensive three-tier testing architecture to ensure reliability, security, and maintainability throughout its development lifecycle. The tiers are: backend (unit/integration), frontend (component/UI), and end-to-end (E2E) tests. Each tier is organized for clarity, coverage, and automation, with integration into continuous integration (CI) workflows.
## Test Organization
**Foreign Key Cascade and Nullify Strategy:**

- CipherSwarm enforces referential integrity using database-level `on_delete: :cascade` and `on_delete: :nullify` rules for foreign keys, especially for ephemeral tables (telemetry, statuses) and parent-child relationships.
- This approach ensures that child records are automatically deleted or nullified when their parent is removed, regardless of whether deletion occurs via Rails or directly in the database. It prevents orphaned rows and foreign key violations, particularly when using bulk deletes (`delete_all`) or DB-level cascades that bypass Rails callbacks.
- Ephemeral child tables (e.g., `DeviceStatus`, `AgentError`, `HashcatBenchmark`) are set to cascade with their parent. Join tables and parent-child resources (e.g., `ProjectUser`, `HashItem`, `HashList`) also use cascade rules for consistency and defense-in-depth.
- When a table has multiple foreign keys to the same parent, the `column:` option is always specified explicitly in migration helpers to avoid ambiguity.
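A hedged migration sketch of these rules (the class name, tables, second FK column, and Rails version are illustrative, not CipherSwarm's actual migrations):

```ruby
# Illustrative migration sketch: cascade for an ephemeral child table,
# plus explicit column: options when two FKs point at the same parent.
class AddEphemeralForeignKeys < ActiveRecord::Migration[7.1]
  def change
    # Ephemeral telemetry rows are deleted with their parent status row.
    add_foreign_key :device_statuses, :hashcat_statuses,
                    column: :hashcat_status_id, on_delete: :cascade

    # Hypothetical second FK to the same parent: column: disambiguates,
    # and the secondary reference is nullified rather than cascaded.
    add_foreign_key :agent_errors, :agents,
                    column: :agent_id, on_delete: :cascade
    add_foreign_key :agent_errors, :agents,
                    column: :reported_by_agent_id, on_delete: :nullify
  end
end
```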
**DB Constraint Testing Best Practices:**

- When testing FK cascade/nullify behavior, always use `delete` (not `destroy`) to ensure that Rails callbacks do not mask missing or incorrect constraints. This verifies that the database itself enforces the intended referential integrity.
- Model and job specs include explicit tests for cascade deletion and nullification, such as verifying that `DeviceStatus` records are removed when their parent `HashcatStatus` is deleted by a cleanup job, and that `HashcatBenchmark` records are removed when their parent `Agent` is deleted.
- Tests verify correct uniqueness constraints. For example, the `HashcatBenchmark` model uses `(agent_id, hash_type, device)` as the natural unique key, replacing the previous `(agent_id, benchmark_date, hash_type)` index that included a mutable timestamp and prevented proper deduplication. This demonstrates the importance of testing that uniqueness constraints match the domain model's natural keys.
- Migrations that tighten constraints on existing data use deduplication patterns such as `DELETE ... WHERE id NOT IN (SELECT DISTINCT ON ...)` to clean up duplicates before applying stricter unique indexes. The `ChangeHashcatBenchmarksUniqueIndex` migration demonstrates this pattern, using `DISTINCT ON (agent_id, hash_type, device)` to select the latest row (by `benchmark_date DESC, id DESC`) before applying the new unique index.
- This strategy is documented in both migration comments and developer guidance to ensure consistent application and testing of FK rules across the codebase.
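The `DISTINCT ON` deduplication semantics can be illustrated in plain Ruby: for each natural key, keep the latest row and collect the rest for deletion. This is a sketch of the behavior, not the migration's actual SQL; the `Row` struct and `duplicate_ids` helper are illustrative.

```ruby
# Plain-Ruby model of DELETE ... WHERE id NOT IN (SELECT DISTINCT ON ...):
# per (agent_id, hash_type, device) key, keep the latest row
# (benchmark_date DESC, id DESC) and mark all other ids as duplicates.
Row = Struct.new(:id, :agent_id, :hash_type, :device, :benchmark_date, keyword_init: true)

def duplicate_ids(rows)
  kept = rows
    .group_by { |r| [r.agent_id, r.hash_type, r.device] }
    .map { |_key, group| group.max_by { |r| [r.benchmark_date, r.id] } }
  rows.map(&:id) - kept.map(&:id)
end

rows = [
  Row.new(id: 1, agent_id: 7, hash_type: 0, device: 1, benchmark_date: 100),
  Row.new(id: 2, agent_id: 7, hash_type: 0, device: 1, benchmark_date: 200), # latest -> kept
  Row.new(id: 3, agent_id: 7, hash_type: 0, device: 2, benchmark_date: 100), # unique key -> kept
]
duplicate_ids(rows) # => [1]
```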
### Backend Tests
Backend tests include model, service object, component, and request specs. The suite now includes:
- Expanded request specs for API endpoints, authorization, caching, and Turbo Stream UI updates (spec/requests/)
- Component specs for all major UI components (spec/components/)
- Service specs for business logic and system health (spec/services/)
- Job specs with comprehensive coverage of atomic operations, race conditions, error rollback, partial failure recovery, and idempotency patterns (spec/jobs/)
- Coverage verification specs (spec/coverage/coverage_verification_spec.rb) that enforce the presence of tests for all controllers, components, services, and system flows
- Deployment validation specs (spec/deployment/air_gapped_checklist_spec.rb) that verify air-gapped readiness, asset pipeline integrity, and offline documentation
- Performance specs (spec/performance/page_load_performance_spec.rb) that enforce page load and query count SLAs
- OpenAPI specification validation using vacuum to ensure RSwag-generated API documentation conforms to OpenAPI 3.0 standards
These tests ensure correctness of business logic, UI rendering, API behavior, and system health, including campaign progress bars, error modals, Turbo Stream updates, and real-time monitoring. The backend suite also verifies database constraint and cascade/nullify behavior, as well as coverage and deployment requirements.
**Authorization Testing Best Practices:**
Authorization tests verify that access control rules are properly enforced across controllers and concerns. When writing authorization tests, follow these best practices:
- **Use correct HTTP status codes based on the failure type:**
    - HTTP 401 Unauthorized: for authentication failures (user not logged in, invalid credentials, no session)
    - HTTP 403 Forbidden: for authorization failures (user is authenticated but lacks permission to access the resource)
    - When `CanCan::AccessDenied` is raised, the application returns HTTP 403 Forbidden. Tests must expect the `:forbidden` (403) status, not `:unauthorized` (401)
- **Test Turbo Frame-aware error handling:**
    - Authorization tests should verify that `turbo_frame_request?` is checked when handling `CanCan::AccessDenied`
    - When a request is made via Turbo Frame (the `Turbo-Frame` header is present), the response should render the `errors/_not_authorized_frame.html.erb` partial within the frame to prevent perpetual "Loading..." states
    - Verify that the response includes the HTTP 403 Forbidden status, the frame ID from `request.headers["Turbo-Frame"]`, and appropriate error messaging using the i18n key `errors.not_authorized`
    - This pattern ensures that Turbo Frame requests receive a properly formed `<turbo-frame>` tag with error content, rather than a full-page template that would cause stuck "Loading..." indicators
- **Test Downloadable concern authorization:**
    - The Downloadable concern (`app/controllers/concerns/downloadable.rb`) enforces authorization on all three actions: `download`, `view_file`, and `view_file_content`. Each action calls `authorize!` to verify CanCanCan permissions before proceeding
    - Tests should verify that unauthorized access attempts return HTTP 403 Forbidden for all three methods
    - Test coverage should include: project members with access (expect success), users outside the project (expect `:forbidden`), sensitive vs. public resources (based on ability definitions), admin access (expect success), and unauthenticated users (expect redirect to login)
    - When testing Turbo Frame requests to Downloadable actions, verify that the response renders the `errors/_not_authorized_frame.html.erb` partial with HTTP 403 status
    - Example resources using Downloadable: WordLists, RuleLists, MaskLists
These patterns ensure that authorization is consistently tested and that error responses provide appropriate feedback to users in both full-page and Turbo Frame contexts.
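The 401-versus-403 distinction can be reduced to a small decision helper (illustrative only; the real application derives these statuses from authentication and CanCanCan checks, not from a standalone method):

```ruby
# Illustrative only: maps an access-check outcome to the HTTP status that
# authorization tests should expect. In CipherSwarm the real behavior comes
# from session authentication plus a rescue handler for CanCan::AccessDenied.
def expected_status(authenticated:, authorized:)
  return 401 unless authenticated # no session / invalid credentials
  return 403 unless authorized    # logged in, but lacks permission
  200
end

expected_status(authenticated: false, authorized: false) # => 401
expected_status(authenticated: true,  authorized: false) # => 403
expected_status(authenticated: true,  authorized: true)  # => 200
```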
**Atomic Operation and Job Idempotency Testing:**
The test suite includes comprehensive coverage of atomic lock patterns and job idempotency, particularly in ProcessHashListJob (~60 RSpec examples). These tests verify:
- Atomic lock behavior using `UPDATE ... WHERE processed = false` to prevent duplicate processing when `after_commit` fires multiple times
- Race condition handling when multiple jobs attempt to claim the same work
- Error rollback and retry scenarios, ensuring the `processed` flag is reset on failure
- Partial failure recovery, where incomplete hash item insertions are cleaned up before retry
- Idempotent batch processing with graceful handling of record deletion during processing
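The atomic claim can be modeled in plain Ruby: only the caller that flips the flag proceeds, so duplicate job runs become no-ops. This is a sketch of the semantics; the real job issues a single SQL `UPDATE` whose affected-row count decides the winner, rather than using an in-process mutex.

```ruby
# Plain-Ruby model of the "UPDATE ... WHERE processed = false" claim pattern.
# In SQL, only the caller whose UPDATE reports one affected row proceeds;
# here a mutex-guarded flag plays the same role for illustration.
class HashListRecord
  def initialize
    @processed = false
    @lock = Mutex.new
  end

  # Returns true for exactly one claimant, mirroring `rows_affected == 1`.
  def claim!
    @lock.synchronize do
      return false if @processed
      @processed = true
      true
    end
  end
end

record  = HashListRecord.new
threads = 4.times.map { Thread.new { record.claim! } } # duplicate enqueues
results = threads.map(&:value)
results.count(true) # => 1 (all other runs are no-ops)
```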
The benchmark submission endpoint demonstrates idempotent upsert patterns using `upsert_all` with `unique_by: [:agent_id, :hash_type, :device]`. This ensures that re-submissions update existing rows instead of creating duplicates, and allows multi-batch submissions to accumulate rows without duplication. Tests verify partial-success scenarios where invalid entries are filtered out before upsert, and all-invalid payloads return 422 without changing agent state. The pattern includes pre-validation filtering via `build_valid_benchmark_records` to ensure only valid data reaches the database operation.
This testing pattern demonstrates best practices for background jobs and API endpoints that require exactly-once processing semantics and is recommended for similar atomic operation scenarios across the codebase.
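The idempotent upsert semantics can be sketched in plain Ruby as a store keyed by the natural unique key (a model of the behavior, not the actual `upsert_all` implementation; the `BenchmarkStore` class is illustrative):

```ruby
# Plain-Ruby model of upsert-by-natural-key: rows are keyed by
# [agent_id, hash_type, device], so re-submissions overwrite in place
# instead of creating duplicate rows.
class BenchmarkStore
  def initialize
    @rows = {}
  end

  def upsert_all(records)
    records.each do |rec|
      key = rec.values_at(:agent_id, :hash_type, :device)
      @rows[key] = rec # insert or overwrite: idempotent by natural key
    end
  end

  def count
    @rows.size
  end

  def speed_for(key)
    @rows[key][:speed]
  end
end

store = BenchmarkStore.new
store.upsert_all([
  { agent_id: 1, hash_type: 0, device: 1, speed: 100 },
  { agent_id: 1, hash_type: 0, device: 2, speed: 90 },
])
store.upsert_all([{ agent_id: 1, hash_type: 0, device: 1, speed: 120 }]) # re-submit
store.count                # => 2 (no duplicate row created)
store.speed_for([1, 0, 1]) # => 120 (existing row updated in place)
```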
**OpenAPI Documentation Testing:**
CipherSwarm uses RSwag to generate OpenAPI 3.0 documentation from request specs, and vacuum to lint the generated specification. This validation is enforced in the CI pipeline via the lint_api GitHub Actions job, which runs after scan_ruby and validates that:
- The swagger/v1/swagger.json file conforms to OpenAPI 3.0 standards
- API documentation stays synchronized with implementation
- RSwag specs are properly formatted for OpenAPI 3.0 requestBody generation
- PRs must pass vacuum linting before merging
The vacuum linter validates the generated swagger.json file and reports quality scores, preventing API documentation drift. When writing RSwag specs, use the request_body_json helper (defined inside the HTTP method block: post, put, patch) to specify request bodies. The helper accepts schema:, required:, description:, and examples: parameters. The custom ruleset in vacuum-ruleset.yaml disables rules that conflict with Rails conventions (snake_case properties, underscore paths).
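A hedged sketch of how an RSwag spec might call the helper (the endpoint path, schema reference, and description are hypothetical; only the call placement and keyword parameters come from the helper's documented interface):

```ruby
# Illustrative RSwag spec fragment -- path and schema names are invented.
# request_body_json must be called inside the HTTP method block, per the
# polyfill's path-level guard.
path "/api/v1/client/agents/{id}/benchmarks" do
  post "Submit benchmark results" do
    consumes "application/json"

    request_body_json schema: { "$ref" => "#/components/schemas/BenchmarkSubmission" },
                      required: true,
                      description: "Benchmark rows keyed by agent/hash_type/device"

    response "204", "accepted" do
      run_test!
    end
  end
end
```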
### System Tests (UI/UX)
System tests now cover all major user workflows, including:
- Agent fleet monitoring (spec/system/agent_fleet_monitoring_spec.rb)
- Campaign progress monitoring (spec/system/campaign_progress_monitoring_spec.rb)
- Task management and detail investigation (spec/system/tasks_spec.rb, spec/system/task_detail_investigation_spec.rb)
- Error investigation and modal flows (spec/system/error_investigation_spec.rb)
- System health dashboard and diagnostics (spec/system/system_health_spec.rb)
- Campaign creation workflow (spec/system/campaign_creation_workflow_spec.rb)
- Loading and feedback patterns (spec/system/loading_feedback_patterns_spec.rb)
System tests use the Page Object Pattern for maintainability and include accessibility/ARIA checks. The suite is validated by coverage verification specs to ensure all critical flows are tested. System tests are run locally and can be excluded from CI as needed (see skip: ENV["CI"].present?).
### Frontend Tests
Frontend tests now include:
- JavaScript unit/component tests using Vitest (spec/javascript/), covering all Stimulus controllers (e.g., health_refresh_controller.test.js, tabs_controller.test.js, toast_controller.test.js, select_controller.test.js)
- Coverage verification specs to ensure all controllers have corresponding JS tests
- Playwright E2E tests for full browser workflows (frontend/e2e/)
Vitest tests are run with just test-js or yarn test:js and are integrated into CI. The test suite ensures all UI logic and controller behaviors are covered, including Turbo Stream and real-time update patterns.
**Tom Select Integration Testing:**
The select_controller.js (app/javascript/controllers/select_controller.js) provides searchable dropdown functionality using the Tom Select library. The controller is tested with Vitest (spec/javascript/controllers/select_controller.test.js, 8 tests) covering:
- Initialization and TomSelect instance creation
- Configuration values (`allowEmpty`, `maxOptions`) set via data attributes
- Disconnect cleanup and duplicate connect prevention
- Custom attribute handling (e.g., `data-select-allow-empty-value`, `data-select-max-options-value`)
- Error handling for initialization failures with retry prevention
This controller is used for the hash type dropdown, displaying hashcat mode ID with names (e.g., "0 - MD5") with searchable filtering. The test suite demonstrates best practices for testing Stimulus controllers that wrap third-party JavaScript libraries, including mocking external dependencies (TomSelect) and verifying lifecycle behavior (connect/disconnect).
### End-to-End (E2E) Tests
E2E tests use Playwright to drive the application through real browser sessions against a Dockerized backend. The suite is validated by coverage verification specs to ensure all critical user workflows are covered, including authentication, dashboard, campaign management, agent monitoring, system health, and error handling. E2E tests are run against a seeded test environment and are tracked in the coverage verification and test coverage plan.
## Test Coverage
CipherSwarm targets 100% coverage of user-visible workflows, all user roles (admin, project admin, user), and major browsers (Chromium, Firefox, WebKit) across desktop and mobile viewports. The test plan is divided into phases, prioritizing authentication, dashboard, campaign creation, attack configuration, and resource management first, followed by advanced features, integration, and performance validation. With the addition of campaign progress monitoring, error modals, and recent cracks, coverage now includes real-time progress bars, ETA summaries, error handling flows, and accessibility/ARIA labeling for all campaign and attack monitoring features. Component and system specs ensure all UI states and edge cases are tested. Coverage gaps and priorities are tracked in the E2E test coverage plan.
## Test Execution Commands
### Backend
To run backend tests (RSpec):

```shell
just test
```

To run all tests (RSpec + JavaScript):

```shell
just test-all
```

To run only JavaScript tests:

```shell
just test-js
```

To run only component specs (including CampaignProgressComponent and ErrorModalComponent):

```shell
bundle exec rspec spec/components/
```

To run only system tests (including campaign progress monitoring, agent fleet monitoring, task management, error investigation, system health, campaign creation, and loading/feedback patterns):

```shell
bundle exec rspec spec/system/
```

To run only request specs (including API, authorization, caching, and Turbo Stream updates):

```shell
bundle exec rspec spec/requests/
```

To run coverage verification, deployment, and performance specs:

```shell
bundle exec rspec spec/coverage/ spec/deployment/ spec/performance/
```

To lint the OpenAPI specification locally:

```shell
just lint-api
```

This command validates that the generated OpenAPI specification (swagger/v1/swagger.json) complies with OpenAPI 3.0 standards using vacuum. The same validation is automatically enforced in CI via the lint_api job. The custom ruleset in vacuum-ruleset.yaml disables rules that conflict with Rails conventions (snake_case properties, underscore paths).

In CI, tests are executed with:

```shell
bin/bundle exec rspec --profile 10 --format RspecJunitFormatter --out /tmp/test-results/rspec.xml --format progress
```
See the justfile for additional quality and test commands.
### Frontend and E2E
Frontend and E2E tests are run using Playwright and Vitest. Example commands:
To run Playwright E2E tests:

```shell
npx playwright test
```

To run JavaScript unit/component tests (Stimulus controllers, UI logic):

```shell
yarn test:js
```

or

```shell
just test-js
```

To run only a specific JS controller test:

```shell
yarn vitest run spec/javascript/controllers/health_refresh_controller.test.js
```
Tests can be targeted to specific files or directories as needed. Vitest configuration is in vitest.config.js. Coverage verification specs ensure all controllers and E2E flows are tested.
## Test Fixtures
CipherSwarm uses FactoryBot for generating test data and DatabaseCleaner for maintaining a clean state between tests.
Test factories and model specs are updated to reflect the new database-level foreign key on_delete: :cascade and on_delete: :nullify rules. Factories for ephemeral and parent-child tables (e.g., DeviceStatus, AgentError, HashcatBenchmark, HashItem, ProjectUser) now match the DB schema, and tests verify correct cascade/nullify behavior. This ensures that test data and cleanup accurately represent production referential integrity.
The test suite includes comprehensive coverage of model concerns, such as Agent::Benchmarking. The benchmarking? method determines when an agent is actively running initial benchmarks based on pending state, recent activity, and absence of benchmark records. The last_benchmarks method now returns all current benchmarks without date filtering, since the unique index on (agent_id, hash_type, device) ensures only one benchmark exists per agent/hash_type/device combination. This change reflects the migration from mutable timestamp-based uniqueness to immutable natural keys.
**Agent Benchmark Control Testing:**

The agent API includes a required `benchmarks_needed` boolean field that controls whether agents run benchmarks during startup and configuration reload workflows. This represents a significant behavior change: agents can skip benchmark execution when the server already has valid benchmark results on file. Tests should verify:
- Configuration mapping: Tests in `lib/agentClient_test.go` verify that the `benchmarks_needed` field is correctly mapped from the API response to the agent's internal configuration struct. The `TestMapConfiguration` suite includes test cases with `benchmarks_needed` set to both `true` and `false`, ensuring proper handling in all pointer combinations (nil, non-nil, mixed).
- Agent startup behavior: When `benchmarks_needed=false`, agents should log "Server reports valid benchmarks on file, skipping benchmark run", set the `BenchmarksSubmitted` flag to true, and skip the benchmark execution entirely. When `benchmarks_needed=true`, agents should proceed with normal benchmark execution.
- Configuration reload behavior: During server-initiated reloads, agents should check the `benchmarks_needed` flag before re-running benchmarks. When `false`, agents should skip the re-run and log "Server reports valid benchmarks on file, skipping benchmark re-run".
- Benchmark manager integration: Tests should verify that the benchmark manager's `UpdateBenchmarks` method is only called when `benchmarks_needed=true`, and that the agent state is updated correctly in both paths.
This pattern ensures that agents respect server-side benchmark cache validity and avoid unnecessary benchmark execution, improving startup time and reducing resource usage when valid benchmarks already exist.
**Array Length Validation Testing for DoS Prevention:**
CipherSwarm implements array length validations to prevent denial-of-service (DoS) attacks through unbounded array payloads. These validations are tested comprehensively in model specs and backed by both Rails model validations and OpenAPI schema constraints (maxItems/minItems) for defense in depth. The test suite includes 11 new specs covering:
- Agent model (`spec/models/agent_spec.rb`): 3 tests for the `devices` array validation
    - Accepts up to 64 devices
    - Rejects more than 64 devices with the error message "must have at most 64 entries"
    - Accepts empty arrays
    - Implementation: the custom validation `devices_length_within_limit` enforces a maximum of 64 entries
- HashcatStatus model (`spec/models/hashcat_status_spec.rb`): 8 tests for multiple array length validations
    - Fixed-length arrays (exact length required since hashcat always emits exactly 2 values): the `progress`, `recovered_hashes`, and `recovered_salts` arrays must each have exactly 2 entries
    - Variable-length array: the `device_statuses` array is limited to a maximum of 64 entries
    - Tests verify both valid cases (correct length) and invalid cases (too few, too many, or nil values)
    - Implementation: the custom validations `array_lengths_within_limits` and `device_statuses_count_within_limit`
These array length constraints are mirrored in the OpenAPI specification (spec/swagger_helper.rb and swagger/v1/swagger.json) with maxItems: 64 for variable-length arrays and minItems: 2, maxItems: 2 for fixed-length arrays. This dual-layer approach ensures API clients receive schema validation errors before reaching the Rails layer, while model validations provide a second line of defense against malformed data. The test suite demonstrates best practices for security-focused validation testing, ensuring both data integrity and DoS prevention.
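The length rules can be sketched as a plain-Ruby check (limits taken from the validations above; the helper name and error wording are illustrative, not the app's actual validation code):

```ruby
# Plain-Ruby sketch of the array length checks: fixed-length pairs for
# hashcat output arrays, and a 64-entry cap on device_statuses.
MAX_DEVICES = 64
FIXED_PAIR  = 2 # hashcat always emits exactly two values

def array_length_errors(progress:, device_statuses:)
  errors = []
  unless progress.is_a?(Array) && progress.length == FIXED_PAIR
    errors << "progress must have exactly #{FIXED_PAIR} entries"
  end
  if device_statuses.length > MAX_DEVICES
    errors << "device_statuses must have at most #{MAX_DEVICES} entries"
  end
  errors
end

array_length_errors(progress: [10, 100], device_statuses: Array.new(64) { {} })
# => [] (valid: exact pair, device count at the limit)
array_length_errors(progress: [10], device_statuses: Array.new(65) { {} })
# => both errors: wrong fixed length and oversized variable-length array
```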
**Test Data Isolation Best Practices:**
To prevent test mutation and flakiness, tests should generate fresh test data for each test invocation rather than sharing mutable global state. The benchmark cache tests (lib/benchmark/cache_test.go) demonstrate this pattern:
- Problem: Global variables containing test data (e.g., `sampleBenchmarkResults`) can be mutated during test execution, causing state to leak between subtests and leading to flaky test failures.
- Solution: Replace global test data with helper functions that return fresh copies for each invocation. For example, `newSampleBenchmarkResults()` returns a new slice of benchmark results each time it's called, preventing the `Submitted` flag from leaking across subtests.
- Pattern: Use helper functions prefixed with `new` (e.g., `newSampleBenchmarkResults()`, `newTestConfig()`) to create fresh test fixtures. These functions should be called at the start of each test or subtest to ensure isolation.
This pattern is particularly important when testing stateful operations such as cache submission, where flags or fields may be modified during test execution. Applying this pattern prevents inter-test dependencies and ensures that tests pass consistently regardless of execution order.
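The same pattern rendered in Ruby (names are illustrative; the original example is the Go benchmark cache tests): a shared constant leaks mutations between tests, while a helper method returns a fresh object per call.

```ruby
# Anti-pattern: a mutable global shared across tests.
SHARED_RESULTS = [{ hash_type: 0, submitted: false }]

# Pattern: a helper returning a fresh copy for each invocation,
# analogous to Go's newSampleBenchmarkResults().
def new_sample_results
  [{ hash_type: 0, submitted: false }]
end

# "Test 1" marks its data as submitted:
shared = SHARED_RESULTS
shared.first[:submitted] = true

# "Test 2" sees leaked state from the global, but not from the helper:
SHARED_RESULTS.first[:submitted]   # => true  (leaked between tests)
new_sample_results.first[:submitted] # => false (isolated fresh fixture)
```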
Component specs use Pagy::Offset for pagination testing (e.g., Pagy::Offset.new(count: items.size, page: 1, limit: 10)) to match the offset-based pagination implementation used in production.
For E2E tests, the environment is seeded with known users, projects, and resources to ensure deterministic results. Test data management includes standardized roles, predictable project/campaign/resource data, and mechanisms to switch between mock and real data.
**System Test Helpers for Tom Select:**
The Page Object Pattern includes a helper method for Tom Select dropdowns: tom_select_fill_and_choose(select_id, text) (defined in spec/support/page_objects/base_page.rb). This helper interacts with Tom Select dropdowns by clicking to open, typing to filter, and selecting a match. The helper requires the dropdown_input plugin and is used in system tests for hash list creation and other forms with searchable dropdowns (e.g., hash_list_form_page.rb). This demonstrates best practices for abstracting JavaScript component interactions in system tests using the Page Object Pattern.
## System Test Refactoring
System test refactoring in CipherSwarm focuses on improving maintainability, reliability, and coverage. Refactoring efforts include migrating to SSR session-based authentication for realistic E2E flows, consolidating duplicate test files, enhancing test isolation, and expanding negative and edge case coverage. The test coverage plan documents ongoing and planned improvements, including filling gaps in authentication, user management, access control, and real-time features.
## RSwag 3.0.0.pre Migration
CipherSwarm upgraded to rswag 3.0.0.pre to support OpenAPI 3.0 native features. This pre-release version required custom polyfills and compatibility bridges:
- `request_body_json` helper: A polyfill implemented in `spec/support/rswag_polyfills.rb` to provide a clean DSL for defining request bodies in RSwag specs. Call it inside HTTP method blocks (post, put, patch) with `schema:`, `required:`, `description:`, and `examples:` parameters. The helper wraps rswag's internal `consumes` + `parameter in: :body` mechanism; the formatter converts this to valid OpenAPI 3.0 `requestBody` output. The helper includes a guard that raises an error if called at the path level (outside an HTTP method block), preventing misuse.
- `LetFallbackHash` bridge class: Maintains compatibility with rswag 2.x parameter resolution behavior. rswag 3.x resolves parameters via `params.fetch(name)` from `request_params`, while 2.x resolved directly from `let` blocks via `example.send(name)`. The bridge falls back to `let` blocks when a parameter is not found in `request_params`, ensuring existing specs continue to work. Implemented in `spec/support/rswag_polyfills.rb`.
- `openapi_helper.rb` compatibility shim: A new file (`spec/openapi_helper.rb`) provides backward compatibility for rswag 3.0.0.pre, which loads `openapi_helper` by default. This shim delegates to `spec/swagger_helper.rb` to maintain compatibility with existing specs and CI scripts.
- Validation changes: The `openapi_strict_schema_validation` option has been removed. It is replaced by `openapi_no_additional_properties` (enabled) and `openapi_all_properties_required` (enabled) in `spec/swagger_helper.rb`. The vacuum linter (`just lint-api`) provides document-level OpenAPI validation.
- Version guard: A version guard in `spec/support/rswag_polyfills.rb` checks that rswag-specs is exactly version 3.0.0.pre. The guard ensures the monkey-patch fails loudly if rswag is upgraded, prompting removal of the polyfills and verification of native support.
The polyfills are implemented in spec/support/rswag_polyfills.rb (loaded by spec/swagger_helper.rb) and are intended as temporary bridges until stable rswag 3.x releases include native support for the DSL patterns used in CipherSwarm's API specs. The version guard ensures these polyfills are only used with the pre-release version and will be removed when rswag 3.0.0 stable is released.
## CI Workflow Integration
CipherSwarm uses GitHub Actions for CI. The workflow (.github/workflows/CI.yml) runs on pull requests and pushes to main and develop branches. It sets up the environment, installs dependencies, prepares the database, and executes backend and frontend tests. The CI pipeline includes a dedicated lint_api job that validates the OpenAPI specification using vacuum, ensuring that the RSwag-generated swagger.json conforms to OpenAPI 3.0 standards and preventing API documentation drift. Artifacts such as test results and screenshots from failed system tests are uploaded for debugging. Coverage results are sent to Code Climate if configured. The CI pipeline enforces quality gates, requiring all critical path tests to pass before release (CI workflow).
## Importance of Testing in the Development Lifecycle
Testing is central to CipherSwarm’s development process. It ensures feature correctness, security, and reliability, and supports rapid iteration without regressions. The architecture emphasizes test isolation, reproducibility, and coverage of real-world workflows. Automated tests in CI provide immediate feedback, enforce quality standards, and enable safe, efficient releases. Regular refactoring and maintenance of the test suite are prioritized to keep pace with evolving features and to ensure long-term project health.
For further details on test plans, priorities, and implementation strategies, see the E2E test coverage plan.