Testing Strategy and Documentation#
CipherSwarm uses a comprehensive, multi-layered testing strategy to ensure reliability, security, and maintainability across all components. This guide covers the testing architecture, organization, conventions, and best practices for both backend and frontend code, as well as documentation standards.
Three-Tier Testing Architecture#
CipherSwarm's tests are organized into three primary layers:
- Layer 1: Backend Tests
  - Ruby on Rails (RSpec) for models, controllers, services, and API endpoints
  - Python (pytest) for supporting backend services (where applicable)
  - Database constraint and foreign key cascade/nullify tests
- Layer 2: Frontend Mocked Tests
  - JavaScript unit tests (Vitest) for Stimulus controllers and UI logic
  - Playwright tests with mocked APIs for isolated UI component validation
- Layer 3: Full End-to-End (E2E) Tests
  - Playwright E2E tests against a real Dockerized backend
  - Simulate real user workflows, authentication, and resource flows
  - Includes seeded test data, SSR authentication, and real object storage
See the Phase 3 E2E Test Coverage Plan for a detailed matrix of test categories and coverage.
Test Organization#
- Backend (Rails/RSpec):
  - Located in `spec/` (e.g., `spec/models/`, `spec/requests/`, `spec/services/`, `spec/system/`)
  - Organized by resource or feature (e.g., `spec/requests/agents_spec.rb`)
  - System tests cover full user workflows and UI integration
- Frontend/JavaScript:
  - Stimulus controller tests in `spec/javascript/controllers/`
  - Uses Vitest and jsdom for DOM simulation
  - Playwright E2E and component tests in `/frontend/e2e/`
- Python Backend:
  - Integration and unit tests in `/tests/unit` and `/tests/integration`
  - Uses pytest, httpx, and testcontainers
- Test Documentation:
  - Coverage plans and status in Markdown under `docs/` (see the Phase 3 E2E Test Coverage Plan)
Running Tests#
- Backend (RSpec):
  - Run with `bin/bundle exec rspec`
  - CI uses this command with additional formatters for reporting
  - System tests involving tus file uploads automatically start a tusd Docker container via testcontainers
- Python:
  - Run with `pytest`
- Frontend/E2E (Playwright):
  - Run with the Playwright CLI, targeting either mocked APIs or a real backend
- JavaScript (Vitest):
  - Run with `bun test:js` or `npx vitest run`
- Full Suite:
  - CI orchestrates all tiers, including environment setup, asset compilation, and test execution
Testing Infrastructure Requirements#
Testcontainers for tusd:
- System tests requiring tus file uploads use the `testcontainers` gem to automatically start a tusd Docker container
- This eliminates manual Docker container setup for tusd testing
- The testcontainers approach ensures consistent test environments across development machines and CI/CD pipelines
- Requires Docker to be available on the test machine
- The tusd container is automatically started on first use and cleaned up when the test suite exits
Writing Tests: Guidance and Conventions#
Backend (RSpec)#
- Use `describe` for features, `context` for scenarios, and `it` for expectations
- Set up test data with FactoryBot
- Cover authentication, authorization, and all HTTP endpoints
- Example:

```ruby
describe "#index" do
  context "when a non-logged in user tries to access the agents index" do
    it "redirects to login page" do
      get agents_path
      expect(response).to redirect_to(new_user_session_path)
    end
  end
end
```
Testing Array Length Validations#
Array length validations enforce security constraints (DoS prevention through bounded array inputs) and should be tested with the same rigor as authentication and authorization. Test both valid and invalid cases, including edge cases at boundaries.
Maximum Array Length Validations#
For variable-length arrays with maximum limits (e.g., devices array limited to 64 items):
```ruby
describe "devices length" do
  it "accepts up to 64 devices" do
    agent.devices = Array.new(64) { |i| "GPU #{i}" }
    expect(agent).to be_valid
  end

  it "rejects more than 64 devices" do
    agent.devices = Array.new(65) { |i| "GPU #{i}" }
    expect(agent).not_to be_valid
    expect(agent.errors[:devices]).to include("must have at most 64 entries")
  end

  it "accepts an empty devices array" do
    agent.devices = []
    expect(agent).to be_valid
  end
end
```
Exact Array Length Validations#
For fixed-length arrays (e.g., `progress`, `recovered_hashes`, and `recovered_salts` must be exactly 2 elements):

```ruby
describe "array length validations" do
  let(:hashcat_status) { build(:hashcat_status) }

  it "accepts progress with exactly 2 entries" do
    hashcat_status.progress = [100, 10000]
    expect(hashcat_status).to be_valid
  end

  it "rejects progress with more than 2 entries" do
    hashcat_status.progress = [100, 10000, 999]
    expect(hashcat_status).not_to be_valid
    expect(hashcat_status.errors[:progress]).to include("must have exactly 2 entries")
  end

  it "rejects progress with only 1 entry" do
    hashcat_status.progress = [100]
    expect(hashcat_status).not_to be_valid
    expect(hashcat_status.errors[:progress]).to include("must have exactly 2 entries")
  end

  it "accepts nil progress" do
    hashcat_status.progress = nil
    expect(hashcat_status.errors[:progress]).to be_empty
  end
end
```
Association Count Validations#
For associations with count limits (e.g., `device_statuses` limited to 64 associated records):

```ruby
it "rejects more than 64 device_statuses" do
  hashcat_status.save!
  hashcat_status.device_statuses.clear
  65.times do |i|
    hashcat_status.device_statuses.build(device_id: i, device_name: "GPU #{i}", device_type: "GPU", speed: 1000, utilization: 50, temperature: 60)
  end
  expect(hashcat_status).not_to be_valid
  expect(hashcat_status.errors[:device_statuses]).to include("must have at most 64 entries")
end

it "accepts up to 64 device_statuses" do
  hashcat_status.save!
  hashcat_status.device_statuses.clear
  64.times do |i|
    hashcat_status.device_statuses.build(device_id: i, device_name: "GPU #{i}", device_type: "GPU", speed: 1000, utilization: 50, temperature: 60)
  end
  expect(hashcat_status).to be_valid
end
```
Comprehensive Coverage Example:
- Agent model: 3 test specs for `devices` array validation (max 64 items)
- HashcatStatus model: 8 test specs for array validations
  - 4 tests for `progress` (exactly 2 elements)
  - 1 test each for `recovered_hashes` and `recovered_salts` (exactly 2 elements)
  - 2 tests for `device_statuses` association count (max 64 items)
These validations back OpenAPI maxItems and minItems constraints, providing defense in depth against unbounded array payloads.
OpenAPI Schema Constraints#
The OpenAPI schema (`docs/swagger.json`) enforces array size constraints at the API layer:

- `devices` — `maxItems: 64` (agent configuration and status)
- `device_statuses` — `maxItems: 64` (hashcat status submission)
- `progress`, `recovered_hashes`, `recovered_salts` — `minItems: 2, maxItems: 2` (exactly 2 elements)
These constraints provide the first line of defense against DoS attacks via unbounded array payloads. The API layer validates incoming requests before they reach model validations, ensuring consistent enforcement across all API clients.
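For illustration, a schema fragment enforcing these limits might look like the following (the exact structure and placement inside docs/swagger.json may differ; this is a sketch of the constraint keywords only):

```json
{
  "devices": {
    "type": "array",
    "items": { "type": "string" },
    "maxItems": 64
  },
  "progress": {
    "type": "array",
    "items": { "type": "integer" },
    "minItems": 2,
    "maxItems": 2
  }
}
```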
Database Foreign Key Cascade/Nullify Testing#
- Prefer DB-level `on_delete: :cascade` / `:nullify` for ephemeral and parent-child tables
- Test DB-level FK cascades using `delete`, not `destroy` (to avoid masking by Rails callbacks)
- Always specify `column:` when a table has multiple FKs to the same parent
- Example:

```ruby
it "cascade-deletes associated DeviceStatus when HashcatStatus is deleted" do
  device_status = create(:device_status)
  hashcat_status = device_status.hashcat_status
  hashcat_status.delete
  expect(DeviceStatus.exists?(device_status.id)).to be false
end
```

- See AGENTS.md for DB constraint strategy and additional Go agent testing patterns
Go Agent Testing#
- Extract business logic from external process invocations (e.g., hashcat) into separate functions for independent testing
  - Implemented Example: The `processBenchmarkOutput` function in the `lib/benchmark/` package separates business logic (parsing output, tracking submission state, incremental batch submission) from external process handling (hashcat session management). This extraction enabled comprehensive table-driven testing without requiring actual hashcat execution. The `drainStdout` helper function in `lib/benchmark/parse.go` further demonstrates separation of concerns for buffered channel handling.
- Use Manager structs with constructor injection for testability (e.g., `benchmark.Manager`, `task.Manager`)
- Write table-driven tests for core logic
- Use mocks for network and OS-level interactions
- Test Data Isolation: Prevent test mutation leakage by using helper functions that return fresh data copies rather than shared test fixtures
  - Pattern: Extract test fixtures into functions like `newSampleBenchmarkResults()` instead of package-level variables
  - Example: Benchmark cache tests use `newSampleBenchmarkResults()` to prevent `Submitted` flag mutations from affecting subsequent subtests
  - Each test invocation gets an independent slice, avoiding cross-contamination between test cases
- Context Propagation: Context propagation throughout the error handling chain is complete. All error handling and reporting functions now require a `context.Context` as the first parameter to enable proper cancellation and timeout handling:
  - `cserrors.SendAgentError(ctx, client, errorType, message, severity)`
  - `apierrors.LogAndSendError(ctx, errorSender, err, message, severity)`
  - `task.HandleError(ctx, err, taskID, attemptNum, errorSender)`
  - ~27 of 30 `//nolint:contextcheck` directives have been removed (only 3 remain for `NewHashcatSession`)
- `context.Background()` pattern: For "must-complete" operations that should execute even after context cancellation (cleanup, shutdown notifications, task abandonment), code now explicitly passes `context.Background()`. Examples:
  - `taskMgr.AbandonTask(context.Background(), t)` in cleanup handlers
  - `cserrors.SendAgentShutdown(context.Background(), ...)` for shutdown notifications
  - `cserrors.SendAgentError(context.Background(), ...)` in timeout/kill failure paths
- Signal handling: The agent now uses `signal.NotifyContext` + `context.WithCancel` instead of `signal.Notify` + channel patterns. Heartbeat state errors now call `cancel()` instead of sending on a signal channel, providing unified shutdown semantics across the codebase
- Agent Configuration and Benchmarking: The agent startup flow includes conditional benchmark execution based on the `benchmarks_needed` flag from the server
  - Test scenarios should cover both `benchmarks_needed=true` (agent must run benchmarks) and `benchmarks_needed=false` (server has valid benchmarks)
  - Verify the `mapConfiguration` function properly maps the `benchmarks_needed` field from API responses
  - Test both startup and reload paths to ensure the flag is respected in both scenarios
  - Verify that when `benchmarks_needed=false`, the agent logs the skip and sets `BenchmarksSubmitted=true` without running benchmarks
- Test Cleanup Patterns: Use `t.Cleanup(fn)` instead of `defer fn()` for proper state restoration in table-driven tests. `t.Cleanup` runs after all subtests complete, preventing state leakage between test cases. All benchmark manager tests have been standardized to use `t.Cleanup` for HTTP mock and state restoration.
- See AGENTS.md and GOTCHAS.md for comprehensive Go development patterns, including:
  - Performance optimizations (regex compilation, cached conversions, syscall reduction, cross-platform path handling with `filepath.Join`)
  - Error handling best practices (context propagation, `file.Close()` error logging)
  - Linter edge cases and development pitfalls (see GOTCHAS.md for a comprehensive reference, including `hashcat.NewTestSession` limitations and golangci-lint auto-fix behavior)
  - Dependency considerations (go-getter, gopsutil, govulncheck)
  - Testing best practices (prefer `t.Cleanup(fn)` over `defer fn()`, context cancellation patterns)
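The fresh-fixture pattern above is language-agnostic. As a minimal sketch in Ruby (the agent code itself is Go; the `Result` struct and helper name here are hypothetical stand-ins mirroring `newSampleBenchmarkResults()`):

```ruby
# Hypothetical fixture type mirroring a result with a mutable submitted flag.
Result = Struct.new(:hash_type, :submitted)

# Anti-pattern: a shared, file-level fixture that tests can mutate in place,
# leaking state into later tests.
SHARED_RESULTS = [Result.new(0, false), Result.new(100, false)]

# Pattern: a helper that builds fresh objects on every call, so one test's
# mutations cannot affect the next test's fixture.
def new_sample_results
  [Result.new(0, false), Result.new(100, false)]
end
```

Each invocation returns an independent array of independent objects, which is exactly the property the Go helper relies on.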
Testing Stateful Operations with External Dependencies#
When testing operations that interact with external systems (APIs, processes) and maintain internal state:
- Track per-item state to enable partial completion and retry
  - Use boolean flags (e.g., the `Submitted` field on `display.BenchmarkResult`) to track which items have been processed
  - Test helper functions independently:
    - `unsubmittedResults` — filters items by state flag
    - `allSubmitted` — checks whether all items are complete
    - `markSubmitted` — updates state flags after successful operations
- Test incremental batch operations by simulating multiple submission attempts
  - Verify normal batching (e.g., `benchmarkBatchSize = 10`)
  - Test final batch submission with fewer items
  - Simulate partial failures (first batch fails, retry succeeds)
  - Verify cache persistence with state flags for crash recovery
  - Test channel draining for buffered output (e.g., `drainStdout` for hashcat output)
- Implemented Example: The `processBenchmarkOutput` implementation in `lib/benchmark/manager.go` demonstrates this pattern with comprehensive test scenarios covering normal operation, context cancellation, and edge cases:
  - Normal operation tests (6 scenarios):
    - `TestProcessBenchmarkOutput_AllBatchesSucceed` — verifies multiple batches (10+5 items) submit successfully and all results are marked as `Submitted`
    - `TestProcessBenchmarkOutput_SingleBatch` — confirms single batch submission (< 10 items) and proper state tracking
    - `TestProcessBenchmarkOutput_BatchFailsFinalSucceeds` — validates retry logic when the first batch fails but the final submission succeeds
    - `TestProcessBenchmarkOutput_AllSendsFail` — ensures no results are marked submitted when all API calls fail, and the cache is preserved for retry
    - `TestProcessBenchmarkOutput_EmptyResults` — handles session completion with no output
    - `TestProcessBenchmarkOutput_SessionError` — verifies partial results are cached even when the hashcat session errors
  - Context cancellation tests (3 scenarios):
    - `TestProcessBenchmarkOutput_ContextCancelledWithResults` — verifies partial results are cached to disk when the context is cancelled mid-run
    - `TestProcessBenchmarkOutput_ContextCancelledNoResults` — confirms no cache is written when cancelled before any results are collected
    - `TestProcessBenchmarkOutput_ContextCancelledCacheFails` — ensures results are still returned in memory even when the cache save fails during cancellation
  - Edge case tests (6 scenarios):
    - `TestDrainStdout_BufferedLines` — verifies `drainStdout` captures lines already buffered in the channel
    - `TestDrainStdout_EmptyChannel` — confirms `drainStdout` returns immediately when the channel is empty
    - `TestSession_DoubleCleanup` — verifies cleanup idempotency
    - `TestSubmitBatchIfReady_ExactBoundary` — verifies batch submission triggers at exactly `benchmarkBatchSize` results
    - `TestSubmitBatchIfReady_BelowBoundary` — confirms no submission occurs below the batch size threshold
    - `TestProcessBenchmarkOutput_FinalizeCacheFails` — verifies results are returned and submission state is correct even when the cache save fails during normal finalization

  These tests validate per-item submission tracking via the `Submitted` field, incremental batch submission with configurable batch size, cache persistence after each successful batch for crash recovery, and partial result preservation during context cancellation.
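The channel-draining behavior described above (the Go agent's `drainStdout`) can be sketched with Ruby's thread-safe `Queue` as an analogue of a buffered channel; the function name here is illustrative:

```ruby
# Drain whatever is already buffered on the queue without blocking,
# analogous to draining a buffered Go channel after the producer stops.
def drain_lines(queue)
  lines = []
  loop do
    lines << queue.pop(true) # non-blocking pop raises ThreadError when empty
  rescue ThreadError
    break
  end
  lines
end
```

The key property to test is the same as for `drainStdout`: already-buffered items are all captured, and an empty queue returns immediately rather than blocking.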
Testing Partial Result Preservation During Cancellation#
Long-running operations that collect results over time (benchmark runs, task processing) should preserve partial results when the context is cancelled, enabling recovery on the next agent startup without re-running the entire operation. Test both normal completion and cancellation paths.
Pattern:
- Buffer results in memory during processing
- On `ctx.Done()`, flush any remaining buffered data
- Cache partial results to disk using atomic writes (temp file + `os.Rename`)
- Reset submission flags so the operation retries on next startup
- Load cached results on startup and submit them before starting new work
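The atomic cache write step above can be sketched as follows (a Ruby analogue of the Go temp-file-plus-rename approach; the function name and cache contents are illustrative):

```ruby
require "json"
require "tempfile"

# Write the cache to a temp file in the destination directory, then rename it
# into place. A rename within the same filesystem is atomic, so readers never
# observe a partially written cache file.
def save_cache_atomically(path, results)
  tmp = Tempfile.create("bench-cache", File.dirname(path))
  tmp.write(JSON.generate(results))
  tmp.flush
  tmp.close
  File.rename(tmp.path, path)
end
```

Creating the temp file in the same directory as the target matters: a rename across filesystems is not atomic.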
Example: Benchmark Context Cancellation Tests
```go
// TestProcessBenchmarkOutput_ContextCancelledWithResults verifies partial
// results are cached when context is cancelled mid-run
func TestProcessBenchmarkOutput_ContextCancelledWithResults(t *testing.T) {
    cleanupHTTP := testhelpers.SetupHTTPMock()
    t.Cleanup(cleanupHTTP)
    cleanupState := testhelpers.SetupTestState(789, "https://test.api", "test-token")
    t.Cleanup(cleanupState)

    sess, err := testhelpers.NewMockSession("bench-cancel")
    require.NoError(t, err)

    ctx, cancel := context.WithCancel(context.Background())

    // Buffer lines before cancellation
    lines := makeBenchmarkLines(3, 1)
    for _, line := range lines {
        sess.StdoutLines <- line
    }
    cancel() // Trigger cancellation

    mgr := NewManager(agentstate.State.APIClient.Agents())
    results := mgr.processBenchmarkOutput(ctx, sess)

    // Verify partial results captured
    assert.Len(t, results, 3)

    // Verify cached to disk for retry
    cached, loadErr := loadBenchmarkCache()
    require.NoError(t, loadErr)
    require.NotNil(t, cached)
    assert.Len(t, cached, 3)

    // Verify submission flag reset for retry
    assert.False(t, agentstate.State.GetBenchmarksSubmitted())
}
```
Test Coverage Requirements:
- Cancellation with results collected — verify cache persistence
- Cancellation with no results — verify no cache written
- Cancellation when cache save fails — verify results still returned in memory
- Cache load on next startup — verify cached results submitted before new work
- `drainStdout` edge cases — empty channel, partial reads, buffered data

See `lib/benchmark/manager_test.go` for comprehensive cancellation test coverage.
Backward Compatibility Testing#
When adding fields to persisted data structures:
- Test deserialization of old cache formats without the new field
- Verify that missing fields default to appropriate zero values (e.g., `Submitted` defaults to `false`)
- Example: `TestLoadBenchmarkCache_BackwardCompatible` verifies that old benchmark cache files without the `Submitted` field load correctly
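The core idea can be sketched in Ruby (an analogue only; the agent's cache is Go JSON, and the struct fields here are illustrative). Deserialization should tolerate the missing key and fall back to the zero value:

```ruby
require "json"

# Hypothetical cache entry; "submitted" was added after old caches were written.
BenchmarkResult = Struct.new(:hash_type, :speed, :submitted, keyword_init: true)

# Old-format entries lack the "submitted" key; fetch with a default so they
# deserialize as not-yet-submitted, matching Go's zero value of false.
def load_cached_result(raw_json)
  h = JSON.parse(raw_json)
  BenchmarkResult.new(
    hash_type: h["hash_type"],
    speed: h["speed"],
    submitted: h.fetch("submitted", false)
  )
end
```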
Testing Cleanup Operations#
Resource cleanup operations (files, sessions, temporary data) must be tested to prevent accumulation that can lead to disk exhaustion or resource leaks. Cleanup should be tested at both the session level and task level to ensure resources are properly freed across all completion and failure paths.
Session-Level Cleanup Testing:
Test that session cleanup methods remove all associated files. `Session.Cleanup()` removes output files, charset files, hash files, restore files (`.restore`), and hashcat session files (`.log` and `.pid`). The restore file cleanup was added in PR #135 to address issue #22, where restore files were accumulating indefinitely. The session file cleanup was added in PR #160 to clean up hashcat-created `.log` and `.pid` files that are generated when using the `--session` flag.
- Test that `Session.Cleanup()` removes restore files when present
- Test that cleanup removes `.log` and `.pid` files created by hashcat for the session
- Test that cleanup is idempotent (handles missing files gracefully without errors)
- Test that cleanup handles empty or uninitialized file paths (including empty session names)
- Example from `lib/hashcat/session_test.go`:
```go
func TestCleanup_RemovesSessionLogAndPidFiles(t *testing.T) {
    setupSessionTestState(t)
    tempDir := t.TempDir()
    sessionName := "attack-42"
    logFile := filepath.Join(tempDir, sessionName+".log")
    pidFile := filepath.Join(tempDir, sessionName+".pid")
    require.NoError(t, os.WriteFile(logFile, []byte("log data"), 0o600))
    require.NoError(t, os.WriteFile(pidFile, []byte("12345"), 0o600))
    t.Chdir(tempDir)

    sess := &Session{
        sessionName: sessionName,
    }
    sess.Cleanup()

    _, err := os.Stat(logFile)
    require.True(t, os.IsNotExist(err), ".log file should be removed after Cleanup")
    _, err = os.Stat(pidFile)
    require.True(t, os.IsNotExist(err), ".pid file should be removed after Cleanup")
}

func TestCleanup_IdempotentSessionFiles(t *testing.T) {
    setupSessionTestState(t)
    sess := &Session{
        sessionName: "attack-nonexistent",
    }
    // Should not panic or error
    sess.Cleanup()
    require.Empty(t, sess.sessionName, "sessionName should be cleared even when files don't exist")
}
```
Task-Level Cleanup Testing:
Test the `task.CleanupTaskFiles(attackID int64)` helper function that removes files by attack ID. This function was added in PR #135 to handle cleanup for pre-session failure paths (download failures, session creation failures) where `Session.Cleanup()` is not available. It removes hash files and restore files but intentionally does NOT clean resource files (word lists, rule lists, mask lists), because they are shared across attacks and may be reused via checksum-based caching.
- Test that `CleanupTaskFiles()` removes both hash files and restore files for a given attack ID
- Test that cleanup handles nonexistent files without panicking
- Test that both files are removed in a single call
- Example from `lib/task/cleanup_test.go`:
```go
func TestCleanupTaskFiles_RemovesBothFiles(t *testing.T) {
    cleanupState := testhelpers.SetupMinimalTestState(1)
    defer cleanupState()

    var attackID int64 = 99
    hashFile := filepath.Join(agentstate.State.HashlistPath, "99.hsh")
    restoreFile := filepath.Join(agentstate.State.RestoreFilePath, "99.restore")
    require.NoError(t, os.MkdirAll(filepath.Dir(hashFile), 0o750))
    require.NoError(t, os.MkdirAll(filepath.Dir(restoreFile), 0o750))
    require.NoError(t, os.WriteFile(hashFile, []byte("hashes"), 0o600))
    require.NoError(t, os.WriteFile(restoreFile, []byte("data"), 0o600))

    CleanupTaskFiles(attackID)

    _, hashErr := os.Stat(hashFile)
    assert.True(t, os.IsNotExist(hashErr), "hash file should be removed")
    _, restoreErr := os.Stat(restoreFile)
    assert.True(t, os.IsNotExist(restoreErr), "restore file should be removed")
}
```
Cleanup Integration Patterns:
- `sess.Cleanup()` is called in error paths in `runAttackTask` (added in PR #135):
  - When `sess.Start()` fails
  - When the task times out
  - When done channel processing encounters errors
- Use the `CleanupTaskFiles` helper for pre-session failure paths:
  - `DownloadFiles()` failures in `processTask()` (added in PR #135)
  - `NewHashcatSession()` failures in `RunTask()` (added in PR #135)
  - `AcceptTask()` failures in `processTask()` (added in PR #135)
- All cleanup paths should be tested to ensure files don't accumulate indefinitely
- Integration test example from `lib/task/runner_test.go`:
```go
func TestHandleDoneChan_CleansRestoreFile(t *testing.T) {
    sess, err := testhelpers.NewMockSession("test-session")
    require.NoError(t, err)

    // Create a real restore file on disk
    restoreFile := filepath.Join(t.TempDir(), "test.restore")
    require.NoError(t, os.WriteFile(restoreFile, []byte("data"), 0o600))
    sess.RestoreFilePath = restoreFile

    mgr := newTestManager()
    mgr.handleDoneChan(nil, task, sess)

    _, statErr := os.Stat(restoreFile)
    assert.True(t, os.IsNotExist(statErr), "restore file should be removed after handleDoneChan")
}
```
Agent Startup Cleanup Testing:
Test cleanup operations that run at agent startup to remove orphaned files from previous runs. The `CleanupOrphanedSessionFiles` function demonstrates this pattern with 8 table-driven tests covering multiple scenarios:
- File removal by pattern: Test that only files matching the specific pattern (e.g., `attack-*.log`, `attack-*.pid`) are removed
- Pattern specificity: Verify non-matching files are preserved (e.g., `benchmark.log`, `hashcat.log`)
- File type filtering: Test that symlinks and directories matching the pattern are skipped (only regular files are removed)
- Edge case handling: Test behavior with empty directories, missing directories, and mixed file sets
- Platform-specific behavior: Verify cleanup is skipped on platforms where it's unsafe (e.g., Windows, where the session dir is the binary dir)
- Example from `lib/hashcat/session_dir_cleanup_test.go`:
```go
func TestCleanupOrphanedInDir_SkipsSymlinks(t *testing.T) {
    dir := t.TempDir()
    // Create a real file and a symlink matching the pattern
    targetFile := filepath.Join(dir, "target.txt")
    require.NoError(t, os.WriteFile(targetFile, []byte("important"), 0o600))
    require.NoError(t, os.Symlink(targetFile, filepath.Join(dir, "attack-evil.log")))
    // Also create a regular attack file that should be removed
    require.NoError(t, os.WriteFile(filepath.Join(dir, "attack-1.log"), []byte("log"), 0o600))

    cleanupOrphanedInDir(dir)

    // Symlink should still exist
    _, err := os.Lstat(filepath.Join(dir, "attack-evil.log"))
    require.NoError(t, err, "symlink should not be removed")
    // Target file should still exist
    _, err = os.Stat(targetFile)
    require.NoError(t, err, "target file should not be affected")
    // Regular attack file should be removed
    _, err = os.Stat(filepath.Join(dir, "attack-1.log"))
    require.True(t, os.IsNotExist(err), "regular attack file should be removed")
}

func TestCleanupOrphanedInDir_MixedFiles(t *testing.T) {
    dir := t.TempDir()
    // Create a mix of files
    attackFiles := []string{"attack-1.log", "attack-2.pid", "attack-99.log"}
    keepFiles := []string{"benchmark.log", "hashcat.pid", "attack-1.restore", "notes.txt"}
    for _, f := range attackFiles {
        require.NoError(t, os.WriteFile(filepath.Join(dir, f), []byte("data"), 0o600))
    }
    for _, f := range keepFiles {
        require.NoError(t, os.WriteFile(filepath.Join(dir, f), []byte("data"), 0o600))
    }

    cleanupOrphanedInDir(dir)

    entries, err := os.ReadDir(dir)
    require.NoError(t, err)
    require.Len(t, entries, len(keepFiles), "only non-attack files should remain")
    names := make([]string, 0, len(entries))
    for _, e := range entries {
        names = append(names, e.Name())
    }
    for _, f := range keepFiles {
        require.Contains(t, names, f, "kept file %s should still exist", f)
    }
}
```
Why Test Startup Cleanup:
- Safety verification: Confirm only intended files are removed (prevent accidental deletion)
- Security hardening: Verify symlink protection prevents following malicious links
- Graceful degradation: Ensure cleanup failures don't prevent agent startup
- Pattern correctness: Validate file matching logic catches all orphans without false positives
Best Practices:
- Use `testhelpers.SetupMinimalTestState()` to create isolated test environments
- Use `os.IsNotExist()` to verify file removal
- Use `t.TempDir()` for session-level cleanup tests
- Use table-driven tests to cover multiple file type scenarios in a single test function
- Test both positive cases (files that should be removed) and negative cases (files that should be preserved)
- Use `entry.Type().IsRegular()` checks in production code to skip symlinks and directories
- Cleanup functions should log errors but not fail (use `assert`, not `require`, for cleanup verification)
- Pass `context.Context` to error handling functions in cleanup paths (use `context.Background()` for operations that must complete even during shutdown)
- Test all completion paths: normal completion, timeout, cancellation, and error scenarios
Comprehensive Cleanup Testing Example:
For a complete example of testing cleanup operations, including absolute path handling, `DirEntry.Type()` fallbacks, and cross-platform considerations, see docs/solutions/logic-errors/hashcat-session-file-cleanup-wrong-directory.md. This document demonstrates:
- Testing cleanup with absolute paths vs. relative paths (critical for hashcat session files)
- Handling `DirEntry.Type()` unknown values with a fallback to `entry.Info()`
- Skipping symlink tests on Windows (`os.Symlink` requires elevated privileges)
- Using `errcheck` to catch discarded cleanup errors
- Prevention strategies for `os.IsNotExist` masking wrong-path bugs
Testing Conditional Side Effects Based on Error Type#
When a single error path has different side effects based on error type, tests should verify both the conditional logic and the distinct behaviors.
Pattern: Conditional Side Effects in Error Paths
```go
// Test that 404 during acceptance skips AbandonTask
t.Run("accept 404 skips abandon", func(t *testing.T) {
    httpmock.RegisterResponder("POST",
        "/api/v1/client/tasks/123/accept_task",
        httpmock.NewJsonResponderOrPanic(404, nil))

    err := processTask(ctx, task)

    // Assert error returned
    require.Error(t, err)
    require.ErrorIs(t, err, task.ErrTaskAcceptNotFound)

    // Assert AbandonTask was NOT called
    info := httpmock.GetCallCountInfo()
    assert.Equal(t, 0, info["POST /api/v1/client/tasks/123/abandon"])

    // Assert local cleanup still happened
    assert.NoFileExists(t, taskFilePath)
})

// Test that other errors still trigger abandon
t.Run("accept 5xx triggers abandon", func(t *testing.T) {
    httpmock.RegisterResponder("POST",
        "/api/v1/client/tasks/123/accept_task",
        httpmock.NewJsonResponderOrPanic(503, nil))
    httpmock.RegisterResponder("POST",
        "/api/v1/client/tasks/123/abandon",
        httpmock.NewJsonResponderOrPanic(204, nil))

    err := processTask(ctx, task)
    require.Error(t, err)
    require.ErrorIs(t, err, task.ErrTaskAcceptFailed)

    // Assert AbandonTask WAS called
    info := httpmock.GetCallCountInfo()
    assert.Equal(t, 1, info["POST /api/v1/client/tasks/123/abandon"])
})
```
Key Testing Principles:
- Test both branches of the conditional explicitly
- Assert on the presence/absence of side effects (API calls, file operations)
- Use `httpmock.GetCallCountInfo()` to verify which endpoints were/weren't called
- Verify error types returned (`ErrorIs`) match expected sentinel errors
- Ensure both paths still perform shared cleanup (`CleanupTaskFiles`)
Example from task acceptance failure handling:
The `processTask` function conditionally calls `AbandonTask` based on error type:
- `ErrTaskAcceptNotFound` (404) — the task vanished, so skip `AbandonTask` (no server state transition needed)
- Other accept errors — call `AbandonTask` to notify the server and prevent task starvation

Must-complete operations like `AbandonTask(context.Background(), t)` should be tested for conditional invocation, not just for context isolation.
See `lib/agent/agent_test.go` (`TestProcessTask_AcceptFailure`) for the complete test suite.
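The branch logic itself is small enough to distill into a sketch (a Ruby analogue of the Go error-type check; the class names and call symbols are hypothetical mirrors of the agent's sentinel errors):

```ruby
# Hypothetical sentinel errors mirroring ErrTaskAcceptNotFound and
# ErrTaskAcceptFailed in the Go agent.
class TaskAcceptNotFound < StandardError; end
class TaskAcceptFailed < StandardError; end

# Shared cleanup always runs; the abandon call is skipped only for the 404
# case, since the task no longer exists on the server.
def handle_accept_error(error, calls)
  calls << :cleanup_task_files
  calls << :abandon_task unless error.is_a?(TaskAcceptNotFound)
  calls
end
```

Tests then assert on the recorded side effects for each error type, exactly as the httpmock-based Go tests assert on endpoint call counts.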
JavaScript Testing (Vitest)#
- Install dependencies: `bun add -d vitest jsdom @testing-library/dom @hotwired/stimulus`
- Configure `vitest.config.js`:

```javascript
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    environment: 'jsdom',
    setupFiles: ['./spec/javascript/setup.js']
  }
});
```

- Place tests in `spec/javascript/controllers/`
- Use `describe`, `it`, and `expect` from Vitest
- Simulate DOM events and verify controller behavior (e.g., tab switching, ARIA attributes)
- Use `@testing-library/dom` for DOM queries
Tested Stimulus Controllers#
- `tabs_controller` — tab switching and ARIA attributes (see `spec/javascript/controllers/tabs_controller.test.js`)
- `select_controller` — Tom Select integration for searchable dropdowns (see `spec/javascript/controllers/select_controller.test.js`)
  - Wraps the Tom Select library for enhanced dropdown functionality
  - Features:
    - Configurable empty options via `data-select-allow-empty-value` (default: false)
    - Configurable max options via `data-select-max-options-value` (default: 100)
    - Uses the `dropdown_input` plugin for search functionality
    - Properly destroys instances on disconnect
    - Prevents re-initialization on Turbo morphing or reconnection
    - Gracefully handles initialization failures with retry prevention
  - Test coverage includes 11 tests:
    - Initialization and connection
    - Default options passing
    - Duplicate connect prevention
    - Disconnect and cleanup
    - Custom configuration values (allowEmpty, maxOptions)
    - Initialization failure handling
    - Retry prevention after failure
    - State reset on disconnect
  - Used for the hash type dropdown to display the hashcat mode ID alongside the name (e.g., "0 - MD5") with searchable filtering
  - Example test pattern for external library integration:

```javascript
import { describe, it, expect, beforeEach, vi } from "vitest";
import TomSelect from "tom-select";

vi.mock("tom-select", () => ({
  default: vi.fn().mockImplementation(function (element, options) {
    this.element = element;
    this.options = options;
    this.destroy = vi.fn();
    return this;
  })
}));

it("passes default options to TomSelect", () => {
  expect(TomSelect).toHaveBeenCalledWith(
    getSelectElement(),
    expect.objectContaining({
      allowEmptyOption: false,
      plugins: ['dropdown_input'],
      maxOptions: 100
    })
  );
});
```
Testing Atomic Locks and Job Idempotency#
Background jobs triggered by `after_commit` callbacks may face race conditions when the callback fires multiple times for the same record (e.g., once for the record save and once for the attachment commit). Use atomic locking patterns to prevent duplicate processing.
Pattern: Atomic Lock with UPDATE ... WHERE#
Use an atomic UPDATE ... WHERE query to claim work before processing:
```ruby
# Acquire lock before processing
rows_claimed = HashList.where(id: id, processed: false)
                       .update_all(processed: true)
return if rows_claimed.zero?

# Process the work here
ingest_hash_items(list)
```
Key Considerations:
- Rollback on failure: Reset the flag in a rescue block so the job can retry
- Cleanup partial state: Delete any leftover data from failed attempts before re-ingesting
- Validate completion: Raise an error if no work was actually performed
- Race detection: Test scenarios where another job claims the lock first
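The claim-then-rollback flow can be sketched in plain Ruby. `FakeTable` below is an in-memory stand-in for the database table, not project code; it only illustrates why the claim must be atomic and why the flag is reset in the rescue path:

```ruby
# In-memory stand-in for the hash_lists table (illustrative only).
Record = Struct.new(:id, :processed, keyword_init: true)

class FakeTable
  def initialize(records)
    @records = records
  end

  # Mimics UPDATE ... SET processed = true WHERE id = ? AND processed = false,
  # returning the number of rows claimed (0 means another job got there first).
  def claim(id)
    record = @records.find { |r| r.id == id && !r.processed }
    return 0 unless record
    record.processed = true
    1
  end

  # Rollback path: reset the flag so a retried job can re-claim the work.
  def release(id)
    record = @records.find { |r| r.id == id }
    record.processed = false if record
  end
end

def process_with_lock(table, id)
  return :already_claimed if table.claim(id).zero?
  begin
    yield
    :done
  rescue StandardError
    table.release(id) # reset so the job can retry
    raise
  end
end

table = FakeTable.new([Record.new(id: 1, processed: false)])
process_with_lock(table, 1) { :work } # => :done
process_with_lock(table, 1) { :work } # => :already_claimed (lock held)
```

A failed `yield` releases the lock and re-raises, so a subsequent retry can claim and complete the work, matching the rollback behavior described above.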
Example: ProcessHashListJob#
The ProcessHashListJob demonstrates this pattern with 8 test contexts covering:
- Duplicate prevention: Job returns early when called twice (lock already claimed)
- Rollback on error: `processed` flag resets to `false` when ingestion fails
- Partial failure recovery: Cleans up leftover items from prior failed attempts before re-ingesting
- Race condition handling: Returns early when another job atomically claims the lock first
- Record deletion during processing: Raises `RecordNotSaved` when the record disappears mid-job
- Rollback failure logging: Logs rollback errors but re-raises the original exception
- Empty file handling: Raises an error when no items are processed
- Idempotent retry: Re-running after failure produces the same final count without duplicates
See spec/jobs/process_hash_list_job_spec.rb for the complete test suite.
Alternative Approaches Considered:
- Advisory locks (`pg_try_advisory_lock`) — adds complexity, requires explicit release
- Separate `processing_state` column — cleaner semantics but requires migration
- Redis lock — adds external dependency for a DB-level concern
Decision: Atomic update keeps overhead low (one extra UPDATE) without long-lived locks.
Testing Model Concerns#
Model concerns should have dedicated test files organized under spec/models/concerns/:
- Use `let` blocks to set up test data shared across contexts
- Group tests by method or feature using `describe` blocks
- Test all states and edge cases with `context` blocks
- Example: `Agent::Benchmarking` concern (24 total examples, 5 for the `benchmarking?` method)
describe "#benchmarking?" do
let(:pending_agent) { create(:agent, state: "pending") }
context "when agent is pending, recently seen, and has no benchmarks" do
before { pending_agent.update!(last_seen_at: 30.seconds.ago) }
it "returns true" do
expect(pending_agent.benchmarking?).to be true
end
end
context "when agent is active" do
it "returns false" do
agent.update!(last_seen_at: 30.seconds.ago)
expect(agent).to be_active
expect(agent.benchmarking?).to be false
end
end
end
Updated Behavior with Upsert-Based Ingestion:
- The `last_benchmarks` method now returns all benchmarks without date filtering, since the unique index `(agent_id, hash_type, device)` ensures only one row exists per agent/hash_type/device combination
- Benchmark retrieval uses `.exists?` instead of `.empty?` to avoid loading records unnecessarily
See spec/models/concerns/agent/benchmarking_spec.rb for the complete concern test suite.
System Test Helpers for JavaScript Components#
When testing JavaScript-enhanced UI components in system tests, add helper methods to spec/support/page_objects/base_page.rb to encapsulate interaction patterns.
Tus Upload Setup#
System tests involving tus file uploads require the tusd container to be running. Add the TusdHelper.ensure_tusd_running call to the test setup:
```ruby
before(:all) { TusdHelper.ensure_tusd_running }
```
How it works:
- `TusdHelper.ensure_tusd_running` starts a tusd Docker container via testcontainers
- The container is configured with a shared volume mount so both tusd (inside Docker) and the Rails process (on the host) can access uploaded files at the same path
- The helper sets `TUS_ENDPOINT_URL` and `TUS_UPLOADS_DIR` environment variables for the test environment
- The container uses a random mapped port to avoid conflicts with other test runs
- The container is shared across all tests and automatically cleaned up when the test suite exits
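The memoized, idempotent shape implied by this behavior might look like the following. This is a hypothetical sketch with the container boot stubbed out: `start_container` is a placeholder, not the real testcontainers call, and the port range and paths are illustrative only:

```ruby
# Hypothetical sketch of a memoized container helper (not project code).
module TusdHelper
  def self.ensure_tusd_running
    @container ||= begin
      port = rand(20_000..30_000) # random mapped port to avoid conflicts
      ENV["TUS_ENDPOINT_URL"] = "http://localhost:#{port}/files/"
      ENV["TUS_UPLOADS_DIR"]  = "/tmp/tus-uploads"
      start_container(port) # placeholder for the real testcontainers boot
    end
  end

  def self.start_container(port)
    { port: port } # stub; the real helper returns a running container handle
  end
end

TusdHelper.ensure_tusd_running # boots (stub) and sets env vars
TusdHelper.ensure_tusd_running # no-op: same container, env vars unchanged
```

The `||=` memoization is what lets every spec file call `ensure_tusd_running` in `before(:all)` without starting duplicate containers.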
Tom Select Helper#
The tom_select_fill_and_choose helper interacts with Tom Select dropdowns using the dropdown_input plugin:
```ruby
# Interact with a Tom Select dropdown by clicking, typing, and selecting
def tom_select_fill_and_choose(select_id, text)
  control = find("##{select_id}-ts-control", visible: true)
  control.click
  dropdown = find("##{select_id}-ts-dropdown", visible: true)
  input = dropdown.find("input.dropdown-input", visible: true)
  input.set(text)
  dropdown.find(".option", text: text, match: :prefer_exact, visible: true).click
  self
end
```
Usage in page objects:
```ruby
def select_hash_type(name)
  tom_select_fill_and_choose("hash_list_hash_type_id", name)
  self
end
```
Why encapsulate in page objects:
- Isolates fragile selectors and interaction sequences
- Provides semantic methods for test scenarios
- Makes tests readable and maintainable
See spec/support/page_objects/hash_list_form_page.rb and spec/system/hash_lists/create_hash_list_spec.rb for usage examples.
Testing upsert_all Operations#
When using upsert_all for idempotent bulk writes, test these key behaviors:
- Idempotent upsert behavior: Submitting the same data multiple times should result in updates (not duplicates)
- Unique index enforcement: Verify the unique index constraint is properly used as a conflict target
- Pre-validation filtering: Invalid entries should be filtered before upsert (validation before bulk operations)
- Timestamp management: Verify that `updated_at` is managed correctly in upsert operations
Example: Benchmark Submission Endpoint#
The benchmark submission endpoint demonstrates this pattern with the (agent_id, hash_type, device) unique index:
```ruby
# Filter and validate entries before upsert
valid_records = build_valid_benchmark_records(params[:hashcat_benchmarks])

if valid_records.any?
  # upsert_all with unique index as conflict target
  HashcatBenchmark.upsert_all(
    valid_records,
    unique_by: %i[agent_id hash_type device],
    update_only: %i[hash_speed runtime benchmark_date]
  )
end
```
Validation Rules:
- Positive speeds (hash_speed > 0)
- Non-negative hash types and devices (>= 0)
- Positive runtime (> 0)
- Invalid entries are logged and skipped
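The rules above could be applied by a filter along these lines. This is a plain-Ruby sketch, not the actual `build_valid_benchmark_records` implementation (the real method presumably also logs skipped rows and builds full column hashes for `upsert_all`):

```ruby
# Illustrative pre-upsert filter applying the validation rules above.
def build_valid_benchmark_records(entries)
  entries.filter_map do |e|
    speed     = e[:hash_speed].to_f
    runtime   = e[:runtime].to_i
    hash_type = e[:hash_type].to_i
    device    = e[:device].to_i
    next if speed <= 0                              # positive speeds only
    next if runtime <= 0                            # positive runtime only
    next if hash_type.negative? || device.negative? # non-negative ids
    { hash_type: hash_type, device: device, hash_speed: speed, runtime: runtime }
  end
end

valid = build_valid_benchmark_records([
  { hash_type: 1000, device: 1, hash_speed: "1000", runtime: 1000 }, # valid
  { hash_type: 2000, device: 1, hash_speed: "0",    runtime: 500 },  # invalid speed
  { hash_type: 3000, device: 1, hash_speed: "5000", runtime: 0 }     # invalid runtime
])
# valid contains only the first entry
```

Filtering before the bulk write matters because `upsert_all` bypasses ActiveRecord validations entirely.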
Why This Index Works:
- The unique key `(agent_id, hash_type, device)` represents the natural key: one benchmark row per agent per hash-type per device
- The old index `(agent_id, benchmark_date, hash_type)` included a mutable timestamp, so every submission with a new timestamp bypassed deduplication
- The new index enables `upsert_all` to reliably update existing benchmarks instead of creating duplicates
Test Coverage:
```ruby
# Test idempotent updates (submitting same data twice)
it "updates existing benchmark when submitted again" do
  create(:hashcat_benchmark, agent: agent, hash_type: 1000, device: 1)
  post submit_benchmark_path, params: {
    hashcat_benchmarks: [
      { hash_type: 1000, device: 1, hash_speed: "2000", runtime: 500 }
    ]
  }
  expect(agent.hashcat_benchmarks.count).to eq(1)
  expect(agent.hashcat_benchmarks.first.hash_speed).to eq(2000.0)
end

# Test invalid entry filtering
it "skips invalid entries but processes valid ones" do
  post submit_benchmark_path, params: {
    hashcat_benchmarks: [
      { hash_type: 1000, device: 1, hash_speed: "1000", runtime: 1000 }, # valid
      { hash_type: 2000, device: 1, hash_speed: "0", runtime: 500 },     # invalid speed
      { hash_type: 3000, device: 1, hash_speed: "5000", runtime: 0 }     # invalid runtime
    ]
  }
  expect(response).to have_http_status(:no_content)
  expect(agent.hashcat_benchmarks.reload.count).to eq(1)
end

# Test multi-batch submission preserves all rows
it "preserves existing benchmarks when submitting new ones" do
  create(:hashcat_benchmark, agent: agent, hash_type: 1000, device: 1)
  post submit_benchmark_path, params: {
    hashcat_benchmarks: [
      { hash_type: 2000, device: 1, hash_speed: "2000", runtime: 5000 }
    ]
  }
  expect(agent.hashcat_benchmarks.reload.count).to eq(2)
end

# Test all-invalid payload handling
it "returns 422 when all entries are invalid" do
  pending_agent = create(:agent, state: "pending")
  post submit_benchmark_path, params: {
    hashcat_benchmarks: [
      { hash_type: 2000, device: 1, hash_speed: "0", runtime: 500 },
      { hash_type: 3000, device: 1, hash_speed: "5000", runtime: 0 }
    ]
  }
  expect(response).to have_http_status(:unprocessable_content)
  expect(pending_agent.reload.state).to eq("pending")
  expect(pending_agent.hashcat_benchmarks.count).to eq(0)
end
```
See spec/requests/api/v1/client/agents_spec.rb for the complete test suite. See GOTCHAS.md Database & ActiveRecord section for additional upsert_all gotchas (Rails 8.1+ auto-manages updated_at, bypasses AR callbacks, NOT NULL column requirements with unique_by: :id).
Testing Database Migrations with Cleanup Steps#
When migrations replace a permissive unique index with a stricter one, they need cleanup steps to remove duplicates before adding the new constraint. Test both the migration itself and verify the cleanup logic doesn't lose important data.
Pattern: DISTINCT ON Deduplication#
Use DISTINCT ON with ORDER BY to keep the "best" row when deduplicating:
```ruby
# Remove duplicate rows sharing (agent_id, hash_type, device)
execute <<~SQL.squish
  DELETE FROM hashcat_benchmarks
  WHERE id NOT IN (
    SELECT DISTINCT ON (agent_id, hash_type, device) id
    FROM hashcat_benchmarks
    ORDER BY agent_id, hash_type, device,
             benchmark_date DESC NULLS LAST,
             id DESC
  )
SQL
```
Key Considerations:
- Keep the latest data: Order by relevant timestamp columns (DESC, with NULLS LAST) to preserve the most recent record
- Break ties deterministically: Add `id DESC` as a final tiebreaker to ensure consistent results
- Test data preservation: Verify cleanup keeps the intended row and removes the correct duplicates
- Test rollback: Verify the `down` migration restores the old schema (though deleted data cannot be recovered)
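The same keep-the-latest rule can be checked in plain Ruby (hashes stand in for table rows; a `nil` date is treated as oldest, mirroring `DESC NULLS LAST`, and the higher `id` wins ties, mirroring `id DESC`):

```ruby
# Plain-Ruby model of the DISTINCT ON deduplication above (illustrative only).
rows = [
  { id: 1, agent_id: 7, hash_type: 1000, device: 1, benchmark_date: Time.utc(2026, 1, 1) },
  { id: 2, agent_id: 7, hash_type: 1000, device: 1, benchmark_date: Time.utc(2026, 2, 1) },
  { id: 3, agent_id: 7, hash_type: 1000, device: 2, benchmark_date: nil }
]

keepers = rows
  .group_by { |r| r.values_at(:agent_id, :hash_type, :device) } # one group per natural key
  .values
  .map { |dupes| dupes.max_by { |r| [r[:benchmark_date] || Time.at(0), r[:id]] } }

keeper_ids = keepers.map { |r| r[:id] }.sort
# keeper_ids == [2, 3]: id 2 wins its group (later date); id 3 is alone in its group
```

Row 1 is the only deletion candidate here, which is exactly the set the `DELETE ... WHERE id NOT IN (...)` statement targets.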
Example: HashcatBenchmarks Unique Index Migration#
The migration demonstrates this pattern by replacing the old (agent_id, benchmark_date, hash_type) index with (agent_id, hash_type, device):
```ruby
def up
  # Remove old index that allowed duplicates
  remove_index :hashcat_benchmarks,
               name: "idx_on_agent_id_benchmark_date_hash_type_a667ecb9be"

  # Clean up duplicates before adding stricter index
  execute <<~SQL.squish
    DELETE FROM hashcat_benchmarks
    WHERE id NOT IN (
      SELECT DISTINCT ON (agent_id, hash_type, device) id
      FROM hashcat_benchmarks
      ORDER BY agent_id, hash_type, device,
               benchmark_date DESC NULLS LAST,
               id DESC
    )
  SQL

  # Add new unique index on natural key
  add_index :hashcat_benchmarks,
            %i[agent_id hash_type device],
            unique: true
end
```
Why This Pattern:
- Old index was useless: Including `benchmark_date` (a mutable timestamp) in the unique key meant every submission with a new timestamp bypassed the constraint, allowing duplicates instead of keeping distinct benchmarks for different devices
- Natural key: The unique key should be `(agent_id, hash_type, device)` — one row per agent per hash-type per device
- Idempotent upserts: The new index enables `upsert_all` to use it as a conflict target for reliable duplicate prevention
Test Coverage:
Test the migration both ways:
it "removes duplicate benchmarks keeping latest" do
agent = create(:agent)
# Create duplicates with different dates
old = create(:hashcat_benchmark,
agent: agent, hash_type: 1000, device: 1,
benchmark_date: 1.day.ago, hash_speed: 1000)
new = create(:hashcat_benchmark,
agent: agent, hash_type: 1000, device: 1,
benchmark_date: Time.zone.now, hash_speed: 2000)
migrate(:up)
expect(HashcatBenchmark.exists?(new.id)).to be true
expect(HashcatBenchmark.exists?(old.id)).to be false
end
it "allows rollback to old schema" do
migrate(:up)
migrate(:down)
# Verify old index is restored
expect(ActiveRecord::Base.connection.index_exists?(
:hashcat_benchmarks,
%i[agent_id benchmark_date hash_type]
)).to be true
end
See db/migrate/20260225025422_change_hashcat_benchmarks_unique_index.rb for the complete migration. See GOTCHAS.md Database & ActiveRecord section for additional migration gotchas.
Testing Memory-Efficient Query Patterns#
When processing large datasets, test that code avoids memory-intensive ActiveRecord patterns that cause OOM on large files. Guard against regression to patterns that instantiate full AR object graphs when raw data access is sufficient.
Pattern: Verify Query Optimization Without AR Object Instantiation
Use SQL query introspection to verify that the implementation uses memory-efficient patterns like pluck instead of includes/index_by:
it "does not instantiate HashItem AR objects for the cracked-hash lookup" do
# Guard against regression to includes/index_by which would instantiate
# full HashItem objects and reintroduce memory pressure on large lists.
# The job should use pluck (returning raw arrays) not SELECT "hash_items".*
select_star_queries = []
callback = lambda { |_name, _start, _finish, _id, payload|
sql = payload[:sql].to_s
if sql.include?('"hash_items".*') || sql.match?(/SELECT\s+hash_items\.\*/i)
select_star_queries << sql
end
}
ActiveSupport::Notifications.subscribed(callback, "sql.active_record") do
ProcessHashListJob.perform_now(hash_list.id)
end
expect(select_star_queries).to be_empty,
"Expected no SELECT hash_items.* queries (would instantiate AR objects), but found:\n#{select_star_queries.join("\n")}"
end
Rationale: The original implementation used includes(:hash_list).index_by(&:hash_value) which instantiated full AR objects for every hash item, causing OOM on 100GB+ hash lists. The test verifies the optimized joins/pluck approach is maintained.
Example from ProcessHashListJob (PR #801, updated in PR #811):
The job processes large hash files in batches, checking each batch against a lookup of already-cracked hashes. The optimized implementation:
- Uses `joins(:hash_list).pluck(:hash_value_digest, :hash_value, :plain_text, :attack_id)` instead of `includes(:hash_list).index_by(&:hash_value)`
- Returns raw arrays (4 scalars per row: digest, hash_value, plain_text, attack_id) instead of full AR object graphs
- Uses digest-based index lookup (`where(hash_value_digest: digests)`) with a collision guard (`.find { |hv, _, _| hv == hash_value }`)
- Memory bounded by `batch_size` rather than total file size
- Query introspection test prevents reintroduction of memory-intensive patterns
When to Use This Pattern:
- Background jobs processing large datasets (hash lists, logs, analytics)
- Bulk operations where you only need a few columns for lookup/comparison
- Any code path that could hit memory limits on production data volumes
- Cross-resource lookups that don't need full model behavior
See spec/jobs/process_hash_list_job_spec.rb for the complete test implementation.
Testing Cross-Resource State Propagation#
When one resource's state changes affect related resources, test that updates propagate correctly across boundaries while respecting type/scope constraints.
Pattern: Test State Propagation with Cross-Resource Contexts
Create separate test contexts that verify state changes propagate to related resources of the same type/scope, while leaving unrelated resources unchanged:
context "when a hash is cracked in one list" do
let(:other_hash_list) { create(:hash_list, hash_mode: hash_list.hash_mode) }
let(:other_hash_item) { create(:hash_item, hash_list: other_hash_list, hash_value: "matching_hash") }
it "marks the same hash cracked in other lists of the same hash type" do
# Create a cracked hash in the source list
create(:hash_item, :cracked, hash_list: source_list, hash_value: "matching_hash", attack: attack)
# Process a new list with the same hash
ProcessHashListJob.perform_now(other_hash_list.id)
# Verify the matching hash was marked as cracked with source metadata
item = HashItem.find_by(hash_list: other_hash_list, hash_value: "matching_hash")
expect(item.cracked).to be true
expect(item.plain_text).to eq(source_item.plain_text)
expect(item.attack_id).to eq(attack.id)
end
it "does not affect hashes in lists of different hash types" do
different_type_list = create(:hash_list, hash_mode: different_mode)
create(:hash_item, hash_list: different_type_list, hash_value: "matching_hash")
ProcessHashListJob.perform_now(different_type_list.id)
item = HashItem.find_by(hash_list: different_type_list, hash_value: "matching_hash")
expect(item.cracked).to be false
end
end
Key Testing Principles:
- Test the happy path: Verify state propagates to matching resources (same hash_type)
- Test boundary conditions: Verify state does NOT propagate across type boundaries (different hash_type)
- Test metadata preservation: Verify related metadata (`plain_text`, `attack_id`) is copied correctly
- Test non-matching items: Verify unrelated items in the target resource remain unchanged
Example from ProcessHashListJob (PR #801, updated in PR #811):
The job implements cross-resource state propagation for cracked hashes:
- When processing a new hash list, check if any hashes were already cracked in other lists
- Uses digest-based lookup (`where(hash_value_digest: digests)`) with a collision guard (`AND hash_value = ?`) for performance
- If a match is found with the same hash_type, mark the new item as cracked with the source `plain_text` and `attack_id`
- Hashes in lists with different hash_types are not affected (even with a matching `hash_value`)
- Test coverage includes 3 specs:
- Marks matching hash cracked with source metadata
- Leaves non-matching items uncracked
- Does not mark hashes cracked across different hash types
When to Use This Pattern:
- Deduplication logic that shares state across resources
- Cache invalidation that needs to propagate to related records
- Status synchronization between parent/child resources
- Any domain logic where "once X is known about resource A, it applies to all matching A's"
See spec/jobs/process_hash_list_job_spec.rb for the complete test suite.
Testing Digest-Based Index Patterns#
When a model uses a digest field to work around B-tree index size limits (e.g., hash_value_digest for indexing long hash_value TEXT columns), tests should verify both the automatic digest population and the collision guard logic.
Key Testing Principles:
- Automatic digest population: The `hash_value_digest` field is auto-populated via a `before_validation` callback that computes `Digest::MD5.hexdigest(hash_value)`
- Factory test patterns: FactoryBot should let the callback populate the digest automatically — avoid manually setting `hash_value_digest` in factories
- Bulk insert patterns: `insert_all` and `upsert_all` bypass ActiveRecord callbacks, so bulk-insert code paths must compute `hash_value_digest: Digest::MD5.hexdigest(value)` inline
- Collision guards required: MD5 is not collision-resistant, so all digest-based lookups must verify the full `hash_value` matches:
  - Single-row lookups: `.where(hash_value_digest: digest).find { |item| item.hash_value == hash_value }`
  - Batch updates: `.where(hash_value_digest: digests).where(hash_value: hash_values).update_all(...)`
- Index usage verification: Tests should confirm queries use the composite digest indexes (`(hash_value_digest, hash_list_id)` or `(hash_value_digest, cracked)`)
Example: Testing Digest Computation in Bulk Operations
it "computes hash_value_digest inline for insert_all" do
hash_items = []
batch.each_line do |line|
hash_items << {
hash_value: line,
hash_value_digest: Digest::MD5.hexdigest(line),
hash_list_id: list.id,
created_at: Time.current,
updated_at: Time.current
}
end
HashItem.insert_all(hash_items)
# Verify digests match
inserted = HashItem.where(hash_list_id: list.id)
inserted.each do |item|
expect(item.hash_value_digest).to eq(Digest::MD5.hexdigest(item.hash_value))
end
end
Example: Testing Collision Guard in Cross-Resource Lookups
it "uses collision guard when looking up cracked hashes" do
digest = Digest::MD5.hexdigest("test_hash")
# Query uses digest index with full hash_value verification
hash_value_digests = ["test_hash"].map { |v| Digest::MD5.hexdigest(v) }
cracked_hashes = HashItem.joins(:hash_list)
.where(hash_value_digest: hash_value_digests, cracked: true)
.where(hash_value: "test_hash") # Collision guard
.pluck(:hash_value_digest, :hash_value, :plain_text, :attack_id)
expect(cracked_hashes).not_to be_empty
end
Why Test Digest Patterns:
- Prevent index insertion failures: Long `hash_value` TEXT can exceed PostgreSQL's ~2704-byte B-tree limit; the digest ensures all values fit
- Guard against collision regressions: Tests verify collision guards are present and prevent false-positive matches
- Verify bulk-insert correctness: Ensure `insert_all`/`upsert_all` paths compute digests inline (callbacks bypassed)
- Index coverage verification: Confirm queries leverage the digest-based composite indexes for performance
Reference: See GOTCHAS.md § "hash_value_digest Pattern" for the complete implementation requirements and collision guard patterns.
See spec/models/hash_item_spec.rb, spec/jobs/process_hash_list_job_spec.rb, and spec/services/crack_submission_service_spec.rb for comprehensive test coverage of digest-based patterns.
E2E and Playwright#
- Write tests to simulate user workflows and verify UI state
- Use mocked APIs for isolated tests, real backend for E2E
- Reference the coverage plan for required scenarios
Test Coverage and Enforcement#
- Coverage is measured using SimpleCov for Rails backend tests
- Coverage threshold: Minimum enforced in CI (see `spec/spec_helper.rb`)
- Every HTTP endpoint should have integration test coverage
- E2E coverage is tracked against a detailed matrix of UI pages, workflows, and features
- Missing areas are explicitly listed in the coverage plan
CI Pipeline Integration#
- CI is defined in `.github/workflows/CI.yml` (for the Rails/Ruby backend) and `.github/workflows/go.yml` (for the Go agent)
  - Rails CI: Runs on Ubuntu with Postgres and Redis services, installs dependencies, sets up the DB, precompiles assets, runs all tests, uploads test results as artifacts, and reports coverage to Code Climate; quality gates require all critical-path tests to pass before release, and failed system tests trigger screenshot uploads for debugging
  - Go CI:
    - Test job: Runs `go test -v ./...` for standard unit/integration tests
    - Race detector job: Dedicated CI job running `go test -race -v ./...` with a 15-minute timeout (required check for detecting data races)
    - Coverage job: Runs with the race detector enabled (`go test -race -coverprofile=coverage.out -covermode=atomic ./...`) and uploads to Codecov (no longer marked `continue-on-error`)
    - Linter job: Runs `golangci-lint` with auto-fix enabled for standardization
Status Reports and Testing Guide#
- Test status and coverage are tracked in the Phase 3 E2E Test Coverage Plan
- Testing guidelines and infrastructure requirements are documented in the same plan and in Phase 6 Monitoring, Testing, and Documentation
Maintenance and Best Practices#
- Tests are updated alongside feature development
- Test data is regularly refreshed
- Documentation is kept in sync with coverage
- The architecture encourages isolation, real-time feature validation, accessibility, and role-based workflow testing
Further Reading#
- E2E Test Coverage Plan
- Monitoring, Testing, and Documentation Guide
- AGENTS.md — Comprehensive Go agent development guide covering package structure, DB constraints, FK cascade strategy, performance patterns, error handling, linter gotchas, dependency considerations, and testing best practices. The agent codebase has been successfully refactored from a "god package" (~80K lines, 17 files) into focused sub-packages:
  - `lib/display/` — All display/output functions with defensive bounds checking
  - `lib/benchmark/` — Benchmark management with a Manager struct using constructor injection
  - `lib/task/` — Task management with a Manager struct using constructor injection
  - `lib/cserrors/` — Centralized error handling
  - The refactoring was completed in PR #131 (benchmark submission) and PR #134 (full package extraction).
- GOTCHAS.md — Reference document for linting edge cases, code generation pitfalls, configuration traps, and other development gotchas.
- CI Pipeline Configuration