Testing Strategy and Documentation
Type: Document
Status: Published
Created: Oct 31, 2025
Updated: Mar 28, 2026 by Dosu Bot

Testing Strategy and Documentation#

CipherSwarm uses a comprehensive, multi-layered testing strategy to ensure reliability, security, and maintainability across all components. This guide covers the testing architecture, organization, conventions, and best practices for both backend and frontend code, as well as documentation standards.


Three-Tier Testing Architecture#

CipherSwarm's tests are organized into three primary layers:

  • Layer 1: Backend Tests

    • Ruby on Rails (RSpec) for models, controllers, services, and API endpoints
    • Python (pytest) for supporting backend services (where applicable)
    • Database constraint and foreign key cascade/nullify tests
  • Layer 2: Frontend Mocked Tests

    • JavaScript unit tests (Vitest) for Stimulus controllers and UI logic
    • Playwright tests with mocked APIs for isolated UI component validation
  • Layer 3: Full End-to-End (E2E) Tests

    • Playwright E2E tests against a real Dockerized backend
    • Simulate real user workflows, authentication, and resource flows
    • Includes seeded test data, SSR authentication, and real object storage

See the Phase 3 E2E Test Coverage Plan for a detailed matrix of test categories and coverage.


Test Organization#

  • Backend (Rails/RSpec):

    • Located in spec/ (e.g., spec/models/, spec/requests/, spec/services/, spec/system/)
    • Organized by resource or feature (e.g., spec/requests/agents_spec.rb)
    • System tests cover full user workflows and UI integration
  • Frontend/JavaScript:

    • Stimulus controller tests in spec/javascript/controllers/
    • Uses Vitest and jsdom for DOM simulation
    • Playwright E2E and component tests in /frontend/e2e/
  • Python Backend:

    • Integration and unit tests in /tests/unit and /tests/integration
    • Uses pytest, httpx, and testcontainers
  • Test Documentation:


Running Tests#

  • Backend (RSpec):

    • Run with bin/bundle exec rspec
    • CI uses this command with additional formatters for reporting
    • System tests involving tus file uploads automatically start a tusd Docker container via testcontainers
  • Python:

    • Run with pytest
  • Frontend/E2E (Playwright):

    • Run with Playwright CLI, targeting either mocked APIs or a real backend
  • JavaScript (Vitest):

    • Run with bun test:js or npx vitest run
  • Full Suite:

    • CI orchestrates all tiers, including environment setup, asset compilation, and test execution

Testing Infrastructure Requirements#

Testcontainers for tusd:

  • System tests requiring tus file uploads use the testcontainers gem to automatically start a tusd Docker container
  • This eliminates manual Docker container setup for tusd testing
  • The testcontainers approach ensures consistent test environments across development machines and CI/CD pipelines
  • Requires Docker to be available on the test machine
  • The tusd container is automatically started on first use and cleaned up when the test suite exits

Writing Tests: Guidance and Conventions#

Backend (RSpec)#

  • Use describe for features, context for scenarios, and it for expectations
  • Set up test data with FactoryBot
  • Cover authentication, authorization, and all HTTP endpoints
  • Example:
    describe "#index" do
      context "when a non-logged in user tries to access the agents index" do
        it "redirects to login page" do
          get agents_path
          expect(response).to redirect_to(new_user_session_path)
        end
      end
    end
    

Testing Array Length Validations#

Array length validations enforce security constraints (DoS prevention through bounded array inputs) and should be tested with the same rigor as authentication and authorization. Test both valid and invalid cases, including edge cases at boundaries.

Maximum Array Length Validations#

For variable-length arrays with maximum limits (e.g., devices array limited to 64 items):

describe "devices length" do
  it "accepts up to 64 devices" do
    agent.devices = Array.new(64) { |i| "GPU #{i}" }
    expect(agent).to be_valid
  end

  it "rejects more than 64 devices" do
    agent.devices = Array.new(65) { |i| "GPU #{i}" }
    expect(agent).not_to be_valid
    expect(agent.errors[:devices]).to include("must have at most 64 entries")
  end

  it "accepts an empty devices array" do
    agent.devices = []
    expect(agent).to be_valid
  end
end

Exact Array Length Validations#

For fixed-length arrays (e.g., progress, recovered_hashes, recovered_salts must be exactly 2 elements):

describe "array length validations" do
  let(:hashcat_status) { build(:hashcat_status) }

  it "accepts progress with exactly 2 entries" do
    hashcat_status.progress = [100, 10000]
    expect(hashcat_status).to be_valid
  end

  it "rejects progress with more than 2 entries" do
    hashcat_status.progress = [100, 10000, 999]
    expect(hashcat_status).not_to be_valid
    expect(hashcat_status.errors[:progress]).to include("must have exactly 2 entries")
  end

  it "rejects progress with only 1 entry" do
    hashcat_status.progress = [100]
    expect(hashcat_status).not_to be_valid
    expect(hashcat_status.errors[:progress]).to include("must have exactly 2 entries")
  end

  it "accepts nil progress" do
    hashcat_status.progress = nil
    expect(hashcat_status.errors[:progress]).to be_empty
  end
end

Association Count Validations#

For associations with count limits (e.g., device_statuses limited to 64 associated records):

it "rejects more than 64 device_statuses" do
  hashcat_status.save!
  hashcat_status.device_statuses.clear

  65.times { |i| hashcat_status.device_statuses.build(device_id: i, device_name: "GPU #{i}", device_type: "GPU", speed: 1000, utilization: 50, temperature: 60) }
  expect(hashcat_status).not_to be_valid
  expect(hashcat_status.errors[:device_statuses]).to include("must have at most 64 entries")
end

it "accepts up to 64 device_statuses" do
  hashcat_status.save!
  hashcat_status.device_statuses.clear

  64.times { |i| hashcat_status.device_statuses.build(device_id: i, device_name: "GPU #{i}", device_type: "GPU", speed: 1000, utilization: 50, temperature: 60) }
  expect(hashcat_status).to be_valid
end

Comprehensive Coverage Example:

  • Agent model: 3 test specs for devices array validation (max 64 items)
  • HashcatStatus model: 8 test specs for array validations
    • 4 tests for progress (exactly 2 elements)
    • 1 test each for recovered_hashes and recovered_salts (exactly 2 elements)
    • 2 tests for device_statuses association count (max 64 items)

These validations back OpenAPI maxItems and minItems constraints, providing defense in depth against unbounded array payloads.

OpenAPI Schema Constraints#

The OpenAPI schema (docs/swagger.json) enforces array size constraints at the API layer:

  • devices — maxItems: 64 (agent configuration and status)
  • device_statuses — maxItems: 64 (hashcat status submission)
  • progress, recovered_hashes, recovered_salts — minItems: 2, maxItems: 2 (exactly 2 elements)

These constraints provide the first line of defense against DoS attacks via unbounded array payloads. The API layer validates incoming requests before they reach model validations, ensuring consistent enforcement across all API clients.
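As an illustration, such constraints typically appear in the schema's property definitions as shown below. This fragment is a sketch of the general maxItems/minItems shape, not a verbatim excerpt from docs/swagger.json:

```json
{
  "devices": {
    "type": "array",
    "items": { "type": "string" },
    "maxItems": 64
  },
  "progress": {
    "type": "array",
    "items": { "type": "integer" },
    "minItems": 2,
    "maxItems": 2
  }
}
```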

Database Foreign Key Cascade/Nullify Testing#

  • Prefer DB-level on_delete: :cascade / :nullify for ephemeral and parent-child tables
  • Test DB-level FK cascades using delete, not destroy (to avoid masking by Rails callbacks)
  • Always specify column: when a table has multiple FKs to the same parent
  • Example:
    it "cascade-deletes associated DeviceStatus when HashcatStatus is deleted" do
      device_status = create(:device_status)
      hashcat_status = device_status.hashcat_status
      hashcat_status.delete
      expect(DeviceStatus.exists?(device_status.id)).to be false
    end
    
  • See AGENTS.md for DB constraint strategy and additional Go agent testing patterns

Go Agent Testing#

  • Extract business logic from external process invocations (e.g., hashcat) into separate functions for independent testing
    • Implemented Example: The processBenchmarkOutput function in the lib/benchmark/ package separates business logic (parsing output, tracking submission state, incremental batch submission) from external process handling (hashcat session management). This extraction enabled comprehensive table-driven testing without requiring actual hashcat execution. The drainStdout helper function in lib/benchmark/parse.go further demonstrates separation of concerns for buffered channel handling.
  • Use Manager structs with constructor injection for testability (e.g., benchmark.Manager, task.Manager)
  • Write table-driven tests for core logic
  • Use mocks for network and OS-level interactions
  • Test Data Isolation: Prevent test mutation leakage by using helper functions that return fresh data copies rather than shared test fixtures
    • Pattern: Extract test fixtures into functions like newSampleBenchmarkResults() instead of package-level variables
    • Example: Benchmark cache tests use newSampleBenchmarkResults() to prevent Submitted flag mutations from affecting subsequent subtests
    • Each test invocation gets an independent slice, avoiding cross-contamination between test cases
  • Context Propagation: Context propagation throughout the error handling chain is complete. All error handling and reporting functions now require a context.Context as the first parameter to enable proper cancellation and timeout handling:
    • cserrors.SendAgentError(ctx, client, errorType, message, severity)
    • apierrors.LogAndSendError(ctx, errorSender, err, message, severity)
    • task.HandleError(ctx, err, taskID, attemptNum, errorSender)
    • ~27 of 30 //nolint:contextcheck directives have been removed (only 3 remain for NewHashcatSession)
    • context.Background() pattern: For "must-complete" operations that should execute even after context cancellation (cleanup, shutdown notifications, task abandonment), code now explicitly passes context.Background(). Examples:
      • taskMgr.AbandonTask(context.Background(), t) in cleanup handlers
      • cserrors.SendAgentShutdown(context.Background(), ...) for shutdown notifications
      • cserrors.SendAgentError(context.Background(), ...) in timeout/kill failure paths
    • Signal handling: The agent now uses signal.NotifyContext + context.WithCancel instead of signal.Notify + channel patterns. Heartbeat state errors now call cancel() instead of sending on a signal channel, providing unified shutdown semantics across the codebase
  • Agent Configuration and Benchmarking: The agent startup flow includes conditional benchmark execution based on the benchmarks_needed flag from the server
    • Test scenarios should cover both benchmarks_needed=true (agent must run benchmarks) and benchmarks_needed=false (server has valid benchmarks)
    • Verify the mapConfiguration function properly maps the benchmarks_needed field from API responses
    • Test both startup and reload paths to ensure the flag is respected in both scenarios
    • Verify that when benchmarks_needed=false, the agent logs the skip and sets BenchmarksSubmitted=true without running benchmarks
  • Test Cleanup Patterns: Use t.Cleanup(fn) instead of defer fn() for proper state restoration in table-driven tests. t.Cleanup runs after all subtests complete, preventing state leakage between test cases. All benchmark manager tests have been standardized to use t.Cleanup for HTTP mock and state restoration.
  • See AGENTS.md and GOTCHAS.md for comprehensive Go development patterns, including:
    • Performance optimizations (regex compilation, cached conversions, syscall reduction, cross-platform path handling with filepath.Join)
    • Error handling best practices (context propagation, file.Close() error logging)
    • Linter edge cases and development pitfalls (see GOTCHAS.md for comprehensive reference, including hashcat.NewTestSession limitations and golangci-lint auto-fix behavior)
    • Dependency considerations (go-getter, gopsutil, govulncheck)
    • Testing best practices (prefer t.Cleanup(fn) over defer fn(), context cancellation patterns)
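The fresh-copy fixture pattern from the Test Data Isolation bullet above can be sketched as follows. The type and field names here are illustrative stand-ins, not the agent's actual display.BenchmarkResult:

```go
package main

import "fmt"

// BenchmarkResult is an illustrative stand-in for the benchmark result type
// discussed above; only the Submitted flag matters for this pattern.
type BenchmarkResult struct {
	HashType  int
	Submitted bool
}

// newSampleBenchmarkResults returns a fresh slice on every call, so a subtest
// that mutates Submitted cannot contaminate the fixtures of the next subtest.
func newSampleBenchmarkResults() []BenchmarkResult {
	return []BenchmarkResult{
		{HashType: 0},
		{HashType: 100},
	}
}

func main() {
	first := newSampleBenchmarkResults()
	first[0].Submitted = true // simulate one subtest mutating its fixture

	second := newSampleBenchmarkResults()
	fmt.Println(second[0].Submitted) // fresh copy is unaffected: false
}
```

Contrast this with a package-level `var sampleResults = []BenchmarkResult{...}`, where the mutation in the first subtest would silently persist into the second.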

Testing Stateful Operations with External Dependencies#

When testing operations that interact with external systems (APIs, processes) and maintain internal state:

  • Track per-item state to enable partial completion and retry

    • Use boolean flags (e.g., Submitted field on display.BenchmarkResult) to track which items have been processed
    • Test helper functions independently:
      • unsubmittedResults — filters items by state flag
      • allSubmitted — checks if all items are complete
      • markSubmitted — updates state flags after successful operations
  • Test incremental batch operations by simulating multiple submission attempts

    • Verify normal batching (e.g., benchmarkBatchSize = 10)
    • Test final batch submission with fewer items
    • Simulate partial failures (first batch fails, retry succeeds)
    • Verify cache persistence with state flags for crash recovery
    • Test channel draining for buffered output (e.g., drainStdout for hashcat output)
  • Implemented Example: The processBenchmarkOutput implementation in lib/benchmark/manager.go demonstrates this pattern with comprehensive test scenarios covering normal operation, context cancellation, and edge cases:

    • Normal operation tests (6 scenarios):
      • TestProcessBenchmarkOutput_AllBatchesSucceed — verifies multiple batches (10+5 items) submit successfully and all results marked as Submitted
      • TestProcessBenchmarkOutput_SingleBatch — confirms single batch submission (< 10 items) and proper state tracking
      • TestProcessBenchmarkOutput_BatchFailsFinalSucceeds — validates retry logic when first batch fails but final submission succeeds
      • TestProcessBenchmarkOutput_AllSendsFail — ensures no results marked submitted when all API calls fail, cache preserved for retry
      • TestProcessBenchmarkOutput_EmptyResults — handles session completion with no output
      • TestProcessBenchmarkOutput_SessionError — verifies partial results cached even when hashcat session errors
    • Context cancellation tests (3 scenarios):
      • TestProcessBenchmarkOutput_ContextCancelledWithResults — verifies partial results are cached to disk when context is cancelled mid-run
      • TestProcessBenchmarkOutput_ContextCancelledNoResults — confirms no cache is written when cancelled before any results are collected
      • TestProcessBenchmarkOutput_ContextCancelledCacheFails — ensures results are still returned in memory even when cache save fails during cancellation
    • Edge case tests (6 scenarios):
      • TestDrainStdout_BufferedLines — verifies drainStdout captures lines already buffered in channel
      • TestDrainStdout_EmptyChannel — confirms drainStdout returns immediately when channel is empty
      • TestSession_DoubleCleanup — verifies cleanup idempotency
      • TestSubmitBatchIfReady_ExactBoundary — verifies batch submission triggers at exactly benchmarkBatchSize results
      • TestSubmitBatchIfReady_BelowBoundary — confirms no submission occurs below batch size threshold
      • TestProcessBenchmarkOutput_FinalizeCacheFails — verifies results are returned and submission state is correct even when cache save fails during normal finalization

    These tests validate per-item submission tracking via the Submitted field, incremental batch submission with configurable batch size, cache persistence after each successful batch for crash recovery, and partial result preservation during context cancellation.
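The per-item state helpers listed earlier (unsubmittedResults, allSubmitted, markSubmitted) follow a simple shape. Below is a hedged sketch with an illustrative result type; the names and fields are assumptions, not the real lib/benchmark code:

```go
package main

import "fmt"

// result is an illustrative stand-in for a per-item record with a state flag.
type result struct {
	Speed     int64
	Submitted bool
}

// unsubmitted filters items by state flag.
func unsubmitted(rs []result) []result {
	var out []result
	for _, r := range rs {
		if !r.Submitted {
			out = append(out, r)
		}
	}
	return out
}

// allSubmitted reports whether every item has been processed.
func allSubmitted(rs []result) bool {
	for _, r := range rs {
		if !r.Submitted {
			return false
		}
	}
	return true
}

// markSubmitted flips the state flag after a successful operation.
func markSubmitted(rs []result) {
	for i := range rs {
		rs[i].Submitted = true
	}
}

func main() {
	rs := []result{{Speed: 1}, {Speed: 2, Submitted: true}}
	fmt.Println(len(unsubmitted(rs)), allSubmitted(rs))
	markSubmitted(rs)
	fmt.Println(allSubmitted(rs))
}
```

Because each helper is a pure function over the slice, each can be covered by small table-driven tests without touching the API or process layers.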

Testing Partial Result Preservation During Cancellation#

Long-running operations that collect results over time (benchmark runs, task processing) should preserve partial results when the context is cancelled, enabling recovery on the next agent startup without re-running the entire operation. Test both normal completion and cancellation paths.

Pattern:

  • Buffer results in memory during processing
  • On ctx.Done(), flush any remaining buffered data
  • Cache partial results to disk using atomic writes (temp file + os.Rename)
  • Reset submission flags so the operation retries on next startup
  • Load cached results on startup and submit them before starting new work

Example: Benchmark Context Cancellation Tests

// TestProcessBenchmarkOutput_ContextCancelledWithResults verifies partial
// results are cached when context is cancelled mid-run
func TestProcessBenchmarkOutput_ContextCancelledWithResults(t *testing.T) {
  cleanupHTTP := testhelpers.SetupHTTPMock()
  t.Cleanup(cleanupHTTP)

  cleanupState := testhelpers.SetupTestState(789, "https://test.api", "test-token")
  t.Cleanup(cleanupState)

  sess, err := testhelpers.NewMockSession("bench-cancel")
  require.NoError(t, err)

  ctx, cancel := context.WithCancel(context.Background())

  // Buffer lines before cancellation
  lines := makeBenchmarkLines(3, 1)
  for _, line := range lines {
    sess.StdoutLines <- line
  }

  cancel() // Trigger cancellation

  mgr := NewManager(agentstate.State.APIClient.Agents())
  results := mgr.processBenchmarkOutput(ctx, sess)

  // Verify partial results captured
  assert.Len(t, results, 3)

  // Verify cached to disk for retry
  cached, loadErr := loadBenchmarkCache()
  require.NoError(t, loadErr)
  require.NotNil(t, cached)
  assert.Len(t, cached, 3)

  // Verify submission flag reset for retry
  assert.False(t, agentstate.State.GetBenchmarksSubmitted())
}

Test Coverage Requirements:

  • Cancellation with results collected — verify cache persistence
  • Cancellation with no results — verify no cache written
  • Cancellation when cache save fails — verify results still returned in memory
  • Cache load on next startup — verify cached results submitted before new work
  • drainStdout edge cases — empty channel, partial reads, buffered data

See lib/benchmark/manager_test.go for comprehensive cancellation test coverage.
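The channel-draining behavior that drainStdout is tested for can be sketched as a non-blocking select loop. drainChannel below is a hypothetical stand-in, not the real lib/benchmark/parse.go helper:

```go
package main

import "fmt"

// drainChannel collects whatever lines are already buffered in the channel
// without blocking for new input; the default case fires as soon as the
// channel is empty.
func drainChannel(ch chan string) []string {
	var out []string
	for {
		select {
		case line := <-ch:
			out = append(out, line)
		default: // channel empty: return what we have
			return out
		}
	}
}

func main() {
	ch := make(chan string, 4)
	ch <- "Speed.#1.........: 1000 H/s"
	ch <- "Speed.#2.........: 2000 H/s"
	fmt.Println(len(drainChannel(ch))) // both buffered lines captured
	fmt.Println(len(drainChannel(ch))) // second drain finds an empty channel
}
```

Note this only captures lines already buffered at the time of the call; it is suitable for flushing leftovers after cancellation, not for racing a live producer.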

Backward Compatibility Testing#

When adding fields to persisted data structures:

  • Test deserialization of old cache formats without the new field
  • Verify that missing fields default to appropriate zero values (e.g., Submitted defaults to false)
  • Example: TestLoadBenchmarkCache_BackwardCompatible verifies old benchmark cache files without Submitted field load correctly
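With encoding/json, this kind of backward compatibility falls out of Go's zero-value semantics: keys absent from the payload leave struct fields at their zero values. A minimal sketch (struct and field names are illustrative, not the agent's actual cache format):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cachedResult mirrors a persisted cache entry; the Submitted field was
// added later, so old cache files omit the "submitted" key entirely.
type cachedResult struct {
	HashType  int   `json:"hash_type"`
	Speed     int64 `json:"speed"`
	Submitted bool  `json:"submitted"`
}

// loadCache deserializes a cache payload; entries missing the newer
// "submitted" key simply get the zero value (false).
func loadCache(data []byte) ([]cachedResult, error) {
	var results []cachedResult
	err := json.Unmarshal(data, &results)
	return results, err
}

func main() {
	oldFormat := []byte(`[{"hash_type":0,"speed":1000}]`)
	results, err := loadCache(oldFormat)
	if err != nil {
		panic(err)
	}
	fmt.Println(results[0].Submitted) // false: zero value for the missing field
}
```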

Testing Cleanup Operations#

Resource cleanup operations (files, sessions, temporary data) must be tested to prevent accumulation that can lead to disk exhaustion or resource leaks. Cleanup should be tested at both the session level and task level to ensure resources are properly freed across all completion and failure paths.

Session-Level Cleanup Testing:

Test that session cleanup methods remove all associated files. Session.Cleanup() removes output files, charset files, hash files, restore files (.restore), and hashcat session files (.log and .pid). The restore file cleanup was added in PR #135 to address issue #22 where restore files were accumulating indefinitely. The session file cleanup was added in PR #160 to clean up hashcat-created .log and .pid files that are generated when using the --session flag.

  • Test that Session.Cleanup() removes restore files when present
  • Test that cleanup removes .log and .pid files created by hashcat for the session
  • Test that cleanup is idempotent (handles missing files gracefully without errors)
  • Test that cleanup handles empty or uninitialized file paths (including empty session names)
  • Example from lib/hashcat/session_test.go:
func TestCleanup_RemovesSessionLogAndPidFiles(t *testing.T) {
  setupSessionTestState(t)
  tempDir := t.TempDir()

  sessionName := "attack-42"
  logFile := filepath.Join(tempDir, sessionName+".log")
  pidFile := filepath.Join(tempDir, sessionName+".pid")

  require.NoError(t, os.WriteFile(logFile, []byte("log data"), 0o600))
  require.NoError(t, os.WriteFile(pidFile, []byte("12345"), 0o600))

  t.Chdir(tempDir)

  sess := &Session{
    sessionName: sessionName,
  }

  sess.Cleanup()

  _, err := os.Stat(logFile)
  require.True(t, os.IsNotExist(err), ".log file should be removed after Cleanup")
  _, err = os.Stat(pidFile)
  require.True(t, os.IsNotExist(err), ".pid file should be removed after Cleanup")
}

func TestCleanup_IdempotentSessionFiles(t *testing.T) {
  setupSessionTestState(t)

  sess := &Session{
    sessionName: "attack-nonexistent",
  }

  // Should not panic or error
  sess.Cleanup()
  require.Empty(t, sess.sessionName, "sessionName should be cleared even when files don't exist")
}

Task-Level Cleanup Testing:

Test the task.CleanupTaskFiles(attackID int64) helper function that removes files by attack ID. This function was added in PR #135 to handle cleanup for pre-session failure paths (download failures, session creation failures) where Session.Cleanup() is not available. It removes hash files and restore files but intentionally does NOT clean resource files (word lists, rule lists, mask lists) because they are shared across attacks and may be reused via checksum-based caching.

  • Test that CleanupTaskFiles() removes both hash files and restore files for a given attack ID
  • Test cleanup handles nonexistent files without panicking
  • Test that both files are removed in a single call
  • Example from lib/task/cleanup_test.go:
func TestCleanupTaskFiles_RemovesBothFiles(t *testing.T) {
  cleanupState := testhelpers.SetupMinimalTestState(1)
  defer cleanupState()

  var attackID int64 = 99
  hashFile := filepath.Join(agentstate.State.HashlistPath, "99.hsh")
  restoreFile := filepath.Join(agentstate.State.RestoreFilePath, "99.restore")

  require.NoError(t, os.MkdirAll(filepath.Dir(hashFile), 0o750))
  require.NoError(t, os.MkdirAll(filepath.Dir(restoreFile), 0o750))
  require.NoError(t, os.WriteFile(hashFile, []byte("hashes"), 0o600))
  require.NoError(t, os.WriteFile(restoreFile, []byte("data"), 0o600))

  CleanupTaskFiles(attackID)

  _, hashErr := os.Stat(hashFile)
  assert.True(t, os.IsNotExist(hashErr), "hash file should be removed")

  _, restoreErr := os.Stat(restoreFile)
  assert.True(t, os.IsNotExist(restoreErr), "restore file should be removed")
}

Cleanup Integration Patterns:

  • sess.Cleanup() is called in error paths in runAttackTask (added in PR #135):
    • When sess.Start() fails
    • When task times out
    • When done channel processing encounters errors
  • Use CleanupTaskFiles helper for pre-session failure paths:
    • DownloadFiles() failures in processTask() (added in PR #135)
    • NewHashcatSession() failures in RunTask() (added in PR #135)
    • AcceptTask() failures in processTask() (added in PR #135)
  • All cleanup paths should be tested to ensure files don't accumulate indefinitely
  • Integration test example from lib/task/runner_test.go:
func TestHandleDoneChan_CleansRestoreFile(t *testing.T) {
  sess, err := testhelpers.NewMockSession("test-session")
  require.NoError(t, err)

  // Create a real restore file on disk
  restoreFile := filepath.Join(t.TempDir(), "test.restore")
  require.NoError(t, os.WriteFile(restoreFile, []byte("data"), 0o600))
  sess.RestoreFilePath = restoreFile

  mgr := newTestManager()
  mgr.handleDoneChan(nil, task, sess)

  _, statErr := os.Stat(restoreFile)
  assert.True(t, os.IsNotExist(statErr), "restore file should be removed after handleDoneChan")
}

Agent Startup Cleanup Testing:

Test cleanup operations that run at agent startup to remove orphaned files from previous runs. The CleanupOrphanedSessionFiles function demonstrates this pattern with 8 table-driven tests covering multiple scenarios:

  • File removal by pattern: Test that only files matching the specific pattern (e.g., attack-*.log, attack-*.pid) are removed
  • Pattern specificity: Verify non-matching files are preserved (e.g., benchmark.log, hashcat.log)
  • File type filtering: Test that symlinks and directories matching the pattern are skipped (only regular files removed)
  • Edge case handling: Test behavior with empty directories, missing directories, and mixed file sets
  • Platform-specific behavior: Verify cleanup is skipped on platforms where it's unsafe (e.g., Windows where session dir = binary dir)
  • Example from lib/hashcat/session_dir_cleanup_test.go:
func TestCleanupOrphanedInDir_SkipsSymlinks(t *testing.T) {
  dir := t.TempDir()

  // Create a real file and a symlink matching the pattern
  targetFile := filepath.Join(dir, "target.txt")
  require.NoError(t, os.WriteFile(targetFile, []byte("important"), 0o600))
  require.NoError(t, os.Symlink(targetFile, filepath.Join(dir, "attack-evil.log")))

  // Also create a regular attack file that should be removed
  require.NoError(t, os.WriteFile(filepath.Join(dir, "attack-1.log"), []byte("log"), 0o600))

  cleanupOrphanedInDir(dir)

  // Symlink should still exist
  _, err := os.Lstat(filepath.Join(dir, "attack-evil.log"))
  require.NoError(t, err, "symlink should not be removed")

  // Target file should still exist
  _, err = os.Stat(targetFile)
  require.NoError(t, err, "target file should not be affected")

  // Regular attack file should be removed
  _, err = os.Stat(filepath.Join(dir, "attack-1.log"))
  require.True(t, os.IsNotExist(err), "regular attack file should be removed")
}

func TestCleanupOrphanedInDir_MixedFiles(t *testing.T) {
  dir := t.TempDir()

  // Create a mix of files
  attackFiles := []string{"attack-1.log", "attack-2.pid", "attack-99.log"}
  keepFiles := []string{"benchmark.log", "hashcat.pid", "attack-1.restore", "notes.txt"}

  for _, f := range attackFiles {
    require.NoError(t, os.WriteFile(filepath.Join(dir, f), []byte("data"), 0o600))
  }
  for _, f := range keepFiles {
    require.NoError(t, os.WriteFile(filepath.Join(dir, f), []byte("data"), 0o600))
  }

  cleanupOrphanedInDir(dir)

  entries, err := os.ReadDir(dir)
  require.NoError(t, err)
  require.Len(t, entries, len(keepFiles), "only non-attack files should remain")

  names := make([]string, 0, len(entries))
  for _, e := range entries {
    names = append(names, e.Name())
  }
  for _, f := range keepFiles {
    require.Contains(t, names, f, "kept file %s should still exist", f)
  }
}

Why Test Startup Cleanup:

  • Safety verification: Confirm only intended files are removed (prevent accidental deletion)
  • Security hardening: Verify symlink protection prevents following malicious links
  • Graceful degradation: Ensure cleanup failures don't prevent agent startup
  • Pattern correctness: Validate file matching logic catches all orphans without false positives

Best Practices:

  • Use testhelpers.SetupMinimalTestState() to create isolated test environments
  • Use os.IsNotExist() to verify file removal
  • Use t.TempDir() for session-level cleanup tests
  • Use table-driven tests to cover multiple file type scenarios in a single test function
  • Test both positive cases (files that should be removed) and negative cases (files that should be preserved)
  • Use entry.Type().IsRegular() checks in production code to skip symlinks and directories
  • Cleanup functions should log errors but not fail (use assert, not require for cleanup verification)
  • Pass context.Context to error handling functions in cleanup paths (use context.Background() for operations that must complete even during shutdown)
  • Test all completion paths: normal completion, timeout, cancellation, and error scenarios

Comprehensive Cleanup Testing Example:

For a complete example of testing cleanup operations including absolute path handling, DirEntry.Type() fallbacks, and cross-platform considerations, see docs/solutions/logic-errors/hashcat-session-file-cleanup-wrong-directory.md. This document demonstrates:

  • Testing cleanup with absolute paths vs relative paths (critical for hashcat session files)
  • Handling DirEntry.Type() unknown values with fallback to entry.Info()
  • Skipping symlink tests on Windows (os.Symlink requires elevated privileges)
  • Using errcheck to catch discarded cleanup errors
  • Prevention strategies for os.IsNotExist masking wrong-path bugs

Testing Conditional Side Effects Based on Error Type#

When a single error path has different side effects based on error type, tests should verify both the conditional logic and the distinct behaviors.

Pattern: Conditional Side Effects in Error Paths

// Test that 404 during acceptance skips AbandonTask
t.Run("accept 404 skips abandon", func(t *testing.T) {
    httpmock.RegisterResponder("POST", 
        "/api/v1/client/tasks/123/accept_task",
        httpmock.NewJsonResponderOrPanic(404, nil))

    err := processTask(ctx, task)

    // Assert error returned
    require.Error(t, err)
    require.ErrorIs(t, err, task.ErrTaskAcceptNotFound)

    // Assert AbandonTask was NOT called
    info := httpmock.GetCallCountInfo()
    assert.Equal(t, 0, info["POST /api/v1/client/tasks/123/abandon"])

    // Assert local cleanup still happened
    assert.NoFileExists(t, taskFilePath)
})

// Test that other errors still trigger abandon
t.Run("accept 5xx triggers abandon", func(t *testing.T) {
    httpmock.RegisterResponder("POST", 
        "/api/v1/client/tasks/123/accept_task",
        httpmock.NewJsonResponderOrPanic(503, nil))

    httpmock.RegisterResponder("POST",
        "/api/v1/client/tasks/123/abandon",
        httpmock.NewJsonResponderOrPanic(204, nil))

    err := processTask(ctx, task)

    require.Error(t, err)
    require.ErrorIs(t, err, task.ErrTaskAcceptFailed)

    // Assert AbandonTask WAS called
    info := httpmock.GetCallCountInfo()
    assert.Equal(t, 1, info["POST /api/v1/client/tasks/123/abandon"])
})

Key Testing Principles:

  1. Test both branches of the conditional explicitly
  2. Assert on the presence/absence of side effects (API calls, file operations)
  3. Use httpmock.GetCallCountInfo() to verify which endpoints were/weren't called
  4. Verify error types returned (ErrorIs) match expected sentinel errors
  5. Ensure both paths still perform shared cleanup (CleanupTaskFiles)

Example from task acceptance failure handling:

The processTask function conditionally calls AbandonTask based on error type:

  • ErrTaskAcceptNotFound (404) — Task vanished, skip AbandonTask (no server state transition needed)
  • Other accept errors — Call AbandonTask to notify server and prevent task starvation

Must-complete operations like AbandonTask(context.Background(), t) should be tested for conditional invocation, not just for context isolation.

See lib/agent/agent_test.go (TestProcessTask_AcceptFailure) for the complete test suite.

JavaScript Testing (Vitest)#

  • Install dependencies:
    bun add -d vitest jsdom @testing-library/dom @hotwired/stimulus
    
  • Configure vitest.config.js:
    import { defineConfig } from 'vitest/config';
    export default defineConfig({
      test: {
        environment: 'jsdom',
        setupFiles: ['./spec/javascript/setup.js']
      }
    });
    
  • Place tests in spec/javascript/controllers/
  • Use describe, it, and expect from Vitest
  • Simulate DOM events and verify controller behavior (e.g., tab switching, ARIA attributes)
  • Use @testing-library/dom for DOM queries

Tested Stimulus Controllers#

  • tabs_controller — Tab switching and ARIA attributes (see spec/javascript/controllers/tabs_controller.test.js)
  • select_controller — Tom Select integration for searchable dropdowns (see spec/javascript/controllers/select_controller.test.js)
    • Wraps the Tom Select library for enhanced dropdown functionality
    • Features:
      • Configurable empty options via data-select-allow-empty-value (default: false)
      • Configurable max options via data-select-max-options-value (default: 100)
      • Uses dropdown_input plugin for search functionality
      • Properly destroys instances on disconnect
      • Prevents re-initialization on Turbo morphing or reconnection
      • Gracefully handles initialization failures with retry prevention
    • Test coverage includes 11 tests:
      • Initialization and connection
      • Default options passing
      • Duplicate connect prevention
      • Disconnect and cleanup
      • Custom configuration values (allowEmpty, maxOptions)
      • Initialization failure handling
      • Retry prevention after failure
      • State reset on disconnect
    • Used for hash type dropdown to display hashcat mode ID alongside name (e.g., "0 - MD5") with searchable filtering
    • Example test pattern for external library integration:
      import { describe, it, expect, beforeEach, vi } from "vitest";
      import TomSelect from "tom-select";
      
      vi.mock("tom-select", () => ({
        default: vi.fn().mockImplementation(function (element, options) {
          this.element = element;
          this.options = options;
          this.destroy = vi.fn();
          return this;
        })
      }));
      
      it("passes default options to TomSelect", () => {
        expect(TomSelect).toHaveBeenCalledWith(
          getSelectElement(),
          expect.objectContaining({
            allowEmptyOption: false,
            plugins: ['dropdown_input'],
            maxOptions: 100
          })
        );
      });
      

Testing Atomic Locks and Job Idempotency#

Background jobs that process resources triggered by after_commit callbacks may face race conditions when the callback fires multiple times (e.g., record save + attachment commit). Use atomic locking patterns to prevent duplicate processing.

Pattern: Atomic Lock with UPDATE ... WHERE#

Use an atomic UPDATE ... WHERE query to claim work before processing:

# Acquire lock before processing
rows_claimed = HashList.where(id: id, processed: false)
                       .update_all(processed: true)
return if rows_claimed.zero?

# Process the work here
ingest_hash_items(list)

Key Considerations:

  • Rollback on failure: Reset the flag in a rescue block so the job can retry
  • Cleanup partial state: Delete any leftover data from failed attempts before re-ingesting
  • Validate completion: Raise an error if no work was actually performed
  • Race detection: Test scenarios where another job claims the lock first
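
The claim-then-rollback flow described above can be sketched in plain Ruby. This is a minimal in-memory stand-in, not the real job: `FakeHashListStore`, `claim!`, and `release!` are hypothetical substitutes for the `UPDATE ... WHERE processed = false` claim and the rescue-based flag reset.

```ruby
require "monitor"

# Hypothetical in-memory stand-in for the hash_lists table; the real code
# claims work with an atomic UPDATE ... WHERE processed = false.
class FakeHashListStore
  include MonitorMixin

  def initialize
    super()
    @processed = {}
  end

  # Atomically flip processed false -> true; returns true only if we won the claim.
  def claim!(id)
    synchronize do
      return false if @processed[id]
      @processed[id] = true
    end
  end

  # Rollback path: reset the flag so a retry can claim the work again.
  def release!(id)
    synchronize { @processed[id] = false }
  end
end

def process_hash_list(store, id)
  return :skipped unless store.claim!(id) # another job already claimed it

  begin
    yield # the actual ingestion work
    :done
  rescue StandardError
    store.release!(id) # reset the flag so the job can retry
    raise
  end
end

store = FakeHashListStore.new

# First run claims the lock and succeeds; a second run is a no-op.
process_hash_list(store, 1) { }
puts process_hash_list(store, 1) { } # prints "skipped"

# A failing run rolls the flag back, so a retry can claim the work again.
begin
  process_hash_list(store, 2) { raise "ingestion failed" }
rescue RuntimeError
end
puts process_hash_list(store, 2) { } # prints "done"
```

The same shape applies to the real job: the test for "race condition handling" is just the second call returning early, and the test for "rollback on error" is the rescue path resetting the flag.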

Example: ProcessHashListJob#

The ProcessHashListJob demonstrates this pattern with 8 test contexts covering:

  1. Duplicate prevention: Job returns early when called twice (lock already claimed)
  2. Rollback on error: processed flag resets to false when ingestion fails
  3. Partial failure recovery: Cleans up leftover items from prior failed attempts before re-ingesting
  4. Race condition handling: Returns early when another job atomically claims the lock first
  5. Record deletion during processing: Raises RecordNotSaved when the record disappears mid-job
  6. Rollback failure logging: Logs rollback errors but re-raises the original exception
  7. Empty file handling: Raises an error when no items are processed
  8. Idempotent retry: Re-running after failure produces the same final count without duplicates

See spec/jobs/process_hash_list_job_spec.rb for the complete test suite.

Alternative Approaches Considered:

  • Advisory locks (pg_try_advisory_lock) — adds complexity, requires explicit release
  • Separate processing_state column — cleaner semantics but requires migration
  • Redis lock — adds external dependency for a DB-level concern

Decision: Atomic update keeps overhead low (one extra UPDATE) without long-lived locks.

Testing Model Concerns#

Model concerns should have dedicated test files organized under spec/models/concerns/:

  • Use let blocks to set up test data shared across contexts
  • Group tests by method or feature using describe blocks
  • Test all states and edge cases with context blocks
  • Example: Agent::Benchmarking concern (24 total examples, 5 for benchmarking? method)
describe "#benchmarking?" do
  let(:pending_agent) { create(:agent, state: "pending") }

  context "when agent is pending, recently seen, and has no benchmarks" do
    before { pending_agent.update!(last_seen_at: 30.seconds.ago) }

    it "returns true" do
      expect(pending_agent.benchmarking?).to be true
    end
  end

  context "when agent is active" do
    it "returns false" do
      agent.update!(last_seen_at: 30.seconds.ago)
      expect(agent).to be_active
      expect(agent.benchmarking?).to be false
    end
  end
end

Updated Behavior with Upsert-Based Ingestion:

  • The last_benchmarks method now returns all benchmarks without date filtering, since the unique index (agent_id, hash_type, device) ensures only one row exists per agent/hash_type/device combination
  • Benchmark retrieval uses .exists? instead of .empty? to avoid loading records unnecessarily

See spec/models/concerns/agent/benchmarking_spec.rb for the complete concern test suite.

System Test Helpers for JavaScript Components#

When testing JavaScript-enhanced UI components in system tests, add helper methods to spec/support/page_objects/base_page.rb to encapsulate interaction patterns.

Tus Upload Setup#

System tests involving tus file uploads require the tusd container to be running. Add the TusdHelper.ensure_tusd_running call to the test setup:

before(:all) { TusdHelper.ensure_tusd_running }

How it works:

  • TusdHelper.ensure_tusd_running starts a tusd Docker container via testcontainers
  • The container is configured with a shared volume mount so both tusd (inside Docker) and the Rails process (on the host) can access uploaded files at the same path
  • The helper sets TUS_ENDPOINT_URL and TUS_UPLOADS_DIR environment variables for the test environment
  • The container uses a random mapped port to avoid conflicts with other test runs
  • The container is shared across all tests and automatically cleaned up when the test suite exits

Tom Select Helper#

The tom_select_fill_and_choose helper interacts with Tom Select dropdowns using the dropdown_input plugin:

# Interact with a Tom Select dropdown by clicking, typing, and selecting
def tom_select_fill_and_choose(select_id, text)
  control = find("##{select_id}-ts-control", visible: true)
  control.click
  dropdown = find("##{select_id}-ts-dropdown", visible: true)
  input = dropdown.find("input.dropdown-input", visible: true)
  input.set(text)
  dropdown.find(".option", text: text, match: :prefer_exact, visible: true).click
  self
end

Usage in page objects:

def select_hash_type(name)
  tom_select_fill_and_choose("hash_list_hash_type_id", name)
  self
end

Why encapsulate in page objects:

  • Isolates fragile selectors and interaction sequences
  • Provides semantic methods for test scenarios
  • Makes tests readable and maintainable

See spec/support/page_objects/hash_list_form_page.rb and spec/system/hash_lists/create_hash_list_spec.rb for usage examples.

Testing upsert_all Operations#

When using upsert_all for idempotent bulk writes, test these key behaviors:

  • Idempotent upsert behavior: Submitting the same data multiple times should result in updates (not duplicates)
  • Unique index enforcement: Verify the unique index constraint is properly used as a conflict target
  • Pre-validation filtering: Invalid entries should be filtered before upsert (validation before bulk operations)
  • Timestamp management: Verify that updated_at is managed correctly in upsert operations

Example: Benchmark Submission Endpoint#

The benchmark submission endpoint demonstrates this pattern with the (agent_id, hash_type, device) unique index:

# Filter and validate entries before upsert
valid_records = build_valid_benchmark_records(params[:hashcat_benchmarks])

if valid_records.any?
  # upsert_all with unique index as conflict target
  HashcatBenchmark.upsert_all(
    valid_records,
    unique_by: %i[agent_id hash_type device],
    update_only: %i[hash_speed runtime benchmark_date]
  )
end

Validation Rules:

  • Positive speeds (hash_speed > 0)
  • Non-negative hash types and devices (>= 0)
  • Positive runtime (> 0)
  • Invalid entries are logged and skipped
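
The validation rules above can be sketched as a plain-Ruby filtering step. The `valid_benchmark?` helper and the entry hashes are hypothetical; the real endpoint builds records from request params before the upsert.

```ruby
# Hypothetical pre-validation filter mirroring the rules above:
# positive speed, non-negative hash_type/device, positive runtime.
def valid_benchmark?(entry)
  entry[:hash_speed].to_f > 0 &&
    entry[:hash_type].to_i >= 0 &&
    entry[:device].to_i >= 0 &&
    entry[:runtime].to_i > 0
end

entries = [
  { hash_type: 1000, device: 1, hash_speed: "1000", runtime: 1000 }, # valid
  { hash_type: 2000, device: 1, hash_speed: "0",    runtime: 500 },  # invalid speed
  { hash_type: 3000, device: 1, hash_speed: "5000", runtime: 0 }     # invalid runtime
]

valid, invalid = entries.partition { |e| valid_benchmark?(e) }
invalid.each { |e| warn "skipping invalid benchmark: #{e.inspect}" } # logged and skipped

puts valid.length # prints 1
```

Filtering before the bulk write keeps `upsert_all` from ever seeing rows that would violate the validation rules, since bulk operations bypass ActiveRecord validations.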

Why This Index Works:

  • The unique key (agent_id, hash_type, device) represents the natural key: one benchmark row per agent per hash-type per device
  • The old index (agent_id, benchmark_date, hash_type) included a mutable timestamp, so every submission with a new timestamp bypassed deduplication
  • The new index enables upsert_all to reliably update existing benchmarks instead of creating duplicates

Test Coverage:

# Test idempotent updates (submitting same data twice)
it "updates existing benchmark when submitted again" do
  create(:hashcat_benchmark, agent: agent, hash_type: 1000, device: 1)

  post submit_benchmark_path, params: {
    hashcat_benchmarks: [
      { hash_type: 1000, device: 1, hash_speed: "2000", runtime: 500 }
    ]
  }

  expect(agent.hashcat_benchmarks.count).to eq(1)
  expect(agent.hashcat_benchmarks.first.hash_speed).to eq(2000.0)
end

# Test invalid entry filtering
it "skips invalid entries but processes valid ones" do
  post submit_benchmark_path, params: {
    hashcat_benchmarks: [
      { hash_type: 1000, device: 1, hash_speed: "1000", runtime: 1000 }, # valid
      { hash_type: 2000, device: 1, hash_speed: "0", runtime: 500 }, # invalid speed
      { hash_type: 3000, device: 1, hash_speed: "5000", runtime: 0 } # invalid runtime
    ]
  }

  expect(response).to have_http_status(:no_content)
  expect(agent.hashcat_benchmarks.reload.count).to eq(1)
end

# Test multi-batch submission preserves all rows
it "preserves existing benchmarks when submitting new ones" do
  create(:hashcat_benchmark, agent: agent, hash_type: 1000, device: 1)

  post submit_benchmark_path, params: {
    hashcat_benchmarks: [
      { hash_type: 2000, device: 1, hash_speed: "2000", runtime: 5000 }
    ]
  }

  expect(agent.hashcat_benchmarks.reload.count).to eq(2)
end

# Test all-invalid payload handling
it "returns 422 when all entries are invalid" do
  pending_agent = create(:agent, state: "pending")

  post submit_benchmark_path, params: {
    hashcat_benchmarks: [
      { hash_type: 2000, device: 1, hash_speed: "0", runtime: 500 },
      { hash_type: 3000, device: 1, hash_speed: "5000", runtime: 0 }
    ]
  }

  expect(response).to have_http_status(:unprocessable_content)
  expect(pending_agent.reload.state).to eq("pending")
  expect(pending_agent.hashcat_benchmarks.count).to eq(0)
end

See spec/requests/api/v1/client/agents_spec.rb for the complete test suite. See GOTCHAS.md Database & ActiveRecord section for additional upsert_all gotchas (Rails 8.1+ auto-manages updated_at, bypasses AR callbacks, NOT NULL column requirements with unique_by: :id).

Testing Database Migrations with Cleanup Steps#

When migrations replace a permissive unique index with a stricter one, they need cleanup steps to remove duplicates before adding the new constraint. Test both the migration itself and verify the cleanup logic doesn't lose important data.

Pattern: DISTINCT ON Deduplication#

Use DISTINCT ON with ORDER BY to keep the "best" row when deduplicating:

# Remove duplicate rows sharing (agent_id, hash_type, device)
execute <<~SQL.squish
  DELETE FROM hashcat_benchmarks
  WHERE id NOT IN (
    SELECT DISTINCT ON (agent_id, hash_type, device) id
    FROM hashcat_benchmarks
    ORDER BY agent_id, hash_type, device, 
             benchmark_date DESC NULLS LAST, 
             id DESC
  )
SQL

Key Considerations:

  • Keep the latest data: Order by relevant timestamp columns (DESC, with NULLS LAST) to preserve the most recent record
  • Break ties deterministically: Add id DESC as final tiebreaker to ensure consistent results
  • Test data preservation: Verify cleanup keeps the intended row and removes the correct duplicates
  • Test rollback: Verify the down migration restores the old schema (though deleted data cannot be recovered)
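
The keep-the-best-row logic of the DISTINCT ON query can be mirrored in plain Ruby, which helps when reasoning about which rows a cleanup test should expect to survive. The row hashes below are hypothetical stand-ins for hashcat_benchmarks rows.

```ruby
# Each row: natural key (agent_id, hash_type, device) plus benchmark_date and id.
rows = [
  { id: 1, agent_id: 1, hash_type: 1000, device: 1, benchmark_date: "2026-02-01" },
  { id: 2, agent_id: 1, hash_type: 1000, device: 1, benchmark_date: "2026-02-20" },
  { id: 3, agent_id: 1, hash_type: 1000, device: 1, benchmark_date: nil },
  { id: 4, agent_id: 1, hash_type: 1000, device: 2, benchmark_date: "2026-01-05" }
]

# Equivalent of DISTINCT ON (...) ORDER BY ... benchmark_date DESC NULLS LAST, id DESC:
# within each natural-key group, keep the newest date (nils last), ties broken by
# highest id. ISO-8601 date strings sort chronologically, so string comparison works.
survivors = rows
  .group_by { |r| r.values_at(:agent_id, :hash_type, :device) }
  .map do |_, group|
    group.max_by { |r| [r[:benchmark_date] ? 1 : 0, r[:benchmark_date].to_s, r[:id]] }
  end

puts survivors.map { |r| r[:id] }.sort.inspect # prints [2, 4]
```

A migration test can assert the same outcome: the duplicate rows 1 and 3 are deleted, the latest row 2 survives, and row 4 (a different device) is untouched.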

Example: HashcatBenchmarks Unique Index Migration#

The migration demonstrates this pattern by replacing the old (agent_id, benchmark_date, hash_type) index with (agent_id, hash_type, device):

def up
  # Remove old index that allowed duplicates
  remove_index :hashcat_benchmarks, 
    name: "idx_on_agent_id_benchmark_date_hash_type_a667ecb9be"

  # Clean up duplicates before adding stricter index
  execute <<~SQL.squish
    DELETE FROM hashcat_benchmarks
    WHERE id NOT IN (
      SELECT DISTINCT ON (agent_id, hash_type, device) id
      FROM hashcat_benchmarks
      ORDER BY agent_id, hash_type, device, 
               benchmark_date DESC NULLS LAST, 
               id DESC
    )
  SQL

  # Add new unique index on natural key
  add_index :hashcat_benchmarks, 
    %i[agent_id hash_type device], 
    unique: true
end

Why This Pattern:

  • Old index was ineffective: Including benchmark_date (a mutable timestamp) in the unique key meant every submission with a new timestamp bypassed the constraint, so duplicate rows accumulated; the key also omitted device, so it could not distinguish benchmarks for different devices
  • Natural key: The unique key should be (agent_id, hash_type, device) — one row per agent per hash-type per device
  • Idempotent upserts: The new index enables upsert_all to use it as a conflict target for reliable duplicate prevention

Test Coverage:

Test the migration both ways:

it "removes duplicate benchmarks keeping latest" do
  agent = create(:agent)

  # Create duplicates with different dates
  old = create(:hashcat_benchmark, 
    agent: agent, hash_type: 1000, device: 1,
    benchmark_date: 1.day.ago, hash_speed: 1000)
  new = create(:hashcat_benchmark,
    agent: agent, hash_type: 1000, device: 1,
    benchmark_date: Time.zone.now, hash_speed: 2000)

  migrate(:up)

  expect(HashcatBenchmark.exists?(new.id)).to be true
  expect(HashcatBenchmark.exists?(old.id)).to be false
end

it "allows rollback to old schema" do
  migrate(:up)
  migrate(:down)

  # Verify old index is restored
  expect(ActiveRecord::Base.connection.index_exists?(
    :hashcat_benchmarks, 
    %i[agent_id benchmark_date hash_type]
  )).to be true
end

See db/migrate/20260225025422_change_hashcat_benchmarks_unique_index.rb for the complete migration. See GOTCHAS.md Database & ActiveRecord section for additional migration gotchas.

Testing Memory-Efficient Query Patterns#

When processing large datasets, test that code avoids memory-intensive ActiveRecord patterns that cause OOM on large files. Guard against regression to patterns that instantiate full AR object graphs when raw data access is sufficient.

Pattern: Verify Query Optimization Without AR Object Instantiation#

Use SQL query introspection to verify that the implementation uses memory-efficient patterns like pluck instead of includes/index_by:

it "does not instantiate HashItem AR objects for the cracked-hash lookup" do
  # Guard against regression to includes/index_by which would instantiate
  # full HashItem objects and reintroduce memory pressure on large lists.
  # The job should use pluck (returning raw arrays) not SELECT "hash_items".*
  select_star_queries = []
  callback = lambda { |_name, _start, _finish, _id, payload|
    sql = payload[:sql].to_s
    if sql.include?('"hash_items".*') || sql.match?(/SELECT\s+hash_items\.\*/i)
      select_star_queries << sql
    end
  }

  ActiveSupport::Notifications.subscribed(callback, "sql.active_record") do
    ProcessHashListJob.perform_now(hash_list.id)
  end

  expect(select_star_queries).to be_empty,
    "Expected no SELECT hash_items.* queries (would instantiate AR objects), but found:\n#{select_star_queries.join("\n")}"
end

Rationale: The original implementation used includes(:hash_list).index_by(&:hash_value) which instantiated full AR objects for every hash item, causing OOM on 100GB+ hash lists. The test verifies the optimized joins/pluck approach is maintained.

Example from ProcessHashListJob (PR #801, updated in PR #811):

The job processes large hash files in batches, checking each batch against a lookup of already-cracked hashes. The optimized implementation:

  • Uses joins(:hash_list).pluck(:hash_value_digest, :hash_value, :plain_text, :attack_id) instead of includes(:hash_list).index_by(&:hash_value)
  • Returns raw arrays (4 scalars per row: digest, hash_value, plain_text, attack_id) instead of full AR object graphs
  • Uses digest-based index lookup (where(hash_value_digest: digests)) with collision guard (.find { |hv, _, _| hv == hash_value })
  • Memory bounded by batch_size rather than total file size
  • Query introspection test prevents reintroduction of memory-intensive patterns
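
The shape of that lookup can be sketched in plain Ruby: plucked rows are plain arrays grouped by digest, and the collision guard compares full hash values before trusting a match. The tuples and the `find_cracked` helper below are hypothetical; the real rows come from `joins(:hash_list).pluck(...)`.

```ruby
require "digest"

# Hypothetical plucked rows: [hash_value_digest, hash_value, plain_text, attack_id].
# No AR objects are instantiated; each row is just four scalars.
plucked = [
  [Digest::MD5.hexdigest("aaa111"), "aaa111", "password", 7],
  [Digest::MD5.hexdigest("bbb222"), "bbb222", "123456",   9]
]

# Index by digest; a digest may map to several rows if MD5 ever collides.
by_digest = plucked.group_by(&:first)

# Lookup with collision guard: match the digest, then verify the full hash_value.
def find_cracked(by_digest, hash_value)
  candidates = by_digest[Digest::MD5.hexdigest(hash_value)] || []
  candidates.find { |_digest, hv, _pt, _aid| hv == hash_value }
end

row = find_cracked(by_digest, "aaa111")
puts row[2] # plain_text; prints "password"
puts find_cracked(by_digest, "zzz999").inspect # prints nil
```

Memory stays bounded by the batch being processed: the lookup holds small arrays of scalars rather than full ActiveRecord object graphs.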

When to Use This Pattern:

  • Background jobs processing large datasets (hash lists, logs, analytics)
  • Bulk operations where you only need a few columns for lookup/comparison
  • Any code path that could hit memory limits on production data volumes
  • Cross-resource lookups that don't need full model behavior

See spec/jobs/process_hash_list_job_spec.rb for the complete test implementation.

Testing Cross-Resource State Propagation#

When one resource's state changes affect related resources, test that updates propagate correctly across boundaries while respecting type/scope constraints.

Pattern: Test State Propagation with Cross-Resource Contexts#

Create separate test contexts that verify state changes propagate to related resources of the same type/scope, while leaving unrelated resources unchanged:

context "when a hash is cracked in one list" do
  let(:other_hash_list) { create(:hash_list, hash_mode: hash_list.hash_mode) }
  let!(:other_hash_item) { create(:hash_item, hash_list: other_hash_list, hash_value: "matching_hash") } # let! so the item exists before the job runs

  it "marks the same hash cracked in other lists of the same hash type" do
    # Create a cracked hash in the source list
    create(:hash_item, :cracked, hash_list: source_list, hash_value: "matching_hash", attack: attack)

    # Process a new list with the same hash
    ProcessHashListJob.perform_now(other_hash_list.id)

    # Verify the matching hash was marked as cracked with source metadata
    item = HashItem.find_by(hash_list: other_hash_list, hash_value: "matching_hash")
    expect(item.cracked).to be true
    expect(item.plain_text).to eq(source_item.plain_text)
    expect(item.attack_id).to eq(attack.id)
  end

  it "does not affect hashes in lists of different hash types" do
    different_type_list = create(:hash_list, hash_mode: different_mode)
    create(:hash_item, hash_list: different_type_list, hash_value: "matching_hash")

    ProcessHashListJob.perform_now(different_type_list.id)

    item = HashItem.find_by(hash_list: different_type_list, hash_value: "matching_hash")
    expect(item.cracked).to be false
  end
end

Key Testing Principles:

  1. Test the happy path: Verify state propagates to matching resources (same hash_type)
  2. Test boundary conditions: Verify state does NOT propagate across type boundaries (different hash_type)
  3. Test metadata preservation: Verify related metadata (plain_text, attack_id) is copied correctly
  4. Test non-matching items: Verify unrelated items in the target resource remain unchanged

Example from ProcessHashListJob (PR #801, updated in PR #811):

The job implements cross-resource state propagation for cracked hashes:

  • When processing a new hash list, check if any hashes were already cracked in other lists
  • Uses digest-based lookup (where(hash_value_digest: digests)) with collision guard (AND hash_value = ?) for performance
  • If a match is found with the same hash_type, mark the new item as cracked with source plain_text and attack_id
  • Hashes in lists with different hash_types are not affected (even with matching hash_value)
  • Test coverage includes 3 specs:
    • Marks matching hash cracked with source metadata
    • Leaves non-matching items uncracked
    • Does not mark hashes cracked across different hash types

When to Use This Pattern:

  • Deduplication logic that shares state across resources
  • Cache invalidation that needs to propagate to related records
  • Status synchronization between parent/child resources
  • Any domain logic where "once X is known about resource A, it applies to all matching A's"

See spec/jobs/process_hash_list_job_spec.rb for the complete test suite.

Testing Digest-Based Index Patterns#

When a model uses a digest field to work around B-tree index size limits (e.g., hash_value_digest for indexing long hash_value TEXT columns), tests should verify both the automatic digest population and the collision guard logic.

Key Testing Principles:

  1. Automatic digest population: The hash_value_digest field is auto-populated via a before_validation callback that computes Digest::MD5.hexdigest(hash_value)
  2. Factory test patterns: FactoryBot should let the callback populate the digest automatically—avoid manually setting hash_value_digest in factories
  3. Bulk insert patterns: insert_all and upsert_all bypass ActiveRecord callbacks, so bulk-insert code paths must compute hash_value_digest: Digest::MD5.hexdigest(value) inline
  4. Collision guards required: MD5 is not collision-resistant, so all digest-based lookups must verify the full hash_value matches:
    • Single-row lookups: .where(hash_value_digest: digest).find { |item| item.hash_value == hash_value }
    • Batch updates: .where(hash_value_digest: digests).where(hash_value: hash_values).update_all(...)
  5. Index usage verification: Tests should confirm queries use the composite digest indexes (hash_value_digest, hash_list_id or hash_value_digest, cracked)

Example: Testing Digest Computation in Bulk Operations#

it "computes hash_value_digest inline for insert_all" do
  hash_items = []
  batch.each_line(chomp: true) do |line| # chomp so hash_value and its digest exclude the newline
    hash_items << {
      hash_value: line,
      hash_value_digest: Digest::MD5.hexdigest(line),
      hash_list_id: list.id,
      created_at: Time.current,
      updated_at: Time.current
    }
  end

  HashItem.insert_all(hash_items)

  # Verify digests match
  inserted = HashItem.where(hash_list_id: list.id)
  inserted.each do |item|
    expect(item.hash_value_digest).to eq(Digest::MD5.hexdigest(item.hash_value))
  end
end

Example: Testing Collision Guard in Cross-Resource Lookups#

it "uses collision guard when looking up cracked hashes" do
  # Query uses digest index with full hash_value verification
  hash_value_digests = ["test_hash"].map { |v| Digest::MD5.hexdigest(v) }
  cracked_hashes = HashItem.joins(:hash_list)
                           .where(hash_value_digest: hash_value_digests, cracked: true)
                           .where(hash_value: "test_hash") # Collision guard
                           .pluck(:hash_value_digest, :hash_value, :plain_text, :attack_id)

  expect(cracked_hashes).not_to be_empty
end

Why Test Digest Patterns:

  • Prevent index insertion failures: Long hash_value TEXT exceeds PostgreSQL's ~2704 byte B-tree limit; digest ensures all values fit
  • Guard against collision regressions: Tests verify collision guards are present and prevent false positive matches
  • Verify bulk-insert correctness: Ensure insert_all/upsert_all paths compute digests inline (callbacks bypassed)
  • Index coverage verification: Confirm queries leverage the digest-based composite indexes for performance
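
The index-size motivation is easy to verify directly: an MD5 hex digest is always 32 characters regardless of input length, so the indexed column stays far below PostgreSQL's B-tree entry limit.

```ruby
require "digest"

short = "abc"
long  = "x" * 10_000 # far beyond the ~2704-byte B-tree entry limit

puts Digest::MD5.hexdigest(short).length # prints 32
puts Digest::MD5.hexdigest(long).length  # prints 32
```

This fixed width is what makes the composite digest indexes safe to create over arbitrarily long hash_value TEXT columns.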

Reference: See GOTCHAS.md § "hash_value_digest Pattern" for the complete implementation requirements and collision guard patterns.

See spec/models/hash_item_spec.rb, spec/jobs/process_hash_list_job_spec.rb, and spec/services/crack_submission_service_spec.rb for comprehensive test coverage of digest-based patterns.

E2E and Playwright#

  • Write tests to simulate user workflows and verify UI state
  • Use mocked APIs for isolated tests, real backend for E2E
  • Reference the coverage plan for required scenarios

Test Coverage and Enforcement#

  • Coverage is measured using SimpleCov for Rails backend tests
  • Coverage threshold: Minimum enforced in CI (see spec/spec_helper.rb)
  • Every HTTP endpoint should have integration test coverage
  • E2E coverage is tracked against a detailed matrix of UI pages, workflows, and features
  • Missing areas are explicitly listed in the coverage plan

CI Pipeline Integration#

  • CI is defined in .github/workflows/CI.yml (for Rails/Ruby backend) and .github/workflows/go.yml (for Go agent)
    • Rails CI: Runs on Ubuntu with Postgres and Redis services; installs dependencies, sets up the database, precompiles assets, and runs all tests
      • Uploads test results as artifacts and reports coverage to Code Climate
      • Quality gates require all critical-path tests to pass before release
      • Failed system tests trigger screenshot uploads for debugging
    • Go CI:
      • Test job: Runs go test -v ./... for standard unit/integration tests
      • Race detector job: Dedicated CI job running go test -race -v ./... with 15-minute timeout (required check for detecting data races)
      • Coverage job: Runs with race detector enabled (go test -race -coverprofile=coverage.out -covermode=atomic ./...), uploads to Codecov (no longer marked continue-on-error)
      • Linter job: Runs golangci-lint with auto-fix enabled for standardization
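
A minimal sketch of what the dedicated race-detector job might look like (a hypothetical workflow fragment assembled from the commands and timeout described above; see .github/workflows/go.yml for the real definition):

```yaml
# Hypothetical fragment of a Go CI workflow
race:
  runs-on: ubuntu-latest
  timeout-minutes: 15 # matches the 15-minute timeout noted above
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-go@v5
      with:
        go-version-file: go.mod
    - name: Run tests with race detector
      run: go test -race -v ./...
```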


Maintenance and Best Practices#

  • Tests are updated alongside feature development
  • Test data is regularly refreshed
  • Documentation is kept in sync with coverage
  • The architecture encourages isolation, real-time feature validation, accessibility, and role-based workflow testing

Further Reading#

  • E2E Test Coverage Plan
  • Monitoring, Testing, and Documentation Guide
  • AGENTS.md — Comprehensive Go agent development guide covering package structure, DB constraints, FK cascade strategy, performance patterns, error handling, linter gotchas, dependency considerations, and testing best practices. The agent codebase has been successfully refactored from a "god package" (~80K lines, 17 files) into focused sub-packages:
    • lib/display/ — All display/output functions with defensive bounds checking
    • lib/benchmark/ — Benchmark management with Manager struct using constructor injection
    • lib/task/ — Task management with Manager struct using constructor injection
    • lib/cserrors/ — Centralized error handling
    • The refactoring was completed in PR #131 (benchmark submission) and PR #134 (full package extraction).
  • GOTCHAS.md — Reference document for linting edge cases, code generation pitfalls, configuration traps, and other development gotchas.
  • CI Pipeline Configuration