Troubleshooting Guide#

This guide covers system diagnostics, common issues, and solutions for CipherSwarm V2.

Table of Contents#

Quick Diagnostics: System Health Dashboard
Authentication Issues
Project Access Issues
Campaign Issues
Attack Issues
Agent Issues
Task Issues
Resource Issues
Performance Issues
Live Updates Issues
Getting Help
Log Locations
Common Error Messages

Quick Diagnostics: System Health Dashboard#

CipherSwarm V2 includes a dedicated System Health Dashboard that provides real-time status information for all critical system components.

Accessing the Health Dashboard#

Navigate to Admin > System Health (administrators only)
The dashboard displays status cards for each service

Service Status Cards#

The health dashboard monitors four core services:

PostgreSQL#

Status: Connected / Disconnected
Metrics: Connection pool status, response time
Common Issues: Connection refused, max connections reached

Redis#

Status: Connected / Disconnected
Metrics: Memory usage, connection count
Purpose: Caching, session storage, Action Cable

File Storage#

Status: Connected / Disconnected
Metrics: Storage capacity, available space
Purpose: File storage for wordlists, rules, hash lists

Application#

Status: Running / Degraded / Error
Metrics: Uptime, memory usage, background job queue depth
Details: Ruby version, Rails version, boot time

Health Dashboard Features#

Auto-refresh: The dashboard refreshes automatically at configurable intervals
Manual Refresh: Click the refresh button for an immediate status check
Diagnostics: Each service card shows detailed diagnostic information
Error Details: When a service is unhealthy, the card displays the specific error

Using Health Data for Troubleshooting#

The health dashboard is your first stop when something goes wrong:

Check all service statuses - A single failed service can cascade
PostgreSQL down: No data access, campaigns stall, agents cannot report
Redis down: Live updates stop, sessions may expire, caching disabled
File storage down: Resource downloads fail, uploads fail, agents cannot get wordlists
Application degraded: Check memory usage and job queue depth

Authentication Issues#

Cannot Log In#

Symptoms: Login form rejects credentials or redirects back to the login page.

Solutions:

Verify your email and password are correct
Clear browser cache and cookies, or try incognito mode
Check if your account is locked or disabled (contact administrator)
Ensure the server is running (check the system health dashboard)

Session Expired#

Symptoms: You are unexpectedly logged out.

Solutions:

Log in again - sessions expire after a period of inactivity
If this happens frequently, check Redis status on the health dashboard
Contact your administrator if the session timeout is too short

Agent Authentication Failures#

Symptoms: Agent reports authentication errors in its logs.

Solutions:

Verify the agent token is correct and has not been rotated

Test authentication manually:

curl -H "Authorization: Bearer <token>" \
  https://your-server.example.com/api/v1/client/authenticate

Check if the agent has been disabled in the web interface
See Agent Setup for more details
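
The manual check above can be wrapped in a small script that interprets the HTTP status code. This is a minimal sketch, not part of CipherSwarm: the function names `classify_auth_status` and `check_agent_auth` are hypothetical, and the status-to-cause mapping assumes the endpoint follows the usual HTTP conventions (401 for a rejected token, 5xx for server faults). curl reports `000` when it receives no response at all.

```shell
#!/bin/sh
# Map an HTTP status code to a likely cause (assumed mapping, see above).
classify_auth_status() {
  case "$1" in
    2??) echo "token accepted" ;;
    401) echo "token rejected" ;;
    403) echo "authenticated but not permitted" ;;
    5??) echo "server error" ;;
    000) echo "no response from server" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# check_agent_auth SERVER_URL TOKEN
# Calls the authenticate endpoint and prints the interpreted result.
check_agent_auth() {
  # -s silences progress, -o discards the body, -w prints only the status code.
  # "|| true" keeps the script alive when curl cannot connect (status 000).
  auth_status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $2" \
    "$1/api/v1/client/authenticate") || true
  classify_auth_status "$auth_status"
}

# Example (not run here):
#   check_agent_auth https://your-server.example.com "<token>"
```

A "token rejected" result points at the token itself, while "no response from server" points at the network path or server availability rather than at authentication.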

Agent Network and Connection Issues#

Symptoms: Agent reports network errors, connection failures, or circuit breaker messages.

Automatic Retry Behavior#

The agent automatically handles transient network failures:

Retry attempts: Failed API requests are retried up to 3 times by default
Exponential backoff: Retry delays increase exponentially (1s, 2s, 4s, ...)
Maximum delay: Retry backoff is capped at 30 seconds
Automatic handling: Temporary network errors and 5xx server responses are retried without intervention

Check agent logs for retry attempts before investigating further. Brief network disruptions are handled automatically.
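
To judge how long the agent may keep retrying before it gives up, the backoff schedule can be worked out by hand. The sketch below assumes the delay starts at the initial value, doubles on each retry, and is capped at the maximum, as described above; it ignores any jitter the real agent may add, and `backoff_schedule` is a hypothetical helper name.

```shell
#!/bin/sh
# backoff_schedule MAX_RETRIES INITIAL_DELAY MAX_DELAY (delays in whole seconds)
# Prints the wait before each retry: the delay doubles each time and is
# capped at MAX_DELAY. The body runs in a subshell so its variables stay local.
backoff_schedule() (
  attempt=1
  delay=$2
  while [ "$attempt" -le "$1" ]; do
    # Apply the cap before reporting the delay for this attempt.
    if [ "$delay" -gt "$3" ]; then
      delay=$3
    fi
    echo "retry $attempt: wait ${delay}s"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
)

# Defaults from this guide: 3 retries, 1s initial delay, 30s cap.
backoff_schedule 3 1 30
```

With the defaults this prints waits of 1s, 2s, and 4s. Running `backoff_schedule 7 1 30` shows the cap taking effect: the later retries wait 8s, 16s, 30s, and 30s.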

Circuit Breaker Errors#

Symptoms: Agent logs show "circuit breaker open" or ErrCircuitOpen messages.

What it means: The circuit breaker activates after repeated connection failures (default: 5 consecutive failures) to prevent cascading failures and resource exhaustion. This is distinct from an authentication error: it indicates the agent is protecting itself from an unresponsive server.

Behavior:

Circuit opens automatically after threshold failures
Agent skips new requests while circuit is open
Circuit automatically attempts recovery after timeout (default: 30 seconds)
One probe request is allowed in half-open state to test server recovery
Circuit closes if probe succeeds; reopens immediately if probe fails
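
The open, half-open, and closed transitions listed above can be modeled as a small state machine. The sketch below is an illustration, not the agent's actual implementation: the `cb_*` names are hypothetical, timestamps are passed in as plain seconds so the logic can be followed without a real clock, and the threshold and timeout use the defaults documented in this guide.

```shell
#!/bin/sh
CB_THRESHOLD=5   # consecutive failures before the circuit opens
CB_TIMEOUT=30    # seconds to wait before allowing a probe
cb_state=closed
cb_failures=0
cb_opened_at=0

# cb_allow NOW: succeeds if a request may be sent at time NOW.
cb_allow() {
  case "$cb_state" in
    closed) return 0 ;;
    half-open) return 1 ;;          # a probe is already in flight
    open)
      if [ $(( $1 - cb_opened_at )) -ge "$CB_TIMEOUT" ]; then
        cb_state=half-open          # timeout elapsed: allow one probe
        return 0
      fi
      return 1 ;;                   # still open: skip the request
  esac
}

# cb_record RESULT NOW: RESULT is "ok" or "fail".
cb_record() {
  if [ "$1" = ok ]; then
    cb_state=closed                 # success closes the circuit
    cb_failures=0
    return 0
  fi
  cb_failures=$((cb_failures + 1))
  # A failed probe reopens immediately; otherwise open at the threshold.
  if [ "$cb_state" = half-open ] || [ "$cb_failures" -ge "$CB_THRESHOLD" ]; then
    cb_state=open
    cb_opened_at=$2
  fi
}

# Five consecutive failures at t=0..4 open the circuit.
t=0
while [ "$t" -lt 5 ]; do
  cb_record fail "$t"
  t=$((t + 1))
done
echo "after 5 failures: $cb_state"
```

In this model the circuit opens at t=4, so `cb_allow 10` is refused and `cb_allow 34` admits the single probe. This is why "circuit breaker open" messages clear on their own once the server responds again.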

Solutions:

Check server availability: Verify the CipherSwarm server is running and accessible
Check network connectivity: Verify network path between agent and server (firewall rules, DNS resolution, routing)
Check server health: Use the System Health Dashboard to verify server services are operational
Wait for recovery: The circuit breaker will automatically retry after the timeout period
Review server logs: Check for server-side errors that might be causing repeated failures
Check timeouts: Verify timeout settings are appropriate for your network latency (see Configurable Connection Settings below)

Note: Circuit breaker activation is a symptom of underlying server or network issues. Focus troubleshooting on server availability and network connectivity rather than agent configuration.

Configurable Connection Settings#

Network resilience settings can be tuned via CLI flags or server configuration:

Timeout Settings:

--connect-timeout: TCP connection timeout (default: 10s)
--read-timeout: Response read timeout (default: 30s)
--write-timeout: Request write timeout (default: 10s)
--request-timeout: Overall request timeout (default: 60s)

Retry Settings:

--api-max-retries: Maximum retry attempts (default: 3)
--api-retry-initial-delay: Initial retry delay (default: 1s)
--api-retry-max-delay: Maximum retry backoff (default: 30s)

Circuit Breaker Settings:

--circuit-breaker-failure-threshold: Failures before circuit opens (default: 5)
--circuit-breaker-timeout: Wait time before retry attempt (default: 30s)

Server-Recommended Settings: The server can provide recommended timeout and retry settings via the /configuration endpoint. Agent-side settings are used as defaults, but the server can override them for centralized tuning.
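
The precedence described above (agent-side value as the default, server recommendation as the override) can be sketched as follows. This is an assumption-laden illustration: `effective_setting` is a hypothetical helper, the `90s` server value is invented for the example, and the actual format of the /configuration response is not covered in this guide.

```shell
#!/bin/sh
# effective_setting AGENT_VALUE SERVER_VALUE
# Prints the server-recommended value when one is provided, otherwise the
# agent-side value (CLI flag or built-in default).
effective_setting() {
  echo "${2:-$1}"
}

# No server recommendation: the agent default of 30s applies.
echo "read timeout:    $(effective_setting 30s "")"
# Hypothetical server recommendation of 90s overrides the 60s default.
echo "request timeout: $(effective_setting 60s 90s)"
```

When an agent behaves differently from its CLI flags, a server-side recommendation is therefore one place to look.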

Project Access Issues#

Missing Projects#

Symptoms: Expected projects do not appear in the project selector.

Solutions:

Verify you have been granted access to the project by an administrator
Refresh the page to reload the project list
Contact your administrator to request project access

Permission Denied#

Symptoms: Authorization error when accessing resources (HTML: "You are not authorized"; JSON API: {"error": "Forbidden", "status": 403}).

Solutions:

Ensure you have the correct role within the project
Check that the resource belongs to your currently selected project
Switch to the correct project using the project selector
Contact your administrator if you need elevated permissions

Note: This is a 403 Forbidden error (authenticated user lacking permissions), distinct from 401 Unauthorized errors (authentication failures).

Campaign Issues#

Campaign Won't Start#

Symptoms: Campaign remains in pending state after creation.

Possible Causes:

No attacks configured - Add at least one attack to the campaign
No agents available - Verify agents are online and assigned to the project
Priority preemption - A higher-priority campaign may be using all agents
Hash list processing - The hash list may still be processing
Campaign quarantined - The campaign was automatically quarantined due to unrecoverable errors

Solutions:

Check the campaign has at least one attack
Go to Agents and verify at least one agent is online and in the correct project
Check if higher-priority campaigns are running
Verify the hash list status shows "Processed"
Check for a red "Quarantined" badge on the campaign page (see Campaign Quarantine below)

Campaign Stuck with No Progress#

Symptoms: Campaign shows running state but progress bar does not advance.

Possible Causes:

Agent issues - Agents may have crashed or lost connectivity
Task errors - Tasks may be failing repeatedly
Resource access - Agents cannot download required wordlists or rules

Solutions:

Check agent status in the Agents page
Look at the campaign error log for task failure messages
Verify resources (wordlists, rules) are accessible
Check the system health dashboard for file storage status

Campaign Shows Wrong Progress#

Symptoms: Progress percentage seems incorrect or jumps unexpectedly.

Explanation: Progress is based on keyspace processed. Some attack types (like rule-based dictionary attacks) have keyspace estimates that adjust as processing proceeds. This can cause apparent jumps or reversals in progress.
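
A short calculation shows why the percentage can move backwards. The sketch assumes progress is simply processed keyspace divided by estimated total keyspace; the exact formula CipherSwarm uses is not stated here, and the numbers are hypothetical.

```shell
#!/bin/sh
# progress_percent PROCESSED TOTAL
# Integer percentage of the keyspace processed; prints 0 for an empty keyspace.
progress_percent() {
  if [ "$2" -le 0 ]; then
    echo 0
  else
    echo $(( $1 * 100 / $2 ))
  fi
}

# Hypothetical numbers: the keyspace estimate doubles mid-run.
echo "before revision: $(progress_percent 500 1000)%"
echo "after revision:  $(progress_percent 600 2000)%"
```

More candidates have been processed in the second line, yet the reported value drops from 50% to 30% because the estimated total grew faster than the work completed.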

Campaign Quarantine#

Symptoms: Campaign shows a red "Quarantined" badge, alert banner displays quarantine reason, agents are not receiving new tasks from the campaign.

Cause: Campaign was automatically quarantined due to unrecoverable agent errors. CipherSwarm automatically quarantines campaigns when agents report fatal hashcat errors that cannot be resolved by retrying the task. Common quarantine triggers include:

Token length exception: Hash format does not match the selected hash type (e.g., MD5 hashes loaded but SHA-256 hash type selected)
No hashes loaded: Hash list file is empty or improperly formatted
Invalid hash type for attack mode: The attack configuration is incompatible with the hash type
Other terminal hashcat errors: Fatal errors that hashcat cannot recover from

Resolution:

Review the quarantine reason: Check the alert banner on the campaign show page for the specific error message
Fix the underlying issue:
- If hash format error: Verify the hash type setting matches the actual hash format in your hash list file
- If no hashes loaded: Check that the hash list file is not empty and contains properly formatted hashes
- If attack parameter error: Review attack configuration for compatibility with the selected hash type
Update the problematic parameter:
- Quarantine automatically clears when you update the hash type or hash list file
- Quarantine automatically clears when you update attack parameters (word lists, rules, masks, attack mode, etc.)
- Administrators can manually clear quarantine using the "Clear Quarantine" button
Retry the campaign: After clearing quarantine, agents will resume receiving tasks from the campaign

Prevention: Validate hash list format and attack parameters before starting campaigns. Ensure the hash type matches your hash file format and that attack configurations are compatible with the selected hash type.

Attack Issues#

Attack Validation Errors#

Symptoms: Cannot create or save an attack configuration.

Solutions:

Verify all required fields are filled in
Check that selected resources (wordlists, rules) exist and are accessible
Ensure the attack type is compatible with the hash type
Review any error messages displayed on the form

Attack Stays in Pending State#

Symptoms: Attack created but never starts running.

Solutions:

Ensure the parent campaign is started
Check that higher-priority attacks are not blocking this one
Verify agents have the required capabilities for this attack
Check that required resources are available for download

Attack Fails Immediately#

Symptoms: Attack transitions to failed state shortly after starting.

Solutions:

Check the task error messages in the campaign error log
Common causes: invalid mask syntax, missing wordlist, unsupported hash type
Verify the attack configuration matches hashcat requirements
Test the attack configuration manually with hashcat if possible

Agent Issues#

For detailed agent troubleshooting, see Agent Troubleshooting.

Quick Checks#

Agent offline: Check network connectivity and agent logs
Agent not accepting tasks: Verify project assignment and agent state
Agent errors: Check the agent error tab in the web interface

Connection Troubleshooting#

For detailed network diagnostics including retry behavior, circuit breaker recovery, and configurable timeout settings, see Agent Network and Connection Issues above.

Task Issues#

CipherSwarm V2 provides task management actions directly from the web interface.

Task Stuck in Pending#

Symptoms: Tasks remain in pending state and are never picked up by agents.

Possible Causes:

No agents are available (all busy or offline)
Agent capabilities do not match task requirements
Agents are not assigned to the task's project
Campaign is quarantined due to unrecoverable errors

Solutions:

Check agent availability in the Agents page
Wait for agents to complete their current tasks
Verify agent project assignments match the campaign's project
Check the campaign for a "Quarantined" badge (see Campaign Quarantine)

Task Failures#

Symptoms: Tasks transition to failed state.

Common Causes:

Agent crashes during processing
hashcat returns an error (invalid arguments, GPU error)
Network interruption during processing
Resource files corrupted or inaccessible

Solutions:

Check the task detail page for the specific error message
Review agent logs for additional context
Try retrying the task (see Task Actions below)
If the error persists, review the attack configuration

Task Cancellation#

To cancel a running task:

Navigate to the task detail page
Click Cancel
The task will be stopped and its status updated
The keyspace will be reassigned to another task if agents are available

Task Retry Procedures#

To retry a failed task:

Navigate to the task detail page
Click Retry
The task is reset and made available for an agent to pick up
The retry count is incremented for tracking purposes

Task Reassignment#

To reassign a task to a different agent:

Navigate to the task detail page
Click Reassign
The task is released from the current agent
An available agent will pick up the task automatically

Task Status History#

Each task maintains a status history showing all state transitions:

When the task was created
When it was assigned to an agent
Any pauses, errors, or retries
When it completed or failed

Resource Issues#

Upload Failures#

Symptoms: File upload fails or hangs.

Solutions:

Check file size against the configured maximum
Verify the file format is supported
Check file storage status on the system health dashboard
Try a smaller file to isolate the issue
Check browser console for JavaScript errors

Resource Access Denied#

Symptoms: Agents report they cannot download resources.

Solutions:

Verify the resource is assigned to the correct project
Check file storage connectivity on the system health dashboard
Verify the agent's token has not expired
Check server logs for presigned URL generation errors

Missing Resources#

Symptoms: Resources that existed previously are no longer accessible.

Solutions:

Check if the resource was deleted by another user
Verify you are in the correct project context
Check file storage health for potential data loss
Review audit logs for resource deletion events

Performance Issues#

Slow Web Interface#

Symptoms: Pages load slowly, actions take a long time to complete.

Solutions:

Check the system health dashboard for service issues
Verify PostgreSQL is not under heavy load
Check Redis status (caching may be down)
Reduce the number of concurrent browser tabs to CipherSwarm
Check network latency to the server

High Server Resource Usage#

Symptoms: Server CPU or memory usage is consistently high.

Solutions:

Check Sidekiq job queue depth on the health dashboard
Review active campaigns for unusually large operations
Monitor database query performance
Consider scaling resources if usage is consistently high

Slow Agent Performance#

Symptoms: Agents are cracking slower than expected.

Solutions:

Check GPU temperatures (thermal throttling typically starts around 80-90 °C)
Verify agent workload settings are appropriate
Check for competing processes on the agent machine
See Performance Optimization for tuning guidance
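
On agents with NVIDIA GPUs, the temperature check can be scripted. The sketch below assumes `nvidia-smi` is installed on the agent machine; `flag_hot_gpus` is a hypothetical helper that reads `index, temperature` lines and reports any GPU at or above a chosen limit. Other GPU vendors need a different query tool.

```shell
#!/bin/sh
# flag_hot_gpus LIMIT
# Reads "index, temperature" lines on stdin and prints GPUs at or above LIMIT.
flag_hot_gpus() {
  awk -F', *' -v limit="$1" \
    '$2 + 0 >= limit + 0 { print "GPU " $1 ": " $2 " C" }'
}

# Example with NVIDIA hardware (not run here):
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits \
#     | flag_hot_gpus 80
```

An empty result means no GPU has reached the limit, in which case workload settings or competing processes are more likely causes.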

Live Updates Issues#

Real-Time Updates Not Working#

Symptoms: Dashboard and campaign pages do not update automatically.

Solutions:

Check browser console for WebSocket connection errors
Verify Redis is running (shown on health dashboard)
Try refreshing the page to re-establish the connection
Check if your network/proxy blocks WebSocket connections
CipherSwarm automatically falls back to polling if WebSockets are unavailable

Stale Data Warning#

Symptoms: A "Stale data" warning appears on the page.

Explanation: This warning indicates that the live update connection was lost for more than 30 seconds. The displayed data may be out of date.

Solutions:

Refresh the page to get the latest data
Check network connectivity
Verify server health via the system health dashboard

Getting Help#

When to Contact Administrators#

Contact your administrator if:

You cannot resolve an issue using this guide
Multiple users are experiencing the same problem
System health dashboard shows service failures
You suspect a security issue

What Information to Provide#

When reporting an issue, include:

Description: What you were trying to do and what happened
Steps to Reproduce: How to recreate the issue
Error Messages: Exact text of any error messages
Screenshots: If the issue is visual
Timestamps: When the issue occurred
Browser/OS: Your browser version and operating system

Log Locations#

Server Logs#

Log File	Purpose
`log/development.log`	Development environment application log
`log/production.log`	Production environment application log
`log/sidekiq.log`	Background job processing log

Docker Logs#

# Application logs
docker logs cipherswarm-web

# Sidekiq logs
docker logs cipherswarm-sidekiq

# PostgreSQL logs
docker logs cipherswarm-postgres

# Redis logs
docker logs cipherswarm-redis
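
Full container logs can be long, so it often helps to limit the time window and keep only error-like lines. The helpers below are a sketch: `filter_errors` and `recent_errors` are hypothetical names, and the keywords in the pattern are an assumption about what the logs contain, so adjust them to match the messages you actually see.

```shell
#!/bin/sh
# filter_errors: keep only lines on stdin that look like errors.
# "|| true" avoids a failing exit status when nothing matches.
filter_errors() {
  grep -iE 'error|fatal|exception' || true
}

# recent_errors CONTAINER [SINCE]
# Shows error-like lines from the last SINCE (default 15m) of a container log.
# docker logs replays the container's stderr on stderr, hence 2>&1.
recent_errors() {
  docker logs --since "${2:-15m}" "$1" 2>&1 | filter_errors
}

# Example (not run here):
#   recent_errors cipherswarm-web 1h
```

Running the same command against `cipherswarm-sidekiq` and `cipherswarm-postgres` for the same time window makes it easier to line up a failure across services.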

Agent Logs#

See Agent Troubleshooting for agent-specific log locations and analysis.

Common Error Messages#

Error Message	Meaning	Solution
"You are not authorized"	Authorization failure (HTML, 403)	Check role and project access
"Forbidden" (JSON)	Authorization failure (JSON API, 403)	Check role and project access
"Record not found"	Resource was deleted or doesn't exist	Verify resource exists, check project context
"Hash list is still processing"	Hash list upload hasn't completed	Wait for processing, check job queue
"No agents available"	No online agents in this project	Register agents or check agent status
"Resource download failed"	Agent couldn't download wordlist/rules	Check file storage status and agent connectivity
"Invalid attack configuration"	Attack parameters are incorrect	Review attack settings, check hash type compatibility
"Connection refused"	A backend service is down	Check system health dashboard
"Task expired"	Agent took too long to complete a task	Check agent performance, increase timeout
"Circuit breaker open"	Agent protecting against failed server	Check server health and network connectivity
"All API request attempts failed"	Retries exhausted for API request	Check network stability and server availability

Related Guides#

Common Issues - Quick fixes for frequently encountered problems
Agent Troubleshooting - Detailed agent diagnostics
FAQ - Frequently asked questions
Performance Optimization - System and agent tuning
