Critical Fixes and Optimizations#
Race Conditions#
Race conditions were addressed in several areas of the CipherSwarm project. In the submit_crack API endpoint, a race condition was fixed by wrapping the entire hash cracking operation in a single database transaction. This ensures atomicity and data consistency during concurrent submissions, preventing partial updates or conflicting state changes when multiple agents attempt to crack the same hash simultaneously. Additionally, a state machine was introduced for tasks and attacks to enforce safe state transitions and further reduce the risk of race conditions in distributed job processing and API interactions [PR #268][PR #50].
The ProcessHashListJob uses an atomic locking mechanism to prevent duplicate processing when after_commit callbacks fire multiple times (e.g., once for record save and once for attachment commit). The job atomically claims work using UPDATE ... WHERE processed=false, sets the processed flag immediately, and rolls back the flag on errors to allow retries [PR #615].
Transaction Safety#
Transaction safety was improved by ensuring that all critical data modifications, especially those involving multiple related records, occur within explicit database transactions. For example, the ProcessHashListJob uses transactions to perform bulk inserts and updates, guaranteeing that either all changes are committed or none are, thus maintaining data integrity. The same approach is used in the submit_crack endpoint, where updates to hash items and associated task state transitions are performed atomically [PR #268][process_hash_list_job.rb].
Parameter Sanitization#
Parameter sanitization is enforced at the API layer using strong parameter filtering. Controllers such as AttacksController and API endpoints for status submission use Rails strong parameters to whitelist only trusted fields, preventing mass assignment vulnerabilities and injection attacks. The base API controller also rescues from ActionController::ParameterMissing exceptions, returning a clear error response for missing or malformed input [PR #268][attacks_controller.rb].
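The allowlist principle behind strong parameters can be sketched in plain Ruby (field names here are hypothetical; the real controllers use ActionController's `params.require(...).permit(...)`):

```ruby
# Minimal sketch of allowlist filtering. Rails strong parameters implement
# the same idea: only explicitly permitted keys survive, so unexpected
# fields (e.g. :admin) can never reach mass assignment.
ParameterMissing = Class.new(StandardError)

PERMITTED_ATTACK_PARAMS = %i[name attack_mode mask increment_mode].freeze

def attack_params(raw)
  attack = raw[:attack] or raise ParameterMissing, "param is missing or empty: attack"
  # Keys outside the allowlist are silently dropped.
  attack.slice(*PERMITTED_ATTACK_PARAMS)
end
```

The base API controller's `rescue_from ActionController::ParameterMissing` plays the role of the `raise` above, converting missing or malformed input into a clear error response.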
Cryptographic Security#
Constant-Time Token Authentication#
Agent API authentication uses constant-time comparison (ActiveSupport::SecurityUtils.secure_compare) to prevent timing attacks that could enumerate valid agent tokens. The implementation compares provided tokens against a dummy value when no matching agent is found, equalizing the comparison time regardless of whether the token exists. This prevents attackers from using timing side channels to determine valid token values through statistical analysis of API response times [PR #788].
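The timing-equalized comparison can be sketched in plain Ruby (an in-memory stand-in; the application uses ActiveSupport::SecurityUtils.secure_compare and an ActiveRecord lookup, and the hash-based lookup and 32-byte token length below are illustrative assumptions):

```ruby
require "openssl"

# Dummy value compared when no agent matches, so valid and invalid tokens
# take the same comparison time.
DUMMY_TOKEN = "0" * 32

def authenticate_agent(agents_by_token, provided_token)
  agent = agents_by_token[provided_token]
  stored = agent ? agent[:token] : DUMMY_TOKEN
  # OpenSSL.secure_compare hashes both inputs, then compares in constant time,
  # so response timing does not reveal whether the token exists.
  return nil unless OpenSSL.secure_compare(stored, provided_token)

  agent
end
```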
tusd Webhook Authentication#
The tusd webhook endpoint (/api/v1/hooks/tus) enforces shared-secret authentication via the TUSD_HOOK_SECRET environment variable. The endpoint validates the X-Tusd-Hook-Secret HTTP header against the configured secret using constant-time comparison. This prevents cache poisoning attacks where unauthenticated POST requests could inject malicious upload metadata. The shared secret must be configured identically in both the Rails application and the tusd service [PR #788].
In development environments, webhook authentication is optional (when TUSD_HOOK_SECRET is unset). In production environments, the application fails fast at startup if TUSD_HOOK_SECRET is not configured.
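The header check can be sketched as follows (the method name is hypothetical; a nil configured secret models the optional check in development):

```ruby
require "openssl"

# Validates the tusd webhook's shared secret. Returns true when authorized.
def authorize_tus_hook!(headers, configured_secret)
  return true if configured_secret.nil? # development: check skipped when unset

  provided = headers["X-Tusd-Hook-Secret"].to_s
  # Constant-time comparison prevents timing attacks against the shared secret.
  raise SecurityError, "tusd hook secret mismatch" unless OpenSSL.secure_compare(provided, configured_secret)

  true
end
```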
Path Traversal Protection#
The TusUploadHandler concern implements path traversal protection via validate_source_path!, which ensures that all file access remains within the tusd uploads directory. The validation uses File.realpath to resolve symbolic links and relative paths, then verifies the canonical path starts with the canonical tusd directory. This prevents attackers from using specially crafted upload IDs (e.g., ../../../etc/passwd) to access files outside the intended storage location [PR #788].
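The validation can be sketched in plain Ruby (method name from the doc; the directory layout is hypothetical):

```ruby
# Resolves the candidate path to its canonical form and rejects anything
# that escapes the uploads directory, including symlink and ../ tricks.
def validate_source_path!(uploads_dir, upload_id)
  base = File.realpath(uploads_dir)
  real = File.realpath(File.join(uploads_dir, upload_id))
  unless real == base || real.start_with?(base + File::SEPARATOR)
    raise SecurityError, "path escapes uploads directory: #{upload_id}"
  end
  real
end
```

Note that `File.realpath` also raises `Errno::ENOENT` for nonexistent paths, which rejects probing for files that are not there.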
DNS Rebinding Protection#
Production deployments can enable DNS rebinding protection by setting the APPLICATION_HOST environment variable to the application's hostname (e.g., cipherswarm.lab.local). When configured, Rails validates the Host header on all incoming requests, rejecting requests from unexpected hostnames. This prevents DNS rebinding attacks where an attacker-controlled domain resolves to the application's internal IP address. Health check endpoints (/up, /api/v1/client/health) are exempted from host validation to ensure monitoring systems can reach them [PR #788].
In non-production environments or deployments without a stable hostname, host checking is disabled for backward compatibility.
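The opt-in wiring can be sketched as a production environment fragment (a sketch assuming standard Rails host authorization; the exempted paths follow the doc):

```ruby
# config/environments/production.rb (sketch)
if ENV["APPLICATION_HOST"].present?
  # Reject requests whose Host header does not match the configured hostname.
  config.hosts = [ENV["APPLICATION_HOST"]]
  # Health checks must stay reachable by monitoring systems regardless of Host.
  config.host_authorization = {
    exclude: ->(request) { ["/up", "/api/v1/client/health"].include?(request.path) }
  }
end
```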
Nginx Security Headers#
The nginx reverse proxy configuration adds defense-in-depth security headers:
- X-Content-Type-Options: nosniff — Prevents MIME sniffing attacks where browsers incorrectly interpret response content types
- X-Frame-Options: SAMEORIGIN — Prevents clickjacking by disallowing the application from being embedded in cross-origin frames
- Referrer-Policy: strict-origin-when-cross-origin — Controls referrer information sent to external sites
- Permissions-Policy — Disables browser features (camera, microphone, geolocation, USB) not required by the application
These headers are set at the reverse proxy layer with always to ensure they appear even on error responses. Strict-Transport-Security (HSTS) should only be configured when TLS is terminated at the nginx layer; when TLS terminates upstream (e.g., cloud load balancer), the upstream proxy should set HSTS instead [PR #788].
Production Secret Guards#
Production deployments require two critical secrets to be configured at startup:
- TUSD_HOOK_SECRET — Authenticates webhook callbacks from the tusd service
- POSTGRES_PASSWORD — Authenticates database connections
The application fails fast with clear error messages if either secret is unset in production mode. This prevents insecure deployments with default or missing credentials [PR #788].
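The fail-fast guard can be sketched as a plain function (hypothetical name; in the app this presumably runs at boot in production mode):

```ruby
# Raises at boot when any required secret is unset or blank, so a misconfigured
# production deployment fails loudly instead of running with missing credentials.
def verify_required_secrets!(env)
  missing = %w[TUSD_HOOK_SECRET POSTGRES_PASSWORD].select { |key| env[key].to_s.strip.empty? }
  return true if missing.empty?

  raise "FATAL: required secrets not configured: #{missing.join(', ')}"
end
```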
Array Length Validation#
Array length constraints are enforced in both the OpenAPI specification and Rails model validations to prevent denial-of-service attacks via unbounded array payloads. This defense-in-depth approach ensures validation occurs at both the API schema level and the application model level [PR #636].
Fixed-Length Arrays#
The following arrays use exact-length validation (exactly 2 elements) because hashcat's output format is fixed:
- HashcatStatus.progress: exactly 2 elements (current progress and total)
- HashcatStatus.recovered_hashes: exactly 2 elements (recovered count and total)
- HashcatStatus.recovered_salts: exactly 2 elements (recovered count and total)
These constraints are enforced via minItems: 2 and maxItems: 2 in the OpenAPI schema, backed by Rails model validations that reject arrays with length != 2.
Variable-Length Arrays#
The following arrays use maximum length limits to prevent resource exhaustion:
- Agent.devices: maximum 64 items
- HashcatStatus.device_statuses: maximum 64 items
These constraints prevent malicious actors from sending unbounded arrays that could exhaust server memory or processing resources. The 64-item limit accommodates realistic multi-GPU configurations while providing a reasonable upper bound for API hardening.
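The model-level side of these constraints can be sketched as a plain validation helper (hypothetical helper; the app presumably uses standard Rails length validations mirrored by minItems/maxItems in the OpenAPI schema):

```ruby
# Returns error strings for an array attribute, enforcing either an exact
# length (hashcat's fixed output pairs) or a maximum (DoS hardening).
def array_length_errors(name, value, exactly: nil, maximum: nil)
  errors = []
  errors << "#{name} must contain exactly #{exactly} elements" if exactly && value.size != exactly
  errors << "#{name} must contain at most #{maximum} elements" if maximum && value.size > maximum
  errors
end
```

For example, `progress` would be validated with `exactly: 2` and `devices` with `maximum: 64`.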
Performance Optimizations#
Batch Processing#
Batch processing is implemented in jobs that handle large data sets, such as ProcessHashListJob. Hash items are read from files and processed in configurable batches (default size: 1000), reducing memory usage and improving throughput. Bulk insert operations (insert_all) are used to efficiently write large numbers of records, and validations are intentionally skipped for trusted data to maximize performance.
The HashItem model automatically computes hash_value_digest via a before_validation callback when individual records are saved. However, bulk operations (insert_all, upsert_all) bypass ActiveRecord callbacks, so jobs must compute the digest inline: hash_value_digest: Digest::MD5.hexdigest(hash_value). This ensures the indexed digest column is populated for all inserts. Queries use digest-based lookups with collision guards (verifying the full hash_value matches) to handle potential MD5 collisions, as documented in GOTCHAS.md [PR #811].
Bulk updates for cracked hashes use upsert_all to batch all modifications into a single SQL statement, replacing individual per-item UPDATE queries. For hash lists with 100,000+ cracked items, this reduces the query count from 100,000+ individual UPDATEs to a single bulk operation [PR #788][process_hash_list_job.rb].
Memory-Efficient Cracked-Hash Lookups#
The ProcessHashListJob uses joins(:hash_list).pluck(:hash_value_digest, :hash_value, :plain_text, :attack_id) instead of includes(:hash_list).index_by(&:hash_value) when checking for already-cracked hashes. This eliminates ActiveRecord object instantiation during cracked-hash lookups, which was causing out-of-memory (OOM) errors on hash lists exceeding 100GB. The pluck-based approach fetches only the required columns as simple Ruby values instead of full ActiveRecord objects, dramatically reducing memory overhead. Memory usage is O(matched rows × 4 scalars) rather than O(matched rows × full AR object graph), preventing memory exhaustion when processing hash lists with millions of entries [PR #801].
Queries use hash_value_digest for efficient indexed lookups, with collision guards to verify the full hash_value matches. Since MD5 is not collision-resistant, the lookup returns all candidates with the same digest, and the code uses Ruby's .find to confirm the exact hash_value match. This ensures correctness while maintaining query performance through indexed access [PR #811].
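The digest-keyed lookup with its collision guard can be sketched in plain Ruby (an in-memory stand-in for the indexed SQL query and pluck-based grouping):

```ruby
require "digest"

# Build a digest => [[hash_value, plain_text, attack_id], ...] index,
# mirroring the grouping the job performs over pluck'd rows.
def build_cracked_index(rows)
  rows.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(hash_value, plain_text, attack_id), index|
    index[Digest::MD5.hexdigest(hash_value)] << [hash_value, plain_text, attack_id]
  end
end

def find_cracked(index, hash_value)
  candidates = index[Digest::MD5.hexdigest(hash_value)]
  # Collision guard: the digest only narrows the search; confirm the full value.
  candidates.find { |candidate_value, _, _| candidate_value == hash_value }
end
```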
Database Indexing#
Critical database indexes were added to optimize query performance. Composite indexes on the hash_items table (by hash_value_digest and hash_list_id, and by hash_value_digest and cracked) accelerate frequent lookups during hash cracking and result submission. Additional indexes on the agents table (state, last_seen_at) and the tasks table (activity_timestamp) improve filtering and reduce query latency for job scheduling and agent monitoring [add_performance_indexes.rb].
The hash_value_digest column is an MD5 hex digest (32 characters) of hash_value, used to work around PostgreSQL's ~2704 byte B-tree index limit on long TEXT values. When hash_value exceeds this limit, direct indexing fails, so the digest provides a fixed-length indexable key. Queries use digest-based lookups with collision guards (checking the full hash_value) to handle potential MD5 collisions [PR #811].
Association Touch Storm Elimination#
The touch: true option was removed from HashcatStatus and AgentError associations to eliminate cascading UPDATE storms. Each status submission (occurring every 5-30 seconds per active agent) previously triggered 6-10 additional UPDATE queries as touches cascaded through Task → Attack associations and fired multiple after_commit callbacks. Task and attack freshness is tracked via explicit state machine transitions rather than cascading timestamp updates [PR #788].
ETA Cache Optimization#
The CampaignETACalculator uses a time-based 30-second TTL cache instead of freshness-based cache keys. The previous implementation executed 3 SQL queries (attacks.maximum(:updated_at), campaign.attack_ids, tasks.maximum(:updated_at)) on every cache lookup to construct a freshness-based key, defeating the purpose of caching when the calculator was invoked multiple times per request. ETA estimates do not require sub-second freshness, so a simple TTL-based approach provides better performance [PR #788].
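The TTL approach can be sketched with a minimal cache (a stand-in; the app presumably uses Rails.cache.fetch with a 30-second expires_in):

```ruby
# Minimal time-based cache: entries are recomputed only after the TTL elapses,
# so repeated lookups within a request window issue zero extra queries.
class TtlCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {}
  end

  def fetch(key)
    entry = @store[key]
    return entry[:value] if entry && Process.clock_gettime(Process::CLOCK_MONOTONIC) - entry[:at] < @ttl

    value = yield
    @store[key] = { value: value, at: Process.clock_gettime(Process::CLOCK_MONOTONIC) }
    value
  end
end
```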
ETA Broadcast Throttling#
Campaign ETA broadcasts are throttled to fire only when meaningful data changes (priority, attacks_count, quarantined), rather than on every cascading touch or timestamp update. This prevents unnecessary Turbo Stream broadcasts that would re-render ETA displays without any visible changes [PR #788].
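The throttle condition can be sketched as a predicate over a changes hash (mirroring ActiveModel's previous_changes format; attribute names from the doc):

```ruby
BROADCAST_TRIGGER_ATTRIBUTES = %w[priority attacks_count quarantined].freeze

# Broadcast only when a meaningful attribute actually changed; cascading
# touches that merely bump updated_at no longer trigger a re-render.
def broadcast_eta?(previous_changes)
  BROADCAST_TRIGGER_ATTRIBUTES.any? { |attribute| previous_changes.key?(attribute) }
end
```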
Heartbeat Throttling#
The agent heartbeat state machine event is guarded within the 30-second last-seen update check. Heartbeat transitions are only fired when the agent state needs to change, preventing redundant state machine processing on every API request [PR #788].
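The guard can be sketched as follows (hypothetical field names; the real implementation fires a state machine event inside this window check):

```ruby
HEARTBEAT_WINDOW_SECONDS = 30

# Only refresh last_seen_at (and fire the heartbeat transition) when the last
# update is older than the window, so routine API requests skip redundant work.
def record_heartbeat?(agent, now: Time.now)
  last_seen = agent[:last_seen_at]
  return false if last_seen && (now - last_seen) < HEARTBEAT_WINDOW_SECONDS

  agent[:last_seen_at] = now
  true
end
```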
Impact on API Endpoints, Job Processing, and Database Queries#
API endpoints now benefit from improved data consistency and error handling. For example, hash cracking submissions and status updates are protected against partial updates and race conditions, and input is strictly validated. Job processing is more robust and scalable, with batch processing and retry mechanisms for transient errors. Database queries are faster and more reliable due to targeted indexing and reduced N+1 query patterns (e.g., eager loading associations in campaign queries).
Testing Recommendations#
Concurrency#
To ensure robustness under concurrent workloads, write integration tests for critical workflows that simulate concurrent submissions and verify atomicity and data consistency. Maintain environment parity through infrastructure-as-code (IaC) templates so deployment configurations stay consistent across environments, and use health checks (e.g., SidekiqAlive) to monitor worker availability. Performance tests should stress the system with simultaneous job executions and API requests to detect race conditions or deadlocks [issue #408].
Large File Processing#
Test batch processing jobs (such as ProcessHashListJob) with files of varying sizes, including edge cases with very large files. Verify that memory usage remains stable and that all records are processed without data loss or duplication. Use background job test suites to simulate file ingestion and validate that retry mechanisms handle transient failures gracefully.
Security Validation#
Achieve 100% test coverage for authorization logic and data access control. Implement unit tests for service layers and models, integration tests for role-based access and project scoping, and system tests for UI visibility and multi-user scenarios. Use API contract testing tools (such as Rswag) to verify endpoint compatibility and prevent regressions. Always validate project membership and permissions on the server side, and audit log all sensitive operations. Test for mass assignment vulnerabilities and ensure strong parameter filtering is enforced throughout the API [issue #426][issue #431][issue #436].
HTTP Status Codes for Authentication and Authorization#
The application uses HTTP status codes according to RFC 9110 semantics:
- HTTP 401 Unauthorized: Returned when a user is not authenticated (not logged in). The user must authenticate before retrying the request.
- HTTP 403 Forbidden: Returned when an authenticated user lacks permission to access a resource. The application returns 403 for CanCan::AccessDenied exceptions to correctly indicate authorization failures. JSON error responses for 403 status use {"error": "Forbidden", "status": 403} to align with RFC terminology, where "Forbidden" indicates an authorization failure (not "Not Authorized," which would imply a 401 authentication failure) [PR #647][PR #739].
This distinction improves API clarity and helps clients diagnose whether they need to authenticate or whether the authenticated user requires different permissions.
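The mapping can be sketched as a small helper (the 403 body matches the doc; the 401 body shown here is an assumed placeholder):

```ruby
# Distinguishes authentication (401) from authorization (403) failures
# per RFC 9110 semantics; returns nil when the request may proceed.
def auth_failure_response(authenticated:, authorized:)
  return [401, { error: "Unauthorized", status: 401 }] unless authenticated
  return [403, { error: "Forbidden", status: 403 }] unless authorized

  nil
end
```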
Authorization Enforcement for File Downloads#
All downloadable resource actions (download, view_file, view_file_content) in the Downloadable concern enforce mandatory authorize! calls. This ensures that users can only access files they have permission to view, preventing unauthorized access to sensitive resources such as word lists, rule lists, and mask lists. The resource_class helper safely extracts the resource class name from the controller path, with NameError handling for edge cases [PR #647].
File Preview Safety Limits#
The view_file_content action implements streaming safeguards to prevent out-of-memory (OOM) crashes when previewing large files. A MAX_PREVIEW_BYTES constant is set to 5 MB, establishing a hard limit on the amount of data that can be streamed during a single preview operation. This protects the Rails service from resource exhaustion when users attempt to preview extremely large word lists, mask lists, or rule lists.
The streaming implementation processes file content in chunks directly from ActiveStorage, without loading the entire file into memory. The method uses a buffer to accumulate chunks and extracts complete lines incrementally, terminating early via throw(:preview_limit_reached) when either the byte cap or line limit is reached. This approach efficiently handles files of any size, including edge cases such as files with no newlines, while maintaining bounded memory usage.
Preview line limits are clamped between 1 and 5000 lines, with a default of 1000 lines. The effective limit is calculated in the view_file action and passed through to the lazy-loaded Turbo Frame for file content. When the byte or line limit is reached, users see a message indicating "Showing first X lines of file," making it clear that they are viewing a preview rather than the complete file. This combination of byte and line limits prevents denial-of-service scenarios where malicious or accidental requests could crash the Rails service through unbounded memory consumption [PR #697].
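The chunked, limit-bounded line extraction can be sketched in plain Ruby (an IO-level stand-in; the real action streams chunks from ActiveStorage):

```ruby
MAX_PREVIEW_BYTES = 5 * 1024 * 1024

# Reads the IO in chunks, accumulating complete lines until either limit is
# hit; throw(:preview_limit_reached) terminates early without reading more.
def preview_lines(io, line_limit:, byte_limit: MAX_PREVIEW_BYTES)
  lines = []
  buffer = +""
  bytes_read = 0
  truncated = false
  catch(:preview_limit_reached) do
    while (chunk = io.read(64 * 1024))
      bytes_read += chunk.bytesize
      buffer << chunk
      while (newline_index = buffer.index("\n"))
        lines << buffer.slice!(0..newline_index).chomp
        if lines.size >= line_limit
          truncated = true
          throw :preview_limit_reached
        end
      end
      if bytes_read >= byte_limit
        truncated = true
        throw :preview_limit_reached
      end
    end
    # Handle a trailing line (or a file with no newlines at all).
    lines << buffer unless buffer.empty?
  end
  [lines, truncated]
end
```

Memory stays bounded because only the current chunk and the unconsumed tail of the buffer are held at any time.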
Turbo Frame Security#
When a Turbo Frame request encounters an authorization failure, the application renders a special _not_authorized_frame.html.erb partial with HTTP 403 status. This prevents perpetual "Loading..." states in Turbo Frames and provides proper user feedback. The partial displays an internationalized error message (errors.not_authorized) within the Turbo Frame context, improving security UX by making authorization failures visible to users [PR #647].
Example: Batch Processing in ProcessHashListJob#
```ruby
def perform(id)
  list = HashList.find(id)
  return if list.processed?

  # Acquire an atomic lock to prevent duplicate processing from concurrent jobs.
  # The after_commit callback can fire multiple times (record save + attachment commit),
  # causing two jobs to race. This UPDATE ... WHERE atomically claims the work.
  rows_claimed = HashList.where(id: id, processed: false)
                         .update_all(processed: true)
  return if rows_claimed.zero?

  ingest_hash_items(list)
rescue StandardError
  # Roll back the processed flag so the job can be retried.
  # Wrapped in its own rescue to ensure the original exception always propagates.
  begin
    HashList.where(id: id).update_all(processed: false)
  rescue StandardError => rollback_error
    Rails.logger.error("[ProcessHashList] Failed to roll back processed flag for list #{id}: #{rollback_error.message}")
  end
  raise
end

private

def ingest_hash_items(list)
  # Clean up any partial results from a prior failed attempt to ensure idempotent ingestion.
  list.hash_items.delete_all

  hash_items = []
  processed_count = 0
  list.file.open do |file|
    file.each_line do |line|
      next if line.blank?

      line.strip!
      # Bulk operations bypass ActiveRecord callbacks, so hash_value_digest must be computed inline.
      # When individual HashItem records are saved, the before_validation callback auto-computes
      # the digest, but insert_all skips callbacks for performance.
      hash_items << {
        hash_value: line,
        hash_value_digest: Digest::MD5.hexdigest(line),
        metadata: {},
        hash_list_id: list.id,
        created_at: Time.current,
        updated_at: Time.current,
        cracked: false
      }

      if hash_items.size >= batch_size
        process_batch(list, hash_items)
        processed_count += hash_items.size
        hash_items.clear
      end
    end
  end

  # Flush the final partial batch, counting its items as well.
  if hash_items.any?
    process_batch(list, hash_items)
    processed_count += hash_items.size
  end

  # Update the hash items count now that processing is complete.
  # The `processed` flag was already set atomically at the start to prevent duplicates.
  if processed_count.positive?
    affected_rows = HashList.where(id: list.id).update_all(hash_items_count: processed_count)
    if affected_rows.zero?
      error_msg = "[ProcessHashList] Failed to update hash list #{list.id} count - record may have been deleted"
      Rails.logger.error(error_msg)
      raise ActiveRecord::RecordNotSaved, error_msg
    end
  end
end

def process_batch(list, hash_items)
  # ... (bulk insert code) ...

  # Check for previously cracked hashes and propagate crack results.
  # Query uses hash_value_digest for indexed lookup, then verifies full hash_value
  # as a collision guard (MD5 is not collision-resistant).
  hash_value_digests = hash_values.map { |v| Digest::MD5.hexdigest(v) }
  cracked_hashes = HashItem.joins(:hash_list)
                           .where(hash_value_digest: hash_value_digests, cracked: true, hash_lists: { hash_type_id: list.hash_type_id })
                           .pluck(:hash_value_digest, :hash_value, :plain_text, :attack_id)
                           .each_with_object(Hash.new { |h, k| h[k] = [] }) do |(digest, hv, pt, aid), acc|
                             acc[digest] << [hv, pt, aid]
                           end
  return if cracked_hashes.empty?

  now = Time.current
  updates = []
  inserted_items.each do |inserted|
    digest = Digest::MD5.hexdigest(inserted["hash_value"])
    candidates = cracked_hashes[digest]
    next if candidates.empty?

    # Collision guard: verify the full hash_value matches (not just the digest).
    match = candidates.find { |hv, _, _| hv == inserted["hash_value"] }
    next unless match

    _original_hash_value, plain_text, attack_id = match
    # All NOT NULL columns must be present in the payload for upsert_all.
    updates << {
      id: inserted["id"],
      hash_list_id: list.id,
      hash_value: inserted["hash_value"],
      hash_value_digest: digest,
      metadata: {},
      created_at: now,
      updated_at: now,
      plain_text: plain_text,
      cracked: true,
      cracked_time: now,
      attack_id: attack_id
    }
  end
  # Bulk update all cracked items in a single query.
  HashItem.upsert_all(updates, unique_by: :id) if updates.any?
end
```
This approach ensures efficient, safe, and scalable ingestion of large hash lists. The atomic lock at the start prevents duplicate processing from concurrent jobs, while error handling rolls back the processed flag to allow retries on transient failures. The job cleans up any partial results from prior failed attempts at the start of ingest_hash_items, making the entire operation idempotent.
The hash_value_digest field must be computed inline for bulk operations since insert_all and upsert_all bypass ActiveRecord callbacks. Individual record saves use the before_validation callback to auto-compute the digest. Query patterns use where(hash_value_digest: digest) for indexed lookups, followed by Ruby-side .find { |item| item.hash_value == hash_value } collision guards for single-record lookups, or SQL AND hash_value = ? clauses for batch updates. This pattern is documented in GOTCHAS.md. Log messages use the [ProcessHashList] prefix for easier filtering and debugging.