# Crackable Upload Processing Pipeline

Here's a Skirmish-style implementation checklist for the Crackable Upload Processing Pipeline.
Note: This document contains Python code examples as reference implementations. The actual CipherSwarm V2 implementation uses Ruby on Rails 8.0+ with Hotwire (Turbo + Stimulus), ActiveJob/Sidekiq for background processing, and ActiveStorage for file handling. See requirements.md for the complete technology stack.
## Crackable Upload Plugin Pipeline - Skirmish Task Checklist

Review the 📂 Crackable Uploads section in `docs/v2_rewrite_implementation_plan/phase-2-api-implementation-parts/phase-2-api-implementation-part-2.md` for the complete description of the crackable uploads pipeline and the high-level tasks involved. The tasks below are a more detailed breakdown of what must be completed before the crackable uploads pipeline can be implemented.
### Backend Models & DB

- Create `HashUploadTask` model:
  - Fields: `id`, `user_id`, `filename`, `status`, `started_at`, `finished_at`, `error_count`, `hash_list_id`, `campaign_id`
- Create `UploadErrorEntry` model:
  - Fields: `id`, `upload_id`, `line_number`, `raw_line`, `error_message`
- Add Alembic migrations for both models
### File Handling + Storage

- Create a new `UploadResourceFile` model:
  - Similar to `AttackResourceFile`, except it exists only to hold a file or text blob uploaded to the bucket, which is later downloaded by the processing task.
  - It might be possible to subclass `AttackResourceFile` to avoid duplicating code, to add an upload-only variant of the `AttackResourceFile` model, or to have the two share a common base class. The big difference is that the `UploadResourceFile` model serves no purpose other than storing the uploaded file and is never used for other attack resources, so there is never a `line_format` field.
- Return a presigned URL to let the user upload a file, or save the text blob in the `content` field of the `UploadResourceFile` model. This will require updates to `app/api/v1/endpoints/web/uploads.py`, `app/core/services/resource_service.py`, and `app/core/services/storage_service.py`. Triggering the upload creates a new `HashUploadTask` model, and the `UploadResourceFile` model is linked to it.
- Create a new `RawHash` model:
  - Fields: `id: int`, `hash: str`, `hash_type: HashType`, `username: str`, `metadata: dict[str, str]`, `line_number: int`, `upload_error_entry_id: int | None`
  - This will be used to store the raw hashes extracted from the file.
### Plugin Interface & Dispatch

The logical pipeline for the crackable uploads plugin is as follows:

1. The user initiates the upload task, which creates a `HashUploadTask` model.
2. The user uploads a file or text blob to the system, which is stored in the `UploadResourceFile` model.
3. A background task downloads the file to a temporary location and calls the `extract_hashes()` function on the appropriate plugin, selected by the file extension chosen during creation of the `HashUploadTask`, to extract the hashes from the file.
4. Each hash extracted from the file is added to the `HashUploadTask` model's `raw_hashes` field, a list of `RawHash` objects, each containing the hash and the hash type as identified by the `HashGuessService`.
5. A new `Campaign` and `HashList` are created with `is_unavailable` set to `True`, and the `HashUploadTask` is linked to the `Campaign` and `HashList`.
6. Each hash in the `raw_hashes` field is parsed and converted to the appropriate hashcat-compatible format using the `parse_hash_line()` function. The resulting formatted hash is added to the generated `HashList` model as a new `HashItem`, along with the `username` and `metadata` fields. The `metadata` field is a dictionary of key-value pairs extracted from the hash, as defined by the plugin. If there are errors, an `UploadErrorEntry` model is created to store the error message and the line number of the hash that caused the error.
7. If all raw hashes are successfully parsed, the `HashList`, `Campaign`, and `HashUploadTask` models are updated to reflect the status of the upload and processing. If there are no `UploadErrorEntry` objects and no unparsed hashes, the `UploadResourceFile` is marked for deletion.
8. The user is notified of the status of the upload and processing. If there are no errors, the `Campaign` and `HashList` models are updated to set `is_unavailable` to `False`, and the campaign status remains `DRAFT`. If there are errors, the campaign status remains `DRAFT` and `is_unavailable` stays `True` on the `HashList` and `Campaign` models, allowing the user to edit the hash list and campaign to fix the errors.
- Create `plugins/` folder with base interface: `def extract_hashes(path: Path) -> list[RawHash]: ...`
- Add `plugins/shadow_plugin.py` (first plugin implementation)
- Add dispatcher:
  - Loads the plugin based on the file extension (or the type selected by the user in the UI during upload task creation)
  - Validates that it implements `extract_hashes()`
- Raise and log a `PluginExecutionError` exception if a plugin fails
- Add tests for the plugin interface and dispatcher. The tests verify that the plugin is loaded and that the `extract_hashes()` function is implemented correctly. Use `shadow_plugin.py` as the reference plugin for the tests.
### Hash Parsing & Conversion

- Implement `def parse_hash_line(raw_hash: RawHash) -> ParsedHashLine | None: ...`
  - Validates format
  - Extracts: `username: str | None`, `hashcat_hash: str`, `metadata: dict[str, str]`
- Add a call to hash type guessing (use `HashGuessService` from `app.core.services.hash_guess_service`) in `parse_hash_line()`
- Enforce a type confidence threshold before inserting
- Create an initial reference plugin implementation, to be used for tests, that supports the standard Linux `shadow` file format with `sha512crypt` hashes. It should accept either a standard shadow file or a combined "unshadowed" file generated by the `unshadow` tool (see the unshadow man page for reference). Every plugin should be a Python file in the `plugins/` folder and a valid Python module. The plugin file should have been created in the previous set of tasks, so it just needs to be updated to implement the `extract_hashes()` function, along with a set of tests to verify the plugin works as expected.
### HashList + Campaign Creation

- Create ephemeral HashList:
  - Make a `HashList` with `hash_type` matching the most confident guess from the `HashGuessService`
  - Add a flag to the hash list: `is_unavailable`
  - Include all successfully parsed hashes as `HashItem` objects in the hash list
- Create Campaign under current user's project
  - Add a flag to the campaign: `is_unavailable`
- Ensure that `Campaign` and `HashList` models with `is_unavailable` set to `True` are not returned by the normal campaign and hash list endpoints.
### Task Runner + Status Updater

- Create background task: `def process_uploaded_hash_file(upload_id: int) -> None: ...`
- Ensure the background task executes the full processing pipeline described above, including creating the `HashList` and `Campaign` models, parsing the hashes, and creating the `HashItem` objects. The trigger for the background task should be the creation of the `HashUploadTask` model by the user in the `POST /api/v1/web/uploads/` endpoint. Verify all steps are implemented in the background task. `task_id: upload.integrate_background_task_pipeline`. The steps should be:
  1. Load the `HashUploadTask` model by ID
  2. Load the `UploadResourceFile` model by ID
  3. Download the file from the `UploadResourceFile` model to a temporary location
  4. Update the `HashUploadTask` model's status (e.g., `status=running`)
  5. Call the appropriate plugin to extract the hashes from the file and add them to the `HashUploadTask` model as `RawHash` objects
     - Log failed lines to the `UploadErrorEntry` model
     - If the file is not a valid hash file, set both the `HashUploadTask` and `UploadResourceFile` models to `status=failed` and do not continue with the processing pipeline
  6. Create the `HashList` and `Campaign` models with `is_unavailable` set to `True` and link them to the `HashUploadTask` model. The `HashList` model should be created with `hash_type` matching the most confident guess from the `HashGuessService`, and the `Campaign` model should be created under the current user's project.
  7. Parse the `RawHash` objects into `HashItem` objects and add them to the `HashList` model, with the `username` and `metadata` fields set from each `RawHash` object.
  8. Update the `HashList` and `Campaign` models to reflect the processing status (e.g., `is_unavailable=False`)
  9. Update the `HashUploadTask` model to `status=completed` if no errors were encountered; otherwise set it to `failed` or `partial_failure`, depending on whether some hashes were successfully parsed.
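The control flow of the steps above can be sketched as below, with a plain dataclass and a trivial `$6$`-prefix check standing in for the real models and plugin call; the function shape and status strings here are assumptions taken from the checklist, not the final implementation.

```python
from dataclasses import dataclass


@dataclass
class HashUploadTask:
    """Simplified stand-in for the HashUploadTask model."""
    id: int
    status: str = "pending"
    error_count: int = 0


def process_uploaded_hash_file(task: HashUploadTask, lines: list[str]) -> None:
    """Run the pipeline: mark running, extract/parse, record final status."""
    task.status = "running"
    parsed: list[str] = []
    failed: list[tuple[int, str]] = []   # would become UploadErrorEntry rows
    for number, line in enumerate(lines, start=1):
        if line.startswith("$6$"):       # stand-in for the plugin's parsing
            parsed.append(line)
        else:
            failed.append((number, line))
    task.error_count = len(failed)
    if not parsed:
        task.status = "failed"           # nothing usable in the file
    elif failed:
        task.status = "partial_failure"  # some lines parsed, some did not
    else:
        task.status = "completed"        # is_unavailable would flip to False
```

The three-way outcome (`completed` / `partial_failure` / `failed`) is the key invariant to test: it drives both the user notification and whether `is_unavailable` is cleared.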
### API Endpoints

- `POST /api/v1/web/uploads/`
  - Accepts a file upload
  - Triggers the background task
- `GET /api/v1/web/uploads/{id}/status`
  - Returns: `status`, `started_at`, `finished_at`, `error_count`
- `GET /api/v1/web/uploads/{id}/errors`
  - Returns the list of failed lines (paginated; derive from `PaginatedResponse` in `app.schemas.shared`)
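The paginated errors response can be sketched as a simple envelope. The field names below mirror a typical paginated-response shape and are assumptions, not the actual `PaginatedResponse` schema in `app.schemas.shared`.

```python
def paginate(items: list, page: int = 1, page_size: int = 20) -> dict:
    """Slice a list of error rows into one page plus pagination metadata."""
    start = (page - 1) * page_size
    return {
        "items": items[start:start + page_size],
        "total": len(items),
        "page": page,
        "page_size": page_size,
    }
```

Returning `total` alongside the slice lets the UI render page controls without a second count query per request.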
### Tests

- Unit tests for the plugin interface and dispatcher
- Hash parser + inference tests (use `HashGuessService` from `app.core.services.hash_guess_service`)
- Integration test: full upload flow with synthetic data
- Permission test: only allow uploads for authenticated users
### Security & Hardening

- Sanitize file names and restrict extensions (`shadow`, `.pdf`, `.zip`, `.7z`, `.docx`, etc.)
- Set the upload size limit in `app.core.config.settings` (e.g., `UPLOAD_MAX_SIZE = 100 * 1024 * 1024`, default 100 MB)
- Escape all user-visible error lines in the UI
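The sanitization and limit checks above can be sketched as follows; the allowlist entries (including treating `shadow` as a `.shadow` suffix), the character whitelist, and the error handling are illustrative assumptions, not the final policy.

```python
import re
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {".shadow", ".pdf", ".zip", ".7z", ".docx"}
UPLOAD_MAX_SIZE = 100 * 1024 * 1024  # mirrors the 100 MB settings default


def sanitize_filename(name: str) -> str:
    """Drop any path components and replace unsafe characters."""
    base = PurePosixPath(name.replace("\\", "/")).name
    return re.sub(r"[^A-Za-z0-9._-]", "_", base)


def validate_upload(name: str, size: int) -> str:
    """Return the sanitized name, or raise if the upload is not allowed."""
    safe = sanitize_filename(name)
    if PurePosixPath(safe).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not allowed: {safe}")
    if size > UPLOAD_MAX_SIZE:
        raise ValueError("file exceeds upload size limit")
    return safe
```

Stripping path components first means a name like `../../etc/passwd.zip` can never escape the upload directory, regardless of what the extension check decides.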
### UI Integration Prep

- Define the structure for status polling (`/api/v1/web/uploads/{id}/status`). This should return the status of the upload task, including the hash type, an extracted preview, and the validation state. It should also return the ID of the uploaded resource file, along with an upload task ID that can be used to view processing progress in the UI. Status and completion information should be returned for each step of the upload and processing pipeline so the UI can reflect the current state. `task_id: upload.status_polling_structure`
- Ensure error lines are returned in full, with line number and reason. The UI should display these via the `GET /api/v1/web/uploads/{id}/errors` endpoint as a paginated list of `UploadErrorEntry` objects (derive from `PaginatedResponse` in `app.schemas.shared`). `task_id: upload.error_lines_returned`
  - Write integration tests for the `GET /api/v1/web/uploads/{id}/errors` endpoint.
  - Write integration tests for the `UploadErrorEntry` model.
- Add an `is_unavailable` status to the Campaign and HashList models. This indicates that the hash list and campaign are still being processed and not yet ready to use. It should default to `False` for new campaigns and hash lists, be set to `True` only when created by the crackable upload task, and revert to `False` when processing completes. `task_id: upload.is_unavailable_status`
  - Write integration tests to ensure that the `is_unavailable` field is set to `True` when the crackable upload task is created and reverted to `False` when processing completes.
  - Write integration tests to ensure that unavailable campaigns and hash lists are not returned by the normal campaign and hash list endpoints.