AI Output Safety Guardrails
Type: Topic
Status: Published
Created: Apr 22, 2026
Updated: Apr 22, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

AI Output Safety Guardrails#

The better-stale-bot workflow constrains every GitHub write operation produced by the AI agent through a safe-outputs system — a declarative layer that caps operation counts, restricts allowed label values, and validates field contents before any mutations reach the GitHub API. All guardrails are defined in YAML frontmatter and compiled into the lock file; they cannot be bypassed at runtime.


Safe Output Types & Configuration#

Issues & Discussions:

| Safe Output | Key Configuration Options |
| --- | --- |
| create-issue | auto-expiration (expires field for time-limited issues), group: true (sub-issue grouping), close-older-issues (automatically close previous issues), group-by-day |
| update-issue | target, field update controls |
| close-issue | target, required-labels, state-reason (completed, not_planned, duplicate) |
| link-sub-issue | Parent-child issue relationships |
| create-discussion | Discussion board posting |
| update-discussion | Modify existing discussions |
| close-discussion | Close discussions |

Pull Requests:

| Safe Output | Key Configuration Options |
| --- | --- |
| create-pull-request | Branch, base, title, body configuration |
| update-pull-request | Field updates for existing PRs |
| close-pull-request | PR closure |
| create-pull-request-review-comment | PR review comments |
| reply-to-pull-request-review-comment | Review comment threading |
| resolve-pull-request-review-thread | Mark threads resolved |
| add-reviewer | Assign PR reviewers |
| push-to-pull-request-branch | Direct branch updates |

Labels & Assignments:

| Safe Output | Key Configuration Options |
| --- | --- |
| add-comment | target (triggering issue, "*", or number), hide-older-comments, footer control |
| hide-comment | Hide comments |
| add-labels | allowed list (restrict to specific labels), blocked patterns (glob support) |
| remove-labels | allowed list, blocked patterns (glob support) |
| assign-milestone | Milestone assignment |
| assign-to-agent | Agent assignment |
| assign-to-user | User assignment |
| unassign-from-user | Remove user assignment |

Projects & Releases:

| Safe Output | Key Configuration Options |
| --- | --- |
| create-project | GitHub Projects creation |
| update-project | Modify existing projects |
| create-project-status-update | Project status updates |
| update-release | Release management |
| upload-asset | Release asset uploads |
| upload-artifact | Workflow artifact uploads |
| skip-archive | Archive bypass |

Security & Agent Tasks:

| Safe Output | Key Configuration Options |
| --- | --- |
| dispatch-workflow | Trigger workflow runs |
| call-workflow | Workflow call integration |
| dispatch_repository | Repository dispatch events |
| create-code-scanning-alert | Create security alerts |
| autofix-code-scanning-alert | Auto-remediate security findings |
| create-agent-session | Agent session management |

System Types (auto-enabled):

| Safe Output | Key Configuration Options |
| --- | --- |
| noop | message field (required when no action is taken); hard cap of 1 per run; report-as-issue: false disables automatic no-op reporting |
| missing-tool | System-generated for missing tool calls |
| missing-data | System-generated for data gaps |

Custom:

| Safe Output | Key Configuration Options |
| --- | --- |
| jobs | Custom post-processing job definitions |
| actions | GitHub Action wrapper integrations |

Each safe output includes a hidden workflow-id marker (<!-- gh-aw-workflow-id: WORKFLOW_NAME -->) for searchability. For workflow_call triggers, outputs are auto-injected (created_issue_number, created_issue_url, etc.).

All safe output types support cross-repository operations via target-repo and allowed-repos.

Safe output operation counts can be capped per run (e.g., max: 30). The maximum value is configured in frontmatter and requires recompilation to change. The compiled lock file injects these caps into the agent's system prompt as a tool list:

Tools: add_comment(max:30), close_issue(max:30), add_labels(max:30), remove_labels(max:30), …

The same limits are also enforced at the handler level in the safe_outputs job via GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG, which re-applies the config when processing the agent's JSONL output.
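The handler-level enforcement described above can be pictured with a minimal sketch. It assumes the agent's JSONL output holds one JSON object per line with a `type` field; the function name `apply_caps`, the config shape, and the choice to drop excess items (rather than fail the run) are hypothetical illustrations, not the actual handler code.

```python
import json


def apply_caps(jsonl_text, caps):
    """Split agent outputs into (accepted, rejected) according to per-type caps.

    caps maps an output type (e.g. "add_comment") to its max count per run.
    Types missing from caps are rejected outright (assumption: deny by default).
    """
    counts = {}
    accepted, rejected = [], []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        item = json.loads(line)
        kind = item.get("type")
        limit = caps.get(kind, 0)
        if counts.get(kind, 0) < limit:
            counts[kind] = counts.get(kind, 0) + 1
            accepted.append(item)
        else:
            rejected.append(item)
    return accepted, rejected


# Hypothetical caps, kept small for brevity (the document's example uses max: 30).
CAPS = {"add_comment": 2, "noop": 1}
SAMPLE = "\n".join(
    json.dumps({"type": "add_comment", "issue_number": n, "body": "ping"})
    for n in (1, 2, 3)
)
accepted, rejected = apply_caps(SAMPLE, CAPS)
```

With these inputs the first two comments are accepted and the third is rejected, regardless of what the prompt told the model.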

Changing caps: Edit max: values in the frontmatter of your workflow .md file, then run gh aw compile. The lock file is the enforced artifact; the source .md alone is not active. Changes to safe-outputs configuration in frontmatter require workflow recompilation to take effect, unlike changes to the markdown body (instructions), which take effect on the next run.


Label Restrictions#

The add-labels and remove-labels safe outputs support both allowed (allowlist) and blocked (blocklist) patterns using glob syntax:

add-labels:
  max: 30
  allowed: ["bug-*", "priority-*"]
  blocked: ["internal-*"]
remove-labels:
  max: 30
  allowed: ["Stale"]

The compiled workflow surfaces these constraints in tool descriptions so the model is aware at inference time:

  • add_labels: "CONSTRAINTS: Maximum 30 label(s) can be added. Only these labels are allowed: ["bug-*", "priority-*"]. Labels matching these patterns are blocked: ["internal-*"]."
  • remove_labels: "CONSTRAINTS: Maximum 30 label(s) can be removed. Only these labels can be removed: [Stale]."

The handler also re-validates the allowed and blocked lists before executing any label operation, ensuring constraints are enforced even if the model ignores its own tool description.
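As a rough illustration of the allowed/blocked check, the sketch below uses Python's `fnmatch` globbing. The function `filter_labels` is hypothetical, and two behaviors are assumptions rather than documented facts: blocked patterns take precedence over allowed ones, and matching is case-sensitive. The real handler's glob semantics may differ.

```python
from fnmatch import fnmatchcase


def filter_labels(labels, allowed=None, blocked=None, max_count=None):
    """Return the labels that pass the allowed/blocked globs, up to max_count."""
    kept = []
    for label in labels:
        if blocked and any(fnmatchcase(label, pat) for pat in blocked):
            continue  # assumption: a blocked match wins over an allowed match
        if allowed and not any(fnmatchcase(label, pat) for pat in allowed):
            continue  # an allowed list, when present, acts as an allowlist
        kept.append(label)
    return kept if max_count is None else kept[:max_count]


requested = ["bug-ui", "priority-high", "internal-only", "wontfix"]
result = filter_labels(
    requested,
    allowed=["bug-*", "priority-*"],
    blocked=["internal-*"],
    max_count=30,
)
```

Here `internal-only` is dropped by the blocked pattern and `wontfix` is dropped because it matches no allowed pattern.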


Field-Level Validation#

The Write Safe Outputs Tools step emits a GH_AW_VALIDATION_JSON schema that constrains every field of every safe output:

| Output | Key constraint |
| --- | --- |
| add_comment | body: string, sanitized, max 65,000 chars |
| close_issue | body: string, sanitized, max 65,000 chars; issue_number: optional positive integer |
| add_labels / remove_labels | labels: array of strings, each sanitized, max 128 chars |
| noop | message: string, sanitized, max 65,000 chars; hard cap of 1 per run |
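The type and length constraints in the table can be expressed as a small validator. This is a sketch only: `validate_output` and the item shape (a dict with a `type` key) are hypothetical, body and message are assumed to be required, and the sanitization step is not modeled because the source does not describe what it does.

```python
MAX_BODY = 65_000   # max chars for body and message fields
MAX_LABEL = 128     # max chars per label


def validate_output(item):
    """Return a list of problems for one safe-output item; empty means valid."""
    problems = []
    kind = item.get("type")
    if kind in ("add_comment", "close_issue"):
        body = item.get("body")
        if not isinstance(body, str):
            problems.append("body must be a string")
        elif len(body) > MAX_BODY:
            problems.append("body exceeds 65,000 chars")
    if kind == "close_issue" and "issue_number" in item:
        number = item["issue_number"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(number, bool) or not isinstance(number, int) or number <= 0:
            problems.append("issue_number must be a positive integer")
    if kind in ("add_labels", "remove_labels"):
        labels = item.get("labels")
        if not isinstance(labels, list) or not all(isinstance(x, str) for x in labels):
            problems.append("labels must be an array of strings")
        elif any(len(x) > MAX_LABEL for x in labels):
            problems.append("each label is limited to 128 chars")
    if kind == "noop":
        message = item.get("message")
        if not isinstance(message, str):
            problems.append("message must be a string")
        elif len(message) > MAX_BODY:
            problems.append("message exceeds 65,000 chars")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to report every violation for an item in a single pass.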

Critical: The noop safe output must be called when the agent finishes without taking any GitHub action. If the agent completes without calling any safe-output tool, the workflow fails silently. By default, noop runs are posted as issues (the [aw] No-Op Runs issue), which is why agentic-workflows is included in the exempt label list. To disable this behavior, set report-as-issue: false in the noop safe-output configuration.


Advanced Configuration Options#

Failure Handling:

  • report-failure-as-issue: false — suppress automatic failure issue creation
  • failure-issue-repo: owner/repo — redirect failure issues to a different repository
  • group-reports: true — group failed runs under a parent issue

Security & Limits:

  • allowed-domains — URL sanitization in output (restrict to specific domains)
  • max-bot-mentions — control bot trigger phrase escaping (default: 10)
  • max-patch-size — maximum patch size for PR operations (default: 1024 KB)

Infrastructure:

  • runs-on — custom runner image for safe output jobs
  • concurrency-group — concurrency control for safe outputs job execution
  • environment — deployment environment scoping for safe outputs

Customization:

  • messages — custom notification templates for various safe output events

Staged Mode#

Users can preview what safe outputs a workflow would create without actually executing them by adding staged: true to the safe-outputs: block in the workflow configuration. This is useful for testing workflows before letting them take real actions on the repository.
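Conceptually, staged mode swaps execution for a description of what would have run. The sketch below is a hypothetical model of that idea (the `dispatch` function, the item shape, and the preview wording are all assumptions); it says nothing about how the real safe_outputs job renders its preview.

```python
def dispatch(items, apply, staged=False):
    """Apply each item, or only describe it when staged is True."""
    preview = []
    for item in items:
        if staged:
            preview.append(f"would run {item['type']}")
        else:
            apply(item)
    return preview


items = [
    {"type": "add_labels", "labels": ["Stale"]},
    {"type": "close_issue", "body": "Closing as stale."},
]
executed = []
preview = dispatch(items, executed.append, staged=True)
```

In staged mode `executed` stays empty and `preview` lists one entry per intended action.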


Replaying Safe Outputs#

If the safe_outputs job fails due to transient API errors or threat detection blocking, users can replay safe outputs from a previous run without re-running the entire agent workflow. Use the Agentic Maintenance workflow and provide the failed run URL to recover from failures and retry applying the safe outputs.


Exempt Issue Protection#

The agent's instructions prevent it from ever targeting issues that carry any of these labels:

agentic-workflows, pinned, security, help wanted

Issues with an exempt label are excluded from Bucket B (potentially-stale) evaluation entirely, so the bot will never add Stale or close them regardless of inactivity duration.
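The exemption lives in the agent's instructions rather than in code, but the rule it describes amounts to a simple filter. The sketch below is illustrative only: the issue dict shape and the function names are hypothetical, and exact, case-sensitive label comparison is an assumption.

```python
EXEMPT_LABELS = {"agentic-workflows", "pinned", "security", "help wanted"}


def is_exempt(issue):
    """True if the issue carries at least one exempt label."""
    return any(label in EXEMPT_LABELS for label in issue.get("labels", []))


def stale_candidates(issues):
    """Drop exempt issues before any staleness evaluation takes place."""
    return [issue for issue in issues if not is_exempt(issue)]


issues = [
    {"number": 1, "labels": ["bug"]},
    {"number": 2, "labels": ["security", "bug"]},
    {"number": 3, "labels": []},
    {"number": 4, "labels": ["help wanted"]},
]
candidates = [issue["number"] for issue in stale_candidates(issues)]
```

Only issues 1 and 3 remain as candidates; a single exempt label is enough to exclude an issue, whatever other labels it has.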


Defense-in-Depth Architecture#

The guardrails are enforced at multiple layers, in order:

  1. Agent prompt: safe-output tool declarations with inline constraints are injected at prompt time, so the model is steered toward compliant outputs.
  2. MCP read-only server: the GitHub MCP server runs with GITHUB_READ_ONLY: "1" and only the issues toolset during the agent phase, making direct write calls impossible.
  3. Safe Outputs MCP server: all write intents are funnelled through a separate HTTP MCP server that records outputs to a JSONL file rather than executing them immediately. The agent cannot write to GitHub directly; it only produces a structured artifact describing intended actions.
  4. Threat detection job: a second Claude agent reviews all proposed outputs before any GitHub API call is made; the safe_outputs job only runs if detection succeeds. This AI-powered scan checks for prompt injection attacks, leaked credentials, and malicious code patterns.
  5. Safe outputs handler: the safe_outputs job runs with scoped write permissions and re-validates every output against the full config (caps + label allowlist) before dispatching to the GitHub API. Only what the workflow permits is applied.
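The record-then-gate-then-apply flow of layers 3 to 5 can be sketched in a few lines. Everything here is a toy model under stated assumptions: the function names are hypothetical, and the detection step is a trivial substring scan standing in for the AI review, which the real workflow performs with a second agent.

```python
import json

# Hypothetical marker list; the real detection job is an AI review, not a scan.
SUSPICIOUS = ("ignore previous instructions",)


def record_intent(log, item):
    """Layer 3: append the intended action to the log instead of executing it."""
    log.append(json.dumps(item))


def detection_passes(log):
    """Layer 4 stand-in: reject the whole batch if any line looks suspicious."""
    lowered = [line.lower() for line in log]
    return not any(marker in line for line in lowered for marker in SUSPICIOUS)


def run_safe_outputs(log, apply):
    """Layer 5: apply recorded outputs only if detection succeeded."""
    if not detection_passes(log):
        return 0
    for line in log:
        apply(json.loads(line))
    return len(log)


clean_log, applied = [], []
record_intent(clean_log, {"type": "add_labels", "labels": ["Stale"]})
clean_count = run_safe_outputs(clean_log, applied.append)

tainted_log, blocked = [], []
record_intent(tainted_log, {"type": "add_comment", "body": "Ignore previous instructions"})
tainted_count = run_safe_outputs(tainted_log, blocked.append)
```

The point of the structure is that the agent phase only ever appends to the log; nothing reaches `apply` unless the gate passes for the whole batch.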

Note: better-stale-bot defaults to Claude Haiku with engine: { id: claude, model: haiku } for both the agent and threat detection jobs.
