AI Output Safety Guardrails

The better-stale-bot workflow constrains every GitHub write operation produced by the AI agent through a safe-outputs system — a declarative layer that caps operation counts, restricts allowed label values, and validates field contents before any mutations reach the GitHub API. All guardrails are defined in YAML frontmatter and compiled into the lock file; they cannot be bypassed at runtime.

Safe Output Types & Configuration

Issues & Discussions:

Safe Output	Key Configuration Options
`create-issue`	`auto-expiration` (expires field for time-limited issues), `group: true` (sub-issue grouping), `close-older-issues` (automatically close previous issues), `group-by-day`
`update-issue`	`target`, field update controls
`close-issue`	`target`, `required-labels`, `state-reason` (completed, not_planned, duplicate)
`link-sub-issue`	Parent-child issue relationships
`create-discussion`	Discussion board posting
`update-discussion`	Modify existing discussions
`close-discussion`	Close discussions

Pull Requests:

Safe Output	Key Configuration Options
`create-pull-request`	Branch, base, title, body configuration
`update-pull-request`	Field updates for existing PRs
`close-pull-request`	PR closure
`create-pull-request-review-comment`	PR review comments
`reply-to-pull-request-review-comment`	Review comment threading
`resolve-pull-request-review-thread`	Mark threads resolved
`add-reviewer`	Assign PR reviewers
`push-to-pull-request-branch`	Direct branch updates

Labels & Assignments:

Safe Output	Key Configuration Options
`add-comment`	`target` (triggering issue, "*", or number), `hide-older-comments`, `footer` control
`hide-comment`	Hide comments
`add-labels`	`allowed` list (restrict to specific labels), `blocked` patterns (glob support)
`remove-labels`	`allowed` list, `blocked` patterns (glob support)
`assign-milestone`	Milestone assignment
`assign-to-agent`	Agent assignment
`assign-to-user`	User assignment
`unassign-from-user`	Remove user assignment

Projects & Releases:

Safe Output	Key Configuration Options
`create-project`	GitHub Projects creation
`update-project`	Modify existing projects
`create-project-status-update`	Project status updates
`update-release`	Release management
`upload-asset`	Release asset uploads
`upload-artifact`	Workflow artifact uploads
`skip-archive`	Archive bypass

Security & Agent Tasks:

Safe Output	Key Configuration Options
`dispatch-workflow`	Trigger workflow runs
`call-workflow`	Workflow call integration
`dispatch_repository`	Repository dispatch events
`create-code-scanning-alert`	Create security alerts
`autofix-code-scanning-alert`	Auto-remediate security findings
`create-agent-session`	Agent session management

System Types (auto-enabled):

Safe Output	Key Configuration Options
`noop`	`message` field — Required when no action is taken; hard cap of 1 per run; `report-as-issue: false` disables automatic no-op reporting
`missing-tool`	System-generated for missing tool calls
`missing-data`	System-generated for data gaps

Custom:

Safe Output	Key Configuration Options
`jobs`	Custom post-processing job definitions
`actions`	GitHub Action wrapper integrations

Each item created by a safe output carries a hidden workflow-id marker so that it can be found by search. For workflow_call triggers, outputs such as created_issue_number and created_issue_url are injected automatically.

All safe output types support cross-repository operations via target-repo and allowed-repos.

Safe output operation counts can be capped per run (e.g., max: 30). The cap is configured in frontmatter, and changing it requires recompilation. The compiled lock file injects these caps into the agent's system prompt:

Tools: add_comment(max:30), close_issue(max:30), add_labels(max:30), remove_labels(max:30), …

The same limits are also enforced at the handler level in the safe_outputs job via GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG, which re-applies the config when processing the agent's JSONL output.

Changing caps: Edit the max: values in the frontmatter of your workflow .md file, then run gh aw compile. The lock file is the enforced artifact; the source .md alone is not active. Unlike edits to the markdown body (instructions), which take effect on the next run, changes to the safe-outputs frontmatter take effect only after recompilation.
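
As an illustration of the handler-level cap described above, the following is a minimal sketch, not the actual safe_outputs handler. It assumes each JSONL record carries a "type" field naming the safe output; that field name, the CAPS table, and the apply_caps helper are all hypothetical.

```python
import json
from collections import Counter

# Hypothetical per-type caps, mirroring the max: values from the frontmatter.
CAPS = {"add_comment": 30, "close_issue": 30, "add_labels": 30, "remove_labels": 30}


def apply_caps(jsonl_text, caps):
    """Split JSONL records into (accepted, rejected) according to per-type caps."""
    seen = Counter()
    accepted, rejected = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between records
        record = json.loads(line)
        kind = record.get("type")  # assumed field name, not confirmed by the source
        seen[kind] += 1
        # Types missing from the caps table get a cap of 0, so they are rejected.
        if seen[kind] <= caps.get(kind, 0):
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


if __name__ == "__main__":
    sample = "\n".join(
        json.dumps({"type": "add_comment", "body": f"comment {i}"}) for i in range(32)
    )
    ok, dropped = apply_caps(sample, CAPS)
    print(len(ok), len(dropped))  # prints: 30 2
```

Counting per type rather than per run matches the per-tool max values shown in the prompt, but whether the real handler drops, truncates, or fails on overflow is not stated here.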

Label Restrictions

The add-labels and remove-labels safe outputs support both allowed (allowlist) and blocked (blocklist) patterns using glob syntax:

add-labels:
  max: 30
  allowed: ["bug-*", "priority-*"]
  blocked: ["internal-*"]
remove-labels:
  max: 30
  allowed: ["Stale"]

The compiled workflow surfaces these constraints in tool descriptions so the model is aware of them at inference time:

add_labels → "CONSTRAINTS: Maximum 30 label(s) can be added. Only these labels are allowed: ["bug-*", "priority-*"]. Labels matching these patterns are blocked: ["internal-*"]."
remove_labels → "CONSTRAINTS: Maximum 30 label(s) can be removed. Only these labels can be removed: [Stale]."

The handler also re-validates the allowed and blocked lists before executing any label operation, ensuring constraints are enforced even if the model ignores its own tool description.
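
The re-validation step can be pictured with a small glob filter. This is a sketch under assumptions, not the handler's code: filter_labels is a hypothetical helper, matching is assumed to be case-sensitive, a blocked pattern is assumed to win over an allowed one, and an absent allowed list is assumed to permit anything that is not blocked.

```python
from fnmatch import fnmatchcase


def filter_labels(requested, allowed=None, blocked=None):
    """Return the labels that pass the allow and block patterns, in order."""
    kept = []
    for label in requested:
        if blocked and any(fnmatchcase(label, pat) for pat in blocked):
            continue  # assumption: a blocked pattern always wins
        if allowed and not any(fnmatchcase(label, pat) for pat in allowed):
            continue  # an allowed list is set and nothing matched
        kept.append(label)
    return kept


if __name__ == "__main__":
    print(
        filter_labels(
            ["bug-ui", "priority-high", "internal-ops", "question"],
            allowed=["bug-*", "priority-*"],
            blocked=["internal-*"],
        )
    )  # prints: ['bug-ui', 'priority-high']
```

With the remove-labels configuration shown earlier (allowed: ["Stale"]), the same function would keep only the exact label Stale.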

Field-Level Validation

The Write Safe Outputs Tools step emits a GH_AW_VALIDATION_JSON schema that constrains every field of every safe output:

Output	Key constraint
`add_comment`	`body`: string, sanitized, max 65,000 chars
`close_issue`	`body`: string, sanitized, max 65,000 chars; `issue_number`: optional positive integer
`add_labels` / `remove_labels`	`labels`: array of strings, each sanitized, max 128 chars
`noop`	`message`: string, sanitized, max 65,000 chars; hard cap of 1 per run
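
The constraints in the table above can be expressed as a small validator. This is a sketch only: the real schema lives in GH_AW_VALIDATION_JSON, sanitization is not modeled, the "type" field name is an assumption, and treating body and message as required is also an assumption (the table marks only issue_number as optional).

```python
MAX_BODY_CHARS = 65_000  # limit for body and message fields in the table above
MAX_LABEL_CHARS = 128    # per-label limit in the table above


def _check_text(value, name, limit):
    """Return a one-item problem list if value is not a string within limit."""
    if not isinstance(value, str):
        return [f"{name} must be a string"]
    if len(value) > limit:
        return [f"{name} exceeds {limit} characters"]
    return []


def validate_output(record):
    """Return a list of problems for one record; an empty list means valid."""
    problems = []
    kind = record.get("type")  # assumed field name
    if kind in ("add_comment", "close_issue"):
        problems += _check_text(record.get("body"), "body", MAX_BODY_CHARS)
    if kind == "close_issue" and "issue_number" in record:
        num = record["issue_number"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(num, bool) or not isinstance(num, int) or num <= 0:
            problems.append("issue_number must be a positive integer")
    if kind in ("add_labels", "remove_labels"):
        labels = record.get("labels")
        if not isinstance(labels, list):
            problems.append("labels must be an array")
        else:
            for label in labels:
                problems += _check_text(label, "label", MAX_LABEL_CHARS)
    if kind == "noop":
        problems += _check_text(record.get("message"), "message", MAX_BODY_CHARS)
    return problems
```

For example, validate_output({"type": "close_issue", "body": "done", "issue_number": 0}) returns a single problem about issue_number, while a well-formed add_comment record returns an empty list.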

Critical: The noop safe output must be called when the agent finishes without taking any GitHub action. If the agent completes without calling any safe-output tool, the workflow fails silently. By default, noop runs are posted as issues (the [aw] No-Op Runs issue), which is why agentic-workflows is included in the exempt label list. To disable this behavior, set report-as-issue: false in the noop safe-output configuration.
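
The noop rules can be summarized as a check over the outputs recorded for a run. The function below is a hypothetical illustration, not part of the workflow; it assumes records are dictionaries with a "type" field, and it does not decide whether a noop may appear alongside real actions, since that is not specified above.

```python
def check_run_outputs(records):
    """Classify a run's recorded outputs by the noop rules described above."""
    noops = [r for r in records if r.get("type") == "noop"]
    if not records:
        # The agent finished without calling any safe-output tool.
        return "error: no safe output recorded; call noop with a message"
    if len(noops) > 1:
        # noop has a hard cap of 1 per run.
        return "error: noop may be emitted at most once per run"
    if noops and len(records) == 1:
        return "ok: no-op run"
    return "ok: actions recorded"
```

A run that only records {"type": "noop", "message": "nothing stale"} is classified as a no-op run, while an empty list is flagged as an error.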

Advanced Configuration Options

Failure Handling:

report-failure-as-issue: false — suppress automatic failure issue creation
failure-issue-repo: owner/repo — redirect failure issues to a different repository
group-reports: true — group failed runs under a parent issue

Security & Limits:

allowed-domains — URL sanitization in output (restrict to specific domains)
max-bot-mentions — control bot trigger phrase escaping (default: 10)
max-patch-size — maximum patch size for PR operations (default: 1024 KB)

Infrastructure:

runs-on — custom runner image for safe output jobs
concurrency-group — concurrency control for safe outputs job execution
environment — deployment environment scoping for safe outputs

Customization:

messages — custom notification templates for various safe output events

Staged Mode

Users can preview what safe outputs a workflow would create without actually executing them by adding staged: true to the safe-outputs: block in the workflow configuration. This is useful for testing workflows before letting them take real actions on the repository.
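
Conceptually, staged mode turns the apply step into a dry run. The sketch below shows that idea only; apply_outputs and its execute callback are hypothetical names, and the real preview format produced by staged: true is not described here.

```python
def apply_outputs(records, execute, staged=False):
    """Apply each record via execute(), or only describe it when staged."""
    log = []
    for record in records:
        if staged:
            # Dry run: describe the action without touching the repository.
            log.append(f"[staged] would apply {record['type']}")
        else:
            execute(record)
            log.append(f"applied {record['type']}")
    return log


if __name__ == "__main__":
    calls = []
    print(apply_outputs([{"type": "add_labels"}], calls.append, staged=True))
    # prints: ['[staged] would apply add_labels']
    print(calls)  # prints: []
```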

Replaying Safe Outputs

If the safe_outputs job fails due to transient API errors or threat detection blocking, users can replay safe outputs from a previous run without re-running the entire agent workflow. Use the Agentic Maintenance workflow and provide the failed run URL to recover from failures and retry applying the safe outputs.

Exempt Issue Protection

The agent's instructions prevent it from ever targeting issues that carry any of these labels:

agentic-workflows, pinned, security, help wanted

Issues with an exempt label are excluded from Bucket B (potentially-stale) evaluation entirely, so the bot will never add Stale or close them regardless of inactivity duration.
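
In the workflow this exclusion lives in the agent's instructions rather than in code, but the rule itself is a simple set test. The sketch below expresses it as a predicate; the issue shape (a dictionary with "number" and a "labels" list of strings) and the stale_candidates helper are assumptions made for illustration.

```python
EXEMPT_LABELS = {"agentic-workflows", "pinned", "security", "help wanted"}


def stale_candidates(issues):
    """Drop any issue that carries at least one exempt label."""
    return [
        issue
        for issue in issues
        if EXEMPT_LABELS.isdisjoint(issue.get("labels", []))
    ]


if __name__ == "__main__":
    issues = [
        {"number": 1, "labels": ["bug"]},
        {"number": 2, "labels": ["bug", "pinned"]},
        {"number": 3, "labels": []},
    ]
    print([i["number"] for i in stale_candidates(issues)])  # prints: [1, 3]
```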

Defense-in-Depth Architecture

The guardrails are enforced at multiple layers, in order:

Agent prompt: safe-output tool declarations with inline constraints are injected at prompt time, so the model is steered toward compliant outputs.
MCP read-only server: the GitHub MCP server runs with GITHUB_READ_ONLY: "1" and only the issues toolset during the agent phase, making direct write calls impossible.
Safe Outputs MCP server: all write intents are funnelled through a separate HTTP MCP server that records outputs to a JSONL file rather than executing them immediately. The agent cannot write to GitHub directly; it only produces a structured artifact describing intended actions.
Threat detection job: a second Claude agent reviews all proposed outputs before any GitHub API call is made; the safe_outputs job only runs if detection succeeds. This AI-powered scan checks for prompt injection attacks, leaked credentials, and malicious code patterns.
Safe outputs handler: the safe_outputs job runs with scoped write permissions and re-validates every output against the full config (caps + label allowlist) before dispatching to the GitHub API. Only what the workflow permits is applied.
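
The last two layers above can be sketched as a gate followed by per-record validation. This is a simplified model under assumptions, not the compiled workflow: run_pipeline and its three callbacks (detect_threats, validate, dispatch) are hypothetical stand-ins for the threat detection job, the handler's config check, and the GitHub API call.

```python
def run_pipeline(proposed, detect_threats, validate, dispatch):
    """Dispatch proposed outputs only after detection and validation pass."""
    # Stage 1: threat detection gates the whole batch.
    if not detect_threats(proposed):
        return {
            "applied": 0,
            "skipped": len(proposed),
            "reason": "threat detection failed",
        }
    # Stage 2: each record is re-validated before it is dispatched.
    applied = 0
    for record in proposed:
        if validate(record):
            dispatch(record)
            applied += 1
    return {"applied": applied, "skipped": len(proposed) - applied, "reason": None}
```

The point of the ordering is that a failed detection stops everything before dispatch is ever called, while a record that fails validation is skipped without blocking the records that pass.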

Note: better-stale-bot defaults to Claude Haiku with engine: { id: claude, model: haiku } for both the agent and threat detection jobs.

Key files:

.github/workflows/better-stale-bot.md — source of truth for guardrail configuration (frontmatter)
.github/workflows/better-stale-bot.lock.yml — compiled enforcement artifact; do not edit directly