AI Output Safety Guardrails
The better-stale-bot workflow constrains every GitHub write operation produced by the AI agent through a safe-outputs system — a declarative layer that caps operation counts, restricts allowed label values, and validates field contents before any mutations reach the GitHub API. All guardrails are defined in YAML frontmatter and compiled into the lock file; they cannot be bypassed at runtime.
Safe Output Types & Configuration
Issues & Discussions:
| Safe Output | Key Configuration Options |
|---|---|
| create-issue | auto-expiration (expires field for time-limited issues), group: true (sub-issue grouping), close-older-issues (automatically close previous issues), group-by-day |
| update-issue | target, field update controls |
| close-issue | target, required-labels, state-reason (completed, not_planned, duplicate) |
| link-sub-issue | Parent-child issue relationships |
| create-discussion | Discussion board posting |
| update-discussion | Modify existing discussions |
| close-discussion | Close discussions |
Pull Requests:
| Safe Output | Key Configuration Options |
|---|---|
| create-pull-request | Branch, base, title, body configuration |
| update-pull-request | Field updates for existing PRs |
| close-pull-request | PR closure |
| create-pull-request-review-comment | PR review comments |
| reply-to-pull-request-review-comment | Review comment threading |
| resolve-pull-request-review-thread | Mark threads resolved |
| add-reviewer | Assign PR reviewers |
| push-to-pull-request-branch | Direct branch updates |
Labels & Assignments:
| Safe Output | Key Configuration Options |
|---|---|
| add-comment | target (triggering issue, "*", or number), hide-older-comments, footer control |
| hide-comment | Hide comments |
| add-labels | allowed list (restrict to specific labels), blocked patterns (glob support) |
| remove-labels | allowed list, blocked patterns (glob support) |
| assign-milestone | Milestone assignment |
| assign-to-agent | Agent assignment |
| assign-to-user | User assignment |
| unassign-from-user | Remove user assignment |
Projects & Releases:
| Safe Output | Key Configuration Options |
|---|---|
| create-project | GitHub Projects creation |
| update-project | Modify existing projects |
| create-project-status-update | Project status updates |
| update-release | Release management |
| upload-asset | Release asset uploads |
| upload-artifact | Workflow artifact uploads |
| skip-archive | Archive bypass |
Security & Agent Tasks:
| Safe Output | Key Configuration Options |
|---|---|
| dispatch-workflow | Trigger workflow runs |
| call-workflow | Workflow call integration |
| dispatch_repository | Repository dispatch events |
| create-code-scanning-alert | Create security alerts |
| autofix-code-scanning-alert | Auto-remediate security findings |
| create-agent-session | Agent session management |
System Types (auto-enabled):
| Safe Output | Key Configuration Options |
|---|---|
| noop | message field, required when no action is taken; hard cap of 1 per run; report-as-issue: false disables automatic no-op reporting |
| missing-tool | System-generated for missing tool calls |
| missing-data | System-generated for data gaps |
Custom:
| Safe Output | Key Configuration Options |
|---|---|
| jobs | Custom post-processing job definitions |
| actions | GitHub Action wrapper integrations |
Each safe output includes a hidden workflow-id marker (<!-- gh-aw-workflow-id: WORKFLOW_NAME -->) for searchability. For workflow_call triggers, outputs are auto-injected (created_issue_number, created_issue_url, etc.).
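As a rough illustration of how the hidden marker supports searchability, the sketch below pulls the workflow id out of an issue or comment body. Only the marker format comes from the text above; the function name `extract_workflow_id` and the regular expression are assumptions made for this example.

```python
import re

# Matches the hidden marker format quoted above; the pattern itself is an assumption.
MARKER_RE = re.compile(r"<!--\s*gh-aw-workflow-id:\s*(\S+)\s*-->")


def extract_workflow_id(body):
    """Return the workflow id embedded in a body, or None if no marker is present."""
    match = MARKER_RE.search(body)
    return match.group(1) if match else None
```

A caller could use this to decide whether a given issue was created by a particular workflow before acting on it.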
All safe output types support cross-repository operations via target-repo and allowed-repos.
Safe output operation counts can be capped per run (e.g., max: 30). The maximum value is configured in frontmatter and requires recompilation to change. The compiled lock file injects these caps into the agent's system prompt as:
Tools: add_comment(max:30), close_issue(max:30), add_labels(max:30), remove_labels(max:30), …
The same limits are also enforced at the handler level in the safe_outputs job via GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG, which re-applies the config when processing the agent's JSONL output.
Changing caps: Edit the max: values in the frontmatter of your workflow .md file, then run gh aw compile. The lock file is the enforced artifact; the source .md alone is not active. Changes to safe-outputs configuration in frontmatter require workflow recompilation to take effect, unlike changes to the markdown body (instructions), which take effect on the next run.
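To make the handler-level cap concrete, here is a minimal sketch of per-type counting over the agent's JSONL output. It is not the actual handler: the `type` field name, the `CAPS` table, and the choice to reject (rather than fail the run on) over-cap records are all assumptions for illustration.

```python
import json
from collections import Counter

# Hypothetical caps mirroring the max: values shown above (assumed, not read from a lock file).
CAPS = {"add_comment": 30, "close_issue": 30, "add_labels": 30, "remove_labels": 30, "noop": 1}


def apply_caps(jsonl_text, caps=CAPS):
    """Split JSONL records into (accepted, rejected) according to per-type caps."""
    seen = Counter()
    accepted, rejected = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        kind = record.get("type")  # assumed field name for the output type
        seen[kind] += 1
        # Types that are not configured get a cap of 0, so they are always rejected.
        if seen[kind] <= caps.get(kind, 0):
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected
```

Counting at this stage is what makes the cap independent of the prompt: even if the model ignores the limit it was told about, the extra records never reach the API.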
Label Restrictions
The add-labels and remove-labels safe outputs support both allowed (whitelist) and blocked (blacklist) patterns using glob syntax:
add-labels:
  max: 30
  allowed: ["bug-*", "priority-*"]
  blocked: ["internal-*"]
remove-labels:
  max: 30
  allowed: ["Stale"]
The compiled workflow surfaces these constraints in tool descriptions so the model is aware at inference time:
- add_labels → "CONSTRAINTS: Maximum 30 label(s) can be added. Only these labels are allowed: ["bug-*", "priority-*"]. Labels matching these patterns are blocked: ["internal-*"]."
- remove_labels → "CONSTRAINTS: Maximum 30 label(s) can be removed. Only these labels can be removed: [Stale]."
The handler also re-validates the allowed and blocked lists before executing any label operation, ensuring constraints are enforced even if the model ignores its own tool description.
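The re-validation step can be pictured with a short sketch built on Python's standard `fnmatch` module. This is an illustration rather than the handler's code: the function name, the case-sensitive matching, the rule that blocked patterns win over allowed ones, and truncating at the cap are all assumptions.

```python
from fnmatch import fnmatchcase


def filter_labels(labels, allowed=None, blocked=None, max_count=30):
    """Return the labels that pass the blocked list, the allowed list, and the cap."""
    kept = []
    for label in labels:
        # Assumption: a blocked pattern rejects a label even if it is also allowed.
        if blocked and any(fnmatchcase(label, pattern) for pattern in blocked):
            continue
        # When an allowed list is configured, the label must match at least one pattern.
        if allowed and not any(fnmatchcase(label, pattern) for pattern in allowed):
            continue
        kept.append(label)
    # Assumption: labels beyond the cap are dropped rather than failing the whole call.
    return kept[:max_count]
```

With the configuration shown above, a request for `["bug-ui", "internal-x", "docs", "priority-high"]` would keep only the `bug-*` and `priority-*` labels.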
Field-Level Validation
The Write Safe Outputs Tools step emits a GH_AW_VALIDATION_JSON schema that constrains every field of every safe output:
| Output | Key constraint |
|---|---|
| add_comment | body: string, sanitized, max 65,000 chars |
| close_issue | body: string, sanitized, max 65,000 chars; issue_number: optional positive integer |
| add_labels / remove_labels | labels: array of strings, each sanitized, max 128 chars |
| noop | message: string, sanitized, max 65,000 chars; hard cap of 1 per run |
Critical: The noop safe output must be called when the agent finishes without taking any GitHub action. If the agent completes without calling any safe-output tool, the workflow fails silently. By default, noop runs are posted as issues (the [aw] No-Op Runs issue), which is why agentic-workflows is included in the exempt label list. To disable this behavior, set report-as-issue: false in the noop safe-output configuration.
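The field constraints in the table above can be read as simple type and length checks. The sketch below encodes them under stated assumptions: the record shape (a dict with a `type` key), the function name, and the error messages are hypothetical, and the sanitization step is deliberately left out because its rules are not described here.

```python
MAX_TEXT_CHARS = 65_000   # body and message limit from the table
MAX_LABEL_CHARS = 128     # per-label limit from the table


def validate_output(record):
    """Return a list of problems for one safe-output record; an empty list means valid."""
    problems = []
    kind = record.get("type")  # assumed field name for the output type

    if kind in ("add_comment", "close_issue"):
        body = record.get("body")
        if not isinstance(body, str) or len(body) > MAX_TEXT_CHARS:
            problems.append("body must be a string of at most 65,000 chars")

    if kind == "close_issue" and "issue_number" in record:
        number = record["issue_number"]
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(number, bool) or not isinstance(number, int) or number <= 0:
            problems.append("issue_number must be a positive integer")

    if kind in ("add_labels", "remove_labels"):
        labels = record.get("labels")
        if not isinstance(labels, list) or not all(
            isinstance(label, str) and len(label) <= MAX_LABEL_CHARS for label in labels
        ):
            problems.append("labels must be a list of strings of at most 128 chars each")

    if kind == "noop":
        message = record.get("message")
        if not isinstance(message, str) or len(message) > MAX_TEXT_CHARS:
            problems.append("message must be a string of at most 65,000 chars")

    return problems
```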
Advanced Configuration Options
Failure Handling:
- report-failure-as-issue: false (suppress automatic failure issue creation)
- failure-issue-repo: owner/repo (redirect failure issues to a different repository)
- group-reports: true (group failed runs under a parent issue)
Security & Limits:
- allowed-domains (URL sanitization in output; restrict to specific domains)
- max-bot-mentions (control bot trigger phrase escaping; default: 10)
- max-patch-size (maximum patch size for PR operations; default: 1024 KB)
Infrastructure:
- runs-on (custom runner image for safe output jobs)
- concurrency-group (concurrency control for safe outputs job execution)
- environment (deployment environment scoping for safe outputs)
Customization:
- messages (custom notification templates for various safe output events)
Staged Mode
Users can preview what safe outputs a workflow would create without actually executing them by adding staged: true to the safe-outputs: block in the workflow configuration. This is useful for testing workflows before letting them take real actions on the repository.
Replaying Safe Outputs
If the safe_outputs job fails due to transient API errors or threat detection blocking, users can replay safe outputs from a previous run without re-running the entire agent workflow. Use the Agentic Maintenance workflow and provide the failed run URL to recover from failures and retry applying the safe outputs.
Exempt Issue Protection
The agent's instructions prevent it from ever targeting issues that carry any of these labels:
agentic-workflows, pinned, security, help wanted
Issues with an exempt label are excluded from Bucket B (potentially-stale) evaluation entirely, so the bot will never add Stale or close them regardless of inactivity duration.
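The exemption rule lives in the agent's instructions rather than in code, but the predicate it describes is easy to state. The sketch below is only an equivalent check for illustration; the names `EXEMPT_LABELS` and `is_exempt` are hypothetical, and exact, case-sensitive label matching is an assumption.

```python
# Exempt labels as listed above; exact, case-sensitive matching is assumed.
EXEMPT_LABELS = {"agentic-workflows", "pinned", "security", "help wanted"}


def is_exempt(issue_labels):
    """Return True if the issue carries at least one exempt label."""
    return not EXEMPT_LABELS.isdisjoint(issue_labels)
```

An issue for which `is_exempt` returns True would be skipped before any staleness evaluation takes place.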
Defense-in-Depth Architecture
The guardrails are enforced at multiple layers, in order:
- Agent prompt: safe-output tool declarations with inline constraints are injected at prompt time, so the model is steered toward compliant outputs.
- MCP read-only server: the GitHub MCP server runs with GITHUB_READ_ONLY: "1" and only the issues toolset during the agent phase, making direct write calls impossible.
- Safe Outputs MCP server: all write intents are funnelled through a separate HTTP MCP server that records outputs to a JSONL file rather than executing them immediately. The agent cannot write to GitHub directly; it only produces a structured artifact describing intended actions.
- Threat detection job: a second Claude agent reviews all proposed outputs before any GitHub API call is made; the safe_outputs job only runs if detection succeeds. This AI-powered scan checks for prompt injection attacks, leaked credentials, and malicious code patterns.
- Safe outputs handler: the safe_outputs job runs with scoped write permissions and re-validates every output against the full config (caps + label allowlist) before dispatching to the GitHub API. Only what the workflow permits is applied.
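The last three layers follow a record, gate, then apply pattern, which the sketch below tries to capture. It is a simplified model and not the workflow's implementation: the function names, the in-memory list standing in for the JSONL file, and the `detect`, `validate`, and `apply` callables are all hypothetical.

```python
import json


def record_intent(log, kind, **fields):
    """Record a write intent as one JSONL line instead of calling GitHub."""
    log.append(json.dumps({"type": kind, **fields}))


def run_pipeline(log, detect, validate, apply):
    """Apply recorded intents only if detection passes, re-validating each one."""
    records = [json.loads(line) for line in log]
    if not detect(records):
        return []  # detection failed, so nothing reaches the GitHub API
    applied = []
    for record in records:
        if validate(record):  # handler-level re-validation against the config
            apply(record)
            applied.append(record)
    return applied
```

The point of the separation is that the agent only ever calls something like `record_intent`; the code holding write permissions runs later and sees a plain data file it can inspect in full before acting.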
Note: better-stale-bot defaults to Claude Haiku with engine: { id: claude, model: haiku } for both the agent and threat detection jobs.
Key files:
- .github/workflows/better-stale-bot.md: source of truth for guardrail configuration (frontmatter)
- .github/workflows/better-stale-bot.lock.yml: compiled enforcement artifact; do not edit directly