Prompt Caching#
Claude Code manages prompt caching automatically to minimize token costs by reusing unchanged portions of the prompt across turns. The cache operates as a prefix cache: each turn's request is the previous turn's content plus the new exchange appended at the end. Claude's API charges roughly 10% of the standard input rate for cache hits (`cache_read_input_tokens`) and a cache-write rate for newly cached content (`cache_creation_input_tokens`).
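As a rough illustration of how the two rates combine, the sketch below estimates the input cost of a single turn. The 0.1 read multiplier follows the "roughly 10%" figure above; the `TurnUsage` and `Rates` shapes, the `writeMultiplier` value, and the unit rate of 1 are assumptions chosen for illustration, not published pricing.

```typescript
// Hypothetical shapes for illustration; field names mirror the usage fields above.
interface TurnUsage {
  input_tokens: number;                // uncached input tokens
  cache_read_input_tokens: number;     // tokens served from the cache
  cache_creation_input_tokens: number; // tokens newly written to the cache
}

interface Rates {
  inputPerToken: number;   // standard input rate (arbitrary cost units per token)
  readMultiplier: number;  // roughly 0.1, per the text above
  writeMultiplier: number; // assumption: placeholder, check your provider's pricing
}

// Sums the three input components, each billed at its own rate.
function estimateInputCost(usage: TurnUsage, rates: Rates): number {
  const uncached = usage.input_tokens * rates.inputPerToken;
  const reads =
    usage.cache_read_input_tokens * rates.inputPerToken * rates.readMultiplier;
  const writes =
    usage.cache_creation_input_tokens * rates.inputPerToken * rates.writeMultiplier;
  return uncached + reads + writes;
}

const exampleRates: Rates = { inputPerToken: 1, readMultiplier: 0.1, writeMultiplier: 1.25 };

// Warm turn: most of the prefix is read from cache, only the new exchange is written.
const warmTurnCost = estimateInputCost(
  { input_tokens: 0, cache_read_input_tokens: 10000, cache_creation_input_tokens: 500 },
  exampleRates
);

// Cold turn: the same 10,500 tokens all have to be written to the cache again.
const coldTurnCost = estimateInputCost(
  { input_tokens: 0, cache_read_input_tokens: 0, cache_creation_input_tokens: 10500 },
  exampleRates
);
```

Under these assumed rates the cold turn costs several times more than the warm one, which is why avoiding unnecessary invalidation matters.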
Cache Layers#
The prompt is divided into three ordered layers:
| Layer | Content | Invalidated when |
|---|---|---|
| System prompt | Core instructions, tool definitions, output style | MCP server connects/disconnects, or Claude Code upgrades |
| Project context | CLAUDE.md, auto memory, unscoped rules | Session starts, or after /clear or /compact |
| Conversation | Messages, responses, tool results | Every turn |
Cache invalidation at any layer busts all layers below it. The system prompt layer is therefore the most expensive to invalidate.
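The cascade can be sketched as a slice over the ordered layers. This is only a model of the rule stated above; the `LAYERS` constant and the `invalidatedLayers` helper are hypothetical names, not part of Claude Code.

```typescript
// Layer order follows the table above: earlier entries sit earlier in the prompt prefix.
const LAYERS = ["system prompt", "project context", "conversation"] as const;
type Layer = (typeof LAYERS)[number];

// Returns the changed layer plus every layer after it, since a prefix cache
// cannot reuse anything that follows the first changed position.
function invalidatedLayers(changed: Layer): Layer[] {
  const index = LAYERS.indexOf(changed);
  return LAYERS.slice(index);
}

const afterSystemChange = invalidatedLayers("system prompt");
const afterProjectChange = invalidatedLayers("project context");
const afterNewTurn = invalidatedLayers("conversation");
```

Here `afterSystemChange` contains all three layers, while `afterNewTurn` contains only the conversation layer.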
Actions That Invalidate the Cache#
The following actions reset the system prompt layer, causing a full uncached turn:
- Switching models (`/model`) or changing effort (`/effort`)
- Connecting or disconnecting an MCP server
- Denying an entire tool (e.g., `Bash` or `WebFetch`) via `/permissions`
- Running `/compact` (also resets the conversation layer)
- Upgrading Claude Code (auto-updates can be suppressed with `DISABLE_AUTOUPDATER=1`)
The following actions preserve the cache:
- Editing files in your repository (file reads appear as `<system-reminder>` tags, outside the cached prefix)
- Editing CLAUDE.md mid-session (takes effect after `/clear` or `/compact`)
- Changing output style via `/config outputStyle` (takes effect after `/clear`)
- Changing permission mode (e.g., to `opusplan`)
- Invoking skills/commands or running `/recap`
- Using `/rewind` (rewinding does not re-invalidate the cache)
Cache Lifetime#
| Context | Default TTL |
|---|---|
| Claude subscription | 5 minutes |
| API key, Bedrock, Vertex, Foundry | 5 minutes (extendable) |
To extend cache lifetime when using an API key or a third-party provider, set `ENABLE_PROMPT_CACHING_1H=1` for a 1-hour TTL. Set `FORCE_PROMPT_CACHING_5M=1` to force the 5-minute TTL explicitly. The changelog records a fix for a bug in which the 1-hour TTL was silently downgraded to 5 minutes.
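To see what the TTL means in practice, the sketch below checks whether a cached prefix is likely still warm after an idle gap. The helper name is hypothetical, and the code assumes the TTL window restarts each time the cached prefix is used; treat both as illustrative.

```typescript
// Hypothetical helper: true when the idle gap is shorter than the TTL.
// Assumption: the TTL window restarts on every request that uses the cached prefix.
function isCacheLikelyWarm(
  lastRequestMs: number,
  nowMs: number,
  ttlMinutes: number
): boolean {
  const idleMs = nowMs - lastRequestMs;
  return idleMs >= 0 && idleMs < ttlMinutes * 60 * 1000;
}

// A 20-minute pause between turns, for example during a long review.
const lastRequest = Date.UTC(2025, 0, 1, 12, 0, 0);
const twentyMinutesLater = lastRequest + 20 * 60 * 1000;

const warmWithFiveMinuteTtl = isCacheLikelyWarm(lastRequest, twentyMinutesLater, 5);
const warmWithOneHourTtl = isCacheLikelyWarm(lastRequest, twentyMinutesLater, 60);
```

With the 5-minute TTL the pause outlasts the cache, so the next turn is written from scratch; with the 1-hour TTL the prefix would still be readable.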
System Prompt Customization and Cache Strategy#
When using the Agent SDK, there are four customization approaches with different cache implications:
claude_code preset with append#
The lowest-risk option for cache preservation. Your text is appended after the stable claude_code preset, so the preset itself stays cacheable. If the append content is static (the same string on every call), the full system prompt remains stable and cache-friendly.
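```typescript
// Options fragment: keep the claude_code preset and append static text after it.
const options = {
  systemPrompt: {
    type: "preset",
    preset: "claude_code",
    append: "You operate Acme's internal triage workflow. Label issues by component and severity.",
    excludeDynamicSections: true
  }
};
```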
excludeDynamicSections#
The claude_code preset injects some dynamic sections (environment metadata, session-specific context) that change per user or machine. Enabling `excludeDynamicSections: true` (Python: `exclude_dynamic_sections=True`) strips these sections from the system prompt, making the prefix identical across all users and enabling cross-user and cross-machine cache sharing. This is especially valuable for multi-tenant or CI deployments. The same effect is available from the CLI via `--exclude-dynamic-system-prompt-sections`.
A related changelog fix notes that removing dynamic content from tool descriptions specifically improved cache hit rates for Bedrock, Vertex, and Foundry users.
Custom systemPrompt string#
Fully overrides the preset. Cache behavior depends entirely on whether the supplied string is stable. Any per-request interpolation (user name, timestamp, session ID) will bust the cache on every call.
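The effect of interpolation on a prefix cache can be illustrated by measuring how much of two prompts is shared from the start. The sketch below counts characters as a stand-in for tokens; the prompt builders and the session IDs are hypothetical.

```typescript
// Length of the longest common prefix, in characters (a stand-in for tokens).
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

const baseInstructions = "You triage issues by component and severity.";

// Hypothetical builders: one interpolates a per-session value, one does not.
const unstablePrompt = (sessionId: string): string =>
  `Session ${sessionId}. ${baseInstructions}`;
const stablePrompt = (): string => baseInstructions;

// Two calls from different sessions diverge at the interpolated value.
const unstableShared = sharedPrefixLength(unstablePrompt("a1"), unstablePrompt("b2"));

// Two calls with the static prompt share the whole string.
const stableShared = sharedPrefixLength(stablePrompt(), stablePrompt());
```

In the unstable case only the leading "Session " is shared, so nothing after it can be reused. If per-request values are needed, one option is to pass them in the conversation rather than in the system prompt.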
CLAUDE.md and output styles#
CLAUDE.md sits in the project context layer, not the system prompt layer, so editing it does not invalidate the system prompt cache. The edits take effect only at session start, after /clear, or after /compact. Output style changes made through /config are likewise deferred and take effect after /clear.
Monitoring Cache Performance#
Inspect cache usage from /usage → current_usage:
| Field | Meaning |
|---|---|
| `cache_creation_input_tokens` | Tokens written to cache (cache write rate) |
| `cache_read_input_tokens` | Tokens served from cache (~10% of input rate) |
A high ratio of cache_read_input_tokens to cache_creation_input_tokens indicates healthy cache reuse.
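One way to turn the two fields into a single number is the share of cache-related tokens that were reads. This is a minimal sketch; the `CacheUsage` interface and `cacheReadShare` helper are hypothetical names, and the sample figures are made up for illustration.

```typescript
// Hypothetical shape holding just the two cache fields from the table above.
interface CacheUsage {
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Fraction of cache-related tokens that were reads (0 to 1).
// Returns 0 when there was no cache activity, to avoid dividing by zero.
function cacheReadShare(usage: CacheUsage): number {
  const total = usage.cache_creation_input_tokens + usage.cache_read_input_tokens;
  return total === 0 ? 0 : usage.cache_read_input_tokens / total;
}

// Illustrative figures only.
const healthyShare = cacheReadShare({
  cache_creation_input_tokens: 1000,
  cache_read_input_tokens: 9000,
});
const coldShare = cacheReadShare({
  cache_creation_input_tokens: 10000,
  cache_read_input_tokens: 0,
});
```

A share that stays near zero over many turns suggests that something is invalidating the prefix on every request.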
Disabling Prompt Caching#
Set any of the following environment variables to 1 to disable prompt caching:
| Variable | Effect |
|---|---|
| `DISABLE_PROMPT_CACHING` | Disable for all models |
| `DISABLE_PROMPT_CACHING_HAIKU` | Disable for Haiku only |
| `DISABLE_PROMPT_CACHING_SONNET` | Disable for Sonnet only |
| `DISABLE_PROMPT_CACHING_OPUS` | Disable for Opus only |
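The way the global and per-model variables combine can be modeled as follows. This is a hypothetical sketch, not Claude Code's implementation: it assumes a per-model variable applies when the family name appears in the model ID, and the model IDs used are invented placeholders.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical sketch of the precedence implied by the table above.
// Assumption: a per-model variable matches when the family name appears in the model ID.
function isCachingDisabled(env: Env, modelId: string): boolean {
  // The global switch applies to every model.
  if (env["DISABLE_PROMPT_CACHING"] === "1") return true;

  const id = modelId.toLowerCase();
  for (const family of ["haiku", "sonnet", "opus"]) {
    const variable = `DISABLE_PROMPT_CACHING_${family.toUpperCase()}`;
    if (id.includes(family) && env[variable] === "1") return true;
  }
  return false;
}

// Placeholder model IDs, for illustration only.
const sonnetOnly: Env = { DISABLE_PROMPT_CACHING_SONNET: "1" };
const sonnetDisabled = isCachingDisabled(sonnetOnly, "example-sonnet-model");
const opusDisabled = isCachingDisabled(sonnetOnly, "example-opus-model");
const allDisabled = isCachingDisabled({ DISABLE_PROMPT_CACHING: "1" }, "example-opus-model");
```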
Subagents and Cache#
Subagents invoked during a session share the same caching infrastructure. The changelog notes a roughly 3× reduction in cache_creation tokens after a fix for subagent progress summaries that were missing the prompt cache. An earlier fix for cache invalidation in the SDK's query() reduced input token costs by up to 12×.
Key References#
- How Claude Code uses prompt caching: primary reference for cache layers, invalidation rules, TTLs
- Modifying system prompts: `append`, `excludeDynamicSections`, custom prompt tradeoffs
- CHANGELOG.md: history of cache bug fixes and improvements