Prompt Caching

Claude Code manages prompt caching automatically, reducing token costs by reusing unchanged portions of the prompt across turns. The cache is a prefix cache: each turn's request consists of the previous turn's content with the new exchange appended at the end, so the unchanged prefix can be served from cache. The Claude API charges roughly 10% of the standard input rate for cache hits (`cache_read_input_tokens`) and a separate cache-write rate for newly cached content (`cache_creation_input_tokens`).
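To make the pricing concrete, here is a minimal sketch that expresses a turn's input cost in "standard input token" units. The `Usage` and `RateMultipliers` types and the function name are hypothetical. The 0.1 read multiplier follows the "roughly 10%" figure above. The 1.25 write multiplier is an assumption for illustration only, since this page does not state the cache-write rate.

```typescript
// Token counts, named after the usage fields mentioned above.
interface Usage {
  input_tokens: number;                // uncached input tokens
  cache_creation_input_tokens: number; // tokens written to the cache
  cache_read_input_tokens: number;     // tokens served from the cache
}

// Hypothetical rate card, as multiples of the standard input rate.
interface RateMultipliers {
  cacheRead: number;  // ~0.1 per the text above
  cacheWrite: number; // assumption: not stated on this page
}

// Cost of a turn's input, measured in standard-rate input tokens.
function inputCostInBaseTokens(usage: Usage, rates: RateMultipliers): number {
  return (
    usage.input_tokens +
    usage.cache_creation_input_tokens * rates.cacheWrite +
    usage.cache_read_input_tokens * rates.cacheRead
  );
}

const rates: RateMultipliers = { cacheRead: 0.1, cacheWrite: 1.25 };

// A turn that reuses a 20,000-token prefix and caches 800 new tokens.
const warmTurn: Usage = {
  input_tokens: 200,
  cache_creation_input_tokens: 800,
  cache_read_input_tokens: 20000,
};

// The same turn after a full invalidation: everything is written again.
const coldTurn: Usage = {
  input_tokens: 200,
  cache_creation_input_tokens: 20800,
  cache_read_input_tokens: 0,
};

const warmCost = inputCostInBaseTokens(warmTurn, rates);
const coldCost = inputCostInBaseTokens(coldTurn, rates);
```

Under these assumed multipliers the cold turn costs several times more than the warm one, which is why avoiding unnecessary invalidation matters.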

Cache Layers

The prompt is divided into three ordered layers:

| Layer | Content | Invalidated when |
| --- | --- | --- |
| System prompt | Core instructions, tool definitions, output style | MCP server connects/disconnects, or Claude Code upgrades |
| Project context | CLAUDE.md, auto memory, unscoped rules | Session starts, or after `/clear` or `/compact` |
| Conversation | Messages, responses, tool results | Every turn |

Invalidating a layer also invalidates every layer below it, because each lower layer's cached prefix includes the layers above it. The system prompt layer is therefore the most expensive to invalidate.
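The cascade can be sketched as a slice over the ordered layers. The layer identifiers and the `layersToRewrite` helper are hypothetical names chosen for illustration. They model the rule described here, not Claude Code's internal implementation.

```typescript
// Layers in prefix order: system prompt first, conversation last.
const LAYERS = ["system_prompt", "project_context", "conversation"] as const;
type Layer = (typeof LAYERS)[number];

// Invalidating a layer also invalidates every layer that follows it,
// because those layers are cached as part of a prefix that includes it.
function layersToRewrite(invalidated: Layer): Layer[] {
  return LAYERS.slice(LAYERS.indexOf(invalidated));
}

const afterMcpChange = layersToRewrite("system_prompt");
const afterClear = layersToRewrite("project_context");
```

Here `afterMcpChange` contains all three layers, while `afterClear` leaves the system prompt layer cached.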

Actions That Invalidate the Cache

The following actions reset the system prompt layer, causing a full uncached turn:

- Switching models (`/model`) or changing effort (`/effort`)
- Connecting or disconnecting an MCP server
- Denying an entire tool (e.g., Bash or WebFetch) via `/permissions`
- Running `/compact` (also resets the conversation layer)
- Upgrading Claude Code (auto-updates can be suppressed with `DISABLE_AUTOUPDATER=1`)

The following actions preserve the cache:

- Editing files in your repository (file reads appear as `<system-reminder>` tags, outside the cached prefix)
- Editing CLAUDE.md mid-session (takes effect after `/clear` or `/compact`)
- Changing output style via `/config outputStyle` (takes effect after `/clear`)
- Changing permission mode (e.g., to opusplan)
- Invoking skills/commands or running `/recap`
- Using `/rewind` (rewinding does not re-invalidate the cache)

Cache Lifetime

| Context | Default TTL |
| --- | --- |
| Claude subscription | 5 minutes |
| API key, Bedrock, Vertex, Foundry | 5 minutes (extendable) |

To extend cache lifetime when using an API key or a third-party provider, set `ENABLE_PROMPT_CACHING_1H=1` for a 1-hour TTL. Set `FORCE_PROMPT_CACHING_5M=1` to explicitly force the 5-minute TTL. The changelog records a fix for a bug in which the 1-hour TTL was silently downgraded to 5 minutes.
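As a sketch of how a launcher script might select the TTL, the helper below maps a desired TTL to the environment variables named above. The `CacheTtl` type and `cacheTtlEnv` function are hypothetical. Only the two variable names come from this page.

```typescript
type CacheTtl = "default" | "1h" | "5m";

// Returns the environment variables to set for the requested TTL.
// "default" sets nothing and leaves the choice to Claude Code.
function cacheTtlEnv(ttl: CacheTtl): Record<string, string> {
  switch (ttl) {
    case "1h":
      return { ENABLE_PROMPT_CACHING_1H: "1" };
    case "5m":
      return { FORCE_PROMPT_CACHING_5M: "1" };
    default:
      return {};
  }
}

const oneHourEnv = cacheTtlEnv("1h");
const defaultEnv = cacheTtlEnv("default");
```

The returned record would typically be merged into the environment of whatever process starts Claude Code.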

System Prompt Customization and Cache Strategy

When using the Agent SDK, there are four customization approaches with different cache implications:

`claude_code` preset with `append`

The lowest-risk option for cache preservation. Your text is appended after the stable `claude_code` preset, so the preset itself stays cacheable. If the appended content is static (the same string on every call), the full system prompt remains stable and cache-friendly.

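```typescript
// Fragment of the Agent SDK options object.
systemPrompt: {
  type: "preset",
  preset: "claude_code",
  // Keep this string identical on every call so the full prompt stays cacheable.
  append: "You operate Acme's internal triage workflow. Label issues by component and severity.",
  excludeDynamicSections: true
}
```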

`excludeDynamicSections`

The `claude_code` preset injects some dynamic sections (environment metadata, session-specific context) that change per user or machine. Setting `excludeDynamicSections: true` (Python: `exclude_dynamic_sections=True`) strips these sections from the system prompt, which makes the prefix identical across all users and enables cross-user and cross-machine cache sharing. This is especially valuable for multi-tenant or CI deployments. The same effect is available from the CLI via `--exclude-dynamic-system-prompt-sections`.

A related changelog entry notes that removing dynamic content from tool descriptions improved cache hit rates for Bedrock, Vertex, and Foundry users.

Custom `systemPrompt` string

Fully overrides the preset. Cache behavior depends entirely on whether the supplied string is stable. Any per-request interpolation (user name, timestamp, session ID) busts the cache on every call.
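The effect of interpolation can be illustrated by comparing how much of two prompts is shared from the start. This is a simplified sketch: character comparison stands in for the real prefix matching, and the prompt builders and `commonPrefixLength` helper are hypothetical names.

```typescript
// Length of the shared leading run of two strings.
// A stand-in for prefix matching; the real cache does not compare characters.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

const base = "You triage issues for Acme. Label by component and severity.";

// Stable: the same string on every call.
const stablePrompt = (): string => base;

// Unstable: a per-session value is interpolated into the prompt.
const interpolatedPrompt = (sessionId: string): string =>
  `Session ${sessionId}. ${base}`;

// Two calls with the stable prompt share the whole string.
const stableShared = commonPrefixLength(stablePrompt(), stablePrompt());

// Two sessions with the interpolated prompt diverge at the session ID,
// so nothing after that point can be reused.
const interpolatedShared = commonPrefixLength(
  interpolatedPrompt("a1"),
  interpolatedPrompt("b2"),
);
```

If per-request values are needed, keeping them out of the system prompt (for example, in the user message) leaves the system prompt stable.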

CLAUDE.md and output styles

CLAUDE.md changes sit in the project context layer, not the system prompt layer, so they do not invalidate the system prompt cache. However, they take effect only at session start, after `/clear`, or after `/compact`.

Monitoring Cache Performance

Inspect cache usage from `/usage` → `current_usage`:

| Field | Meaning |
| --- | --- |
| `cache_creation_input_tokens` | Tokens written to cache (cache write rate) |
| `cache_read_input_tokens` | Tokens served from cache (~10% of input rate) |

A high ratio of cache_read_input_tokens to cache_creation_input_tokens indicates healthy cache reuse.
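One way to turn the two fields into a single health number is the share of cache-related tokens that were reads. The sketch below assumes a usage object with the two fields named above. The `CacheUsage` type and `cacheReadShare` function are hypothetical, and no particular threshold is implied by this page.

```typescript
interface CacheUsage {
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Fraction of cache-related tokens that were served from cache (0 to 1).
// Returns 0 when no cache activity was recorded, avoiding division by zero.
function cacheReadShare(usage: CacheUsage): number {
  const total =
    usage.cache_creation_input_tokens + usage.cache_read_input_tokens;
  return total === 0 ? 0 : usage.cache_read_input_tokens / total;
}

const healthy = cacheReadShare({
  cache_creation_input_tokens: 1000,
  cache_read_input_tokens: 9000,
});

const afterInvalidation = cacheReadShare({
  cache_creation_input_tokens: 10000,
  cache_read_input_tokens: 0,
});
```

A share near 1 means most of the prefix is being reused. A sudden drop toward 0 suggests that one of the invalidating actions listed earlier occurred.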

Disabling Prompt Caching

Set any of the following environment variables to 1 to disable prompt caching:

| Variable | Effect |
| --- | --- |
| `DISABLE_PROMPT_CACHING` | Disable for all models |
| `DISABLE_PROMPT_CACHING_HAIKU` | Disable for Haiku only |
| `DISABLE_PROMPT_CACHING_SONNET` | Disable for Sonnet only |
| `DISABLE_PROMPT_CACHING_OPUS` | Disable for Opus only |
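For scripted setups, the table above can be encoded as a lookup. This is a minimal sketch: the `DisableScope` type and `disableCachingEnv` helper are hypothetical, and only the variable names are taken from the table.

```typescript
type DisableScope = "all" | "haiku" | "sonnet" | "opus";

// Variable names as listed in the table above.
const DISABLE_VARIABLES: Record<DisableScope, string> = {
  all: "DISABLE_PROMPT_CACHING",
  haiku: "DISABLE_PROMPT_CACHING_HAIKU",
  sonnet: "DISABLE_PROMPT_CACHING_SONNET",
  opus: "DISABLE_PROMPT_CACHING_OPUS",
};

// Builds the environment variables that disable caching for the given scopes.
function disableCachingEnv(scopes: DisableScope[]): Record<string, string> {
  const env: Record<string, string> = {};
  for (const scope of scopes) {
    env[DISABLE_VARIABLES[scope]] = "1";
  }
  return env;
}

const haikuOnly = disableCachingEnv(["haiku"]);
const haikuAndOpus = disableCachingEnv(["haiku", "opus"]);
```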

Subagents and Cache

Subagents invoked during a session share the same caching infrastructure. The changelog records a roughly 3× reduction in `cache_creation` tokens after a fix for sub-agent progress summaries that were missing the prompt cache. An earlier fix for cache invalidation in SDK `query()` calls reduced input token costs by up to 12×.

Key References

- How Claude Code uses prompt caching: primary reference for cache layers, invalidation rules, TTLs
- Modifying system prompts: `append`, `excludeDynamicSections`, custom prompt tradeoffs
- CHANGELOG.md: history of cache bug fixes and improvements