Prompt Caching
Type: Topic
Status: Published
Created: May 29, 2026
Updated: May 29, 2026


Claude Code manages prompt caching automatically. It minimizes token costs by reusing the unchanged portion of the prompt across turns. The cache is a prefix cache: each turn's request is the previous turn's content plus the new exchange appended at the end, so everything up to the first change can be served from cache. The Claude API bills two kinds of cached input differently:

  • Cache hits (cache_read_input_tokens) cost roughly 10% of the standard input rate.
  • Newly cached content (cache_creation_input_tokens) costs a cache-write rate above the standard input rate: 1.25× for the 5-minute TTL and 2× for the 1-hour TTL, per Anthropic's API pricing.
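To make the billing arithmetic concrete, here is a minimal sketch that prices one request's input from the usage counters. The multipliers are assumptions taken from Anthropic's published API pricing at the time of writing: cache reads at 0.1× the base input price, and cache writes at 1.25× (5-minute TTL) or 2× (1-hour TTL). Several names are hypothetical: `TurnUsage`, `inputCost`, the $3 per million tokens base price, and the token counts.

```typescript
// Input-token counters as reported in API usage data.
interface TurnUsage {
  input_tokens: number; // uncached input that is not written to the cache
  cache_creation_input_tokens: number; // written to the cache on this request
  cache_read_input_tokens: number; // served from the cache
}

// Multipliers on the base input price (assumed from Anthropic's published
// API pricing at the time of writing; check current rates before relying on them).
const CACHE_READ_MULTIPLIER = 0.1;
const CACHE_WRITE_MULTIPLIER = { "5m": 1.25, "1h": 2.0 } as const;
type Ttl = keyof typeof CACHE_WRITE_MULTIPLIER;

// Input cost in dollars for one request; basePricePerMTok is the model's
// standard input price per million tokens.
function inputCost(usage: TurnUsage, basePricePerMTok: number, ttl: Ttl = "5m"): number {
  const perToken = basePricePerMTok / 1_000_000;
  return (
    usage.input_tokens * perToken +
    usage.cache_creation_input_tokens * perToken * CACHE_WRITE_MULTIPLIER[ttl] +
    usage.cache_read_input_tokens * perToken * CACHE_READ_MULTIPLIER
  );
}

// Hypothetical $3/MTok base price: a 50k-token prefix plus a 2k-token new exchange.
const warm: TurnUsage = { input_tokens: 0, cache_creation_input_tokens: 2_000, cache_read_input_tokens: 50_000 };
const cold: TurnUsage = { input_tokens: 0, cache_creation_input_tokens: 52_000, cache_read_input_tokens: 0 };
console.log(`warm turn: $${inputCost(warm, 3).toFixed(4)}`);
console.log(`cold turn: $${inputCost(cold, 3).toFixed(4)}`);
```

Under these assumptions, the warm turn costs roughly 12% of the cold one (about $0.0225 versus $0.195).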

Cache Layers

The prompt is divided into three ordered layers:

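| Layer | Content | Invalidated when |
|---|---|---|
| System prompt | Core instructions, tool definitions, output style | An MCP server connects or disconnects, Claude Code upgrades, or another action listed under Actions That Invalidate the Cache |
| Project context | CLAUDE.md, auto memory, unscoped rules | Session start, /clear, or /compact |
| Conversation | Messages, responses, tool results | Extended every turn (earlier turns stay cached); reset by /clear or /compact |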

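Because the cache matches prefixes, invalidating any layer also invalidates every layer after it. The system prompt layer comes first, so invalidating it is the most expensive case: the entire prompt must be processed and cached again.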
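The cascade follows directly from prefix matching. The sketch below models a request as an ordered list of blocks tagged with their layer. It reports how many tokens can be reused: reuse stops at the first block that differs from the previous request. This is a simplified illustration, not Claude Code's implementation. `Block`, `compareRequests`, and all token counts are hypothetical, and real caching matches at cache-breakpoint granularity.

```typescript
// Simplified prefix-cache model: the prompt is an ordered list of blocks.
type LayerName = "system" | "project" | "conversation";

interface Block {
  layer: LayerName;
  content: string; // stands in for the block's serialized text
  tokens: number; // hypothetical token count
}

// Reuse stops at the first block that differs from the previous request;
// every block from that point on must be processed and cached again.
function compareRequests(prev: Block[], next: Block[]): { readTokens: number; writeTokens: number } {
  let i = 0;
  let readTokens = 0;
  while (
    i < prev.length && i < next.length &&
    prev[i].layer === next[i].layer && prev[i].content === next[i].content
  ) {
    readTokens += next[i].tokens;
    i++;
  }
  let writeTokens = 0;
  for (let j = i; j < next.length; j++) writeTokens += next[j].tokens;
  return { readTokens, writeTokens };
}

const system: Block = { layer: "system", content: "core instructions + tool definitions", tokens: 15_000 };
const project: Block = { layer: "project", content: "CLAUDE.md + auto memory", tokens: 3_000 };
const turn1: Block = { layer: "conversation", content: "turn 1 exchange", tokens: 2_000 };
const turn2: Block = { layer: "conversation", content: "turn 2 exchange", tokens: 1_000 };

const req1 = [system, project, turn1];
// Normal turn: the new exchange is appended, so the previous prefix is reused.
const req2 = [system, project, turn1, turn2];
// Same turn after an MCP server connects: the very first block changes.
const req2Mcp = [{ ...system, content: "core instructions + tool definitions + MCP tools", tokens: 16_000 }, project, turn1, turn2];

console.log(compareRequests(req1, req2)); // { readTokens: 20000, writeTokens: 1000 }
console.log(compareRequests(req1, req2Mcp)); // { readTokens: 0, writeTokens: 22000 }
```

In this model, the MCP change forces the project context and every earlier turn to be rewritten even though none of them changed.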

Actions That Invalidate the Cache

The following actions reset the system prompt layer, causing a full uncached turn:

  • Switching models (/model) or changing effort (/effort)
  • Connecting or disconnecting an MCP server
  • Denying an entire tool (e.g., Bash or WebFetch) via /permissions
  • Running /compact (also resets the conversation layer)
  • Upgrading Claude Code (auto-updates can be suppressed with DISABLE_AUTOUPDATER=1)

The following actions preserve the cache:

  • Editing files in your repository (file reads appear as <system-reminder> tags, outside the cached prefix)
  • Editing CLAUDE.md mid-session (takes effect after /clear or /compact)
  • Changing output style via /config outputStyle (takes effect after /clear)
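  • Changing permission mode (e.g., entering plan mode). opusplan is a model setting, not a permission mode; entering plan mode with it switches models, which does invalidate the cache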
  • Invoking skills/commands or running /recap
  • Using /rewind (rewinding returns to an earlier prefix and does not invalidate the cache)

Cache Lifetime

| Context | Default TTL |
|---|---|
| Claude subscription | 5 minutes |
| API key, Bedrock, Vertex, Foundry | 5 minutes (extendable) |

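To extend the cache lifetime with an API key or a third-party provider, set ENABLE_PROMPT_CACHING_1H=1 for a 1-hour TTL. To force the 5-minute TTL explicitly, set FORCE_PROMPT_CACHING_5M=1.

The longer TTL has a trade-off. Writes cost more (2× the base input rate instead of 1.25×), but the prefix is rewritten less often after idle periods.

A bug that silently downgraded the 1-hour TTL to 5 minutes has been fixed (see the changelog).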
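Whether the 1-hour TTL pays off depends on how long you pause between turns. The sketch below simulates a session that re-sends a fixed prefix and compares the prefix cost under each TTL. It rests on several assumptions:

  • A cache entry's TTL is refreshed each time it is read, as Anthropic's API documentation describes.
  • The multipliers are taken from Anthropic's published pricing at the time of writing.
  • The exact expiry boundary is modeled here as "gap ≤ TTL".

`prefixCost` and the gap values are hypothetical.

```typescript
// Cost of re-sending a fixed prompt prefix under the 5-minute and 1-hour TTLs,
// in units of (base input price x prefix tokens). Multipliers are assumptions.
const READ_MULT = 0.1;
const WRITE_MULT = { "5m": 1.25, "1h": 2.0 } as const;
const TTL_MINUTES = { "5m": 5, "1h": 60 } as const;
type CacheTtl = keyof typeof WRITE_MULT;

// gapsMinutes[i] is the idle time before turn i + 2; the first turn always writes.
function prefixCost(gapsMinutes: number[], ttl: CacheTtl): number {
  let cost = WRITE_MULT[ttl]; // turn 1: cold cache, so the prefix is written
  for (const gap of gapsMinutes) {
    // Within the TTL the prefix is a cache hit (which refreshes the TTL);
    // otherwise the entry has expired and the prefix is written again.
    cost += gap <= TTL_MINUTES[ttl] ? READ_MULT : WRITE_MULT[ttl];
  }
  return cost;
}

// Hypothetical session: quick back-and-forth with two longer pauses.
const gaps = [1, 2, 1, 12, 3, 20, 1];
console.log(`5m TTL: ${prefixCost(gaps, "5m").toFixed(2)}`);
console.log(`1h TTL: ${prefixCost(gaps, "1h").toFixed(2)}`);
```

When every gap is short, both TTLs hit on every turn and the cheaper 5-minute writes win. Under this model, a single pause of between 5 and 60 minutes is already enough for the 1-hour TTL to come out ahead.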

System Prompt Customization and Cache Strategy

When using the Agent SDK, there are four customization approaches with different cache implications:

claude_code preset with append

This is the lowest-risk option for cache preservation. Your text is appended after the stable claude_code preset, so the preset itself stays cacheable. If the appended text is static (the same string on every call), the full system prompt remains stable and cache-friendly.

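```typescript
// Agent SDK options fragment: the claude_code preset plus static appended text.
systemPrompt: {
  type: "preset",
  preset: "claude_code",
  // Keep this string identical on every call so the full prefix stays cacheable.
  append: "You operate Acme's internal triage workflow. Label issues by component and severity.",
  // Strip per-user and per-machine sections (see excludeDynamicSections).
  excludeDynamicSections: true
}
```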
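One way to confirm that the appended text really is static is to fingerprint the systemPrompt option on every call and check that the fingerprint never changes. The sketch below hashes a JSON serialization of the option with Node's built-in crypto module. `PresetSystemPrompt` mirrors the example above and is not the SDK's exported type; `fingerprint` and `makeSystemPrompt` are hypothetical. A stable fingerprint only shows that your side of the prompt is stable. Confirm actual hits with cache_read_input_tokens.

```typescript
import { createHash } from "node:crypto";

// Mirrors the systemPrompt example above (an assumption, not the SDK's exported type).
interface PresetSystemPrompt {
  type: "preset";
  preset: "claude_code";
  append?: string;
  excludeDynamicSections?: boolean;
}

// Short, log-friendly fingerprint of the exact option sent on a call.
function fingerprint(option: PresetSystemPrompt): string {
  return createHash("sha256").update(JSON.stringify(option)).digest("hex").slice(0, 12);
}

// Hypothetical factory that an application would call once per request.
function makeSystemPrompt(): PresetSystemPrompt {
  return {
    type: "preset",
    preset: "claude_code",
    append: "You operate Acme's internal triage workflow. Label issues by component and severity.",
    excludeDynamicSections: true,
  };
}

// Simulate several calls; a static append yields exactly one distinct fingerprint.
const seen = new Set<string>();
for (let call = 0; call < 3; call++) seen.add(fingerprint(makeSystemPrompt()));
console.log(`distinct system prompts across calls: ${seen.size}`);
```

Logging the fingerprint alongside each request makes accidental drift (for example, a date slipped into the appended text) easy to spot.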

excludeDynamicSections

The claude_code preset injects some dynamic sections (environment metadata, session-specific context) that vary by user or machine. Setting excludeDynamicSections: true (Python: exclude_dynamic_sections=True) strips these sections from the system prompt. The prefix is then identical across users and machines, so the cache can be shared between them. This is especially valuable for multi-tenant or CI deployments. The CLI equivalent is --exclude-dynamic-system-prompt-sections.
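To see why stripping dynamic sections enables sharing, consider a toy model in which the system prompt is static preset text followed by per-machine metadata. The sketch is purely illustrative. `buildSystemPrompt`, the `MachineInfo` fields, and the placeholder preset text are hypothetical stand-ins, not what the claude_code preset actually contains.

```typescript
// Hypothetical per-machine details of the kind a dynamic section might carry.
interface MachineInfo {
  cwd: string;
  platform: string;
  gitBranch: string;
}

const PRESET_TEXT = "<static claude_code preset instructions>"; // placeholder, not the real preset

function buildSystemPrompt(machine: MachineInfo, excludeDynamicSections: boolean): string {
  if (excludeDynamicSections) return PRESET_TEXT;
  // Any per-machine text inside the prefix makes it differ between machines.
  return `${PRESET_TEXT}\n<env>cwd=${machine.cwd} platform=${machine.platform} branch=${machine.gitBranch}</env>`;
}

const ciRunnerA: MachineInfo = { cwd: "/build/a", platform: "linux", gitBranch: "main" };
const ciRunnerB: MachineInfo = { cwd: "/build/b", platform: "linux", gitBranch: "fix-123" };

// Without the option, the two runners send different prefixes and cannot share a cache entry.
console.log(buildSystemPrompt(ciRunnerA, false) === buildSystemPrompt(ciRunnerB, false)); // false
// With it, both send the identical prefix.
console.log(buildSystemPrompt(ciRunnerA, true) === buildSystemPrompt(ciRunnerB, true)); // true
```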

A related changelog fix notes that removing dynamic content from tool descriptions specifically improved cache hit rates for Bedrock, Vertex, and Foundry users.

Custom systemPrompt string

A custom string replaces the preset completely. Cache behavior then depends only on whether that string is stable. Any per-request interpolation (user name, timestamp, session ID) produces a different prefix and busts the cache on every call.
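A common remedy is to keep the custom system prompt constant and carry per-request values in the user prompt instead, where they land after the cached prefix. The sketch below shows the pattern. `RequestContext`, `buildRequest`, and the field names are hypothetical. The returned object only describes what you would pass as the SDK's prompt and systemPrompt; it does not call the SDK.

```typescript
// Hypothetical per-request data that should NOT be interpolated into the system prompt.
interface RequestContext {
  userName: string;
  sessionId: string;
  timestamp: Date;
}

// Constant across all calls, so it stays cacheable.
const SYSTEM_PROMPT = "You are Acme's release assistant. Answer using the repository's conventions.";

interface RequestParts {
  systemPrompt: string; // identical on every call
  prompt: string; // varies per call; sits after the cached prefix
}

function buildRequest(task: string, ctx: RequestContext): RequestParts {
  const header = `User: ${ctx.userName}\nSession: ${ctx.sessionId}\nTime: ${ctx.timestamp.toISOString()}`;
  return { systemPrompt: SYSTEM_PROMPT, prompt: `${header}\n\n${task}` };
}

const a = buildRequest("Draft release notes.", { userName: "ana", sessionId: "s1", timestamp: new Date(0) });
const b = buildRequest("Draft release notes.", { userName: "ben", sessionId: "s2", timestamp: new Date(60_000) });
console.log(a.systemPrompt === b.systemPrompt); // true: same cacheable prefix
console.log(a.prompt === b.prompt); // false: per-request data lives here
```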

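CLAUDE.md and output styles

CLAUDE.md sits in the project context layer, not the system prompt layer, so editing it does not invalidate the system prompt cache. Edits take effect only at session start, after /clear, or after /compact.

Output styles, by contrast, belong to the system prompt layer. A change made via /config outputStyle is deferred until /clear, so it also leaves the current cache intact.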

Monitoring Cache Performance

Inspect cache usage from /usage (current_usage):

| Field | Meaning |
|---|---|
| cache_creation_input_tokens | Tokens written to the cache (billed at the cache-write rate) |
| cache_read_input_tokens | Tokens served from the cache (~10% of the standard input rate) |

A high ratio of cache_read_input_tokens to cache_creation_input_tokens indicates healthy cache reuse.
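To track this ratio over a session, you can aggregate the two counters across turns. The sketch below computes the share of cached input that was served from cache. The `Usage` shape mirrors the field names in the table above. How you collect the per-turn records is left open, and the sample numbers are made up.

```typescript
// Per-turn cache counters, using the field names from the table above.
interface Usage {
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

interface CacheSummary {
  read: number;
  written: number;
  hitRate: number; // read / (read + written); 0 when nothing was cached
}

function summarize(turns: Usage[]): CacheSummary {
  let read = 0;
  let written = 0;
  for (const t of turns) {
    read += t.cache_read_input_tokens;
    written += t.cache_creation_input_tokens;
  }
  const total = read + written;
  return { read, written, hitRate: total === 0 ? 0 : read / total };
}

// Made-up session: a cold first turn, two warm turns, then a full invalidation (e.g. /model).
const session: Usage[] = [
  { cache_creation_input_tokens: 20_000, cache_read_input_tokens: 0 },
  { cache_creation_input_tokens: 1_500, cache_read_input_tokens: 20_000 },
  { cache_creation_input_tokens: 1_200, cache_read_input_tokens: 21_500 },
  { cache_creation_input_tokens: 22_700, cache_read_input_tokens: 0 },
];
const s = summarize(session);
console.log(`read=${s.read} written=${s.written} hit rate=${(s.hitRate * 100).toFixed(1)}%`);
```

In this sample, the single invalidation in the last turn pulls the session's hit rate below 50%.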

Disabling Prompt Caching

Set any of the following environment variables to 1 to disable prompt caching:

| Variable | Effect |
|---|---|
| DISABLE_PROMPT_CACHING | Disable for all models |
| DISABLE_PROMPT_CACHING_HAIKU | Disable for Haiku only |
| DISABLE_PROMPT_CACHING_SONNET | Disable for Sonnet only |
| DISABLE_PROMPT_CACHING_OPUS | Disable for Opus only |
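One way to read the table: the global variable disables caching for every model, and each model-specific variable disables it only for that family. The sketch below encodes that reading. `isCachingDisabled` and `familyOf` are hypothetical. The code assumes only the exact value "1" counts as set, and it classifies model ids by substring, which may not match how Claude Code itself resolves model names.

```typescript
type ModelFamily = "haiku" | "sonnet" | "opus";

// Maps a model id to its family by substring (an assumption for illustration).
function familyOf(model: string): ModelFamily | undefined {
  const id = model.toLowerCase();
  if (id.includes("haiku")) return "haiku";
  if (id.includes("sonnet")) return "sonnet";
  if (id.includes("opus")) return "opus";
  return undefined;
}

// The global switch wins; otherwise check the family-specific variable.
function isCachingDisabled(model: string, env: Record<string, string | undefined>): boolean {
  if (env.DISABLE_PROMPT_CACHING === "1") return true;
  const family = familyOf(model);
  if (family === undefined) return false;
  return env[`DISABLE_PROMPT_CACHING_${family.toUpperCase()}`] === "1";
}

// Example ids; in a real process you would pass process.env.
const env = { DISABLE_PROMPT_CACHING_HAIKU: "1" };
console.log(isCachingDisabled("claude-haiku-4-5", env)); // true
console.log(isCachingDisabled("claude-sonnet-4-5", env)); // false
```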

Subagents and Cache

Subagents invoked during a session share the same caching infrastructure. The changelog records two related fixes:

  • Sub-agent progress summaries were missing the prompt cache. Fixing this cut cache_creation tokens by roughly 3×.
  • An earlier fix to cache invalidation in the SDK's query() reduced input token costs by up to 12×.

Key References