
Claude Code Cache TTL and Token Usage: Why Cache Misses Get Expensive and How to Prove It


Claude Code cache TTL is route-specific. Check the active route, TTL, cache creation tokens, cache read tokens, invalidators, and billing proof before blaming a bug.


A Claude Code cache miss usually means the reusable prefix was not read from cache at the cheap read rate. The expensive part of that turn is the cache write (or full-rate input processing) for the prefix, plus any output tokens the model generates.

As of May 30, 2026, the cache TTL answer is route-specific. Claude Code main conversations on a Claude subscription request the 1-hour TTL automatically while plan usage is included. API-key, Bedrock, Vertex, Foundry, Claude Platform on AWS, and gateway-style routes stay on the cheaper 5-minute TTL by default unless 1-hour caching is enabled where supported. Subagents use 5-minute TTL even when the parent subscription conversation gets automatic 1-hour TTL.

Anthropic's cache pricing explains the common "12.5x" shorthand: a 5-minute cache write is priced at 1.25x base input, while a cache read is 0.1x base input for the same tokens. That is a write/read comparison, not an official cache-miss surcharge.

Before changing workflow, name the route that produced the number: subscription usage, API key billing, an Agent SDK estimate, Bedrock, Vertex, Foundry, or a gateway. Then check cache_creation_input_tokens, cache_read_input_tokens, recent invalidators such as model/MCP changes or /compact, and the billing surface that owns that route. If those records do not line up, collect timestamps, model, route, version, cache fields, and invalidator history before calling it a bug.

Fast Answer: Prove Route Before Fixing Cache

| Question | Short answer | Proof to collect |
|---|---|---|
| Is a Claude Code cache miss a separate fee? | No. It means the cached prefix was not read cheaply and had to be processed or written again. | Cache creation/read fields for the same route. |
| Why does one turn look 12.5x more expensive? | 1.25x cache write divided by 0.1x cache read equals 12.5x for the same token volume. | Current Anthropic pricing page and usage fields. |
| Is Claude Code cache TTL 5 minutes or 1 hour? | It depends on route: subscription main conversations request 1 hour; API-key and provider routes default to 5 minutes. | /status, auth route, environment variables, and cache fields. |
| What should you check first? | Route owner, then cache creation and cache read tokens. | /status, /cost when relevant, SDK usage, Console usage, provider invoice. |
| When is it suspicious? | Repeated creation on similar turns after you expected stable cache reuse. | Similar timestamps, same model, same route, stable MCP/tool setup, field deltas. |
| What should not be the first fix? | Blindly compacting or switching routes. | Identify the invalidator before changing context shape. |

The broader OpenAI vs Claude cache pricing comparison is useful when you are choosing providers. Stay here when the symptom is narrower: Claude Code shows a costly turn, high cache creation, low cache read, or confusing usage drain, and you need to explain the current run.

TTL Route Matrix: 5 Minutes, 1 Hour, and Subagents

Claude Code cache TTL is not one global setting. It is chosen from the active route and can change when the billing owner changes.

| Active route | Default cache TTL | What that means for token usage | Proof or control |
|---|---|---|---|
| Claude subscription main conversation | 1 hour | The longer TTL keeps the main conversation cache warm through longer pauses while usage is included in the plan. | /status, plan state, Claude Code prompt caching docs. |
| Claude subscription after plan usage is exceeded and usage credits are billed | 5 minutes | Claude Code drops to the cheaper TTL because the usage is now token-billed. | Plan message, usage credit state, next-turn cache fields. |
| Claude API key | 5 minutes | The default write is cheaper, but cache expires sooner after idle gaps. | Claude Console, Usage and Cost API, ENABLE_PROMPT_CACHING_1H=1 when 1-hour TTL is justified. |
| Bedrock, Vertex, Foundry, or Claude Platform on AWS | 5 minutes by default | Provider support, regions, minimum cacheable prefix length, and 1-hour availability can vary. | Provider logs, model/region docs, cache token counts. |
| Custom base URL or LLM gateway | Route-dependent | The gateway may pass prompt caching through, alter headers, or hide upstream fields. | Gateway logs plus upstream provider billing where available. |
| Subagent | 5 minutes | A subagent starts its own conversation and builds its own cache; the parent's 1-hour subscription TTL does not automatically carry over. | Parent/subagent timestamps, separate cache fields, subagent transcript. |

Use ENABLE_PROMPT_CACHING_1H=1 only when the route supports 1-hour caching and the same prefix will be reused enough times to pay back the higher write cost. Use FORCE_PROMPT_CACHING_5M=1 when you need to debug behavior, compare TTLs, or override a managed 1-hour setting.

The route matrix changes the first move. If a subscription main conversation looks expensive after a 20-minute break, ask why the cache did not stay warm. If an API-key route looks expensive after the same break, a 5-minute TTL expiry may be normal unless you explicitly enabled 1-hour caching.
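The "is this miss just TTL expiry?" question can be sketched as a small lookup. This is a minimal illustration that assumes the default TTLs in the route matrix above (as of the article's date). The route keys, `expected_ttl`, and `miss_is_expected` are hypothetical names for this sketch, not anything Claude Code exposes, and the two boolean flags only stand in for the effect of the environment overrides; the sketch does not read them.

```python
from datetime import timedelta

# TTLs copied from the route matrix above; the route keys are hypothetical
# labels for this sketch, not identifiers that Claude Code exposes.
DEFAULT_TTL = {
    "subscription_main": timedelta(hours=1),
    "subscription_usage_credits": timedelta(minutes=5),
    "api_key": timedelta(minutes=5),
    "cloud_provider": timedelta(minutes=5),  # Bedrock, Vertex, Foundry, Claude Platform on AWS
    "subagent": timedelta(minutes=5),
}


def expected_ttl(route: str, one_hour_enabled: bool = False, force_5m: bool = False) -> timedelta:
    """TTL to assume for a route after the opt-in and force-5-minute overrides."""
    if force_5m:
        return timedelta(minutes=5)
    if route not in DEFAULT_TTL:
        # Gateways and unknown routes: the upstream TTL has to come from logs.
        raise ValueError(f"TTL for route {route!r} is route-dependent; check gateway logs")
    if one_hour_enabled and route in ("api_key", "cloud_provider"):
        # Opt-in 1-hour caching, assuming the route and model support it.
        return timedelta(hours=1)
    return DEFAULT_TTL[route]


def miss_is_expected(route: str, idle_gap: timedelta, **overrides: bool) -> bool:
    """True when the idle gap alone explains a miss (the cache entry would have expired)."""
    return idle_gap > expected_ttl(route, **overrides)


print(miss_is_expected("subscription_main", timedelta(minutes=20)))  # False: investigate
print(miss_is_expected("api_key", timedelta(minutes=20)))            # True: normal expiry
```

Gateways raise an error on purpose. Guessing a TTL for a route you cannot see is exactly the kind of mixed evidence the matrix warns against.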

The 12.5x Number Is Write/Read Math

Anthropic's Claude API pricing page separates cache writes and cache reads. A 5-minute cache write is 1.25x the base input rate. A 1-hour cache write is 2x the base input rate. A cache read is 0.1x the base input rate.

For the same reusable prefix:

```text
5-minute cache write / cache read = 1.25 / 0.1 = 12.5
1-hour cache write / cache read   = 2.0 / 0.1  = 20
```

That ratio is a comparison between token classes. It does not mean Claude Code has a separate "cache miss fee" line item. It means the turn did not benefit from the cheap read path for that reusable context.

The higher 1-hour write multiplier is not automatically worse. It pays off when later reads would otherwise miss because the 5-minute cache expired. It is wasteful when you write a large prefix once and never reuse it.
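The payback argument can be checked with a small simulation over the idle gaps between turns. This is a sketch with a simplified model: each turn either reads the prefix (gap within TTL) or rewrites it (gap beyond TTL), and a read refreshes the TTL. Costs are expressed in units of "prefix tokens times base input price", so no real prices are assumed. `prefix_cost` is a hypothetical helper.

```python
WRITE_5M = 1.25  # 5-minute cache write multiplier on base input
WRITE_1H = 2.0   # 1-hour cache write multiplier on base input
READ = 0.1       # cache read multiplier on base input


def prefix_cost(gaps_min: list[float], ttl_min: float, write_mult: float) -> float:
    """Cost of one reusable prefix across a session, in prefix-size units.

    gaps_min[i] is the idle time (minutes) before turn i + 1; turn 0 always writes.
    Simplification: a read refreshes the TTL, and an expired entry is rewritten
    at the same write multiplier.
    """
    cost = write_mult  # the first turn writes the cache
    for gap in gaps_min:
        cost += READ if gap <= ttl_min else write_mult
    return cost


# One 20-minute break in an otherwise busy session: the 1-hour write pays back.
gaps = [3, 3, 20, 3]
print(f"5m: {prefix_cost(gaps, 5, WRITE_5M):.2f}  1h: {prefix_cost(gaps, 60, WRITE_1H):.2f}")
# prints "5m: 2.80  1h: 2.40"

# Write once, never reuse: the 1-hour write is pure overhead.
print(f"5m: {prefix_cost([], 5, WRITE_5M):.2f}  1h: {prefix_cost([], 60, WRITE_1H):.2f}")
# prints "5m: 1.25  1h: 2.00"
```

Change the gap list to match your real cadence before drawing conclusions. A session with no gaps longer than five minutes never benefits from the 1-hour write.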

The difference matters most when the prefix is large: long conversation history, project context, tool results, MCP server definitions, subagent context, file packs, or a large system prompt. A small cache miss may be noise. A large cache write repeated turn after turn is the pattern that deserves attention.
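To see why prefix size dominates, the per-turn penalty of writing instead of reading can be computed directly. This is a sketch: `BASE` is a placeholder base input rate in dollars per million tokens, not a quoted price for any model, and `miss_penalty_usd` is a hypothetical helper. Check the current Anthropic pricing page for the model you actually run.

```python
def miss_penalty_usd(
    prefix_tokens: int,
    base_usd_per_mtok: float,
    write_mult: float = 1.25,  # 5-minute write multiplier
    read_mult: float = 0.1,    # cache read multiplier
) -> float:
    """Extra prompt-side cost for one turn when the prefix is written instead of read."""
    return prefix_tokens / 1_000_000 * base_usd_per_mtok * (write_mult - read_mult)


BASE = 3.0  # placeholder $/MTok base input rate; replace with your model's current rate

for label, tokens in [("small prefix", 2_000), ("large prefix", 150_000)]:
    print(f"{label}: ${miss_penalty_usd(tokens, BASE):.4f} per missed turn")
```

The penalty scales linearly with prefix size, so the same miss pattern is noise on a 2k-token prefix and real money on a 150k-token one.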

Token Class Ledger


Read the usage record by token class, not by total tokens alone.

| Token or meter | What it means | How to use it |
|---|---|---|
| Full input tokens | Normal prompt input processed by the model. | Baseline cost when cache is absent or only part of the prefix matches. |
| cache_creation_input_tokens | Tokens written into prompt cache. | One expected write can be normal; repeated high creation is the diagnostic signal. |
| cache_read_input_tokens | Tokens read from an existing cache entry. | High reads relative to creation usually mean caching is working. |
| Output tokens | Tokens generated by the model. | Cache savings do not make output free. |
| current_usage in a statusline script | Live per-turn usage object that can include cache creation and read counts. | Useful for catching a repeated miss while the session is still active. |
| OpenTelemetry cache metrics | Organization-level visibility into cache read and creation tokens per user/session. | Useful when the issue affects many machines or a shared automation fleet. |
| Subscription usage | Plan or seat usage, often shown as limit/reset behavior. | Do not compare directly to API invoices. |
| /cost and SDK cost fields | Helpful local or client-side estimates. | Useful for tracking, weaker than Console or provider billing proof. |
| Console, Usage and Cost API, provider invoice | Authoritative billing surfaces for their routes. | Use for final spend proof and support packets. |
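Reading a turn by token class can be sketched like this. The field names mirror the Anthropic Messages API `usage` object, where `input_tokens` counts only the uncached portion. The `Usage` dataclass, `read_share`, `classify`, and the 10,000-token threshold are assumptions made for illustration, not a documented diagnostic.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Field names mirror the Anthropic Messages API `usage` object.
    input_tokens: int = 0                 # uncached prompt input
    cache_creation_input_tokens: int = 0  # written to cache this turn
    cache_read_input_tokens: int = 0      # served from an existing cache entry
    output_tokens: int = 0                # generated; caching never discounts these


def read_share(u: Usage) -> float:
    """Fraction of prompt-side tokens served from cache (0.0 to 1.0)."""
    prompt = u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens
    return u.cache_read_input_tokens / prompt if prompt else 0.0


def classify(u: Usage, min_prefix: int = 10_000) -> str:
    """Rough per-turn label; min_prefix is an arbitrary threshold for this sketch."""
    if u.cache_creation_input_tokens >= min_prefix and u.cache_read_input_tokens < u.cache_creation_input_tokens:
        return "write-heavy"
    if read_share(u) >= 0.8:
        return "cache-hit"
    return "mixed"


turns = [
    Usage(500, 120_000, 0, 800),     # first turn after a context change
    Usage(500, 2_000, 118_000, 900), # prefix reused
]
for i, u in enumerate(turns):
    print(i, classify(u), f"{read_share(u):.0%}")
```

One "write-heavy" turn after a context change is normal. The pattern to chase is "write-heavy" repeating on turns that should have been hits.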

Anthropic's Agent SDK cost tracking docs describe cost fields such as total_cost_usd or costUSD as estimates. They are still useful for catching a runaway session, but they are not the same kind of evidence as an API billing record.

If the problem is "which route is active?", use the sibling Claude Code API key vs subscription billing route map before doing cache math. A subscription session, an API key route, and a provider route can show different meters for the same human workflow.

What Invalidated The Cache In Claude Code?


Claude Code's prompt caching docs explain that cache matching is prefix-based. Changes near the beginning of the request can invalidate everything after that point. In practical terms, the last workflow change often matters more than the total token number.
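A simplified model of prefix matching shows why an early change is so costly. This sketch treats the request as an ordered list of segments, each with a content fingerprint and a token count. Real prompt caching works through cache breakpoints over tools, system, and messages, so `reusable_prefix_tokens` and the segment labels are illustrative assumptions only.

```python
def reusable_prefix_tokens(previous: list[tuple[str, int]], current: list[tuple[str, int]]) -> int:
    """Tokens that could be read from cache: the identical run of segments from the start.

    Each segment is (content_fingerprint, token_count). Matching stops at the
    first difference, so a change near the start discards everything after it.
    """
    reused = 0
    for (prev_id, prev_tokens), (cur_id, cur_tokens) in zip(previous, current):
        if prev_id != cur_id or prev_tokens != cur_tokens:
            break
        reused += cur_tokens
    return reused


before = [("tools:v1", 8_000), ("system", 4_000), ("turns:1-40", 90_000)]

# An MCP change rewrites the tool definitions at the very start of the prefix.
after_mcp = [("tools:v2", 9_500), ("system", 4_000), ("turns:1-40", 90_000)]
print(reusable_prefix_tokens(before, after_mcp))

# A normal follow-up turn only appends, so the whole old prefix still matches.
after_turn = before + [("turns:41", 3_000)]
print(reusable_prefix_tokens(before, after_turn))
```

The MCP change reuses nothing even though the system prompt and history are byte-identical, which is why tool and model changes rank as high-risk invalidators.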

| Recent action | Cache risk | What to check next |
|---|---|---|
| Switching models | High | Compare model field and timestamp before/after the spike. |
| Changing effort level | High | Cache key includes effort level as well as model. |
| Connecting or disconnecting MCP servers | High | MCP tool definitions can change the early prefix. |
| Denying an entire tool | High | A whole-tool denial can change the available tool context. |
| Running /compact | High | Compacting rewrites conversation context and can break reuse. |
| Upgrading Claude Code | High | Version changes can alter prompt shape or cache scope. |
| Changing provider, API route, or gateway | High | Cache scope and billing owner may change with the route. |
| Long idle gap | Medium | Cache TTL may have expired before the next read. |
| Subagents, forks, worktrees, or directory changes | Medium | Subagents build their own 5-minute cache; forks may inherit the parent prefix; worktrees and directories can split cache scope. |
| Editing files | Lower | File edits add context, but do not necessarily rewrite the early prefix. |
| Changing output style or permission mode | Lower | Still worth noting when the timing matches the spike. |
| Running /recap or rewinding | Lower | Useful for navigation, but still verify field changes. |

The safest interpretation is conservative: a cache write after a meaningful context change is expected. Repeated cache writes on similar turns with the same model, route, and MCP setup, and a short gap between turns, are stronger evidence of abnormal cache behavior.
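That rule can be turned into a checklist over a handful of turn records. This is a minimal sketch: the `Turn` record is something you would assemble by hand from usage fields and session notes, and `suspicious_pairs`, its field names, and the 10,000-token write threshold are hypothetical choices, not a Claude Code feature.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Turn:
    # Hypothetical hand-assembled record; not a Claude Code data structure.
    at: datetime
    model: str
    route: str
    mcp_servers: frozenset[str]
    cache_creation_input_tokens: int
    cache_read_input_tokens: int


def suspicious_pairs(turns: list[Turn], ttl: timedelta, min_write: int = 10_000) -> list[tuple[Turn, Turn]]:
    """Consecutive turns that both wrote a large prefix despite stable context and a short gap."""
    flagged = []
    for prev, cur in zip(turns, turns[1:]):
        same_context = (prev.model, prev.route, prev.mcp_servers) == (cur.model, cur.route, cur.mcp_servers)
        within_ttl = cur.at - prev.at <= ttl
        both_wrote = min(prev.cache_creation_input_tokens, cur.cache_creation_input_tokens) >= min_write
        mostly_write = cur.cache_read_input_tokens < cur.cache_creation_input_tokens
        if same_context and within_ttl and both_wrote and mostly_write:
            flagged.append((prev, cur))
    return flagged


t0 = datetime(2026, 5, 30, 10, 0)
mcp = frozenset({"github"})
turns = [
    Turn(t0, "model-a", "api_key", mcp, 50_000, 0),                               # expected first write
    Turn(t0 + timedelta(minutes=2), "model-a", "api_key", mcp, 48_000, 1_000),    # rewrite: suspicious
    Turn(t0 + timedelta(minutes=4), "model-a", "api_key", mcp, 500, 49_000),      # healthy read
]
print(len(suspicious_pairs(turns, ttl=timedelta(minutes=5))))
```

Only the second turn is flagged. The first write and the later healthy read are both consistent with normal caching.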

For broader context bloat from too many MCP servers or large tool definitions, use the Claude Code MCP context overload cleanup path. Cache miss cost is one symptom; context overload is the wider design problem.

Route-Specific Evidence

The same local Claude Code session can produce numbers from different evidence surfaces. Treat each route as a different contract.

| Route | What the number can mean | Stronger proof |
|---|---|---|
| Claude subscription login | Plan or seat usage, limit windows, reset timing, included usage behavior, and automatic 1-hour TTL for main conversations. | Help Center usage/limits language, account status, /status, plan messages. |
| Claude API key | Pay-as-you-go API token billing with 5-minute TTL by default. | Claude Console usage, Usage and Cost API, project billing settings, TTL environment variables. |
| Agent SDK | Client-side usage and cost estimate from SDK messages. | SDK fields plus Console or Usage and Cost API for billing confirmation. |
| Bedrock, Vertex, or Foundry | Provider-routed model use with provider billing, cache support boundaries, and 5-minute TTL by default where caching is supported. | Cloud provider invoice, route-specific request logs, model/region cache support. |
| Gateway | Gateway usage, provider pass-through behavior, and possible hidden upstream TTL behavior. | Gateway logs plus upstream provider billing where available. |

Anthropic's Claude Code costs docs and Help Center usage article are important because they keep subscription usage and API billing separate. Do not use a subscription limit message to prove API spend. Do not use an API estimate to prove a subscription seat was charged.
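The "don't mix surfaces" rule is easy to encode as a guard when you assemble evidence. This sketch hard-codes the pairings from the route evidence table. The route keys, surface labels, and `can_prove_spend` are informal names chosen for this example, not API identifiers.

```python
# Which evidence surface can prove spend for which route, per the table above.
# Keys and labels are informal names for this sketch, not API identifiers.
AUTHORITATIVE = {
    "subscription": {"subscription usage/limits", "account status"},
    "api_key": {"Claude Console", "Usage and Cost API"},
    "agent_sdk": {"Claude Console", "Usage and Cost API"},  # SDK cost fields are estimates only
    "cloud_provider": {"provider invoice"},
    "gateway": {"gateway logs", "provider invoice"},
}


def can_prove_spend(route: str, surface: str) -> bool:
    """True only when the surface belongs to the route that produced the run."""
    return surface in AUTHORITATIVE.get(route, set())


print(can_prove_spend("api_key", "Claude Console"))
print(can_prove_spend("api_key", "subscription usage/limits"))
```

The point of the guard is the negative case: a subscription limit message never proves API spend, and an API record never proves subscription usage.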

When the route is unclear, stop the cache investigation and verify the route first. Run each of these as a separate slash command inside an interactive Claude Code session:

```text
/status
/cost
```

/status names the active account or route. /cost can help monitor API-style spend, but the billing proof still belongs to the route that produced the run.

Fix Ladder


Use the smallest fix that matches the evidence.

  1. Identify the route. Confirm whether the number came from subscription usage, API key billing, an SDK estimate, a provider route, or a gateway route.
  2. Verify the TTL branch. Decide whether this route should be on subscription 1-hour TTL, token-billed 5-minute TTL, or an explicit environment override.
  3. Read the cache fields. Compare cache_creation_input_tokens with cache_read_input_tokens over several similar turns.
  4. Name the invalidator. Look for model switches, effort changes, MCP changes, whole-tool denial, /compact, upgrade timing, provider change, TTL expiry, subagents, forks, worktree changes, or directory scope changes.
  5. Apply the smallest fix. Keep the model stable, avoid changing MCP setup mid-task, keep reusable context early and stable, and split unrelated work with /clear.
  6. Use /compact at natural breaks. Compacting is useful when the conversation is too large, but it is not a universal cache-cost fix.
  7. Consider 1-hour TTL only when the cadence fits. A longer TTL has a higher write multiplier, so it helps only when later reads justify that write.
  8. Verify the next turn. A successful fix should reduce repeated creation and increase reads for similar context.
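Step 8 can be made concrete by comparing similar turns before and after the fix. This is a sketch under the assumption that you have a few comparable turns on each side. `fix_worked` is a hypothetical helper, and averaging is a deliberately simple choice that a single outlier turn can skew.

```python
from statistics import mean


def fix_worked(before: list[tuple[int, int]], after: list[tuple[int, int]]) -> bool:
    """Each item is (cache_creation_input_tokens, cache_read_input_tokens) for a similar turn.

    Returns True when average creation fell and average reads rose.
    """
    if not before or not after:
        raise ValueError("need at least one comparable turn on each side")
    create_before = mean(c for c, _ in before)
    create_after = mean(c for c, _ in after)
    read_before = mean(r for _, r in before)
    read_after = mean(r for _, r in after)
    return create_after < create_before and read_after > read_before


before = [(60_000, 0), (58_000, 1_000)]   # repeated rewrites
after = [(1_500, 60_000), (900, 61_000)]  # prefix reused after the fix
print(fix_worked(before, after))
```

Compare turns of similar shape. A before/after pair that straddles a /compact or a model switch measures the invalidator, not the fix.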

If the real symptom is a plan-window interruption, use the Claude Code rate limit or Claude Code rate limit reached guides instead. Cache math can explain why a session used more input, but it is not a full quota recovery playbook.

Suspicious Spike Packet

Do not file a vague "cache bug" report. Collect a packet that lets a teammate or support engineer compare the same route and same context shape.

| Evidence | Why it matters |
|---|---|
| Timestamp and timezone | Lets usage records and provider invoices line up. |
| Claude Code version | Upgrades can affect cache shape. |
| Model | Model switches are known invalidators. |
| Active route and TTL branch | Subscription, API key, SDK, provider, gateway, 5-minute TTL, and 1-hour TTL evidence must not be mixed. |
| MCP server list before and after | MCP connect/disconnect can change the early prefix. |
| Effort level before and after | Effort level is part of the cache key, so a change forces a new write. |
| Tool permission changes | Whole-tool denial is a known invalidator. |
| /compact, /clear, /recap, rewind history | Conversation-shape actions affect interpretation. |
| cache_creation_input_tokens and cache_read_input_tokens | The core proof of write versus read behavior. |
| Similar hit and miss turns | A comparison turn is stronger than one isolated screenshot. |
| Billing surface | Console, Usage and Cost API, provider invoice, or subscription limit message. |
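The packet can be kept as a structured record so nothing in the table above is forgotten before filing. This is a minimal sketch. `SpikePacket`, its field names, and the `missing` check are hypothetical choices for this example, so adapt them to whatever your team or support channel expects.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SpikePacket:
    # Hypothetical field names for this sketch; adapt to your team's format.
    timestamp: str             # ISO 8601 with offset, e.g. "2026-05-30T10:04:00+02:00"
    claude_code_version: str
    model: str
    route: str                 # subscription, api_key, agent_sdk, cloud_provider, gateway
    ttl_branch: str            # "5m" or "1h"
    effort_before: str
    effort_after: str
    billing_surface: str
    mcp_before: list[str] = field(default_factory=list)
    mcp_after: list[str] = field(default_factory=list)
    tool_permission_changes: list[str] = field(default_factory=list)
    context_actions: list[str] = field(default_factory=list)  # /compact, /clear, /recap, rewind
    turns: list[dict] = field(default_factory=list)           # per-turn cache fields

    def missing(self) -> list[str]:
        """Empty required text fields, plus a check for a comparable hit and miss."""
        gaps = [name for name, value in asdict(self).items() if isinstance(value, str) and not value]
        hits = [t for t in self.turns if t["cache_read_input_tokens"] > t["cache_creation_input_tokens"]]
        misses = [t for t in self.turns if t["cache_creation_input_tokens"] >= t["cache_read_input_tokens"]]
        if not hits or not misses:
            gaps.append("turns: include at least one similar hit and one miss")
        return gaps

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


packet = SpikePacket(
    "2026-05-30T10:04:00+02:00", "x.y.z", "model-a", "api_key", "5m", "high", "high", "Claude Console",
    turns=[
        {"cache_creation_input_tokens": 50_000, "cache_read_input_tokens": 0},
        {"cache_creation_input_tokens": 800, "cache_read_input_tokens": 49_000},
    ],
)
print(packet.missing() or packet.to_json())
```

Running `missing()` before filing keeps the report from turning into the vague "cache bug" ticket the section warns against.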

The escalation boundary is simple: one cache write after a context change is not enough. Repeated creation with a stable route, model, and MCP setup, short gaps between turns, and similar context is worth escalating.

FAQ

Is a Claude Code cache miss a separate fee?

No. It is better described as the expensive side of prompt caching: the reusable prefix was processed or written again instead of being read from cache at the cheaper read rate.

Why do people say a cache miss costs 12.5x more?

The number comes from Anthropic's cache multipliers. A 5-minute cache write is 1.25x base input, and a cache read is 0.1x base input. For the same token volume, 1.25 divided by 0.1 equals 12.5.

Which fields prove a cache miss?

Start with cache_creation_input_tokens and cache_read_input_tokens. High creation with low reads means the prefix was written or processed instead of reused. Repeated high creation across similar turns is the stronger signal.

Is Claude Code cache TTL 5 minutes or 1 hour?

It depends on authentication and route. Claude subscription main conversations request 1-hour TTL automatically while usage is included in the plan. Token-billed routes such as API keys and provider routes default to 5 minutes unless you opt into 1-hour caching where supported. Subagents use 5-minute TTL even when the parent subscription conversation uses 1 hour.

Does /compact reduce Claude Code token costs?

Sometimes, but not as a blanket cache fix. /compact can reduce conversation size at a natural boundary, but it also rewrites context and can break cache reuse. Use it when the conversation is genuinely too large, not as the first reaction to one expensive turn.

Should I enable 1-hour TTL?

Only if the route supports it and the reuse pattern justifies the higher write multiplier. A 1-hour write costs more than a 5-minute write, so the longer TTL helps when later reads would otherwise miss because the shorter cache expired.

Is /cost my Claude bill?

Not by itself. /cost and SDK cost fields are useful estimates for monitoring, especially on API-style routes. The authoritative bill is Claude Console, the Usage and Cost API, a provider invoice, or the subscription surface that belongs to the active route.

When should I blame a Claude Code cache bug?

After you have same-route evidence: timestamp, version, model, route, MCP setup, tool permissions, cache fields, similar hit/miss turns, and billing surface. Without that packet, a normal invalidator or TTL expiry is more likely than a confirmed bug.
