Skip to main content

Claude Code Cache Miss Token Costs: Why One Turn Gets Expensive and How to Prove It

A
10 min readClaude Code

A Claude Code cache miss is not a separate fee. Check route owner, cache creation tokens, cache read tokens, invalidators, and billing proof before blaming a bug.

Claude Code Cache Miss Token Costs: Why One Turn Gets Expensive and How to Prove It

A Claude Code cache miss usually means the cheap cache read did not happen for the reusable prefix. The expensive turn is the cache write or full input processing side of prompt caching, plus any output tokens generated by the model.

As of May 24, 2026, Anthropic's cache pricing makes the common "12.5x" shorthand simple: a 5-minute cache write is priced at 1.25x base input, while a cache read is 0.1x base input for the same tokens. That is a write/read comparison, not an official cache-miss surcharge.

Before changing workflow, name the route that produced the number: subscription usage, API key billing, an Agent SDK estimate, Bedrock, Vertex, Foundry, or a gateway. Then check cache_creation_input_tokens, cache_read_input_tokens, recent invalidators such as model/MCP changes or /compact, and the billing surface that owns that route. If those records do not line up, collect timestamps, model, route, version, cache fields, and invalidator history before calling it a bug.

Fast Answer: Prove Route Before Fixing Cache

QuestionShort answerProof to collect
Is a Claude Code cache miss a separate fee?No. It means the cached prefix was not read cheaply and had to be processed or written again.Cache creation/read fields for the same route.
Why does one turn look 12.5x more expensive?1.25x cache write divided by 0.1x cache read equals 12.5x for the same token volume.Current Anthropic pricing page and usage fields.
What should you check first?Route owner, then cache creation and cache read tokens./status, /cost when relevant, SDK usage, Console usage, provider invoice.
When is it suspicious?Repeated creation on similar turns after you expected stable cache reuse.Similar timestamps, same model, same route, stable MCP/tool setup, field deltas.
What should not be the first fix?Blindly compacting or switching routes.Identify the invalidator before changing context shape.

The broader OpenAI vs Claude cache pricing comparison is useful when you are choosing providers. Stay here when the symptom is narrower: Claude Code shows a costly turn, high cache creation, low cache read, or confusing usage drain, and you need to explain the current run.

The 12.5x Number Is Write/Read Math

Anthropic's Claude API pricing page separates cache writes and cache reads. A 5-minute cache write is 1.25x the base input rate. A 1-hour cache write is 2x the base input rate. A cache read is 0.1x the base input rate.

For the same reusable prefix:

text
5-minute cache write / cache read = 1.25 / 0.1 = 12.5 1-hour cache write / cache read = 2.0 / 0.1 = 20

That ratio is a comparison between token classes. It does not mean Claude Code has a separate "cache miss fee" line item. It means the turn did not benefit from the cheap read path for that reusable context.

The difference matters most when the prefix is large: long conversation history, project context, tool results, MCP server definitions, subagent context, file packs, or a large system prompt. A small cache miss may be noise. A large cache write repeated turn after turn is the pattern that deserves attention.

Token Class Ledger

Claude Code cache creation and cache read token ledger

Read the usage record by token class, not by total tokens alone.

Token or meterWhat it meansHow to use it
Full input tokensNormal prompt input processed by the model.Baseline cost when cache is absent or only part of the prefix matches.
cache_creation_input_tokensTokens written into prompt cache.One expected write can be normal; repeated high creation is the diagnostic signal.
cache_read_input_tokensTokens read from an existing cache entry.High reads relative to creation usually mean caching is working.
Output tokensTokens generated by the model.Cache savings do not make output free.
Subscription usagePlan or seat usage, often shown as limit/reset behavior.Do not compare directly to API invoices.
/cost and SDK cost fieldsHelpful local or client-side estimates.Useful for tracking, weaker than Console or provider billing proof.
Console, Usage and Cost API, provider invoiceAuthoritative billing surfaces for their routes.Use for final spend proof and support packets.

Anthropic's Agent SDK cost tracking docs describe cost fields such as total_cost_usd or costUSD as estimates. They are still useful for catching a runaway session, but they are not the same kind of evidence as an API billing record.

If the problem is "which route is active?", use the sibling Claude Code API key vs subscription billing route map before doing cache math. A subscription session, an API key route, and a provider route can show different meters for the same human workflow.

What Invalidated The Cache In Claude Code?

Claude Code cache invalidator matrix for model MCP compact and route changes

Claude Code's prompt caching docs explain that cache matching is prefix-based. Changes near the beginning of the request can invalidate everything after that point. In practical terms, the last workflow change often matters more than the total token number.

Recent actionCache riskWhat to check next
Switching modelsHighCompare model field and timestamp before/after the spike.
Connecting or disconnecting MCP serversHighMCP tool definitions can change the early prefix.
Denying an entire toolHighA whole-tool denial can change the available tool context.
Running /compactHighCompacting rewrites conversation context and can break reuse.
Upgrading Claude CodeHighVersion changes can alter prompt shape or cache scope.
Changing provider, API route, or gatewayHighCache scope and billing owner may change with the route.
Long idle gapMediumCache TTL may have expired before the next read.
Subagents, forks, worktrees, or directory changesMediumContext can split by path, process, route, or agent scope.
Editing filesLowerFile edits add context, but do not necessarily rewrite the early prefix.
Changing output style or permission modeLowerStill worth noting when the timing matches the spike.
Running /recap or rewindingLowerUseful for navigation, but still verify field changes.

The safest interpretation is conservative: a cache write after a meaningful context change can be expected. A repeated cache write on similar turns, same model, same route, same MCP setup, and short time gap is stronger evidence of abnormal cache behavior.

For broader context bloat from too many MCP servers or large tool definitions, use the Claude Code MCP context overload cleanup path. Cache miss cost is one symptom; context overload is the wider design problem.

Route-Specific Evidence

The same local Claude Code session can produce numbers from different evidence surfaces. Treat each route as a different contract.

RouteWhat the number can meanStronger proof
Claude subscription loginPlan or seat usage, limit windows, reset timing, and included usage behavior.Help Center usage/limits language, account status, /status, plan messages.
Claude API keyPay-as-you-go API token billing.Claude Console usage, Usage and Cost API, project billing settings.
Agent SDKClient-side usage and cost estimate from SDK messages.SDK fields plus Console or Usage and Cost API for billing confirmation.
Bedrock, Vertex, or FoundryProvider-routed model use with provider billing and cache support boundaries.Cloud provider invoice and route-specific request logs.
GatewayGateway usage and provider pass-through behavior.Gateway logs plus upstream provider billing where available.

Anthropic's Claude Code costs docs and Help Center usage article are important because they keep subscription usage and API billing separate. Do not use a subscription limit message to prove API spend. Do not use an API estimate to prove a subscription seat was charged.

When the route is unclear, stop the cache investigation and verify route first:

bash
claude /status /cost

/status names the active account or route. /cost can help monitor API-style spend, but the billing proof still belongs to the route that produced the run.

Fix Ladder

Claude Code cache miss fix ladder and support packet

Use the smallest fix that matches the evidence.

  1. Identify the route. Make sure the number came from subscription usage, API key billing, SDK estimate, provider route, or gateway route.
  2. Read the cache fields. Compare cache_creation_input_tokens with cache_read_input_tokens over several similar turns.
  3. Name the invalidator. Look for model switches, MCP changes, whole-tool denial, /compact, upgrade timing, provider change, TTL expiry, subagents, forks, worktree changes, or directory scope changes.
  4. Apply the smallest fix. Keep the model stable, avoid changing MCP setup mid-task, keep reusable context early and stable, and split unrelated work with /clear.
  5. Use /compact at natural breaks. Compacting is useful when the conversation is too large, but it is not a universal cache-cost fix.
  6. Consider 1-hour TTL only when the cadence fits. A longer TTL has a higher write multiplier, so it helps only when later reads justify that write.
  7. Verify the next turn. A successful fix should reduce repeated creation and increase reads for similar context.

If the real symptom is a plan-window interruption, use Claude Code rate limit or Claude Code rate limit reached. Cache math can explain why a session used more input, but it is not the full quota recovery playbook.

Suspicious Spike Packet

Do not file a vague "cache bug" report. Collect a packet that lets a teammate or support engineer compare the same route and same context shape.

EvidenceWhy it matters
Timestamp and timezoneLets usage records and provider invoices line up.
Claude Code versionUpgrades can affect cache shape.
ModelModel switches are known invalidators.
Active routeSubscription, API key, SDK, provider, or gateway evidence must not be mixed.
MCP server list before and afterMCP connect/disconnect can change the early prefix.
Tool permission changesWhole-tool denial is a known invalidator.
/compact, /clear, /recap, rewind historyConversation-shape actions affect interpretation.
cache_creation_input_tokens and cache_read_input_tokensThe core proof of write versus read behavior.
Similar hit and miss turnsA comparison turn is stronger than one isolated screenshot.
Billing surfaceConsole, Usage and Cost API, provider invoice, or subscription limit message.

The escalation boundary is simple: one cache write after a context change is not enough. Repeated creation with stable route, stable model, stable MCP setup, short timing, and similar context is worth escalating.

FAQ

Is a Claude Code cache miss a separate fee?

No. It is better described as the expensive side of prompt caching: the reusable prefix was processed or written again instead of being read from cache at the cheaper read rate.

Why do people say a cache miss costs 12.5x more?

The number comes from Anthropic's cache multipliers. A 5-minute cache write is 1.25x base input, and a cache read is 0.1x base input. For the same token volume, 1.25 divided by 0.1 equals 12.5.

Which fields prove a cache miss?

Start with cache_creation_input_tokens and cache_read_input_tokens. High creation with low reads means the prefix was written or processed instead of reused. Repeated high creation across similar turns is the stronger signal.

Does /compact reduce Claude Code token costs?

Sometimes, but not as a blanket cache fix. /compact can reduce conversation size at a natural boundary, but it also rewrites context and can break cache reuse. Use it when the conversation is genuinely too large, not as the first reaction to one expensive turn.

Should I enable 1-hour TTL?

Only if the route supports it and the reuse pattern justifies the higher write multiplier. A 1-hour write costs more than a 5-minute write, so the longer TTL helps when later reads would otherwise miss because the shorter cache expired.

Is /cost my Claude bill?

Not by itself. /cost and SDK cost fields are useful estimates for monitoring, especially on API-style routes. The authoritative bill is Claude Console, the Usage and Cost API, a provider invoice, or the subscription surface that belongs to the active route.

When should I blame a Claude Code cache bug?

After you have same-route evidence: timestamp, version, model, route, MCP setup, tool permissions, cache fields, similar hit/miss turns, and billing surface. Without that packet, a normal invalidator or TTL expiry is more likely than a confirmed bug.

Share:

laozhang.ai

One API, All AI Models

AI Image

Gemini 3 Pro Image

$0.05/img
80% OFF
AI Video

Sora 2 · Veo 3.1

$0.15/video
Async API
AI Chat

GPT · Claude · Gemini

200+ models
Official Price
Served 100K+ developers
|@laozhang_cn|Get $0.1