A Claude Code cache miss usually means the cheap cache read did not happen for the reusable prefix. The expensive turn is the cache write or full input processing side of prompt caching, plus any output tokens generated by the model.
As of May 24, 2026, Anthropic's cache pricing makes the common "12.5x" shorthand simple: a 5-minute cache write is priced at 1.25x base input, while a cache read is 0.1x base input for the same tokens. That is a write/read comparison, not an official cache-miss surcharge.
Before changing workflow, name the route that produced the number: subscription usage, API key billing, an Agent SDK estimate, Bedrock, Vertex, Foundry, or a gateway. Then check cache_creation_input_tokens, cache_read_input_tokens, recent invalidators such as model/MCP changes or /compact, and the billing surface that owns that route. If those records do not line up, collect timestamps, model, route, version, cache fields, and invalidator history before calling it a bug.
Fast Answer: Prove Route Before Fixing Cache
| Question | Short answer | Proof to collect |
|---|---|---|
| Is a Claude Code cache miss a separate fee? | No. It means the cached prefix was not read cheaply and had to be processed or written again. | Cache creation/read fields for the same route. |
| Why does one turn look 12.5x more expensive? | 1.25x cache write divided by 0.1x cache read equals 12.5x for the same token volume. | Current Anthropic pricing page and usage fields. |
| What should you check first? | Route owner, then cache creation and cache read tokens. | /status, /cost when relevant, SDK usage, Console usage, provider invoice. |
| When is it suspicious? | Repeated creation on similar turns after you expected stable cache reuse. | Similar timestamps, same model, same route, stable MCP/tool setup, field deltas. |
| What should not be the first fix? | Blindly compacting or switching routes. | Identify the invalidator before changing context shape. |
The broader OpenAI vs Claude cache pricing comparison is useful when you are choosing providers. Stay here when the symptom is narrower: Claude Code shows a costly turn, high cache creation, low cache read, or confusing usage drain, and you need to explain the current run.
The 12.5x Number Is Write/Read Math
Anthropic's Claude API pricing page separates cache writes and cache reads. A 5-minute cache write is 1.25x the base input rate. A 1-hour cache write is 2x the base input rate. A cache read is 0.1x the base input rate.
For the same reusable prefix:
text5-minute cache write / cache read = 1.25 / 0.1 = 12.5 1-hour cache write / cache read = 2.0 / 0.1 = 20
That ratio is a comparison between token classes. It does not mean Claude Code has a separate "cache miss fee" line item. It means the turn did not benefit from the cheap read path for that reusable context.
The difference matters most when the prefix is large: long conversation history, project context, tool results, MCP server definitions, subagent context, file packs, or a large system prompt. A small cache miss may be noise. A large cache write repeated turn after turn is the pattern that deserves attention.
Token Class Ledger

Read the usage record by token class, not by total tokens alone.
| Token or meter | What it means | How to use it |
|---|---|---|
| Full input tokens | Normal prompt input processed by the model. | Baseline cost when cache is absent or only part of the prefix matches. |
cache_creation_input_tokens | Tokens written into prompt cache. | One expected write can be normal; repeated high creation is the diagnostic signal. |
cache_read_input_tokens | Tokens read from an existing cache entry. | High reads relative to creation usually mean caching is working. |
| Output tokens | Tokens generated by the model. | Cache savings do not make output free. |
| Subscription usage | Plan or seat usage, often shown as limit/reset behavior. | Do not compare directly to API invoices. |
/cost and SDK cost fields | Helpful local or client-side estimates. | Useful for tracking, weaker than Console or provider billing proof. |
| Console, Usage and Cost API, provider invoice | Authoritative billing surfaces for their routes. | Use for final spend proof and support packets. |
Anthropic's Agent SDK cost tracking docs describe cost fields such as total_cost_usd or costUSD as estimates. They are still useful for catching a runaway session, but they are not the same kind of evidence as an API billing record.
If the problem is "which route is active?", use the sibling Claude Code API key vs subscription billing route map before doing cache math. A subscription session, an API key route, and a provider route can show different meters for the same human workflow.
What Invalidated The Cache In Claude Code?

Claude Code's prompt caching docs explain that cache matching is prefix-based. Changes near the beginning of the request can invalidate everything after that point. In practical terms, the last workflow change often matters more than the total token number.
| Recent action | Cache risk | What to check next |
|---|---|---|
| Switching models | High | Compare model field and timestamp before/after the spike. |
| Connecting or disconnecting MCP servers | High | MCP tool definitions can change the early prefix. |
| Denying an entire tool | High | A whole-tool denial can change the available tool context. |
Running /compact | High | Compacting rewrites conversation context and can break reuse. |
| Upgrading Claude Code | High | Version changes can alter prompt shape or cache scope. |
| Changing provider, API route, or gateway | High | Cache scope and billing owner may change with the route. |
| Long idle gap | Medium | Cache TTL may have expired before the next read. |
| Subagents, forks, worktrees, or directory changes | Medium | Context can split by path, process, route, or agent scope. |
| Editing files | Lower | File edits add context, but do not necessarily rewrite the early prefix. |
| Changing output style or permission mode | Lower | Still worth noting when the timing matches the spike. |
Running /recap or rewinding | Lower | Useful for navigation, but still verify field changes. |
The safest interpretation is conservative: a cache write after a meaningful context change can be expected. A repeated cache write on similar turns, same model, same route, same MCP setup, and short time gap is stronger evidence of abnormal cache behavior.
For broader context bloat from too many MCP servers or large tool definitions, use the Claude Code MCP context overload cleanup path. Cache miss cost is one symptom; context overload is the wider design problem.
Route-Specific Evidence
The same local Claude Code session can produce numbers from different evidence surfaces. Treat each route as a different contract.
| Route | What the number can mean | Stronger proof |
|---|---|---|
| Claude subscription login | Plan or seat usage, limit windows, reset timing, and included usage behavior. | Help Center usage/limits language, account status, /status, plan messages. |
| Claude API key | Pay-as-you-go API token billing. | Claude Console usage, Usage and Cost API, project billing settings. |
| Agent SDK | Client-side usage and cost estimate from SDK messages. | SDK fields plus Console or Usage and Cost API for billing confirmation. |
| Bedrock, Vertex, or Foundry | Provider-routed model use with provider billing and cache support boundaries. | Cloud provider invoice and route-specific request logs. |
| Gateway | Gateway usage and provider pass-through behavior. | Gateway logs plus upstream provider billing where available. |
Anthropic's Claude Code costs docs and Help Center usage article are important because they keep subscription usage and API billing separate. Do not use a subscription limit message to prove API spend. Do not use an API estimate to prove a subscription seat was charged.
When the route is unclear, stop the cache investigation and verify route first:
bashclaude /status /cost
/status names the active account or route. /cost can help monitor API-style spend, but the billing proof still belongs to the route that produced the run.
Fix Ladder

Use the smallest fix that matches the evidence.
- Identify the route. Make sure the number came from subscription usage, API key billing, SDK estimate, provider route, or gateway route.
- Read the cache fields. Compare
cache_creation_input_tokenswithcache_read_input_tokensover several similar turns. - Name the invalidator. Look for model switches, MCP changes, whole-tool denial,
/compact, upgrade timing, provider change, TTL expiry, subagents, forks, worktree changes, or directory scope changes. - Apply the smallest fix. Keep the model stable, avoid changing MCP setup mid-task, keep reusable context early and stable, and split unrelated work with
/clear. - Use
/compactat natural breaks. Compacting is useful when the conversation is too large, but it is not a universal cache-cost fix. - Consider 1-hour TTL only when the cadence fits. A longer TTL has a higher write multiplier, so it helps only when later reads justify that write.
- Verify the next turn. A successful fix should reduce repeated creation and increase reads for similar context.
If the real symptom is a plan-window interruption, use Claude Code rate limit or Claude Code rate limit reached. Cache math can explain why a session used more input, but it is not the full quota recovery playbook.
Suspicious Spike Packet
Do not file a vague "cache bug" report. Collect a packet that lets a teammate or support engineer compare the same route and same context shape.
| Evidence | Why it matters |
|---|---|
| Timestamp and timezone | Lets usage records and provider invoices line up. |
| Claude Code version | Upgrades can affect cache shape. |
| Model | Model switches are known invalidators. |
| Active route | Subscription, API key, SDK, provider, or gateway evidence must not be mixed. |
| MCP server list before and after | MCP connect/disconnect can change the early prefix. |
| Tool permission changes | Whole-tool denial is a known invalidator. |
/compact, /clear, /recap, rewind history | Conversation-shape actions affect interpretation. |
cache_creation_input_tokens and cache_read_input_tokens | The core proof of write versus read behavior. |
| Similar hit and miss turns | A comparison turn is stronger than one isolated screenshot. |
| Billing surface | Console, Usage and Cost API, provider invoice, or subscription limit message. |
The escalation boundary is simple: one cache write after a context change is not enough. Repeated creation with stable route, stable model, stable MCP setup, short timing, and similar context is worth escalating.
FAQ
Is a Claude Code cache miss a separate fee?
No. It is better described as the expensive side of prompt caching: the reusable prefix was processed or written again instead of being read from cache at the cheaper read rate.
Why do people say a cache miss costs 12.5x more?
The number comes from Anthropic's cache multipliers. A 5-minute cache write is 1.25x base input, and a cache read is 0.1x base input. For the same token volume, 1.25 divided by 0.1 equals 12.5.
Which fields prove a cache miss?
Start with cache_creation_input_tokens and cache_read_input_tokens. High creation with low reads means the prefix was written or processed instead of reused. Repeated high creation across similar turns is the stronger signal.
Does /compact reduce Claude Code token costs?
Sometimes, but not as a blanket cache fix. /compact can reduce conversation size at a natural boundary, but it also rewrites context and can break cache reuse. Use it when the conversation is genuinely too large, not as the first reaction to one expensive turn.
Should I enable 1-hour TTL?
Only if the route supports it and the reuse pattern justifies the higher write multiplier. A 1-hour write costs more than a 5-minute write, so the longer TTL helps when later reads would otherwise miss because the shorter cache expired.
Is /cost my Claude bill?
Not by itself. /cost and SDK cost fields are useful estimates for monitoring, especially on API-style routes. The authoritative bill is Claude Console, the Usage and Cost API, a provider invoice, or the subscription surface that belongs to the active route.
When should I blame a Claude Code cache bug?
After you have same-route evidence: timestamp, version, model, route, MCP setup, tool permissions, cache fields, similar hit/miss turns, and billing surface. Without that packet, a normal invalidator or TTL expiry is more likely than a confirmed bug.
