
Claude Code Rate Limit Guide: Understand, Prevent, and Optimize Your Usage (2026)

22 min read · Claude Code

Claude Code rate limits operate as three independent systems — RPM, TPM, and daily/weekly quotas — and your dashboard percentage reflects only one of them. This guide explains why you can hit limits at 6% reported usage, how to prevent rate limits before they happen, and how to choose between Pro, Max, and API billing based on your actual coding patterns.


Claude Code rate limits confuse developers because the system is more complex than it appears on the surface. Unlike the simple message-based limits of the Claude chat interface, Claude Code operates under three independent rate limit layers that can each independently block your requests. Understanding how these layers interact — and why a dashboard reading of 6% daily usage does not protect you from per-minute throttling — is the difference between a productive coding session and constant interruptions. This guide covers the full rate limit architecture, explains why Claude Code burns through tokens at 10 to 100 times the rate of regular chat, and provides seven concrete strategies that can reduce your effective token consumption by 30 to 60 percent without sacrificing output quality.

TL;DR

  • Claude Code has three independent rate limit layers: RPM (requests per minute), TPM (tokens per minute), and daily/weekly quotas. Hitting one does not affect the others, which is why you can be rate-limited at 6% daily usage.
  • A single Claude Code command generates 8–12 API calls through tool use, consuming 50,000–150,000 tokens for what feels like a simple request. This is 10–100x more than a comparable Claude chat interaction.
  • Pro ($20/mo) provides roughly 40–80 hours of Sonnet per week. Max 5x ($100/mo) gives 140–280 hours. Max 20x ($200/mo) gives 240–480 hours. API billing charges per token with no hard caps.
  • Prevention beats reaction: configuring .claudeignore, using --include for focused context, routing simple tasks to Haiku, and managing sessions strategically can reduce your token usage by 30–60%.
  • Known bugs exist: some users report rate limiting at low reported usage due to platform-side issues, not personal quota exhaustion. If your dashboard shows under 50% but you are being limited, check our detailed fix guide.

Understanding Claude Code's Three-Layer Rate Limit System

[Figure: Claude Code's three independent rate limit layers (RPM, TPM, and daily/weekly quota) operating independently]

The most common source of confusion around Claude Code rate limits is that three entirely separate systems can each independently stop your requests, and the error message looks the same regardless of which one triggered it. Understanding this architecture is not just theoretical — it directly determines which fix works for your specific situation and which optimizations will actually help.

The first layer is Requests Per Minute (RPM), which caps how frequently you can call the API within any 60-second window. This is measured in raw request count, regardless of how much data each request carries. For developers on Tier 1 API access (after a $5 credit purchase), the limit is 50 RPM. This sounds generous until you realize that a single Claude Code command can generate 8 to 12 internal API calls through its tool-use architecture — meaning that five rapid commands in sequence could exhaust your entire RPM budget within seconds. The RPM counter resets every 60 seconds, so brief waits resolve RPM issues quickly, but the frustration comes from the invisible multiplication happening behind each visible command.
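The multiplication effect can be made concrete with a little arithmetic, using the figures quoted above (50 RPM at Tier 1, 8 to 12 internal calls per command):

```python
# Illustrative arithmetic: how tool-use fan-out consumes an RPM budget.
# Figures come from the article: Tier 1 allows 50 requests per minute,
# and one Claude Code command triggers roughly 8-12 internal API calls.
RPM_LIMIT = 50
CALLS_PER_COMMAND = 10  # midpoint of the 8-12 range

commands_per_minute = RPM_LIMIT // CALLS_PER_COMMAND
print(commands_per_minute)  # -> 5 commands before hitting the RPM ceiling
```

Five commands a minute is well within the pace of an interactive debugging session, which is why RPM throttling surprises developers who feel they are barely typing.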

The second layer is Tokens Per Minute (TPM), which caps the total volume of data flowing through the API within any 60-second window. Anthropic tracks input and output tokens separately, and for Claude Code users, input tokens are almost always the binding constraint. This is because every API call carries the full conversation context — system prompts, conversation history, file contents, and tool definitions — and this context grows with each exchange in a session. A developer who has been working in the same Claude Code session for 30 minutes might find that a single request sends 200,000+ input tokens simply because the accumulated context is included with every call. Tier 1 provides 30,000 ITPM for Sonnet models, while Tier 4 (after $400 in cumulative credit purchases) provides 2,000,000 ITPM (Anthropic official docs, March 2026). The critical optimization detail here is that Anthropic's TPM limits are cache-aware: cached input tokens do not count toward your ITPM limit for most current models, making prompt caching one of the most powerful throughput multipliers available.

The third layer is the daily or weekly quota, which sets the total budget for your usage over a longer period. For subscription users (Pro, Max), this manifests as the usage percentage shown on your dashboard and is measured against rolling windows — a five-hour rolling window for burst activity and a seven-day weekly ceiling introduced on August 28, 2025 (TechCrunch, July 2025). The dashboard percentage that reads "6%" reflects consumption against this daily ceiling only. A developer at 6% daily quota can simultaneously be at 100% of their TPM allocation for the current minute. This is the "burst within budget" problem that confuses nearly every Claude Code user at some point: the daily quota is generous enough to sustain hours of work, but the per-minute limits gate how fast that work can happen.

These three layers do not share a counter and do not interact. A generous daily budget does not help if per-minute throughput is too narrow for your workload. Conversely, having ample RPM and TPM headroom does not matter if you have exhausted your weekly quota. When you encounter a rate limit error, diagnosing which layer triggered it is the essential first step toward resolving it — because the fix for each layer is completely different. An RPM issue resolves with a brief pause or by spacing out commands. A TPM issue requires reducing context size or switching to a smaller model. A quota issue requires waiting for the reset window or upgrading your plan. Applying the wrong fix wastes time while the right one gets you back to coding within minutes.

For API users, there is an additional nuance worth understanding: rate limit headers accompany every API response, not just error responses. The anthropic-ratelimit-requests-remaining and anthropic-ratelimit-tokens-remaining headers tell you exactly how much capacity you have left before any limit triggers. Monitoring these headers proactively — before you hit a 429 — lets you implement intelligent throttling that avoids the disruption entirely.
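A minimal throttling sketch built on those headers might look like this. The header name is from Anthropic's documentation; the 10% threshold and 2-second pause are arbitrary illustrative choices, not official recommendations:

```python
# Sketch: proactive throttling from Anthropic rate-limit response headers.
# The 10% threshold and the pause duration are assumptions for illustration.
def throttle_delay(headers: dict, rpm_limit: int = 50, pause: float = 2.0) -> float:
    """Return a sleep duration if remaining request capacity is under 10%."""
    remaining = int(headers.get("anthropic-ratelimit-requests-remaining", rpm_limit))
    if remaining < rpm_limit * 0.10:
        return pause  # slow down before a 429 is ever triggered
    return 0.0

# Example: the last response reported 3 requests remaining this minute.
delay = throttle_delay({"anthropic-ratelimit-requests-remaining": "3"})
print(delay)  # -> 2.0 (back off); returns 0.0 when headroom is healthy
```

In a real client you would call `time.sleep(delay)` between requests; the point is that the headers let you slow down gracefully instead of reacting to errors.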

Why Claude Code Burns Through Tokens So Fast

[Figure: How a single Claude Code command generates 35,000+ tokens through system prompts, file reads, and tool calls]

Every developer who has used Claude Code for more than a few days has experienced the same surprise: what felt like twenty minutes of light usage somehow consumed most of their daily quota. The explanation lies in the fundamental architectural difference between Claude Code and the Claude chat interface, and understanding this difference is essential for making informed decisions about plan selection and usage optimization.

When you type a message in the Claude web chat, the token exchange is relatively straightforward — your message goes in, the response comes back, and the total token count is roughly proportional to the combined length of both texts. Claude Code operates fundamentally differently because it is an agentic system that uses tools extensively. Each interaction involves a multi-turn conversation that includes the system prompt (typically 2,000+ tokens from your CLAUDE.md and built-in instructions), the accumulated conversation history, the contents of files pulled into context, and the tool-use tokens generated by operations like file reads, codebase search, and bash command execution.

Consider what happens when you ask Claude Code to "fix the authentication bug in the login module." The system reads your CLAUDE.md file for project context. It searches for relevant files using ripgrep, which is a tool call. It reads the contents of each matching file — more tool calls, more input tokens. It analyzes the code and proposes changes, generating output tokens. It writes the changes to disk through another tool call. It may run tests to verify the fix, adding yet another tool call. Each of these steps is a separate API interaction, and each one carries the full conversation context. A seemingly simple request can easily generate 35,000 or more tokens across 8 to 12 internal API calls (SitePoint, March 2026).

The token multiplication effect becomes even more dramatic over the course of a session. Each subsequent prompt in the same conversation carries the growing context, which means token consumption per request increases over time — not linearly, but proportionally to the total accumulated history. A developer who starts a session and issues 15 iterative commands may find the final command sending over 200,000 input tokens simply because the entire conversation history is included with every call.
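This growth pattern is easy to model. The per-exchange figures below are assumptions chosen to match the article's 200,000-token endpoint, not measured values:

```python
# Toy model of context growth in one session: every command re-sends the
# entire accumulated history as input tokens. All numbers are illustrative.
BASE_CONTEXT = 5_000   # system prompt + CLAUDE.md (assumed)
PER_EXCHANGE = 13_000  # tokens each exchange adds to the history (assumed)

def input_tokens_for_command(n: int) -> int:
    """Input tokens sent by the n-th command in a session (1-indexed)."""
    return BASE_CONTEXT + (n - 1) * PER_EXCHANGE

print(input_tokens_for_command(1))   # -> 5000
print(input_tokens_for_command(15))  # -> 187000, approaching the 200k mark
```

The takeaway: the cost of command 15 is dominated not by the command itself but by everything said before it, which is why session resets and /compact (discussed below) are such effective levers.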

This consumption pattern means that certain workflows burn through tokens dramatically faster than others. Multi-file refactoring sessions, where Claude Code needs to read, analyze, modify, and verify changes across multiple files, consume tokens at 3 to 5 times the rate of single-file editing. Running tests after each change adds another multiplier because test output, error messages, and retry logic all contribute to the conversation context, which grows with each iteration. The table below provides rough estimates based on common development tasks:

| Task Type | Typical Tokens | API Calls | Session Duration Impact |
|---|---|---|---|
| Single file edit | 30,000–60,000 | 4–6 | Low |
| Code review (1 file) | 40,000–80,000 | 6–8 | Low-Medium |
| Multi-file refactor | 100,000–300,000 | 10–15 | High |
| "Lint, fix, test, fix" cycle | 150,000–400,000 | 12–20 | Very High |
| Full project analysis | 200,000–500,000+ | 15–25 | Extreme |

Understanding these consumption patterns directly informs which optimization strategies will have the biggest impact on your specific workflow. If you primarily do single-file edits, your bottleneck is likely RPM rather than TPM. If you do extensive multi-file work, context management and session resets become critical.

Every Rate Limit Number You Need to Know

Anthropic deliberately keeps some rate limit numbers approximate, particularly for subscription plans where limits are described as "activity limits" rather than exact token counts. The numbers below represent the best available data from official documentation and multiple third-party analyses, verified as of March 2026.

Subscription Plan Limits

| Plan | Monthly Cost | Weekly Sonnet Hours | Weekly Opus Hours | 5-Hour Window | Best For |
|---|---|---|---|---|---|
| Free | $0 | Very limited | Not available | 2–5 prompts | Quick experiments |
| Pro | $20/mo ($17 annual) | 40–80 hrs | Not available | 10–40 prompts | 2–3 hrs/day coding |
| Max 5x | $100/mo | 140–280 hrs | 15–35 hrs | 50–200 prompts | 4–6 hrs/day coding |
| Max 20x | $200/mo | 240–480 hrs | 24–40 hrs | 200–800 prompts | Full-time development |

All subscription plans share a common usage bucket across the Claude chat interface and Claude Code. Max plans multiply the allowance relative to Pro, but exact multipliers for per-minute limits (RPM/TPM) are not publicly documented (claude.com/pricing, March 2026). Weekly caps were introduced on August 28, 2025 and Anthropic reports they affect fewer than 5% of subscribers based on usage patterns.

API Rate Limits by Tier

For developers using Claude Code with their own API key, limits are explicit and scale with cumulative credit purchases:

| Tier | Credit Requirement | RPM | Input TPM (Sonnet) | Output TPM | Daily Budget |
|---|---|---|---|---|---|
| Tier 1 | $5 | 50 | 30,000 | 8,000 | ~10M tokens |
| Tier 2 | $40 | 1,000 | 450,000 | 90,000 | ~33M tokens |
| Tier 3 | $200 | 2,000 | 800,000 | 160,000 | ~83M tokens |
| Tier 4 | $400 | 4,000 | 2,000,000 | 400,000 | ~166M tokens |

The Anthropic API uses a token bucket algorithm, meaning your capacity continuously replenishes up to your maximum rather than resetting at fixed intervals (platform.claude.com/docs/en/api/rate-limits, March 2026). This matters because short bursts above the per-second rate are sometimes permitted as long as the overall per-minute budget is not exceeded.
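The token bucket behavior can be sketched in a few lines. This is a generic illustration of the algorithm, not Anthropic's actual implementation, and the refill rate shown is just the Tier 1 RPM figure expressed per second:

```python
# Minimal token-bucket sketch: capacity replenishes continuously with
# elapsed time rather than resetting at fixed intervals.
class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_sec
        self.last = 0.0

    def allow(self, cost: float, now: float) -> bool:
        # Replenish based on time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# A 50 RPM limit modeled as capacity 50 refilling at 50/60 tokens per second.
bucket = TokenBucket(capacity=50, refill_per_sec=50 / 60)
print(all(bucket.allow(1, now=0.0) for _ in range(50)))  # True: burst of 50 allowed
print(bucket.allow(1, now=0.0))                          # False: bucket drained
print(bucket.allow(1, now=5.0))                          # True: ~4 tokens refilled
```

Note the last call: only five seconds after draining the bucket, partial capacity is already back, which is exactly why brief pauses resolve per-minute throttling so quickly.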

Current Promotions

As of March 2026, Anthropic is running a promotion through March 27, 2026, that doubles your five-hour usage allocation during off-peak hours — specifically outside 8:00 AM to 2:00 PM Eastern Time (support.claude.com, March 13, 2026). These promotions are not always well-publicized, so periodically checking the Claude Help Center is worthwhile.

Pro vs Max vs API Billing: Choosing the Right Plan

[Figure: Comparison of Claude Code plans (Pro, Max 5x, Max 20x, and API billing) with pricing and usage recommendations]

Choosing the right plan is fundamentally a question of matching your actual usage pattern to the pricing structure that minimizes either cost or disruption. The wrong choice either wastes money on unused capacity or creates constant rate-limit interruptions that cost more in lost productivity than the savings on subscription fees.

If you code 2–3 focused hours per day, Pro at $20 per month is typically sufficient. The daily reset means you start each day with a fresh quota, which works well for consistent, moderate usage. Morning code reviews, afternoon debugging sessions, and occasional architecture questions fit comfortably within Pro limits. The plan breaks down when you have intensive sessions that exceed the daily allocation — if you hit the Pro limit before finishing your work more than twice per week, the upgrade math favors Max.

If you code 4–6 hours per day and rely on Claude Code as a primary development tool, Max 5x at $100 per month is the sweet spot. The 5x multiplier over Pro provides substantially more headroom for extended coding sessions, and Max plans include priority access during high-traffic periods, which means fewer rate limits caused by platform-wide capacity constraints rather than personal quota exhaustion. The break-even point between Pro and Max 5x occurs at roughly 4 to 5 hours of daily Claude Code usage — if you consistently exhaust Pro limits before finishing your work, the $80 monthly premium typically pays for itself in recovered productivity within the first week.

If you code 8+ hours per day or run concurrent sessions, Max 20x at $200 per month provides the highest subscription-tier throughput available. This tier is designed for power users doing extensive automated refactoring, running multiple Claude Code instances, or working on large codebases where context sizes regularly exceed 100,000 tokens per request.

API pay-as-you-go billing removes subscription limits entirely and charges per token: $3 per million input tokens and $15 per million output tokens for Sonnet 4.6 (claude.com/pricing, March 2026). For a developer averaging 100,000 combined tokens per day, the monthly cost would be approximately $25 to $40, comparable to Pro but without hard limits. The advantage is complete flexibility — you only hit per-minute API tier limits, which can be raised by depositing more credits. The disadvantage is cost unpredictability: an intensive session could cost $20 to $50 in a single day. For teams evaluating API-based access, services like laozhang.ai provide API relay access with competitive per-token pricing and no speed restrictions, offering a cost-effective alternative to direct Anthropic billing while avoiding subscription rate limits entirely.
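A rough cost estimator makes the trade-off tangible. The per-token prices are the Sonnet figures quoted above; the daily token split in the example is a made-up heavy-usage scenario, and your actual input/output mix will differ:

```python
# Back-of-envelope API cost estimator using the Sonnet prices quoted above
# ($3 per million input tokens, $15 per million output tokens).
INPUT_PRICE = 3.00 / 1_000_000    # USD per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # USD per output token

def monthly_cost(daily_input: int, daily_output: int, days: int = 30) -> float:
    return days * (daily_input * INPUT_PRICE + daily_output * OUTPUT_PRICE)

# An intensive day: 500k input tokens (context re-sends dominate) + 50k output.
print(round(monthly_cost(500_000, 50_000), 2))  # -> 67.5
```

Because input tokens dominate Claude Code usage, reducing context size (strategies below) lowers API bills almost linearly.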

The Batch API is worth considering for non-urgent tasks. It processes requests asynchronously at 50% of standard pricing and operates under separate rate limits from real-time usage (claude.com/pricing, March 2026). Offloading batch-compatible work — documentation generation, code quality analysis across multiple modules, review summaries, and test generation — to the Batch API frees up your real-time quota for interactive development. This is particularly powerful for teams where some tasks are time-sensitive (active debugging, live code review) while others can tolerate a delay of minutes or hours (generating comprehensive documentation, running security audits across the codebase). The cost savings compound quickly: a team generating 1,000 pages of documentation per month through the Batch API saves roughly 50% compared to real-time pricing, while simultaneously preserving real-time capacity for the interactive work that cannot wait.

To make the decision concrete, consider tracking your actual usage for one week before committing to a plan change. Monitor how many times you hit rate limits, what time of day the limits occur, and what type of work you were doing when the limit triggered. This data transforms the plan decision from guesswork into a calculation. If you hit limits primarily during intensive afternoon coding sessions but rarely in the morning, the March 2026 off-peak promotion alone might solve your problem without any plan upgrade. If you hit limits consistently throughout the day, a tier upgrade or switch to API billing is the appropriate solution.

Seven Strategies to Prevent Rate Limits Before They Hit

The most effective way to avoid rate limits is to reduce token consumption per interaction while maintaining output quality. These strategies can be implemented in under thirty minutes and typically reduce effective token usage by 30 to 60 percent.

Strategy 1: Configure .claudeignore to exclude irrelevant files. When Claude Code indexes your project, every file that enters the context window consumes tokens. Create a .claudeignore file in your project root — its syntax mirrors .gitignore — and exclude directories like node_modules/, dist/, .git/, build/, large data files, generated code, and binary assets. A typical JavaScript project can reduce per-request context by 40 to 70 percent with a well-configured .claudeignore file. This is the single highest-impact optimization because it reduces token consumption on every subsequent interaction without changing your workflow at all. For a practical starting point, most web projects benefit from ignoring test fixtures, mock data, compiled output, and vendored dependencies. The key insight is that Claude Code does not need to see files that you would never ask it to modify — and in most codebases, 70 to 90 percent of files fall into that category. Review your .claudeignore periodically as your project structure evolves, because new build artifacts or generated files can silently inflate context sizes over time.
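As an illustrative starting point (adjust the patterns to your own build setup), a .claudeignore for a typical JavaScript project might look like this:

```
# .claudeignore — syntax mirrors .gitignore (illustrative example)
node_modules/
dist/
build/
.git/
coverage/
__snapshots__/
*.min.js
*.map
*.log
data/*.csv
```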

Strategy 2: Use focused context with the --include flag. Instead of letting Claude Code search your entire project for relevant files, use the --include flag to specify exactly which files to load. Running claude "review the auth logic" --include src/auth/** restricts the context to the authentication module, avoiding the token cost of loading unrelated code. For targeted tasks like fixing a bug in a specific module, this single change can reduce input tokens by 50 to 80 percent compared to an unfocused request.

Strategy 3: Route tasks to appropriate models. Not every task needs the most capable model. Reserve Opus 4.6 for complex multi-file refactoring, security-sensitive code review, and architectural decisions where reasoning depth matters. Use Sonnet 4.6 for standard code reviews, documentation generation, and straightforward implementations — it handles most professional development tasks at a fraction of Opus token costs. Switch to Haiku 4.5 for quick questions, simple edits, syntax checks, and formatting tasks. You can switch models mid-session with /model sonnet or /model haiku, and this change takes effect immediately for the next prompt. Many developers find that Haiku handles 60 to 70 percent of routine coding tasks adequately while consuming a fraction of the token budget. A practical routing heuristic: if the task involves understanding relationships between multiple files or requires creative problem-solving, use Sonnet or Opus; if the task involves applying a known pattern to a single file, Haiku is sufficient. This mental model helps you make quick routing decisions without overthinking each interaction, and over the course of a week it can reduce your overall token consumption by 25 to 40 percent.

Strategy 4: Manage sessions to control context growth. Claude Code conversations accumulate context over time, and a session that starts with 5,000 tokens of history can reach 50,000 tokens after thirty minutes of active development. Each subsequent prompt carries this growing context, meaning the fifteenth command in a session costs dramatically more tokens than the first — not because the command is more complex, but because the accumulated history has ballooned. The most effective mitigation is breaking long sessions into shorter, focused conversations. When you finish one logical task — fixing a bug, implementing a feature, reviewing a module — start a new Claude Code session for the next task rather than continuing in the same conversation. This resets the context window and keeps per-interaction costs from spiraling. The /compact command provides a middle ground between a full session reset and letting context grow unchecked. It summarizes the current conversation into a condensed form, preserving key decisions and context while discarding the verbose intermediate exchanges. Use /compact every 10 to 15 exchanges, or whenever you notice response times slowing down — slower responses are often a signal that the context window has grown large enough to impact both performance and token consumption.

Strategy 5: Batch related requests into single prompts. Every new prompt carries the full conversation context, so five small questions cost far more tokens than one comprehensive request. Instead of asking "What does function X do?" followed by "What does function Y do?" followed by "How do X and Y interact?", combine them into a single prompt: "Explain functions X and Y and how they interact, including shared state and dependencies." This reduces API calls from three to one and eliminates redundant context transmission.
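The savings follow directly from the context-resend arithmetic. With illustrative numbers (the context and question sizes below are assumptions):

```python
# Why batching saves tokens: every prompt re-sends the full context, so
# N separate questions pay for that context N times. Numbers are illustrative.
CONTEXT = 30_000   # accumulated session context (assumed)
QUESTION = 200     # tokens per individual question (assumed)

separate = 3 * (CONTEXT + QUESTION)  # three prompts, three context sends
batched = CONTEXT + 3 * QUESTION     # one prompt carrying all three questions

print(separate)                      # -> 90600
print(batched)                       # -> 30600
print(round(separate / batched, 1))  # -> 3.0, about 3x fewer input tokens
```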

Strategy 6: Save complex explanations locally. When Claude Code provides a detailed explanation of your codebase architecture, database schema, or API design, save it to a local file: claude "explain the database schema" > docs/schema-explanation.md. Referencing this saved file later costs far fewer tokens than asking Claude Code to re-analyze and re-explain the same code from scratch. This approach also keeps valuable documentation readily available even when you are offline or rate-limited.

Strategy 7: Schedule intensive work strategically. Per-minute counters reset every 60 seconds, and daily quotas reset on schedules that vary by plan type. Distributing your most token-intensive work across the day rather than concentrating it in a two-hour burst prevents repeated TPM ceiling collisions. If you can shift heavy coding to off-peak hours, promotions like the current March 2026 double-usage period (outside 8 AM–2 PM ET through March 27) effectively give you twice the quota at no additional cost.

What to Do When You Hit the Limit

Despite the best prevention strategies, rate limits will occasionally trigger — especially during intensive coding sessions or when platform-wide demand is high. The key is resolving the issue quickly and getting back to work within minutes rather than hours.

The fastest fix is switching to a lighter model. Type /model haiku in your Claude Code session to switch to Haiku 4.5, which may still have available quota when your Sonnet or Opus allocation is exhausted. Haiku handles straightforward tasks like formatting, simple edits, and syntax questions effectively, letting you continue productive work while your primary model quota recovers.

If model switching does not help, check your exact usage and reset time. Run claude --account in your terminal to see your subscription tier and approximate usage. Visit claude.ai, navigate to Settings, and check your usage percentage and countdown to the next reset. Pro plans use daily rolling resets, while Max plans use weekly rolling windows.

For developers who cannot afford downtime, switching to API billing provides immediate relief. API billing through console.anthropic.com charges per token with no hard subscription caps. Configure Claude Code with your API key by running claude config set apiKey YOUR_API_KEY. This approach trades cost predictability for guaranteed availability.

If the error persists despite low reported usage, you may be encountering a known bug rather than a legitimate rate limit. GitHub issue #29579 documents cases where Max subscribers received rate limit errors at only 16% reported usage, and issue #33120 describes scenarios where every command returns a rate limit error regardless of actual activity. Try signing out with claude logout and back in with claude login, check for orphaned background processes with ps aux | grep claude, and if the issue persists across machines, contact Anthropic support. For a comprehensive walkthrough of every diagnostic step, our complete fix guide for "Rate Limit Reached" errors covers the full diagnostic flowchart including subscription vs API vs bug identification.

While rate-limited, consider using alternative tools to maintain productivity rather than stopping work entirely. Gemini CLI offers a generous free tier with 60 RPM and 1,000 requests per day through Google OAuth authentication and a massive 1 million token context window — install it alongside Claude Code as a fallback that takes under two minutes to set up. GitHub Copilot CLI is included with Copilot subscriptions and handles completions and chat effectively through an interface familiar to most developers. For a detailed comparison of Claude Code against self-hosted alternatives that eliminate rate limit concerns entirely, see our Claude Code vs OpenClaw analysis.

The most productive approach during a rate limit period is to focus on tasks that genuinely do not require AI assistance: writing tests manually, reviewing pull requests from teammates, updating documentation, handling administrative tasks, or tackling straightforward bug fixes that rely on your existing knowledge of the codebase. Many developers report that forced breaks from AI-assisted coding actually improve their understanding of their own project, because they spend more time reading and reasoning about code rather than delegating that cognitive work to an AI tool. Rate limits, while frustrating in the moment, can serve as a natural checkpoint that prevents over-reliance on AI assistance for tasks where human judgment is both faster and more reliable.

Frequently Asked Questions

How long does it take for Claude Code rate limits to reset?

Reset timing depends on which rate limit layer you have hit. RPM and TPM counters reset every 60 seconds, so per-minute limits resolve quickly. Subscription daily quotas reset on a rolling basis — Pro plans reset continuously throughout the day, while Max plans use a weekly rolling window. The exact reset time is shown in your claude.ai Settings panel. API tier limits use a token bucket algorithm that continuously replenishes, so partial capacity returns within seconds of any gap in usage.

Why does Claude Code use so many more tokens than Claude chat?

Claude Code is an agentic system that executes tool calls — file reads, searches, command execution, and file writes — as part of fulfilling your requests. Each tool call is a separate API interaction that carries the full conversation context. A single user command can generate 8 to 12 internal API calls, each transmitting the accumulated system prompt, conversation history, and file contents. The Claude chat interface, by comparison, involves simple request-response exchanges without tool use, resulting in dramatically lower token consumption per interaction.

Is it worth upgrading from Pro to Max just for Claude Code?

The upgrade is worth it if you consistently hit Pro limits before finishing your work. The break-even calculation is straightforward: if rate-limit downtime costs you more than $80 per month in lost productivity (the price difference between Pro and Max 5x), upgrading pays for itself. For professional developers billing at $100+ per hour, even one hour of downtime per week exceeds the cost difference. If you hit Pro limits fewer than twice per week, optimization strategies (model routing, context management) may be more cost-effective than upgrading.

Can I use Claude Code for free?

The Claude Free plan provides limited daily messages but does not include full Claude Code functionality. Pro at $20 per month ($17 with annual billing) is the minimum tier with Claude Code and Cowork access (claude.com/pricing, March 2026). For free AI coding alternatives, Gemini CLI offers 60 RPM and 1,000 requests per day with Google OAuth, and GitHub Copilot CLI is included with existing Copilot subscriptions.

What is the difference between a 429 error and a 529 error?

A 429 HTTP status code means you have exceeded a rate limit — your request was valid but you need to wait before sending more. A 529 status code means the API servers are overloaded regardless of your personal quota. Both require retry logic, but the strategies differ: for 429 errors, respect the retry-after header and implement exponential backoff; for 529 errors, use a starting delay of 1 to 5 seconds with exponential growth, and do not count the wait time against your rate limit backoff timer. Claude Code has built-in retry logic for both, so by the time you see an error, internal retries have already been attempted.

How can I monitor my rate limit usage in real time?

Every API response from Anthropic includes rate limit headers: anthropic-ratelimit-requests-remaining shows how many requests you have left in the current minute window, anthropic-ratelimit-tokens-remaining shows your remaining token budget, and anthropic-ratelimit-tokens-reset provides a timestamp for when limits replenish. For subscription users, the claude.ai Settings page shows usage percentage and reset countdown, though there is a reported lag between actual consumption and dashboard updates. For real-time accuracy, header-based monitoring is the only reliable method. If you are building tools on top of the Claude API, monitoring these headers proactively lets you implement intelligent throttling that slows requests as you approach the limit rather than triggering 429 errors.

Does prompt caching help with rate limits?

Yes, and this is one of the most underutilized optimizations available. Anthropic's ITPM (Input Tokens Per Minute) limits are cache-aware: cached input tokens do not count toward your ITPM limit for most current models. When you have consistent content that repeats across interactions — your CLAUDE.md system prompt, project documentation, frequently referenced files — prompt caching lets you effectively bypass the input token bottleneck. With an 80 percent cache hit rate, you could process five times your nominal ITPM limit, meaning a Tier 1 developer with a 30,000 ITPM limit could effectively handle 150,000 input tokens per minute of cached content. To maximize cache hits, keep your CLAUDE.md content stable across sessions and structure your prompts so that the unchanging context appears first.
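The throughput multiplier follows from simple arithmetic, using the Tier 1 figure quoted above: if only cache misses count against ITPM, effective throughput is the nominal limit divided by the miss rate.

```python
# Effective input throughput when cached tokens are exempt from ITPM,
# using the Tier 1 figure quoted above. Illustrative arithmetic only.
ITPM_LIMIT = 30_000  # Tier 1 uncached input tokens per minute

def effective_itpm(cache_hit_rate: float) -> float:
    """Tokens per minute processable when only cache misses count against ITPM."""
    return ITPM_LIMIT / (1 - cache_hit_rate)

print(int(effective_itpm(0.80)))  # -> 150000, i.e. 5x the nominal limit
```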
