
Grok API Pricing and Cost Guide 2026: Tokens, Free Tier, and Real Use-Case Examples


A current Grok API cost worksheet with official xAI rows, free-tier caveats, and realistic examples for chat, documents, coding, and research agents.

As checked against xAI Docs on July 2, 2026, Grok API pricing for grok-4.3 is listed at $1.25 input, $0.20 cached input, and $2.50 output per 1M tokens. Public xAI docs do not guarantee a permanent official free API tier; the safe assumption is that credits, eligibility, limits, and billing mode must be verified in your own xAI console before you scale.

Treat the official token row as the base, not the budget. A useful estimate adds input tokens, cached input, and output and reasoning tokens, then server-side tools, storage, retries, and any batch or priority adjustment, all checked against the spending limits on your team account. That is why the practical question is not just "what is the Grok API price?" but "what will this workload cost when it succeeds?"

Keep the contracts separate: xAI API pricing is not the same thing as Grok app or X subscription access, and third-party free or provider routes are not official xAI pricing rows. Start with a small prepaid test, log token and tool usage, compare it with the worksheet below, then decide whether the workload is safe to scale.

Quick Worksheet Before You Spend

Use this first when you need a fast budget answer.

| Question | Current answer | Budget action |
| --- | --- | --- |
| What is the main Grok API price row? | xAI Docs list grok-4.3 at $1.25 input, $0.20 cached input, and $2.50 output per 1M tokens. | Use this as the base row for general Grok API work, then recheck the pricing page before publishing a number. |
| Is there a free Grok API tier? | Public docs do not guarantee a permanent official free API tier. The quickstart tells users to load credits before using the API. | Check your console for account-specific credits or promotions; do not assume another account's credit state applies to you. |
| What changes the invoice most? | Token mix, cached input, output length, tool calls, batch eligibility, priority service, storage, retries, and rate-limit tier. | Build a per-workload worksheet instead of copying one sticker price. |
| What is the safest test route? | Small prepaid usage with postpaid limit set low or to zero, plus token and tool logging. | Stop if the console, logs, or model row does not match the worksheet. |

The worksheet uses xAI documentation as the source of record for official prices and treats third-party "free" routes only as separate provider contracts.

Current Official Grok API Price Rows

The official price table lives in xAI's pricing documentation, not in search snippets or provider calculators. On July 2, 2026, the relevant Chat API rows visible in that page were:

| Model row | Context listed | Input per 1M | Cached input per 1M | Output per 1M | Practical use |
| --- | --- | --- | --- | --- | --- |
| grok-4.3 | 1M | $1.25 | $0.20 | $2.50 | Default current Grok API row for most text and image-input work. |
| grok-build-0.1 | 256k | $1.00 | $0.20 | $2.00 | Lower listed row for build-oriented work where it is available and fits quality needs. |
| grok-4.20-multi-agent-0309 | 1M | $1.25 | $0.20 | $2.50 | Specialized row; verify availability and intended use before routing traffic. |
| grok-4.20-0309-reasoning | 1M | $1.25 | $0.20 | $2.50 | Reasoning row; budget from measured output and reasoning behavior, not name alone. |
| grok-4.20-0309-non-reasoning | 1M | $1.25 | $0.20 | $2.50 | Non-reasoning row; test quality and output length before choosing it for cost. |

The Grok 4.3 model page lists grok-4.3, aliases such as grok-4.3-latest and grok-latest, text and image input, text output, a 1M-token context window, and the same $1.25 / $0.20 / $2.50 per-1M price row. It also says requests whose context exceeds 200K tokens can be billed at different rates, so very long-context tests should be measured separately.

Figure: Official Grok API price variables board with grok-4.3, grok-build-0.1, tools, batch, priority, and storage rows.

Do not freeze these rows as timeless. xAI can change pricing, available models, rate limits, regions, or console access. For production documentation, copy the date, the exact model ID, and the docs URL into your release note or internal budget sheet.

Tool, Batch, Priority, and Storage Costs

Most bad Grok API budgets fail because they price tokens only. xAI's pricing page also lists add-on costs and modifiers that can matter as much as the model row.

| Cost surface | Listed price or rule checked July 2, 2026 | When it matters |
| --- | --- | --- |
| Web Search | $5 per 1k calls | Agents that need current web evidence. |
| X Search | $5 per 1k calls | Social or real-time X evidence workflows. |
| Code Execution | $5 per 1k calls | Coding, data, or sandboxed execution agents. |
| File Attachments search | $10 per 1k calls | Large document workflows using uploaded files. |
| Collections Search / RAG | $2.50 per 1k calls | Retrieval-heavy knowledge-base work. |
| Batch API | 20%-50% off standard text/language token rates, usually within 24 hours | Non-urgent bulk jobs where latency is flexible. |
| Priority Processing | 2x standard token rates after prompt caching discounts | Latency-sensitive work where priority service is explicitly used. |
| File storage | $0.025/GiB/day | Uploaded files retained across jobs. |
| Collection storage | $0.10/GiB/day | Stored retrieval collections. |
| Downloads | $0.20/GiB downloaded | File or collection export workflows. |

This table is why "Grok API pricing" should be written as a formula. A research agent with web search can spend more on tool calls than a simple support draft spends on tokens. A repeated support prompt can be much cheaper if cached input actually hits. A batch summarization job can be cheaper than synchronous traffic if the job can wait.

Does Grok API Have a Free Tier?

The safe public answer is that the xAI docs checked on July 2, 2026 do not guarantee a durable official free API tier. The quickstart says to sign up, then load the account with credits to start using the API. That is not the same as a permanent free tier.

Use this split:

| Route | What it can mean | How to describe it safely |
| --- | --- | --- |
| Official xAI API | Usage billed or credited inside your xAI team/account. | "Verify credits, eligibility, and billing mode in your xAI console." |
| Console credits or promotions | Account-specific credit state that can change. | "Credits may exist for your account, but public docs do not make them universal." |
| Third-party free route | A provider absorbs, sponsors, proxies, or limits usage under its own contract. | "This is a provider route, not the official xAI price row." |
| Grok app or X subscription | Consumer access to Grok through an app or subscription. | "This is separate from API billing." |

The difference matters. If your budget sheet says "free" because one tutorial showed a provider route, your production API plan may be wrong on day one. If your account has promotional credits, record the credit balance and expiration behavior separately from model pricing. If your billing mode changes from prepaid-only to postpaid, update the stop rule.

The Grok API Cost Formula

Use this formula before you compare workloads:

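    estimated cost =
        ( input_tokens / 1,000,000 * input_price
        + cached_input_tokens / 1,000,000 * cached_input_price
        + output_tokens / 1,000,000 * output_price )
          * priority_multiplier_or_batch_discount
        + tool_calls / 1,000 * tool_call_price
        + storage_gib_days * storage_price
        + downloads_gib * download_price
        + retry_cost

Here priority_multiplier_or_batch_discount is a factor on the token terms only: 1.0 for standard traffic, 2.0 for priority processing, and 0.5 to 0.8 for batch work that earns the listed 20%-50% discount.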

That formula is intentionally explicit. It prevents three common mistakes:

  1. Treating cached input as automatic savings before cache hits are measured.
  2. Treating server-side tools as free because they are hidden behind an agent flow.
  3. Treating the base row as final even when retry, priority, or storage behavior changes the effective cost.
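
The formula can be sketched as a small Python function. This is a minimal illustration, not an official calculator: the function name, parameter names, and the `TokenPrices` class are hypothetical, the default prices are the grok-4.3 row quoted in this guide, and the batch or priority adjustment is applied as a multiplier on the token terms because the pricing table describes both as changes to token rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """USD per 1M tokens. Defaults copy the grok-4.3 row quoted in this guide."""
    input_per_m: float = 1.25
    cached_input_per_m: float = 0.20
    output_per_m: float = 2.50


def estimate_cost(
    input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
    tool_calls: int = 0,
    tool_price_per_1k: float = 0.0,
    storage_gib_days: float = 0.0,
    storage_price: float = 0.0,
    downloads_gib: float = 0.0,
    download_price: float = 0.0,
    retry_cost: float = 0.0,
    token_rate_multiplier: float = 1.0,  # assumption: 2.0 priority, 0.5-0.8 batch
    prices: TokenPrices = TokenPrices(),
) -> float:
    """Return an estimated cost in USD for one workload."""
    token_cost = (
        input_tokens / 1_000_000 * prices.input_per_m
        + cached_input_tokens / 1_000_000 * prices.cached_input_per_m
        + output_tokens / 1_000_000 * prices.output_per_m
    )
    return (
        token_cost * token_rate_multiplier
        + tool_calls / 1_000 * tool_price_per_1k
        + storage_gib_days * storage_price
        + downloads_gib * download_price
        + retry_cost
    )
```

Keeping every term as a named argument means a missing cost surface shows up as an explicit zero in the call rather than as a silent omission.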

A practical worksheet should track these columns for each workload:

| Worksheet column | Why it belongs |
| --- | --- |
| Model ID | Aliases can move; a pinned row is easier to audit. |
| Input tokens per task | Long context, policies, examples, and retrieved text can dominate. |
| Cached input tokens per task | Repeated prefixes can reduce cost only when cache behavior is real. |
| Output tokens per task | Long answers, JSON, and repair loops inflate output spend. |
| Tool calls per task | Search, code, file, and RAG tools have their own price surfaces. |
| Retry rate | A cheap first attempt is expensive if the third attempt is the accepted result. |
| Batch eligible | Non-urgent work may earn token discounts. |
| Priority required | Priority mode can double token rates. |
| Storage retained | File and collection storage becomes a daily cost. |
| Console limit | Spending limits decide whether mistakes stop early or run into postpaid usage. |

Four Use-Case Cost Examples

These examples are worksheets, not universal monthly quotes. Replace the assumptions with your own logs before you scale.

Figure: Grok API use-case cost examples comparing support chat, documents, coding, and research workloads by variables.

Support Chat

A support bot is usually output-sensitive and cache-sensitive. The repeated system prompt, policy block, tone rules, and tool instructions may be good cached-input candidates. The expensive part often becomes accepted answer length, handoff summaries, and retries after a poor answer.

| Assumption | Example value | Cost implication |
| --- | --- | --- |
| Requests | 100,000 replies/month | High volume makes small per-task differences visible. |
| Input | 800 fresh tokens/reply | Base input is usually manageable. |
| Cached input | 1,200 repeated tokens/reply | Cache hit rate can materially reduce cost. |
| Output | 350 tokens/reply | Output price matters more than many teams expect. |
| Tools | 0 to 1 retrieval/search call/reply | Tool calls can overtake token savings if used on every reply. |

The control rule: cache stable instructions, cap answer length, log accepted vs retried replies, and sample quality before routing all tickets.
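
To see how much the cache hit rate and tool usage move this example, the assumptions can be run through a short Python sketch. It uses the grok-4.3 row and the Collections Search price quoted in this guide. Two things are assumptions rather than documented behavior: a cache miss is billed at the full input price, and the retrieval call is a Collections Search call. Function names are illustrative.

```python
REPLIES = 100_000          # replies per month
FRESH_INPUT = 800          # fresh input tokens per reply
CACHEABLE_INPUT = 1_200    # repeated tokens per reply that may hit the cache
OUTPUT = 350               # output tokens per reply
INPUT_PRICE, CACHED_PRICE, OUTPUT_PRICE = 1.25, 0.20, 2.50  # USD per 1M tokens


def monthly_token_cost(cache_hit_rate: float) -> float:
    """Monthly token spend; assumes cache misses are billed at the input price."""
    hit = CACHEABLE_INPUT * cache_hit_rate
    miss = CACHEABLE_INPUT - hit
    per_reply = (
        (FRESH_INPUT + miss) * INPUT_PRICE
        + hit * CACHED_PRICE
        + OUTPUT * OUTPUT_PRICE
    ) / 1_000_000
    return per_reply * REPLIES


def monthly_tool_cost(calls_per_reply: float, price_per_1k: float = 2.50) -> float:
    """Monthly tool spend; the default price assumes Collections Search."""
    return REPLIES * calls_per_reply / 1_000 * price_per_1k


for rate in (0.0, 0.5, 1.0):
    print(f"cache hit rate {rate:.0%}: ${monthly_token_cost(rate):,.2f}/month")
print(f"one retrieval call per reply: ${monthly_tool_cost(1.0):,.2f}/month")
```

Under these assumptions the token bill falls from $337.50 to $211.50 a month as the cache hit rate goes from 0% to 100%, while one retrieval call on every reply adds $250.00, more than the fully cached token bill.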

Documents and RAG

Document workflows are input-heavy. A single answer can include retrieved passages, file search, user query, policy text, and a long output. The token row may look cheap until the retrieval layer adds file search or collections calls.

| Assumption | Example value | Cost implication |
| --- | --- | --- |
| Requests | 20,000 answers/month | Medium volume with large context can still be expensive. |
| Input | 6,000 fresh tokens/answer | Retrieval size is the main lever. |
| Cached input | 1,000 repeated tokens/answer | Stable instructions help, but retrieved chunks are usually fresh. |
| Output | 700 tokens/answer | Citations and summaries increase output. |
| Tools | Collections Search or File Attachments search | Tool rows must be counted separately. |

The control rule: retrieve fewer, better chunks; keep citations compact; set a maximum context budget; and compare answer quality before widening retrieval.
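
A maximum context budget can be enforced before the request is sent. The sketch below keeps the highest-scoring retrieved chunks that fit a token budget. Everything in it is illustrative: `select_chunks` and `rough_token_count` are hypothetical helpers, and the four-characters-per-token heuristic is only a stand-in for whatever tokenizer or usage data you actually measure with.

```python
from typing import Callable, List, Tuple


def rough_token_count(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def select_chunks(
    scored_chunks: List[Tuple[float, str]],
    max_context_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> Tuple[List[str], int]:
    """Keep the best-scoring chunks whose combined size fits the budget."""
    selected: List[str] = []
    used = 0
    # Visit chunks from highest to lowest relevance score.
    for _score, text in sorted(scored_chunks, key=lambda c: c[0], reverse=True):
        cost = count_tokens(text)
        if used + cost > max_context_tokens:
            continue  # skip chunks that would exceed the budget
        selected.append(text)
        used += cost
    return selected, used


chunks = [(0.9, "a" * 400), (0.5, "b" * 800), (0.7, "c" * 400)]
kept, tokens_used = select_chunks(chunks, max_context_tokens=250)
print(len(kept), tokens_used)
```

Logging `tokens_used` next to answer quality gives the comparison the control rule asks for before retrieval is widened.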

Coding Assistant

Coding work can be cheap for short suggestions and expensive for agentic loops. The cost driver is not just tokens; it is the number of attempts, tool calls, code execution, and review time before a change is accepted.

| Assumption | Example value | Cost implication |
| --- | --- | --- |
| Tasks | 5,000 coding turns/month | Turn count can hide multi-attempt work. |
| Input | 2,500 tokens/turn | Files, diffs, tests, and instructions add up. |
| Cached input | 500 tokens/turn | Reused repo instructions may help. |
| Output | 900 tokens/turn | Patch explanations and structured responses can be long. |
| Tools | Code Execution when enabled | Tool fees and retry loops need their own line. |

The control rule: log accepted changes, failed test runs, retries, and human review minutes. Successful-task cost matters more than first-response token cost.
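
Successful-task cost is total spend divided by the number of tasks that ended with an accepted result, so failed attempts are paid for by the tasks that succeed. A minimal sketch, assuming your logs can be reduced to one record per attempt (the `Attempt` record and its field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Attempt:
    task_id: str      # groups retries of the same coding task
    cost_usd: float   # tokens plus tool fees for this attempt
    accepted: bool    # True if this attempt produced the accepted change


def cost_per_accepted_task(attempts: Iterable[Attempt]) -> Optional[float]:
    """Total spend across all attempts divided by the number of accepted tasks."""
    attempts = list(attempts)
    total = sum(a.cost_usd for a in attempts)
    accepted_tasks = {a.task_id for a in attempts if a.accepted}
    if not accepted_tasks:
        return None  # nothing was accepted, so the metric is undefined
    return total / len(accepted_tasks)


log = [
    Attempt("t1", 0.01, False),
    Attempt("t1", 0.01, False),
    Attempt("t1", 0.01, True),   # accepted on the third attempt
    Attempt("t2", 0.01, True),   # accepted on the first attempt
]
print(cost_per_accepted_task(log))
```

In this toy log each attempt costs one cent, but the cost per accepted task is about two cents because the retries are included.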

Research Agent

A research agent can look cheap in tokens and expensive in tools. Web Search, X Search, file search, and long evidence summaries can dominate. This is also the workload where stale or unsupported facts are most costly.

| Assumption | Example value | Cost implication |
| --- | --- | --- |
| Reports | 1,000 reports/month | Lower volume can still be expensive per task. |
| Input | 4,000 tokens/report | Query plan, evidence, and instructions are substantial. |
| Cached input | 800 tokens/report | Reusable report scaffolds may cache. |
| Output | 1,500 tokens/report | Evidence packets and summaries are output-heavy. |
| Tools | Multiple Web Search or X Search calls | Tool calls can dominate the total. |

The control rule: cap tool calls, require source quality, batch non-urgent research, and stop if the agent cannot show which facts came from current official sources.
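
A quick calculation shows how fast search calls overtake tokens in this example. The sketch uses the per-report assumptions from the table, the grok-4.3 row, and the $5 per 1k calls search price quoted in this guide; the function names are illustrative and the number of calls per report is something to measure, not assume.

```python
# Token cost per report at the grok-4.3 row (USD per 1M tokens).
TOKEN_COST_PER_REPORT = (4_000 * 1.25 + 800 * 0.20 + 1_500 * 2.50) / 1_000_000
SEARCH_PRICE_PER_CALL = 5.00 / 1_000  # Web Search or X Search, $5 per 1k calls


def report_cost(search_calls: int) -> float:
    """Token cost plus search-call cost for one report, in USD."""
    return TOKEN_COST_PER_REPORT + search_calls * SEARCH_PRICE_PER_CALL


def tool_share(search_calls: int) -> float:
    """Fraction of the per-report cost that comes from search calls."""
    return search_calls * SEARCH_PRICE_PER_CALL / report_cost(search_calls)


for calls in (0, 2, 10):
    print(f"{calls:>2} calls: ${report_cost(calls):.5f}/report, tools = {tool_share(calls):.0%}")
```

With these numbers the token cost is under one cent per report, so two search calls already account for more than half of the total.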

Rate Limits and Billing Controls

xAI's rate-limit documentation says each API team has per-model RPS and TPM limits, and that tiers are based on cumulative API spend since January 1, 2026. It also says all consumed tokens count toward TPM: prompt, completion, reasoning, cached prompt, image, and audio tokens. Treat the model page's rate-limit numbers as a useful reference, but verify the limits that apply to your team in the console.

The billing management API exposes invoices, payment methods, prepaid credit balance, top-ups, historical usage, postpaid invoice preview, and spending limits. For a first production test, the safest control pattern is:

  1. Start with prepaid credits.
  2. Set the postpaid limit low or to zero if you want prepaid-only behavior.
  3. Log tokens, cached tokens, model ID, tool calls, retries, errors, latency, and accepted result.
  4. Compare actual spend against the worksheet after a small sample.
  5. Raise limits only after the estimate and logs match.

Figure: Grok API spend control checklist for docs verification, console credits, limits, logging, cache, batch, and old model warnings.

Do not wait until the monthly invoice to learn that an agent was calling tools on every retry. Put the stop rule in code: if request volume, tool-call count, retry rate, or output tokens exceed the worksheet by a defined threshold, pause the route.
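
One way to express such a stop rule is to compare measured usage for a window against the worksheet's expected usage. This is a sketch under stated assumptions: the `Usage` record, the function names, and the 1.5x default threshold are all hypothetical, and pausing the route itself is left to your own deployment tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Usage:
    requests: int
    tool_calls: int
    retries: int
    output_tokens: int

    @property
    def retry_rate(self) -> float:
        return self.retries / self.requests if self.requests else 0.0


def breached_limits(expected: Usage, actual: Usage, threshold: float = 1.5) -> List[str]:
    """Return the metrics where actual usage exceeds expected * threshold."""
    pairs = {
        "requests": (expected.requests, actual.requests),
        "tool_calls": (expected.tool_calls, actual.tool_calls),
        "retry_rate": (expected.retry_rate, actual.retry_rate),
        "output_tokens": (expected.output_tokens, actual.output_tokens),
    }
    return [name for name, (exp, act) in pairs.items() if act > exp * threshold]


def should_pause(expected: Usage, actual: Usage, threshold: float = 1.5) -> bool:
    """True when any tracked metric breaches the worksheet threshold."""
    return bool(breached_limits(expected, actual, threshold))


worksheet = Usage(requests=1_000, tool_calls=500, retries=50, output_tokens=350_000)
measured = Usage(requests=1_000, tool_calls=2_000, retries=50, output_tokens=350_000)
print(breached_limits(worksheet, measured), should_pause(worksheet, measured))
```

Returning the names of the breached metrics, not just a boolean, makes the pause easier to explain in an incident note.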

Watch Old Model Rows After the May 15 Retirement

The May 15 retirement notice is the freshness warning for this topic. xAI says several retired slugs redirect to grok-4.3 after May 15, 2026, and deprecated slugs after that date are billed at grok-4.3 pricing. That means old snippets that still center on Grok 4.1 Fast, Grok 3, or other pre-retirement rows can be dangerous budget inputs.

Use this rule:

| If you see... | Treat it as... | Safer move |
| --- | --- | --- |
| Grok 4.1 Fast as the current cheap default | Stale until proven otherwise in xAI Docs or your console. | Recheck the pricing page and console model list. |
| A blog post promising universal free credits | Account-specific or provider-specific until official docs say otherwise. | Verify your console and credit balance. |
| A provider calculator with "free" usage | Separate provider contract. | Keep it out of the official xAI row. |
| A current model alias | Convenient but movable. | Pin the exact model ID for cost tests. |

This is not pedantry. A retired slug can still work while your cost assumption is wrong. Budget from current official rows and your own console behavior.
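
A small audit over your own request logs can surface both problems. The sketch below is hypothetical throughout: `audit_model_id` is an illustrative helper, the alias set contains only the two aliases named in this guide, and it assumes your logs record both the model ID you requested and the model ID reported for the response.

```python
from typing import List

# Aliases named in this guide; extend from your own console model list.
KNOWN_ALIASES = {"grok-4.3-latest", "grok-latest"}


def audit_model_id(requested: str, logged: str, pinned: str = "grok-4.3") -> List[str]:
    """Return warnings about aliases, redirects, and unexpected model rows."""
    warnings: List[str] = []
    if requested in KNOWN_ALIASES:
        warnings.append(f"'{requested}' is an alias; pin '{pinned}' for cost tests")
    elif logged != requested:
        warnings.append(f"'{requested}' was served as '{logged}'; possible retired slug")
    if logged != pinned:
        warnings.append(f"logs show '{logged}', expected pinned row '{pinned}'")
    return warnings


print(audit_model_id("grok-3", "grok-4.3"))
```

An empty list means the request used the pinned ID and the logs agree; any warning is a reason to recheck the price row before trusting the budget.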

A Safe Test Plan

Before scaling Grok API usage, run a small test that matches the workload you actually want.

| Step | What to do | Pass signal |
| --- | --- | --- |
| 1. Pin the model | Start with grok-4.3 or the exact row you intend to test. | Logs show the expected model ID and team/account. |
| 2. Set a spend stop | Use prepaid credits and a low postpaid limit. | A runaway test cannot create a large invoice. |
| 3. Run a representative sample | Use real prompts, retrieval, tools, and output format. | The sample resembles production work. |
| 4. Track successful-task cost | Count accepted outputs, retries, tool calls, and review time. | Cost per accepted result is clear. |
| 5. Compare alternatives | Test a smaller row, batch mode, cache, or fewer tools where quality allows. | The cheaper route still passes quality. |
| 6. Scale gradually | Raise limits only after logs match the worksheet. | Spend, quality, latency, and failure rate remain stable. |

For model-migration decisions rather than budget decisions, use the Grok 4.3 API guide for model IDs, aliases, migration timing, and rollout testing. Keep the pricing worksheet focused on cost and workload behavior.

FAQ

How much does Grok API cost?

As checked on July 2, 2026, xAI Docs list grok-4.3 at $1.25 input, $0.20 cached input, and $2.50 output per 1M tokens. Real cost also depends on output length, cache hits, tool calls, batch or priority mode, storage, retries, and your account limits.

Is Grok API free?

Public xAI docs checked on July 2, 2026 do not guarantee a permanent official free API tier. The quickstart says to load credits to start using the API. Some accounts or providers may have credits or free routes, but those are separate from the official xAI price row and must be verified in your own console or provider contract.

Which Grok model should I budget for first?

For general current Grok API work, start with the official grok-4.3 row unless your console and workload point to another available model. If you test grok-build-0.1 or a specialized grok-4.20 row, keep quality, availability, and output behavior in the worksheet, not just base token price.

Why is cached input cheaper?

Cached input discounts repeated prompt content when cache behavior applies. It is useful for stable system prompts, policy blocks, or repeated instructions, but it is not automatic savings. Measure cache hits before lowering your budget.

Do tools change Grok API pricing?

Yes. xAI lists separate tool-call prices for Web Search, X Search, Code Execution, File Attachments search, and Collections Search/RAG. If a workflow uses those tools, include them in the cost formula.

Should I use Batch API?

Use Batch only when the job is not latency-sensitive. xAI lists 20%-50% token discounts for eligible text/language model batch work, typically within 24 hours, but image and video generation through Batch may still be billed at standard rates.

What is the biggest budget mistake?

The biggest mistake is copying a single token row and calling it the budget. The safer budget is successful-task cost: official row plus workload tokens, cache behavior, output length, tools, retries, storage, batch or priority mode, and the spending limits in your xAI team console.
