The cheapest LLM API provider is not one company for every workload. As of July 1, 2026, DeepSeek V4 Flash is the lowest verified official paid token floor checked here, but a production choice still depends on output length, cache rate, quality threshold, retries, latency, quota, gateway fee, and support owner.
Start with the route. Use an official direct API when you want the model vendor's contract and source-owned prices; use a gateway when OpenAI-compatible migration, multi-model routing, logs, or one support owner matters; use free routes only for bounded experiments; and compute cost per accepted output before moving real traffic.
| Route | First test | Why it can be cheap | Stop rule |
|---|---|---|---|
| Official direct API | DeepSeek V4 Flash for the lowest verified paid token floor; Gemini 2.5 Flash-Lite Batch/Flex for a low-cost official scale lane | Vendor-owned price rows and clearer contract owner | Stop if quality, region, quota, or lifecycle does not fit the workload. |
| Gateway or aggregator | OpenRouter, SiliconFlow, or laozhang.ai after live model/API verification | One compatible API, routing breadth, logs, and model switching can reduce migration work | Stop if fee, failed-call billing, quota, support owner, or data policy is unclear. |
| Free experiment route | Free models, trial credits, or sandbox quotas | Good for prototypes and same-prompt tests | Stop before production unless rate limits, terms, and availability are verified. |
| BYOK or self-hosted | Your own key or infrastructure | Can control data, routing, and long-term unit economics | Stop if operations, latency, maintenance, or utilization erase the savings. |
The quick formula is: effective cost = total bill / accepted outputs. Do not switch production traffic until three things are true: the candidate route passes the same prompt set at your quality bar, its billable units are confirmed current, and the rollout has a spend cap.
Current Low-Cost Official Price Lanes
Official prices are the safest anchor because the model vendor owns the row, the billing terms, and the lifecycle notice. They still are not the final answer. A low official token price can lose if the model needs longer outputs, more retries, or a higher-cost fallback for the work that fails.
These official rows were checked on July 1, 2026:
| Official route | Current low-cost row | Why it matters | Boundary |
|---|---|---|---|
| DeepSeek direct | deepseek-v4-flash: $0.14 cache-miss input and $0.28 output per 1M tokens; cache-hit input is listed far lower | Lowest verified official paid token floor in this comparison | Do not treat it as the best model for every coding, reasoning, region, or reliability target. DeepSeek also lists 2026-07-24 15:59 UTC as the deprecation date for the deepseek-chat and deepseek-reasoner compatibility names. |
| Google Gemini API | Gemini 2.5 Flash-Lite: $0.10 input and $0.40 output per 1M tokens; Batch/Flex: $0.05 input and $0.20 output | Strong official low-cost lane for high-volume jobs that tolerate Batch/Flex behavior | Do not reuse older Gemini 2.0 Flash-Lite rows as current advice. |
| OpenAI API | gpt-5.4-nano: $0.20 input and $1.25 output per 1M tokens; Batch/Flex rows lower that cost | Useful OpenAI-owned low-cost baseline when compatibility, tooling, or account policy matters | Not the lowest official paid floor, but it can reduce migration and reliability risk for OpenAI-native stacks. |
| Mistral API | Mistral Small 4: $0.15 input and $0.60 output per 1M tokens | Competitive official lane for open-model and European governance needs | Compare governance, latency, and quality, not only token price. |
| Anthropic API | Claude Haiku 4.5: $1 input and $5 output per 1M tokens; Sonnet 5 introductory pricing is date-bound through 2026-08-31 | Not the cheapest raw lane, but worth testing where Claude behavior reduces retries or review effort | Keep the Sonnet 5 cutoff visible and recheck after the introductory window. |
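A token floor only becomes a bill once you know the input/output mix. The sketch below hardcodes the cache-miss rows from the table above and prices one request under two hypothetical token mixes. It ignores cache discounts, fees, and retries. The labels identify rows for this sketch only and are not guaranteed API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRow:
    label: str
    input_per_m: float    # USD per 1M input tokens (cache-miss where the vendor splits it)
    output_per_m: float   # USD per 1M output tokens
    batch: bool = False   # True for Batch/Flex lanes with relaxed latency


# Official rows as checked on July 1, 2026; recheck before reusing.
ROWS = [
    PriceRow("deepseek-v4-flash", 0.14, 0.28),
    PriceRow("gemini-2.5-flash-lite", 0.10, 0.40),
    PriceRow("gemini-2.5-flash-lite (Batch/Flex)", 0.05, 0.20, batch=True),
    PriceRow("gpt-5.4-nano", 0.20, 1.25),
    PriceRow("mistral-small-4", 0.15, 0.60),
    PriceRow("claude-haiku-4.5", 1.00, 5.00),
]


def request_cost(row: PriceRow, input_tokens: int, output_tokens: int) -> float:
    """USD for one request, ignoring cache discounts, fees, and retries."""
    return (input_tokens * row.input_per_m + output_tokens * row.output_per_m) / 1_000_000


def rank(rows: list[PriceRow], input_tokens: int, output_tokens: int) -> list[PriceRow]:
    """Cheapest row first for this token mix."""
    return sorted(rows, key=lambda r: request_cost(r, input_tokens, output_tokens))


# Hypothetical token mixes: (input tokens, output tokens) per request.
interactive = [r for r in ROWS if not r.batch]
for label, (tin, tout) in {"prompt-heavy": (8_000, 200), "output-heavy": (500, 4_000)}.items():
    best = rank(interactive, tin, tout)[0]
    print(f"{label}: {best.label} at ${request_cost(best, tin, tout):.6f}")
```

Among the interactive rows, the prompt-heavy mix favors Gemini 2.5 Flash-Lite ($0.000880 vs $0.001176 for DeepSeek V4 Flash at cache-miss prices). The output-heavy mix favors DeepSeek V4 Flash ($0.001190 vs $0.001650). DeepSeek's lower cache-hit row is not modeled here, so a prompt-heavy workload with a high cache-hit rate can flip back.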
The useful takeaway is not "always pick DeepSeek." It is "use DeepSeek V4 Flash as the paid official token-floor test, then prove the workload accepts the output." If a cheap model produces twice as many rejected answers, the price table lied by omission.

Gateway And Provider Routes
Gateways and aggregators are provider routes. They can be cheaper in practice when they reduce migration work, expose many models behind one API shape, make logging easier, or let a team test fallback routes quickly. They can also add platform fees, routing ambiguity, region differences, or a second support boundary.
Treat each gateway as its own contract:
| Provider route | What to verify | Why it may be useful | Do not claim |
|---|---|---|---|
| OpenRouter | Model row, provider route, tokenizer differences, free model limits, and the 5.5% Pay-as-you-go platform fee | Broad model catalog, no-minimum-spend testing, and a Models API that can sort by pricing-low-to-high | Do not call OpenRouter's model metadata an official OpenAI, Google, Anthropic, DeepSeek, or Mistral price row. |
| SiliconFlow | Provider-owned model price, model version, region, terms, and current availability | Visible low-cost DeepSeek-family provider route; its pricing page listed DeepSeek-V4-Flash provider rows in this check | Do not treat a provider-owned DeepSeek row as the same thing as DeepSeek direct pricing. |
| laozhang.ai | Current model list, feature flags, exact price row, billing mode, logs, support path, and console/API data | Useful when a developer wants an OpenAI-compatible API gateway, model switching, usage visibility, and one support owner | Do not publish exact laozhang.ai per-model prices unless the current Models API or console row has been verified. |
For laozhang.ai specifically, the safe recommendation is conditional: use it when the job is gateway access, OpenAI-compatible migration, model coverage checks, usage logs, or API-route consolidation. The public docs describe pay-as-you-go API integration and an OpenAI-compatible GET /v1/models endpoint for model list, features, and pricing information. That is a verification route, not permission to freeze a stale price table.
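Here is a minimal standard-library sketch for querying an OpenAI-compatible `GET /v1/models` endpoint. It assumes bearer-token auth and the usual OpenAI list shape (`{"data": [...]}`, each entry carrying an `"id"`). The base URL and key are placeholders you fill from the provider's own docs. Gateways name their pricing fields differently, so the sketch returns raw model dicts for you to inspect rather than guessing field names.

```python
import json
import urllib.request
from typing import Optional


def extract_models(payload: dict) -> list[dict]:
    """Keep entries from an OpenAI-style list response that carry a model id."""
    return [m for m in payload.get("data", []) if isinstance(m, dict) and "id" in m]


def list_models(base_url: str, api_key: str, timeout: float = 10.0) -> list[dict]:
    """GET {base_url}/v1/models; base_url is whatever the gateway documents, without /v1."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_models(json.load(response))


def find_model(models: list[dict], model_id: str) -> Optional[dict]:
    """Return the raw row for one model so its price fields can be read before quoting."""
    return next((m for m in models if m["id"] == model_id), None)
```

Typical use is `find_model(list_models(BASE_URL, API_KEY), "your-model-id")`, where all three values are yours to supply. Then read the price fields from the returned row on the day you quote them.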
If your main decision is gateway architecture, keep the Claude gateway laozhang.ai guide and the OpenClaw API guide as adjacent setup reading. The narrower decision here is comparing provider routes before money moves.
Calculate Accepted-Output Cost
The cheapest LLM API provider is the provider that gives you the lowest cost per accepted output at your quality bar. Raw input price is only one factor.

Use this worksheet:
```text
accepted-output cost = total bill for the sample run / outputs that passed your acceptance bar
```
Then break the total bill into the variables that actually move:
| Variable | Why it changes the winner | What to measure |
|---|---|---|
| Input tokens | System prompts, tools, context, retrieval chunks, and conversation history can dominate short outputs | Average billable input per accepted task |
| Output tokens | Some models need longer answers or explanations to pass review | Average accepted output length, not maximum output length |
| Cache hit rate | Cached input can make a prompt-heavy workflow much cheaper | Cacheable prefix share and cache-hit percentage |
| Retry rate | Timeouts, schema failures, weak reasoning, or unsafe outputs increase billable attempts | Attempts per accepted answer |
| Quality threshold | Higher bars can reject cheap outputs more often | Acceptance rate from a labeled sample |
| Latency and quota | Rate limits can force a higher-cost fallback or slower batch route | P95 latency, TPM/RPM headroom, and fallback share |
| Gateway fee | Platform fee, provider route, markup, or minimum spend can change the final bill | Total provider invoice divided by accepted outputs |
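The variables in this table combine into one estimate per accepted output. In the sketch below, every attempt is billed, cached input is billed at the cache-hit rate, and the gateway fee is modeled as a flat multiplier on token spend. That last point is an assumption: some gateways charge the fee on credit purchases instead, so check how yours bills it. The prices in the usage test are hypothetical round numbers, not any provider's row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """USD per 1M tokens for one route; fill in the rows you verified."""
    input_miss_per_m: float
    input_hit_per_m: float
    output_per_m: float


@dataclass(frozen=True)
class WorkloadSample:
    """Averages measured from a same-prompt sample run."""
    input_tokens: float           # billable input tokens per attempt
    output_tokens: float          # output tokens per attempt
    cache_hit_rate: float         # share of input tokens billed as cache hits, 0..1
    attempts_per_accepted: float  # billable attempts per accepted output, >= 1
    fee_rate: float = 0.0         # gateway fee as a fraction (assumed to apply to token spend)

    def __post_init__(self):
        if not 0.0 <= self.cache_hit_rate <= 1.0:
            raise ValueError("cache_hit_rate must be between 0 and 1")
        if self.attempts_per_accepted < 1.0:
            raise ValueError("attempts_per_accepted must be at least 1")


def cost_per_accepted(sample: WorkloadSample, prices: Prices) -> float:
    """Estimated USD spent per accepted output."""
    miss_tokens = sample.input_tokens * (1.0 - sample.cache_hit_rate)
    hit_tokens = sample.input_tokens * sample.cache_hit_rate
    per_attempt = (
        miss_tokens * prices.input_miss_per_m
        + hit_tokens * prices.input_hit_per_m
        + sample.output_tokens * prices.output_per_m
    ) / 1_000_000
    # Rejected and retried attempts are billed too, so scale by attempts per accepted output.
    return per_attempt * sample.attempts_per_accepted * (1.0 + sample.fee_rate)
```

Keeping each variable as a separate field makes it easy to see which one moves the winner when you rerun the sample on a second route.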
A simple example: if Provider A costs $0.20 for 1,000 candidate outputs but only 600 pass review, the accepted-output cost is about $0.000333 per accepted output. If Provider B costs $0.25 for the same 1,000 candidates and 900 pass, the accepted-output cost is about $0.000278. Provider B is more expensive in the raw table but cheaper in the product.
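The same comparison as code reproduces the arithmetic from the example. Each route is a hypothetical `(total bill in USD, accepted outputs)` pair from one sample run.

```python
def accepted_output_cost(total_bill: float, accepted: int) -> float:
    """Total bill for the sample run divided by outputs that passed review."""
    if accepted <= 0:
        raise ValueError("no accepted outputs: the route failed the sample")
    return total_bill / accepted


def cheapest_route(runs: dict[str, tuple[float, int]]) -> tuple[str, float]:
    """Return (route, cost per accepted output) with the lowest accepted-output cost."""
    costs = {name: accepted_output_cost(bill, ok) for name, (bill, ok) in runs.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# Numbers from the example above: (total bill in USD, outputs that passed review).
runs = {"Provider A": (0.20, 600), "Provider B": (0.25, 900)}
for name, (bill, ok) in runs.items():
    print(f"{name}: ${accepted_output_cost(bill, ok):.6f} per accepted output")
print("cheapest:", cheapest_route(runs)[0])
```

This prints $0.000333 for Provider A and $0.000278 for Provider B, then names Provider B. A route with zero accepted outputs raises an error instead of reporting an infinite or misleading price.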
That is why pricing comparisons like Claude API vs OpenAI API pricing should be read as starting points. The production spreadsheet needs your task's acceptance rate.
Free, Trial, BYOK, And Self-Hosted Lanes
Free routes are useful, but "free" is not a production price. Free, trial, BYOK, and self-hosted lanes each look cheap at the start because part of the cost moves somewhere else:
| Lane | Good for | Hidden cost | Production boundary |
|---|---|---|---|
| Free model through a gateway | Prototypes, prompt tests, and teaching demos | Strict limits, provider fallback, lower priority, or model changes | Do not depend on it until rate limits, terms, and uptime expectations are verified. |
| Trial credits from a model vendor | Comparing a new official API | Expiration, regional availability, account limits | Move to paid rows before launch math. |
| BYOK through a gateway | One routing layer while keeping your vendor account | Gateway fee, key management, data path, and support split | Know whether the vendor or gateway owns the failure. |
| Self-hosted open model | Data control, fixed infrastructure, high-utilization workloads | GPU utilization, engineering time, monitoring, quantization quality, and maintenance | Only cheaper when utilization is high and quality is good enough. |
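The utilization boundary in the self-hosted row is easy to quantify. This sketch spreads a fixed hourly infrastructure cost over the tokens actually served, then finds the utilization at which self-hosting matches a blended API price. It deliberately leaves out engineering time, monitoring, and quantization quality. All inputs in the usage test are hypothetical.

```python
def self_hosted_cost_per_m(hourly_cost_usd: float, peak_tokens_per_hour: float,
                           utilization: float) -> float:
    """USD per 1M tokens actually served by fixed-cost infrastructure.

    Excludes engineering time, monitoring, and quality loss from quantization.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    served_per_hour = peak_tokens_per_hour * utilization
    return hourly_cost_usd / served_per_hour * 1_000_000


def break_even_utilization(hourly_cost_usd: float, peak_tokens_per_hour: float,
                           api_cost_per_m: float) -> float:
    """Utilization at which self-hosting matches a blended API price per 1M tokens."""
    return hourly_cost_usd * 1_000_000 / (peak_tokens_per_hour * api_cost_per_m)
```

If `break_even_utilization` returns a value above 1.0, the hardware cannot match the API price even at full load, before counting any operations cost.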
If your question is only whether a free Gemini lane exists, use Gemini API free tier as a narrower companion. For this provider decision, free lanes should feed the same-prompt test, not replace production due diligence.
Verification Workflow Before Switching
Do not migrate production traffic from a static price table. Use the table to pick candidates, then verify the live route.

Run this sequence:
- Check the official model-vendor pricing page for the direct API row.
- If you are using a gateway, query its current model/API metadata or console before quoting a price.
- Run the same prompt set against each candidate route.
- Record input tokens, output tokens, cache behavior, failures, retries, latency, and accepted outputs.
- Compare total bill divided by accepted outputs.
- Inspect failed-call billing, quota, logs, support owner, data retention, and regional terms.
- Move a small slice of traffic behind a spend cap and a quality fallback.
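The recording and comparison steps can be sketched as a per-route summary over one same-prompt run. Each attempt is logged with its billed cost, latency, and whether it passed your acceptance bar. The `Attempt` record is a hypothetical structure, not any provider's log format, and the p95 uses the nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    route: str
    cost_usd: float   # billed cost of this attempt, including failed calls if billed
    latency_s: float
    accepted: bool    # passed your acceptance bar (schema, review, tests)


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def summarize(attempts: list[Attempt]) -> dict[str, dict[str, float]]:
    """Group attempts by route and compute the numbers the workflow compares."""
    by_route: dict[str, list[Attempt]] = {}
    for attempt in attempts:
        by_route.setdefault(attempt.route, []).append(attempt)
    summary = {}
    for route, rows in by_route.items():
        accepted = sum(1 for r in rows if r.accepted)
        bill = sum(r.cost_usd for r in rows)
        summary[route] = {
            "attempts": len(rows),
            "accepted": accepted,
            "attempts_per_accepted": len(rows) / accepted if accepted else math.inf,
            "cost_per_accepted": bill / accepted if accepted else math.inf,
            "p95_latency_s": p95([r.latency_s for r in rows]),
        }
    return summary
```

A route with no accepted outputs reports infinite cost, which keeps it from winning the comparison by accident.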
For OpenRouter, the Models API is useful because it exposes model pricing metadata and supports low-price sorting. For laozhang.ai, the current model list/API or console is the right verification point before you claim exact provider pricing. For SiliconFlow, verify whether the provider-owned row, model version, and region match the workload you will actually run.
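For OpenRouter, the public models list can be sorted locally. The sketch below assumes the response shape OpenRouter documents at the time of writing: a `data` array whose entries carry `id` and a `pricing` object with `prompt` and `completion` as per-token USD strings. Verify those field names before relying on them. Rows without a price are skipped rather than treated as free. Negative values are treated as "variable or unknown", which is an assumption. The 5.5% platform fee is not included.

```python
import json
import urllib.request
from decimal import Decimal

MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_models(url: str = MODELS_URL, timeout: float = 10.0) -> list[dict]:
    """Download the gateway-owned model metadata."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response).get("data", [])


def per_million(price_per_token: str) -> Decimal:
    """Convert a per-token USD string to USD per 1M tokens without float rounding."""
    return Decimal(price_per_token) * 1_000_000


def sort_by_blended_price(models: list[dict], output_share: float = 0.5) -> list[tuple[str, Decimal]]:
    """Rank models by a blended USD-per-1M price; output_share is your output token share."""
    share = Decimal(str(output_share))
    ranked = []
    for model in models:
        pricing = model.get("pricing") or {}
        prompt, completion = pricing.get("prompt"), pricing.get("completion")
        if prompt is None or completion is None:
            continue  # no price row: verify manually instead of assuming zero
        p_in, p_out = per_million(prompt), per_million(completion)
        if p_in < 0 or p_out < 0:
            continue  # assumption: negative values mean "variable or unknown"
        ranked.append((model["id"], p_in * (1 - share) + p_out * share))
    return sorted(ranked, key=lambda pair: pair[1])
```

Set `output_share` from your own measured token mix. A 50/50 blend overstates output cost for prompt-heavy work.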
Stop the migration if any of these are unclear:
- failed calls are billable and the failure rate is unknown
- latency has no headroom at expected concurrency
- the model name or compatibility alias is near a lifecycle change
- logs and usage export are not enough for budget control
- data retention or region terms conflict with the workload
- the provider cannot tell you who owns support when the upstream model fails
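The stop list can be encoded as a gate that returns every reason to halt instead of a single yes/no. `RouteFacts` is a hypothetical record you fill from the verification steps. The 30-day lifecycle window is an assumed reading of "near".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteFacts:
    failed_calls_billable: Optional[bool]    # None means the provider has not said
    failure_rate_known: bool
    p95_latency_s: float                     # measured at expected concurrency
    latency_budget_s: float
    days_to_lifecycle_change: Optional[int]  # None means no announced change
    usage_export_ok: bool
    data_terms_ok: bool
    support_owner: Optional[str]


def stop_reasons(f: RouteFacts, lifecycle_window_days: int = 30) -> list[str]:
    """Return every stop condition that applies; an empty list means no blocker found."""
    reasons = []
    if f.failed_calls_billable is not False and not f.failure_rate_known:
        reasons.append("failed calls may be billable and the failure rate is unknown")
    if f.p95_latency_s >= f.latency_budget_s:
        reasons.append("no latency headroom at expected concurrency")
    if f.days_to_lifecycle_change is not None and f.days_to_lifecycle_change <= lifecycle_window_days:
        reasons.append("model name or alias is near a lifecycle change")
    if not f.usage_export_ok:
        reasons.append("logs and usage export are not enough for budget control")
    if not f.data_terms_ok:
        reasons.append("data retention or region terms conflict with the workload")
    if not f.support_owner:
        reasons.append("no clear support owner when the upstream model fails")
    return reasons
```

Unknown billing for failed calls counts as a stop, matching the rule that unclear items halt the migration.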
For agent workloads, pair this with a spend guardrail such as an LLM agent API spend kill switch. A cheap provider stops being cheap when an agent loop can keep spending after quality has already failed.
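A minimal in-process spend guard is sketched below. Before each model call, you check a worst-case estimate against the cap. After the call, you record the billed cost, including failed attempts. Once tripped, the guard refuses every later step. This is a hypothetical helper, not a provider feature, and it complements rather than replaces account-level spend limits.

```python
class SpendCapExceeded(RuntimeError):
    """Raised when a step would push, or has pushed, spend past the cap."""


class SpendGuard:
    def __init__(self, cap_usd: float):
        if cap_usd <= 0:
            raise ValueError("cap_usd must be positive")
        self.cap_usd = cap_usd
        self.spent_usd = 0.0
        self.tripped = False

    def ensure_budget(self, estimated_cost_usd: float) -> None:
        """Call before a model request with a worst-case estimate for that step."""
        if self.tripped or self.spent_usd + estimated_cost_usd > self.cap_usd:
            self.tripped = True
            raise SpendCapExceeded(
                f"spent ${self.spent_usd:.4f}, next step up to ${estimated_cost_usd:.4f}, "
                f"cap ${self.cap_usd:.4f}"
            )

    def record(self, actual_cost_usd: float) -> None:
        """Call after the request with the billed cost, including failed attempts."""
        self.spent_usd += actual_cost_usd
        if self.spent_usd > self.cap_usd:
            self.tripped = True
            raise SpendCapExceeded(f"spent ${self.spent_usd:.4f} past cap ${self.cap_usd:.4f}")
```

In an agent loop, the `SpendCapExceeded` exception should end the loop rather than be caught and retried.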
Recommendations By Workload
Use these as first tests, not final procurement answers.
| Workload | First route to test | Backup route | Why |
|---|---|---|---|
| Cheap chat, extraction, and light summarization | DeepSeek V4 Flash direct | Gemini 2.5 Flash-Lite or OpenAI gpt-5.4-nano | Start at the official paid floor, then test acceptance rate and output length. |
| Large asynchronous summarization | Gemini 2.5 Flash-Lite Batch/Flex | OpenAI Batch/Flex low-cost rows | Batch-style lanes can beat interactive routes when latency is not urgent. |
| OpenAI-compatible migration with many candidate models | OpenRouter or laozhang.ai after live model/API verification | Official direct API for the winning model | Gateway convenience can save engineering time, but only after fee and source-owner checks. |
| DeepSeek-family access through a provider route | DeepSeek direct first, then SiliconFlow if the provider route helps region, payment, or operational needs | Another gateway with verified model metadata | Provider-owned DeepSeek rows need provider labels and live verification. |
| Coding or agentic tasks | Same-prompt test across DeepSeek, OpenAI, Claude, and a gateway fallback | The model with the lowest accepted-output cost, not the lowest input row | Retry rate and tool reliability can dominate raw token price. |
| Governance-sensitive workloads | Mistral or a vendor/direct route with the required region and data terms | Self-hosted or BYOK only if operations are realistic | Compliance and data owner can be worth paying for. |
| Prototypes and learning | Free gateway model, trial credit, or sandbox route | Low-cost paid official lane | Keep free routes out of production math until limits and terms are known. |
The cheapest practical answer often differs across parts of the same product. A support classifier might run on a low-cost official row, a coding assistant might need a stronger model, and a gateway might only own fallback routing. Do not force one provider to own every job.
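When a gateway only owns fallback, routing can stay a small function per workload. In this sketch, each route is a hypothetical `(name, callable)` pair standing in for a real client call. The first route that returns an output passing your acceptance check wins. Every failure is collected so the final error shows why each route was skipped.

```python
from typing import Callable, Sequence, Tuple

Route = Tuple[str, Callable[[str], str]]


def call_with_fallback(prompt: str, routes: Sequence[Route],
                       accept: Callable[[str], bool]) -> Tuple[str, str]:
    """Try routes in order; return (route name, output) for the first accepted output."""
    failures = []
    for name, call in routes:
        try:
            output = call(prompt)
        except Exception as exc:  # timeouts, quota errors, transport failures
            failures.append(f"{name}: {type(exc).__name__}")
            continue
        if accept(output):
            return name, output
        failures.append(f"{name}: rejected")  # a rejected output was still billed
    raise RuntimeError("all routes failed: " + "; ".join(failures))
```

Each workload can get its own ordered route list, for example a low-cost official route first and a gateway route second. Then the fallback share shows up in the same-prompt measurements instead of hiding in the invoice.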
Provider Checklist
Before you call any route cheapest, answer these questions:
- Which organization owns the price row: model vendor, gateway, cloud platform, reseller, or your infrastructure team?
- Is the row input-only, output-only, cached input, batch/flex, per request, per second, or per image/tool call?
- What model version, region, and lifecycle status does the row cover?
- How are failed calls, timeouts, safety refusals, and retries billed?
- What are the RPM, TPM, daily quota, and spend-limit behaviors?
- What logs, usage export, and alerting are available?
- Who owns support when the upstream model fails?
- What data retention, training, and regional terms apply?
- Does the route pass the same prompt set at your quality bar?
- Is the rollout capped so a failure cannot create an open-ended bill?
This checklist is deliberately stricter than a price table. It is how you keep a cheap provider test from becoming an expensive incident.
FAQ
Who is the cheapest LLM API provider right now?
For the official paid token floor checked on July 1, 2026, DeepSeek V4 Flash is the lowest verified row in this comparison. That does not make it the cheapest practical provider for every workload. Compare accepted-output cost after output length, cache rate, retries, latency, quota, and support owner.
Is OpenRouter cheaper than using a direct API?
Sometimes, but not automatically. OpenRouter can reduce integration work and expose many models through one gateway, but its Pay-as-you-go route includes a platform fee and model prices depend on the selected route. Treat OpenRouter pricing metadata as gateway-owned and verify the live model row before production.
Should I use laozhang.ai as the cheapest LLM API provider?
Use laozhang.ai when the job is a developer gateway job: OpenAI-compatible API access, model switching, usage visibility, or one support owner. Do not call it the cheapest provider unless the current model/API or console row proves the exact model price for your workload. Provider pricing is not official vendor pricing.
Are free LLM APIs safe for production?
Assume no until the limits, terms, uptime, quota, and support path are verified. Free routes are excellent for prompt comparison and early prototypes. Production routes need predictable billing, logs, fallback behavior, and a support owner.
Why can a low input price lose?
Because the bill is not only input tokens. Long outputs, low cache hit rate, schema failures, retries, stricter quality review, latency fallbacks, and gateway fees can make a low input row more expensive per accepted output.
How often should I recheck prices?
Recheck before every production migration, before every major volume increase, and whenever a model lifecycle note, provider fee, or free-tier term changes. Date-bound rows such as Anthropic's Sonnet 5 introductory pricing need a scheduled recheck before the cutoff.
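Date-bound rows are easiest to recheck on a schedule if they live in a list. The sketch below takes the two dates named in the tables above and reports which ones fall within a recheck window or have already passed. It works at day granularity, so the 15:59 UTC detail on the DeepSeek note is dropped. The 30-day default window is an assumption.

```python
from datetime import date

# Date-bound notes from the official price table (day granularity).
DATED_NOTES = [
    ("DeepSeek deepseek-chat / deepseek-reasoner compatibility-name deprecation", date(2026, 7, 24)),
    ("Anthropic Sonnet 5 introductory pricing window ends", date(2026, 8, 31)),
]


def due_for_recheck(notes: list[tuple[str, date]], today: date, window_days: int = 30) -> list[str]:
    """Return notes whose date is within window_days of today, or already past."""
    return [label for label, when in notes if (when - today).days <= window_days]
```

Run against the July 1, 2026 check date, only the DeepSeek note falls inside a 30-day window. The Sonnet 5 cutoff enters the window in early August.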
Bottom Line
Use the official token floor to pick a first candidate, not a final provider. DeepSeek V4 Flash deserves the first cheap paid test for many text workloads. Gemini 2.5 Flash-Lite Batch/Flex deserves a serious test for asynchronous scale. OpenAI, Anthropic, and Mistral can win when compatibility, quality, governance, or reliability reduces rejected output. Gateways such as OpenRouter, SiliconFlow, and laozhang.ai can win when routing, API compatibility, logs, or support consolidation saves more than the provider fee.
The purchase decision is simple only after you do the work: verify the current row, run the same prompt, divide the total bill by accepted outputs, and roll out behind a cap.
