Paid Tier Getting free_tier_requests Limit 0 — Complete Fix Guide (2026)

AI Free API Team

Feb 26, 2026 • 25 min read • API Troubleshooting

Getting a 429 RESOURCE_EXHAUSTED error with free_tier_requests limit: 0 on your paid Gemini API account? This guide helps you diagnose whether it's a billing misconfiguration or a known Google bug, then walks you through the exact fix for each scenario — including workarounds for the Feb 2026 image generation bug.

If your paid Gemini API account is returning a 429 RESOURCE_EXHAUSTED error with the metadata quotaMetric: "generate_content_free_tier_requests" and quotaValue: "0", you are dealing with one of two distinct problems. Either your Google Cloud Billing is not properly linked to the project that hosts your API key, which is fixable in under five minutes, or you have encountered a known Google platform bug that has been affecting image generation models since approximately February 10, 2026. As of February 2026, Tier 1 accounts should get 150-300 RPM (requests per minute) compared to the free tier's 5-15 RPM, and the upgrade from free to Tier 1 is instant once Cloud Billing is correctly configured (ai.google.dev/gemini-api/docs/rate-limits, February 2026).

TL;DR

The "free_tier_requests limit: 0" error on paid Gemini API accounts has two root causes. The most common one, accounting for roughly 60% of cases, is that Cloud Billing is not actually linked to the GCP project associated with your API key. The fix takes five minutes: go to the Google Cloud Console, link a billing account to your project, create a new API key, and verify the tier upgrade. The second cause, affecting about 25-40% of recent cases, is a genuine Google platform bug where image generation models like gemini-2.5-flash-image and gemini-3-pro-image-preview remain stuck on free tier quotas even when billing is properly configured. For this scenario, workarounds include using text models that still work correctly, migrating to the Vertex AI endpoint, or using a third-party API proxy service. The remaining cases involve API keys created before billing was enabled, which can be resolved by simply generating a new API key after billing activation.

What Does "free_tier_requests limit: 0" Actually Mean?

When you receive a 429 RESOURCE_EXHAUSTED error from the Gemini API, the response body contains detailed quota information that tells you exactly what went wrong. The critical fields to examine are quotaMetric, which identifies which quota bucket your request was counted against, and quotaValue plus quotaLimit, which show your current usage and maximum allowance respectively. When you see quotaMetric: "generativelanguage.googleapis.com/generate_content_free_tier_requests" with both quotaValue and quotaLimit set to "0", this tells you something alarming: the API is treating your account as a free tier user with zero remaining quota, regardless of what your billing dashboard says.

Understanding why this happens requires knowing how Google's quota system works internally. The Gemini API enforces quotas at the project level, not at the individual API key level (ai.google.dev, February 2026). This means that if you have multiple API keys under the same Google Cloud project, they all share the same quota pool. When Cloud Billing is linked to a project, Google's backend should automatically upgrade that project's quota from the free tier to paid tier limits. The "free_tier_requests" metric appearing in your error response is the clearest signal that this upgrade either hasn't happened or has been rolled back by a platform issue.

There is an important distinction between seeing quotaLimit: "0" and simply hitting your rate limit. A quotaLimit of zero means the system has assigned you literally no quota at all — it is not that you have used up your allowance, but rather that you were never given one. This is different from a normal rate limit error where quotaLimit might show "5" (the free tier RPM for Gemini 2.5 Pro) and quotaValue shows "5" (all used up). The zero-quota scenario specifically indicates a billing or platform configuration issue, not organic usage exhaustion.

To confirm which situation you are in, examine the full error response body carefully. Here is what the typical error JSON looks like when you encounter the zero-quota problem:

```json
{
  "error": {
    "code": 429,
    "message": "Resource has been exhausted",
    "status": "RESOURCE_EXHAUSTED",
    "details": [{
      "reason": "RATE_LIMIT_EXCEEDED",
      "metadata": {
        "quota_limit": "generate_content_free_tier_requests",
        "quota_limit_value": "0",
        "quota_metric": "generativelanguage.googleapis.com/generate_content_free_tier_requests"
      }
    }]
  }
}
```

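The key detail to note is the metric name containing free_tier_requests: it reveals that Google's backend has categorized your project under the free tier quota bucket, regardless of your actual billing status. If you were on the paid tier, the metric would reference a different quota bucket entirely (typically generate_content_requests, without the free_tier segment). This distinction is subtle but critical for accurate diagnosis. Note that the sample above spells its metadata keys in snake_case (quota_metric, quota_limit_value), while the same fields are referred to elsewhere in camelCase (quotaMetric, quotaLimit), so match on the metric value rather than on one exact key spelling.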
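The check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the error body has the shape of the sample JSON; the function name `classify_quota_error` and the returned labels are hypothetical, and the accepted key spellings (snake_case and camelCase) are an assumption based on the two forms that appear in this guide.

```python
# Key spellings are an assumption: this guide shows both snake_case and camelCase.
METRIC_KEYS = ("quota_metric", "quotaMetric")
LIMIT_KEYS = ("quota_limit_value", "quotaLimit")


def _first(metadata, keys):
    """Return the value under the first candidate key that is present, else None."""
    for key in keys:
        if key in metadata:
            return metadata[key]
    return None


def classify_quota_error(body):
    """Label a parsed 429 response body (labels are illustrative, not official)."""
    error = body.get("error", {})
    if error.get("code") != 429:
        return "unknown"
    for detail in error.get("details", []):
        metadata = detail.get("metadata", {})
        metric = _first(metadata, METRIC_KEYS)
        if metric is None:
            continue
        if "free_tier_requests" not in metric:
            return "paid_tier_rate_limit"
        limit = _first(metadata, LIMIT_KEYS)
        # A limit of "0" means no quota was ever assigned, not that it ran out.
        return "zero_free_tier_quota" if str(limit) == "0" else "free_tier_rate_limit"
    return "unknown"


sample = {
    "error": {
        "code": 429,
        "status": "RESOURCE_EXHAUSTED",
        "details": [{
            "reason": "RATE_LIMIT_EXCEEDED",
            "metadata": {
                "quota_limit_value": "0",
                "quota_metric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            },
        }],
    }
}
print(classify_quota_error(sample))  # prints: zero_free_tier_quota
```

A result of `zero_free_tier_quota` points to the billing or platform problem covered in this guide, while `free_tier_rate_limit` would suggest ordinary usage exhaustion on a free tier project.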

The error typically manifests in one of two patterns. In the first pattern, every single API call fails with this error, including text generation, suggesting a billing linkage problem. In the second pattern, text generation models work fine with paid tier limits, but image generation models like gemini-2.5-flash-image and gemini-3-pro-image-preview return the free_tier_requests error. This second pattern is the hallmark of the February 2026 Google platform bug that has been widely reported on the Google AI Developers Forum. Identifying which pattern you are experiencing is the first and most important step in resolving the issue, because the fixes for each scenario are fundamentally different and applying the wrong one wastes valuable debugging time.

Quick Diagnosis — Billing Issue or Google Bug?

[Figure: Diagnostic flowchart showing how to determine if the free_tier_requests error is caused by billing misconfiguration or a Google platform bug]

Before you can fix the problem, you need to identify which of the two scenarios you are dealing with. The diagnostic process takes about sixty seconds and involves three checks that progressively narrow down the root cause. Getting this right matters because the fixes for each scenario are completely different — applying the wrong fix wastes time and may leave you more confused.

Check 1: Verify Cloud Billing is linked. Navigate to console.cloud.google.com/billing and check whether a billing account is actively linked to the project that contains your API key. This is not the same as having a Google One or Gemini Pro subscription — API billing requires a Cloud Billing account specifically linked to your GCP project. If no billing account is linked, you have found your problem and can proceed directly to Fix #1 in the next section. A surprising number of developers believe they are on the paid tier because they signed up for Google AI Studio, but AI Studio's free access does not automatically enable paid API quotas.

Check 2: Test different model types. If billing is linked, run a simple test. Send a basic text generation request to a text model like gemini-2.5-flash, and separately send a request to an image generation model like gemini-2.5-flash-image. If text models work fine but image models fail with the free_tier_requests error, you are almost certainly hitting the February 2026 Google bug. This bug specifically affects image generation endpoints while leaving text generation endpoints unaffected on the same account, creating a frustrating split behavior that confuses developers into thinking their billing is misconfigured.

Check 3: API key creation timing. If all models fail (both text and image), check when your API key was created relative to when you enabled billing. API keys created through Google AI Studio before Cloud Billing was activated on the project may not automatically inherit the paid tier quota. The fix is simple: create a new API key after billing is enabled. This is a less common scenario, accounting for roughly 15% of cases, but it trips up developers who set up their keys during the initial free tier exploration and then added billing later.

Based on these three checks, you will fall into one of three categories: billing not linked (fix in the next section), Google bug on image models (see the workarounds section), or API key timing issue (create a new key). Each category has a specific resolution path, and the rest of this guide walks you through each one in detail.
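The three checks above can be folded into one small decision function. The sketch below is only a restatement of the diagnostic logic in this section; the function name `diagnose`, its parameters, and the returned labels are hypothetical, and the inputs are values you gather yourself from the Cloud Console and from test requests.

```python
def diagnose(billing_linked, text_ok, image_ok, key_predates_billing):
    """Map the results of the three checks to one of the categories in this guide.

    billing_linked       -- Check 1: a billing account is linked to the project
    text_ok / image_ok   -- Check 2: whether a text / image model request succeeded
    key_predates_billing -- Check 3: the API key was created before billing was enabled
    """
    if not billing_linked:
        return "billing_not_linked"        # go to Fix #1
    if text_ok and not image_ok:
        return "image_model_platform_bug"  # go to the workarounds in Fix #2
    if not text_ok and not image_ok and key_predates_billing:
        return "stale_api_key"             # create a new key
    if text_ok and image_ok:
        return "no_issue_detected"
    return "inconclusive"                  # re-run the checks or recheck billing


print(diagnose(billing_linked=True, text_ok=True, image_ok=False, key_predates_billing=False))
```

The order of the branches mirrors the order of the checks: billing linkage is tested first because it is the cheapest thing to verify and the most common cause.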

For a quick sanity check, you can also use the following curl command to test your API key and see the raw response including any quota metadata. Replace YOUR_API_KEY with your actual key and MODEL_NAME with the model you want to test:

```bash
# Send a minimal generateContent request and pretty-print the raw JSON response.
curl -s "https://generativelanguage.googleapis.com/v1beta/models/MODEL_NAME:generateContent?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}' | python3 -m json.tool
```

If the response includes quota information in the error details, you will see exactly which quota bucket your request was counted against. A successful paid tier response will return the generated content without any quota-related error metadata, confirming that your project is properly recognized as a paid account.
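If you prefer to script this check, the curl command above translates to Python's standard library roughly as follows. This is a sketch rather than an official client: the helper names `build_request` and `probe` are hypothetical, the URL and payload simply mirror the curl example, and `probe` assumes the error response body is JSON.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model_name, api_key, prompt="Hello"):
    """Build the same POST request that the curl example sends."""
    url = f"{BASE_URL}/{model_name}:generateContent?key={api_key}"
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def probe(model_name, api_key):
    """Send the request and return (http_status, parsed_json_body)."""
    request = build_request(model_name, api_key)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # Error responses (including 429) still carry a body with the quota details.
        return err.code, json.load(err)
```

Calling `probe("gemini-2.5-flash", key)` and then `probe("gemini-2.5-flash-image", key)` with your own key gives you the text-versus-image comparison from Check 2 in two lines.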

Fix #1 — Link Cloud Billing Properly

[Figure: Step-by-step billing fix process showing the 6 steps to link Cloud Billing and activate Tier 1 quotas]

This fix addresses the most common cause of the free_tier_requests error: Cloud Billing not being properly linked to your Google Cloud project. Even if you think your billing is set up correctly, it is worth following these steps to verify, because the Google Cloud Console has multiple billing-related pages that can create confusion about your actual billing status.

Step 1: Open the Google Cloud Console. Go to console.cloud.google.com and make sure you are signed in with the same Google account you use for AI Studio. This seems obvious, but developers who use multiple Google accounts sometimes configure billing on the wrong account. Check the account avatar in the top right corner to confirm you are on the correct account.

Step 2: Navigate to Billing. Click on the hamburger menu (three horizontal lines) in the top left, then find and click "Billing" in the navigation panel. If you see a prompt to "Link a billing account" or "Set up billing," then you have confirmed that billing was never properly linked — this is your root cause. If you see an existing billing account, proceed to verify it is linked to the correct project.

Step 3: Verify project linkage. Under your billing account, click "Account management" and look at the "Projects linked to this billing account" section. Find the project that contains your API keys. If your project is not listed here, click "Link a project" and select it. This is the critical step that many developers miss: having a billing account is not enough — it must be explicitly linked to the project where your Gemini API keys live.

Step 4: Handle the prepayment requirement. Google may require a one-time prepayment to activate paid API access. This is not a fee but a credit that gets applied to your future usage. If prompted, complete this prepayment step. Some developers report being asked for amounts ranging from $5 to $50, depending on their region and account history. This prepayment is credited to your account balance and consumed as you use the API (ai.google.dev/billing, February 2026).

Step 5: Create a new API key (recommended). After linking billing, go to aistudio.google.com and create a new API key under the now-billing-enabled project. While existing keys should inherit the paid quota, creating a new key eliminates any caching or propagation delays. Use this new key in your application and test immediately.

Step 6: Verify the upgrade. Make a test API call and check the response body. A successful paid tier response should not contain any free_tier_requests quota metrics. Alternatively, check your quota page at console.cloud.google.com/apis/api/generativelanguage.googleapis.com/quotas to see your updated limits. You should see RPM values of 150-300 instead of the free tier's 5-15 RPM. The upgrade to Tier 1 happens instantly once billing is properly linked; there is no waiting period for the first tier.

If after completing all six steps your text generation models work with paid limits but image generation models still show the free_tier_requests error, you are likely dealing with the Google platform bug described in the next section. Do not keep relinking billing or creating new keys — that will not help if the underlying issue is on Google's side.

One additional verification step that helps confirm a successful fix: visit the API quotas dashboard at console.cloud.google.com/apis/api/generativelanguage.googleapis.com/quotas and look for the "GenerateContent requests per minute per project per base_model" metric. Under the free tier, this value shows single-digit numbers like 5 or 10. Under Tier 1, it should display values in the 150-300 range. If you see the Tier 1 numbers, your billing linkage is confirmed and any remaining free_tier_requests errors on specific models are attributable to the Google platform bug rather than your account configuration.
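As a rough aid for reading the quotas dashboard, the RPM ranges quoted in this guide can be turned into a lookup. The function name `tier_from_rpm` is hypothetical and the thresholds are the February 2026 figures cited here, so treat them as assumptions that may drift as Google changes its limits.

```python
def tier_from_rpm(rpm):
    """Guess the tier from a requests-per-minute value shown on the quotas page.

    Ranges follow the figures in this guide (February 2026):
    free tier 5-15 RPM, Tier 1 150-300 RPM, Tier 2+ 1,000 RPM or more.
    """
    if rpm <= 15:
        return "free"
    if 150 <= rpm <= 300:
        return "tier_1"
    if rpm >= 1000:
        return "tier_2_plus"
    return "unrecognized"


for value in (5, 150, 300, 2000):
    print(value, tier_from_rpm(value))
```

An `unrecognized` result means the value falls between the published ranges, which is a cue to check the current rate limit documentation rather than a diagnosis in itself.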

Fix #2 — When It Is a Google Platform Bug

Since approximately February 10, 2026, a growing number of developers have reported that their paid Tier 1 accounts receive free_tier_requests limit: 0 errors specifically on image generation models, while text generation models on the same account work perfectly with paid tier quotas. This has been confirmed across multiple threads on the Google AI Developers Forum, with posts from developers who have verified their billing is correctly configured and have tried all standard troubleshooting steps without success.

The bug appears to affect specific image generation model endpoints, including gemini-2.5-flash-image and gemini-3-pro-image-preview (the model also known as "Nano Banana Pro"). Users report that requests to these models return the familiar 429 error with the free_tier_requests metric, even though identical requests to text models like gemini-2.5-pro or gemini-2.5-flash succeed without any quota issues. This selective failure pattern is the strongest indicator that the issue is a platform-level bug rather than a user configuration problem. If you have already explored the Gemini 3 Pro image generation free tier limits and confirmed you should have paid access, then this bug is likely your blocker.

While waiting for Google to resolve this issue, there are several practical workarounds you can use to keep your projects moving forward. The first and simplest approach is to use text generation models for any tasks that do not strictly require image output. Gemini 2.5 Pro and Flash models have robust multimodal capabilities for image understanding and analysis, and these text endpoints are unaffected by the bug. If your application specifically requires image generation, consider temporarily using a different image generation API such as OpenAI's DALL-E 3 or Stability AI's endpoints, which can stand in for the image generation portion of your workflow once you adapt your prompts and integration code to the other provider's format.
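One way to wire in a temporary replacement is a small fallback chain that tries Gemini first and moves on only when the zero-quota error appears. The sketch below is provider-agnostic and entirely hypothetical: `ZeroQuotaError`, `generate_image`, and the stub providers are illustrative names, and each real provider call would be your own wrapper around that provider's client.

```python
class ZeroQuotaError(Exception):
    """Raised by a provider wrapper when it sees the free_tier_requests limit 0 response."""


def generate_image(prompt, providers):
    """Try each (name, callable) pair in order and return (name, result) from the first success."""
    skipped = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ZeroQuotaError:
            skipped.append(name)  # quota blocked: move on to the next provider
    raise RuntimeError("all providers are quota-blocked: " + ", ".join(skipped))


# Stub providers standing in for real API wrappers.
def blocked_gemini(prompt):
    raise ZeroQuotaError("free_tier_requests limit: 0")


def stub_fallback(prompt):
    return f"image for: {prompt}"


print(generate_image("a red bicycle", [("gemini", blocked_gemini), ("fallback", stub_fallback)]))
```

Because only `ZeroQuotaError` triggers the fallback, other failures (bad prompts, network errors) still surface normally, and switching back after the fix is a matter of leaving the provider order unchanged.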

Another workaround that some developers have found success with is using the Vertex AI endpoint instead of the AI Studio endpoint. Vertex AI uses a different quota enforcement system, and some users report that image generation works correctly through Vertex AI even when the AI Studio endpoint is blocked. The trade-off is that Vertex AI requires additional setup including service account authentication and a slightly different API format, but it can be a viable interim solution for production applications that cannot wait for the bug fix.

For developers who need a quick and reliable solution without the complexity of migrating to Vertex AI, third-party API proxy services can provide immediate access to Gemini's image generation capabilities. Services like laozhang.ai aggregate API access and route requests through properly configured accounts, bypassing individual project quota issues. This can be especially useful as a temporary bridge while Google resolves the underlying platform bug.

Regardless of which workaround you choose, it is strongly recommended to report your experience on the Google AI Developers Forum and upvote existing bug report threads. The visibility of these reports directly influences Google's prioritization of the fix. Include your project ID (not your API key), the specific models affected, and the exact error response in your report to help Google's engineering team diagnose and resolve the issue faster. For detailed information about how Nano Banana Pro rate limits are structured, see our dedicated guide.

It is also worth monitoring Google's official issue tracker and the AI Developers Forum for updates. Previous billing-related bugs in the Gemini API have typically been resolved within one to three weeks of widespread reporting, though Google rarely provides advance notice of when fixes will be deployed. Setting up a Google Alert for "gemini api free_tier_requests bug" can help you catch the resolution announcement when it comes, so you can immediately switch back from your workaround to the direct Gemini API endpoint.

How Gemini API Tiers and Quotas Work

[Figure: Gemini API tier comparison showing Free, Tier 1, and Tier 2 rate limits and requirements]

Understanding how Google structures its API tier system helps you both prevent quota issues and make informed decisions about your usage. The Gemini API uses a tiered system where higher tiers unlock progressively larger rate limits, and the upgrade path between tiers is based on cumulative spending and account age rather than a subscription model.

The free tier is where every new Gemini API user starts, and it provides limited but functional access for experimentation and small projects. As of February 2026, the free tier allows 5 RPM for Gemini 2.5 Pro, 10 RPM for Gemini 2.5 Flash, and 15 RPM for Gemini 2.5 Flash-Lite (ai.google.dev/gemini-api/docs/rate-limits, February 2026). Daily request caps are also enforced: 100 requests per day for Pro and 250 for Flash. Critically, the free tier provides zero images per minute (0 IPM) for image generation — this means that any image generation on the free tier is simply not available, not merely limited. It is worth noting that Google significantly cut free tier limits in December 2025, reducing quotas by 50-80% from their previous levels, which caught many developers off guard. For a complete analysis of these changes, see our Gemini API free tier guide.

Tier 1 unlocks immediately when you enable Cloud Billing on your project, with no waiting period and no minimum spend requirement. This is one of the most important facts that many developers miss: Tier 1 access carries no fee of its own, though Google may require a one-time prepayment that serves as a usage credit. Tier 1 provides a dramatic increase in limits: 150-300 RPM for most models (roughly a 20-30x improvement over the free tier's 5-15 RPM), unlimited daily requests, and access to image generation endpoints. This is the tier that most individual developers and small teams should target. The Gemini API rate limits guide covers these numbers in greater detail.

Tier 2 and above are designed for production workloads and high-volume applications. Reaching Tier 2 requires $250 or more in cumulative API spending and at least 30 days on Tier 1. The upgrade happens automatically within about 10 minutes once the requirements are met, pushing RPM limits to 1,000 or higher for most models. Tier 3 follows a similar pattern with $1,000+ cumulative spend and 30+ days on the previous tier.
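The upgrade rules above reduce to two thresholds per tier. The following sketch encodes them as stated in this guide (February 2026); the names `UPGRADE_RULES` and `eligible_for_upgrade` are hypothetical, and the move from the free tier to Tier 1 is deliberately left out because it depends on billing linkage rather than spend.

```python
# target tier -> (minimum cumulative spend in USD, minimum days on the previous tier),
# as quoted in this guide for February 2026.
UPGRADE_RULES = {2: (250, 30), 3: (1000, 30)}


def eligible_for_upgrade(current_tier, cumulative_spend_usd, days_on_current_tier):
    """Return True when the stated requirements for the next tier are met."""
    rule = UPGRADE_RULES.get(current_tier + 1)
    if rule is None:
        return False  # no spend-based rule for this transition
    min_spend, min_days = rule
    return cumulative_spend_usd >= min_spend and days_on_current_tier >= min_days


print(eligible_for_upgrade(1, cumulative_spend_usd=260, days_on_current_tier=31))
print(eligible_for_upgrade(1, cumulative_spend_usd=400, days_on_current_tier=12))
```

Both conditions must hold at once, which is why a project that spends heavily in its first two weeks still waits out the 30-day period.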

A crucial architectural detail is that quotas are enforced per project, not per API key (ai.google.dev, February 2026). This means that creating multiple API keys under the same project does not multiply your quota — all keys share the same pool. If you need separate quota pools for different applications, you must create separate Google Cloud projects, each with their own billing linkage. Daily quotas reset at midnight Pacific Time, so if you are hitting daily limits, timing your requests around this reset can be a practical workaround.
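If you want to schedule work around the daily reset, the wait time can be computed with Python's standard `zoneinfo` module. This sketch assumes the reset is at midnight Pacific Time as stated above and that the system has IANA time zone data available; the function name `seconds_until_daily_reset` is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def seconds_until_daily_reset(now):
    """Seconds from `now` (a timezone-aware datetime) until the next midnight Pacific Time."""
    now_pacific = now.astimezone(PACIFIC)
    next_midnight = datetime.combine(
        now_pacific.date() + timedelta(days=1), time.min, tzinfo=PACIFIC
    )
    # Subtract in UTC so that daylight saving transitions are counted correctly.
    delta = next_midnight.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    return delta.total_seconds()


noon_pacific = datetime(2026, 2, 26, 20, 0, tzinfo=timezone.utc)  # 12:00 PST
print(seconds_until_daily_reset(noon_pacific))  # prints: 43200.0
```

Converting both timestamps to UTC before subtracting matters because Python compares two datetimes that share the same `tzinfo` by wall-clock time, which would be off by an hour on the days the clocks change.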

The following table summarizes the key differences across tiers that matter most for developers encountering the free_tier_requests issue:

Metric	Free Tier	Tier 1	Tier 2+
RPM (Pro models)	5	150	1,000+
RPM (Flash models)	10-15	300	2,000+
Daily request limit	100-250	Unlimited	Unlimited
Image generation	0 IPM	Available	Available
Upgrade requirement	—	Cloud Billing linked	$250 spend + 30 days
Upgrade speed	—	Instant	~10 minutes

Source: ai.google.dev/gemini-api/docs/rate-limits, February 2026

Understanding this table explains why the "free_tier_requests limit: 0" error is so disruptive. On the free tier, your image generation allocation is literally zero — not limited, but completely blocked. This is by design for free accounts, but when the billing system incorrectly assigns your paid project to the free tier bucket, it effectively removes all access to image generation and severely restricts text model access as well.

Alternative Solutions When Quota Is Blocked

When your Gemini API quota is stuck at zero and the standard fixes have not resolved the issue, you need practical alternatives to keep your projects running. The goal here is not to permanently replace Gemini, but to have reliable fallback options that minimize downtime while you wait for the quota issue to be resolved. Each alternative has different strengths in terms of cost, capability, and integration complexity.

The most straightforward alternative for image generation is to use a different model provider through their native API. OpenAI's DALL-E 3 provides high-quality image generation with a well-documented API, and Stability AI's SDXL endpoints offer competitive quality at lower per-image costs. Both services have their own billing systems independent of Google's, so they are unaffected by Gemini quota issues. The trade-off is that you will need to adapt your prompt engineering and API integration code to match the different provider's format, though the changes are typically minimal for basic image generation tasks.

For developers who want to keep using Gemini models while avoiding project-level quota issues, third-party API aggregation services provide an interesting middle ground. Platforms like laozhang.ai offer unified API endpoints that are compatible with the OpenAI API format and provide access to Gemini models through properly configured infrastructure. These services handle the billing and quota management on their end, which means you get reliable access without worrying about Google's tier system or the current platform bug. The per-request cost through aggregators is typically competitive with or even lower than official pricing, especially for image generation models, where laozhang.ai charges approximately $0.05 per image.

If your application relies heavily on Gemini's specific capabilities and you need to stay within Google's ecosystem, migrating to the Vertex AI endpoint is worth considering as a more permanent solution. Vertex AI uses enterprise-grade quota management that is separate from the AI Studio quota system, and it provides additional features like fine-tuning, model monitoring, and enterprise security controls. The setup is more involved — you will need to configure service accounts, enable the Vertex AI API, and modify your request format — but the reliability improvements may justify the effort for production applications.

Prevention — How to Set Up Billing Correctly From Day One

Prevention is far more effective than troubleshooting, and setting up your Gemini API billing correctly from the start eliminates the most common cause of the free_tier_requests error entirely. The following best practices are based on the patterns observed across hundreds of forum reports and the official Google documentation.

Set up billing before creating API keys. The single most effective prevention measure is to enable Cloud Billing on your Google Cloud project before you create any API keys. This ensures that every key generated under that project is automatically associated with paid tier quotas from the moment of creation. Developers who create keys during the free tier period and then add billing later sometimes encounter a timing issue where the old keys do not inherit the paid quota. While this should not happen according to Google's documentation, the forum reports suggest it does occur occasionally, and creating keys after billing avoids the issue entirely.

Use a dedicated GCP project for API access. Create a separate Google Cloud project specifically for your Gemini API usage rather than using a default or shared project. This gives you clean quota isolation, makes billing tracking easier, and simplifies troubleshooting if quota issues arise. Name the project descriptively (e.g., "gemini-api-production") so you can easily identify it in the Cloud Console. Since quotas are enforced per project, this also means you can create multiple projects if you need independent quota pools for different applications.

Verify your tier after billing activation. After linking billing, do not just assume the upgrade happened. Run a test API call and check the response for any free_tier_requests metrics. Better yet, check the API quotas page in Cloud Console to confirm your limits have been upgraded to Tier 1 values (150-300 RPM). If the upgrade has not propagated after five minutes, try creating a new API key — this sometimes triggers the quota refresh.

Monitor your quota usage proactively. Set up quota monitoring in the Google Cloud Console so you receive alerts before hitting limits rather than after. Navigate to APIs & Services, then Quotas, and configure alert thresholds at 80% and 95% of your quota limits. This gives you early warning of approaching limits and helps distinguish between legitimate high usage and unexpected quota restrictions. Proactive monitoring also provides evidence if you need to report a quota bug to Google — you can show that your usage pattern does not match the free tier limits being applied.
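The same 80% and 95% thresholds can also be checked on the application side if you count your own requests. This is a minimal sketch with hypothetical names (`crossed_thresholds`); it is not a substitute for the Cloud Console alerts, and it uses integer percentages to avoid floating point surprises at the exact boundaries.

```python
def crossed_thresholds(used, limit, thresholds=(80, 95)):
    """Return the percentage thresholds that current usage has reached or passed."""
    if limit <= 0:
        # A limit of 0 is the zero-quota case from this guide, not a usage level.
        raise ValueError("limit must be positive")
    return [pct for pct in thresholds if used * 100 >= pct * limit]


print(crossed_thresholds(used=120, limit=150))  # prints: [80]
print(crossed_thresholds(used=145, limit=150))  # prints: [80, 95]
```

Raising on a zero limit keeps the zero-quota condition visible instead of hiding it behind a division error or a misleading "100% used" reading.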

Keep billing and API key documentation. Maintain a simple record of which billing account is linked to which project, when billing was enabled, and when each API key was created. This documentation is invaluable when troubleshooting quota issues and speeds up any support interactions with Google. Include the project ID (found in Cloud Console settings) and the key creation dates in your notes.
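Such a record can be as simple as a small data class. The sketch below is one possible shape, with hypothetical field names; it stores a label for the key rather than the key itself, and its `predates_billing` method answers the question raised in Check 3 of the diagnosis.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ApiKeyRecord:
    project_id: str
    billing_account: str
    key_label: str  # a human-readable label, never the secret key value
    key_created_at: datetime
    billing_enabled_at: Optional[datetime] = None

    def predates_billing(self):
        """True when the key was created before billing was enabled, or billing never was."""
        if self.billing_enabled_at is None:
            return True
        return self.key_created_at < self.billing_enabled_at


record = ApiKeyRecord(
    project_id="gemini-api-production",
    billing_account="example-billing-account",
    key_label="backend-key-1",
    key_created_at=datetime(2026, 1, 5),
    billing_enabled_at=datetime(2026, 2, 1),
)
print(record.predates_billing())  # prints: True
```

A record that reports `True` here identifies a key worth replacing with a fresh one created after billing activation.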

Implement retry logic with exponential backoff. Even on paid tiers, transient 429 errors can occur during peak usage periods or brief service disruptions. Building retry logic into your API client code from the start ensures your application handles these gracefully without manual intervention. A standard exponential backoff strategy — starting at one second, doubling each retry up to a maximum of 32 seconds, with random jitter — covers both quota-related and server-load-related 429 responses. Most popular API client libraries include built-in retry support; for example, the Python google-generativeai library handles retries automatically when configured with appropriate settings.
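The backoff schedule described above (start at one second, double each time, cap at 32 seconds, add random jitter) looks roughly like this in plain Python. It is a generic sketch, not the retry mechanism of any particular client library: `RetryableError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the jitter of up to one second is an assumption.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for a transient 429; raise it from your own client wrapper."""


def backoff_delays(max_retries, base=1.0, cap=32.0, rng=random.random):
    """Yield delays of base * 2**attempt, capped at `cap`, plus up to one second of jitter."""
    for attempt in range(max_retries):
        yield min(cap, base * 2 ** attempt) + rng()


def call_with_backoff(fn, max_retries=6, sleep=time.sleep, rng=random.random):
    """Call fn(), retrying after each RetryableError until the retries are used up."""
    for delay in backoff_delays(max_retries, rng=rng):
        try:
            return fn()
        except RetryableError:
            sleep(delay)
    return fn()  # final attempt: let any error propagate to the caller


# Delays without jitter, to show the schedule itself.
print(list(backoff_delays(7, rng=lambda: 0.0)))  # prints: [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 32.0]
```

The `sleep` and `rng` parameters exist so the schedule can be tested without waiting; in production the defaults apply. Only raise `RetryableError` for transient failures, since a zero-quota response will not succeed no matter how long you wait.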

Set up billing alerts to catch issues early. Configure budget alerts in Google Cloud Console under Billing > Budgets & alerts. Set alert thresholds at 50%, 80%, and 100% of your expected monthly spend. These alerts serve a dual purpose: they notify you if usage spikes unexpectedly (indicating possible key compromise or a runaway process), and they confirm that billing is actively tracking your usage — if you are on the paid tier but never receive a billing alert, that could indicate a billing linkage problem. Additionally, if your spending drops to zero unexpectedly when your application is running, that could signal that your project has been reverted to the free tier.

FAQ

Does creating a new API key after enabling billing fix the quota issue?

In many cases, yes. If your original API key was created before Cloud Billing was linked to the project, creating a new key after billing activation often resolves the free_tier_requests error. The new key is generated within the context of a billing-enabled project and should automatically receive Tier 1 quota limits. However, if you are hitting the February 2026 Google bug on image generation models specifically, a new key will not help because the issue is at the platform level rather than the key level.

Why do text models work but image models return the free_tier_requests error?

This is the characteristic symptom of the February 2026 Google platform bug. The bug appears to affect the quota assignment specifically for image generation model endpoints (like gemini-2.5-flash-image and gemini-3-pro-image-preview) while leaving text generation model quotas correctly configured. Google has acknowledged this issue on their developer forum but has not yet provided a timeline for the fix. In the meantime, use the workarounds described in this guide, including Vertex AI migration and third-party API proxies.

How long does the Tier 1 upgrade take to propagate?

The upgrade from free tier to Tier 1 should be instant once Cloud Billing is properly linked to your project (ai.google.dev, February 2026). If you do not see the upgrade reflected within five minutes, try creating a new API key under the billing-enabled project. For the Tier 1 to Tier 2 upgrade, which requires $250 or more in cumulative spending and at least 30 days on Tier 1, the propagation typically takes about 10 minutes once the requirements are met.

Do multiple API keys share the same quota?

Yes. Quota enforcement in the Gemini API is per-project, not per-key (ai.google.dev, February 2026). All API keys created under the same Google Cloud project share the same quota pool. Creating additional keys does not increase your available quota. If you need separate quota pools, you must create separate Google Cloud projects, each with their own billing account linkage. This is a common source of confusion for developers who assume that generating a fresh API key will give them a fresh quota allocation. It will not — the key is simply a credential, while the quota is attached to the underlying project resource.

Can a 429 error mean server overload rather than user quota exhaustion?

Yes, a 429 status code can sometimes indicate that Google's servers are under heavy load, independent of your individual quota. In this case, the error is transient and retrying after a brief delay (using exponential backoff) often succeeds. You can distinguish between quota exhaustion and server overload by examining the error details: quota exhaustion will include the quotaMetric field with specific quota information, while server overload typically uses a different error structure. If you see intermittent 429 errors that resolve within seconds, server overload is the more likely cause. Implementing exponential backoff with jitter in your API client code is considered best practice regardless of the error type, as it gracefully handles both scenarios without requiring manual intervention.
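That distinction can drive the retry decision in code. The sketch below assumes the error body shape shown earlier in this guide and accepts both key spellings that appear in it; the function name `should_retry` is hypothetical.

```python
def should_retry(status_code, body):
    """Retry transient 429s, but not a 429 that reports a zero free tier quota."""
    if status_code != 429:
        return False
    for detail in body.get("error", {}).get("details", []):
        metadata = detail.get("metadata", {})
        metric = metadata.get("quota_metric", metadata.get("quotaMetric", ""))
        limit = metadata.get("quota_limit_value", metadata.get("quotaLimit"))
        if "free_tier_requests" in metric and str(limit) == "0":
            return False  # configuration or platform problem: waiting will not help
    return True


overload = {"error": {"code": 429, "status": "RESOURCE_EXHAUSTED"}}
zero_quota = {"error": {"code": 429, "details": [{"metadata": {
    "quota_metric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
    "quota_limit_value": "0",
}}]}}
print(should_retry(429, overload), should_retry(429, zero_quota))  # prints: True False
```

Skipping retries in the zero-quota case saves both time and log noise, because that error only clears once billing is fixed or the platform issue is resolved.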

What Does "free_tier_requests limit: 0" Actually Mean?

When you receive a 429 RESOURCE_EXHAUSTED error from the Gemini API, the response body contains detailed quota information that tells you what went wrong. The critical fields are quotaMetric, which identifies the quota bucket your request was counted against, and quotaValue and quotaLimit, which show your current usage and maximum allowance respectively. When you see quotaMetric: "generativelanguage.googleapis.com/generate_content_free_tier_requests" with both quotaValue and quotaLimit set to "0", the API is treating your project as a free tier project with no quota allocated, regardless of what your billing dashboard says.

There is an important distinction between seeing quotaLimit: "0" and simply hitting your rate limit. A quotaLimit of zero means the system has assigned you literally no quota at all — it is not that you have used up your allowance, but rather that you were never given one. This is different from a normal rate limit error where quotaLimit might show "5" (the free tier RPM for Gemini 2.5 Pro) and quotaValue shows "5" (all used up). The zero-quota scenario specifically indicates a billing or platform configuration issue, not organic usage exhaustion.

To confirm which situation you are in, examine the full error response body carefully, in particular the quotaMetric, quotaValue, and quotaLimit fields reported for the zero-quota error.

The key detail is the metric name containing free_tier_requests: it reveals that Google's backend has placed your project in the free tier quota bucket, regardless of your actual billing status. If you were on the paid tier, the metric would reference a different quota bucket (typically generate_content_requests, without the free_tier segment). This distinction is subtle but critical for accurate diagnosis.
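The distinction between a zero-quota error and an ordinary rate limit can be checked in code. The following is a minimal sketch, not Google's documented schema: the nesting under error.details[].violations[], the sample payload, and the function name classify_429 are assumptions made for illustration. Only the field names quotaMetric, quotaValue, and quotaLimit and the free_tier_requests metric come from this guide.

```python
from typing import Any

FREE_TIER_MARKER = "free_tier_requests"


def classify_429(error_body: dict[str, Any]) -> str:
    """Classify a parsed 429 error body using the quota fields named in this guide.

    Assumes (hypothetically) that violations live under error.details[].violations[].
    """
    details = error_body.get("error", {}).get("details", [])
    violations = [v for d in details for v in d.get("violations", [])]
    if not violations:
        # No quota information at all: more consistent with transient overload.
        return "no_quota_details"

    free_tier = [v for v in violations if FREE_TIER_MARKER in v.get("quotaMetric", "")]
    if not free_tier:
        # Quota details are present but reference a non-free-tier bucket.
        return "paid_tier_rate_limit"

    for v in free_tier:
        # Prefer quotaLimit; fall back to quotaValue when the limit is absent.
        limit = str(v.get("quotaLimit", v.get("quotaValue", "")))
        if limit == "0":
            return "zero_quota_free_tier"
    return "free_tier_rate_limit"


# Illustrative payload only; the real response may carry more fields.
sample_zero_quota = {
    "error": {
        "code": 429,
        "status": "RESOURCE_EXHAUSTED",
        "details": [
            {
                "violations": [
                    {
                        "quotaMetric": "generativelanguage.googleapis.com/"
                        "generate_content_free_tier_requests",
                        "quotaValue": "0",
                        "quotaLimit": "0",
                    }
                ]
            }
        ],
    }
}
```

Calling classify_429(sample_zero_quota) returns "zero_quota_free_tier", which points to a billing or platform configuration issue rather than organic usage exhaustion.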

Quick Diagnosis — Billing Issue or Google Bug?

Check 1: Verify Cloud Billing is linked. Navigate to console.cloud.google.com/billing and check whether a billing account is actively linked to the project that contains your API key. This is not the same as having a Google One or Gemini Pro subscription — API billing requires a Cloud Billing account specifically linked to your GCP project. If no billing account is linked, you have found your problem and can proceed directly to Fix #1 in the next section. A surprising number of developers believe they are on the paid tier because they signed up for Google AI Studio, but AI Studio's free access does not automatically enable paid API quotas.

Check 2: Test different model types. If billing is linked, run a simple test. Send a basic text generation request to a text model like gemini-2.5-flash, and separately send a request to an image generation model like gemini-2.5-flash-image. If text models work fine but image models fail with the free_tier_requests error, you are almost certainly hitting the February 2026 Google bug. This bug specifically affects image generation endpoints while leaving text generation endpoints unaffected on the same account, creating a frustrating split behavior that confuses developers into thinking their billing is misconfigured.

Check 3: API key creation timing. If all models fail (both text and image), check when your API key was created relative to when you enabled billing. API keys created through Google AI Studio before Cloud Billing was activated on the project may not automatically inherit the paid tier quota. The fix is simple: create a new API key after billing is enabled. This is a less common scenario, accounting for roughly 15% of cases, but it trips up developers who set up their keys during the initial free tier exploration and then added billing later.
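The three checks above form a small decision tree. As a rough sketch, assuming you have already recorded the outcome of each check yourself, it could be written as follows. The function name diagnose and the returned labels are hypothetical, and "ok" here means only that the request did not fail with the free_tier_requests error.

```python
def diagnose(billing_linked: bool, text_ok: bool, image_ok: bool) -> str:
    """Map the results of Checks 1-3 to the most likely cause.

    billing_linked: Check 1, a Cloud Billing account is linked to the project.
    text_ok / image_ok: Check 2, whether a text model and an image model
    responded without the free_tier_requests error.
    """
    if not billing_linked:
        return "link_billing"        # Check 1 failed: go to Fix #1
    if text_ok and image_ok:
        return "no_issue_detected"   # Both model types work on this project
    if text_ok:
        return "platform_bug"        # Text works, image fails: see Fix #2
    if not image_ok:
        return "recreate_api_key"    # Everything fails: Check 3, key predates billing
    return "inconclusive"            # Text fails but image works: not covered here


if __name__ == "__main__":
    print(diagnose(billing_linked=True, text_ok=True, image_ok=False))
```

The order of the branches mirrors the order of the checks: billing linkage is ruled out first because it is the most common cause and the quickest to fix.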

For a quick sanity check, you can also send a single request with curl, using your actual API key and the model you want to test, and inspect the raw response including any quota metadata.
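The same sanity check can be sketched in Python using only the standard library. This is an illustration under stated assumptions: it assumes the public REST endpoint at generativelanguage.googleapis.com/v1beta, the x-goog-api-key header, and a plain text prompt body. The helper names build_request and probe are hypothetical, and image models may need additional request settings not shown here.

```python
import json
import urllib.error
import urllib.request

# Assumption: the public Gemini REST base URL.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_request(api_key: str, model: str, prompt: str = "Say hello") -> urllib.request.Request:
    """Build (but do not send) a generateContent request for the given model."""
    url = f"{BASE_URL}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def probe(api_key: str, model: str) -> tuple[int, dict]:
    """Send one request and return (HTTP status, parsed body)."""
    try:
        with urllib.request.urlopen(build_request(api_key, model), timeout=30) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # Error responses (including 429) still carry a body worth inspecting.
        raw = err.read()
        try:
            return err.code, json.loads(raw)
        except ValueError:
            return err.code, {"raw": raw.decode("utf-8", "replace")}
```

Running probe("YOUR_API_KEY", "gemini-2.5-flash") and then probe("YOUR_API_KEY", "gemini-2.5-flash-image") gives you the two results needed for Check 2. A 429 status on only the second call matches the split behavior described there.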

Fix #1 — Link Cloud Billing Properly

Step 1: Open the Google Cloud Console. Go to console.cloud.google.com and make sure you are signed in with the same Google account you use for AI Studio. This seems obvious, but developers who use multiple Google accounts sometimes configure billing on the wrong account. Check the account avatar in the top right corner to confirm you are on the correct account.

Step 2: Navigate to Billing. Click on the hamburger menu (three horizontal lines) in the top left, then find and click "Billing" in the navigation panel. If you see a prompt to "Link a billing account" or "Set up billing," then you have confirmed that billing was never properly linked — this is your root cause. If you see an existing billing account, proceed to verify it is linked to the correct project.

Step 3: Verify project linkage. Under your billing account, click "Account management" and look at the "Projects linked to this billing account" section. Find the project that contains your API keys. If your project is not listed here, click "Link a project" and select it. This is the critical step that many developers miss: having a billing account is not enough — it must be explicitly linked to the project where your Gemini API keys live.

Step 4: Handle the prepayment requirement. Google may require a one-time prepayment to activate paid API access. This is not a fee but a credit that gets applied to your future usage. If prompted, complete this prepayment step. Some developers report being asked for amounts ranging from $5 to $50, depending on their region and account history. This prepayment is credited to your account balance and consumed as you use the API (ai.google.dev/billing, February 2026).

Step 5: Create a new API key (recommended). After linking billing, go to aistudio.google.com and create a new API key under the now-billing-enabled project. While existing keys should inherit the paid quota, creating a new key eliminates any caching or propagation delays. Use this new key in your application and test immediately.

Step 6: Verify the upgrade. Make a test API call and inspect the response. A request served on the paid tier should not produce an error that references a free_tier_requests quota metric. Alternatively, check your quota page at console.cloud.google.com/apis/api/generativelanguage.googleapis.com/quotas to see your updated limits. You should see RPM values of 150-300 instead of the free tier's 5-15 RPM. The upgrade to Tier 1 happens instantly once billing is properly linked; there is no waiting period for the first tier.
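If you prefer to confirm the project linkage from a script rather than the console, the following sketch shows one way. It assumes the gcloud CLI is installed and authenticated, and that the JSON output of gcloud billing projects describe includes the billingEnabled and billingAccountName fields. The function names are hypothetical.

```python
import json
import subprocess


def is_billing_linked(info: dict) -> bool:
    """True when the project reports billing enabled and a billing account attached."""
    return bool(info.get("billingEnabled")) and bool(info.get("billingAccountName"))


def fetch_billing_info(project_id: str) -> dict:
    """Ask gcloud for the project's billing info.

    Requires gcloud to be installed and logged in with an account that can
    view billing for the project.
    """
    result = subprocess.run(
        ["gcloud", "billing", "projects", "describe", project_id, "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

A call such as is_billing_linked(fetch_billing_info("your-project-id")) returning False corresponds to the situation in Step 3, where a billing account exists but is not linked to the project that holds your API keys.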

Fix #2 — When It Is a Google Platform Bug

Since approximately February 10, 2026, a growing number of developers have reported that their paid Tier 1 accounts receive free_tier_requests limit: 0 errors specifically on image generation models, while text generation models on the same account work perfectly with paid tier quotas. This has been confirmed across multiple threads on the Google AI Developers Forum, with posts from developers who have verified their billing is correctly configured and have tried all standard troubleshooting steps without success.

How Gemini API Tiers and Quotas Work

Tier 2 and above are designed for production workloads and high-volume applications. Reaching Tier 2 requires $250 or more in cumulative API spending and at least 30 days on Tier 1. The upgrade happens automatically within about 10 minutes once the requirements are met, pushing RPM limits to 1,000 or higher for most models. Tier 3 follows a similar pattern with $1,000+ cumulative spend and 30+ days on the previous tier.
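The tier rules described in this guide can be expressed as a small lookup. This is a sketch for reasoning about which tier to expect, not a reflection of how Google computes it. It reads the Tier 3 thresholds as $1,000 or more in cumulative spend and 30 or more days on the previous tier. The names TIER_RULES and expected_tier are hypothetical.

```python
# Target tier -> (minimum cumulative spend in USD, minimum days on the previous tier).
# Values are taken from this guide.
TIER_RULES = {
    2: (250.0, 30),
    3: (1000.0, 30),
}


def expected_tier(billing_linked: bool, current_tier: int,
                  spend_usd: float, days_on_tier: int) -> int:
    """Return the tier the account should be on now (0 = free tier).

    At most one step up from current_tier is considered per call.
    """
    if not billing_linked:
        return 0  # Without linked Cloud Billing the project stays on the free tier
    if current_tier == 0:
        return 1  # Tier 1 is granted instantly once billing is linked
    target = current_tier + 1
    rule = TIER_RULES.get(target)
    if rule is not None and spend_usd >= rule[0] and days_on_tier >= rule[1]:
        return target
    return current_tier


if __name__ == "__main__":
    print(expected_tier(True, 1, spend_usd=300.0, days_on_tier=10))
```

In the usage line above, the account has spent enough for Tier 2 but has been on Tier 1 for only 10 days, so the function keeps it at Tier 1. If the API reports free tier metrics for an account this function places at Tier 1 or above, that mismatch is the symptom this guide addresses.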

The following table summarizes the key differences across tiers that matter most for developers encountering the free_tier_requests issue:

Source: ai.google.dev/gemini-api/docs/rate-limits, February 2026

Alternative Solutions When Quota Is Blocked

Prevention — How to Set Up Billing Correctly From Day One

Set up billing before creating API keys. The single most effective prevention measure is to enable Cloud Billing on your Google Cloud project before you create any API keys. This ensures that every key generated under that project is automatically associated with paid tier quotas from the moment of creation. Developers who create keys during the free tier period and then add billing later sometimes encounter a timing issue where the old keys do not inherit the paid quota. While this should not happen according to Google's documentation, the forum reports suggest it does occur occasionally, and creating keys after billing avoids the issue entirely.

Use a dedicated GCP project for API access. Create a separate Google Cloud project specifically for your Gemini API usage rather than using a default or shared project. This gives you clean quota isolation, makes billing tracking easier, and simplifies troubleshooting if quota issues arise. Name the project descriptively (e.g., "gemini-api-production") so you can easily identify it in the Cloud Console. Since quotas are enforced per project, this also means you can create multiple projects if you need independent quota pools for different applications.

Verify your tier after billing activation. After linking billing, do not just assume the upgrade happened. Run a test API call and check the response for any free_tier_requests metrics. Better yet, check the API quotas page in Cloud Console to confirm your limits have been upgraded to Tier 1 values (150-300 RPM). If the upgrade has not propagated after five minutes, try creating a new API key — this sometimes triggers the quota refresh.

Monitor your quota usage proactively. Set up quota monitoring in the Google Cloud Console so you receive alerts before hitting limits rather than after. Navigate to APIs & Services, then Quotas, and configure alert thresholds at 80% and 95% of your quota limits. This gives you early warning of approaching limits and helps distinguish between legitimate high usage and unexpected quota restrictions. Proactive monitoring also provides evidence if you need to report a quota bug to Google — you can show that your usage pattern does not match the free tier limits being applied.

Keep billing and API key documentation. Maintain a simple record of which billing account is linked to which project, when billing was enabled, and when each API key was created. This documentation is invaluable when troubleshooting quota issues and speeds up any support interactions with Google. Include the project ID (found in Cloud Console settings) and the key creation dates in your notes.

Implement retry logic with exponential backoff. Even on paid tiers, transient 429 errors can occur during peak usage periods or brief service disruptions. Building retry logic into your API client code from the start ensures your application handles these gracefully without manual intervention. A standard exponential backoff strategy — starting at one second, doubling each retry up to a maximum of 32 seconds, with random jitter — covers both quota-related and server-load-related 429 responses. Most popular API client libraries include built-in retry support; for example, the Python google-generativeai library handles retries automatically when configured with appropriate settings.

Set up billing alerts to catch issues early. Configure budget alerts in Google Cloud Console under Billing, then Budgets & alerts. Set alert thresholds at 50%, 80%, and 100% of your expected monthly spend. These alerts serve a dual purpose: they notify you if usage spikes unexpectedly (indicating possible key compromise or a runaway process), and they confirm that billing is actively tracking your usage. If you are on the paid tier but never receive a billing alert, that could indicate a billing linkage problem. Additionally, if your spending drops to zero unexpectedly while your application is running, that could signal that your project has been reverted to the free tier.
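The retry strategy recommended in this section (start at one second, double on each retry up to 32 seconds, add random jitter) can be sketched as follows. This is a generic illustration rather than the behavior of any particular client library. RateLimited is a hypothetical exception that your own request code would raise on an HTTP 429, and the "equal jitter" variant used here is one of several common choices.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception raised by your request code on an HTTP 429."""


def backoff_delays(base: float = 1.0, cap: float = 32.0,
                   max_retries: int = 6) -> Iterator[float]:
    """Yield base, 2*base, 4*base, ... capped at `cap` (before jitter)."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def call_with_backoff(fn: Callable[[], T], *, max_retries: int = 6,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Call fn, retrying on RateLimited with capped exponential backoff.

    Makes up to max_retries + 1 attempts in total. The last attempt lets
    RateLimited propagate to the caller.
    """
    for delay in backoff_delays(max_retries=max_retries):
        try:
            return fn()
        except RateLimited:
            # Equal jitter: wait between delay/2 and delay, never above the cap.
            sleep(delay / 2 + rng() * delay / 2)
    return fn()
```

Passing sleep and rng as parameters keeps the function easy to test without real waiting. In production you would leave both at their defaults and wrap your request function so that it raises RateLimited whenever the API returns a 429.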

FAQ

Does creating a new API key after enabling billing fix the quota issue?

Why do text models work but image models return the free_tier_requests error?

How long does the Tier 1 upgrade take to propagate?

Do multiple API keys share the same quota?

#Gemini API #Rate Limits #API Quota #429 Error #Google Cloud Billing #Troubleshooting
