Google may have fixed the issue that was exhausting your Gemini usage limits

Published: May 29, 2026 at 04:43 AM EDT
3 min read

Source: Android Authority

TL;DR

  • Google is fixing major quota complaints in Gemini by addressing bugs and making usage limits more predictable.
  • The company is also changing how heavy usage is counted, while failed requests and Flash‑Lite prompts won’t count towards limits at all.
  • To improve transparency, Google is adding better breakdowns for Deep Research usage and making model selection persistent across sessions.

Background

We recently reported that Google had quietly tightened parts of its AI Pro plan, and users quickly noticed their limits being hit much faster than expected—sometimes after just a few prompts. Google later increased quotas for Antigravity users to calm things down, but that only addressed part of the frustration.

Josh Woodward, Vice President at Google, responded directly in a post on X, acknowledging that users were encountering limits sooner than they should. He said the company is rolling out several fixes designed to make usage more predictable, reduce confusion, and ensure quotas feel more consistent across different types of tasks.

Major Fixes

Omni video generation bug

A bug tied to the Omni video generation model caused just one or two video prompts to consume a large portion of a user’s quota. Google has now fixed the issue and is increasing allowances for heavier users. Ultra subscribers, for example, are receiving double the number of Omni video generations, effective immediately.

Josh Woodward on X

Complex 3.1 Pro prompt caps

Complex 3.1 Pro prompts (long, detailed instructions, often accompanied by large file uploads or multi‑step reasoning) were previously draining quotas aggressively. Google is introducing per‑prompt caps so that a single heavy request can no longer wipe out a large chunk of the monthly allowance.

Failed requests no longer count

About 1 in 10 requests can fail due to system errors. Previously, even these failed attempts counted against the quota. Google is correcting this, and failed requests will no longer be charged against usage.

Flash‑Lite prompts become free

Flash‑Lite prompts will no longer count against quota at all, effectively making Flash‑Lite a free layer for lighter tasks and encouraging users to rely on lighter models when full reasoning power isn’t needed.

Deep Research usage breakdowns

Google is adding more detailed breakdowns and notifications for Deep Research usage—the compute‑heavy tasks where Gemini processes large inputs or runs multi‑step analysis. Users will now see clearer information about which task types are expensive and which are not.

Persistent model selection

The app will now remember the selected model across sessions, so users won’t need to re‑choose their preferred writing or research setup each time they open Gemini. The one exception is when a usage cap is hit, in which case the system may automatically switch to a lighter model to keep things running.

Conclusion

These updates show Google’s effort to smooth out a system that had become inconsistent for many users. While the limits remain, the changes aim to make them feel more logical and transparent. Whether this fully resolves user frustration remains to be seen, but the direction is decidedly more user‑friendly.
