Google may have fixed the issue that was exhausting your Gemini usage limits
Source: Android Authority
TL;DR
- Google is fixing major quota complaints in Gemini by addressing bugs and making usage limits more predictable.
- The company is also changing how heavy usage is counted, while failed requests and Flash‑Lite prompts won’t count towards limits at all.
- To improve transparency, Google is adding better breakdowns for Deep Research usage and making model selection persistent across sessions.
Background
We recently reported that Google had quietly tightened parts of its AI Pro plan, and users quickly noticed their limits being hit much faster than expected—sometimes after just a few prompts. Google later increased quotas for Antigravity users to calm things down, but that only addressed part of the frustration.
Josh Woodward, Vice President at Google, responded directly in a post on X, acknowledging that users were encountering limits sooner than they should. He said the company is rolling out several fixes designed to make usage more predictable, reduce confusion, and ensure quotas feel more consistent across different types of tasks.
Major Fixes
Omni video generation bug
A bug tied to the Omni video generation model caused just one or two video prompts to consume a large portion of a user’s quota. Google has now fixed the issue and is increasing allowances for heavier users. Ultra subscribers, for example, are receiving double the number of Omni video generations, effective immediately.

Complex 3.1 Pro prompt caps
Complex 3.1 Pro prompts (long, detailed instructions often accompanied by large file uploads or multi‑step reasoning) were previously draining quotas aggressively. Google is introducing a cap on how much any one prompt can consume, so a single heavy request can no longer wipe out a large chunk of the monthly allowance.

Failed requests no longer count
About 1 in 10 requests can fail due to system errors. Previously, even failed attempts counted against the quota. Google is correcting this: failed requests will not be charged against usage.

Flash‑Lite prompts become free
Flash‑Lite prompts will no longer count against quota at all, effectively making Flash‑Lite a free layer for lighter tasks and encouraging users to rely on lighter models when full reasoning power isn’t needed.
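
To make the three accounting changes above concrete, here is a minimal Python sketch of how such rules could combine. It is purely illustrative: Google has not published its accounting logic, so every name and number here (`Request`, `charged_units`, `PER_PROMPT_CAP`, the model labels, the unit costs) is a hypothetical assumption. The sketch also applies the per-prompt cap to every paid model for simplicity, whereas the announcement only mentions it for complex 3.1 Pro prompts.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; not Google's real figures.
FREE_MODELS = {"flash-lite"}   # assumed label for models that never count against quota
PER_PROMPT_CAP = 10            # assumed maximum units a single prompt may consume


@dataclass
class Request:
    model: str       # hypothetical model label
    raw_cost: int    # units the prompt would cost before any cap is applied
    succeeded: bool  # False if the request failed with a system error


def charged_units(req: Request) -> int:
    """Return how many quota units this request is charged under the assumed rules."""
    if not req.succeeded:
        return 0  # failed requests are not charged
    if req.model in FREE_MODELS:
        return 0  # Flash-Lite prompts are free
    return min(req.raw_cost, PER_PROMPT_CAP)  # a heavy prompt is capped


def remaining_quota(quota: int, requests: list[Request]) -> int:
    """Subtract all charged units from the quota, never going below zero."""
    used = sum(charged_units(r) for r in requests)
    return max(quota - used, 0)
```

Under these assumed rules, a heavy prompt with a raw cost of 40 units is charged only the 10-unit cap, while a failed request or a Flash‑Lite prompt is charged nothing.
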
Deep Research usage breakdowns
Google is adding more detailed breakdowns and notifications for Deep Research usage, which covers the compute‑heavy tasks where Gemini processes large inputs or runs multi‑step analysis. Users will now see clearer information about which task types are expensive and which are not.

Persistent model selection
The app will now remember the selected model across sessions, so users won’t need to re‑choose their preferred writing or research setup each time they open Gemini. The one exception is when a usage cap is hit, in which case the system may automatically switch to a lighter model to keep things running.
Conclusion
These updates show Google’s effort to smooth out a system that had become inconsistent for many users. While the limits remain, the changes aim to make them feel more logical and transparent. Whether this fully resolves user frustration remains to be seen, but the direction is decidedly more user‑friendly.