Brady Snyder / Android Authority
TL;DR
- Google is fixing major quota complaints in Gemini by addressing bugs and making usage limits more predictable.
- The company is also changing how heavy usage is counted, while failed requests and Flash-Lite prompts won’t count towards limits at all.
- To improve transparency, Google is adding better breakdowns for deep research usage and making model selection persistent across sessions.
Now, Josh Woodward, Vice President at Google, has responded more directly in a post on X, acknowledging that users were encountering limits sooner than they should. He said the company is now rolling out several fixes designed to make usage more predictable, reduce confusion, and ensure quotas feel more consistent across different types of tasks.

One of the biggest fixes involves a bug tied to Omni video generation. In some cases, users were finding that just one or two video prompts were eating up a large portion of their quota. For example, someone experimenting with short clips or testing different styles could suddenly see their allowance drop far more than expected after only a couple of attempts. Google says this issue has now been fixed, and it is also increasing allowances for heavier users. Ultra subscribers, for instance, are getting double the number of Omni video generations starting immediately.
Another area that caused complaints was complex 3.1 Pro prompts. These are long, detailed instructions, often accompanied by large file uploads or multi-step reasoning tasks, and they were consuming quota in a way that felt too aggressive. Google is now changing this by introducing per-prompt caps. Instead of one very heavy request potentially draining a large chunk of your usage, the system will limit how much a single prompt can consume. The idea is to prevent extreme outliers where one task wipes out too much of your monthly allowance.

There is also a change that users will likely appreciate in everyday use. Woodward noted that about 1 in 10 requests can fail due to system errors. Earlier, even failed attempts could still count against your quota, which understandably felt unfair. That is now being corrected. If a request fails, it will not be charged against your usage. So if Gemini glitches out while generating a response, that attempt no longer eats into your limit.

A notable update is that Flash-Lite prompts will no longer count against quota at all. This effectively turns Flash-Lite into a free layer for lighter tasks. It also subtly encourages users to rely on lighter models when they do not need full reasoning power, which should help stretch the limits of higher tiers further.
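
To make the three accounting changes concrete, here is a minimal sketch of how they could combine. It is not Google's implementation: the unit costs, the cap value, and every name in it (`PER_PROMPT_CAP`, `FREE_MODELS`, `Request`, `charged_units`) are assumptions made up for illustration, since Google has not published the actual numbers.

```python
from dataclasses import dataclass

# Hypothetical values: Google has not published real costs or caps.
PER_PROMPT_CAP = 5            # assumed maximum units one prompt may consume
FREE_MODELS = {"flash-lite"}  # assumed label for models that are not counted


@dataclass
class Request:
    model: str       # illustrative model label, e.g. "pro" or "flash-lite"
    raw_cost: int    # units the request would cost with no cap applied
    succeeded: bool  # False if the request failed with a system error


def charged_units(req: Request) -> int:
    """Return the quota units a request consumes under the described rules."""
    if not req.succeeded:
        return 0  # failed requests are no longer charged
    if req.model in FREE_MODELS:
        return 0  # Flash-Lite prompts do not count at all
    return min(req.raw_cost, PER_PROMPT_CAP)  # cap a single heavy prompt


def remaining_quota(quota: int, requests: list[Request]) -> int:
    """Subtract all charged units from the quota, never going below zero."""
    used = sum(charged_units(r) for r in requests)
    return max(quota - used, 0)


history = [
    Request("pro", raw_cost=12, succeeded=True),        # capped at 5
    Request("pro", raw_cost=3, succeeded=False),        # failed, charged 0
    Request("flash-lite", raw_cost=2, succeeded=True),  # free, charged 0
    Request("pro", raw_cost=2, succeeded=True),         # charged 2
]
print(remaining_quota(100, history))
```

In this sketch, a heavy prompt that would have cost 12 units is charged only 5, and the failed and Flash-Lite requests cost nothing, so only 7 units come off the hypothetical 100-unit allowance.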
Google is also working on more detailed breakdowns and notifications for Deep Research usage. These are the more compute-heavy tasks where Gemini processes large inputs or runs multi-step analysis. Many users currently have little visibility into why their quotas drop faster on some days than others. The goal is to make that much clearer, so users can actually see which types of tasks are expensive and which are not.

Finally, there is a useful improvement in how model selection works. Once you choose a specific model inside Gemini, the app will remember it across sessions. So if you prefer a particular writing or research setup, you won’t need to select it every time you open the app. The only exception is when you hit a usage cap, in which case the system may automatically switch to a lighter model to keep things running.
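
The selection behavior described above can be sketched in a few lines. This is an illustration only: the function `pick_model`, its parameters, and the model labels are hypothetical, and the article says only that the system "may" switch to a lighter model, so the unconditional fallback here is a simplifying assumption.

```python
from typing import Optional


def pick_model(
    saved_choice: Optional[str],
    cap_reached: bool,
    default: str = "pro",          # hypothetical default model label
    fallback: str = "flash-lite",  # hypothetical lighter model label
) -> str:
    """Choose the model for a new session.

    The saved choice persists across sessions; the only exception is a
    usage cap, in which case a lighter model is used instead.
    """
    if cap_reached:
        return fallback
    return saved_choice or default


print(pick_model("deep-research", cap_reached=False))
print(pick_model("deep-research", cap_reached=True))
print(pick_model(None, cap_reached=False))
```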
These changes definitely feel like Google trying to smooth out a system that had become inconsistent for many users. The limits are still there, but the company is clearly trying to make them feel more logical. Whether that fully fixes the frustration remains to be seen, but at least the direction now feels more user-friendly than opaque.