Users say Gemini starts forgetting long before it's supposed to


Joe Maring / Android Authority

TL;DR

Google says Gemini on Pro and Ultra plans offers a context window of up to one million tokens.
However, some users online have complained that Gemini chats don’t support this context window.
We’ve asked Google whether it plans to offer more prominent info about the chat context window.

Google has several paid AI plans, offering increased Gemini usage limits, access to more advanced models, and cloud storage. The Pro and Ultra plans also offer an expanded token context window (or a longer “memory”). However, a couple of users have highlighted a huge apparent gap between Google’s claims and Gemini’s actual context window.

Google says in its promotional material that Gemini on Pro and Ultra plans offers an expanded context window of up to one million tokens. The firm says this means you can process up to 1,500 pages of text or 30,000 lines of code. See the screenshots below.


Now, X user @Soso_fun_yt claims that this context window is misleading for chat users:

While the backend can successfully ingest a massive static file initially on the first prompt, the active conversational memory (the dynamic context window / KV cache for the chat) appears to be severely bottlenecked, dropping significantly to a 16k~ limit. (Or 25-30 messages in average)

As a result, the model quickly suffers from amnesia within the exact same chat session, completely forgetting earlier instructions, code blocks, or constraints.

In other words, the claim is that Gemini’s servers can indeed handle up to one million tokens of context, but the chatbot can’t work through them in a single session without starting to forget earlier parts of the conversation. Redditors raised the same issue last month, although some users noted that the AI Studio platform offered the correct context window.
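To make the reported behavior concrete, here is a minimal sketch of how a chat with a rolling token budget would lose early messages. It is an illustration only, not Google's implementation: the `visible_history` helper, the four-characters-per-token estimate, and the 600-token message size are all assumptions chosen for the example, while the 16,000-token budget mirrors the figure the X user reported.

```python
from collections import deque


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def visible_history(messages: list[str], budget_tokens: int) -> list[str]:
    """Return the most recent messages that fit inside budget_tokens.

    Hypothetical helper: walks backward from the newest message and
    stops as soon as the next older message would exceed the budget.
    """
    kept: deque[str] = deque()
    used = 0
    for msg in reversed(messages):
        cost = approx_tokens(msg)
        if used + cost > budget_tokens:
            break
        kept.appendleft(msg)
        used += cost
    return list(kept)


# 40 chat messages of about 600 tokens (2,400 characters) each.
messages = [f"message {i}: " + "x" * 2400 for i in range(40)]

small = visible_history(messages, budget_tokens=16_000)
large = visible_history(messages, budget_tokens=1_000_000)

print(len(small), small[0][:10])  # messages still visible with a 16k budget
print(len(large))                 # messages still visible with a 1M budget
```

Under these assumptions, the 16,000-token budget keeps only the 26 most recent messages, so anything said in the first 14 (an early instruction or code block, for example) is no longer visible to the model. With a one-million-token budget, all 40 messages fit.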

So is Google misleading users with this claim? It certainly seems like the company could be far more transparent about the difference between the model’s overall context window and the chat’s context window. It’s a bit like your ISP offering a 1Gbps line on its website, but not prominently disclosing the 50Mbps upload speeds.

Google does offer details about input and output token limits on a developer support website. The site reports that many models have an output limit of roughly 65,000 tokens. However, it’s unclear whether this figure only applies to developers or if it also applies to the Gemini chat.

We’ve asked Google about the discrepancy between the advertised context window and the chat’s context window. We’ve also asked the company whether it plans to offer more prominent information about the chat context window. We’ll update our article as soon as the company has answers for us.
