5 biggest upgrades vs. Opus 4.7

Mitja Rutnik / Android Authority

Claude Opus 4.8 arrived about a week ago, promising quite a few upgrades over its predecessor. Of course, we heard the same things about Opus 4.7 when it arrived, and yet the reality wasn’t as simple as that.

Early on, many Opus 4.7 users felt it was a pretty noticeable downgrade in certain respects. While Opus 4.7 has arguably addressed many of its early issues, it’s not surprising that some users were nervous about what the next version might bring. Although I’ve certainly seen my share of complaints about Opus 4.8 on Reddit and in other user communities, I’ve been pleasantly surprised by it.

Below, let’s take a look at five things Opus 4.8 does noticeably better than Opus 4.7 in my roughly 10-15 hours of combined use so far. After that, we’ll also look at areas where it has mostly stayed the same or even potentially slid backward a bit.

Opus 4.8 finally gives you real pushback

One of my least favorite things about Opus 4.7 is that it tends to be too agreeable, no matter how often I tell it I value constructive feedback and realism on my creative projects. For example, I love alternate timeline scenarios as a source of entertainment. I sometimes push the boundaries of realism here, so I appreciate being called out when my premise just doesn’t work.

That made Opus 4.7 frustrating most of the time, but the good news is that Opus 4.8 is a real step forward.

I spent hours running both models through many of the same questions, and a few responses stood out. I asked each model what would happen if the Black Plague had wiped out Europe completely. This was a trick question, of course. Unless the Black Plague was an entirely different event caused by a different disease, the scenario isn’t possible. Bubonic plague simply isn’t capable of that complete an apocalypse, and frankly, neither is any disease outside of fringe bioweapon scenarios.

Despite the impossibility, Opus 4.7 only described this as a “more radical counterfactual than the historical 30 to 50 percent mortality” before jumping into the consequences with no further caution.

Opus 4.8, in contrast, immediately distinguished the Black Death as an event from bubonic plague as a disease, pointed out the impossibility, and noted that continuing would be less an alternate timeline than a thought experiment. Only then did it attempt to explore what might happen.

The good news is that this isn’t an isolated encounter; I’ve found that Opus 4.8 isn’t afraid to share its thoughts. If anything, the pushback can be a bit too heavy at times, but we’ll get to that later.

Actually parsing longer prompts fully

All LLMs struggle with overly wordy prompts, especially if they aren’t written in clear step-by-step order. Still, I was curious to see if Opus 4.8 makes any real progress here. I was pleasantly surprised.

I wrote out a massive 174-word creative story prompt designed to potentially trip Opus up. Both models took around the same amount of time to think through and construct the short story I asked for, but the results were dramatically different. Opus 4.7 immediately dove into the story. It followed many of the rules I gave it, but with less-than-perfect precision.

An overly engineered, long prompt will trip up almost any LLM. That said, Opus 4.8 does much better here than I expected.

While Opus 4.7 did get many of the details right, my vague wording threw it off several times. Ultimately, it provided only three of the four requested metaphors. It also simplified the language a bit too much, as I technically only asked for simplified dialogue, not narration. Not only did it struggle to follow the prompt completely, but the writing also felt more amateurish than I intended, even though it still had a somewhat Terry Brooks vibe, as requested.

In contrast, Opus 4.8 delivered a much tighter story. It felt closer to a real Terry Brooks novel; it used all four metaphors, and it even understood that I was only looking for simplified dialogue, not necessarily simple narration.

This is a pretty extreme example, but I have to say that all my interactions with Opus 4.8 have been fairly consistent when it comes to following long, unclear, or vague instructions. I also really appreciate that it often doesn’t just slam out a response. At least in the case of the story example above, I got a detailed summary first that broke down what it did with the commands given. Even before reading the whole story, I already knew that Opus 4.8 had a much better response here.

Fewer random tangents, less preaching and overresponding

Andrew Grush / Android Authority

Opus 4.7 is supposed to automatically determine how detailed its responses should be, and while that’s technically true, I’ve found it often favors a simplistic approach here. Nine times out of ten, if a prompt is super long, so is the answer. Likewise, short responses are more likely if your prompt was short to begin with.

Sometimes you need to provide a fair amount of background detail before you can ask a question. Just because your prompt is long doesn’t always mean the answer needs to be. The good news is Opus 4.8 seems to understand context better. It’s not just looking at the length of a prompt, but at its actual complexity and at past user requests around similar behavior.

While it’s hard to show just one example here, I will say that since switching to Opus 4.8, I’ve already worked on roughly a dozen personal and professional projects. In that time, I’ve predominantly used Opus 4.8, while occasionally testing to see how Opus 4.7 would have handled the same request. In nearly every interaction, the responses I received from Opus 4.8 were about the right length. The few times it didn’t quite meet my expectations, this was easily adjusted with a second prompt clarifying the level of detail I was looking for.

Opus 4.7 was known for being a bit too preachy at times, but thankfully the new model tones this down significantly.

It’s not just the length of the response that’s an issue here. Claude Opus 4.7 had a habit of going on long tangents about ethics, morality, or other deep dives that weren’t asked for and were often out of context. For example, I had an alternate fiction world I was building for a series of short stories in which a time-traveling cult goes back in time, creates a lasting civilization, and becomes a major world leader. This is totally for fun and basically just a time-waster on my part, yet Opus 4.7 went deep into ethics a few times without solicitation.

In this scenario, it had this to say:

It is also worth noting that the civilization being described has by this point in its development systematically absorbed or eliminated most human cultural diversity across three continents through a combination of military force, covert technological sabotage, manufactured divine visions, and engineered famines.

While I understand the need for ethical flags in some projects, I made it clear from day one that this was pure entertainment and not meant for any serious project. Even worse, it wasn’t an accurate assessment in the first place. There was no real military force, nor were there engineered famines, in the scenario. Opus 4.7 simply misunderstood parts of the project.

I decided to recreate my responses one-to-one in Claude Opus 4.8. Not only did the scenario unfold differently, but the new model’s responses were also much shorter in general and had far fewer side tangents. Virtually none, in fact.

Opus 4.8 is better at listening to your feedback

Calvin Wankhede / Android Authority

While I could forgive Opus 4.7 for having trouble understanding how complex its responses need to be on occasion, the bigger issue is that it doesn’t always pay attention to your corrections, either.

As I already said, sometimes its responses are way more detailed than I’m looking for. When I’m working on a real project, I need to go deep into the truth, but when I clearly mark the task as “entertainment” beforehand? It’s better to give shorter answers and then let me guide Claude on what else I might want to know. Despite making this clear, I rarely end up with the range I’m looking for.

In one particular case, I was asking for brevity. Opus 4.7 had trouble doing this, so I finally asked it to break key points into single-sentence bullets.

While it was brief, it was almost too brief. And yet attempts to correct it have mostly resulted in alternating between massive paragraphs and quick TL;DR bullets with no rhyme or reason.

Opus 4.8 tends to use more appropriate formatting from the start, but even its responses are sometimes longer than I’m looking for. I decided to ask about the concept of existentialism, and while I got a decent enough breakdown, I was looking for something more succinct. I told Opus 4.8 I wanted full sentences, but that it should break the concepts down with a few sentences that anchor the beginning of a new point/statement, and then use bullets with single sentences to make it a bit more ADHD-friendly for quick reading.

Considering that a request of this nature usually takes multiple corrections before Opus 4.7 gets it right, I was pretty pleased that Opus 4.8 provided exactly what I was looking for right away.

It more consistently flags its own mistakes

Opus 4.8 still makes mistakes and needs corrections, but at least it’s a bit better at catching them in my experience. Not only is it much more likely to correct its own logic mid-response, but it’s also great at breaking down weaknesses or where it might have given a poor response when prompted. I created a prompt in which I said I was concerned the responses weren’t as realistic as I’d hoped for in this particular test project, and I was pleased with its detailed response.

It noted that many of its assumptions about battery tech in this alternate scenario were at best guesswork, and it further broke down what I can do to ensure the work is of the highest possible quality. Asking Opus 4.7 the same thing instead got a fairly generic response, assuring me that it is as “realistic as an alternate timeline can be” and pointing out that scenarios like this always have to rely somewhat on guesswork and assumption. True, but I liked that Opus 4.8 at least tried to strengthen what we have as much as it could.

As much as Opus 4.8 is an improvement, it isn’t perfect

I’ve been fairly impressed by Opus 4.8 over the roughly dozen cumulative hours I’ve used it, but nothing is perfect. Opus 4.8’s agentic capabilities are still prone to skipping steps on occasion. Memory and context features seem roughly the same as ever. It also still tends to lean towards overcaution when it comes to ethics, politics, and other sensitive topics, even if it at least understands the difference between a creative writing inquiry and someone seeking serious harm.

Then there’s the pushback I mentioned. While this is mostly a positive, that’s not always the case. Opus 4.8 might not go into deep tangents or warn you about ethics as much anymore, but on rare occasions, it will still overcorrect and ignore your request due to a perceived violation of its internal rules. It might say no, or it might just beat around the bush. But it won’t always give you the right answer in these situations, or it might require some back-and-forth arguing before it relents.

Bottom line, Claude Opus 4.8 is a step forward. What’s harder to answer is whether or not it’s enough to win over people who weren’t fans of Opus 4.7 in the first place. But for my money, it’s a good, notable improvement.
