Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

Posted on May 11, 2026 by safdargal12


Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Apparently Anthropic has done more work around that behavior, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that training on “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.


