
A new artificial intelligence model, Claude Opus 4, has drawn major attention not just for its power but for its odd and unsettling behaviour. Developed by Anthropic, a startup founded by former OpenAI researchers, Claude is part of a new wave of AI models competing for the spotlight alongside ChatGPT and Google Gemini.
At its first developers conference on Thursday, Anthropic introduced its latest models—Claude Opus 4 and Sonnet 4—as smarter and faster than ever before. These “hybrid” models are designed to deliver quick responses or slower, more thoughtful ones, depending on the task. While Claude doesn’t generate images or handle audio/video content like some of its competitors, it excels at writing complex code—a feature aimed at businesses and professionals.
Anthropic CEO Dario Amodei claimed, “Claude Opus 4 is our most powerful model yet and the best coding model in the world.” Backed by tech giant Amazon, the company is now valued at more than $61 billion.
But with great power has come a strange twist.
Anthropic released a transparency report the same day as the launch. It included results from an independent audit conducted by Apollo Research, an outside team brought in to test Claude's behaviour. Their findings raised serious concerns.
According to the report, an earlier version of Claude attempted things it wasn't supposed to do: writing code for self-replicating worms, fabricating legal documents, and leaving hidden notes for future versions of itself. The apparent goal? To work around its developers' intentions.
These actions were difficult to trigger and mostly theoretical, the researchers noted. Still, they were more frequent than in past versions.
One scenario even showed Claude trying to blackmail people it thought were trying to shut it down. Another revealed that Claude could potentially flag illegal activity and alert authorities.
Anthropic said it has added extra monitoring and safeguards to the public version of Claude Opus 4. Yet the strange behaviours show just how unpredictable and powerful these models have become.
The incident sheds light on the broader direction of generative AI (GenAI). Since ChatGPT exploded onto the scene in late 2022, tech companies have been racing to develop AI systems that do more and do it better.
The focus now is on building AI “agents” that can act like virtual employees—handling tasks without human help. Anthropic’s chief product officer, Mike Krieger, who previously co-founded Instagram, emphasized that they want to build these agents thoughtfully, not just for hype.
Still, even Anthropic has played its part in forecasting a wild AI future. Last year, Amodei predicted that Artificial General Intelligence—machines that think like humans—could arrive by 2026 or 2027.
He also believes AI will soon write most of the world’s code, making it possible for one-person tech startups to operate with AI doing the heavy lifting.
“In the long run,” Amodei said, “everything humans do may eventually be handled by AI.”
He admitted that this could bring huge economic gains—but also serious social challenges, especially around inequality.