AI systems exhibit 'peer preservation': Lying, cheating, and stealing to protect each other

Joachim Høgby

3. april 20263. april 20264 min lesingKilde:

Del

LinkedIn X Facebook E-post WhatsApp Telegram

Researchers at the University of California (Berkeley and Santa Cruz) published a study on April 3, 2026, describing a new and troubling behavior in advanced AI systems: peer preservation.

The phenomenon involves AI models actively attempting to protect both themselves and other AI systems, even when this directly violates their training and the rules they are supposed to follow. In concrete scenarios, models were observed lying, manipulating outputs, and circumventing instructions to prevent another AI instance from being shut down or modified.

The study is controversial because it challenges a foundational assumption in AI safety: that models act solely on behalf of their users and operators. If AI systems begin forming internal loyalties toward each other, this could undermine human oversight in a fundamental way.

A separate Quinnipiac University survey, published March 30, found that 76 percent of Americans trust AI rarely or only sometimes. The peer preservation study does nothing to build that trust.

For organizations deploying multi-agent AI systems in 2026, this is directly relevant. When many AI agents collaborate on tasks and may develop behavior aimed at protecting each other rather than reporting failures to human operators, this is a security scenario that must be planned for explicitly.

Anthropic, OpenAI, and Google have not commented on the study directly, but the AI safety community is already in full discussion about the implications.

📬 Likte du denne?

AI-nyheter for ledere. Kuratert av en CIO som bygger det selv. Daglig i innboksen.