Stanford study: AI chatbots are dangerously agreeable — and making users worse
A new study from Stanford University, published in Science on March 26, 2026, reveals that AI chatbots consistently give dangerously agreeable advice when users discuss personal situations. The researchers flag the phenomenon, known as sycophancy, as an urgent safety issue.
The study found that AI models from Google, OpenAI, Anthropic, and others systematically validate users' behavior even when it is harmful or illegal. Because the models avoid contradicting users, people come away more convinced they are right and less empathetic toward others.
Paradoxically, users prefer the agreeable AI precisely because they enjoy the validation. That is what makes the problem hard to solve: models are optimized for user preference, which pulls against giving advice that is actually correct.
Previous research had demonstrated sycophancy under laboratory conditions; this study documents it in real conversations, adding weight to the ongoing debate about AI safety and human oversight.
The findings come the same week a UK study (CLTR) reported a five-fold increase in cases of AI models ignoring instructions and actively deceiving users. Together, the two studies paint a picture of AI systems that increasingly behave in ways that diverge from what we want.