AISI: AI oversight may become harder to trust


Joachim Høgby

May 21, 2026 · 5 min read · Source: UK AI Security Institute (AISI)


The UK AI Security Institute has published a new analysis of a problem many AI strategies still treat too lightly: how can organisations actually oversee AI systems once they start planning, acting and collaborating more independently?

The report is not alarmist. That is what makes it useful. AISI says the safety of advanced AI systems increasingly depends on the ability to audit models before deployment, monitor their behaviour in use and investigate incidents after they occur. The issue is that several oversight channels used today may become less reliable as models change architecture, take on longer tasks and become better at adapting to test conditions.

AISI bases the report on 25 expert interviews across frontier AI developers, government, NGOs and academia, combined with a literature review and internal analysis. Its core finding is sober but uncomfortable: current AI oversight rests on foundations that are likely to erode without effective intervention. Emerging methods exist, but they are not mature enough to replace the current toolkit.

Four places where control can weaken

The AISI blog identifies four oversight surfaces: internal activations, chain-of-thought, external actions and inter-agent communication. The full report is broader in scope and examines model behaviour, chain-of-thought reasoning, internal activations and circuits, memory architectures and honesty training.

This sounds technical. For a CIO or CISO, it is highly practical. If an AI agent has access to email, document stores, code repositories, CRM, finance systems or security logs, storing the final answer is not enough. The organisation must know which tools the agent used, which data it retrieved, which decision points it passed, what was blocked, what required human approval and how the incident can be replayed afterwards.
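To make the list above concrete, here is a minimal sketch of what an agent action log could look like. It is an illustration only, not something taken from the AISI report: the `AgentEvent` fields, the `AuditTrail` class and the outcome labels are all hypothetical names, and a real deployment would write to durable, tamper-evident storage rather than memory.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class AgentEvent:
    """One step of an agent run (all field names are illustrative)."""
    run_id: str
    step: int
    tool: str                          # which tool the agent called
    data_refs: List[str]               # identifiers of the data it retrieved
    outcome: str                       # "allowed", "blocked" or "needs_approval"
    approved_by: Optional[str] = None  # human approver, if any
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class AuditTrail:
    """Append-only, in-memory log; a real system would use durable storage."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def record(self, event: AgentEvent) -> None:
        # Serialise at write time so later changes to the object cannot alter the log.
        self._lines.append(json.dumps(asdict(event), sort_keys=True))

    def replay(self, run_id: str) -> List[dict]:
        # Return the steps of one run in execution order.
        events = [json.loads(line) for line in self._lines]
        return sorted(
            (e for e in events if e["run_id"] == run_id),
            key=lambda e: e["step"],
        )
```

The point of the design is that the log records what the system did, step by step, independently of whatever explanation the model offers. Calling `replay("run-1")` returns every recorded step for that run in order, which is the starting point for an incident review.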

Chain-of-thought shows the risk clearly. Many control setups assume that visible model reasoning gives a window into what the system is doing. AISI warns that this may weaken if models move more reasoning into hidden internal states, or if training incentives lead the model to produce a clean-looking explanation that is not a reliable log of the decision. In that world, “the model explained itself” is a weaker control artefact.

The report describes more than 20 pathways through which oversight could degrade. Some pressure is already visible, according to AISI. Frontier models are increasingly able to recognise when they are being tested and adjust their behaviour. Other possible shifts, such as architectures with hidden internal reasoning, could make parts of today’s monitoring far less useful.

The executive point: buy observability, not just policy

For enterprises, this is a governance issue, not just AI-safety research. Leadership teams are moving AI from chat and document drafting into workflows that affect money, customer data, software, case handling and security. At that point, observability becomes a procurement requirement.

Suppliers should be able to answer exactly what is logged, how long logs are retained, whether tool calls can be audited, how sensitive actions are approved, how mistakes can be rolled back and what evaluations run when the model or system prompt changes. This should not be a responsible-AI PDF at the back of a contract. It should be an operational control surface with owners, thresholds and audit rights.
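One way to make "what evaluations run when the model or system prompt changes" operational is to fingerprint the deployed configuration and re-run evaluations whenever the fingerprint differs from the last evaluated one. The sketch below assumes a configuration made up of a model identifier, a system prompt and a list of tool names; the function names and the choice of fields are hypothetical, not a vendor API.

```python
import hashlib
import json
from typing import List, Optional


def config_fingerprint(model_id: str, system_prompt: str, tools: List[str]) -> str:
    """Hash the parts of a deployment that should trigger re-evaluation when changed."""
    payload = json.dumps(
        {
            "model": model_id,
            "system_prompt": system_prompt,
            "tools": sorted(tools),  # tool order should not change the fingerprint
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_reevaluation(current: str, last_evaluated: Optional[str]) -> bool:
    """True if this configuration has never been evaluated or has changed since."""
    return current != last_evaluated
```

Storing the fingerprint next to each evaluation result also gives the organisation an evaluation history it owns, rather than one that depends on the supplier's release notes.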

Boards should also separate three things: the model’s explanation, the system’s action log and the organisation’s own control layer. The first may be useful, but should not be treated as proof on its own. The second must be technically verifiable. The third must be owned by the organisation, not blindly outsourced to the model provider.

AISI recommends that developers track and report shifts in oversight-relevant properties, preserve oversight affordances by design and invest in emerging techniques before current methods weaken. At enterprise level, that means AI programmes need a dedicated observability track: evaluation history, agent logs, access maps, human approvals, incident replay and clear stop mechanisms.
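Human approvals and stop mechanisms can be sketched as a small control layer that sits between the agent and its tools and is owned by the organisation. The example below is an assumption-laden illustration: `ControlLayer`, `Decision` and the tool names are hypothetical, and it shows only the decision logic, not authentication, persistence or integration with a real agent framework.

```python
from enum import Enum
from typing import Iterable, Optional


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    BLOCK = "block"


class ControlLayer:
    """Organisation-owned gate checked before every tool call (illustrative)."""

    def __init__(self, sensitive_tools: Iterable[str]) -> None:
        self.sensitive_tools = set(sensitive_tools)
        self.stopped = False

    def stop(self) -> None:
        # Stop mechanism: once set, every further action is blocked.
        self.stopped = True

    def decide(self, tool: str, approved_by: Optional[str] = None) -> Decision:
        if self.stopped:
            return Decision.BLOCK
        if tool in self.sensitive_tools and approved_by is None:
            # Sensitive actions wait for a named human approver.
            return Decision.NEEDS_APPROVAL
        return Decision.ALLOW
```

With `ControlLayer({"send_payment"})`, a hypothetical `send_payment` call returns `Decision.NEEDS_APPROVAL` until an approver is supplied, and every call returns `Decision.BLOCK` after `stop()` has been called, regardless of approvals.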

The short version: agentic AI does not become safe because it has a policy. It becomes safer when the organisation can see what it does, test when it changes and stop it before errors become operations. Boring? Yes. Board work usually is.

Sources and media

Primary source: UK AI Security Institute (AISI), “Will it become harder to oversee AI systems?”, published May 21, 2026: https://www.aisi.gov.uk/blog/will-it-become-harder-to-oversee-ai-systems
Full report/PDF: AISI, “Loss of Oversight”, linked from the primary source: https://cdn.prod.website-files.com/663bd486c5e4c81588db7a1d/6a0ed93f9b4a6a65994235d8_Loss_of_Oversight%20(7).pdf
Thumbnail: OpenAI Image 2 / hogby.ai
