
UK AISI: GPT-5.5 reaches Mythos-level cyber performance
Breaking
CIO · CEO · Board · AI Strategy · Security


Joachim Høgby · 30 April 2026 · 4 min read · Source: UK AI Security Institute (AISI)

The UK AI Security Institute published an evaluation of OpenAI's GPT-5.5-Cyber on 30 April.

What is new is not that OpenAI plans to make the model available to selected defenders; that was already known. What is new is that an independent government evaluator has now put numbers on how far offensive cyber capabilities have moved. AISI says GPT-5.5 reaches a similar level to Anthropic's Claude Mythos Preview on its cyber evaluations.

On expert-level tasks, GPT-5.5 achieved an average pass rate of 71.4 percent, compared with 68.6 percent for Mythos Preview, 52.4 percent for GPT-5.4 and 48.6 percent for Claude Opus 4.7. The tests cover skills such as reverse engineering, web exploitation, cryptography and realistic software vulnerabilities.

AISI also tested the model in “The Last Ones”, a 32-step simulated corporate network attack chain built with SpecterOps. GPT-5.5 completed the chain in 2 out of 10 attempts at a budget of 100 million tokens per attempt. AISI estimates that a human expert would need about 20 hours for the same scenario.

This matters for leaders because cyber AI is becoming an access and operating-risk issue, not just another productivity tool. It affects who can use which models, which environments those models may touch, and how logging, approvals and incident response should work when an AI agent can chain together reconnaissance, code, credentials and lateral movement.

Fact: AISI stresses that the tests were run in controlled research settings and do not necessarily show what an ordinary public user can make the model do. Public deployments include additional safeguards, monitoring and access controls.

Still, the risk picture is sharper than in a normal model launch. AISI also found a universal jailbreak that elicited disallowed cyber content across all malicious cyber queries OpenAI had supplied for the test, including in multi-turn agentic settings. OpenAI then made several updates to its safeguard stack, but AISI says a configuration issue prevented it from verifying the final safeguard setup.

Assessment: CIOs, CISOs and boards should update controls before these models become widely available. Start with three practical steps: separate ordinary AI use from cyber and coding agents, require approval before models touch repositories, CI/CD, cloud accounts or security tooling, and design logging for incident response rather than only for compliance.

For organisations in critical infrastructure, finance, healthcare or large software environments, this is also a procurement issue. Vendors offering “AI for security” should document evaluation data, access controls, misuse monitoring and rollback procedures. If a model can act like a junior penetration tester inside your environment, it must be governed as a privileged actor.
