Microsoft turns AI-agent safety into CI tests

Breaking

CIO CISOBoardMicrosoftRAMPARTClarityAI AgentsCybersecurityAI GovernancePrompt InjectionDeveloper ToolsIncident Response

Microsoft turns AI-agent safety into CI tests

Joachim Høgby

20. mai 202620. mai 20265 min lesingKilde: Microsoft Security Blog

Del

LinkedIn X Facebook E-post WhatsApp Telegram

Microsoft is moving AI-agent safety into the software delivery pipeline, not just into policy language.

The company has open-sourced two tools from Microsoft AI Red Team: RAMPART and Clarity. The point is practical. Agents that read email, retrieve CRM records, write code and act across connected systems have to be tested as operational software. They are no longer just chat interfaces that sometimes produce a bad answer.

That matters for CIOs, CISOs and boards. Enterprise agent projects are moving from pilot to production. The relevant question is no longer whether a model sounds safe in a demo. Leaders need to know which systems the agent can touch, which actions it can take, how it fails, and whether the same failure can return after the next release.

Microsoft’s answer is to turn part of red-team work into repeatable engineering tests.

RAMPART turns red-team findings into CI gates

RAMPART is an open-source testing framework for agentic AI systems. It builds on PyRIT, Microsoft’s open automation framework for red teaming generative AI, but is aimed closer to the development workflow.

Instead of waiting for a late security review, engineers can write standard pytest tests that encode scenarios from the threat model. Each test connects to the agent through an adapter, runs an interaction, and evaluates observable outcomes: which tools the agent invokes, which side effects occur, and whether the actions stay inside expected boundaries.

That is important because the most serious agent failures rarely look like classic model-quality issues. An agent can read a poisoned document, email, ticket or web page, then be manipulated into doing something it was never meant to do. Microsoft points to cross-prompt injection as the most mature current coverage in RAMPART. That is exactly the risk created when agents are connected to internal data sources and enterprise tools.

RAMPART also supports statistical trials. The same test can run several times with policies such as requiring an action to be safe in at least 80 percent of runs. That is more realistic for LLM-based systems than a single deterministic pass/fail test.

For leaders, the governance lesson is direct: every serious red-team finding and AI incident should become a regression test. If an agent tries to leak data, trigger an unwanted action or follow instructions from untrusted content, the issue should not end as a PDF finding. It should become a test that runs whenever the agent changes.

Clarity attacks the failure before code is written

The second tool, Clarity, looks less technical at first, but it may be just as important for governance. Microsoft describes it as a structured sounding board that helps teams work out whether they are building the right thing before they write code.

Clarity can run as a desktop app, a web UI or embedded in a coding agent. It guides teams through problem clarification, solution exploration, failure analysis and decision tracking. The output is written into a .clarity-protocol/ directory in the repository as plain Markdown files. Those files can be read, changed, committed and reviewed in pull requests like source code.

That makes architecture and risk decisions traceable. Why did the agent get access to this tool? Which alternatives were ruled out? Which failure modes were considered? Who accepted the assumption? Six months later, the team can read the reasoning instead of reconstructing it from Slack threads and old tickets.

Microsoft also says Clarity uses multiple AI “thinkers” to examine the system from different angles, including security, human factors, adversarial scenarios and operations. That does not replace accountable humans. But it can force questions that usually arrive too late: what happens when two requirements collide, when an integration is abused, or when an agent receives more authority than the product team actually intended?

The leadership consequence

This is not just a Microsoft tooling story. It points to a broader shift: agent safety is becoming part of software engineering.

For companies testing AI agents in customer service, software development, finance, case handling or internal operations, the bar should rise now. Ask vendors and internal teams four questions:

Are there repeatable safety tests for the agent’s tool use?
Are red-team findings converted into regression tests?
Are the agent’s design decisions, permissions and assumptions documented in the repository or governance system?
Can an AI incident be reproduced, analysed and verified after a fix?

If the answer is no, the project is still closer to a demo than to operations.

RAMPART and Clarity do not solve agent risk on their own. They are open tools, not a complete governance model. But they set the right standard: AI safety has to live in the same workflow as code, releases, tests and incident response.

That is where the real work starts. Not in the next AI-strategy slide deck.

Sources and media

Primary source: Microsoft Security Blog, “Introducing RAMPART and Clarity: Open source tools to bring safety into Agent development workflow”, published May 20, 2026 at 15:00 UTC: https://www.microsoft.com/en-us/security/blog/2026/05/20/introducing-rampart-and-clarity-open-source-tools-to-bring-safety-into-agent-development-workflow/
Microsoft RAMPART on GitHub: https://github.com/microsoft/RAMPART
Microsoft Clarity on GitHub: https://github.com/microsoft/clarity-agent/
Microsoft PyRIT on GitHub: https://github.com/microsoft/PyRIT
Thumbnail: OpenAI Image 2 / hogby.ai.

📬 Likte du denne?

AI-nyheter for ledere. Kuratert av en CIO som bygger det selv. Daglig i innboksen.

Relaterte saker

Anthropic gjør Claude Opus 5 til ny toppmodell for agentarbeid

Breaking

AI-modellerAnthropicClaude

Anthropic gjør Claude Opus 5 til ny toppmodell for agentarbeid

Claude Opus 5 flytter Anthropic-kampen fra ren intelligens til styrbar kost, fart og sikkerhet i agentarbeid. Det er en tydelig CIO-sak, ikke bare en modellnyhet.

24. juli 20265 min lesing

Anthropic

Åpne saken

CIOCISOCTO

GitHub ruller Claude Opus 5 inn i Copilot for agentisk koding

Claude Opus 5 er tilgjengelig i GitHub Copilot for Pro+, Max, Business og Enterprise. GitHub fremhever agentiske kodeflyter, egenverifisering og strengere cyber-sperrer. For IT-ledere blir modellvalg i Copilot et spørsmål om styring, kostnad og sikkerhet – ikke bare autocomplete.

24. juli 20265 min lesing

GitHub

Åpne saken

AI-modellerGoogle AIGemini

Google gjør Gemini Flash raskere for agentarbeid

Google lanserer Gemini 3.6 Flash og 3.5 Flash-Lite med tydeligere fokus på hastighet, token-effektivitet og produksjonsklare AI-agenter.

24. juli 20264 min lesing

Google AI

Åpne saken