Microsoft Lets GPT and Claude Fact-Check Each Other in New Copilot Cowork
Microsoft today launched Copilot Cowork, a new Microsoft 365 capability that lets AI agents handle long-running, multi-step tasks autonomously. The headline feature is the "Critique" layer: OpenAI's GPT drafts a response, then Anthropic's Claude reviews it for accuracy and correct citations. Roles can be reversed, and a new "Model Council" feature lets users compare outputs from both models side by side.
The approach delivered a 13.8 percent improvement on the DRACO benchmark for the Researcher agent. Microsoft frames it as a step toward more reliable AI: rival models quality-checking each other's work to reduce hallucinations.
Copilot Cowork is available through Microsoft's Frontier program for early access. Users describe their workflow, and the AI creates a plan and executes tasks across Word, Excel, Outlook, Teams, and SharePoint, while humans can monitor and course-correct along the way.
For CIOs, this means the multi-model strategy is now a reality in productivity tools. The question is whether this becomes the norm: AI systems fact-checking themselves using competitors' models.
📬 Did you like this one?
AI news for leaders. Curated by a CIO who builds it himself. Daily in your inbox.