Microsoft launches in-house MAI models in a direct challenge to OpenAI and Google
Microsoft has launched three in-house foundation models under the MAI brand: MAI-Transcribe-1 for speech-to-text, MAI-Voice-1 for voice generation, and MAI-Image-2 for image creation. The launch was first reported on April 2, 2026, and the models are immediately available through Microsoft Foundry and MAI Playground.
The real story is not just that Microsoft shipped new models. It is that the company is now signaling a much stronger push to own more of the model layer itself, rather than only distributing OpenAI technology through Azure and Copilot. According to Microsoft's published benchmarks, MAI-Transcribe-1 outperforms Whisper-large-v3 on the FLEURS benchmark across 25 languages, while MAI-Image-2 is rolling into Bing and PowerPoint.
For CIOs, this matters beyond product news. Microsoft is building a vertically integrated AI stack spanning speech, voice, and image generation, tightly connected to its enterprise distribution and productivity workflows. That could translate into lower serving costs, more strategic control, and less dependence on partner roadmaps.
If your organization is betting on Azure AI, Copilot, or Microsoft’s broader enterprise stack, this is a launch worth tracking closely through the rest of 2026.