Meta launches Llama 4: Open multimodal AI with 10 million token context window
Meta released Llama 4 Scout and Llama 4 Maverick on April 5th, two open-weight AI models built on a Mixture-of-Experts (MoE) architecture. They are Meta's first natively multimodal models, using early fusion to process text and images in a single backbone, trained on text, image, and video data.
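In an MoE layer, a small router network picks which expert handles each token, so only a fraction of the total weights does any work per token. The toy PyTorch sketch below illustrates the idea with made-up dimensions, not Llama 4's real architecture or sizes:

```python
# Toy Mixture-of-Experts layer: a router scores the experts per token and
# only the top-k (here: 1) experts actually run. This is why "active"
# parameters can be far fewer than total parameters. All dimensions are
# illustrative, not Llama 4's.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model: int, num_experts: int, top_k: int = 1):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # one score per expert
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Each token is routed to its top-k
        # experts; all other experts stay idle for that token.
        gate = self.router(x).softmax(dim=-1)
        weights, chosen = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out

layer = ToyMoELayer(d_model=64, num_experts=16)
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64]); one of 16 experts ran per token
```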
Llama 4 Scout stands out with a 10-million-token context window, the longest of any open-weight model at launch. It has 17 billion active parameters spread across 16 experts (109 billion total) and fits on a single NVIDIA H100 GPU with Int4 quantization. Llama 4 Maverick has the same 17 billion active parameters but uses 128 experts for 400 billion total parameters, targeting conversational and creative tasks.
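To make the single-H100 claim concrete, here is a minimal sketch of loading Scout with 4-bit weights through Hugging Face transformers and bitsandbytes. The model id is an assumption based on Meta's published checkpoints, loading requires a transformers release with Llama 4 support, and actual memory use grows with context length:

```python
# Sketch: text-only inference with Llama 4 Scout quantized to 4-bit.
# Assumes access to the gated meta-llama repo and a transformers version
# that ships Llama 4 support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed repo name

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # Int4 weights, as in Meta's H100 claim
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place the quantized weights on the available GPU(s)
)

prompt = "Summarize the key differences between Llama 4 Scout and Maverick."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```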
Both models were pre-trained on more than 30 trillion tokens spanning 200 languages, roughly double the training data of Llama 3. They are available for download on Hugging Face and llama.com, and already power Meta AI in WhatsApp, Messenger, and Instagram.
Behind the scenes is Llama 4 Behemoth, a massive model with 288 billion active parameters and nearly 2 trillion total parameters, used as a teacher model for Scout and Maverick. Meta claims the still-training Behemoth already outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks.
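Teacher-model distillation means the smaller models are trained to match Behemoth's output distributions, not just the raw labels. Meta describes a novel loss that dynamically weights soft and hard targets during training; that recipe is not public, so the sketch below shows only the generic fixed-weight pattern it builds on:

```python
# Generic knowledge-distillation loss: KL divergence against the teacher's
# temperature-smoothed distribution, blended with ordinary cross-entropy on
# the true labels. alpha and temperature are fixed here; Meta reportedly
# weights the two terms dynamically.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: 4 token positions, vocabulary of 32k.
s = torch.randn(4, 32000)          # student logits
t = torch.randn(4, 32000)          # (frozen) teacher logits
y = torch.randint(0, 32000, (4,))  # true next tokens
print(distillation_loss(s, t, y))
```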
For enterprises and developers this is especially significant: open models with long context and multimodal support make advanced, internally hosted AI solutions possible without cloud-provider lock-in.
📬 Did you like this?
AI news for leaders. Curated by a CIO who builds it himself. Daily in your inbox.