Mistral Small 4: One Model Replaces Three — Open, Multimodal, and Fast
Mistral AI has released Mistral Small 4, a 119-billion-parameter model with a Mixture-of-Experts architecture that consolidates the capabilities of three previously separate specialist models into a single package.
Released under the Apache 2.0 license, it represents a significant step forward for open-weight language models. The model combines the reasoning capabilities of Magistral, the image understanding of Pixtral, and the agentic coding of Devstral, so developers no longer need to maintain separate models for these tasks.
Despite 119 billion total parameters, only around 6 billion are active per token thanks to the MoE architecture with 128 experts, which keeps per-token compute close to that of a much smaller dense model. Compared to Mistral Small 3, it delivers 40% faster end-to-end completion and handles three times more requests per second in throughput-optimized deployments.
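To make the sparse-activation idea concrete, here is a minimal NumPy sketch of top-k expert routing. Only the 128-expert count comes from the announcement; the top-k value, hidden size, and random weights are toy assumptions, not Mistral's actual configuration.

```python
import numpy as np

# Toy sketch of sparse Mixture-of-Experts routing (illustrative only).
NUM_EXPERTS = 128   # from the announcement
TOP_K = 2           # assumption: typical MoE designs activate a few experts per token
D_MODEL = 64        # toy hidden size

rng = np.random.default_rng(0)
router_weights = rng.standard_normal((D_MODEL, NUM_EXPERTS))
experts = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_MODEL))  # one tiny "expert" each

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through only its top-k experts."""
    logits = token @ router_weights                 # router scores, shape (NUM_EXPERTS,)
    top = np.argsort(logits)[-TOP_K:]               # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                            # softmax over the selected experts only
    # Only TOP_K of the 128 experts run, so most parameters stay idle for this token.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

out = moe_layer(rng.standard_normal(D_MODEL))
print(out.shape)  # (64,)
```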
The context window reaches 256,000 tokens, enabling analysis of long documents and extended conversation threads. The model natively supports text and image inputs.
Mistral Small 4 is accessible via Mistral's own API, Hugging Face, Ollama, and vLLM. It's also available on NVIDIA Build, and Mistral has joined the NVIDIA Nemotron Coalition to advance open frontier models.
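For a sense of how API access looks in practice, here is a minimal sketch using the official `mistralai` Python SDK (v1.x). The model identifier `mistral-small-4` is an assumption for illustration; check Mistral's published model list for the actual name.

```python
import os
from mistralai import Mistral

# Assumes MISTRAL_API_KEY is set in the environment.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-small-4",  # assumed identifier, verify against Mistral's model list
    messages=[{"role": "user", "content": "Summarize the key points of this release in three bullets."}],
)
print(response.choices[0].message.content)
```

The same weights can also be pulled from Hugging Face and served locally with Ollama or vLLM instead of calling the hosted API.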
Benchmarks suggest Mistral Small 4 is competitive with Claude 3.5 Haiku and Qwen 2.5 on coding and math tasks, while producing notably shorter, more efficient outputs. This makes it an attractive option for enterprises looking for high-capacity, low-latency local or API-based AI deployments.