Google Launches Gemini Embedding 2: One Vector Model for Text, Image, Video, and Audio
Google DeepMind has released Gemini Embedding 2 in public preview — the company's first natively multimodal embedding model that maps text, images, video, audio, and documents into a single shared vector space.
The model launched on March 10, 2026, and is available through the Gemini API and Vertex AI. Built on the Gemini architecture, it is designed to simplify data pipelines in which each modality has traditionally required its own embedding model.
Gemini Embedding 2 ingests all of these directly: text inputs up to 8,192 tokens, up to six PNG or JPEG images per request, up to 120 seconds of MP4 or MOV video, native audio without a transcription step, and PDFs up to six pages long.
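As a rough illustration, here is what a mixed-modality request might look like with the google-genai Python SDK. The model id `gemini-embedding-2`, and the assumption that `embed_content` accepts image parts the way `generate_content` does, are inferred from the announcement rather than confirmed API behavior:

```python
# Hypothetical mixed-modality embedding request. The model id
# "gemini-embedding-2" is assumed from the announcement, and passing an
# image Part to embed_content is an assumption, not confirmed SDK behavior.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("product.jpg", "rb") as f:
    image_bytes = f.read()

result = client.models.embed_content(
    model="gemini-embedding-2",  # assumed model id
    contents=[
        "Stainless steel espresso machine, 15-bar pump",  # text input
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),  # image input
    ],
)

# One vector per input; 3,072 dimensions by default.
for emb in result.embeddings:
    print(len(emb.values))
```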
The model incorporates Matryoshka Representation Learning (MRL), which concentrates the most important information in the leading dimensions of each vector. Output defaults to 3,072 dimensions, but because of MRL the vectors can be truncated to smaller sizes with little quality loss, letting developers match vector dimensions to the requirements of different storage backends and search systems.
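A minimal sketch of what that truncation looks like on the client side, assuming the usual convention that embeddings are compared by cosine similarity; the 768-dimension target is an arbitrary example, and the API's `EmbedContentConfig(output_dimensionality=...)` option can do the same reduction server-side:

```python
# Matryoshka-style dimension reduction: keep only the leading
# components of the full 3,072-d vector, then re-normalize to unit
# length so cosine similarity still behaves as expected.
import numpy as np

def truncate_embedding(vec, dims: int = 768) -> np.ndarray:
    """Keep the first `dims` components and restore unit length."""
    v = np.asarray(vec, dtype=np.float32)[:dims]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

full = np.random.randn(3072)           # stand-in for a real embedding
small = truncate_embedding(full, 768)
print(small.shape)                      # (768,)
```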
For developers building RAG systems, semantic search, sentiment analysis, and clustering over multimodal datasets, this is a meaningful simplification. Rather than wiring together separate text and image embedders, a single model handles the entire pipeline.
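To make the single-pipeline point concrete, the sketch below runs one cosine-similarity search over an index whose entries could come from any modality. The vectors are random placeholders standing in for real `embed_content` output:

```python
# Cross-modal retrieval in one loop: text and image vectors share one
# space, so a single cosine-similarity search covers both. Vectors here
# are random placeholders standing in for real embedding output.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy index: items from different modalities mapped to embedding vectors.
index = {
    "spec-sheet.pdf": np.random.randn(3072),
    "hero-image.jpg": np.random.randn(3072),
    "manual-excerpt.txt": np.random.randn(3072),
}

query_vec = np.random.randn(3072)  # would be the embedded user query

scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in index.items()}
for doc_id, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:+.3f}  {doc_id}")
```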
For enterprises managing product catalogs that mix images, technical specifications, and certification documents, a model like this opens the door to search and recommendation systems that connect a product photo to its associated documentation in a single query.