Google launched Gemini Embedding 2 in preview, unifying multiple modalities and 100+ languages in one embedding space with flexible output dimensions. Try it to simplify cross-modal RAG and search pipelines, but compare it with late-interaction systems before committing.

Google positioned Gemini Embedding 2 as a single embedding model for text, images, video, audio, and PDFs; per the launch post, it is exposed in preview through the Gemini API and Vertex AI. Weaviate's integration post adds the implementation details engineers will care about: support for 100+ languages, an 8,192-token maximum input, and configurable output sizes from 128 to 3,072 dimensions.
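Configurable output sizes on a single model typically imply Matryoshka-style embeddings, where a full-width vector can be cut down to a smaller prefix and renormalized without re-embedding. A minimal sketch of that truncate-and-renormalize step, assuming the prefix of the vector carries the coarse semantics (the function name and toy vector are illustrative, not from the API):

```python
import math

def truncate_embedding(vec, dim):
    """Truncate a Matryoshka-style embedding to `dim` dimensions,
    then L2-renormalize so cosine similarity stays meaningful."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy example: shrink an 8-dim vector to 4 dims.
full = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
small = truncate_embedding(full, 4)  # unit-length 4-dim vector
```

Storing at 3,072 dimensions and truncating at query time lets one index serve several precision/cost points, which is the practical payoff of the 128-to-3,072 range.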
The ecosystem angle matters because this is not just a model card drop. Weaviate said the model already works with its existing Google integration, and the attached configuration shows multi2vec_google_gemini targeting gemini-embedding-2-preview for multimodal collections. That makes the launch immediately relevant to teams already running vector search infrastructure rather than a future-only capability.
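As a rough illustration of what such a collection definition might contain, here is a hypothetical config dict. Only the module name multi2vec_google_gemini and the model id gemini-embedding-2-preview come from the post; every other key and field name below is an assumption, not Weaviate's verified client API:

```python
# Hypothetical sketch of a multimodal collection config; field names are
# illustrative. Only "multi2vec_google_gemini" and
# "gemini-embedding-2-preview" are taken from the integration post.
collection_config = {
    "name": "MediaAssets",
    "vectorizer": "multi2vec_google_gemini",
    "vectorizer_options": {
        "model": "gemini-embedding-2-preview",
        "text_fields": ["title", "caption"],
        "image_fields": ["poster"],
        "video_fields": ["clip"],
    },
}
```

The point of the shape is that one vectorizer covers fields of several modalities, so the collection needs no per-modality embedding plumbing.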
The main engineering claim is pipeline reduction. Google's demo post frames the model as search "across all your media at once," which means cross-modal lookup without separate text, image, audio, and video embedding stages. In practice, that should simplify multimodal RAG and recommendation systems that need to retrieve a concept from one format and return matches from another, a use case also echoed in the OpenClaw note about semantically storing images, videos, audio, and docs for agents.
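The mechanics of "search across all your media at once" reduce to ranking items of any modality by similarity to a query vector in the shared space. A self-contained toy sketch with hand-made vectors (the index contents and query vector are synthetic stand-ins for real model outputs):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy unified space: items of different modalities share one vector space.
# In a real pipeline these vectors would come from the embedding model.
index = {
    ("image", "sunset.jpg"): [0.9, 0.1, 0.0],
    ("audio", "waves.mp3"):  [0.8, 0.2, 0.1],
    ("pdf",   "report.pdf"): [0.0, 0.1, 0.9],
}

def search(query_vec, k=2):
    """Rank every item, regardless of modality, against one query vector."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [key for key, _ in ranked[:k]]

# A (toy) text-query vector retrieves the image and audio items first.
hits = search([1.0, 0.0, 0.0])
```

The single ranking loop is the pipeline reduction: no per-modality encoders, no score fusion across separate indexes.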
That does not eliminate the retrieval design tradeoff. In the practitioner thread, Jo Kristian Bergum argues there is still "no single silver bullet" for agent retrieval and says embeddings matter because much of the context engineers want to feed agents "isn't represented in text." His examples—meeting notes, audio, images, and PDF-page images—line up with the exact artifact mix Gemini Embedding 2 targets. The likely near-term use is not replacing every retrieval stack, but expanding what can enter the same retrievable context layer with fewer modality-specific workarounds.
Google now lets Gemini chain built-in tools like Search, Maps, File Search, and URL Context with custom functions inside a single API call. This removes orchestration glue for agent builders and brings Maps grounding into AI Studio for faster prototyping.
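Mixing built-in tools with custom functions amounts to listing both in the `tools` array of one request. A hedged sketch of what such a request body could look like: the camelCase tool names follow the Gemini REST convention but should be checked against the API reference, and the `book_table` function is entirely hypothetical:

```python
# Hedged sketch of a single request body combining built-in tools with a
# custom function declaration. Tool key names are assumptions based on the
# Gemini REST camelCase style; verify against the official API reference.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Find cafes near my office"}]}
    ],
    "tools": [
        {"googleSearch": {}},   # built-in Search grounding
        {"googleMaps": {}},     # built-in Maps grounding (assumed key name)
        {"urlContext": {}},     # built-in URL Context
        {"functionDeclarations": [{      # custom function alongside built-ins
            "name": "book_table",        # hypothetical app-side function
            "description": "Reserve a table at a cafe",
            "parameters": {
                "type": "object",
                "properties": {"place_id": {"type": "string"}},
                "required": ["place_id"],
            },
        }]},
    ],
}
```

The orchestration win is that the model can ground on Search or Maps and then call the app's own function within one round trip, instead of the caller stitching those steps together.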
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
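The core trick behind n-gram indexes for regex search is using an inverted index over trigrams to shrink the candidate file set before the regex engine runs. A toy sketch of that idea, not Cursor's implementation (class and method names are illustrative; a Bloom filter could replace the posting sets to cut memory):

```python
import re
from collections import defaultdict

def trigrams(s):
    """Set of all 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    """Toy n-gram + inverted-index prefilter: only files whose trigram set
    covers a required literal from the query are handed to the regex engine."""
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> file names
        self.files = {}                   # file name -> contents

    def add(self, name, text):
        self.files[name] = text
        for g in trigrams(text):
            self.postings[g].add(name)

    def search(self, literal, pattern):
        # Candidates: intersection of posting lists for the literal's trigrams.
        grams = trigrams(literal)
        if not grams:
            candidates = set(self.files)  # literal too short to prefilter
        else:
            candidates = set.intersection(*(self.postings[g] for g in grams))
        rx = re.compile(pattern)
        return sorted(n for n in candidates if rx.search(self.files[n]))

idx = TrigramIndex()
idx.add("a.py", "def embed_text(x): ...")
idx.add("b.py", "def embed_image(x): ...")
idx.add("c.py", "print('hello')")
matches = idx.search("embed_", r"embed_\w+")  # regex runs on 2 files, not 3
```

The speedup comes from the prefilter: the expensive regex scan touches only files that could possibly match, which is why large-repo latency collapses.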
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
Many of you know me as the bm25 guy but I'm afraid there is no single silver bullet for solving retrieval for agents. You also need to make the data retrievable in the first place. Yes, agents are great at formulating queries and they are relentless so classic issues like …
🤖 From this week's issue: Google launches Gemini Embedding 2, its first natively multimodal embedding model unifying text, images, video, audio, and documents into a single semantic space. blog.google/innovation-and…
The era of juggling 5 different embedding models is over. Google just unified text, images, video, audio, and PDFs into one vector space. One model, multiple modalities: text, images, video, audio, and PDFs all mapped into a single unified vector …
try it out here: aistudio.google.com/apps/bundled/m…