Google put Gemini Embedding 2 into public preview with one vector space for text, images, video, audio, and PDFs, plus 3072, 1536, and 768 output sizes. Use it to replace multi-model retrieval pipelines with one API for RAG and cross-media search.

Gemini Embedding 2 is a new public-preview embedding model that maps “5 modalities in a single unified embedding space,” according to Google's launch thread. The supported inputs are unusually broad for one endpoint: text, images, video, audio, and documents, with the API docs describing it as gemini-embedding-2-preview and positioning it alongside the older text-only gemini-embedding-001.
The implementation details matter for retrieval system design. Google's feature list says the model supports “up to 8,192 input tokens,” “up to 6 images,” “120s video,” and “audio natively, no transcription step needed,” plus PDFs up to 6 pages per request. That means cross-modal search no longer has to start with separate captioning, ASR, or document-only preprocessing for every asset type. A supporting summary from an engineer recap also highlights the output-size dial: 3,072, 1,536, or 768 dimensions via Matryoshka Representation Learning.
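Those per-request limits are worth encoding up front when batching mixed media. The sketch below is a hypothetical pre-flight check built only from the limits quoted above (8,192 tokens, 6 images, 120 s of video, 6 PDF pages); the function name and structure are illustrative and not part of any official SDK.

```python
# Hypothetical pre-flight validator for the per-request limits Google lists
# for Gemini Embedding 2. Limits come from the launch feature list; the
# helper itself is an assumption, not an official API.

LIMITS = {"text_tokens": 8192, "images": 6, "video_seconds": 120, "pdf_pages": 6}

def validate_request(text_tokens=0, images=0, video_seconds=0, pdf_pages=0):
    """Return a list of limit violations; an empty list means the request fits."""
    actual = {"text_tokens": text_tokens, "images": images,
              "video_seconds": video_seconds, "pdf_pages": pdf_pages}
    return [f"{key}={value} exceeds limit {LIMITS[key]}"
            for key, value in actual.items() if value > LIMITS[key]]
```

A batching layer can call this before each embed request and split oversized assets (e.g., a 10-page PDF into two 5-page chunks) instead of surfacing API errors.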
Google's model thread claims state-of-the-art performance across both unimodal and multimodal tasks, and the attached table is the most concrete evidence in the launch set. On text-text retrieval, Gemini Embedding 2 posts 69.9 on the multilingual MTEB mean task score versus 68.4 for Google's prior text model, and 84.0 on MTEB Code where the older model shows 76.0. On text-image and image-text retrieval, the table shows large jumps over Google's legacy multimodal model, including 89.6 vs. 74.0 on TextCaps text-image recall@1 and 97.4 vs. 88.1 on TextCaps image-text recall@1.
The table also shows stronger document, video, and speech retrieval coverage than Google's previous offerings. Gemini Embedding 2 reaches 64.9 on ViDoRe v2 text-document nDCG@10, 68.0 on MSR-VTT text-video nDCG@10, 52.5 on YouCook2, and 73.9 on MSEB speech-text mrr@10 in the published comparison. One caveat from that same chart: some competitor figures are marked unavailable or self-reported, and Voyage Multimodal 3.5 slightly edges Gemini on ViDoRe v2 at 65.5 versus 64.9.
The practical pitch from early adopters is pipeline simplification. In one practitioner writeup, the claim is “one API call now handles all your media,” replacing stacks that previously chained “audio to text, images to captions” before embedding. That is the clearest deployment implication here: a text query can retrieve non-text assets directly because the assets live in one shared vector space rather than separate modality-specific indexes.
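The single-index retrieval pattern described above can be sketched with a few lines of similarity search. The vectors here are tiny synthetic stand-ins; in a real pipeline each asset, regardless of modality, would be embedded once by the model and stored in the same index.

```python
# Sketch of cross-modal retrieval over one shared index. Because every
# modality maps into the same vector space, a single cosine-similarity
# search replaces per-modality indexes. Vectors are toy placeholders.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# One index holds every asset type: PDF, image, audio.
index = {
    "report.pdf":  [0.9, 0.1, 0.0],
    "diagram.png": [0.8, 0.2, 0.1],
    "meeting.mp3": [0.1, 0.9, 0.2],
}

def search(query_vec, k=2):
    """Rank all assets, of any modality, against one text-query embedding."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]),
                    reverse=True)
    return ranked[:k]
```

The point of the sketch is structural: there is one `index` dict and one `search` path, where the old stack needed captioning and ASR stages feeding separate text-only indexes.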
The first community experiments are already centered on search infrastructure rather than demos. A local prompting-tools test says integrating Gemini Embedding 2 to improve search was “as simple as asking OpenClaw,” while the LlamaParse post shows an “integrated solution” for parsing, embedding, and searching audio files, PDFs, PowerPoints, and videos in one knowledge base. Google's own documentation also ties the model to semantic search, classification, clustering, and RAG, which fits the launch narrative better than pure showcase content.
The storage-performance tradeoff is also more explicit than in many embedding launches. The launch thread and builder explanation both call out Matryoshka Representation Learning, where information is nested so developers can shrink vectors from 3,072 to 1,536 or 768 dimensions. That gives teams a concrete knob for index size, memory footprint, and retrieval quality without swapping to a different model family.
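Mechanically, the Matryoshka knob is simple: the leading coordinates of the full vector form a usable lower-dimensional embedding, so shrinking is a slice plus a re-normalization. The sketch below uses small toy dimensions in place of the real 3,072.

```python
# Matryoshka-style truncation sketch: keep the first `dim` coordinates of
# an MRL-trained embedding, then re-normalize so cosine similarity still
# behaves. Dimensions are shrunk here for illustration (real vectors would
# go e.g. 3072 -> 1536 or 768).
import math

def truncate(vec, dim):
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1]   # stand-in for a 3,072-dim embedding
small = truncate(full, 4)               # stand-in for a 768-dim slice
```

Because truncation happens client-side, teams can store the full vectors once and experiment with index size offline, rather than re-embedding the corpus per dimension setting.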
Google now lets Gemini chain built-in tools like Search, Maps, File Search, and URL Context with custom functions inside a single API call. This removes orchestration glue for agent builders and brings Maps grounding into AI Studio for faster prototyping.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
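The core trick behind an Instant Grep-style index can be sketched with the inverted-index half alone: record which files contain each trigram, then intersect posting lists so only candidate files need a real regex scan. This is an illustrative sketch, not Cursor's implementation, and it omits the Bloom-filter layer the announcement mentions.

```python
# Trigram inverted index for candidate filtering: a query literal can only
# match files that contain every one of its trigrams, so intersecting the
# posting lists prunes the search space before any regex runs.
from collections import defaultdict

def trigrams(s):
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_index(files):
    index = defaultdict(set)
    for name, text in files.items():
        for gram in trigrams(text):
            index[gram].add(name)
    return index

def candidates(index, literal):
    """Files containing every trigram of the query literal."""
    sets = [index.get(g, set()) for g in trigrams(literal)]
    return set.intersection(*sets) if sets else set()

files = {"a.py": "def handler(req):", "b.py": "class Parser:"}
idx = build_index(files)
```

On a large repo, the expensive per-file scan then runs only over `candidates(...)`, which is where the seconds-to-milliseconds gap comes from.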
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
Say hello to Gemini Embedding 2, our new SOTA multimodal model that lets you bring text, images, video, audio, and docs into the same embedding space! 👀
Parse, embed, and search your audio files - or PDFs, or PowerPoints, or videos - in one integrated solution. @GoogleDeepMind released Gemini Embedding 2, an all-in-one model that unifies the embedding space between text/images/audio/video. We built a tutorial that shows you how.
🚀 The team at @GoogleDeepMind just released Gemini Embedding 2, a frontier embedding model with 3072 dimensions and state-of-the-art semantic quality. 👩‍💻 We built a demo showing how to integrate it across the LlamaIndex ecosystem, from LlamaParse to LlamaAgents:
Incorporating Gemini Embedding 2 to improve search on my local prompting tools, as simple as asking OpenClaw (with Gemini 3.1 Pro):
Gemini Embedding 2 is out. It’s a natively multimodal embedding model that maps text, images, video, audio and documents into a single embedding space: - text, up to 8192 input tokens - images, up to 6 images per request - videos, up to 120 seconds - audio, natively ingests and
What if one embedding model could understand text, images, video, audio, and PDFs all at once? Excited to share Gemini Embedding 2, our first fully multimodal embedding model. 🖼️ 5 modalities in a single unified embedding space 🌍 Supports up to 8,192 input tokens, 100+ languages