Google launched Gemini Embedding 2 in preview, unifying multiple modalities and 100+ languages in one embedding space with flexible output dimensions. Try it to simplify cross-modal RAG and search pipelines, but compare it with late-interaction systems before committing.

Google positioned Gemini Embedding 2 as a single embedding model for text, images, video, audio, and PDFs; per the launch post, it is exposed in preview through the Gemini API and Vertex AI. Weaviate's integration post adds the implementation details engineers will care about: support for 100+ languages, an 8,192-token maximum input, and configurable output sizes from 128 to 3,072 dimensions.
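Configurable output sizes on a single model typically imply Matryoshka-style embeddings, where a full-width vector can be cut down to a smaller prefix and renormalized without re-embedding. A minimal sketch of that truncate-and-renormalize step, assuming the prefix of the vector carries the coarse semantics (the function name and toy vector are illustrative, not from the API):

```python
import math

def truncate_embedding(vec, dim):
    """Truncate a Matryoshka-style embedding to `dim` dimensions,
    then L2-renormalize so cosine similarity stays meaningful."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy example: shrink an 8-dim vector to 4 dims.
full = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
small = truncate_embedding(full, 4)  # unit-length 4-dim vector
```

Storing at 3,072 dimensions and truncating at query time lets one index serve several precision/cost points, which is the practical payoff of the 128-to-3,072 range.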
The ecosystem angle matters because this is not just a model card drop. Weaviate said the model already works with its existing Google integration, and the attached configuration shows multi2vec_google_gemini targeting gemini-embedding-2-preview for multimodal collections. That makes the launch immediately relevant to teams already running vector search infrastructure rather than a future-only capability.
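As a rough illustration of what such a collection definition might contain, here is a hypothetical config dict. Only the module name multi2vec_google_gemini and the model id gemini-embedding-2-preview come from the post; every other key and field name below is an assumption, not Weaviate's verified client API:

```python
# Hypothetical sketch of a multimodal collection config; field names are
# illustrative. Only "multi2vec_google_gemini" and
# "gemini-embedding-2-preview" are taken from the integration post.
collection_config = {
    "name": "MediaAssets",
    "vectorizer": "multi2vec_google_gemini",
    "vectorizer_options": {
        "model": "gemini-embedding-2-preview",
        "text_fields": ["title", "caption"],
        "image_fields": ["poster"],
        "video_fields": ["clip"],
    },
}
```

The point of the shape is that one vectorizer covers fields of several modalities, so the collection needs no per-modality embedding plumbing.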
The main engineering claim is pipeline reduction. Google's demo post frames the model as search "across all your media at once," which means cross-modal lookup without separate text, image, audio, and video embedding stages. In practice, that should simplify multimodal RAG and recommendation systems that need to retrieve a concept from one format and return matches from another, a use case also echoed in the OpenClaw note about semantically storing images, videos, audio, and docs for agents.
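The mechanics of "search across all your media at once" reduce to ranking items of any modality by similarity to a query vector in the shared space. A self-contained toy sketch with hand-made vectors (the index contents and query vector are synthetic stand-ins for real model outputs):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy unified space: items of different modalities share one vector space.
# In a real pipeline these vectors would come from the embedding model.
index = {
    ("image", "sunset.jpg"): [0.9, 0.1, 0.0],
    ("audio", "waves.mp3"):  [0.8, 0.2, 0.1],
    ("pdf",   "report.pdf"): [0.0, 0.1, 0.9],
}

def search(query_vec, k=2):
    """Rank every item, regardless of modality, against one query vector."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [key for key, _ in ranked[:k]]

# A (toy) text-query vector retrieves the image and audio items first.
hits = search([1.0, 0.0, 0.0])
```

The single ranking loop is the pipeline reduction: no per-modality encoders, no score fusion across separate indexes.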
That does not eliminate the retrieval design tradeoff. In the practitioner thread, Jo Kristian Bergum argues there is still "no single silver bullet" for agent retrieval and says embeddings matter because much of the context engineers want to feed agents "isn't represented in text." His examples—meeting notes, audio, images, and PDF-page images—line up with the exact artifact mix Gemini Embedding 2 targets. The likely near-term use is not replacing every retrieval stack, but expanding what can enter the same retrievable context layer with fewer modality-specific workarounds.
Google now lets Gemini chain built-in tools like Search, Maps, File Search, and URL Context with custom functions inside a single API call. This removes orchestration glue for agent builders and brings Maps grounding into AI Studio for faster prototyping.
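Mixing built-in tools with custom functions amounts to listing both in the `tools` array of one request. A hedged sketch of what such a request body could look like: the camelCase tool names follow the Gemini REST convention but should be checked against the API reference, and the `book_table` function is entirely hypothetical:

```python
# Hedged sketch of a single request body combining built-in tools with a
# custom function declaration. Tool key names are assumptions based on the
# Gemini REST camelCase style; verify against the official API reference.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Find cafes near my office"}]}
    ],
    "tools": [
        {"googleSearch": {}},   # built-in Search grounding
        {"googleMaps": {}},     # built-in Maps grounding (assumed key name)
        {"urlContext": {}},     # built-in URL Context
        {"functionDeclarations": [{      # custom function alongside built-ins
            "name": "book_table",        # hypothetical app-side function
            "description": "Reserve a table at a cafe",
            "parameters": {
                "type": "object",
                "properties": {"place_id": {"type": "string"}},
                "required": ["place_id"],
            },
        }]},
    ],
}
```

The orchestration win is that the model can ground on Search or Maps and then call the app's own function within one round trip, instead of the caller stitching those steps together.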
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
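The core trick behind n-gram indexes for regex search is using an inverted index over trigrams to shrink the candidate file set before the regex engine runs. A toy sketch of that idea, not Cursor's implementation (class and method names are illustrative; a Bloom filter could replace the posting sets to cut memory):

```python
import re
from collections import defaultdict

def trigrams(s):
    """Set of all 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    """Toy n-gram + inverted-index prefilter: only files whose trigram set
    covers a required literal from the query are handed to the regex engine."""
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> file names
        self.files = {}                   # file name -> contents

    def add(self, name, text):
        self.files[name] = text
        for g in trigrams(text):
            self.postings[g].add(name)

    def search(self, literal, pattern):
        # Candidates: intersection of posting lists for the literal's trigrams.
        grams = trigrams(literal)
        if not grams:
            candidates = set(self.files)  # literal too short to prefilter
        else:
            candidates = set.intersection(*(self.postings[g] for g in grams))
        rx = re.compile(pattern)
        return sorted(n for n in candidates if rx.search(self.files[n]))

idx = TrigramIndex()
idx.add("a.py", "def embed_text(x): ...")
idx.add("b.py", "def embed_image(x): ...")
idx.add("c.py", "print('hello')")
matches = idx.search("embed_", r"embed_\w+")  # regex runs on 2 files, not 3
```

The speedup comes from the prefilter: the expensive regex scan touches only files that could possibly match, which is why large-repo latency collapses.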
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
Many of you know me as the bm25 guy but I'm afraid there is no single silver bullet for solving retrieval for agents. You also need to make the data retrievable in the first place. Yes, agents are great at formulating queries and they are relentless so classic issues like …
🤖 From this week's issue: Google launches Gemini Embedding 2, its first natively multimodal embedding model unifying text, images, video, audio, and documents into a single semantic space. blog.google/innovation-and…
The era of juggling 5 different embedding models is over. Google just unified text, images, video, audio, and PDFs into one vector space. One model, multiple modalities: text, images, video, audio, and PDFs all mapped into a single unified vector …
try it out here: aistudio.google.com/apps/bundled/m…