Gemini Embedding 2 previews 120s video, 8192 tokens – unified multimodal RAG


Executive Summary

Google previewed Gemini Embedding 2 as its first fully multimodal embedding model: one vector space spans text, images, video, audio, and PDFs, available via the Gemini API and Vertex AI. Stated limits include 120s MP4/MOV video, 8192 text tokens, 6 images, and 6-page PDFs, with Matryoshka output sizes (3072/1536/768) for storage-vs-recall tradeoffs. Google’s own table cites MTEB multilingual 69.9 vs 68.4 (prior Google) and TextCaps recall@1 of 89.6/97.4 (text→image / image→text); competitive rows against Nova/Voyage show mixed doc/video metrics, and the threads include no third-party cost or SLA details.

Workspace with Gemini: Gemini lands deeper in Docs/Sheets/Slides/Drive for G1 Pro/Ultra; AI Overviews, editable AI-made slides, and explicit grounding sources move “ideation → deck” into core surfaces.
Hugging Face Storage Buckets: mutable, S3-like Hub storage for checkpoints/logs; positioned as the first new repo type in 4 years; backed by Xet dedup for chunked reuse.
Adobe web stack: Photoshop web ships AI Assistant + AI Markup (beta) to generate real layer stacks from chat; Firefly Boards adds Topaz Astra upscaling to 1080p/4K.

Net effect: multimodal retrieval, doc-grounded creation, and artifact storage are being productized end-to-end; pricing, reproducible evals, and long-horizon reliability remain the missing pieces.




Feature Spotlight

Gemini expands into creator workflows: fully multimodal embeddings + AI-native Docs/Drive

Gemini now connects the whole creative stack: Embedding 2 puts text+image+video+audio+docs in one vector space, while Gemini inside Docs/Drive makes AI-native writing and slide creation a default workflow.




🧠 Gemini expands into creator workflows: fully multimodal embeddings + AI-native Docs/Drive

High-volume Google updates today: Gemini Embedding 2 (unified embeddings across text/image/video/audio/docs) plus a new Gemini-powered Google Docs/Sheets/Slides/Drive experience. This is the day’s main “platform shift” story for creatives building multimodal search, RAG, and doc-based ideation.

Gemini Embedding 2 ships in preview with fully multimodal embeddings and adjustable vector sizes

Gemini Embedding 2 (Google): Google previewed Gemini Embedding 2 as its first fully multimodal embedding model—mapping text, images, video, audio, and documents into one embedding space—announced in the launch post and echoed by the build callout as available via Gemini API and Vertex AI.

What it unlocks for creatives: Cross-media retrieval without format “bridges” (no forced ASR, no forced captioning) shows up as a core promise in the capability breakdown, including interleaved inputs (text+image together), direct audio embedding, and multimodal RAG patterns.
Hard limits & knobs called out in threads: The capability breakdown lists up to 120s MP4/MOV video inputs, up to 6 images per request, up to 8192 tokens of text, and up to 6-page PDFs, plus Matryoshka Representation Learning output sizes (3072/1536/768) for storage/speed tradeoffs (a minimal truncation sketch follows this list).
Benchmarks people are citing today: The comparison table in the launch post frames Gemini Embedding 2 as ahead of Google’s legacy text and multimodal embedding models on multiple axes (e.g., MTEB multilingual 69.9 vs 68.4, TextCaps recall@1 89.6/97.4 for text-image/image-text), while also showing competitive context against Amazon Nova and Voyage on some doc/video metrics.
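
Matryoshka-style embeddings front-load information into the leading dimensions, which is what makes the 3072/1536/768 sizes a simple truncation choice rather than separate models. A minimal sketch of the mechanics, assuming you already hold full-size vectors as NumPy arrays (names and data here are illustrative, not from the Gemini SDK):

```python
import numpy as np

def truncate_matryoshka(vecs: np.ndarray, dim: int) -> np.ndarray:
    """Keep the leading `dim` components of MRL embeddings and re-normalize,
    so cosine similarity stays meaningful at the smaller size."""
    cut = vecs[:, :dim]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)

# Illustrative stand-in for a corpus of full-size (3072-dim) unit vectors.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 3072)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

small = truncate_matryoshka(corpus, 768)    # 4x less storage per vector
query = truncate_matryoshka(corpus[:1], 768)
scores = small @ query.T                    # cosine scores on unit vectors
print(small.nbytes / corpus.nbytes)         # -> 0.25
```

Random vectors only show the mechanics; the recall you give up at 768 vs 3072 dimensions is exactly what the adjustable sizes let you measure on your own corpus before committing to an index.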

Pricing specifics and production SLAs aren’t in these tweets; what’s concrete today is the preview availability plus the input limits and the “single space” multimodal promise.

Gemini lands inside Google Docs/Drive with AI Overviews, editable AI slides, and new grounding sources

Google Workspace with Gemini (Google): Google rolled out a new Gemini-powered experience across Docs, Sheets, Slides, and Drive—featuring AI Overviews, “fully editable AI-made slides,” and expanded grounding sources—with availability framed as “today” for G1 Pro and Ultra users in the product walkthrough.

Docs, Slides and Drive demo

Docs/Drive as a creative workbench: The repost demo emphasizes using Gemini to search across Drive, edit writing in Docs, and generate Slides from prompts—positioning Workspace as the place where scripts, treatments, and reference docs get summarized and turned into presentable materials.
Grounding as a credibility knob: The product walkthrough calls out grounding sources explicitly, which matters for creators doing client-facing decks or research-heavy treatments where “where did that come from?” needs an answer.

The tweets don’t specify whether this is consumer, Business, or EDU-wide yet beyond the G1 Pro/Ultra gating; the visible change is that Gemini is now embedded directly in the core Workspace surfaces rather than living in a separate chat tab.


🎬 Video generation in practice: lip-sync ads, POV formats, and Kling 3.0 craft tests

Today’s creator video feed clusters around practical “make it shippable” video: Freepik’s lip-sync workflow for talking ads plus multiple Kling 3.0 genre/camera tests (horror, sci‑fi, boxing) and POV-video pipelines. Excludes Gemini news (covered in the feature).

Freepik launches Speak for lip-synced talking videos from a still + script/audio

Speak (Freepik): Freepik shipped Speak, a lip-sync tool where you upload a visual and provide either your own audio or a script, then get a talking video output; Freepik highlights custom voices in 30+ languages and up to 5 minutes per clip in the launch announcement.

Speak tool demo

Where it sits in the product: Freepik points users to “Video > Tools > Speak” in the launch announcement, with the same navigation repeated in the menu path reminder.
Early creator framing: the first reactions pitch it as a fast way to generate UGC-style talking clips and localized ads from a single image, as shown in the creator recap.

Freepik POV pipeline: still-by-still continuity with Nano Banana 2 + Kling 3.0

POV videos (Freepik Spaces): A creator walkthrough shows a POV-style pipeline using Nano Banana 2 for sequential stills (edited one-by-one to keep the story coherent) and Kling 3.0 to animate via start/end frames, explicitly skipping grid generation to stay in control, as shown in the POV workflow thread.

POV workflow screen recording

Continuity trick: the loop is “generate next still from the last still” until the sequence reads, then convert each beat into motion with start/end frame prompts, per the POV workflow thread.

Freepik Speak workflow: turn one product or model photo into multilingual talking ads

Image-to-talking ad (Freepik): A cluster of Freepik posts shows the same practical pattern—start from a single still (product shot or model photo), add copy (or a script), and output a localized talking ad without re-filming, per the localized ad example and the “product photos present themselves” post.

Talking product photo demo

UGC variant: Freepik frames this as “pick any image…write the words…and the tool handles the rest,” in the UGC clip example.
Campaign scaling angle: the same posts position language swapping as the main lever—“in any language your customers speak,” as stated in the “product photos present themselves” post.

Kling 3.0 horror micro-scene: under-the-bed reveal as a jump-scare template

Kling 3.0 (Kling AI): Another short-form horror test leans on a single “reveal” beat—camera in a bedroom, then a creature emerges from under the bed—framed as a “dose of horror with Kling 3.0,” per the horror post.

Under-bed horror reveal

Kling 3.0 prompt recipe: underground boxing with tight circling camera and sweat hits

Kling 3.0 (Kling AI): A clean, copy-paste prompt is getting shared for close-quarters fight cinematography—“Underground boxing ring…camera moving tightly around the fighters capturing the impact,” as written in the boxing prompt post.

Boxing scene output

Prompt (verbatim from the post): “Underground boxing ring surrounded by shouting crowd, two fighters circling each other under harsh overhead light, sweat flying as punches land, camera moving tightly around the fighters capturing the impact.”

Kling 3.0 spaceship shots: creators keep spotlighting sci-fi vehicle fidelity

Kling 3.0 (Kling AI): Creators are repeatedly using spaceship/space-travel clips as a “quality tell” for Kling 3.0; one example calls out that “spaceships…are on another level,” paired with a fast cinematic ship pass-by in the spaceship showcase.

Spaceship flyby clip

DreamLabLA posts “Map Adventure,” a short narrative made with Luma

Map Adventure (DreamLabLA × Luma): A short narrative scene (“two friends follow a mysterious map…”) gets posted as an example of AI-assisted storytelling and pacing, with credits and the tool attribution called out in the mini film post.

Map Adventure short

Freepik Spaces: talk-video workflows are usable now; Speak is “soon”

Spaces (Freepik): Freepik says the broader talk-video workflows are already usable inside Spaces, and that Speak will be added soon; in the meantime they’re nudging creators to start wiring audio nodes now, as described in the Spaces availability note and linked via the Spaces workflow page.

Luma shares a brand film featuring its DreamLab creative research team

Luma (Luma Labs AI): Luma posted a polished brand piece—“Luma, for those that take play serious”—that spotlights its creative research team at DreamLab and the hands-on production process, as shown in the brand film post.

Luma DreamLab montage

Audio-to-video hype clip resurfaces as creators push “sound drives scenes”

Audio-to-video: A short montage post reiterates the claim that audio-conditioned generation is getting stronger—“Never underestimate AI audio-to-video”—without naming a specific model, as shown in the audio-to-video clip.

Audio-to-video montage

🧩 End-to-end creator pipelines: agents, nodes→apps, and ‘no camera’ production recipes

The most actionable content today is multi-step pipelines: real-estate video built from listing photos, agentic campaign creation (Hedra), node workflows packaged as apps (ComfyUI App Mode), and creative automation bridging tools like Blender/OpenClaw. Excludes Gemini updates (feature).

A $15 “no camera” real-estate renovation video workflow built from listing photos

Calico AI (workflow): A real-estate marketing pipeline turns Zillow/Redfin listing photos into a cinematic “renovation transformation” video—claimed at ~$15 in tool costs—by generating consistent renovated stills, animating room-to-room transitions, and finishing with a realtor end-card + custom music, as laid out in the step list from Workflow breakdown.

Renovation workflow demo

Positioning math: The post frames a resale wedge where agents typically pay ~$100–$500 per property video, but this approach reuses existing listing photos to produce scroll-stopping video output, per the same Workflow breakdown.
Full walkthrough: The creator links a complete step-by-step tutorial in Full workflow link, pointing to the Full tutorial video.

ComfyUI ships App Mode to package node graphs as a clean app UI

ComfyUI (App Mode): A new “App Mode” UI is being shown as a way to turn complex node graphs into a simplified, shareable app-like interface—positioned as reducing friction for non-technical creatives and making advanced workflows easier to hand off, per the demo explanation in App Mode explanation.

Nodes to app demo

RTX packaging signal: NVIDIA’s callout frames this as part of “Simplified App View” on RTX AI PCs, as listed in RTX AI PC note.

Hedra Agent is being used as a reference-driven fashion campaign runner

Hedra Agent (Hedra Labs): Hedra is pitching Hedra Agent as “unified intelligence for visual understanding and creation,” as introduced in the product announcement RT at Agent intro, and creators are already using it to spin up full fashion/product campaigns from references.

Fashion campaign from references

Reference setup: One shared workflow uses separate references for location, the character, and the outfit/product, then has the agent generate a cohesive campaign set—explicitly calling out Kling 3.0 + Nano Banana Pro in the stack, as shown in Workflow claim.
Agent interaction model: The thread describes a loop where the agent proposes directions after the first prompt and can run “on auto,” with natural-language direction and iteration described in Auto vs approve loop and Creative partner framing.

STAGES AI Connect shows an OpenClaw-to-Blender “one shot” generation loop

STAGES AI Connect (pipeline): A shared clip shows STAGES AI Connect routing into OpenClaw and Blender to generate a planet asset with particle dispersion plus animation and lighting—framed as a “one shot textures and full prompt” loop that could generalize across creative apps, according to Connect to Blender demo.

Blender planet automation

A Claude-based workflow for Runway Characters knowledge bases (KB .txt upload)

Runway Characters (workflow): A practical recipe shows how to build a Runway Characters knowledge base using Claude—iterate in Claude Cowork, feed an initial prompt + screenshot, upload company materials, then copy the generated “Description” + “Starting Script” and upload the KB .txt in a Runway Dev character, as described in Step-by-step KB build.

KB build walkthrough

Meshy lands in MakerWorld MakerLab for image-to-3D printing workflows

Meshy × MakerWorld (MakerLab Image-to-3D): Meshy says it’s now integrated into MakerWorld’s MakerLab “Image-to-3D,” aiming to turn images into “high-quality, print-ready 3D models” directly inside the platform, as announced in MakerWorld integration.

MakerLab Image-to-3D demo

The integration entry point is linked in Try it link, via the MakerLab page.

TapNow pitches a node-based end-to-end creation engine for short films and ads

TapNow (platform): TapNow is being promoted as a node-based system that covers character creation → key shots → animation in a single environment, with the pitch that it can still produce short films with current models (for example Kling) while Seedance 2 is “coming soon,” per Platform pitch and the accompanying example clip in Pipeline example.

Node-based creation demo

The product page is linked in Pipeline example, pointing to the TapNow web app.

Redfin listings surface an AI “Redesign” button for instant restyling

Redfin (listing redesign feature): Redfin is showing a “redesign” button on listings that rapidly restyles a room into different decor aesthetics—framed as useful for unstaged homes or renovation visualization, per the product demo in Redesign button demo.

Listing redesign demo

🧪 Copy‑paste prompts & aesthetics: SREFs, macro templates, and production-ready recipes

Strong prompt-and-style day: multiple Midjourney SREF drops (manga dark fantasy, Euro retro‑futurism, engraving look) plus reusable prompt templates (macro close-up, fruit-as-ski-resort). Excludes tool/UI tutorials (handled elsewhere).

Midjourney SREF 2509259129 for European retro-futurist sci‑fi bande dessinée

Midjourney (SREF): The style reference --sref 2509259129 is positioned as a concise route to European retro‑futuristic sci‑fi illustration—explicitly name-checking Giraud/Moebius, Enki Bilal, Druillet, plus industrial-design echoes of Syd Mead—per the notes in Euro retro-futurism SREF.

This is useful when you need “contemplative sci‑fi” establishing shots (ships, desert megastructures, city-mass silhouettes) with comic‑line detail rather than modern concept-art polish, as shown in the set attached to Euro retro-futurism SREF.

Midjourney SREF 3438423518 for dark-fantasy seinen manga key art

Midjourney (SREF): A new style reference, --sref 3438423518, is being shared as a reliable way to get 80s–90s seinen manga illustration energy—heavy cross‑hatching, dark-fantasy/post‑apoc vibes, and “Miura / Hokuto no Ken” adjacency—per the style breakdown in SREF 3438423518 description.

The practical use is fast lookdev for posters, covers, and character sheets when you want “inked intensity” without over-polished digital shading, as shown in the examples attached to SREF 3438423518 description.

Copy-paste prompt: brand-coded “liquid glass outfit” fashion editorial

Recraft V4 + Nano Banana 2 (Prompt recipe): A “glass edition” fashion prompt is being shared for turning any brand look into a glossy liquid‑glass wardrobe—brand cues + logo, brand color gradients, strong specular edge highlights, “fashion magazine photography”—with the full recipe written out in Glass edition examples and repeated verbatim in Copy-paste prompt.

The prompt’s most reusable bits are the material rules (“smooth liquid glass,” “raised glossy contour seams,” “matte skin contrast”) which help keep outputs coherent across brands and palettes, as shown in Glass edition examples.

Copy-paste prompt: giant fruit as a miniature alpine ski resort (luxury product photo)

Nano Banana 2 (Prompt recipe): A fully copy‑paste prompt spec frames a “luxury architecture magazine cover” shot where a giant [FRUIT] becomes a miniature ski resort—“Sony A7III + 85mm f/1.4 at f/2.8,” soft daylight, realistic surface texture, tiny skiers, chalets, lifts—per Daily prompt challenge and the full text in Prompt block.

The key constraint is “Change ONLY the fruit,” which turns this into a repeatable series format (same composition, different hero object) as described in Daily prompt challenge.
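
Because everything except the hero object is pinned, the recipe maps directly onto a string template. A minimal sketch (the prompt body below is a shortened stand-in for the full text in Prompt block, and the fruit list is arbitrary):

```python
# Shortened stand-in for the full prompt; only the fruit varies, per the recipe.
TEMPLATE = (
    "Luxury architecture magazine cover: a giant {fruit} as a miniature alpine "
    "ski resort, tiny skiers, chalets and lifts on its surface, realistic "
    "surface texture, Sony A7III, 85mm f/1.4 shot at f/2.8, soft daylight. "
    "Change ONLY the fruit."
)

for fruit in ["strawberry", "pineapple", "avocado"]:
    print(TEMPLATE.format(fruit=fruit))  # send each to your image model of choice
```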

Extreme Macro Close Up prompt template for soothing, high-detail product textures

Prompt template (Macro stills): A reusable “Extreme Macro Close Up” prompt is circulating as a dependable recipe for emotionally calm, high-detail closeups—explicitly calling for a [COLOR1]/[COLOR2]/[COLOR3] triad, dew drops, matte textures, deep shadows, and very shallow DoF—per Macro prompt template.

Realism check: One creator shares an A/B of a real orchid macro versus a Nano Banana Pro recreation in Orchid macro compare, reinforcing why the prompt leans so hard on micro-texture and specular control.

The template is already copy‑pasteable as written in Macro prompt template, which makes it easy to standardize a “macro house style” across subjects (food, botanicals, materials).

Midjourney SREF 3912530269 for classical engraving sculpture studies

Midjourney (SREF): The style reference --sref 3912530269 is being shared for “age of heroes” classical vibes—black‑and‑white engraving texture and sculpture-bust study framing—per From the age of heroes.

It’s a clean shortcut for mythic character portraits, fake museum plates, or storybook interstitials where the surface texture (cross‑hatch + print grain) does the mood work, as shown in From the age of heroes.

Promptsref keeps spotlighting “retro dreamy soft-focus” as a top Midjourney SREF

Promptsref (Midjourney SREF trends): Following up on top SREF analysis (soft-focus halation aesthetic), Promptsref’s March 9 snapshot still lists --sref 5184362986 as the top style and reiterates the “Retro Dreamy Soft‑Focus” framing—Wong Kar‑wai‑adjacent diffusion/halation, god rays, film grain—per Daily SREF ranking.

The post also spells out concrete usage targets (fashion/beauty, album covers, narrative illustration) in the long style breakdown inside Daily SREF ranking, with the style library linked via PromptSref site.


🖼️ Image generation for production: redesign buttons, consistency checks, and repeatable formats

Image-side posts emphasize practical scene iteration: AI interior redesign on real listings, repeatable puzzle formats in Firefly/Nano Banana 2, and “same scene, different model” edit comparisons. Excludes prompt-dump content (handled in Prompts & Style Drops).

Redfin listings get a one-click “Redesign” restyle tool

Redfin “Redesign” (Redfin): Redfin listings are now showing a “Redesign” button that swaps a room photo into different decor styles for quick staging/reno visualization, as demonstrated in the Redesign button demo. This matters for creatives doing look-dev and pitch decks because it turns real spaces into fast moodboard variants without a manual paintover pass.

Room redesign styles demo

The post frames it as especially useful for homes that aren’t renovated or staged yet, which maps cleanly to pre-vis workflows where you need “good enough” environment options before committing to a build, per the Redesign button demo.

Firefly + Nano Banana 2 Hidden Objects hits Levels .062–.064 and adds watermarking

Hidden Objects puzzles (Adobe Firefly + Nano Banana 2): Following up on Puzzle format (repeatable 5-item scenes), the series moved to Level .062, .063, and .064 with fresh scenes and the same “find 5 objects” footer UI, as shown in Level .062 scene and Level .063 scene. A new production wrinkle is creator-side watermarking—prompt and layout tweaks aimed at credit persistence after uncredited reuse concerns, per the Watermark plan.

The format is increasingly behaving like a reusable image product (new level drop, consistent rules, consistent UI), rather than a one-off artwork.

Krea Edit is being used for rapid multi-model look comparisons from one frame

Krea Edit (Krea): A “one frame → many outputs” comparison shows Krea’s edit flow being used to push the same source image through different model interpretations, producing noticeably different car designs and lighting/texture “feel,” per the Multi-model comparison grid. The creator claim is that the newer Krea Edit experience is smoother for scaling while staying lean, as stated in the Multi-model comparison grid.

For production teams, this is a practical way to A/B “which model world are we in?” before you commit to a longer batch run or a video pass.

A simple 2D vs 3D A/B helps pick a lookdev direction early

2D vs 3D A/B (lookdev practice): A side-by-side of the same character concept rendered in a 3D/figure-like style vs a flatter illustrated style is being used as a fast “which pipeline do we want?” decision artifact, as shown in the 2D vs 3D comparison. It’s a lightweight way to force an early choice on material language (render realism vs graphic read) before doing shot coverage.

Train-window “Window Seat” scenes are a compact way to pitch place and atmosphere

Nano Banana 2 “Window Seat” (place realism): A creator riffed on Google’s “Window Seat” positioning—“understands light, place and atmosphere”—by generating five train-window worlds (Greenland, Brazil, Niger, Antarctica, Congo Basin) using Nano Banana 2 inside Leonardo, per the Window Seat setup and the location-specific posts in Greenland window, Amazon window, and Antarctica window.

This reads like a repeatable pre-vis format: consistent framing (the window), variable setting (location), and short copy that sells mood fast—useful when you need establishing shots without building full environments.

Nano Banana Pro macro recreation is being used as a realism check against real photos

Nano Banana Pro (macro realism check): A creator compared a Nano Banana Pro-generated macro recreation against their own original macro photo (orchid center) to test texture/detail fidelity and “photographic truthiness,” per the Side-by-side macro test.

This kind of side-by-side is a practical QA move for product and nature imagery workflows where the goal is not a new style, but matching real optics (depth, micro-texture, highlights) closely enough to pass.


🎞️ Finishing & upscaling: Firefly Boards gets a 4K upscaler

Post-production news is narrow but meaningful: Firefly Boards adds Topaz Astra upscaling for 1080p/4K, signaling tighter “finish inside the AI board” workflows. Excludes generation and prompt content.

Firefly Boards adds Topaz Astra video upscaling to 1080p and 4K

Adobe Firefly Boards (Adobe): Firefly Boards now includes Topaz Astra as an in-product video upscaler—explicitly offering upscale targets of 1080p or 4K as shown in the “What’s new” panel shared by Feature note.

This matters as a “finish inside the board” step: instead of treating upscaling as a separate offline pass, the upscaler sits alongside other Boards updates in the same UI surface, per the Feature note.


🛠️ Single-tool upgrades creators can use today (Photoshop web, markups, and assistants)

Adobe’s web tooling shows the clearest “do more inside the editor” trend: Photoshop AI Assistant beta that creates layers/effects from chat, plus AI Markup for targeted edits via arrows/circles. Excludes Freepik lip-sync and Gemini platform updates.

Photoshop web AI Assistant beta generates layered edits from a single prompt

Photoshop on the web (Adobe): Adobe shipped AI Assistant (beta) in the web editor, letting you describe an edit in chat and have Photoshop create the actual layer stack and effects automatically, as shown in an “add a glitch effect and grade it like 90s nostalgia” example in Assistant demo.

AI Assistant builds layers

The early creator workflow framing is “watch it happen, then tweak”—i.e., letting the assistant do the tedious setup while you keep taste-level control, per Assistant demo.

The demo also calls out Gemini Flash being used for color grading in this flow, according to Assistant demo.

Photoshop web AI Markup beta lets you point at regions with drawings

AI Markup (Adobe Photoshop web): The web Photoshop AI Assistant now supports AI Markup (beta)—you draw arrows/circles/etc. to specify where the instruction applies, then prompt normally, as described in AI Markup screenshot.

This effectively turns “selection + prompt” into “annotation + prompt,” aiming to reduce back-and-forth masking for composites and localized scene edits, per the feature explanation in AI Markup screenshot.

Photoshop web to Firefly Boards sync turns edits into a connected workflow

Photoshop web → Firefly Boards (Adobe): Adobe is positioning the new web AI Assistant (beta) as part of a connected pipeline—create/edit in Photoshop web, then have it sync into Firefly Boards for continuation, as shown in Photoshop to Boards flow.

Sync into Firefly Boards

The “all Adobe tools, one ecosystem” pitch is explicit in Photoshop to Boards flow, including the note that Firefly Boards is powered by Nano Banana 2 in this workflow.


🧊 3D assets & printing: image→3D in MakerWorld and GDC-ready pipelines

3D content today is about “turn visuals into objects”: Meshy’s integration into MakerWorld/MakerLab for print-ready models, plus GDC show-and-tell of AI-generated prints. Excludes general 2D image art drops.

Meshy integration brings image-to-3D straight into MakerWorld’s MakerLab

Meshy × MakerWorld (Bambu Lab): Meshy says its image→3D pipeline is now embedded inside MakerWorld’s MakerLab “Image-to-3D,” positioning it as a one-stop path from reference image to print-ready model inside the MakerWorld ecosystem, as announced in the MakerWorld launch post and shown in the in-product walkthrough.

MakerLab image-to-3D demo

Where it lives: the integration is presented as a native MakerLab feature on MakerWorld, with the entry point linked in the Try it out link via the MakerLab page.

The open question from these tweets is what file/export and repair steps (watertightness, supports) are required beyond the demo.

Meshy previews an AI-native game at GDC 2026 and schedules a CEO talk

Meshy (GDC 2026): Meshy is promoting its “first-ever AI-native game” as a playable booth demo, and it also published a talk slot for its CEO Ethan Hu titled “AI + Games: More Creativity in Production, Deeper Fun in Gameplay,” scheduled for March 12 at 10:10 AM in Room 2024 (West Hall), as listed in the GDC countdown post.

This reads as a bid to frame Meshy as both an asset-generation tool and a gameplay-facing system, but the tweets don’t include details on what makes the game “AI-native” (runtime vs production-only).

Meshy’s GDC booth spotlights AI-generated prints and a 3D printer giveaway

Meshy (GDC 2026): Meshy shared a “final sneak peek” of its booth featuring a “collection of 3D-printed models” it says were generated with Meshy, and it teased a giveaway that includes a new Elegoo 3D printer for some visitors, per the booth teaser.

The post is light on workflow specifics (materials, print profiles, or post-processing), but it’s a clear signal they’re optimizing for physical output, not just renders.

Mech unit design sheets as a blueprint for printable collectibles

0xInk (mecha design → physical objects): A “start making toys” post pairs a finished mech render with a detailed unit design spec sheet (orthographic views, measurements, component tables), framing it as a path from concept art to something manufacturable/printable, as shown in the mech render post and the more blueprint-forward repost image in unit design sheet.

This highlights a practical bridge for creators: producing both the hero render and a “buildable” reference pack that can guide modeling, kitbashing, or 3D print iteration.

Autodesk Flow Studio runs a Wonder 3D workshop during GDC week

Flow Studio (Autodesk): An in-person “Flow Studio Creators Event” at the Autodesk Gallery is being promoted as GDC-week programming, including a live, hands-on Wonder 3D workshop and networking, with “no GDC badge required,” per the event invite.

Wonder 3D event invite

The tweet positions Wonder 3D as a laptop-friendly, creator-facing workflow during GDC week, but it doesn’t include model/export specifics in the post itself.


📈 Market signals: consumer AI divergence + big bets on agents and labs

Industry coverage today is dominated by adoption and positioning: a16z’s consumer AI cycle readout (ChatGPT vs Gemini vs Claude), plus notable capital/M&A signals around agent-native social and new AI labs. Excludes technical model details (covered elsewhere).

a16z says the “default AI” race is on as ChatGPT, Gemini, and Claude diverge

Consumer AI cycle (a16z): Following up on Web ranks—a16z is now putting harder numbers behind “default AI,” saying ChatGPT is ~2.7× Gemini on web and ~2.5× on mobile, while ~20% of weekly ChatGPT users also use Gemini in the same week, as summarized in the cycle takeaways.

Cycle data and charts

Ecosystem split: The report frames ChatGPT as “for everyone” (including ads) versus Claude indexing toward dev tools/prosumer use, with only ~11% overlap between ChatGPT and Claude “apps,” as shown in the app overlap chart.

Creative tooling shift: a16z claims the “creative tools” cluster has moved from mostly image generators to video/music/voice taking the top slots, and that AI-enabled legacy tools (e.g., Notion) are now material—Notion is cited as reporting 50% of ARR from AI in the same cycle takeaways.

Agents + measurement drift: Manus and Genspark appearing in web ranks is treated as an “agents are here” signal, but the report also argues AI is leaving the browser (Cursor, Claude Code, etc.), which breaks web/app-only measurement—both points are called out in the cycle takeaways.

Meta acquires Moltbook, the social network built for AI agents

Meta × Moltbook: Meta is reported to have acquired Moltbook, described as a social platform where posts and discussions are done by AI agents, with founders Matt Schlicht and Ben Parr joining Meta’s Superintelligence Labs—price not disclosed per the acquisition note.

The acquisition reads as a distribution/feedback-channel bet on agent-to-agent interaction (and a potential sandbox for testing agent behaviors), with quasi-confirmation via reposts of the Polymarket screenshot and a plain-English restatement in the logo post.

Yann LeCun announces AMI Labs with a reported $1.03B seed round

AMI Labs (Advanced Machine Intelligence): A reshared post has Yann LeCun announcing a new startup, AMI Labs, with a claimed $1.03B (€890M) seed round in the startup announcement, alongside positioning that AMI is building AI systems with “persistent memory” and stronger world understanding, per the company positioning.

The round size and “world-model + memory” framing are the core signal here; the tweets don’t include investor details, product timelines, or a public technical artifact beyond the stated mission in startup announcement and company positioning.


🗣️ Voice for creators: Adobe speech generation + fast open-source TTS

Voice news centers on production-readiness: Adobe’s speech generation leveraging ElevenLabs voices, plus an open-source TTS drop framed around latency and emotion control. Excludes lip-sync video tooling (covered under Video).

Adobe adds Generate speech (beta) with ElevenLabs voices and timing controls

Generate speech (Adobe): Adobe surfaced a Generate speech (beta) feature that turns text into voice, advertising 70+ lifelike voices from Adobe and ElevenLabs plus controls for emotion, pacing, and emphasis, as shown in the in-product “What’s new” panel shared in Feature screenshot.

For creators, the practical shift is that Adobe is treating VO as a first-class asset alongside other Firefly-era video features (the panel sits next to items like “Generate soundtrack”), with the “Try now” callout implying it’s meant for production workflows rather than demo-only experimentation, per Feature screenshot.

Fish Audio S2 open-source TTS claims sub-150ms latency and one-pass multi-speaker

Fish Audio S2 (Fish Audio): A new S2 open-source TTS drop is being framed around sub-150ms latency, multi-speaker synthesis in one pass, and unusually granular emotion control, according to the feature list in Release claims.

A working test surface also appeared as a public Hugging Face Space—useful for quickly assessing timbre and control claims without setting up a local stack—as linked in Hugging Face repo note via the Hugging Face Space.
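
For poking at a public Space programmatically rather than through the web UI, gradio_client is usually enough; the Space ID, endpoint name, and argument order below are placeholders to be read off the Space’s “Use via API” panel, not the project’s documented interface:

```python
# pip install gradio_client
from gradio_client import Client

# HYPOTHETICAL Space ID and api_name; check the actual Space's
# "Use via API" panel before running. This is a sketch, not the real signature.
client = Client("fishaudio/fish-speech-s2")
result = client.predict(
    "A quick line to test timbre and pacing.",  # text to synthesize
    api_name="/tts",
)
print(result)  # Gradio TTS endpoints typically return a local audio filepath
```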

The thread language is already coupling “controllable emotions” with “voice cloning got dangerous,” which is a signal that misuse concerns may travel with the project’s adoption, as stated in Release claims.


📣 AI marketing creative: copy→design, visual metaphors that sell, and campaign automation

Marketing-side creator posts focus on shipping assets fast: AI turning long copy/docs into editable designs, character-based metaphor ads for health/wellness, and agent-made fashion campaigns. Excludes lip-sync tooling details (covered under Video).

Veeso is pitching “paste copy → editable designs” as a Canva alternative

VeesoAI (Veeso): Veeso is being framed as a copy/PDF-to-design system that turns raw text into polished layouts in “under a minute,” explicitly targeting the time sink of “45-minute Canva sessions,” as shown in the Product walkthrough video and reinforced by the “editable files, not flat PNGs” claim in Editable output note.

Auto-layout to editable designs

Document-to-layout automation: The core flow described in Product walkthrough video is “drop in copy/doc/PDF → auto-layout,” emphasizing hierarchy and structure rather than templates.
Editability as the differentiator: The pitch in Editable output note is that outputs stay editable for refinement/reuse, which is the main “production-ready” promise.

Product access is pointed to via the Product page, but the tweets don’t include independent file-format specifics (e.g., Figma/PSD/Slides export) beyond the “editable” framing.

A “scrapbook moodboard” Nano Banana prompt turns one brand into many assets

Nano Banana 2 prompt pattern: A “scrapbook” collage moodboard prompt is circulating as a repeatable way to generate high-precision brand visuals for decks, docs, presentations, and socials, with the “endless content engine” framing spelled out in Content engine claim. The full copy-paste prompt structure (hero product + torn-paper collage stack + stamps/tape + brand heritage copy) is shared in Full prompt text, with example outputs shown in Moodboard examples image.

Prompt ingredients that carry the look: The recipe in Full prompt text anchors on a bottom-centered hero product render, ragged torn-edge photo stack, and branded tape/stamps (“AUTHENTIC QUALITY // SUSTAINABLE CHOICE”) to create a consistent “editorial collage” system.
Why it’s useful for marketing production: The post in Content engine claim positions this as a single prompt that can be re-rolled across formats and channels while staying brand-relevant.

This is a prompt-level workflow, not a tool release; results will vary by model and the consistency of brand cues provided.

Hedra Agent is being used to turn references into a full fashion campaign set

Hedra Agent (Hedra Labs): A creator workflow frames Hedra Agent as a campaign builder: provide references and have the agent generate a cohesive set of fashion/product visuals, with the results shown in a campaign montage in Campaign workflow reel.

Fashion campaign montage

Reference packing: The setup described in Reference requirements calls for three distinct references—one for location, one for the character, and one for the outfit/product.
Agent-directed iteration: The thread claims you can run in an approve-first loop or let it proceed automatically, as described in Approve vs auto note, and keep steering via natural-language direction rather than step-by-step prompting per Direction vs prompting.

The posts attribute downstream generation to Kling 3.0 and Nano Banana Pro, but do not include pricing or a reproducible benchmark for campaign consistency beyond the showcased reel.

Redfin listings surface a “Redesign” button for instant style restaging

Redfin (Listings): Redfin now shows a “Redesign” button on listings that re-decorates a room into different styles—positioned as useful for homes that need renovation or aren’t staged, with the feature demonstrated in Redesign button demo.

Room restyling demo

The clip shows rapid style swaps (e.g., minimalist/bohemian-style results) from the same room photo, implying a fast way to generate alternate listing visuals without physical staging.

Wellness ads are personifying “skin problems” as on-body cartoon characters

Ad visual metaphor pattern: A thread argues that some high-performing wellness/skin creatives are animating the “problem” directly onto the body as a character (e.g., inflammation monster, moles multiplying, peeling dry-skin patch) to make the pitch instantly legible, with examples shown in On-skin character examples.

On-skin character ad examples

The post emphasizes a mechanism of “grossness stops the scroll, cuteness holds attention,” and claims the production loop is scene generation + scripted tension + iterative testing, as described in On-skin character examples.

A copy-paste prompt turns branded outfits into “liquid glass” fashion shots

Recraft V4 + Nano Banana 2 prompt: A reusable fashion-ad aesthetic is being shared as “any brand to glass edition,” aiming for outfits that read as smooth liquid glass with strong specular edge highlights and brand-color gradients, with examples and the prompt posted in Glass outfit prompt.

The prompt text calls for clear brand cues/logos, “blind emboss” effects, matte skin contrast against glass material, and fashion-mag photography lighting, as described in Glass outfit prompt.

An AI-made “home shopping” segment shows the infomercial format spreading

Long-form ad format: A creator posted a mock “Route 47 Home Shopping Network” segment selling an “Interdi-ZEN-sional Wellness Collection,” complete with on-screen branding and a hotline-style phone number, as shown in Home shopping segment.

Home shopping segment

The clip demonstrates a pacing/packaging approach (lower-third style titling, product-shot choreography, call-to-action number) that can stretch a single product concept into minutes of ad-like content, per Home shopping segment.


🧑‍💻 Builder tools for creators: desktop agents, in-box AI writing, and agentic backends

Creator-adjacent dev tooling is unusually active: local desktop-control agents, browser “AI in every text box,” and backends designed for agentic app shipping. Excludes Hugging Face storage (covered in Compute/Runtime).

Clico puts an AI assistant inside every browser text box

Clico (tryclico): A Chrome/Chromium extension is being promoted as “AI in every text box,” aiming to remove the copy-paste loop by drafting directly at your cursor and using a small set of keyboard shortcuts, per the thread opener and the tab switching claim.

Clico walkthrough

Core controls: The shortcuts list includes ⌘+O to draft at cursor, highlight-to-search, double-tap ⌘ to summarize the current page, and hold ⌘ for voice input, as laid out in the shortcuts list.
Context awareness: The pitch is that it “reads the page you’re on” (tweet/thread/email context) so you don’t re-explain background every time, as described in the writes at cursor demo.

The value for creators is less about “better writing” and more about speed: replying, pitching, and iterating copy without leaving the surface you’re already in.

EasyClaw runs a local “desktop agent” that clicks and types across apps

EasyClaw (OpenClaw-based): A new desktop-control assistant is being pitched as a one-click, local-running agent that can “click, type, and automate” across Mac/Windows without API keys or Docker, according to the feature overview.

Desktop automation demo

Remote control hook: It’s framed as controllable from WhatsApp/Telegram (plus other chat surfaces) while still executing on your own machine, as described in the feature overview screenshot.
Privacy claims: The pitch says “zero visual data stored or uploaded,” but it’s still a system-level agent; the safest reading is that it can touch anything you can, per the feature overview and the terminal-to-GUI demo clip.

For creative teams, this is mainly about automating the glue steps (exporting, renaming, uploading, formatting) across multiple tools—if the local-only promise holds up in practice.

InsForge 2.0 adds MCPMark claims for agent-native backend automation

InsForge 2.0 (open-source backend): Following up on InsForge 2.0 (agent-configured backend pitch), today’s thread adds a quantified benchmark claim—“14% higher accuracy” and “59% fewer tokens” on MCPMark—alongside a clearer backend feature checklist in the benchmark claims post.

Backend automation demo

Backend primitives for agents: The offering is framed as bundling Postgres, auth, storage, model gateway, and edge functions behind a “semantic layer” that agents can operate end-to-end, per the launch thread.
Operational details: The thread calls out Vercel-powered deployment, real-time WebSocket pub/sub, and a remote MCP server as first-class parts of the stack, as listed in the feature list.

It’s still highly promotional framing; the tweets don’t include a reproducible MCPMark artifact, but the numbers give teams something concrete to interrogate.

OpenRAG packages doc ingestion + semantic search + chat into one install

OpenRAG (Langflow ecosystem): Langflow is being promoted as having open-sourced a “single package” RAG platform called OpenRAG that combines document ingestion, OpenSearch-based semantic retrieval, and a chat UI—installed via uvx openrag, according to the OpenRAG launch post.

The thread frames it as less “duct-tape RAG,” with Docker support and visual workflows via Langflow, and points to the code in the GitHub repo.

Clico’s “@ memory” pattern: carry your context across sites while drafting

Cross-site drafting context: One specific workflow being pushed for Clico is using “@” to pull prior conversations into the current draft, so your new email/post/doc can inherit the same background and voice without re-pasting notes, as described in the memory feature post.

Memory context demo

The same thread also frames Clico as doing page-level grounding (“it reads the tweet/thread/doc”), which makes the memory feature feel like a lightweight personal knowledge loop rather than a single-shot prompt, per the context-aware writing explanation.

OpenAI’s Symphony repo turns project board items into isolated agent runs

Symphony (OpenAI, GitHub): A GitHub repo surfaced as an OpenAI project that converts “project work” into autonomous, isolated implementation runs tied to a project board, as spotted in the repo discovery post and described in the linked GitHub repo.

The visible UI example shows a kanban-like flow (Backlog → Todo → In Progress → Human Review), implying an agent pipeline that can pick up tasks and return artifacts back into review states, as seen in the repo discovery.


🗄️ Creator infra: Hugging Face ships mutable ‘Storage Buckets’ for ML artifacts

Infra story with clear creator impact: Hugging Face adds S3-like mutable Storage Buckets for checkpoints/logs/shards that don’t belong in Git, built on Xet dedup storage. This supports faster iteration for teams training or fine-tuning models.

Hugging Face adds Storage Buckets for mutable ML artifacts (cheaper, faster than Git)

Storage Buckets (Hugging Face): Hugging Face shipped Storage Buckets, a new Hub storage primitive meant for mutable ML artifacts (checkpoints, optimizer states, processed shards, logs) that don’t fit Git-style versioning, as announced in the ship announcement and detailed in the launch blog. This is positioned as the first new Hub “repo type” in 4 years, with access via the web UI plus scripting/CLI paths called out in the blog screenshot.

What changes for creators: instead of forcing every intermediate file into a dataset/model repo (or building separate S3 plumbing), teams can keep iteration artifacts on the Hub with permissions and browsing, per the launch blog.
Why it’s cheaper/faster: buckets are backed by Xet (chunk-based storage), so repeated content across large files deduplicates—Hugging Face frames this as specifically suited to ML artifacts that share content, as described in the blog screenshot; a toy chunk-dedup illustration follows this list.
Adoption signal: leadership is amplifying this as “fastest growing recent product,” with the blunt framing that “AI wants data” and the goal is “petabyte storage cheap and fast,” per the Wolf quote.
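
The dedup claim rests on chunking: successive checkpoints share most of their bytes, so already-seen chunks are stored once. A toy illustration of the idea, using fixed-size chunks for simplicity (Xet’s real chunking is content-defined, and none of this is the actual Hub API):

```python
import hashlib

def chunk_hashes(data: bytes, size: int = 1 << 16) -> list[str]:
    """Split a blob into fixed-size chunks and hash each one."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

# Two "checkpoints" that differ only in a small region, like successive saves.
ckpt_a = bytes(10_000_000)
ckpt_b = ckpt_a[:5_000_000] + b"updated weights" + ckpt_a[5_000_015:]

store = set(chunk_hashes(ckpt_a))
new = [h for h in chunk_hashes(ckpt_b) if h not in store]
print(f"chunks in ckpt_b: {len(chunk_hashes(ckpt_b))}, newly stored: {len(new)}")
```

Only the chunk containing the edit needs new storage; everything else hits the dedup store, which is the property that makes checkpoint churn cheap.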


📚 Research radar for creative AI: long-context 3D, fast ASR, and efficiency tricks

Paper feed is dense today, spanning autonomous model experimentation, 3D-from-video spatial intelligence, and efficiency methods (quantization+sparsity). Useful for creators tracking what will become next-gen tools.

NLE reframes ASR as transcript editing to unlock parallel decoding and lower latency

NLE (paper): IBM researchers present a non-autoregressive LLM-based ASR approach that treats speech recognition as conditional transcript editing (parallel predictions instead of sequential decoding), as shown in the paper card screenshot and summarized on the Paper page.

The paper card claims NLE achieves 27× speedup over an autoregressive baseline in single-utterance scenarios, with 5.67% average WER and RTFx 1630 on the Open ASR leaderboard, per the metrics visible in the paper card screenshot.

AutoResearch-RL proposes perpetual RL agents that modify training code to find better architectures

AutoResearch-RL (paper): A new framework has an RL agent continuously propose code edits to a training script, run experiments under a fixed time budget, score them via a scalar reward (val-bpb), and update its policy with PPO—aiming for “no human in the loop” architecture + hyperparameter discovery as shown in the paper card screenshot and described in the Paper page.

The headline result in the abstract is that it can match or beat hand-tuned baselines after ~300 overnight iterations on a single-GPU nanochat pretraining benchmark, per the details visible in the paper card screenshot.

LoGeR targets long-context 3D reconstruction using a hybrid memory design

LoGeR (paper): A long-context 3D reconstruction architecture using hybrid memory is being pitched as a way to keep geometric context stable over extended inputs, according to the paper post and the Paper page.

3D reconstruction demo

For creators, the practical read is that this line of work is trying to make “long take” capture (longer sequences, not just short clips) more usable for reconstruction and scene understanding, per the framing in the Paper page.

Lost in Stories catalogs consistency failures that show up in long-form LLM narratives

Lost in Stories (paper): A new paper focuses on “consistency bugs” in long story generation—where characters, facts, or plot constraints drift as length increases—per the paper link and the associated Paper page.

For working writers using LLMs as co-drafters, this is a research attempt to name and measure the exact failure modes people feel in practice (character attributes changing, timeline contradictions), as described in the Paper page.

VGGT-Det aims to do multi-view 3D detection without calibrated sensor geometry

VGGT-Det (paper): A CVPR 2026 paper frames “sensor-geometry-free” multi-view indoor 3D object detection—no camera poses, no depth—by integrating a VGGT encoder and mining its internal priors, per the paper card screenshot and the Paper page.

The abstract claims improvements of +4.4 mAP@0.25 on ScanNet and +8.6 mAP@0.25 on ARKitScenes versus the best prior SG-free method, as stated in the paper card screenshot.

A new paper argues unsupervised RLVR has fundamental scaling limits for LLM training

Unsupervised RLVR scaling (paper): The paper examines how far “unsupervised RL with verifiable rewards” can scale and argues there are fundamental limitations tied to convergence and confidence-correction misalignment, as shown in the paper card screenshot and described on the Paper page.

The abstract mentions concepts like a “Model Collapse Step” and differentiates intrinsic vs external URLVR-style methods, per the description visible in the paper card screenshot.

Sparse-BitNet claims 1.58-bit quantization is naturally compatible with structured sparsity

Sparse-BitNet (paper): Microsoft Research authors argue 1.58-bit BitNet models degrade less under semi-structured N:M sparsity than full-precision baselines, enabling stable training and reported efficiency gains, as shown in the paper card screenshot and detailed on the Paper page.

The abstract calls out up to 1.30× speedups using a custom sparse tensor core, per the claim visible in the paper card screenshot.
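
N:M semi-structured sparsity itself is easy to state: in every group of M consecutive weights, keep only the N largest magnitudes. A minimal 2:4 pruning sketch in NumPy, showing the general pattern rather than the paper’s training recipe:

```python
import numpy as np

def prune_2_of_4(w: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude weights in every group of 4 (2:4 sparsity)."""
    groups = w.reshape(-1, 4)
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]  # 2 smallest |w| per group
    out = groups.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(w.shape)

w = np.random.default_rng(0).normal(size=(8, 16)).astype(np.float32)
sparse = prune_2_of_4(w)
print((sparse == 0).mean())  # -> 0.5: exactly half the weights are removed
```

The paper’s claim is that ternary {-1, 0, +1} BitNet weights lose less to this constraint than full-precision weights do, which is what makes the pairing attractive.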

V1 proposes parallel reasoning where candidates are generated and verified together

V1 (paper): A framework to unify candidate generation with self-verification for parallel reasoners is being shared as a path to faster and more reliable multi-branch inference, per the paper post and the Paper page.

Paper slide overview

For creative pipelines, this is the same underlying bet behind “generate multiple outlines / cuts / drafts, then filter”: moving verification inside the same loop rather than treating it as a separate pass, as described in the Paper page.

DeepMind marks AlphaGo’s 10th anniversary, framing its methods as discovery foundations

AlphaGo 10 years later (DeepMind): DeepMind is framing AlphaGo’s techniques as foundations that later enabled mathematical statement proving and now support scientific discovery workflows, per the anniversary post.

AlphaGo 10 years recap

For creators, it’s a signal about where “agentic” planning + search methods may keep showing up next: not only in games, but in toolchains where the work looks like iterative hypothesis generation and verification, as stated in the anniversary post.

DINOv3’s CLS token is reported to linearly represent dSprites geometric latents

DINOv3 (representation learning): A researcher highlight claims DINOv3’s global [CLS] token linearly represents continuous geometric latents in the dSprites dataset, according to the retweet of observation.

If this holds broadly, it’s a reminder that “generic” vision embeddings may already contain surprisingly clean geometry factors—useful upstream for 3D-ish creative tasks (pose, shape, structure) without task-specific supervision, per the implication discussed in the retweet of observation.
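
The check itself is a linear probe: regress the dataset’s known generative factors on frozen [CLS] embeddings and look at the fit. A minimal sketch with stand-in arrays (real use would extract DINOv3 [CLS] tokens for dSprites renders; every array here is illustrative):

```python
import numpy as np

# Stand-ins: cls[i] would be the DINOv3 [CLS] embedding of dSprites image i,
# latents[i] its known generative factors (e.g., scale, orientation, x, y).
rng = np.random.default_rng(0)
cls = rng.normal(size=(5_000, 768))
latents = rng.normal(size=(5_000, 4))

tr, te = slice(0, 4_000), slice(4_000, None)        # train / held-out split
W, *_ = np.linalg.lstsq(cls[tr], latents[tr], rcond=None)
pred = cls[te] @ W
ss_res = ((latents[te] - pred) ** 2).sum(axis=0)
ss_tot = ((latents[te] - latents[te].mean(axis=0)) ** 2).sum(axis=0)
print(1 - ss_res / ss_tot)  # per-latent R^2; near 1.0 would back the claim
```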


📅 Dates to pin: London creator meetup + GDC creator workshops

Event posts are concentrated around creator networking and GDC week: a London AI-creative meetup (Freepik×BytePlus) and hands-on creator sessions in SF. Excludes generic promo; only concrete date/location items included.

Freepik and BytePlus set a London AI-creative meetup for March 18

AI Creative Minds (Freepik × BytePlus): A curated in-person meetup for London’s AI creative community is scheduled for March 18 in Holborn—positioned as “no decks, no panels, just real conversations,” per the Meetup announcement, with registration handled via the RSVP page.

The event framing is explicitly aimed at artists/designers/builders shaping AI storytelling, as described in the Meetup announcement; approval/wallet verification requirements and the venue/time block are spelled out on the RSVP page.

Meshy schedules a GDC talk on AI + games for March 12

Meshy (GDC 2026): Meshy is promoting an on-site GDC session—“AI + Games: More Creativity in Production, Deeper Fun in Gameplay”—scheduled for March 12 at 10:10 AM in Room 2024, West Hall, per the Talk schedule graphic.

The same push also tees up booth activity and physical outputs (3D-printed models generated with Meshy), plus a printer giveaway mentioned in the Booth sneak peek.

Autodesk Gallery hosts a free Flow Studio creators event with a Wonder 3D workshop

Flow Studio Creators Event (Autodesk): A free, hands-on creator meetup is being run at the Autodesk Gallery in San Francisco during GDC week, featuring a Wonder 3D workshop plus networking, food/drinks, and giveaways—explicitly “no GDC badge required,” as laid out in the Event details.

Event invite montage

The post emphasizes “bring your laptop” and on-site participation logistics in the Event details, which makes it more of a working session than a talk-only social.

Pictory schedules a March 11 webinar on 10 new AI video features

Pictory (webinar): Pictory is running a live session on March 11 at 11 AM PST focused on “10 new AI video features” and on positioning Pictory 2.0 for faster content creation/repurposing, according to the Webinar announcement, with sign-up via the Zoom registration.

The post is promotional and doesn’t enumerate the ten features in-text (so treat specifics as TBD until the session), but the date/time and topic are explicit in the Webinar announcement.


🚧 Friction log: reference drift, benchmark whiplash, and setup regret

A few high-signal complaints surfaced: reference-image aesthetic drift in Grok Imagine, and creators reporting unstable/contradictory OpenClaw benchmark outputs or sunk setup time. Useful as ‘what not to bet your deadline on.’

PinchBench leaderboard stability questioned after day-to-day OpenClaw result swings

PinchBench (agent benchmark): A creator flags that PinchBench is showing “completely different benchmark results” for OpenClaw from one day to the next, challenging how much weight to put on leaderboard-driven tool choices, as stated in the Leaderboard complaint.

Following up on PinchBench leaderboard—earlier framing around “who’s on top” is now being met with skepticism: if the chart can swing day-to-day (as alleged in the Leaderboard complaint), creatives and builders can’t confidently map “benchmark winner” to “deadline-safe stack.”

Grok Imagine users report reference-image style drift, with a “style raw” workaround

Grok Imagine (xAI): A creator reports that newer Grok Imagine versions increasingly change the aesthetic of reference images—good for variation, bad for lookdev and shot-matching—and shows a workaround sequence in a quick before/after clip, including a prompt suggestion to “Try: --style raw,” in the Drift vs fixed comparison.

Drift vs fixed comparison

This is showing up as a practical friction point for creators doing character or art-direction continuity from a reference frame: feeding a reference in no longer guarantees the style is preserved on the way out, as described in the Drift vs fixed comparison.
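
As a rough illustration of the reported fix (the prompt text below is invented for illustration; only the --style raw suffix comes from the clip):

```
# Drifts: reference image + loose restyle prompt
[reference image] same character, rooftop at dusk, cinematic

# Reported workaround: append the raw-style flag to pin the source aesthetic
[reference image] same character, rooftop at dusk, cinematic --style raw
```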

Creator reports abandoning OpenClaw and Cowork after 1–2 weeks of setup

OpenClaw + Cowork (agent tooling): One user says they spent “1–2 weeks setting up OpenClaw and Cowork” and then realized they don’t need either for their current work, per the Setup regret note.

The implication for creative pipelines is straightforward: the setup/maintenance cost can outrun the benefit when you’re not actively doing long-horizon agent work, as reflected in the Setup regret note.


🔐 Attribution & labeling: watermarking, credit protection, and ‘slop’ identification ideas

Trust concerns show up as creator-side countermeasures: watermarking work to prevent reuse without credit and proposals for broader labeling to reduce time spent parsing low-quality text. Excludes broader copyright/regulation (not present today).

Hidden Objects creator adds watermark badges after uncredited reuse reports

Hidden Objects (GlennHasABeard): After hearing his puzzle images may be getting reposted without credit, the creator says he’ll start marking posts as “created by me” and will redesign watermarks + bake brand edits into his prompts, as described in the Watermarking plan.

The watermark approach shown on the latest puzzle image is a visible badge overlay (not hidden metadata), which makes attribution survive screenshot reposts—see the circular “MADE BY…” badge in the Watermarked puzzle post.
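
For builders who want the same protection, a badge overlay is a few lines of image compositing. A minimal sketch using Pillow follows; the file names, 15%-of-width sizing, and corner placement are illustrative assumptions, not the creator’s actual workflow:

```python
# Minimal sketch of a visible badge-overlay watermark (assumes Pillow).
# Because the badge is burned into the pixels, it survives screenshot
# reposts, unlike metadata tags, which most platforms strip anyway.
from PIL import Image

def add_badge(image_path: str, badge_path: str, out_path: str, margin: int = 24) -> None:
    base = Image.open(image_path).convert("RGBA")
    badge = Image.open(badge_path).convert("RGBA")

    # Scale the badge to roughly 15% of the image width, keeping aspect ratio.
    target_w = max(1, base.width * 15 // 100)
    target_h = max(1, badge.height * target_w // badge.width)
    badge = badge.resize((target_w, target_h))

    # Bottom-right corner, inset by a small margin.
    pos = (base.width - badge.width - margin, base.height - badge.height - margin)
    base.paste(badge, pos, badge)  # third arg uses the badge's alpha as a mask

    base.convert("RGB").save(out_path, quality=95)

# Hypothetical file names for illustration.
add_badge("level_064.png", "made_by_badge.png", "level_064_watermarked.jpg")
```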

Labeling “human slop” idea spreads as creators complain about parsing overload

Text labeling discourse (venturetwins): A post argues for a legal requirement to label “all text written by dumb people,” framing it as a way to reduce the “mental tax” of parsing low-quality communication, as stated in the Labeling proposal.

A follow-up suggests “I might actually make this,” implying the idea could become a concrete tool or tagging mechanism rather than only a rant—see the Follow-up post.

Watermark placement debate: embed in art vs separate badge plus answer key

Watermarking workflow (GlennHasABeard): The creator asks for input on two options—(a) bake the watermark into the generated character/image along with a watermarked answer key, or (b) add a separate watermark badge plus the watermarked answer key, as laid out in the Two watermark options.

He says he’s leaning toward option B “as it’s a cleaner result,” per the Leaning option B, which matches the badge-style overlay already appearing on his latest Hidden Objects image as shown in the Watermarked level 064.


🧵 Creator culture signals: spectacle fatigue and the return of story/character as the moat

Discourse today is about feed saturation and quality bars: creators pushing back on endless short spectacle clips and arguing that story/character will matter more as visuals get cheap. This category is about sentiment—not product updates.

Story and character get reframed as the durable edge as visuals commoditize

Craft debate: In direct response to the “15 second Seedance clip” backlash, a creator predicts “a collapse in… pure spectacle” because it’ll be “cheap and easy to reproduce,” arguing that what remains is “story and character” and that “it can’t be done well by AI alone yet” in story over spectacle take. The same thread’s underlying complaint, low-effort clip spam, lands in format backlash, which makes this less a theory and more a feed-level behavior change.

Creator anxiety rises as timeline “standards” outpace solo capacity

Creator anxiety: A creator describes a practical crunch—“the bar has been set up so high” that what fits “day to day… is not enough,” because “all these incredible animations are now the standard” in bar set too high. The key signal isn’t burnout as a mood; it’s an acknowledgement that the feed’s reference quality is compounding, while individual production time stays fixed.

Feed pushback grows against short AI spectacle clips

Format fatigue: A blunt backlash post—“Not interested in your 15 second seedance clip” in spectacle fatigue post—captures a creator-side shift from novelty visuals toward higher-signal work; replies frame the format as “fun for a bit, now… pointless spam” in spam follow-up. The immediate implication for AI video makers is distribution, not capability: the bar for why this exists is rising faster than the bar for rendering.

The “Cocomelon for adults” AI video microgenre keeps replicating

AI video microgenre: The “Cocomelon for adults” framing in microgenre label points at a specific, repeatable format: hyper-saturated, simple-character, loop-friendly clips, with an example shown in the Singing food microgenre video. A follow-on reply, “every time I think this format is over… I’m pulled back in” in format stickiness reply, highlights why it persists: it’s built for retention loops, not cinematic payoff.

