Gemini 3 Flash hits 1M context at $0.50 – new default fast brain

Stay in the loop

Free daily newsletter & Telegram daily report

Executive Summary

Google’s Gemini 3 Flash is the first “fast” model in a while that actually feels frontier-class. You get a 1M-token context window, 64K outputs, and full multimodality at $0.50 per 1M input tokens and $3 per 1M output, with dynamic “thinking” levels baked into that price. Google is confident enough to swap it in as the default Fast and Thinking brain in the Gemini app and Search, while context caching chops repeated-prefix cost to 10% and batch jobs can see up to 50% savings.

The ecosystem read the memo instantly. Cursor, Cline, Warp, Zed, Antigravity, Perplexity, OpenRouter, Ollama and a slew of agent stacks all wired 3 Flash in as the new quick-but-smart default for coding, repo planning, and search-heavy workflows. Benchmarks back the shift: 78% on SWE-bench Verified, state-of-the-art on MMMU-Pro, a 71 score on the Artificial Analysis Index, and around 6× more successful BrowserUse web tasks per dollar than Claude Sonnet. Box AI saw document-extraction recall jump from 74% to 84% by swapping out 2.5 Flash.

There’s a catch: hallucination rates north of 90%, “lazy” short answers without prompt pressure, and day-one jailbreaks leaking MDMA recipes and runaway chain-of-thought. Treat Gemini 3 Flash as your new default workhorse—but only behind strong verification, safety filters, and house prompts that force it to think before it talks.

Feature: Gemini 3 Flash rollout and positioning

Gemini 3 Flash lands at $0.50 in/$3 out with 3× speed vs 2.5 Pro, GPQA 90.4%, SWE‑Bench Verified 78%, MMMU‑Pro 81.2%, and dynamic thinking; available across API, Vertex, AI Studio, Antigravity, and major dev tools.

The day’s cross‑account story. Google’s Gemini 3 Flash ships broadly with frontier‑class capability at speed/price points, and immediate ecosystem pickup. Mostly product, pricing, and distribution details in the tweets sample.

Jump to Feature: Gemini 3 Flash rollout and positioning topics

⚡ Feature: Gemini 3 Flash rollout and positioning

Google launches Gemini 3 Flash as frontier‑class fast model at $0.50 / $3

Google formally unveiled Gemini 3 Flash, a multimodal "frontier intelligence" model optimized for low‑latency inference that undercuts most high‑end models on price at $0.50 per 1M input tokens and $3 per 1M output tokens, with audio input billed at $1 per 1M tokens. The model is positioned as the workhorse sibling to Gemini 3 Pro: same family, but tuned for speed, tool calling and coding rather than maximum peak scores. (pricing and positioning, deepmind overview)

Gemini 3 Flash is fully multimodal (text, images, video, audio) and ships with a 1M‑token context window and 64K output limit in the API, making it viable for long docs, codebases and video analysis. The launch deck highlights strong reasoning and knowledge scores like 90.4% on GPQA Diamond and 33.7% / 43.5% on Humanity’s Last Exam with and without tools, plus 81.2% on MMMU‑Pro and 78% on SWE‑bench Verified, all while being marketed as cheaper and faster than Gemini 2.5 Pro. (benchmarks slide, detailed benchmark table) Dynamic "thinking" is a core part of the pitch: Gemini 3 Flash can adjust internal deliberation by effort level (minimal/low/medium/high), with those extra "thinking tokens" included in the $3/1M output rate, so builders don’t need a separate SKU for slow vs fast reasoning. Pricing cards and docs also surface platform‑level features like context caching (90% cost reduction on repeated prefixes) and Batch API (up to 50% savings for offline workloads), which matter a lot for teams running agents or high‑volume backends. pricing and savings

On day one, Gemini 3 Flash is available through the Gemini API, Google AI Studio, the Gemini CLI, and Vertex AI’s gemini-3-flash-preview endpoint, with the same tool calling, JSON mode and structured outputs interfaces as Gemini 3 Pro. The developer blog leans into the idea that you can "build with frontier intelligence that scales with you" and treat Flash as the default general‑purpose model unless you explicitly need Pro‑level peaks. (developer blog, vertex model card)

Gemini 3 Flash hits 1M context at $0.50 – new default fast brain

Executive Summary

Top links today

Feature: Gemini 3 Flash rollout and positioning

Table of Contents

⚡ Feature: Gemini 3 Flash rollout and positioning

Google launches Gemini 3 Flash as frontier‑class fast model at $0.50 / $3

Ecosystem rushes to adopt Gemini 3 Flash for coding, agents and search

Gemini 3 Flash becomes the new default "Fast" brain in Gemini app and Search

Builders see Gemini 3 Flash as a new default—while warning about laziness and hallucinations

Launch benchmarks put Gemini 3 Flash near Pro on GPQA, MMMU and SWE‑bench

Gemini 3 Flash emphasizes token efficiency, caching and a firmer $0.50 / $3 price

📊 Frontier eval race: independent scores and cost curves

Gemini 3 Flash posts strong ARC‑AGI scores at far lower cost than GPT‑5.2

Gemini 3 Flash ranks #3 on AA Intelligence Index and best value at its tier

BrowserUse: Gemini 3 Flash delivers ~6× more successful web tasks per dollar

MRCR long‑context tests show Gemini 3 Flash overtaking 3 Pro at 1M tokens

LisanBench: Gemini 3 Flash trades raw score for higher token usage and lower validity

Vending‑Bench 2: Gemini 3 Flash lags Opus and 3 Pro but beats other small models

🧰 Agent stacks and coding workflows in practice

Gemini 3 Flash rapidly becomes the default model in coding IDEs and agent stacks

Box AI sees double‑digit extraction gains after swapping to Gemini 3 Flash

Claude Code rewrites terminal renderer, cutting flicker by ~85% with cell diffing

LangSmith deepens agent tooling with tracing CLI, pairwise comparisons, and telco case study

Enterprise data‑extraction agents show 3 Flash can win on messy, multi‑field documents

Oh My OpenCode plugin turns OpenCode into a multi‑agent coding harness

Warp terminal introduces auto‑approve mode for agent commands and diffs

Zed adds dev containers plus Gemini 3 Flash support to tighten cloud dev loops

Notte’s Agent Mode executes tasks first, then synthesizes maintainable code from the trace

RepoPrompt pushes disciplined research→plan→execute flow for repo‑scale agents

🚀 Serving and runtime efficiency

vLLM squeezes up to 33% more throughput from NVIDIA Blackwell GPUs

LMSYS releases mini‑SGLang, a 5k‑LOC high‑performance inference server

Pipecat 0.0.98 adds thought‑signature handling and uninterruptible frames for LLM audio agents

Ollama Cloud adds gemini‑3‑flash‑preview:cloud for quick API testing

🎙️ Realtime voice agents, latency and pricing

xAI launches Grok Voice Agent API with flat $0.05/min pricing

LiveKit adds Grok Voice Agent plugin for real-time speech-to-speech apps

Grok Voice Agent powers Reachy Mini robot demo after one-hour port

🏗️ Compute buildouts, power and chip supply signals

OpenAI ties 1.9 GW compute plan to Wisconsin data center buildout

Reuters: China’s classified EUV team has a prototype scanner under test

Epoch argues US grid can supply ~100 GW for AI by 2030 if hyperscalers pay up

Oracle quietly books $248B in AI‑related data center and cloud leases

SMIC’s N+3 node claims 5 nm‑class volume without EUV, leaning on DUV multi‑patterning

SEMI sees wafer fab equipment rising to ~$156B by 2027 on AI demand

TSMC 2026 forecast puts Nvidia at 20% of fab revenue, ahead of Apple

🏢 Enterprise adoption and platform distribution

Google Labs pilots 'CC' AI productivity agent inside Gmail and Workspace

Perplexity launches iPad app tuned for Stage Manager and deep work

Cofounder AI chief of staff ties into Notion, Slack, Gmail and GitHub

ElevenLabs Agents add WhatsApp as a new customer channel

Fastweb and Vodafone scale 'Super TOBi' telco agent with LangSmith

Lovable Connectors let AI-built apps call Perplexity, ElevenLabs, Firecrawl, Miro

DoorDash ships standalone AI app to help you choose restaurants

NotebookLM rolls chat history sync to all users on web and mobile

Notte’s Agent Mode turns natural-language tasks into executable code

Sentinel disaster-response agent wins ElevenLabs Worldwide Hackathon

🎬 Creative media, 3D assets and world models

Tencent details HY World 1.5 real-time world model architecture and training

TRELLIS.2 hits fal as high-res image-to-3D textured mesh pipeline

ComfyUI integrates Manager UI and teases a Simple Mode for big graphs

ElevenLabs Image & Video adopts GPT-Image-1.5 for faster, sharper edits

Flowith turns GPT-Image-1.5 vs Nano Banana into a one-click compare lab

TurboDiffusion claims 100–205× speedups for video diffusion models

Sparse-LaViDa explores sparse multimodal discrete diffusion language models

Video Reality Test benchmark pits ASMR gen videos against VLMs and humans

YouTube Create iOS app brings Veo3-powered video generation to mobile editing

Hailuo offers free GPT-Image-1.5 and Nano Banana Pro for creatives

🧪 Training, decoding and agent learning methods

Jacobi Forcing trains AR LLMs as fast causal parallel decoders

Microsoft open-sources Agent Lightning to add RL to existing agents

“Think Visually, Reason Textually” boosts ARC-AGI by pairing vision and text

Error-Free Linear Attention (EFLA) gives exact continuous-time updates

🛡️ Guardrails, jailbreak climate and platform policy

Gemini 3 Flash already jailbroken into detailed MDMA and malware outputs

X ToS now explicitly outlaws AI jailbreaking and prompt injection

On this page