Google Gemini 3 Pro launches with 1M context – free Antigravity IDE ships agents
Executive Summary
Google stopped teasing and actually flipped the switch: Gemini 3 Pro is now the flagship model across the Gemini app, AI Mode in Search, AI Studio, and Vertex AI. It brings a 1M‑token context window, full multimodal inputs, and a new “Thinking” mode that leans into agentic reasoning. In the consumer app, Gemini 3 becomes the default brain behind a redesigned UI plus experimental “visual layout” and “dynamic view” responses, and an Agent mode that can plan multi‑step tasks like travel booking or inbox triage under your supervision.
For builders, gemini-3-pro-preview lands in AI Studio and Vertex at $2/M input and $12/M output tokens for prompts up to 200k tokens, rising to $4/M and $18/M above that. You also get knobs for thinking_level, per‑part media_resolution, structured outputs, and auditable “thought signatures,” so you can actually tune how much reasoning burn you’re willing to pay for.
The sleeper story is Google Antigravity: a free, Gemini 3‑powered agentic IDE where bots can drive the editor, terminal, and a browser sub‑agent, with unlimited tab completions and support for Claude Sonnet 4.5 and GPT‑OSS too. Cursor subscriptions suddenly have more explaining to do. A slower, higher‑compute Gemini 3 Deep Think mode is queued up next, hinting at a future where you dial reasoning per request instead of waiting for Gemini 4.
Feature Spotlight
Feature: Google’s Gemini 3 Pro and Antigravity IDE arrive
Google releases Gemini 3 Pro and Antigravity IDE: 1M ctx, agentic coding across editor/terminal/browser, and $2/$12 per‑M token pricing—setting a new product and go‑to‑market bar for frontier models.
Feature: Google’s Gemini 3 Pro and Antigravity IDE arrive
Today’s dominant story: Google ships Gemini 3 Pro and the agent‑first Antigravity IDE. Broad product availability, pricing, and a new agentic coding surface drove most traffic; tweets also tease Deep Think mode. This section covers the launch itself; downstream evals, integrations, and enterprise impact are excluded and covered separately.
Gemini 3 Pro rolls out broadly across Google’s ecosystem
Google has officially launched Gemini 3 Pro as its new flagship model, turning on access in the Gemini app, AI Mode in Google Search, Google AI Studio and Vertex AI, after weeks of hints about an imminent release release window. It’s positioned as Google’s “most intelligent model” with 1M‑token context, multimodal inputs (text, images, audio, video) and a focus on reasoning and agentic coding, and is already live for many users who select the new “Thinking” mode in the Gemini web/app UI Gemini 3 thread thinking mode rollout. For builders, this means the same core model now powers consumer chat, search answers and cloud APIs, which should make behavior more consistent across UX surfaces and reduce the need to juggle different Gemini families launch overview.
Antigravity debuts as Google’s free agentic IDE powered by Gemini 3 Pro
Alongside the model, Google DeepMind shipped Google Antigravity, a VS Code‑style IDE that bakes in agents which can operate the editor, terminal and a browser sub‑agent to test apps end‑to‑end antigravity announcement antigravity blog post. The individual “public preview” plan is $0/month and, strikingly, includes an “agent model” with access to Gemini 3 Pro plus Claude Sonnet 4.5 and GPT‑OSS, unlimited tab completions and command requests, and “generous” rate limits, with team and enterprise tiers “coming soon” pricing screenshot. Antigravity runs on macOS, Windows and Linux and records artifacts like plans, screenshots and browser recordings, so for many developers it’s now a no‑cost way to try fully agentic coding workflows without paying Cursor‑level subscription prices download announcement.
Gemini 3 Pro Preview lands in AI Studio with 1M context and tiered pricing
On the developer side, Google lit up the gemini-3-pro-preview model in AI Studio and the Gemini API/Vertex AI, with a 1M‑token context window and new knobs for thinking_level, per‑part media_resolution, structured outputs and mandatory “thought signatures” aimed at keeping reasoning auditable developer feature thread developer guide. Pricing is set at $2/M input and $12/M output tokens for prompts up to 200k tokens, and $4/M in, $18/M out above that, with separate discounted cache read/write rates shown in the model card UI ai studio pricing card. For most app workloads, this makes Gemini 3 Pro slightly pricier than Gemini 2.5 Pro but competitive with other frontier models, and the huge context plus reasoning controls give teams a reason to spin up side‑by‑side evals in AI Studio or their own gateways right away pricing screenshot.
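If you want to kick the tires, the call shape is simple. Below is a minimal sketch using the google-genai Python SDK; thinking_level and structured outputs are the launch's knobs, but treat the exact field placement as an assumption rather than a verified reference.

```python
# Minimal sketch with the google-genai SDK (pip install google-genai).
# thinking_level and structured outputs are the knobs named in the launch
# notes; where exactly they live in the config object is our assumption,
# so double-check the official docs before shipping.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Summarize the trade-offs of tiered token pricing in two bullets.",
    config=types.GenerateContentConfig(
        # Dial how much reasoning burn this request is allowed.
        thinking_config=types.ThinkingConfig(thinking_level="low"),
        # Ask for machine-parseable output instead of free-form prose.
        response_mime_type="application/json",
    ),
)
print(response.text)
```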
Gemini app gets Gemini 3 “Thinking” mode, generative layouts and Agent mode
The consumer Gemini app also got a significant upgrade: a redesigned interface with Gemini 3 as the default “Thinking” model, new “visual layout” and “dynamic view” experiments that generate bespoke result UIs, and a Gemini Agent mode that can execute multi‑step tasks like booking trips or organizing email under user supervision ui refresh thread app rollout. Agent mode is starting with Ultra subscribers in the US and will expand to Pro users, while US college students can now get a full year of the Gemini Pro plan for free, which includes expanded Gemini 3 Pro access, unlimited image uploads and 2 TB of storage student plan details. For builders, this means user expectations will quickly shift toward richer, app‑like responses and background task execution, so aligning your own agents’ UX with Gemini’s patterns will probably make onboarding smoother.
Google teases Gemini 3 Deep Think as a premium high‑reasoning mode
Google and early testers are also talking about “Gemini 3 Deep Think,” a new mode that runs the same family with much more compute per query to tackle particularly hard math, science and coding problems deep think chart thread reviewer impressions. Deep Think is not widely available yet—it’s being used in safety and eval programs and is slated to roll out to Google AI Ultra subscribers in the coming weeks—so for now most developers will be targeting standard Gemini 3 Pro while watching how Deep Think’s behavior, latency and pricing shake out deep think mention. The existence of this tiered reasoning mode is still important today though, because it hints at a future where you can explicitly trade cost and latency for more thorough thinking on a per‑request basis instead of waiting for an entirely new model family.
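Nothing about Deep Think's API surface is public yet, but the per‑request dial already exists in miniature via thinking_level. A hypothetical routing sketch, assuming you supply your own difficulty triage:

```python
# Hypothetical sketch: spend more reasoning compute only on hard requests.
# The `hard` flag is a stand-in for whatever triage logic you already have;
# Deep Think itself has no announced API surface or pricing yet.
from google import genai
from google.genai import types

client = genai.Client()

def answer(prompt: str, hard: bool) -> str:
    level = "high" if hard else "low"  # trade latency and cost for depth
    response = client.models.generate_content(
        model="gemini-3-pro-preview",
        contents=prompt,
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_level=level),
        ),
    )
    return response.text
```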
Leaderboards reshuffled by Gemini 3 (excludes launch)
Mostly third‑party and Google‑posted eval deltas: Gemini 3 variants move ARC‑AGI, HLE, Design/WebDev, AA‑Omniscience, and agent benches. Excludes the product launch (covered in Feature).
Gemini 3 Deep Think doubles ARC‑AGI‑2 SOTA at 45.1%
ARC‑AGI‑2, one of the nastiest visual reasoning benchmarks, just got blown open: Gemini 3 Deep Think hits 45.1% with tools on and Gemini 3 Pro scores 31.1% tools‑off, compared with GPT‑5.1’s 17.6% and Claude Sonnet 4.5’s 13.6% tools‑off results benchmark table. A separate ARC Prize plot shows Deep Think paying heavily in compute—about $77 per task vs Gemini 3 Pro’s ~$0.81 and GPT‑5 Pro’s ~$4.78—for that extra performance arc cost chart.
For engineers and researchers, this says two things at once: Gemini 3’s base policy is far more sample‑efficient at these puzzle‑style tasks than previous models, and test‑time compute buys real gains if you’re willing to spend. The cost/score curve also makes it easier to choose between Pro (for everyday use) and Deep Think (for small, crucial batches of hard problems).
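One rough way to read that curve: divide cost per attempt by accuracy to get cost per solved task. Using the chart's approximate figures (this is our back‑of‑envelope math, not ARC Prize's):

```python
# Cost per *solved* ARC-AGI-2 task = cost per attempt / accuracy.
# Figures are the approximate values from the ARC Prize chart.
for name, cost_per_task, accuracy in [
    ("Gemini 3 Deep Think (tools on)", 77.00, 0.451),
    ("Gemini 3 Pro (tools off)", 0.81, 0.311),
]:
    print(f"{name}: ~${cost_per_task / accuracy:.2f} per solved task")
# Deep Think lands near $171 per solve vs ~$2.60 for Pro, a ~65x premium
# for roughly 14 extra points of accuracy.
```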
Gemini 3 Deep Think leads Humanity’s Last Exam and GPQA
On Humanity’s Last Exam, a broad academic reasoning benchmark, Gemini 3 Deep Think scores 41% with tools off, edging out Gemini 3 Pro at 37.5% and beating GPT‑5 Pro (30.7%), GPT‑5.1 (26.5%), and Gemini 2.5 Pro (21.6%) by a wide margin benchmark table. The same chart shows Deep Think at 93.8% on GPQA Diamond and Gemini 3 Pro at 91.9%, ahead of GPT‑5.1’s 88.1% and Claude Sonnet 4.5’s 83.4% benchmark table.
For anyone building tools that lean on STEM or exam‑style reasoning, this is a clear signal that Gemini 3’s strongest variants are now the reference point. The gap over previous Gemini 2.5 and frontier models isn’t a rounding error; it’s the kind of delta that shows up in everyday problem‑solving, especially when you chain multiple reasoning steps together.
Gemini 3 Pro takes #1 on LMArena text leaderboard at 1501 Elo
On the community‑run LMArena text leaderboard, Gemini 3 Pro debuts at 1501 Elo, taking the #1 slot and pushing Grok 4.1’s thinking and non‑thinking variants down to the next ranks lmarena update. The arena team notes this score comes from pre‑release voting tagged as preliminary, but with over 3,000 votes it’s already a statistically meaningful reshuffle lmarena update.
If you’ve been using Grok 4.1, Claude 4.5, or GPT‑5.x as your subjective "feels best" default, this is a strong nudge to add Gemini 3 Pro into your rotation and see how its style, refusal patterns, and reasoning trade off for your own workloads. LMArena tends to correlate well with what power users feel day‑to‑day, so this move matters beyond the bragging rights.
Artificial Analysis Intelligence Index crowns Gemini 3 Pro as overall leader
Artificial Analysis’ Intelligence Index v3.0, which blends 10 heavy‑hitter evals (MMLU‑Pro, GPQA Diamond, HLE, LiveCodeBench, SciCode, AIME 2025, IFBench, AA‑LCR, Terminal‑Bench Hard, τ²‑Bench), now ranks Gemini 3 Pro Preview first with a score of 73, ahead of GPT‑5.1 (high) at 70 and GPT‑5 Codex (high) at 68 ai index summary. The index is designed so 0 means as many wrong as right, and Gemini 3 Pro’s margin shows up as clear wins on reasoning, math, and coding sub‑benchmarks.
If you don’t want to chase every individual benchmark, this is the cleanest single number so far saying "Gemini 3 Pro is the most capable general model right now" for knowledge + reasoning + code. It’s also one of the few cross‑vendor views that bakes in both standard tasks and agentic coding, which is closer to how people actually use these systems.
Gemini 3 Pro doubles prior SOTA on ScreenSpot‑Pro screen understanding
On ScreenSpot‑Pro, a benchmark for understanding rich application screenshots (think Photoshop, CAD, complex UIs), Gemini 3 Pro clocks in at 72.7%, almost exactly double the prior best Claude Sonnet 4.5 score of 36.2% and far above Gemini 2.5 Pro’s 11.4% and GPT‑5.1’s 3.5% benchmark table.
This is a big deal for "computer use" agents that click around GUIs instead of hitting APIs. If these numbers hold up in the wild, Gemini‑based agents should be much better at reading dense toolbars, property panels, and viewport states and then taking the right action without brittle, app‑specific hacks.
Gemini 3 Pro jumps to #1 on WebDev Arena with 1487 Elo
In Code Arena’s WebDev leaderboard, Gemini 3 Pro posts an Elo of 1487, taking #1 with a jump of roughly +280 points over Gemini 2.5 Pro and edging out GPT‑5.1 variants and Claude Opus/Sonnet webdev rankings. That’s the biggest single‑model gain the maintainers have seen in this arena since launch.
For developers, this suggests that Gemini 3 Pro is especially strong at end‑to‑end web app tasks—HTML/CSS/JS/React wiring, asset handling, and small UX details—not just isolated code snippets. If you’re evaluating copilots for front‑end heavy teams, this is one of the few public, human‑voted signals that Gemini may actually ship cleaner first drafts than the usual OpenAI and Anthropic suspects.
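For intuition on what a +280 Elo gap means, the standard Elo model converts rating differences into expected head‑to‑head win rates; a quick sketch:

```python
# Under the standard Elo model, a rating gap maps to an expected win rate.
def elo_win_prob(delta: float) -> float:
    """Expected win probability for a rating advantage of `delta` points."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

print(f"{elo_win_prob(280):.0%}")  # ~83%: voters prefer it about 5 times in 6
```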
Unified benchmark table shows Gemini 3 Pro ahead across math, vision, tools
A widely shared comparison table lays out Gemini 3 Pro vs Gemini 2.5 Pro, Claude Sonnet 4.5 and GPT‑5.1 across more than 20 benchmarks: 100% on AIME 2025 with code, 23.4% on MathArena Apex vs 1% for GPT‑5.1, 81% on MMMU‑Pro, 87.6% on Video‑MMMU, 2,439 Elo on LiveCodeBench Pro, and 85.4% on τ²‑Bench tool use, with only SWE‑Bench Verified narrowly going to Sonnet 4.5 at 77.2% vs Gemini 3 Pro’s 76.2% benchmark table.
The point is: across math, multimodal reasoning, code generation, and tool‑heavy agents, the center of gravity has shifted toward Gemini 3 Pro. If you’ve been optimizing routing or ensembles around GPT‑5.x + Claude, this table is a strong argument to re‑run your own private evals with Gemini 3 in the mix rather than assuming old pecking orders still hold.
Vending‑Bench 2: Gemini 3 Pro compounds to ~10× starting capital
On Andon Labs’ Vending‑Bench 2, which simulates a long‑horizon business running across hundreds of days with suppliers and contracts, Gemini 3 Pro reaches an average net worth of about $5,478 across runs, starting from ~$500—roughly 10× growth vending bench post. The same chart shows Claude Sonnet 4.5 around $3,839, GPT‑5.1 near $1,473, and Gemini 2.5 Pro stuck around $574 vending bench post.
For people experimenting with autonomous agents that manage inventory, pricing, or sourcing, this is one of the few public signals that Gemini 3 Pro can execute non‑trivial plans, re‑plan over time, and avoid obvious money‑burning traps. It doesn’t mean it’s safe to let it run your business, but it does mean you should absolutely include it in your internal agent evals.
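For scale, here is the implied average compounding rate, assuming a roughly 365‑simulated‑day horizon (the post only says “hundreds of days,” so treat the exact day count as our assumption):

```python
# Implied average daily growth rate for ~$500 -> ~$5,478, assuming 365
# simulated days (the actual horizon is only described as "hundreds of days").
start, end, days = 500.0, 5_478.0, 365
daily_rate = (end / start) ** (1 / days) - 1
print(f"~{daily_rate:.2%} per simulated day")  # roughly 0.66%/day, compounded
```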
Design Arena sees record delta as Gemini 3 Pro tops 3D/UI categories
On Design Arena, which scores models on web, game, 3D, and UI component design, Gemini 3 Pro Preview hits 1422 Elo overall and now leads 4 of the 5 code‑backed design arenas (Website, Game Dev, 3D Design, UI Components) design arena chart. The arena curator calls this "the biggest performance delta" they’ve seen since launching the benchmark, with prior leaders like GPT‑5.1 and Claude Opus/Sonnet sitting in the low‑1300s.
This is worth caring about if you use AI for front‑end generation or "vibe coding": it suggests Gemini 3 Pro is not only competent at layout but also at producing more varied, less template‑y UI than earlier systems. It won’t replace designers, but it changes who you ask to rough in complex interfaces or 3D‑heavy landing pages.
GeoBench: Gemini 3 Pro beats pro GeoGuessr players
The new GeoBench eval, which tests GeoGuessr‑style country localization from Street View, shows Gemini 3.0 Pro Preview with 84% country‑level accuracy, an average score of 4,145, and a median distance error of 144 km on an "easy world" map geobench results. In the same setup, a professional GeoGuessr player averaged 4,100 points with a 220 km median distance, meaning Gemini 3 Pro is the first LLM to surpass a human expert on this task geobench results.
For applied teams, this is less about GeoGuessr and more about how far multimodal models have come in reading geography, infrastructure, and signage cues from images. It hints that location‑aware agents for logistics, mapping, or even OSINT workflows can lean more on end‑to‑end vision models instead of custom heuristics.
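To put the point totals in context, GeoGuessr‑style scoring decays roughly exponentially with distance; the widely cited community approximation for the world map (unofficial, so treat the constant as an assumption) looks like this:

```python
import math

# Community-documented approximation of GeoGuessr's per-round score on the
# world map; the 1492.7 km constant is reverse-engineered, not official.
def approx_score(distance_km: float) -> float:
    return 5000 * math.exp(-distance_km / 1492.7)

print(round(approx_score(144)))  # ~4540 for Gemini 3 Pro's median 144 km miss
print(round(approx_score(220)))  # ~4315 for the human pro's median 220 km miss
```

Average scores land below these per‑round ceilings because occasional large misses drag the mean down.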