Third-party MRCR v2 results put Claude Opus 4.6 at a 78.3% match ratio at 1M tokens, ahead of Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. If you are testing long-context agents, measure retrieval quality and task completion, not just advertised context window size.

Anthropic has moved 1M-token context from preview status into general availability for Claude Opus 4.6 and Claude Sonnet 4.6. In the same announcement thread, the rollout post says the models can ingest up to 600 images, which expands the practical input budget beyond text-heavy agent runs.
The rollout is already showing up in developer tooling. In a Claude Code screenshot, Opus 4.6 appears as "Opus 4.6 (1M context) · Claude Max," while a supporting roundup also describes the 1M window as generally available for both 4.6 models. That matters operationally because it turns long-context testing into something engineers can actually run inside coding workflows rather than a limited-access benchmark claim.
The strongest signal in the evidence is not the 1M number but the retrieval curve. The MRCR v2 chart shows Opus 4.6 at 91.9% mean match ratio at 256K tokens and 78.3% at 1M, while Sonnet 4.6 drops to 65.1%, GPT-5.4 to 36.6%, and Gemini 3.1 Pro to 25.9%. The same chart notes Gemini's numbers were measured by Context Arena on the same benchmark rather than using vendor self-report, and OpenAI's score is a "bin average" across 128K-256K rather than a single fixed context length.
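The "mean match ratio" metric behind these numbers can be approximated with a string-similarity score between the reference text and what the model retrieves. A minimal sketch, assuming a SequenceMatcher-style ratio (the exact MRCR v2 scoring may differ):

```python
from difflib import SequenceMatcher

def match_ratio(expected: str, actual: str) -> float:
    """Similarity in [0, 1] between the reference needle and the model's retrieval."""
    return SequenceMatcher(None, expected, actual).ratio()

def mean_match_ratio(pairs: list[tuple[str, str]]) -> float:
    """Average match ratio over (expected, actual) pairs at one context length."""
    return sum(match_ratio(e, a) for e, a in pairs) / len(pairs)

# Toy run: one perfect retrieval, one partial retrieval.
pairs = [
    ("write a poem about otters", "write a poem about otters"),
    ("write a poem about otters", "write a poem about cats"),
]
print(round(mean_match_ratio(pairs), 3))
```

Running this at several context lengths (256K, 512K, 1M) produces exactly the kind of retrieval curve the chart plots, which is why per-length scores are more informative than a single bin average.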
That distinction is why retrieval quality matters more than raw advertised window size. As one reply puts it, "If you can't get accurate retrieval," the benefit of a larger window is limited. The supporting comparison post also calls out GPT-5.4 as a "regression" on long-context behavior relative to an earlier OpenAI chart at 256K, reinforcing that context-window expansion does not automatically preserve recall.
The most concrete field report here is a side-by-side CLI comparison. In that screenshot, Gemini CLI warns that sending a message "might exceed the context window limit," while Claude Code continues processing with Opus 4.6 at 1M context. The author claims Claude handled a larger project prompt that Gemini would not submit, which is anecdotal, but the screenshot does show a real difference in tool behavior under large inputs.
For code agents, that changes the failure mode. Instead of deciding which repo slice, docs subset, or prior conversation chunk to drop, teams can keep more of the working set in one session. Another practitioner post frames it as "entire codebases" and "complete conversation history" staying in context for complex refactors. Independent eval work is still catching up, though: one evaluator said his team hit "a couple speed bumps" while trying to get Opus 1M working in their own test setup.
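The truncation decision described above can be made concrete. A minimal sketch of a context-budget packer, using a hypothetical chars-per-token heuristic (a real agent would use the model's tokenizer):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 chars per token); replace with the model's tokenizer in practice.
    return max(1, len(text) // 4)

def pack_context(files: dict[str, str], budget: int) -> tuple[list[str], list[str]]:
    """Greedily include files (in priority order) until the token budget is spent.

    Returns (included, dropped). With a 1M-token budget many repos fit whole;
    with a smaller window this is the "which slice to drop" decision.
    """
    included, dropped, used = [], [], 0
    for name, text in files.items():
        cost = estimate_tokens(text)
        if used + cost <= budget:
            included.append(name)
            used += cost
        else:
            dropped.append(name)
    return included, dropped

files = {"main.py": "x" * 4000, "docs.md": "y" * 8000, "history.txt": "z" * 2000}
small_in, small_out = pack_context(files, budget=2500)       # tight window: must drop
large_in, large_out = pack_context(files, budget=1_000_000)  # 1M-style window
print(small_out, large_out)
```

With the tight budget the packer drops `docs.md`; with the 1M-style budget nothing is dropped, which is the failure-mode change the post describes.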
Claude can now drive macOS apps, browser tabs, the keyboard, and the mouse from Claude Cowork and Claude Code, with permission prompts when it needs direct screen access. That makes legacy desktop workflows automatable, and Anthropic is pairing the push with more background-task support for longer agent loops.
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
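The pruning idea behind an index like that can be sketched in a few lines: break each file into trigrams, store them in a per-file Bloom filter, and run the actual scan only on files whose filters might contain every trigram of the query. This is a simplified sketch (literal queries only; a production index would extract required literals from arbitrary regexes and use tuned filter parameters):

```python
import hashlib
import re

def trigrams(s: str) -> set[str]:
    """All overlapping 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

class BloomFilter:
    """Tiny Bloom filter: k hash positions per item over a fixed bit array."""
    def __init__(self, size: int = 4096, hashes: int = 3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, item: str):
        for i in range(self.hashes):
            h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "big") % self.size

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        # False positives possible, false negatives impossible.
        return all(self.bits[p] for p in self._positions(item))

def build_index(files: dict[str, str]) -> dict[str, BloomFilter]:
    index = {}
    for name, text in files.items():
        bf = BloomFilter()
        for g in trigrams(text):
            bf.add(g)
        index[name] = bf
    return index

def search(index: dict[str, BloomFilter], files: dict[str, str], literal: str) -> list[str]:
    # Prune: a file can match only if its filter might contain every query trigram.
    candidates = [n for n, bf in index.items()
                  if all(bf.might_contain(g) for g in trigrams(literal))]
    # Verify: run the real scan only on surviving candidates.
    return [n for n in candidates if re.search(re.escape(literal), files[n])]

files = {"a.py": "def instant_grep(): pass", "b.py": "print('hello')"}
index = build_index(files)
print(search(index, files, "grep"))
```

The Bloom stage can admit false positives but never false negatives, so the final scan stays exact while most files are skipped entirely; that is the candidate-retrieval speedup described above.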
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
Anthropic consistently manages to impress, particularly with its ability to maintain attention and reasoning across extended contexts. It even outperforms Google, the former context king, by a wide margin.
This is huge, literally. 1M context is now generally available for Claude Sonnet/Opus 4.6, and it can ingest 600 images. So it can "watch" a video: take a video, split out the frames, then upload them. Not sure how big the images can be; probably small, to leave room for a conversation.
Claude Opus 4.6 with 1M context in Claude Code is a massive upgrade. 5x the context window we had before yesterday's update. Entire codebases. Full documentation. Complete conversation history. All in context. No more truncation. No more lost context mid-session. This changes …