ChatGPT Agent Mode opens to 3 paid tiers – 4.5× faster on Sudoku

Executive Summary

OpenAI just flipped on Agent Mode (Atlas) inside ChatGPT for Plus, Pro, and Business, turning the product from a chat window into a click-and-act assistant. It matters because Atlas works directly in the browser (researching, planning, and executing steps) without the glue code most agents demand. Early tests are mixed: one study finds it solves medium Sudoku about 4.5× faster than a human baseline, but it stumbles on reflex-timing games like Chrome's T-Rex Runner and Flappy Bird. Windows support is missing in this preview, and the rollout follows a brief pause on Atlas extensions for security.

Hands-on users say the basics (navigating, reading, simple clicks) feel solid, but Atlas often stalls when composing or formatting inside DOM-heavy web apps. The new "thinking" view doesn't help much either; auto-scroll keeps yanking you to the bottom, making the reasoning trace hard to audit mid-run. Power users comparing it to Perplexity's Comet argue there's "no reason to switch" yet unless Atlas proves better at real tasks, especially content creation and edit flows.

If you're eyeing desktop agents, note the parallel track: OpenAI's Codex CLI added an experimental Windows sandbox this week, hinting at tighter guardrails coming to agent operations even as Atlas's own Windows build sits out this preview.

Feature Spotlight

Feature: ChatGPT Agent Mode goes hands-on

ChatGPT Agent Mode (Atlas) enters preview for Plus/Pro/Business, enabling agents to research, plan and act in-browser. Early evals show strengths in logic tasks but gaps in real-time control; broad user feedback begins.

Cross-account focus today: OpenAI's Agent Mode (Atlas) opens preview to Plus/Pro/Business. Threads include real usage, UX feedback, and an early web-games eval; strong Sudoku, weak reflex timing. This section owns all Atlas items.

🧭 Feature: ChatGPT Agent Mode goes hands-on

ChatGPT Agent Mode opens preview to Plus, Pro and Business users

OpenAI flipped on Agent Mode in ChatGPT (Atlas) for paid accounts, enabling agents to research, plan, and take actions while you browse (OpenAI announcement). The rollout follows an extensions pause that temporarily disabled Atlas browser extensions for security.

Hands-on prompts and early testing are already circulating among power users (hands-on try).

Paper: Atlas aces medium Sudoku ~4.5× faster than humans but struggles on reflex timing games

A new study probes ChatGPT Atlas as a web-game agent: it cleanly solves medium Sudoku roughly 4.5× faster than a human baseline, but falters on real-time tasks like Chrome's T-Rex Runner and Flappy Bird because of their precise timing demands (paper summary). The work maps strengths to rule-based logic and weaknesses to long-horizon control and physics.

  • Strong: Sudoku and other logic puzzles, with fast, consistent execution (paper summary)
  • Weak: Reflex timing, strict geometry, and open-world task chains, with frequent early crashes or stalls (paper summary)

Early takes pit Atlas against Perplexity Comet; Windows support called out as missing

Practitioners testing ChatGPT Atlas Agent Mode compare it to Perplexity's Comet, arguing there's "no reason to switch" unless Atlas proves better, and noting it isn't available for Windows yet in this preview (comparative take). Trial prompts are circulating to kick the tires on real tasks (hands-on try).

Power users say Atlas stalls on DOM-heavy creation tasks despite basic browsing working

Hands-on reports praise Atlas for simple clicks and navigation but flag that it "gets stuck" when adding, formatting, or creating content inside complex web apps with richer DOM composition (power-user feedback). Testers want stronger actions for editing and composing, not just reading and clicking.

Thinking trace auto-scroll frustrates Atlas users trying to read reasoning history

Early UX feedback says the new "thinking" view auto-scrolls to the bottom with each entry, making it hard to review the ongoing reasoning trace during a run (UX note). Users are asking for better controls to pause or browse intermediate thoughts without fighting the scroll.


🏗️ AI infrastructure: campuses, energy and financing

Infra news dominated by OpenAI's 1+ GW Stargate campus in Michigan plus Amazon's Anthropic site switch-on, Meta's 1 GW solar deals, and debt-financed capex. Excludes Atlas (covered in Feature).

Amazon switches on Indiana AI campus for Anthropic with >500k Trainium 2, targeting 2.2 GW buildout

Amazon's New Carlisle, Indiana site dedicated to Anthropic is now live, running on 500,000+ Trainium 2 chips and planned to span 30 buildings with 2.2 GW when complete (News summary), following up on the earlier Rainier site coverage, which flagged the massive chip count and power envelope. The project flips former cornfields into a multi-billion-dollar AI compute hub in roughly a year, reinforcing AWS's push to vertically own AI training capacity for key partners.

OpenAI picks Michigan for >1 GW Stargate campus; "largest investment in state history"

OpenAI will build a gigawatt-scale Stargate data center in Saline Township, with construction targeted for early 2026, 2,500 union construction jobs, ~450 permanent roles, and closed-loop water usage with no Great Lakes draw (Local coverage). The company also outlined the multi-site Stargate program in its post, underscoring a US-based AI infrastructure buildout (OpenAI blog).

Debt wave funds AI buildout: AI capex now ~25% of US IG bond supply; Meta $30B, Oracle $18B, RPLDCI $27B

Bank of America data shows borrowing to fund AI data centers exploded in September–October, with AI now ~25% of US investment-grade bond supply; recent highlights include Meta $30B, Oracle $18B, and Related Digital $27B (Debt chart). Meta is also prepping another $25B sale as it frontloads ASI-oriented capex (Bond sale plan). The financing mix concentrates the cheapest capital with incumbents that can match long-lived contracts to chip lifecycles.

Samsung and NVIDIA to build AI "mega-factory" with 50k GPUs; cuLitho targets ~20× faster computational lithography

Samsung and NVIDIA will stand up a GPU-powered AI factory to run fab digital twins, speed chip design, and accelerate optical proximity correction with cuLitho (claimed ~20× faster), while integrating Blackwell/Jetson Thor in factory robotics (WSJ summary). Running core chipmaking workloads on GPUs instead of CPU clusters signals a structural compute shift inside semiconductor manufacturing itself.

TSMC clears ~$49B A14 fab in Taichung for 1.4 nm; mass production targeted 2H'28

TSMC received permits for its A14 fab and utility buildings in Taichung, targeting 1.4 nm with ~15% higher speed at iso-power or ~25–30% lower power at iso-performance versus 2 nm, risk runs in 2027, and volume production in 2H'28 (Local news summary). The node claims performance-per-watt gains critical to AI accelerator cost curves, while avoiding High-NA EUV reduces tool risk.

UBS model projects NVIDIA unit mix through 4Q26 with GB200 ramp and Rubin CPX on the horizon

A UBS unit-mix chart outlines NVIDIA shipments by accelerator family through late 2026, with GB200 and later B300/GB300 gaining share as H100/H200 fade (UBS chart). The mix implies continued supply-chain pressure shifting toward Blackwell-class parts and previews when next-gen Rubin CPX enters the curve.

Google Cloud ascends on AI; Alphabet guides $91–$93B 2025 capex and signals larger 2026 build

Alphabet's cloud arm has flipped from laggard to growth driver on AI demand, with management guiding $91–$93B 2025 capex and warning of an even bigger 2026 build (Reuters analysis). Google's strategy leans on TPUs opened to external labs, signing nine of ten leading AI shops and anchoring future AI workload siting.

Meta stock falls 11% as 2025 AI capex lifted to $70–$72B; investors question near-term ROI

Despite beating Q3 estimates, Meta's shares dropped 11% after it raised 2025 capex to $70–$72B to pursue superintelligence, with even larger outlays signaled for 2026 (CNBC summary). The reaction underscores market sensitivity to open-ended AI spending plans absent concrete service monetization timelines.

Michigan officials detail Stargate jobs and environmental protections for OpenAI campus

Governor Whitmer's office frames the Stargate project as the state's biggest single investment, citing 2,500 union construction jobs, ~450 on-site roles, a closed-loop cooling system, and no Great Lakes water draw (Local coverage). The permitting-friendly footprint and community funds attached to the project illustrate how AI campuses are negotiating local acceptance.

RPO and depreciation math split AI capex into two cycles: near-term contracted vs speculative builds

Financial Times analysis highlights diverging contract quality and unit economics: Microsoft's ~$400B in remaining performance obligations (RPO) with ~2-year duration converts faster to cash, while others carry longer, lumpier exposure; rising depreciation and amortization (e.g., to ~16.8% of revenue) tightens margin control as short-lived AI gear fills data centers (FT analysis). The result is a short-cycle, backlog-anchored boom alongside a longer-cycle speculative build that assumes future demand.


🛠️ Builder tooling: coding agents and research assistants

Big day for agent/dev tools outside Atlas: Cline's native tool calling and approvals, Claude Code's installer + update, Opera's deep research, Kimi CLI with MCP, and Vercel Agent investigations. Excludes Atlas Feature.

Codex CLI v0.53 adds experimental Windows filesystem/network sandbox

OpenAI's Codex CLI v0.53 introduces a highly experimental Windows sandbox for workspace-scoped writes and controlled networking, with an on-request approval mode and a known caveat for world-writable folders (sandbox brief, GitHub discussion). This ships days after the prior improvements (CLI update) that focused on undo and stability.

Claude Code v2.0.31: Vertex web search, Shift+Tab on Windows, and MCP fixes

The 2.0.31 release changes the Windows mode-switch shortcut to Shift+Tab, adds Web Search on Vertex, honors VS Code .gitignore by default, and fixes subagent/MCP tool-name conflicts, compaction errors, and plugin uninstall behavior (changelog).

Small ergonomics like /compact reliability and duplicate-summary fixes target long-running agent threads (changelog).

Kimi CLI tech preview: shell UI with command exec, Zsh integration, and MCP

Moonshot released Kimi CLI (technical preview), a terminal-native coding agent featuring a shell-like UI, direct command execution, seamless Zsh integration, MCP support, and an Agent Client Protocol for broader tooling (feature brief).

This lowers friction for agent-assisted coding and automations directly from the console (feature brief).

Vercel Agent adds automated 'Investigations' for incidents; $100 credit for new users

Vercel Agent can now auto-detect anomalies and run AI-driven investigations that correlate telemetry and propose remediation steps, aiming to cut MTTR for production issues; new users get $100 in credits (blog post, Vercel blog). This pushes agentic ops beyond static alerts toward root-cause analysis as a built-in workflow.

FactoryAI Droid can import Claude agents directly from .claude/agents

Droid now supports "Import from Claude (.claude/agents)", making Claude agents portable into Droid's runtime without re-authoring (feature screenshot).

This shrinks setup time for teams standardizing on Claude Skills while experimenting with alternative orchestrators.

LangChain earns AWS Generative AI Competency; LangSmith now on AWS Marketplace

LangChain joined AWS's Generative AI Competency program and listed LangSmith on AWS Marketplace, enabling agent-engineering workflows (tracing, evals, deployments) with ISV Accelerate alignment for co-sell (partner update).

The move eases procurement and governance for teams standardizing on Bedrock, SageMaker, and AWS data services.

LlamaIndex ships native MCP search so coding agents can query its docs directly

LlamaIndex added a native MCP search endpoint for its documentation, letting MCP-enabled coding agents call search tools directly (no custom glue), which simplifies agent builds that need API-accurate context (docs update). This pairs well with editor agents that plan, retrieve, and cite within the same run.

Ollama v0.12.8 boosts Qwen3-VL and engine stability; desktop adds reasoning-effort control

Ollama 0.12.8 improves Qwen3-VL performance (FlashAttention default, better transparency handling) and engine prompt processing; Windows now ignores unsupported iGPUs (release notes, GitHub release). The desktop app also exposes per-chat "reasoning effort" selection to trade speed vs depth (desktop UI).

Opera rolls out Deep Research Agent in Neon for long-form web analysis

Opera launched ODRA (Opera Deep Research Agent) in the Opera Neon browser, packaging sourcing, summarization, and deeper multi-page analysis as a built-in research assistant (feature brief). This puts an agentic researcher directly into a mainstream browser without extensions, useful for competitive/market scans and literature reviews.

Perplexity launches 'Patents' agent for IP research, free in beta to subscribers

Perplexity rolled out a Patents agent that structures and searches IP corpora as a guided research workflow, available free in beta for subscribers (feature recap). It's a targeted assistant for prior-art checks and technology landscaping inside a familiar research UX.


🧪 Models: 'thinking' Qwen and multimodal Nemotron on vLLM

Selective model updates relevant to builders: Qwen3 Max Thinking hits arenas and Nemotron Nano 2 VL arrives on vLLM. Runtime-only updates (e.g., Ollama engine) live in Systems, not here.

Qwen3 Max Thinking appears in LM Arena, signaling release

The 'thinking' variant of Qwen3 Max surfaced in LMSYS Arena, with community posts indicating rollout is underway and broader evals imminent (Arena update, release note, release hint). In the context of the Ollama Qwen3-VL item, which added the VL lineup locally, this brings Qwen's reasoning-first tier into public head-to-heads.

Expect rapid informal benchmarking across math, coding, and agent workflows as Arena datapoints accumulate; an earlier heads-up also flagged "within hours" timing for the drop (release tease).

vLLM adds NVIDIA Nemotron Nano 2 VL (12B) for video and document intelligence

vLLM now serves NVIDIA's Nemotron Nano 2 VL, a 12B hybrid Transformer-Mamba VLM with 128k context and Efficient Video Sampling to cut redundant tokens on long videos, aimed at faster, accurate multimodal reasoning over multi-image docs and video (integration post, vLLM blog). Builders get an enterprise-ready path to high-throughput VLM agents, with weights offered in BF16/FP8/FP4-QAD formats and strong results on MMMU, MathVista, AI2D, and OCR-heavy tasks as outlined in the release.


🧩 Interoperability: MCP workflows and agent imports

MCP-centric moves to wire tools and agents together. Focus is on cross-tool interoperability; implementation-specific IDE features sit in Tooling.

LlamaIndex adds native MCP search endpoint for agent tooling

LlamaIndex rolled out a native MCP search endpoint so agent runtimes can call LlamaIndex-backed search tools directly, with docs live for builders (MCP search docs). The move reduces glue code and standardizes search access across MCP-compatible IDEs and orchestrators, following the Replit templates that made MCP server deployment a one-minute task.

This should simplify wiring retrieval into code assistants and research agents without bespoke adapters, and helps converge on MCP as the default interop surface for tool calls.
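
To make "calling a search tool directly" concrete, here is a minimal sketch of what an MCP tool invocation looks like on the wire. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The tool name `search_docs` and its `query` argument are hypothetical placeholders; the source does not name the tools that the LlamaIndex endpoint exposes.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run one tool."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
    )


# Hypothetical tool name and argument, for illustration only.
payload = make_tool_call(1, "search_docs", {"query": "how to build a query engine"})
decoded = json.loads(payload)
```

An MCP client library would normally build and send this message for you; the sketch only shows why no bespoke adapter is needed once both sides speak the same request shape.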

Claude Code v2.0.31 ships MCP subagent stability fixes

Anthropic's Claude Code v2.0.31 fixes an MCP edge case ("Tool names must be unique") that broke some subagent setups, alongside plugin uninstall and compaction fixes (Changelog details). A weekly roundup also highlights resumable subagents and a new Plan subagent that can pair with MCP tools (Weekly roundup).

For interop-heavy projects, the MCP bugfix unblocks multi-tool agent stacks and reduces brittle behavior when wiring several MCP servers into a single plan.

FactoryAI Droid can now import Claude agents directly

FactoryAI added "Import from Claude (.claude/agents)" to Droid, letting teams load Claude-built agents directly into Droid sessions for reuse and extension (Import menu screenshot). This reduces migration friction between ecosystems and encourages agent portability across stacks.

Practically, this makes Claude-defined workflows first-class citizens inside Droid without re-authoring skills or tools, speeding cross-tool experimentation.

Kimi CLI tech preview lands with MCP and Agent Client Protocol support

Moonshot released a Kimi CLI technical preview that combines a shell-like UI, command execution and Zsh integration with MCP server support and the Agent Client Protocol, positioning the CLI as a hub for interoperable tool use (Kimi CLI announcement).

For agent builders, native MCP in a terminal workflow means faster local prototyping of toolchains, easier testing of server capabilities, and portability across agent runtimes that speak MCP.

CopilotKit + LangGraph demo predictive state updates with human-in-the-loop sync

CopilotKit showcased "predictive state updates," wiring its real-time UI to LangGraph agents so edits flow as structured workflows (agent rewrites → human approval → live sync) rather than linear text diffs (Workflow post). This pattern makes collaborative agent edits feel native while keeping humans in control of final changes.

For engineers stitching tools, it's a practical recipe for interop between an orchestrator (LangGraph), UI state, and agent tool calls, useful where MCP tools and non-MCP services coexist.


💼 Enterprise adoption and partnerships

Signals of commercialization: Perplexity's Getty deal for licensed images, LangChain's AWS competency/Marketplace path, and Figma's Weavy acquisition for AI media pipelines.

Amazon lights up Indiana AI campus for Anthropic with >500k Trainium 2 chips and 2.2 GW plan

Amazon has activated its largest AI data center for Anthropic in New Carlisle, Indiana, running over 500,000 Trainium 2 chips and scaling to 30 buildings and a planned 2.2 GW draw (news brief), following up on the initial build item that outlined a 0.5–1.0M Trainium target this year.

The dedicated campus underscores deep, long-term buyer-supplier alignment between a hyperscaler and a frontier lab, with material implications for model training capacity and cost curves.

Perplexity signs multi-year Getty Images deal for licensed images

Perplexity struck a multi-year licensing deal with Getty Images so its AI answers can show licensed editorial and creative photos with credits and links, a notable move toward "properly attributed consent." Getty shares jumped roughly 45–50% on the news (deal coverage).

The agreement formalizes image rights for AI search and follows Perplexity's publisher rev-share program; together they point to a paid-content supply chain for AI results.

Figma buys Weavy and unveils 'Figma Weave' for AI media generation pipelines

Figma acquired Tel Aviv-based Weavy and introduced the 'Figma Weave' brand, bringing a node-based canvas that chains multiple AI models to generate and edit images/video with granular layer-level controls; Weavy will run standalone initially before deeper Figma integration (deal summary).

The move positions Figma to own more of the AI media workflow (prompting, lighting, angles, compositing) inside a designer-friendly canvas.

LangChain earns AWS Generative AI Competency; LangSmith now on AWS Marketplace

LangChain joined AWS's Generative AI Competency program and listed LangSmith on AWS Marketplace, with ISV Accelerate eligibility and "Deployed on AWS" status, giving enterprises a vetted, procurement-friendly path to agent engineering: tracing, evals, and deployments (partner badge post).

Framework-agnostic positioning means teams can adopt LangSmith with or without LangChain/LangGraph, while plugging into Bedrock, SageMaker, S3, OpenSearch, and more.

Modal and Datalab bring Marker + Surya OCR to GPUs with ~10× parsing throughput

Modal and Datalab teamed up so developers can deploy Marker + Surya OCR on GPUs in minutes, with cached weights and autoscaling that deliver roughly 10× higher parsing throughput; a hosted API backed by Modal is also available for maximum throughput (partnership post), and the setup is documented in Modal's guide (Modal blog post).

This brings a deterministic, hallucination-free document intelligence stack into an elastic, production-ready runtime.


⚙️ Systems: sandboxes and local runtimes

Serving/runtime engineering updates: Codex's Windows sandbox for safer agent runs and Ollama engine/desktop improvements for practical local workflows.

Codex CLI v0.53 adds experimental Windows sandbox for safer agent runs

OpenAI introduced an experimental filesystem and network sandbox on Windows that confines agent actions to a workspace with on-request approvals, bringing tighter guardrails to Codex runs. Following up on the v0.52 update that focused on stability, this release outlines a workspace-write mode and flags, plus a key caveat: writes remain possible in directories where the Windows Everyone SID already has write permission. See setup flags and limitations in the docs (sandbox flags), and the live docs and call for feedback via the GitHub page and discussion thread (GitHub docs, testing call).
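
As a rough illustration of what "workspace-scoped writes" means as a policy, the sketch below checks whether a requested path resolves inside a workspace root. This is an application-level toy and not how Codex implements its sandbox, which the source describes only at a high level; the function name `is_write_allowed` is hypothetical, and the Everyone SID caveat concerns Windows ACLs, which this sketch does not model.

```python
from pathlib import Path


def is_write_allowed(target: str, workspace: str) -> bool:
    """Return True only if `target` resolves to a location inside `workspace`."""
    root = Path(workspace).resolve()
    # Joining with an absolute target replaces the root, so absolute paths
    # outside the workspace are rejected by the containment check below.
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents


inside = is_write_allowed("src/main.py", "/tmp/ws")       # path stays under the root
escaped = is_write_allowed("../outside.txt", "/tmp/ws")   # ".." climbs out of the root
absolute = is_write_allowed("/etc/passwd", "/tmp/ws")     # absolute path elsewhere
```

Resolving before comparing is the important step: a naive string-prefix check would accept `../outside.txt`. A real sandbox enforces this at the OS level rather than trusting the agent process to call a helper.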

Ollama v0.12.8 boosts local Qwen3-VL with FlashAttention and engine fixes

Ollama shipped v0.12.8 with Qwen3-VL performance upgrades (FlashAttention enabled by default), faster prompt processing, and engine fixes such as better handling of transparent images and ignoring unsupported integrated GPUs on Windows. Release notes also mention app fixes like properly stopping a model before removal and correcting DeepSeek thinking toggles in the new desktop app (release notes), with full details in the changelog (GitHub release).

Northflank microVMs help scale secure production sandboxes during heavy launch traffic

cto.new reports moving to Northflank's microVMs to scale secure agent sandboxes through a surge, citing per-second billing, API-driven provisioning, and thousands of daily container deployments without performance hits. The case study highlights a pragmatic path to isolate workloads and smooth spiky demand for agent workflows (case study post), with deployment details in the provider write-up (Northflank blog).

Ollama desktop adds per-chat "reasoning effort" and model picker controls

The new Ollama desktop UI exposes a per-chat "reasoning effort" selector (e.g., Medium) alongside model choice, letting users trade latency and accuracy on the fly without leaving the conversation. This is a practical knob for local runs when switching between lightweight and more deliberate modes, captured in the updated toolbar screenshot (desktop UI screenshot).


🛡️ Safety, abuse and rights

Policy and threat-intel notes: music rights groups align on AI registration rules; a separate post shows automated botnet detection in production. Sandbox tech lives in Systems.

ASCAP, BMI, SOCAN align on registering partly AI-made songs; pure-AI works remain ineligible

North America's three major performing rights organizations (PROs) will now accept registrations of musical works with meaningful human authorship that incorporate AI-generated elements, while works created entirely by AI remain ineligible. The groups also reiterate that training on copyrighted music without authorization is infringement and point to ongoing lawsuits against AI firms (Policy overview).

  • Policies center human authorship as the basis for rights while creating a path to credit and payment when AI tools are used in production (Policy overview).

Vercel BotID auto-blocks sophisticated botnet in ~5 minutes after 500% traffic spike

Vercel says its BotID Deep Analysis detected a sudden 500% traffic surge from a coordinated bot network, identified ~40–45 spoofed browser profiles rotating through proxy nodes, and automatically re-verified and blocked the sessions within about five minutes, with no customer action required (Incident report, Vercel blog).

  • The system flagged human-like fingerprints and behavior, then used correlation across browser profiles and proxies to classify the attack before enforcing blocks (Vercel blog).

🧠 Training recipes: precision, adapters, and looping

Practitioner debates and papers on training and reasoning: FP16 vs BF16 for RL-FT stability, zero-latency fused adapters, and ByteDance LoopLM tradeoffs.

Engineers push FP16 over BF16 in RL fine-tuning to cut train/infer divergence

Practitioners argue FP16's 10 mantissa bits (vs BF16's 7) reduce policy drift between training and inference in RL fine-tuning by improving numerical agreement of kernels and absorbing rounding noise (practitioner thread). The same thread later corrects the plot source while keeping the core claim intact, underscoring rising interest in precision choices for stability (plot correction), with others signaling imminent switches to FP16 in production training loops (engineer comment). See the paper cited in the discussion for additional context on precision trade-offs (ArXiv paper).
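
The mantissa-width argument is easy to see numerically. The sketch below rounds the same value to each format using only the standard library: `struct`'s `e` format gives IEEE half precision, and bfloat16 is emulated by keeping the top 16 bits of the float32 encoding with round-to-nearest-even. The helper names are illustrative, and the bfloat16 emulation is an assumption of this example that ignores NaN and infinity edge cases.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value (10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_bf16(x: float) -> float:
    """Round x to bfloat16 (7 mantissa bits) via the top 16 bits of float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that get dropped.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


x = 1.001
fp16_value = round_to_fp16(x)   # 1.0009765625, i.e. 1 + 2**-10
bf16_value = round_to_bf16(x)   # 1.0, because the spacing near 1 is 2**-7
fp16_error = abs(fp16_value - x)
bf16_error = abs(bf16_value - x)
```

Near 1.0 the representable values are 2^-10 apart in FP16 and 2^-7 apart in BF16, so a small change in a probability ratio can survive FP16 rounding yet vanish entirely in BF16. BF16's advantage is its wider exponent range, which this example does not exercise.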

Samsung's zFLoRA fuses adapters for zero-latency fine-tuning

Samsung Research introduces zFLoRA, a fused low-rank adapter that merges adapter weights into base layers, effectively eliminating the extra matmuls and memory traffic that make classic LoRA slower; LoRA can add up to ~2.5× prefill and ~1.6× decode latency (paper abstract). Results across 18 tasks on 1B/3B/7B models show accuracy comparable to LoRA and near full fine-tuning, with latency measured on H100 GPUs and NPUs remaining close to base model runtime (paper abstract).
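
To see where the extra matmuls in classic LoRA come from, the toy below compares the unmerged path, W·x + B·(A·x), with a single matmul against a pre-folded weight W + B·A. This shows only the generic LoRA identity that motivates fusing; it is not zFLoRA's actual design, which the source does not detail. The matrices and helper names are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Add two matrices of the same shape element-wise."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


W = [[1.0, 2.0], [3.0, 4.0]]   # base weight, 2x2
B = [[0.5], [-1.0]]            # adapter up-projection, 2x1 (rank 1)
A = [[2.0, 0.0]]               # adapter down-projection, 1x2
x = [[1.0], [1.0]]             # input column vector

# Unmerged adapter: the base matmul plus two extra adapter matmuls per call.
y_unmerged = add(matmul(W, x), matmul(B, matmul(A, x)))

# Folded adapter: pay for B @ A once offline, then one matmul per call.
W_folded = add(W, matmul(B, A))
y_folded = matmul(W_folded, x)
```

Both paths give the same output, `[[4.0], [5.0]]`, so the adapter's latency cost is an artifact of how it is executed rather than of what it computes.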

ByteDance's LoopLM Ouro trades recurrence for depth; small models gain, no extrapolation beyond T=4

Ouro 1.4B/2.6B, trained on 7.7T tokens, repeatedly applies the same transformer stack for T recurrent steps (trained at T=4), learning multi-hop tasks with fewer examples and adding a learned early-exit gate for easier inputs (analysis thread). The trade-offs: 4× FLOPs at T=4 inference, no accuracy gains when pushing recurrence beyond the trained depth, and standard untied-depth transformers win in compute-matched comparisons, though LoopLMs look strong per-parameter and under memory/KV constraints (analysis thread).
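
The control flow of a looped model with early exit can be sketched in a few lines. Here a single shared `block` is applied up to `max_steps` times and a `gate` decides when to stop. Both are toy scalar functions standing in for the transformer stack and the learned gate; all names and the threshold are assumptions of this example, not details from the paper.

```python
def looped_forward(x, block, gate, max_steps=4, exit_threshold=0.5):
    """Apply the same block repeatedly, stopping early when the gate fires."""
    steps = 0
    for _ in range(max_steps):
        x = block(x)          # same weights reused on every iteration
        steps += 1
        if gate(x) >= exit_threshold:
            break             # easy input: skip the remaining recurrent steps
    return x, steps


def toy_block(h):
    """Stand-in for the shared stack: contracts toward the fixed point 2.0."""
    return 0.5 * h + 1.0


def toy_gate(h):
    """Stand-in for the learned gate: fires once the state is close to 2.0."""
    return 1.0 if abs(h - 2.0) < 0.3 else 0.0


hard_output, hard_steps = looped_forward(0.0, toy_block, toy_gate)   # (1.75, 3)
easy_output, easy_steps = looped_forward(1.9, toy_block, toy_gate)   # exits after 1 step
_, full_steps = looped_forward(0.0, toy_block, lambda h: 0.0)        # never exits: 4 steps
```

Compute scales with the number of block applications, so running all four steps costs about four times a single pass, which matches the 4× FLOPs figure above, while the parameter count stays that of one stack.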

CISPO RL loss fixes clipping-induced CoT collapse, enabling longer reasoning chains

Authors recount how off-policy PPO clipping suppressed low-probability "thinking tokens" (e.g., "wait," "but," "let me"), stunting chain-of-thought growth; CISPO restores gradient flow when advantages are positive while retaining stability, leading to on-policy-like length gains without divergence (origin thread). A unified formulation that covers REINFORCE and PPO is presented, with reports of near-R1 performance on Qwen2.5-32B in internal runs and detailed derivations of the masking and clipping behavior (math details, Zhihu post).
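
The difference can be illustrated with the per-token coefficient that multiplies the gradient of log π. The sketch assumes the commonly described forms of the two losses: PPO's clipped surrogate contributes no gradient once the ratio leaves the trust region in the direction the advantage favors, while CISPO clips the importance weight, treats it as a constant, and keeps the log-probability gradient. The thread's exact formulation is not reproduced here, and the function names and epsilon values are illustrative.

```python
def clip(value, low, high):
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(high, value))


def ppo_grad_coeff(ratio, advantage, eps=0.2):
    """Coefficient on grad log-prob under the PPO clipped surrogate."""
    clipped_out = (advantage > 0 and ratio > 1 + eps) or (
        advantage < 0 and ratio < 1 - eps
    )
    # Once the clipped branch is the active one, the token gets no gradient.
    return 0.0 if clipped_out else ratio * advantage


def cispo_grad_coeff(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """Coefficient when the clipped ratio is a stop-gradient weight."""
    return clip(ratio, 1 - eps_low, 1 + eps_high) * advantage


# A rare "thinking" token whose probability tripled, with positive advantage.
ppo_rare = ppo_grad_coeff(3.0, 1.0)       # 0.0: the token is masked out
cispo_rare = cispo_grad_coeff(3.0, 1.0)   # 1.2: bounded, but still learning

# Inside the trust region the two agree.
ppo_inside = ppo_grad_coeff(1.1, 1.0)
cispo_inside = cispo_grad_coeff(1.1, 1.0)
```

Low-probability tokens are the ones most likely to show large ratios after a few off-policy updates, which is why masking them hits reflective tokens hardest. Bounding the weight instead of zeroing it keeps the update size controlled.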


🗂️ Agent data: RAG retrievers and high-throughput parsing

New retrieval assets and parsing infra: NVIDIA's Nemotron RAG family, Datalab Marker on Modal GPUs, and a patents-focused agent for Perplexity subscribers.

Marker on Modal GPUs delivers ~10× document parsing throughput

Modal and Datalab launched a turnkey deployment for the Marker + Surya OCR stack: cache weights, spin up on GPUs in under five minutes, and autoscale to handle spikes, yielding roughly 10× higher throughput for structured document extraction versus CPU baselines (Collab note, Blog post). Teams that don't want to self-host can also use Datalab's hosted Marker API, which runs on Modal's GPU backend for maximum throughput (Hosted API note).

NVIDIA posts Nemotron RAG collection with text, multimodal, layout and "Omni" retrievers

NVIDIA released a suite of retrieval models on Hugging Face covering text retrievers, multimodal retrievers, layout detectors, and new "Omni" retrievers that span image, text, and audio. They are licensed for commercial use, making them drop-in building blocks for RAG systems (Model roundup, Hugging Face collection). The "Omni" variants broaden modalities for retrieval pipelines, useful for enterprise document and media search (Omni retrievers).

OpenRouter launches cross-provider embeddings directory

OpenRouter introduced a browsable catalog of embedding models across providers (useful for search, reranking, and vector-DB pipelines), exposing pricing, limits, and quick filtering in one place (Release note, Model directory). The listing makes it easier to trial alternatives without provider lock-in (Browse page).

Perplexity debuts 'Patents' agent for IP research

Perplexity added a patents-focused agent that streamlines intellectual property research workflows, with advanced capabilities available free during the beta for subscribers (Feature note). The move expands RAG-style retrieval into structured patent corpora for due-diligence and competitive analysis.


📊 Evals and capability tracking

Measurement items outside of Atlas Feature: corrected GPT-5 scoring deltas and a quarterly landscape showing GPT-5 (high) retaking the top spot. No other model launch repeats here.

EpochAI fixes GPT-5 scoring bug; 'high' now edges 'medium', tie on ECI

EpochAI corrected an Inspect evaluations bug that was silently forcing GPT-5 calls set to "high" reasoning down to "medium." Updated runs show GPT-5 (high) slightly ahead of GPT-5 (medium) on several benchmarks, while the two are now tied on the Epoch Capabilities Index (ECI). See benchmark bars and error bars in the update (corrected scores). The root cause was an outdated Inspect version that ignored the "reasoning effort" parameter for OpenAI models unless the name began with "o" (e.g., o3); upgrading Inspect fixed it (bug cause).

  • Notable deltas (high vs medium): OTIS Mock AIME 2024–2025 (~92% vs ~87%), GPQA Diamond (~85% vs ~83%), FrontierMath T4 (~13% vs ~9%) (corrected scores).
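
The failure pattern described above, a parameter forwarded only when the model name matches a prefix, can be sketched as follows. This is a hypothetical reconstruction for illustration, not Inspect's actual code; the function names and request shape are invented.

```python
from typing import Optional


def build_request_buggy(model: str, reasoning_effort: Optional[str]) -> dict:
    """Forward reasoning effort only for names starting with 'o' (the bug pattern)."""
    request = {"model": model}
    if reasoning_effort is not None and model.startswith("o"):
        request["reasoning_effort"] = reasoning_effort
    return request


def build_request_fixed(model: str, reasoning_effort: Optional[str]) -> dict:
    """Forward reasoning effort whenever the caller sets it."""
    request = {"model": model}
    if reasoning_effort is not None:
        request["reasoning_effort"] = reasoning_effort
    return request


dropped = build_request_buggy("gpt-5", "high")   # effort silently omitted
kept_o3 = build_request_buggy("o3", "high")      # works for o-series names
kept_gpt5 = build_request_fixed("gpt-5", "high")
```

When the field is omitted the provider's default applies, so a run labeled "high" executes at the default effort without raising any error. That is what makes this kind of bug hard to spot from scores alone.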

Quarterly State of AI: GPT-5 (high) leads; US and China dominate model releases

Artificial Analysis' latest quarterly landscape shows GPT-5 (high) retaking the top spot on their intelligence index, with big tech pushing across modalities while smaller challengers specialize. The report also highlights U.S. and China dominance in new model releases, with relatively few entrants from elsewhere (report highlights, website report).

  • Modality spread: incumbents build across text, vision, audio, and agents; challengers focus on niche strengths (report highlights).

📚 Research: computer use, decoding, memory and video reasoning

Fresh papers beyond training recipes: Surfer 2 cross-platform computer use agents, AutoDeco end-to-end decoding control, geometric memory in sequence models, and video zero-shot reasoning limits.

Surfer 2 unifies web/desktop/mobile computer-use agents, beating prior systems

A new paper introduces Surfer 2, a single agent architecture that generalizes computer use across the web, desktop, and mobile while outperforming earlier systems on accuracy and task completion (paper abstract).

Following the earlier Copilot boost item on sandboxed Windows 365 computer use, this result offers a research baseline for cross-platform action grounding and UI policy learning with stronger generalization than prior single-environment agents.

AutoDeco lets LLMs learn their own decoding policy, moving beyond hand-tuned strategies

“The End of Manual Decoding” proposes AutoDeco, an architecture where a model learns to control its own decoding strategy, selecting sampling modes and constraints end-to-end, rather than relying on fixed heuristics (e.g., temperature, nucleus thresholds) paper screenshot.

The approach aims to reduce train–inference mismatch and brittle prompt-level tuning by integrating decoding choices into the learned policy itself; details include a controller that adapts decoding parameters based on context and objective feedback loops.
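
As a rough illustration of context-dependent decoding, the sketch below lets a small controller choose the temperature and nucleus threshold at each step instead of fixing them by hand. It assumes those two parameters are what gets controlled; `toy_controller` and its fixed weights are hypothetical stand-ins for learned components, and this is not the paper's architecture.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits, temperature):
    """Temperature-scaled softmax with max-subtraction for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def toy_controller(hidden, w_temp, w_top_p):
    """Hypothetical controller: map a context vector to (temperature, top_p).

    The weights are fixed here; in a learned system they would be trained.
    """
    t_score = sum(h * w for h, w in zip(hidden, w_temp))
    p_score = sum(h * w for h, w in zip(hidden, w_top_p))
    temperature = 0.1 + 1.4 * sigmoid(t_score)  # squashed into (0.1, 1.5)
    top_p = 0.5 + 0.5 * sigmoid(p_score)        # squashed into (0.5, 1.0)
    return temperature, top_p


def sample_step(logits, hidden, w_temp, w_top_p, rng):
    """Sample one token id using controller-chosen decoding parameters."""
    temperature, top_p = toy_controller(hidden, w_temp, w_top_p)
    nucleus = top_p_filter(softmax(logits, temperature), top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    token = sample_step([2.0, 1.0, 0.0], [0.2, -0.1], [1.0, 0.5], [-0.3, 0.8], rng)
    print("sampled token id:", token)
```

The point of the design is that the decoding parameters become a function of the context vector rather than constants chosen per prompt.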

Transformers and Mamba memorize as geometry, solving 50K-node path queries in one step

A study finds deep sequence models (Transformers, Mamba) tend to form geometric memories: nodes in a knowledge graph embed so that multi-hop paths become near-one-step distance checks, reaching up to 100% accuracy on unseen paths in graphs with ~50K nodes paper first page.

The work shows competition between associative (lookup) and geometric representations, with a Node2Vec baseline learning an even cleaner geometry tied to the graph Laplacian. Implications include faster multi-hop reasoning and more faithful retrieval without explicit chain-of-thought.
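
The contrast between associative and geometric memory can be sketched on a toy graph. The example below uses a hand-placed 2D embedding of a small star-of-paths graph, which is an assumption made for illustration; it is not a learned embedding and does not reproduce the study's setup. All function names are hypothetical.

```python
import math


def build_path_star(num_arms, arm_length):
    """Return parent pointers and a hand-placed 2D position for every node."""
    parent = {"root": None}
    position = {"root": (0.0, 0.0)}
    for arm in range(num_arms):
        angle = 2 * math.pi * arm / num_arms
        prev = "root"
        for depth in range(1, arm_length + 1):
            node = (arm, depth)
            parent[node] = prev
            # Each arm extends along its own direction from the root.
            position[node] = (depth * math.cos(angle), depth * math.sin(angle))
            prev = node
    return parent, position


def first_hop_by_lookup(parent, leaf):
    """Associative route: follow stored edges toward the root, hop by hop."""
    node, hops = leaf, 0
    while parent[node] != "root":
        node = parent[node]
        hops += 1
    return node, hops


def first_hop_by_geometry(position, leaf, num_arms):
    """Geometric route: one nearest-neighbor check among the root's children."""
    candidates = [(arm, 1) for arm in range(num_arms)]
    return min(candidates, key=lambda c: math.dist(position[c], position[leaf]))


if __name__ == "__main__":
    parent, position = build_path_star(num_arms=5, arm_length=20)
    leaf = (3, 20)
    print(first_hop_by_lookup(parent, leaf))       # walks 19 stored edges
    print(first_hop_by_geometry(position, leaf, 5))  # single distance comparison
```

Both routes pick the same first step from the root, but the lookup needs one hop per edge while the geometric check needs only one comparison over the root's children.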

Video generators aren’t zero-shot reasoners: MME-CoF scores under 2/4 and fails on long chains

The MME-CoF benchmark tests text-to-video models (e.g., Veo-3 class) on 12 reasoning areas and finds they average below 2/4, handling short, locally constrained steps but failing on long-horizon logic, strict geometry, and causal constraints benchmark paper.

Evaluators report smooth clips that nonetheless break rules (miscounts, timing errors, clutter misses), underscoring a gap between visual fidelity and robust procedural reasoning in zero-shot settings.


🎃 Creative AI: Halloween effects, music, and recipes

Large volume of creative items: Sora character clips, Minimax/Kling horror filters, ElevenLabs Music tools, and Gemini’s Veo-based Halloween how-tos. This section corrals the non-dev media news.

Higgsfield drops 1080p Halloween horror pack with Minimax + Kling, free gens and credits promo

Higgsfield launched a seasonal set of 13 Minimax transformations and 4 Kling “nightmares” (werewolf, devil, raven transition and more) with 1080p output and limited-time free generations and credits giveaways inside the app feature rundown, free gens note. A dedicated landing page showcases one-click “Halloween presets” and global availability promo thread, with details and examples on the site Halloween presets.

ElevenLabs Music adds stem separation and in-painting, launches 24-hour Halloween radio and 50% promo

ElevenLabs rolled out Music stem separation and in-painting tools for granular remix control, alongside a one-day ‘Radio Eleven’ Halloween station and a two-week 50% discount on Music plans feature rundown. The in-app radio is live for 24 hours with spooky remixes and spectral vocals radio announcement.

Sora’s ‘Monster Manor’ and character tools power Halloween shorts from creators

OpenAI highlighted a Halloween “Monster Manor” set in Sora and encouraged seasonal creations, while creators showcased multi-minute shorts using the new Characters feature in the Sora app Monster Manor, creator short, characters note. This follows credit packs, where OpenAI teased Characters coming to the web and paid Cameos; now the app experience is fueling steady “Soraween” posts Soraween post.

Gemini shares Halloween creation playbook: Veo 3.1 monsters, costume ideas, ‘animate nightmares’ and invites

The Gemini team published a compact how-to thread for seasonal content: generate scary creatures with Veo 3.1, ideate costume looks, build full costume mockups, animate nightmare scenes, and auto-design party invites, all within the Gemini app and Studio how-to thread, Veo creature, costume ideas, animate nightmares, costume builder, party invites. A product overview page details image generation and editing (aka “Nano Banana”) tips and prompt guidance Gemini image guide.

ChatGPT image generation shows year-over-year gains on Halloween costume kit prompt

A repeat prompt (“those bags that hold cheap costumes, but make the costumes really weird”) produced sharper, more humorous packaging concepts (like “Sesame Loaf,” “Beige Carpet Stain,” and “Possessed CAPTCHA”), suggesting improved visual wit and layout fidelity over the past year image examples.

ComfyUI hosts Wan 2.2 Animate live session with control and quality tips

ComfyUI ran a Halloween-day livestream on Wan 2.2 Animate covering practical knobs for motion control and output quality, with hosts breaking down the pipeline and sharing recipes for consistent results event announcement. A companion post links to the session and notes timing and hosts for on-demand viewing event replay.
