Fresh stories
NVIDIA releases Nemotron 3 Ultra: 550B MoE, 1M context
NVIDIA shipped Nemotron 3 Ultra, a 550B-parameter (55B active) hybrid Mamba-Transformer MoE with open weights, data, and recipe, plus broad runtime and host support. It matters because the model pairs frontier-level benchmark results for an open model with immediate agent-serving options, though running it locally still requires heavy quantization or large-memory hardware.

Anthropic reports Claude wrote 80% of merged code
Anthropic published internal metrics showing that Claude wrote 80% of merged code, along with 8x engineer output and 52x training-code speedups in Mythos Preview. The post matters because it gives a rare lab-side look at AI-assisted engineering gains, while also noting that research judgment remains a bottleneck and that recursive self-improvement is unproven.

Cognition launches Devin Productivity Guarantee with $10M cap
Cognition said it will fund Devin usage up to $10 million when measured engineering value falls below cost, and published a technical writeup estimating productive engineering hours per session. It matters because the company is shifting agent pricing from tokens to claimed output and extending coding evaluation toward much longer task horizons.


ChatGPT adds memory summaries and 2x memory in Dreaming V3 rollout
OpenAI rolled out a more capable ChatGPT memory system that keeps context across conversations, shows a reviewable memory summary, and doubles memory capacity for Plus and Pro users in the US. The change matters because persistent context becomes a first-class product feature with explicit controls, rather than a static list of saved memories.

Arena launches Agent Mode rankings with GPT-5.5 High leading
Codex fixes token undercounting after three reliability incidents and quota resets
GitHub Copilot adds 1M context window and reasoning levels
Browser Use adds cloud profiles and geo proxies, with 484 browsers in <2s
Top stories this week
Gemma 4 12B ships encoder-free multimodal local model with 16GB target and 256K context
Google released Gemma 4 12B, an Apache 2.0 encoder-free multimodal model with native audio and vision for 16GB-class laptops. Day-zero support in llama.cpp, vLLM, Ollama, MLX, and SGLang should make local agents and on-device apps easier to deploy immediately.


Codex users report outages, 5-hour caps, and token shortages after Sites launch
Users reported outages, tighter 5-hour caps, and token availability problems a day after OpenAI launched Codex Sites and plugins. OpenAI reset Codex usage limits after three incidents, so teams should watch quotas and backend reliability as agent workflows ramp up.

Uber cuts AI coding-tool spend to $1,500 per employee per tool each month
Uber set a $1,500 monthly limit for each AI coding tool an employee uses, covering products such as Cursor and Claude Code. The cap gives enterprises an early benchmark for coding-agent spend as token costs outgrow typical software-seat budgets.

LangSmith launches Sandbox, LLM Gateway, and Engine for agent execution, spend tracking, and eval triage
LangSmith added sandboxed execution, spend-aware gateway routing, and Engine to surface recurring agent failures from traces. The bundle gives teams one place to run agents, control token spend, and turn production issues into debugging and eval loops.

Ideogram 4.0 releases 9.3B open weights with 2K output and non-commercial license
Ideogram released Ideogram 4.0 as open weights with 2K output, layout control, and strong text rendering, with rollout to ComfyUI, fal, and Hugging Face. Teams can download the design-focused model, but they should check the non-commercial license before using it in production.