Fresh stories
Kimi K2.6 launches with 58.6 SWE-Bench Pro and 4,000-tool-call agent runs
Moonshot open-sourced Kimi K2.6, a 1T-parameter MoE with 32B active parameters, 256K context, multimodal input, and larger agent swarms. It now sits near frontier closed models for long-horizon coding and tool use, so teams can try it for agent workflows.

Kimi K2.6 adds day-one support across vLLM, SGLang, Ollama, and OpenRouter
Kimi K2.6 shipped across vLLM, SGLang, OpenRouter, Baseten, Ollama, OpenCode, Hermes Agent, and Droid within hours of launch. That cuts the usual lag between model release and production trials, so mixed-provider agent stacks can test it sooner.
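For teams that want to trial it through OpenRouter, the story implies an OpenAI-compatible chat-completions call. A minimal sketch of the request body follows; the model slug "moonshotai/kimi-k2.6" and the parameter choices are assumptions, so check OpenRouter's model list for the exact identifier before use.

```python
# Sketch: assemble a chat-completions request body for OpenRouter's
# OpenAI-compatible endpoint (POST https://openrouter.ai/api/v1/chat/completions).
# The model slug below is an assumed placeholder, not a confirmed identifier.

def build_chat_request(prompt: str, model: str = "moonshotai/kimi-k2.6") -> dict:
    """Build the JSON payload for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }

req = build_chat_request("Summarize this stack trace.")
print(req["model"])  # → moonshotai/kimi-k2.6
```

The same payload shape works against vLLM and SGLang deployments that expose an OpenAI-compatible server, which is what makes day-one multi-provider support useful for mixed stacks.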

Claude Code 2.1.116 adds 67% faster /resume and safer sandbox rm checks
Claude Code 2.1.116 shipped 24 CLI changes, including faster resume on large sessions, stricter guardrails around rm and rmdir, and automatic plugin dependency installs. It also updates terminal input behavior and model surface area for agent workflows, so teams should upgrade if they rely on the CLI.


OpenAI Codex adds Chronicle screen memories in macOS Pro preview
OpenAI added Chronicle, a Codex preview that turns recent screen context into reusable memories for errors, files, docs, and workflows. The macOS Pro-only feature stores local memory unencrypted and can burn rate limits quickly, so watch prompt-injection risk before relying on it.

Qwen launches Qwen3.6-Max-Preview on Qwen Chat with AA Index 52
Qwen put Qwen3.6-Max-Preview live on Qwen Chat as an early flagship preview with stronger agentic coding and world-knowledge claims. Early testers report strong first-pass results, but the Max line remains closed rather than open-sourced.

Vercel updates breach bulletin: npm packages stayed untampered
Claude adds live artifacts in Cowork with synced dashboards and version history
Google AI Studio adds Pro and Ultra plan support with higher quotas
Top stories this week
Opus 4.7 users report up to 1.46x tokenization and faster limit burn
Four days after the Opus 4.7 launch, independent tests measured about 1.35-1.46x more text tokens than 4.6 while users kept reporting faster limit burn and weaker coding. That can change effective cost and session economics in Claude Code even if list prices stay flat.
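The cost impact of the reported inflation is simple arithmetic: at a flat per-token list price, a 1.35-1.46x increase in tokens per unit of text raises effective spend by the same factor. A back-of-envelope sketch, with illustrative (not measured) prices and tokenization rates:

```python
# Back-of-envelope: effective spend when a tokenizer emits `inflation`
# times more tokens per character than the previous model version.
# tokens_per_char and price_per_mtok below are illustrative assumptions.

def effective_cost(chars: int, tokens_per_char: float,
                   price_per_mtok: float, inflation: float) -> float:
    """Dollar cost for `chars` of text at a flat per-million-token price."""
    tokens = chars * tokens_per_char * inflation
    return tokens / 1_000_000 * price_per_mtok

baseline = effective_cost(400_000, 0.25, 15.0, 1.0)   # pre-4.7 tokenization
worst    = effective_cost(400_000, 0.25, 15.0, 1.46)  # reported upper bound
print(round(worst / baseline, 2))  # → 1.46
```

The ratio is independent of the assumed price and text size, which is why a tokenization change shifts session economics even when list prices stay flat.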


ChatGPT Pro users report GPT-5.4 Pro with faster SVG and UI generation
Multiple Pro users said GPT-5.4 Pro started producing richer front-end and SVG outputs with much faster runtimes, despite no formal OpenAI announcement. The reports matter because they affect whether long visual and code-generation tasks are practical inside ChatGPT.

Vercel reports OAuth-linked breach via compromised AI tool
Vercel disclosed unauthorized access to internal systems affecting a limited subset of customers and said a compromised Google Workspace OAuth app at a third-party AI tool was the entry point. Some non-sensitive environment variables may have been exposed, so teams should review SaaS integrations and secret handling now.

Codex users report subagent, MCP, and canary deploy workflows
Practitioners shared repeatable Codex workflows for long-lived threads, background subagents, computer-use access through MCP, and canary rollouts. Codex is being used less as a one-shot assistant and more as a persistent automation harness.

Gemma 4 ecosystem ships 60+ on-device demos and local agent benchmarks
A weekend of Gemma 4 demos spanned YC hackathon projects, offline iPhone runs, and HN reports of strong local coding and SQL-agent performance. Gemma 4 is increasingly showing up as a practical edge model for tool use and multimodal apps, not just a release benchmark.
