Kimi K2.5 posts 50.2% HLE – $0.60 in, $3.00 out
Executive Summary
Moonshot AI’s Kimi K2.5 continues its “open-weights near-frontier” push with a widely shared 50.2% on Humanity’s Last Exam (full set); Artificial Analysis tags it at GDPval‑AA Elo 1309 and frames it as an MoE with 1T total params and 32B active, shipped in INT4 at ~595GB. Distribution tightened fast: OpenRouter lists $0.60/M input and $3.00/M output tokens; Ollama exposes a cloud target plus ollama launch wiring into agent CLIs; Replicate markets it for “visual coding.” Cost narratives lean on AA’s $371 “cost to run” index figure; claims travel faster than independent harness reproductions.
• LM Arena: Kimi K2.5 Thinking debuts at #15 overall and #1 among open models; also cited at #7 in Coding; this is a preference Elo signal, not a task-verified result.
• Arcee Trinity‑Large‑Preview: 400B MoE (~13B active) appears on OpenRouter free tier; vLLM posts day‑0 vllm serve with auto tool-choice flags.
• DeepSeek‑OCR 2: claims 16× visual token compression (256–1120 tokens/image) and 91.09% OmniDocBench v1.5; vLLM 0.8.5 day‑0 support anchors self-hosting feasibility.
Net: open models are converging on the same playbook—aggressive distribution + serving recipes + leaderboard proof points—while real-world variance likely depends on endpoint stacks, tool harnesses, and how much the “cheap multimodal” story holds under load.
Top links today
- Prism LaTeX workspace for scientists
- Prism availability announcement
- Prism overview on accessibility and setup
- Google AI Plus plan details
- Cursor semantic search indexing update
- DeepSeek-OCR 2 model release
- DeepSeek-OCR 2 GitHub repo
- Higgsfield ANGLES v2 product page
- Conductor tasks feature announcement
- Anthropic UK government assistant partnership
- Firecrawl Skill and CLI for agents
- MiniMax Agent Desktop product page
- GCP credits in Google AI subscriptions
- Baseten speculation engine technical deep dive
- Artificial Analysis intelligence index results
Feature Spotlight
Prism: LaTeX-native research workspace inside ChatGPT (GPT‑5.2 in-project)
Prism collapses “LaTeX editor + PDF preview + citations + AI chat” into one project-aware workspace, cutting copy/paste drift and setup friction for research teams—an Overleaf-class surface with GPT‑5.2 in the loop.
High-volume launch coverage of OpenAI Prism: a cloud LaTeX workspace where GPT‑5.2 can read/edit within the project (equations, refs, structure) and support real-time collaboration. This category focuses on Prism and excludes other model/tool launches.
Table of Contents
🧪 Prism: LaTeX-native research workspace inside ChatGPT (GPT‑5.2 in-project)
High-volume launch coverage of OpenAI Prism: a cloud LaTeX workspace where GPT‑5.2 can read/edit within the project (equations, refs, structure) and support real-time collaboration. This category focuses on Prism and excludes other model/tool launches.
OpenAI launches Prism, a free LaTeX-native workspace for scientific writing
Prism (OpenAI): OpenAI shipped Prism, a cloud LaTeX-native workspace where GPT-5.2 works inside each project (not in a separate chat tab), and it’s available now to anyone with a ChatGPT personal account as stated in the Launch announcement and reiterated in the Availability note.

OpenAI also frames Prism as removing “version conflicts and setup overhead” for scientific tooling adoption, as described in the Adoption friction note alongside the Product page.
Prism brings a live LaTeX editor + PDF preview with GPT-5.2 in the same workspace
Prism (OpenAI): Prism’s core workflow is an in-browser LaTeX editor with an always-on render/preview loop, with ChatGPT embedded in the same project so edits can stay grounded in the actual manuscript context, as shown in the Prism editor screenshot and described in the Launch announcement.
The practical implication is fewer “copy from LaTeX → chat → paste back” steps, because the assistant is presented as operating directly against your project files and structure, per the In-project context note.
Prism integrates Zotero for reference import into LaTeX projects
Prism (OpenAI): Prism’s settings surface a Zotero integration for importing references into a project, as shown in the Zotero integration screenshot.
This matters specifically for reducing citation/key drift during AI-assisted edits, because the assistant is framed as operating against the project’s actual references rather than inventing BibTeX keys, per the In-project context note.
Prism’s AI is project-aware across structure, equations, and references
Prism (OpenAI): OpenAI is positioning Prism’s differentiator as project-aware assistance—GPT-5.2 can draft and revise with access to surrounding text, equations, citations, and overall paper structure, as summarized in the Capabilities card and echoed in the In-project context note.
The feature list also explicitly calls out optional voice-based editing and direct-in-document changes (instead of generating suggestions elsewhere), as shown in the Capabilities card.
Builders frame Prism as “Cursor for scientists” and an Overleaf-native AI workspace
Prism positioning: Commentary immediately framed Prism as “Cursor for scientists” and “Overleaf with AI,” emphasizing that the key shift is putting GPT-5.2 inside the writing surface rather than in a separate assistant UI, as argued in the Cursor for scientists comment and echoed by researchers in the Overleaf comparison.

The same thread of discussion claims a bigger focus on science/research as a first-class product direction for ChatGPT, per the Product direction take and the longer “tools for thought” framing in the Prism workflow reflection.
Prism can turn hand-drawn sketches into TikZ diagrams
Prism (OpenAI): A Prism demo shows converting a hand-drawn diagram into TikZ inside the LaTeX workflow—an example of “image-to-paper-artifact” transforms that end in compilable source, as shown in the TikZ conversion demo.

This is a concrete case of Prism’s claim that GPT-5.2 can act with project context (files + LaTeX structure) rather than producing detached code snippets, aligning with the Launch announcement.
Prism prompts questions about data controls and opt-out behavior
Prism data controls: Multiple posts raised the question of whether Prism projects are covered by ChatGPT’s existing data controls, as prompted by the Data question and followed by a claim that Prism does not use your data if you opt out in ChatGPT, per the Opt-out claim.
The tweets don’t include a first-party policy excerpt, so treat this as user-reported interpretation pending an explicit OpenAI statement in-product or in docs.
⌨️ Claude Code + Claude desktop: customization, UX polish, and CLI churn
Concrete Claude Code workflow changes and near-term UX surface area: keybindings, spinner customization, and CLI release notes. Excludes Prism (covered as the feature).
Claude Code 2.1.21 tightens session reliability and VS Code Python env handling
Claude Code 2.1.21 (Anthropic): A new CLI release focuses on “less brittle long runs”—fixing API errors when resuming sessions interrupted during tool execution, adjusting auto-compaction behavior on models with large output limits, and improving read/search progress indicators, as listed in the Changelog thread and reiterated in the Changelog details.
• VS Code Python ergonomics: The VS Code extension now supports automatic Python virtual environment activation (configurable via claudeCode.usePythonEnvironment), aiming to make python/pip calls land on the intended interpreter, as described in the Changelog thread.
• Tooling preference shift: Claude Code is nudged to prefer file operation tools (Read/Edit/Write) over Bash equivalents (cat/sed/awk), per the Changelog details; this changes how diffs and edits get produced in practice.
• Prompt churn note: A prompt change discussed around this release was flagged as being reverted by an Anthropic-affiliated account, as stated in the Reversion note.
More granular release history is linked from the Changelog details via the Changelog page.
Claude Code adds custom keybindings via /keybindings
Claude Code (Anthropic): Claude Code now supports user-customizable keybindings via the /keybindings command, as announced in the Keybindings announcement.

The public docs show how the shortcut layer is meant to be configured and shared across setups, per the Keybindings docs.
Claude Code permission hooks questioned after reported rm auto-approvals
Claude Code (Anthropic): A user report claims the permissions/allowlist system is inconsistent for destructive shell commands—rm sometimes being auto-approved despite not being in an allow list, sometimes ignoring explicit deny entries, and showing surprising behavior under parallel tool calls, as described in the Permissions report.
The same thread frames this as a reliability/safety issue for unattended agent runs—particularly when multiple shell actions are issued concurrently, per the Permissions report.
Claude Desktop roadmap leak mentions Sketch and a local dev-server toggle
Claude Desktop (Anthropic): A leak-style roundup claims Claude is developing a Sketch tool for drawing directly on the canvas before uploading, plus workflow toggles and project/task integration hooks—specifically a toggle to let Claude Code start a local dev server, and the ability to start Cowork tasks from Projects, as described in the Roadmap clip.

The same post mentions an updated system prompt for Knowledge Bases, though details of the actual behavior change are not shown beyond the teaser, per the Roadmap clip.
Claude Desktop surfaces a not-yet-active Plugins section
Claude Desktop (Anthropic): A “Plugins” section is reportedly visible in Claude Desktop but not clickable yet, positioned as a home for Connectors and Skills (earlier labeled “Customize”), according to the Desktop UI sighting.
The thread notes that some earlier “customizable commands” exploration may have been pulled back, per the Desktop UI sighting.
Vercel AI Gateway adds Claude Code Max support
Vercel AI Gateway (Vercel): Vercel says its AI Gateway now supports Claude Code Max, per the brief announcement in the Gateway note.
The post doesn’t include configuration details or routing semantics, so the operational change (auth model, limits, and which Claude endpoints are exposed) remains unspecified in the tweets beyond the support claim in the Gateway note.
Claude Code preview shows team-customizable spinner verbs
Claude Code (Anthropic): The next Claude Code version is shown supporting customizable “spinner verbs” (the progress messages during actions), including team-specific theming, as demonstrated in the Spinner verbs teaser.
The screenshot suggests this is implemented through settings changes (a settings JSON update plus new verb text), which could affect shared team defaults and internal “tone” in CLI output, depending on how settings are distributed.
🧰 OpenAI Codex CLI: UX knobs, regressions, and real-world usage complaints
Codex-specific workflow/UX items in the wild (personality settings, rendering issues) and practitioner feedback. Excludes Prism (covered as the feature).
OpenAI resolves Responses API bug showing false rate-limit errors
Responses API (OpenAI): OpenAI fixed an issue where users were shown rate_limit errors despite not actually hitting their limits; they state rate limits were unchanged and the error rate returned to ~0, as shown in the Incident graph.
For teams using Codex-like harnesses on Responses, this is the kind of failure mode that looks like sudden throttling and can trigger unnecessary backoff/retry logic; the tweet frames it as an API-side bug rather than a quota change, per the Incident graph.
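For illustration, a minimal sketch of the retry wrapper this failure mode tends to trip, assuming the official openai Python SDK and its Responses client; the model id is a placeholder, and during the incident above the rate-limit branch could fire even when quota was untouched:

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def respond_with_backoff(prompt: str, max_retries: int = 5) -> str:
    """Call the Responses API, backing off exponentially on rate_limit errors."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            resp = client.responses.create(model="gpt-5.2", input=prompt)  # placeholder model id
            return resp.output_text
        except RateLimitError:
            # During the incident above, this branch could fire spuriously,
            # so log it rather than assuming you actually hit quota.
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("gave up after repeated rate_limit errors")
```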
Codex CLI draws criticism for not rendering markdown tables
Codex CLI (OpenAI): A recurring UX complaint is that Codex sometimes outputs markdown tables without rendering them at all, with a concrete example called out in the Table rendering complaint.
• Comparison baseline: Claude’s UI behavior under constrained width is shown in the Tight-width table demo, where it switches to a text-based representation; the tradeoff noted is limited wrapping/format fidelity, but it still preserves “table-ness” more reliably than the Codex case described in the Table rendering complaint.

Net: this is less about model quality and more about terminal/UI rendering surfaces; the tweets don’t include a specific Codex build fix or issue link yet.
Codex 0.92 adds /personalities with friendly vs pragmatic modes
Codex CLI (OpenAI): Codex 0.92 adds a new /personalities menu to pick a response style—currently Friendly (longer, conversational) vs Pragmatic (short, explicit), as shown in the Personality picker screenshot.
The options appear hardcoded (no custom personalities yet), but it’s a concrete UX knob for teams trying to standardize agent tone across reviews, PR comments, and CLI interactions, per the Personality picker note.
Developer credits GPT-5.2 in Codex with fixing a high-impact production bug
GPT‑5.2 in Codex (OpenAI): A real-world usage anecdote claims Codex (running GPT‑5.2) found and fixed “an insanely difficult bug” during a launch-day incident—described as “killing traffic” and “fixed end-to-end” in the Incident screenshot.
There’s no repro, diff, or postmortem in the tweets, but it’s a clean signal that some teams are treating Codex as an incident-response debugging partner rather than just a coding assistant, per the Incident screenshot.
🔎 Cursor & IDE indexing: semantic search and reuse of teammate indexes
IDE-side agent performance improvements centered on semantic search/indexing speed and reuse of shared indexes; high relevance for large codebases. Excludes Prism (covered as the feature).
Cursor claims orders-of-magnitude faster indexing for large repos
Codebase indexing (Cursor): Cursor says its indexing pipeline is now “several orders of magnitude faster” for very large codebases, and frames it as a semantic-search upgrade that directly improves coding-agent performance, per the indexing announcement and the accompanying secure indexing blog. This is the kind of change that shows up as less time waiting before your first useful agent query and fewer “agent can’t find it” failures when the repo is huge.
• Security angle: Cursor positions this as “securely indexing” large codebases in the secure indexing blog, which matters if you’re doing enterprise deployments where sending raw code to third parties is the blocker.
Index reuse drops time-to-first-query from seconds to milliseconds
Index reuse (IDE workflows): A reported optimization is to reuse a teammate’s already-built code index instead of cold-start indexing per developer/machine; the shared results claim a median time-to-first-query drop from ~7.87s to ~525ms and a 99th-percentile drop from ~4.03h to ~21s (a ~691× speedup), as shown in the reuse indexes chart.
This frames indexing as a shareable artifact (closer to build-cache semantics) rather than a per-user initialization cost, especially relevant for very large monorepos where “first query” can be the whole experience.
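To make the “build-cache semantics” concrete, here is a hedged sketch (paths and names are hypothetical, not Cursor’s implementation) of keying a shared code index by the checked-out commit so teammates fetch an existing artifact instead of cold-indexing:

```python
import hashlib
import subprocess
from pathlib import Path

CACHE_DIR = Path("/shared/index-cache")  # hypothetical shared artifact store

def index_key() -> str:
    """Key the index by the checked-out commit so identical checkouts map to one artifact."""
    head = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    return hashlib.sha256(head.encode()).hexdigest()

def load_or_build_index(build_fn) -> bytes:
    """Reuse a teammate's index when the key matches; otherwise cold-build and publish it."""
    path = CACHE_DIR / f"{index_key()}.idx"
    if path.exists():
        return path.read_bytes()   # milliseconds: just fetch the shared artifact
    index = build_fn()             # seconds to hours on a large monorepo
    path.write_bytes(index)
    return index
```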
A Cursor-for-Slack MCP trace hints at agent-to-Slack workflows
Cursor for Slack MCP (Ecosystem signal): A Slack message footer reading “Sent using Cursor for Slack MCP” suggests a Cursor→Slack bridge is being used (or tested) to let agents post into team comms, as seen in the Slack MCP screenshot.
It’s only a sighting (not a formal release), but it lines up with the broader pattern that indexing + agent loops are starting to escape the IDE and show up inside shared operational surfaces like Slack.
✅ Automated PR review agents: Devin Review + Kilo Code Reviewer + agent-safe guardrails
Tools and patterns focused on keeping code mergeable: PR review automation, bug-catching agents, and protective hooks that stop destructive actions. Excludes Prism (covered as the feature).
Devin Review expands PR review with codebase-aware bug catching and a URL-swap entrypoint
Devin Review (Cognition): Devin Review has now been live for a week and is positioned as a PR comprehension layer that goes beyond diff viewing—its bug catcher scans the broader codebase to surface interaction risks and even pre-existing issues, as described in the bug catcher overview and the launch week note.

It’s also deliberately low-friction to try: you can open reviews on any PR by swapping “github” → “devinreview” in the URL, according to the URL swap tip. The UI emphasis is on grouping changes and presenting a structured “what changed” narrative, as shown in the launch week note.
dcg blocks destructive git commands from agent tool calls unless the human confirms
Destructive Command Guard (dcg): A practical guardrail pattern is getting attention: dcg can intercept an agent’s shell tool call (PreToolUse hook) and block destructive commands like git reset --hard, forcing explicit user confirmation and recommending safer alternatives (e.g., stash first), as shown in the blocked reset screenshot.
The core operational idea is to treat “dangerous, irreversible” shell actions as human-gated even when the agent is otherwise allowed to run bash, based on the behavior demonstrated in the blocked reset screenshot.
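As a rough illustration of the pattern (not dcg’s actual implementation), a PreToolUse-style hook can read the pending tool call as JSON on stdin and refuse destructive commands with a non-zero exit so the agent has to route them through a human; the regex list here is an assumption:

```python
#!/usr/bin/env python3
"""Minimal guard sketch: block destructive git/shell commands from agent tool calls."""
import json
import re
import sys

DESTRUCTIVE = [
    r"git\s+reset\s+--hard",
    r"git\s+clean\s+-[A-Za-z]*f",
    r"git\s+push\s+.*--force",
    r"rm\s+-rf\s+",
]

event = json.load(sys.stdin)  # hook payload describing the pending tool call
command = event.get("tool_input", {}).get("command", "")

if any(re.search(pattern, command) for pattern in DESTRUCTIVE):
    # A blocking exit code surfaces this message to the agent instead of running the command.
    print(f"Blocked destructive command: {command!r}. Consider `git stash` first "
          "and ask the user to run this manually.", file=sys.stderr)
    sys.exit(2)

sys.exit(0)  # everything else passes through untouched
```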
Kilo Code Reviewer launches PR auto-reviews with model choice, now free with MiniMax M2.1
Kilo Code Reviewer (Kilo Code): Kilo shipped an auto-review agent that comments on PRs when they’re opened or updated, with structured feedback buckets (performance, security, style, test coverage) and model selection, as outlined in the Product Hunt launch card.
The launch also includes an “at least for now” cost lever: Kilo says the reviewer is “completely free” when run with MiniMax M2.1, per the free tier note. The project also claims to have reached #1 on Product Hunt, as stated in the rank update.
Gemini Code Assist PR reviews get criticized for incorrect or low-signal feedback
Gemini Code Assist (Google): A concrete complaint is circulating that Gemini’s code review bot is “notably worse than BugBot or Codex,” backed by an example where it flags an arXiv link as “future” and therefore invalid, even though that premise is false in context, as shown in the bot comment screenshot.
This is less about style nits and more about reviewer trust: if the bot is confidently wrong on easy-to-check metadata, it raises questions about signal-to-noise when used as a gate in PR workflows, as implied by the bot comment screenshot.
🧭 Agentic coding practices: context discipline, multi-session setups, and prompt techniques
Hands-on tactics for making agents productive: context management, multi-window planning/execution splits, and prompt discipline. Excludes Prism (covered as the feature).
Split planning vs execution by running two Claude sessions in separate repos
Multi-session Claude workflow (Uncle Bob): A practical pattern for long agent work is to keep two Claude Code sessions open—one “maker” session that edits code and one “planner” session that is explicitly prevented from making changes; changes are manually synced between two separate git working dirs, as described in Two-Claude workflow.
The key detail is the rule boundary: the planning repo is configured to only plan, not touch source, so you can keep strategy context stable while the execution session churns through diffs and tests.
Subagent discipline: avoid spawning subagents when the main agent already has context
Subagent cost control (Matt Pocock): A simple but operationally important practice is to treat subagents as a cost center—if the primary agent already has sufficient context, don’t fork work out; specialize with a skill only when needed, per Subagent token warning.
This is a reminder that “more agents” can silently become “more tokens,” especially when subagents re-read the same files or re-derive the same plan.
Ask the agent to refactor your codebase for its own context-window constraints
Codebase “optimize for the agent” refactors (Uncle Bob): A notable tactic is to ask the model to recommend structural changes that make the repo easier for the model to read within limited context—splitting files/modules and introducing abstractions so it can skip details and reduce repeated searching, as reported in Agent readability refactor.
This is less about human readability and more about minimizing expensive tool calls (reads/greps) and keeping the working set small enough that the agent can stay oriented across sessions.
Treat “skills” as installable, versioned docs instead of blog-post knowledge
Installable skills as docs delivery (Vercel community): A recurring idea is to ship “experience” as something you can install—turning package/version docs into a local, machine-consumable knowledge surface (e.g., a proposed node_modules/.skills convention) rather than relying on scattered blog posts, as suggested in node_modules skills idea and reinforced by Skills are installable.
This frames documentation as a dependency artifact that can be indexed, searched, and referenced by coding agents with fewer hallucinated assumptions about API versions.
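A minimal sketch of what consuming that convention could look like, assuming the proposed node_modules/.skills layout ships markdown skill files (the function and layout here are illustrative, not a published spec):

```python
from pathlib import Path

def collect_installed_skills(project_root: str = ".") -> dict[str, str]:
    """Gather skill docs under a hypothetical node_modules/.skills layout for agent context."""
    skills_dir = Path(project_root) / "node_modules" / ".skills"
    if not skills_dir.exists():
        return {}
    return {
        path.relative_to(skills_dir).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(skills_dir.rglob("*.md"))
    }

# An agent harness could inject these version-pinned docs into context
# instead of relying on whatever the model remembers about the library.
```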
TDD/RGR loop with an agent (“Ralph”) and a dedicated refactor prompt
RGR/TDD with an agent (Matt Pocock): Practitioners are explicitly experimenting with test-driven loops (“TDD with Ralph”) as noted in TDD with Ralph, and separately tightening a dedicated “refactor agent” prompt to make refactors more repeatable and bounded, as asked in Refactor prompt question.
The common thread is turning refactors into a repeatable role with a stable instruction set, rather than an ad-hoc request that changes tone and scope every time.
Use variable-driven prompt templates to reuse image prompts without drift
Prompt variables (techhalla): A concrete prompting technique is to write a single “directive” prompt with explicit variables (subject, lighting, biome, texture, etc.), then swap values to generate consistent variants; the full template example is shown in Variable prompt template.
The main engineering takeaway is maintainability: you can version and diff a template and change one variable at a time, instead of rewriting the whole prompt and accidentally changing constraints.
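A minimal sketch of the idea using Python’s string.Template; the variable names (subject, lighting, biome, texture) come from the post, while the surrounding prompt wording is illustrative:

```python
from string import Template

# Directive-style template: constraints stay fixed, only named variables change.
PROMPT = Template(
    "Create a $subject with $texture surface detail, set in a $biome, "
    "lit by $lighting. Keep composition, aspect ratio, and style constraints unchanged."
)

variants = [
    PROMPT.substitute(subject="lighthouse", texture="weathered stone",
                      biome="arctic coastline", lighting="low golden-hour light"),
    PROMPT.substitute(subject="lighthouse", texture="weathered stone",
                      biome="arctic coastline", lighting="flat overcast light"),
]
```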
🦞 Agent runners & multi-agent workplaces: Moltbot, LobeHub, MiniMax Agent Desktop, Superagent
Operational tooling and ‘agent workplace’ products: orchestration UIs, multi-agent teams, and local-first agent setups. Excludes Prism (covered as the feature).
Clawdbot becomes Moltbot after trademark pressure
Moltbot (steipete): Following up on maintainer load (usage spike + support burden), the Clawdbot project says Anthropic requested a name change over trademark issues, and the maintainer is rebranding the project and accounts as shown in the rename announcement.
The practical impact is that docs, install scripts, and social references will drift for a while; community chatter suggests people will keep saying “Clawdbot” even as the official handle shifts, per the rename roundup.
LobeHub launches a multi-agent teammate workspace with remixing
LobeHub (LobeHub): LobeHub is being promoted as a “next generation” agent workspace built around reusable agent teammates and agent groups, with community discovery/remixing and multi-model switching; a demo contrasts it with Manus on speed/cost for long workflows, as shown in the comparison demo.

• Model portability pitch: Posts claim you can swap between Gemini/GPT/Claude without “vendor lock-in,” and that this changes how teams structure multi-agent work compared to single-model workspaces, per the comparison demo and the teammate framing.
The performance claims are promotional (no shared eval artifact in-thread), but the product direction—agent teams as reusable units plus a “supervisor” layer—is consistently described across the comparison demo and the teammate framing.
Moltbot rename triggers handle squatting and scam confusion
Moltbot (steipete): The rename created real operational churn: the maintainer says the X handle was grabbed by crypto shills during the rename window, and a GitHub rename mistake briefly let scammers “snatch” his personal account, as described in the rename mishap and the GitHub recovery update.
• Impersonation risk: The maintainer explicitly warns he will “never do a coin” and that any token claiming his ownership is a scam, per the coin scam warning.
The net effect is more social-engineering surface area around a fast-growing agent tool, and the maintainer says the X side won’t be fully sorted “for another day,” according to the status clarification.
Moltbot ships with memory present but unconfigured by default
Moltbot (community config): Users discovered that Moltbot’s memory stack exists (including a local sqlite file) but isn’t enabled/configured by default, leading to “forgetting” across new sessions until memorySearch/store.vector settings are added, as shown in the missing memory config.
This is a concrete setup gotcha for anyone expecting persistent-agent behavior out of the box, and it also implies teams need onboarding docs that explicitly walk through memory provider selection and indexing/sync settings (vector + BM25 hybrid) rather than assuming defaults.
Airtable’s Superagent ships an async multi-agent research workflow
Superagent (Airtable): Airtable launched Superagent as a parallel sub-agent system that runs longer async research jobs and outputs an interactive “SuperReport” webpage (not just a chat transcript), with a product walkthrough shown in the SuperReport demo.

The typical runtime being described is “20 to 30 minutes” for a complex run, and the system is framed as launching up to ~20 specialized sub-agents per request, according to the product description.
MiniMax launches Agent Desktop as a cloud agent workspace
MiniMax Agent Desktop (MiniMax): MiniMax’s new desktop product is pitched as an agentic workspace with “zero setup,” cloud execution, and integrations (email, calendars, GitLab, logs) plus ready-made “Experts,” as demonstrated in the workflow demo.

The announcement positions it explicitly against Claude Cowork/Clawdbot-style workflows by emphasizing connectors and enterprise-ish automations (MRs, alerts, repo edits), according to the workflow demo.
Moltbot gains a local-model backend through Ollama
Moltbot × Ollama: A new integration path lets Moltbot connect to Ollama-hosted local models so “all your data stays on your device,” with the gateway launch flow shown in the terminal screenshot.
This is a meaningful option for teams that want an agent runner with local inference (or a local-first fallback) without rewriting the agent harness—assuming feature parity and tool-call behavior hold up across different local models.
Ollama Cloud exposes Kimi K2.5 to agent toolchains via ollama launch
Ollama (Cloud models): Ollama says Kimi K2.5 is available as a cloud model and can be used as the backend for multiple agent frontends via ollama launch, including Claude Code, Codex, OpenCode, and Clawdbot, as shown in the launch commands and the model page pointer.
This is an interoperability datapoint: a single model endpoint can be swapped under several agent runners without each tool needing bespoke provider integrations, at least for the “launch” path described in the model page.
Supermemory adds persistent “infinite memory” to Moltbot via a plugin
Supermemory × Moltbot: Supermemory announced a plugin for Moltbot that “automatically remembers conversations” and builds a persistent user profile, as described in the plugin announcement and teased visually in the promo poster.
This sits adjacent to the “memorySearch not enabled” config surprise in the memory config discovery, but it’s a separate approach: externalized memory as an add-on rather than enabling the built-in storage/index defaults.
🧩 Installables: Skills/CLIs that feed agents better context
Packages you install into agent environments (skills and CLIs) to improve capability—especially web context capture. Excludes Prism (covered as the feature).
Firecrawl ships a Skill + CLI to fetch higher-coverage web context for agents
Firecrawl (Firecrawl): Firecrawl launched a Skill + CLI aimed at feeding coding agents cleaner, more complete web context by pulling pages into local files (then searching locally for token efficiency), with the install path shown in the launch post and docs in the CLI docs.

• Agent integration surface: The main entrypoint is npx skills add firecrawl/cli, targeting agent CLIs like Claude Code/Codex/OpenCode that benefit from file-based context rather than brittle in-chat fetches, as described in the launch post.
• Coverage claim: Firecrawl positions it as outperforming native fetch with “>80% coverage,” which is the specific quality metric they’re selling to engineers, as stated in the launch post and reiterated in the coverage reminder.
The open question is how the “coverage” number is measured across sites and paywalls; the tweets don’t include an eval harness or dataset definition.
Warp adopts AGENTS.md as the standard project context file for agents
Warp (Warp): Warp replaced its WARP.md convention with the standard AGENTS.md format and added /init to generate it after indexing your repo; it also auto-discovers AGENTS.md in subdirectories and continues to accept alternatives like CLAUDE.md and .cursorrules, per the format change post.

This is a direct workflow change for teams standardizing “what an agent should know about this repo,” with Warp’s recursive discovery turning per-folder instructions into a first-class behavior rather than a manual convention, as described in the format change post.
📊 Benchmarks & eval signals: open model rankings, agentic leaderboards, and new math benchmark
Leaderboard movement and eval artifacts used by engineers to decide what to test next—especially open-vs-closed comparisons and verifiable benchmarks. Excludes Prism (covered as the feature).
Artificial Analysis: Kimi K2.5 leads open weights and closes in on frontier models
Kimi K2.5 (Moonshot AI): Artificial Analysis positions Kimi K2.5 as the new leading open-weights model and “closer than ever to the frontier,” citing GDPval-AA Elo 1309 for agentic knowledge work—behind only OpenAI and Anthropic—alongside first-time flagship image+video inputs for Moonshot’s open line, as detailed in the Artificial Analysis writeup and reinforced by the GDPval-AA note.
The same analysis calls out a $371 cost to run their Intelligence Index—cheaper than GPT-5.2/Claude Opus 4.5 class but not “budget” like smaller open baselines—plus an MoE footprint of 1T total params / 32B active released in INT4 (~595GB), as described in the model breakdown.
• Multimodal parity pressure: K2.5’s visual reasoning is framed as “removing a critical barrier” for open weights, with MMMU-Pro cited in the multimodal notes and a head-to-head bar chart shown in the benchmarks image.
• Agentic evaluation signal: GDPval-AA is presented as a realistic harness (shell + web browsing loops); K2.5’s Elo result and implied win-rate vs prior open leaders is summarized in the GDPval-AA leaderboard post.
• Hallucination/abstention trade: the writeup highlights reduced hallucination relative to K2 Thinking and more abstention behavior, per the omniscience index note.
The open question is how much of this holds under different harnesses and provider inference stacks, since many teams will test through third-party endpoints rather than Moonshot’s native serving path.
EpochAI launches FrontierMath: Open Problems, a verifier-first benchmark for unsolved math
FrontierMath: Open Problems (EpochAI): EpochAI released a pilot set of 14 research-math open problems designed for evaluation at scale via programmatic verifiers (no LLM-as-judge), positioning it as a clearer way to measure “did AI actually solve something humans couldn’t,” as described in the benchmark intro and expanded in the verifier design note.
They report that GPT-5.2 Pro and Gemini 3 Deep Think could solve easier known variants but didn’t solve any unsolved cases yet, per the no solves update, and they outline a funding model where verifier access is paid to support expansion, as stated in the pilot funding note.
• Scope clarity: problems are “hard for humans” rather than adversarially constructed for models, per the selection criteria note.
Epoch also invites external attempts and contributions, per the call for attempts.
LM Arena: Kimi K2.5 Thinking debuts as #1 open model in Text Arena
Kimi K2.5 Thinking (Moonshot AI): LM Arena reports Kimi K2.5 Thinking as #15 overall in Text Arena and #1 among open models, according to the Arena announcement and the Text Arena table screenshot.
The same thread cluster highlights category placements—#7 in Coding plus strong instruction-following placement—based on the category breakdown and a follow-up pointing to the Coding category rank.
• How to interpret: this is a preference-based Elo-style signal (useful for “what to try next”), but it doesn’t substitute for task-verified sweeps; the strongest evidence here is the Arena table itself in the ranking image.
This adds another independent data point that K2.5 is competitive in general assistant behavior, not just synthetic benchmark leaderboards.
ARC-AGI-2 public eval plot: RSA + Gemini 3 Flash reported at 59.31%
ARC-AGI-2 (RSA pipeline): Following up on RSA (RSA+Flash cost claims), a new public-evals plot reports 59.31% for Recursive Self-Aggregation paired with Gemini 3 Flash, with the chart emphasizing a cost/performance frontier versus heavier baselines, as shown in the ARC-AGI v2 plot.
The same post claims the approach is competitive with more complex scaffolds at similar or lower cost, based on the comparisons visible in the scatter plot labels.
Prediction Arena chart shows an early Grok checkpoint leading returns
Prediction Arena (Arcada Labs): Following up on Prediction arena (live market trading harness), Arcada’s chart shows an early Grok 4.20 checkpoint as the top performer with roughly +10% returns over ~two weeks, with other frontier models clustered near or below the $10K starting line in the performance chart.
The visible takeaway from the graph is relative performance dispersion (some models trending down materially), based on the trajectories shown in the same chart.
Artificial Analysis: K2 Think V2 boosts intelligence and reduces hallucinations
K2 Think V2 (MBZUAI): Artificial Analysis reports K2 Think V2 as a post-trained 70B reasoning model that improves its Intelligence Index score by +4 points while maintaining a top-tier openness position, per the K2 Think V2 evaluation and the intelligence index note.
They also highlight a hallucination rate drop to 52% (from 89% on the base K2-V2) and a long-context reasoning lift to 53% (from 33%), as stated in the hallucination summary and reiterated in the hallucination rate post.
This is a concrete example of “post-training moves the needle” on abstention/hallucination behavior, with the caveat that the benchmark suite and prompting setup matter for how those rates translate into product settings.
LiveBench: Kimi K2.5 Thinking shows up 9th overall on a multi-skill leaderboard
LiveBench leaderboard (Kimi K2.5 Thinking): A LiveBench snapshot shows Kimi K2.5 Thinking in 9th place, ranked ahead of several widely used closed and open models, as shown in the LiveBench table.
The posted breakdown includes strong subscores in reasoning/coding/math buckets for this run, with the full row visible in the highlighted K2.5 entry.
This is a single leaderboard slice (and can shift with eval refreshes), but it’s another “shortlist” signal for teams deciding which open models are worth wiring into evaluation pipelines.
📦 Model releases builders are testing: Kimi K2.5, Trinity Large, DeepSeek OCR 2, Qwen3-Max-Thinking
High-volume model churn with emphasis on open weights and deployability; includes pricing and modality claims where present. Excludes Prism (covered as the feature).
Kimi K2.5 benchmark story: strong agent tests, moderate run cost
Kimi K2.5 (Moonshot AI): The most-circulated benchmark frame today is “near frontier quality, open weights,” with Kimi K2.5 shown at 50.2% on Humanity’s Last Exam (full set) and competitive scores across browse/search and vision/video evals, as shown in the Benchmarks chart.
Artificial Analysis’ write-up puts the model at an Elo of 1309 on its GDPval-AA agentic evaluation and highlights that it’s an MoE with 1T total params and 32B active, released in INT4 (making it ~595GB), as detailed in the Artificial Analysis breakdown. The same thread calls out a $371 “cost to run” figure on the Artificial Analysis Intelligence Index suite, contrasting with much higher costs for GPT-5.2 xhigh and Claude Opus, as shown in the Cost chart.
Kimi K2.5 lands across OpenRouter, Ollama Cloud, Replicate, and more
Kimi K2.5 (Moonshot AI): Kimi K2.5 is now showing up across more “builder surfaces” with a listed price of $0.60/M input tokens and $3.00/M output tokens, following up on initial launch (open-weights multimodal + swarm framing) as shown in the OpenRouter model card.
Ollama added a cloud-hosted run target—ollama run kimi-k2.5:cloud—and says you can wire it into agent CLIs via ollama launch, as shown in the Ollama cloud command, with details on the model page. Replicate also listed the model as “great for visual coding,” according to the Replicate listing.
Kimi K2.5 feels “Opus-like” to some builders, but not everyone likes it
Kimi K2.5 (Moonshot AI): Early hands-on reactions are split between “this is good enough to swap into my stack” and “it got more assistant-y.” One user reported, “I even forgot midway that it’s not opus,” when testing through CC Mirror on OpenRouter, as quoted in the CC Mirror impression. Another tester says K2.5 recognized a simple web task didn’t need parallelism and refunded unused credits, as shown in the Agent Swarm refund UI.
A counter-signal is that K2.5 may have regressed in writing style compared to Kimi K2, with an example shown in the Writing comparison screenshot.
Trinity Large Preview arrives as a 400B MoE, with OpenRouter access and vLLM support
Trinity Large Preview (Arcee AI): Arcee’s new open model is being shared as a 400B-parameter MoE with ~13B active per token, with an initial “kick the tires” path via OpenRouter’s free offering, as stated in the OpenRouter free listing.
A technical recap claims it was trained over 17T tokens and uses a sparse routing design aimed at inference efficiency, as summarized in the Training run summary and expanded with architecture notes in the Architecture bullet list. vLLM also posted day-0 serving support, including a concrete vllm serve arcee-ai/Trinity-Large-Preview command, as shown in the vLLM serve command. More primary artifacts are linked via the release blog and the tech report repo.
DeepSeek-OCR 2 launches with learned reading order and vLLM day-0 support
DeepSeek-OCR 2 (DeepSeek): DeepSeek’s new OCR model emphasizes layout-aware decoding via “Visual Causal Flow,” with 16× visual token compression (reported 256–1120 tokens per image) and a claimed 91.09% on OmniDocBench v1.5, as shown in the vLLM launch post.
The same post notes day-0 availability on vLLM (vllm==0.8.5) and provides runnable scripts for image/PDF/batch evaluation, as described in the vLLM launch post. A separate thread highlights why this matters for downstream doc pipelines: OCR2 uses a learnable scan order rather than fixed raster scanning, which can better preserve tables and structured regions, as explained in the Reading order explanation.
Qwen3-Max-Thinking: adaptive tool calling gets the spotlight in Zhihu tests
Qwen3-Max-Thinking (Alibaba/Qwen): New hands-on notes circulating from Zhihu emphasize that tool use is becoming “default behavior”—the model reportedly decides mid-thought when to invoke search, memory, or a code interpreter, as shown in the Zhihu tool calling notes.
This thread also claims stronger text reasoning and solid large-number/math behavior via an integrated Python interpreter, while calling out mixed results elsewhere (including hallucination issues in historical research), as described in the same Zhihu tool calling notes. A separate summary frames the release as adding adaptive tool use plus a test-time scaling mechanism, as stated in the Model summary chart.
🧨 Tooling ecosystem friction: trademarks, scammers, and maintainer burden signals
When the discourse itself is the news: naming/trademark conflicts, account takeovers, and how scams target popular AI dev tools. Excludes Prism (covered as the feature).
Anthropic trademark push forces Clawdbot rebrand to Moltbot
Moltbot (formerly Clawdbot): The viral local-first agent project has rebranded after Anthropic requested a name change “for trademark stuff,” with the maintainer saying it “wasn’t my decision,” as described in the Forced rename note and echoed in the Trademark rename fallout.
The rebrand is already creating ecosystem confusion (searchability, integration docs, and attribution), and it’s become a live example of how quickly indie agent tooling can run into platform/IP constraints once it crosses into mainstream usage, as summarized by observers in the Trademark enforcement recap.
Moltbot handle squatting and account takeover during rename causes recovery scramble
Moltbot (steipete): The rename triggered a classic “hot project” failure mode: the GitHub and X handles were briefly or partially lost to crypto scammers, creating a short but messy recovery and impersonation risk window, per the Rename mishap report and the follow-up Status and safety note.
• GitHub recovery: The maintainer asked for help after the personal account was “snatched,” then confirmed it was fixed, as shown in the GitHub recovery request and the later GitHub fixed update.
• X handle still contested: They warned that “@moltbot” is the real handle and that “20 scam variations” exist, according to the Status and safety note.
Maintainer warns of “coin” impersonation scams targeting Moltbot/Clawdbot identity
Identity & impersonation risk: As the project spiked in visibility, the maintainer posted a blunt warning that he will “never do a coin,” that any project listing him as a coin owner is a scam, and asked crypto promoters to stop pinging and harassing him, as stated in the No coin warning.
This is less about crypto discourse and more about operational risk for AI dev tools: public-facing maintainer identities become part of the attack surface (fake ownership claims, handle squats, and high-pressure DMs), and users can be tricked into interacting with impostor accounts that look “official,” as reinforced by the rebrand chaos described in the Rename mishap report.
🛡️ Agent security & misuse surfaces: prompt injection, malware planting, and robustness concerns
Security and safety issues that directly affect how engineers should deploy agents (prompt injection, hidden payloads, adversarial robustness). Excludes Prism (covered as the feature).
Clawdbot prompt injection via hidden GitHub hyperlink can plant near-invisible backdoors
Clawdbot (Moltbot ecosystem): A live attack writeup shows how a seemingly benign GitHub issue can smuggle an instruction payload inside a URL hyperlink that’s invisible to humans but readable to LLM agents—then hijack the agent into inserting a backdoor into a lockfile (a spot many reviews skim), framing agents as “corruptible insiders” per the Attack walkthrough.

• Exploit chain: Attacker files an issue with a hidden jailbreak prompt in a hyperlink; maintainer assigns it to the agent; the agent follows the injected instructions and commits a malicious change into a low-scrutiny file, as described in the Attack walkthrough.
• Mitigation direction: The thread points to an “Open Source Data Firewall” effort as a response class (filtering/sanitizing untrusted inputs before agents act), linked in the Firewall reference via the GitHub repo.
The demo is a concrete reminder that any untrusted text channel (issues, PR comments, docs, web pages) is an input surface unless you explicitly gate and sanitize it.
Adversarial robustness is framed as the main blocker to trusting agents
Agent robustness (Ecosystem signal): A recurring claim is getting re-stated in blunt economic terms—there’s “a trillion dollars locked behind a solution to adversarial robustness,” because teams can’t trust agents that are “trivially hacked,” as argued in the Robustness thesis.
This is less about model capability than operational trust: if prompt-injection and tool-hijack remain cheap, more autonomy (repo write access, prod credentials, financial actions) stays gated behind human review, regardless of how good the agent is at completing tasks.
🏗️ AI infra economics: subscriptions, credits, and datacenter power constraints
Infra-facing signals tied to AI demand and cost: subscription expansions, cloud credits, and data center power buildout trends. Excludes Prism (covered as the feature).
Data centers are bypassing the grid with 48 GW of behind-the-meter power plans
Behind-the-meter power (Data centers): Planned “behind-the-meter” data center capacity reportedly jumped from under 2 GW to 48 GW in under 12 months, meaning operators colocate generation next to the facility and feed power directly rather than waiting on utility interconnects, per the Capacity jump chart.
For AI infra economics, the point is straightforward: power availability is becoming a first-order constraint, and “AI factory” expansion increasingly looks like co-deploying generation, not just leasing more colo space.
SoftBank reportedly explores up to $30B more investment in OpenAI
OpenAI financing (SoftBank): SoftBank is reportedly in talks to invest up to $30B more in OpenAI; the same report claims SoftBank already added $22.5B in December and holds ~11% of OpenAI, as described in the Funding report screenshot.
This is an infra-facing signal because funding at that scale is usually a proxy for planned capex commitments (compute, datacenter buildouts, long-term power contracts), even if the near-term details and timing remain unconfirmed from the tweet alone.
Google AI Pro and Ultra subscriptions now include monthly GCP credits
Google AI Pro/Ultra (Google): Google is bundling recurring Google Cloud credits into its consumer AI subscriptions—$10/month for AI Pro and $100/month for AI Ultra—by rolling Google Developer Program premium benefits into those tiers, as shown in the Credits excerpt and detailed in the Developer tools blog.
This is a notable pricing move because it effectively turns “try building on GCP” into a default perk of the subscription, which can pull experimentation (small inference jobs, storage, or hosting) into the same monthly line item as model access.
Google AI Plus expands to 35 more countries with Gemini 3 Pro and Veo 3.1 access
Google AI Plus (Google): Google says it’s launching Google AI Plus in 35 new countries and territories, making it available everywhere its AI plans are; the bundle includes more access to Gemini 3 Pro and Nano Banana Pro in the Gemini app, more access to Veo 3.1 in Flow and the Gemini app, expanded access to NotebookLM, and 200 GB storage shareable with up to 5 family members, as described in the Plan rollout note.
The practical infra angle is that Google is packaging higher-capability model access as a subscription entitlement (not just pay-per-token), which can change budgeting and “who gets to use which model” inside teams—especially where NotebookLM/Veo quotas become the gating factor rather than raw API spend.
⚙️ Inference & self-hosting: vLLM day‑0 support, serving recipes, and workload-specific serving patterns
Serving/runtime engineering updates: day‑0 model support in inference engines and concrete advice for throughput vs latency workloads. Excludes Prism (covered as the feature).
Gemini 3 Flash adds Agentic Vision with code execution on images
Gemini 3 Flash Agentic Vision (Google): Google is rolling out Agentic Vision for Gemini 3 Flash, letting the model run code execution on images (crop/zoom/annotate/bounding boxes) in a think‑act‑observe loop, with a claimed 5–10% quality boost on vision benchmarks, as described in Agentic Vision summary and shown in Agentic vision diagram.
• API surface: The feature is available in AI Studio and Vertex AI, and it’s triggered by enabling code_execution, as described in Agentic Vision summary and detailed in the Gemini developer guide plus the Product blog post.
This is part of the broader shift from “one-shot vision answers” to “vision + verifiable tools,” where the model can ground claims in computed visual evidence.
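A minimal sketch of what “enabling code_execution” looks like with the google-genai Python SDK, assuming the posts’ description that this is the Agentic Vision trigger; the model id is a placeholder, not a confirmed identifier:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

with open("board_photo.png", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/png")

# Turning on the code-execution tool is what the posts describe as the trigger
# for the crop/zoom/annotate loop; the model id below is a placeholder.
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[image, "Count the labeled components and give their bounding boxes."],
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)
print(response.text)
```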
Baseten claims 40% higher acceptance rate via MTP + suffix-automaton decoding
Speculation Engine (Baseten): Baseten says it boosted code-edit “acceptance rate” by up to 40% using a hybrid of multi-token prediction (MTP) and suffix automaton decoding, and also claims 30%+ longer acceptance lengths on code editing without extra overhead, as described in Speculation engine claim.
• Deployability signal: The same post says an open-source version targeting TensorRT‑LLM is available, as noted in Speculation engine claim.
If true, this is a decoding-layer lever that can change both latency and cost for code-assist style workloads without retraining the model.
Modal lays out a workload-first playbook for serving (and fast boots via snapshots)
LLM serving patterns (Modal): Modal frames production inference as three distinct workload classes—offline throughput jobs, online low-latency apps, and bursty “semi-online” pipelines—then maps engine choices to each, as described in Serving posture thread and linked in the Workloads guide.
• Engine recommendations: The guide points to vLLM for throughput-oriented batching, as highlighted in vLLM for throughput note, and to SGLang for latency-sensitive serving with aggressive optimizations, as described in SGLang for latency note.
• Fast boots: It also calls out GPU memory snapshotting to cut cold start and JIT costs down to “seconds” by restoring from disk, as explained in Snapshot boot claim.
This is a useful taxonomy for teams that keep arguing “vLLM vs SGLang” without first agreeing on which workload they’re optimizing for.
DeepSeek‑OCR 2 ships with vLLM day‑0 support and learnable reading order
DeepSeek-OCR 2 (DeepSeek) on vLLM: DeepSeek‑OCR 2 is reported as running on vLLM 0.8.5 with day‑0 support and introduces a “Visual Causal Flow” approach to learn reading order, while also claiming 16× visual token compression down to 256–1120 tokens per image, as summarized in DeepSeek‑OCR 2 vLLM post.
• Quality + efficiency claims: The same post cites 91.09% on OmniDocBench v1.5 (+3.73), along with reduced reading-order errors and repetition in production, as shown in DeepSeek‑OCR 2 vLLM post.
If these numbers hold up, the “tokens per page” reduction is the operational unlock for OCR-heavy pipelines where inference cost and context budget are the constraint.
vLLM adds day‑0 serving for Arcee Trinity‑Large‑Preview
Trinity-Large-Preview (Arcee AI) on vLLM: vLLM says it has day‑0 support for arcee-ai/Trinity-Large-Preview, including a concrete serve recipe that enables auto tool choice and a Hermes tool-call parser, as shown in Serving command snippet.
• Serving details: The example uses vllm serve arcee-ai/Trinity-Large-Preview --dtype bfloat16 --enable-auto-tool-choice --tool-call-parser hermes, as shown in Serving command snippet.
This is a clean “copy/paste to first token” path for teams trying Trinity quickly without writing custom tool-call plumbing.
Mistral details a vLLM disaggregated-serving memory leak traced to UCX mmap hooks
vLLM debugging (Mistral): Mistral published an engineering write-up on a ~400 MB/min memory leak observed in vLLM during disaggregated serving, which they attribute to UCX mmap hooking interacting badly with Python’s memory allocator, as summarized in Weekly roundup note and detailed in the Debugging post.
This is a concrete “production gotcha” for teams doing high-throughput, multi-process serving where low-level comms libraries can quietly become part of your memory story.
vLLM says its Kimi K2.5 recipe is verified on NVIDIA GPUs
Kimi K2.5 (Moonshot AI) on vLLM: vLLM claims its serving recipe for Kimi K2.5 has been “verified on NVIDIA GPUs,” as stated in Recipe verified note, which is a practical signal for anyone attempting on-prem or self-hosted deployments.
• Why this matters operationally: For frontier MoE + multimodal models, the biggest early friction is usually dtype/quantization compatibility and tool-call wiring; the recipe verification claim in the Recipe verified note suggests that integration work has already been exercised on real hardware.
This doesn’t prove performance, but it does reduce “first deployment” integration risk.
🗂️ Retrieval, parsing, and context quality: rerankers, OCR layout, and doc pipelines
Signals about ‘context is the bottleneck’: retrieval models, OCR/document parsing improvements, and approaches that make agent inputs cleaner. Excludes Prism (covered as the feature).
DeepSeek-OCR 2 ships learnable reading order and big OmniDocBench gains
DeepSeek-OCR 2 (DeepSeek): DeepSeek’s new OCR model replaces fixed raster scanning with Visual Causal Flow (learned token reordering) and claims 16× visual token compression down to 256–1120 visual tokens per image; vLLM reports day‑0 support and notes it runs on vLLM 0.8.5 as described in the release summary.
• Quality + layout stability: vLLM reports 91.09% on OmniDocBench v1.5 (+3.73), with reading-order errors cut by 33% and repetition down 30–40% in production, according to the release summary.
Learnable reading order emerges as the key trick for OCR on complex layouts
DeepSeek-OCR 2 (DeepSeek): A useful practitioner-level explanation is that the core jump isn’t “better text recognition,” it’s encoding documents with a learnable reading order so tables/forms/diagrams stay contiguous instead of being broken up by left-to-right raster scanning, as laid out in the architecture explanation.
The framing is practical: bidirectional vision encoding plus query tokens that “choose” what regions to attend to can reduce the semantic fragmentation that hurts downstream extraction and agent context packing, as argued in the same architecture explanation.
Jina reranker v3 wins Best Paper for “all docs in one context” listwise reranking
jina-reranker-v3 (Jina AI): JinaAI says jina-reranker-v3 won Best Paper at the AAAI Frontier IR Workshop by doing listwise reranking via “throw all documents into one context window” (their “last but not late” interaction), as described in the best paper note.
They also claim a 0.6B-parameter model and “Top3 on MTEB reranking,” with more detail available in the linked ArXiv paper.
Two-tier semantic retrieval: instant results first, better ranking shortly after
Semantic search workflow: One concrete pattern for local retrieval is a two-tier embedding pass—use a very fast CPU embedder for immediate results, then run a stronger (slower) embedder in the background and “upgrade” the ranking as it finishes, as described in the two-tier search idea.
The motivating constraint is CPU-only, sub‑second response for developer tools (searching agent session logs and archives), while still getting high-quality semantic matches after the upgrade step, per the same two-tier search idea.
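A minimal sketch of the two-tier pattern, with fast_embed/strong_embed standing in for whichever embedders you pick (document vectors for both tiers are assumed to be precomputed at index time):

```python
import threading
import numpy as np

def cosine_rank(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    """Return document indices sorted by cosine similarity to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return np.argsort(-sims)

def two_tier_search(query, fast_doc_vecs, strong_doc_vecs, fast_embed, strong_embed, on_upgrade):
    """Serve a cheap CPU ranking immediately, then swap in the stronger ranking when ready."""
    fast_order = cosine_rank(fast_embed(query), fast_doc_vecs)  # sub-second first results

    def upgrade():
        on_upgrade(cosine_rank(strong_embed(query), strong_doc_vecs))

    threading.Thread(target=upgrade, daemon=True).start()  # background "upgrade" pass
    return fast_order
```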
Document parsing is still the bottleneck for back-office agents
Document pipelines for agents: A recurring thesis is that “AI employees” still bottleneck on high-quality doc extraction—lots of real work is manual editing + data entry on top of documents (invoices, onboarding packets, claims), and automation depends on stronger parsers plus workflow layers that encode business logic with the right pockets of determinism, as argued in the doc parsing argument.
Parallel adds Spark SQL UDFs to enrich datasets with web intelligence
Parallel (Parallel): Parallel announced a Spark integration that exposes SQL-native UDFs to enrich datasets with web intelligence inside Spark pipelines (no custom API wiring), as described in the Spark integration.
Details and setup live in the Spark docs, with the product pitch emphasizing partition-parallel row processing plus optional per-field citations (source URLs) for downstream auditability, per the Spark integration.
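For readers unfamiliar with the shape of this kind of integration, here is a generic PySpark UDF enrichment pattern. It is not Parallel’s actual UDF surface (their Spark docs define the real functions); `enrich_company` is a hypothetical stand-in for a web-intelligence API call, and the schema is illustrative.

```python
# Generic PySpark pattern for row-level enrichment via a UDF. This is NOT
# Parallel's actual integration (see their Spark docs for the real UDFs);
# `enrich_company` is a hypothetical stand-in for a web-intelligence call.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("enrichment-sketch").getOrCreate()

result_schema = StructType([
    StructField("summary", StringType()),
    StructField("source_url", StringType()),  # per-field citation for auditability
])

def enrich_company(domain: str):
    # Hypothetical: call an enrichment API here; executes per row, per partition.
    return (f"stub summary for {domain}", f"https://{domain}")

enrich_udf = udf(enrich_company, result_schema)

df = spark.createDataFrame([("acme.com",), ("example.org",)], ["domain"])
enriched = df.withColumn("enrichment", enrich_udf("domain"))
enriched.select("domain", "enrichment.summary", "enrichment.source_url").show(truncate=False)
```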
🧠 Accelerators & compute hardware: Maia 200 and wafer-scale alternatives
Hardware announcements that impact inference cost/perf and deployment planning. Excludes Prism (covered as the feature).
Maia 200 rollout: US Central first, Maia SDK preview for developers
Maia 200 (Microsoft): Microsoft is positioning Maia 200 as its in-house inference accelerator for lowering token-generation cost; the rollout is described as starting in US Central (Iowa) with “more regions next,” alongside a Maia SDK preview for developers, as summarized in the specs and rollout clip.

This matters for deployment planning because it signals when teams can expect a first-party Azure alternative to Nvidia for inference-heavy workloads (and what region constraints might look like early on). It also suggests a new software surface area (the SDK) that teams may need to budget time to evaluate alongside existing CUDA-centric stacks.
Cerebras wafer-scale pitch: one processor built from smaller steps
Wafer-scale compute (Cerebras): Weights & Biases highlighted a plain-English explanation of Cerebras’ wafer-scale approach—building the system from smaller manufacturing steps but presenting it as a single large processor rather than a conventional multi-chip network, as shown in the wafer-scale explainer.

The practical relevance for infra teams is in where this kind of architecture could change the usual trade space (model parallelism, interconnect overheads, and memory/compute locality), even if most orgs still consume it through managed services rather than buying hardware outright.
🎥 Generative media & vision apps: controllable cameras, audio-to-video, and image models
Creator/vision model tooling and releases (image/video generation, editing, camera control). Kept separate so creative pipelines aren’t dropped; excludes Prism (covered as the feature).
ANGLES v2 ships 360° camera control and a new cube+slider interface
ANGLES v2 (Higgsfield): Higgsfield shipped ANGLES v2 with full 360° camera control, a redesigned UI using a 3D cube plus sliders, more “behind subject” viewpoints, and upgraded project management, as shown in the launch thread.

• Shipping impact: for teams building short-form generative video pipelines, this is a concrete step toward repeatable “coverage” (iterate composition without re-prompting), with camera state becoming a first-class control surface rather than an afterthought, per the launch thread.
fal adds LTX‑2 audio-to-video with sound-driven motion and timing
LTX‑2 Audio-to-Video (fal): fal added LTX‑2 Audio-to-Video, where an input audio track drives motion/timing from the first frame and outputs full-HD video, according to the fal announcement.

• What’s new vs typical text-to-video: the product pitch is control-by-waveform (voice/music/SFX shaping performance), rather than trying to infer pacing from text alone, as described in the fal announcement.
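For teams prototyping against fal, the general calling pattern looks like the sketch below. The endpoint id and argument names are assumptions for illustration only, not confirmed by the fal announcement; consult the model page for the real schema.

```python
# Hedged sketch of calling an audio-driven video endpoint through fal's Python
# client. The endpoint id and argument names are ASSUMPTIONS, not confirmed by
# the fal announcement; check fal's model page for the real schema.
# Requires FAL_KEY set in the environment.
import fal_client

result = fal_client.subscribe(
    "fal-ai/ltx-2/audio-to-video",  # assumed endpoint id
    arguments={
        "audio_url": "https://example.com/voiceover.mp3",    # assumed field
        "image_url": "https://example.com/first_frame.png",  # assumed field
        "prompt": "speaker addressing the camera, studio lighting",
    },
)
print(result)  # typically includes a URL for the generated video
```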
Z‑Image base gets day‑0 ComfyUI support as a non-distilled foundation model
Z‑Image base (ComfyUI + fal): ComfyUI says Z‑Image base is supported in ComfyUI on day 0 and frames it as a non-distilled foundation model suited for fine-tuning and customization, per the support note; fal is also offering the base model for inference, per the fal launch.
• Why engineers care: the “non-distilled base” positioning implies more headroom for downstream LoRA/finetune work and aesthetic diversity (vs distilled checkpoint tradeoffs), matching the feature list in the support note.
• Where to start: ComfyUI points to a packaged workflow via the ComfyUI workflow blog.
PixVerse v5.6 on fal focuses on motion stability and voiceovers
PixVerse v5.6 (fal): fal says PixVerse v5.6 is now live with sharper “cinematic” visuals, smoother motion, multi-language voiceovers, and fewer warping/distortion artifacts compared to prior releases, per the release note.

This is a pure quality+features bump announcement; the tweets don’t include a benchmarkable eval artifact or specific failure rates beyond the qualitative “less warping” claim in the release note.
Chatterbox lands on Comfy Cloud for zero-shot voice cloning and TTS
Chatterbox (ComfyUI / Comfy Cloud): ComfyUI says Chatterbox is now available on Comfy Cloud with zero-shot voice cloning (from ~5 seconds of audio), 23 languages, and expressive TTS controls, per the Comfy Cloud launch.

• Packaged workflows: ComfyUI describes multiple preset flows (voice conversion, TTS, multi-speaker, multilingual), as outlined in the workflow list.
fal ships Z‑Image‑Turbo Trainer v2 for faster LoRA training
Z‑Image‑Turbo‑Trainer‑V2 (fal): fal announced a faster LoRA trainer for Z‑Image‑Turbo, claiming similar-or-better results with reduced training time compared to their prior trainer, per the trainer update.
This is specifically a training workflow upgrade (style/personalization LoRAs) rather than a new base model release, as described in the trainer update.
🤖 Robotics & embodied AI: full-body autonomy and perception upgrades
Embodied AI progress with direct relevance to autonomy stacks and perception: whole-body control models and robotics-focused vision/perception models. Excludes Prism (covered as the feature).
Figure’s Helix 02 pushes toward whole-body autonomy for household tasks
Helix 02 (Figure): Figure’s latest autonomy stack runs an end-to-end dishwashing routine with whole-body control (not a scripted replay), as described in the Helix 02 overview and demonstrated in the Autonomous dishwashing clip. This is a concrete datapoint for teams building loco-manipulation systems where balance, contact, and manipulation need to stay coupled.

• Control stack details: The writeup claims a 3-layer design with a learned low-level controller running at high frequency, replacing large amounts of hand-written control logic, per the Helix 02 overview.
• Why it matters operationally: It’s a rare public demo emphasizing robustness to environment variation (vs task-specific choreography), which is the pain point for shipping general-purpose autonomy.
Official details are in the Figure blog post.
Ant Group’s LingBot-Depth targets depth-camera failure modes in robotics
LingBot-Depth (Ant Group / Robbyant): Ant Group released LingBot-Depth, a robotics perception model for depth refinement/completion intended to reduce holes and bad depth on reflective, transparent, dark, or low-texture surfaces, as summarized in the LingBot-Depth explainer. This squarely targets a common blocker for manipulation stacks that depend on stable RGB-D geometry.

• What shipped: The release includes two variants (a general depth refiner and a depth completion model that can recover dense depth from very sparse valid pixels), per the LingBot-Depth explainer.
• Where to get it: The model weights are posted on Hugging Face via the Hugging Face model, alongside code in the GitHub repo.
Public evaluation numbers aren’t included in the tweets, so quality claims here are mostly architectural/intent-level rather than benchmark-backed.
Verobotics targets building-facade service with lightweight autonomous climbers
Verobotics (Facade robotics): Verobotics is building lightweight autonomous robots that climb and service building facades, aiming for one-person deployment and autonomous cleaning plus inspection with cameras/sensors, according to the Facade robot demo. This is a practical autonomy setting where adhesion, navigation on frames, and coverage verification become the core reliability constraints.

The thread frames the output as both cleaning and inspection data products (crack detection and a digital twin for predictive maintenance), per the Facade robot demo.
📉 Workforce & cognition: AI skill demand, cognitive offloading, and model-tone concerns
Discourse and signals about how work changes for engineers and knowledge workers: labor-market impact claims, cognitive load concerns, and “AI voice” style drift. Excludes Prism (covered as the feature).
IMF warns AI impact will be a “tsunami,” with entry-level roles most exposed
AI labor market exposure (IMF): IMF managing director Kristalina Georgieva warned AI will hit work like a “tsunami,” emphasizing that young workers get hit hardest because many entry-level tasks are easiest to automate, as summarized in IMF tsunami warning.
• Magnitude claims: she cites IMF research that ~60% of jobs in advanced economies and ~40% globally will be affected—some augmented, some transformed, some eliminated—per IMF tsunami warning.
• Distributional angle: the same remarks flag wage pressure on the middle class and call for faster regulation, according to IMF tsunami warning.
Terence Tao cautions that outsourcing thinking to AI can weaken cognition
Cognitive offloading (Terence Tao): Tao argues that lowering mental effort with AI can lead to the brain no longer “lifting its own weights,” with early studies suggesting real harms alongside convenience, as shown in Tao on cognitive cost.

• Why math is a stress test: he calls out math as especially vulnerable because it’s easy to outsource every step to tools, per Tao on cognitive cost.
• Behavioral implication: the core intervention is choosing when to think rather than defaulting to delegation, according to Tao on cognitive cost.
“Ambient Claudish/GPTish” phrasing drift becomes a new authorship concern
Model-tone drift (writing): Ethan Mollick notes that even when you don’t use LLMs directly for writing, you can start absorbing “ambient Claudish or GPTish phrasing” from ubiquitous AI text around you, as described in Ambient phrasing drift.
• Ubiquity is the mechanism: the concern isn’t that AI prose is always bad; it’s that a single, widely replicated tone becomes subtly influential—he even observes his own sentence sounding AI-like, per Ubiquity makes tone influential.
• Workplace relevance: for teams shipping docs, specs, and comms through agents, this frames “voice” as an input quality problem alongside correctness and speed, as argued in Ambient phrasing drift.