Fresh stories

GLM-5.2 ranks #1 on DeepSWE with 44% pass@1
Independent results put GLM-5.2 at the top of the open-model DeepSWE board and near the top on debate and post-train evals. Watch token use and long reasoning traces, which can offset its headline price advantage.

Wafer claims GLM-5.2 hits 222 tok/s and 12.6s end-to-end
Wafer said its GLM-5.2 deployment leads Artificial Analysis on throughput and latency, and priced usage at $1.20 input and $4.10 output per million tokens. Compare serverless and dedicated endpoints if you need speed at scale.

Hermes Agent adds Blank Slate mode in hermes setup
Hermes now offers a setup path that starts with only a provider, model, file operations, and terminal access. The smaller base gives users a minimal install they can extend manually.


GLM-5.2 ships to BrowserCode, Hyper, OpenCode, and Together in 3 days
BrowserCode, Hyper, OpenCode, Together, and other vendors added GLM-5.2 within three days of release. That turns the open model into a deployable option across coding, browser automation, and hosted chat.

GLOSSOPETRAE releases Lingua Ex Machina with 250 covert channels and 0% monitor recovery
The project ships a paper, repo, and UI for generated languages, alien code, and tokenizer blind-spot testing across model pairs. Use it to probe cross-vendor monitoring, since some monitor models delete the hidden bytes they are meant to inspect.

Top stories this week
Engineers report GLM-5.2 approaches Opus-level planning at about 1/10 the price
Independent tests put GLM-5.2 near Opus 4.8 and GPT-5.5 on planning and coding, and users shared Claude Code, BrowserCode, dcode, and local-serving recipes. It matters because many engineers are treating it as a daily-driver option for text-heavy coding, though teams still report weaker vision and provider limits.


lift-pdf releases 9B extractor with 90.2% accuracy and 9.5s p50
lift-pdf released an open-source 9B model for schema-constrained document extraction, with code, pip install, playground access, and a 90.2% score on the team's 225-document bench. It matters because the model claims near-Gemini 3.5 Flash accuracy at 9.5s p50, though coverage is still skewed toward Latin-language docs and commercial-use limits remain.

ComputeSDK releases 2026 100k Scale Invitational results across 6 sandbox providers
ComputeSDK published results from its 2026 100k Scale Invitational after weeks of reruns and infra tuning across Modal, Tensorlake, Northflank, Declaw AI, E2B, and Isorun. It matters because sandbox and agent infra claims now have a shared public concurrency target instead of vendor-specific load demos.

Developers publish loop libraries and control-loop guides for long-running agents
Builders released reusable loop artifacts this week, including a Loop Library Skill, repo templates, and published control-loop definitions for docs sweeps, onboarding checks, and error triage. It matters because teams are turning one-shot prompting into persistent agent runs with explicit stop conditions and shared repo state.

Secure Exec v0.3 rewrites in Rust and adds Bun SDK, process trees, and Node-less mode
Secure Exec v0.3 shipped a full Rust rewrite, Bun and Rust SDKs, process-tree support for spawn and exec inside the VM, and a configurable Node-less mode. It matters because agent sandboxes can tighten performance and isolation without depending on a full Node runtime.







