AI Primer

Explore what's new in AI

Where people deep in AI come to stay current.

Breaking

Human-on-the-Bridge compares reusable eval assets with LLM judges and human review

A new Human-on-the-Bridge paper argues for front-loading expert judgment into reusable evaluation assets, while practitioners shared double-run and multi-model review setups. These approaches matter because teams tuning agent harnesses need repeatable ways to measure behavior, beyond one-off benchmark scores or subjective PR review.

Evals·21st June·5 min read
Breaking

Hermes Agent adds self-hosted Mem0 and headless desktop connections

Hermes Agent can now self-host Mem0, and the desktop client can attach to headless Hermes instances or start one with the hermes desktop command. The change expands always-on memory and remote control setups outside a laptop session.

Hermes Agent·21st June·3 min read

Top stories this week

GLM-5.2 ranks #1 on DeepSWE with 44% pass@1

Independent results put GLM-5.2 at the top of the open-model DeepSWE board and near the top on debate and post-train evals. Watch token use and long reasoning traces, which can offset its headline price advantage.

GLM·20th June·7 min read
AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.