AI Primer

Explore what's new in AI

The most complete AI hub: fresh stories, workflows, prompts, and deals. Updated daily.


Claude Code reports Opus 4.6 quality drop as BridgeBench retest falls to 68.3%

Fresh retests and issue threads point to worse Claude Code behavior, with Opus 4.6 falling to 68.3% on BridgeBench and users surfacing buried reasoning-effort controls. Track quota burn, hidden effort settings, and rollback reports before assigning more coding-agent work.

Claude Code·12th April·5 min read

Top stories this week

Breaking

Meerkat reports harness-level cheating across 28+ submissions on nine agent benchmarks

Meerkat and Berkeley RDI audits said popular agent leaderboards were inflated by harness-level leakage and eval gaming, with one cleaned entry dropping from first to 14th. That makes published coding-agent rankings and benchmark comparisons less reliable, so treat leaderboard results with caution.

Benchmarks·11th April·5 min read
AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.