New third-party tests put MiniMax M2.7 at a 34% hallucination rate, roughly 65 tps, and 27.04% on Vibe Code Bench while users pushed it through physics-heavy web demos. It looks increasingly viable for agent workflows, but performance still swings by task and harness.

The cleanest change is hallucination handling. In the AA-Omniscience chart, M2.7 drops to 34%, down from 89% for M2.5, and that places it ahead of GPT-5.4 on that specific benchmark. Cedric Chee's voxel pagoda test lines up with the same direction of travel: he says M2.5 sometimes hallucinated the surrounding garden scene, while M2.7 does so less often.
The coding picture is more mixed but still stronger than the last release. The benchmark roundup claims 56.22% on SWE-Pro, matching GPT-5.3 Codex, plus gains on Terminal Bench 2, VIBE-Pro, Toolathlon, and GDPval-AA. A separate Vibe Code Bench leaderboard is much harsher, placing M2.7 at 27.04% ± 4.18 for end-to-end app generation, but it also tags the run at $2.82 per test and a latency of 1,377 seconds, which is a materially cheaper profile than the top GPT and Claude entries in that table.
The best independent review here is less bullish than the launch chatter. According to the Zhihu review thread, M2.7 improved “direct/indirect instruction execution” and context hallucination, but stability is still uneven: it can score full marks on long code derivation, then fall to “unusable” on medium-complexity tasks because of misread instructions or repeated fixes. The same review says there is “no substantial upgrade” in high-level engineering design, even if the model now more often writes SPEC.md and README.md to track project logic.
Cedric Chee's thread summary makes a similar distinction. He calls out better “real-world engineering” and “professional office delivery,” but frames M2.7 as an early step in model self-evolution rather than a broad jump in general intelligence. The Zhihu review thread is explicit that hard reasoning regressed slightly, with 50%-100% higher token use from excessive enumeration and more max-token failures on complex tasks.
MiniMax shipped M2.7 directly into production surfaces instead of keeping it as a paper release. The launch coverage says it is available immediately in MiniMax Agent and via API, and MiniMax's release post is the main product reference. Teknium's Hermes Agent support adds that it landed in Hermes Agent through the MiniMax provider on day one.
The practical usage pattern emerging around M2.7 is agent orchestration plus browser-native coding. A MaxClaw session slide describes a “100,000+ scalable” agent system with components including an LLM gateway, MCP server, and MicroVM sandboxing. On the application side, user demos show M2.7 generating self-contained HTML and physics-heavy web experiences: one receipt physics demo builds a draggable cloth-style receipt simulation in a single HTML file, and another water physics demo extends that to a temperature-controlled water simulation. Those demos are anecdotal, but they fit the narrower claim in the benchmark roundup that M2.7 is strongest in multi-round edits and agentic delivery rather than pure reasoning peaks.
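The draggable cloth-style receipt in those demos is the kind of effect usually built from a grid of point masses linked by distance constraints and stepped with Verlet integration. A minimal sketch of that technique (illustrative only; the demos' actual code is not published here, and all names and constants below are assumptions):

```python
# Mass-spring cloth sketch: a W x H grid of point masses, top row pinned
# like a hung receipt, stepped with Verlet integration plus constraint
# relaxation. Hypothetical values throughout.

W, H = 6, 8          # grid of point masses
REST = 1.0           # rest length between neighbors
GRAVITY = 0.02
ITERS = 5            # constraint-relaxation passes per step

# state: current and previous positions (Verlet integration needs both)
pos = {(x, y): [x * REST, y * REST] for x in range(W) for y in range(H)}
prev = {k: v[:] for k, v in pos.items()}
pinned = {(x, 0) for x in range(W)}   # top row fixed

def neighbors(x, y):
    if x + 1 < W:
        yield (x, y), (x + 1, y)
    if y + 1 < H:
        yield (x, y), (x, y + 1)

def step():
    # 1. Verlet integration: new = pos + (pos - prev) + acceleration
    for k, p in pos.items():
        if k in pinned:
            continue
        vx, vy = p[0] - prev[k][0], p[1] - prev[k][1]
        prev[k] = p[:]
        p[0] += vx
        p[1] += vy + GRAVITY
    # 2. relax each spring back toward its rest length
    for _ in range(ITERS):
        for x in range(W):
            for y in range(H):
                for a, b in neighbors(x, y):
                    ax, ay = pos[a]
                    bx, by = pos[b]
                    dx, dy = bx - ax, by - ay
                    d = (dx * dx + dy * dy) ** 0.5 or 1e-9
                    corr = (d - REST) / d / 2
                    if a not in pinned:
                        pos[a][0] += dx * corr
                        pos[a][1] += dy * corr
                    if b not in pinned:
                        pos[b][0] -= dx * corr
                        pos[b][1] -= dy * corr

for _ in range(30):
    step()
```

In a single-file HTML demo, the same loop would run per animation frame, with dragging implemented by temporarily pinning the grabbed point to the pointer position.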
OpenHands introduced EvoClaw, a benchmark that reconstructs milestone DAGs from repo history to test continuous software evolution instead of isolated tasks. The first results show agents can clear single tasks yet still collapse under regressions and technical debt over longer runs.
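The milestone-DAG idea can be sketched as follows: treat tagged commits as milestone nodes, collapse the intervening history into ancestry edges, and hand the agent tasks in topological order. This is an assumption-laden illustration of the technique, not EvoClaw's actual format; the repo, milestone names, and structure are invented.

```python
# Collapse a toy commit history into a DAG of milestone-to-milestone
# dependencies, then derive the order in which an agent must clear them.
from graphlib import TopologicalSorter

# toy history: commit -> parent commits (merge commits have two parents)
history = {
    "init":      [],
    "parser":    ["init"],
    "lexer":     ["init"],
    "typecheck": ["parser", "lexer"],   # merge: depends on both branches
    "codegen":   ["typecheck"],
}
milestones = {"init", "parser", "lexer", "typecheck", "codegen"}

def milestone_dag(history, milestones):
    """Keep only milestone commits; edges point to nearest milestone ancestors."""
    dag = {m: set() for m in milestones}
    for commit, parents in history.items():
        if commit not in milestones:
            continue
        stack = list(parents)
        while stack:                    # walk back past non-milestone commits
            p = stack.pop()
            if p in milestones:
                dag[commit].add(p)
            else:
                stack.extend(history[p])
    return dag

# predecessors-first ordering = the task sequence for a continuous-evolution run
order = list(TopologicalSorter(milestone_dag(history, milestones)).static_order())
```

The interesting evaluation then happens between nodes: after each milestone, earlier milestones are re-tested, which is where the reported regressions and technical-debt collapses show up.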
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
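The core trick behind this kind of index can be sketched in a few lines: extract trigrams from every file, build an inverted index, and answer a query by intersecting the posting lists of the trigrams the query must contain, scanning only the surviving files. This is a minimal illustration of the n-gram prefilter idea, not Cursor's implementation; plain sets stand in where a production index would use Bloom filters and regex literal extraction.

```python
# Trigram inverted index with intersection-based prefiltering.
from collections import defaultdict

files = {
    "a.py": "def instant_grep(query): ...",
    "b.py": "class BloomFilter: pass",
    "c.py": "print('hello world')",
}

def trigrams(text):
    return {text[i:i + 3] for i in range(len(text) - 2)}

# inverted index: trigram -> set of files containing it
index = defaultdict(set)
for name, text in files.items():
    for g in trigrams(text):
        index[g].add(name)

def search(literal):
    """Trigram prefilter, then confirm with a real scan of the candidates."""
    grams = trigrams(literal)
    if not grams:                       # query too short to prefilter
        candidates = set(files)
    else:
        # a file lacking any required trigram cannot match, so intersect
        candidates = set.intersection(*(index[g] for g in grams))
    return sorted(n for n in candidates if literal in files[n])
```

For a regex rather than a literal, the same prefilter works on any literal substrings the pattern is guaranteed to contain, which is why the speedup is largest when the expensive full scan only has to touch a handful of candidate files.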
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
One more MiniMax M2.7 result, per the Vibe Code Bench team: it has broken 25% on Vibe Code Bench, their in-house benchmark testing a model's ability to write an application entirely from scratch, and it is the only Chinese model to do so so far.