Kilo said MiniMax M2.7 placed fifth on PinchBench, 1.2 points behind Opus 4.6 at much lower input cost, while community tests showed strong multi-loop agent behavior on graphics tasks. If you route coding-agent traffic by price, M2.7 looks worth a controlled bake-off.

Kilo's benchmark writeup says MiniMax M2.7 scored 86.2% on PinchBench, a benchmark for agentic coding tasks, putting it fifth among 50 tested models and 1.2 points behind Opus 4.6. The same post says it outperformed several established competitors while improving 3.7 points over M2.5.
The second benchmark matters more for autonomous coding behavior. In Kilo Bench, an 89-task eval, Kilo reports M2.7 finished second overall with a 47% pass rate and says the model often spends longer exploring a codebase before making changes. Kilo's launch thread claims that pattern let it solve tasks that no other tested model completed, but the writeup also says the same behavior can drive up token use and cause timeouts on harder jobs.
The first practitioner reports are less about raw benchmark rank and more about orchestration style. In one shared setup, the author says they used "Opus 4.6 as a reviewer/planner" with "4 worker minimax M2.7 agent" instances and a loop count of five to iteratively improve generated scenes (workflow thread). The attached demo shows a Three.js-generated "premium interactive isometric 3D cozy room" built in a single HTML block through repeated refinement (cozy room demo).
A follow-up voxel-art test used the same five-loop pattern, and the author advises to "always use minimax in agentic form," adding that the full five-loop run still cost less than Sonnet 4.6 (cost claim). The evidence here is anecdotal, but it lines up with Kilo's benchmark note that M2.7 appears strongest when given room to explore, review, and iterate rather than being treated as a cheap one-pass coding model (benchmark details).
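The reviewer/worker arrangement described in these reports is straightforward to sketch. The following is a minimal, hypothetical Python skeleton of that loop; every function name here is a placeholder standing in for a real model call, not a MiniMax, Anthropic, or Kilo API, and the candidate-selection step is deliberately simplified:

```python
# Hypothetical sketch of a planner/worker refinement loop:
# one reviewer model critiques, N worker models regenerate, repeated `loops` times.

def call_planner(artifact: str) -> str:
    """Placeholder for the reviewer/planner model (an Opus-class model in the report)."""
    return f"critique of: {artifact[:30]}"

def call_worker(task: str, critique: str, worker_id: int) -> str:
    """Placeholder for one worker model (a MiniMax-class model in the report)."""
    return f"<html><!-- {task} | revised per '{critique}' | worker {worker_id} --></html>"

def refine(task: str, n_workers: int = 4, loops: int = 5) -> str:
    """Run `loops` rounds: the planner critiques the current best artifact,
    then each worker regenerates against that critique."""
    best = call_worker(task, "initial draft", worker_id=0)
    for _ in range(loops):
        critique = call_planner(best)
        candidates = [call_worker(task, critique, i) for i in range(n_workers)]
        # Naive selection: take the first candidate. A real setup would
        # score candidates (e.g. have the planner rank them) before choosing.
        best = candidates[0]
    return best
```

The interesting design choice in the reported setup is that iteration count, not model size, drives quality: the expensive model is called once per loop as a critic, while the cheap model does all generation.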
Vals AI switched SWE-Bench Verified from SWE-Agent to the bash-only mini-swe-agent harness, aligning results more closely with the official benchmark setup. Top score dipped slightly to 78.8%, but the change reduces harness-specific confounds when comparing models.
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
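Cursor has not published full implementation details, but the core candidate-filtering idea behind an n-gram index is well known: a file can only match a query if it contains every trigram of the query's literal part, so posting lists are intersected first and the full regex runs only on the survivors. A toy Python sketch under those assumptions (class and method names are illustrative, not Cursor's):

```python
import re
from collections import defaultdict

def trigrams(s: str) -> set:
    """All overlapping 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    """Toy inverted index mapping trigrams to file ids. A production
    version would add Bloom filters per file to skip posting lists cheaply."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> {file ids}
        self.files = {}                   # file id -> contents

    def add(self, file_id: str, text: str) -> None:
        self.files[file_id] = text
        for g in trigrams(text):
            self.postings[g].add(file_id)

    def search(self, literal: str, pattern: str) -> list:
        """Intersect posting lists for the query's literal trigrams,
        then run the real regex only over the candidate files."""
        grams = trigrams(literal)
        if grams:
            candidates = set.intersection(*(self.postings[g] for g in grams))
        else:
            candidates = set(self.files)  # literal too short to prune
        rx = re.compile(pattern)
        return sorted(f for f in candidates if rx.search(self.files[f]))
```

The speedup comes from the intersection step: most files share no trigrams with the query, so the expensive regex scan touches only a tiny candidate set instead of the whole repository.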
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.