Fri, Dec 26, 2025

MiniMax M2.1 opens 230B‑param MoE – 74% SWE‑bench push


Executive Summary

MiniMax releases open weights for MiniMax‑M2.1, a 10B‑active / 230B‑total MoE coding model positioned as an agent backbone rather than a one‑shot autocomplete; launch charts claim 74% on SWE‑bench Verified, 49.4% on Multi‑SWE‑bench, 72.5% on SWE‑bench Multilingual, 47.9% on Terminal‑Bench 2.0 and an 88.6 VIBE average, with several scores edging Gemini 3 Pro and Claude Sonnet 4.5. The model targets multi‑language software work (Rust/Go/Java/C++/Kotlin/web/mobile) and “interleaved thinking” for tools and agents; weights ship under an MIT‑style license that adds a MiniMax branding requirement for commercial apps, and are exposed via MiniMax’s own API plus a Hugging Face release.

Long‑context and bias profile: Context Arena’s MRCR run shows 49.2% AUC and 42.8% pointwise accuracy on 2‑needle retrieval at 128k tokens, at roughly half GLM‑4.7’s input cost and ~85% lower output cost, but with a strong forward‑drift bias and “creative” hallucinated needles in 74–82% of misses.
Ecosystem and serving: vLLM and SGLang ship day‑0 parsers for MiniMax‑style tool/“thinking” tokens; Novita hosts inference on Hugging Face, and early 4–6‑bit Mac Studio tests report ~40–47 tok/s, making M2.1 an immediately runnable open coding/agent option.


Feature Spotlight

Feature: MiniMax M2.1 open-sourced for real-world coding agents

MiniMax releases open weights for M2.1, its agentic coding model; community frameworks add day‑0 support, and posts tout SOTA SWE/VIBE plus local deployability, making it a strong open option for dev/agent stacks.




🧰 Feature: MiniMax M2.1 open-sourced for real-world coding agents

Cross-account focus today: MiniMax ships open weights for its agentic coding model; immediate ecosystem pickup and benchmark claims aimed at SWE/VIBE use. This is the day’s major actionable model story for builders.

MiniMax open-sources M2.1 coding MoE as SOTA agent backbone

MiniMax M2.1 open weights (MiniMax): MiniMax has released open weights for MiniMax‑M2.1, a 10B‑active / 230B‑total MoE model positioned as SOTA for SWE‑bench, VIBE and Multi‑SWE coding and agent benchmarks, with claims of outperforming Gemini 3 Pro and Claude Sonnet 4.5 on several metrics in the launch chart shared in the launch thread; this follows the earlier agent backend coverage, where M2.1 was described as a proprietary agent backend inside tools. The company highlights M2.1’s multi‑language programming focus (Rust, Java, Go, C++, Kotlin, mobile, web) and real‑world complexity targets in its product write‑up and text‑generation docs, which explain tool use, interleaved "thinking" and prompt caching for the model, per the text generation guide and model news.

Model and license details: M2.1 is a sparse MoE with 10B active parameters per token out of 230B total, advertised as higher‑throughput and easier to deploy locally than dense peers launch thread; community commentary notes an MIT‑style license with an added requirement to display the MiniMax brand when commercializing apps, which is relevant for product teams planning white‑label use license note.
Distribution and surfaces: The model is available as MiniMaxAI/MiniMax-M2.1 on Hugging Face with Novita as one of the hosted inference backends hf release and hf model card; MiniMax also exposes it via the MiniMax Open Platform and a Codex‑style API with Anthropic/OpenAI SDK compatibility, documented in its text‑generation guide text generation guide.
Positioning and training philosophy: MiniMax’s own alignment blog frames M2.1 as optimized both for open benchmarks and messy real workflows, emphasizing interleaved thinking (reasoning woven through the task, not in a single pre‑ or postfix block) and dual alignment to benchmarks and users alignment blog; the company explicitly invites users to "try harder & try anything" and stresses that M2.1 is built for agents doing complex multi‑step work rather than one‑shot code snippets testing invite.
Promo and adoption push: To seed usage, MiniMax is offering an 80%‑off coding plan—"start 2026 with a $2 MiniMax Coding Plan"—plus a giveaway that bundles robots, NVIDIA GTC tickets and API credits for active M2.1 subscribers pricing promo and giveaway rules; the team’s own framing is that the model is “SOTA, fast, easy to infer, economically robust” and they ask rhetorically "What else could you ask for" MiniMax slogan.

Developer responses describe M2.1 as a "huuuuge leap in SWE tasks" with strong agentic behavior and note that its mix of open weights plus commercial API makes it an unusually flexible coding model for 2026 planning dev reaction.
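For teams that want to exercise the hosted API quickly, the Anthropic/OpenAI SDK compatibility noted above means existing client code usually needs only a new base URL and model name. A minimal sketch, with a placeholder endpoint and model id that should be checked against MiniMax's text generation guide:

```python
# Minimal sketch of calling MiniMax-M2.1 through an OpenAI-compatible client.
# The base_url and model id below are placeholders -- check MiniMax's
# text generation guide for the real endpoint and model name.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.minimax.example/v1",  # placeholder endpoint
    api_key="YOUR_MINIMAX_API_KEY",
)

resp = client.chat.completions.create(
    model="MiniMax-M2.1",  # placeholder model id
    messages=[
        {"role": "system", "content": "You are a coding agent. Think step by step."},
        {"role": "user", "content": "Write a Rust function that reverses a linked list."},
    ],
    temperature=0.2,
)
print(resp.choices[0].message.content)
```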

MiniMax M2.1 claims SOTA coding and strong long‑context at lower cost

Benchmarks and behavior of M2.1 (MiniMax): MiniMax backs its M2.1 open‑source release with aggressive benchmark claims—its launch chart shows 74% on SWE‑bench Verified, 49.4% on Multi‑SWE‑bench, 72.5% on SWE‑bench Multilingual, 47.9% on Terminal‑Bench 2.0, and an 88.6 VIBE average, with several scores edging out Gemini 3 Pro and Claude Sonnet 4.5 on particular tasks launch thread; following the earlier Zhihu review, which found M2.1 notably stronger at coding but weaker on math and spatial reasoning than its predecessor, new long‑context tests add more nuance.

Long‑context retention vs peers: Context Arena’s MRCR suite evaluates minimax‑m2.1 on 2‑, 4‑, and 8‑needle retrieval up to 128k tokens, finding that it maintains a relatively flat degradation curve compared with GLM‑4.7, which "starts strong and crashes" past 64k mrcr writeup and mrcr leaderboard; at 128k, M2.1 scores an AUC of 49.2% on 2‑needle (slightly above Gemini 2.5 Flash Lite Thinking’s 47.9%) with 42.8% pointwise accuracy vs GLM‑4.7’s 22.1%, and on 4‑needle it reaches 33.3% AUC and 27.1% pointwise, again comfortably ahead of GLM‑4.7’s 13.1% pointwise mrcr writeup.
Biases and "creative" errors: A separate bias analysis of minimax‑m2.1’s MRCR runs reports an extreme forward or "positive drift" bias, where the model tends to select later variants when uncertain: for example, in 2‑needle setups the correct‑variant‑1→2 misselection rate (picking 2 when 1 is right) is 61.1%, versus only 10.2% in the opposite direction, with a similar 7→8 vs 8→7 asymmetry in 8‑needle tests bias analysis; the same analysis finds roughly 2% key‑hash failures, 74–82% of misses involving “creative” hallucinated needles on the right topic/medium rather than true retrieval, and 5–8% refusals where the model breaks character to say it cannot find the requested variant bias analysis.
Alignment and generalization framing: MiniMax’s own technical blog "Aligning to What?" argues that M2.1’s training was tuned to balance benchmark score maximization against agent generalization, emphasizing that interleaved thinking and compositional training examples help the model learn reusable plans rather than rote solutions alignment blog; the post also notes that while benchmarks like VIBE and SWE‑bench capture important skills, they still only partially proxy for the messy, multi‑step coding and agent tasks M2.1 is meant to handle alignment blog.
Ecosystem perspective: A code‑LLM progress timeline shared by independent observers places models like M2.1 alongside DeepSeek‑R1, Qwen3‑Coder and GLM‑4.7 at the frontier of HumanEval and related coding benchmarks, underscoring how quickly open and semi‑open models have approached prior closed‑model baselines in 2025 code timeline; community comments underline that M2.1’s biggest practical win is its combination of competitive scores, open weights and comparatively low inference cost per token, which Context Arena estimates at roughly half the input cost and ~85% lower output cost than some long‑context rivals at equivalent 128k tests mrcr writeup.

The overall picture is that M2.1 looks very strong as an open coding and agent model—especially on multi‑language SWE and long‑context retrieval—but its tendency toward forward‑biased, "creative" answers and some math/spatial gaps from earlier evaluations suggest teams will still need careful harness design and evaluation when dropping it into high‑stakes workflows.
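The drift numbers above reduce to comparing misselection rates in the two directions between a pair of variants. A minimal sketch of that computation, assuming you have (correct_variant, chosen_variant) pairs extracted from your own MRCR-style runs (the data format here is illustrative, not Context Arena's actual schema):

```python
from collections import Counter

def drift_asymmetry(results, a, b):
    """Compare how often the model picks variant b when a is correct vs. the reverse.

    results: iterable of (correct_variant, chosen_variant) tuples from a
    multi-needle retrieval run. Returns (rate_a_to_b, rate_b_to_a).
    """
    counts = Counter(results)
    total_a = sum(v for (c, _), v in counts.items() if c == a)
    total_b = sum(v for (c, _), v in counts.items() if c == b)
    rate_a_to_b = counts[(a, b)] / total_a if total_a else 0.0
    rate_b_to_a = counts[(b, a)] / total_b if total_b else 0.0
    return rate_a_to_b, rate_b_to_a

# Toy example: a forward-biased model confuses 1 -> 2 far more often than 2 -> 1.
runs = [(1, 2)] * 61 + [(1, 1)] * 39 + [(2, 1)] * 10 + [(2, 2)] * 90
print(drift_asymmetry(runs, 1, 2))  # ~ (0.61, 0.10), matching the reported asymmetry
```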

vLLM and SGLang ship day‑0 runtimes for MiniMax M2.1

Serving MiniMax‑M2.1 (vLLM, LMSYS, Hugging Face): In parallel with the open‑weights drop, multiple runtime stacks have added day‑0 support for MiniMax‑M2.1, giving teams ready‑made ways to serve it with tool‑calling and "thinking" modes; vLLM shows a vllm serve MiniMaxAI/MiniMax-M2.1 command with MiniMax‑specific parsers, and LMSYS’s SGLang publishes a multi‑GPU launch recipe tuned for agent workloads vllm commands and sglang recipe.

vLLM integration: vLLM’s example uses --tensor-parallel-size 4, --tool-call-parser minimax_m2, and --reasoning-parser minimax_m2_append_think, plus --enable-auto-tool-choice, indicating first‑class support for MiniMax’s tool schemas and interleaved reasoning tokens vllm commands; the team frames this as "Day‑0 support" for what they call a "full‑stack development powerhouse" so that users can run M2.1 efficiently with server‑side streaming and batching.
SGLang recipe for agents: LMSYS presents an SGLang launch that combines tensor parallelism and expert parallelism (--tp-size 8 --ep-size 8) with MiniMax‑aware tool and reasoning parsers and a --mem-fraction-static 0.85 flag, clearly targeting long‑running multi‑tool agents that need high throughput sglang recipe; their post calls M2.1 "a strong model for real-world development and a great fit for agent workflows" and positions SGLang as a natural orchestration layer around it sglang recipe.
Hosted inference and quantization: Novita announces that MiniMax‑M2.1 is live on Hugging Face Inference, with their backend powering hosted runs novita support and hf release; community tests report that a 4‑bit or 6‑bit quantized version runs on high‑RAM Mac Studio setups with around 40–47 tokens/sec generation rates, suggesting that local dev and small‑scale agent hosting are feasible for well‑provisioned workstations perf summary.
Ecosystem and community signals: MiniMax thanks vLLM for long‑term support and SGLang for day‑0 integration vllm thanks and sglang recipe, while LMSYS highlights the pairing "MiniMax models + SGLang = a strong stack for real-world workflows" sglang recipe; MiniMax itself encourages "all kinds of testing and challenges" on the open weights testing invite, underlining that it expects M2.1 to show up inside many third‑party agent harnesses.

Together these integrations mean M2.1 is not just a model file but a ready‑to‑run option across popular high‑performance inference stacks and hosted services, lowering friction for teams that want to trial it as a coding or agent backbone without building custom tooling first.
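Because the vLLM example exposes an OpenAI-compatible endpoint with --enable-auto-tool-choice, tool calling can be smoke-tested with the standard OpenAI client. A minimal sketch, assuming the server from the commands above is already running locally on the default port 8000 and using an illustrative tool definition:

```python
# Sketch of exercising tool calling against a locally served MiniMax-M2.1.
# Assumes `vllm serve MiniMaxAI/MiniMax-M2.1 ... --enable-auto-tool-choice
# --tool-call-parser minimax_m2` is already running on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # illustrative tool, not part of the model release
        "description": "Run the project's test suite and return the failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.1",
    messages=[{"role": "user", "content": "Fix the failing tests under src/parser."}],
    tools=tools,
)
msg = resp.choices[0].message
# With auto tool choice enabled, the MiniMax parser should surface structured tool calls here.
print(msg.tool_calls or msg.content)
```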


🧑‍💻 Coding agents in practice: workflows, prompts, and web automation

Mostly hands-on agent/dev tooling threads. Excludes MiniMax M2.1 release (covered as the feature). Focus is how engineers are using Claude Code and new automation nodes.

Claude Code writes ~200 PRs in a month as engineer skips IDE

Claude Code agents (Anthropic): Anthropic engineer Boris Cherny describes his first month of engineering work without opening an IDE at all, with Opus 4.5 generating around 200 pull requests line‑for‑line and even 1‑shotting a production memory‑leak fix by taking and reading a heap dump instead of manual profiling engineer account.

He notes that newer coworkers and new grads who lack habits from weaker models are exploiting Claude Code most effectively, saying expectations must be reset every month or two as coding capability improves engineer account. Other engineers highlight this as a striking shift in workflow—“Opus 4.5 wrote around 200 PRs, every single line” is being quoted as a reference point for how far coding agents have moved beyond autocomplete quote highlight and usage reaction. Commentary around the thread stresses that even senior people feel behind and that “anyone worried about how AI will affect early career paths hasn’t seen how strong a 25‑year‑old with Claude Code is,” reinforcing that this level of delegation is starting to feel normal in some teams new grad remark and reassurance thread.

AI workflows and engineering hygiene (Google): Chrome engineering lead Addy Osmani argues that this is the “most fun moment to be a developer in years,” but says the teams getting the biggest boost from AI coding tools are the ones that invested earlier in CI/CD, tests, documentation and code review engineering thread. He frames AI as changing the altitude of work—from typing syntax to reviewing implementations and catching edge cases—so features that once took days now ship in hours, while weak foundations turn agents into “chaos generators” rather than accelerators engineering thread. The thread emphasizes that understanding context provision, iteration on plans, and fast review of AI‑generated diffs is a learnable skill through repetition, and that the real multiplier comes from combining agent speed with human judgment rather than replacing core coding skills engineering thread.

Firecrawl n8n v2.0 turns web scraping into an agentic /agent workflow

/agent node for n8n (Firecrawl): Firecrawl ships v2.0 of its n8n integration, adding a new /agent node that lets users describe the data they want and have an AI agent autonomously search, navigate and gather it from complex websites in either sync or async mode release overview, following up on the original n8n /agent release covered in Firecrawl node, which framed the goal‑to‑data pattern.

Firecrawl n8n walkthrough

Resource grouping and schema tools: Nodes are now organized by resource type (Agent Actions, Map & Search, Crawl, Account and legacy Extract), and a redesigned schema generator can infer JSON schemas from examples or accept hand‑written schemas when precise control is needed node grouping and schema update.
Batch scrape and AI Agent tool support: Batch scrape now accepts multiple URLs in a single textarea so workflows can pass dynamic lists from prior nodes, and all Firecrawl nodes can be used as tools inside n8n’s AI Agent with $fromAI() so agents can autonomously scrape, crawl and search while filling parameters like URLs and prompts on the fly batch scrape change and agent tool use.

The team positions this as a way to move from brittle, hand‑wired scrapers to agent‑driven web automations that still expose schemas and resource types as explicit knobs rather than hiding them inside prompts agent docs.

Omar Shakir outlines Claude Code skill workflows and “context engineering” focus

Context engineering with skills (DAIR.AI): Educator and agent‑builder Omar Shakir expands on Andrej Karpathy’s “I feel behind” post by arguing that creative workflows and context engineering are now where anyone can contribute; he recommends spending at least two hours a day experimenting with tools like Claude Code and building skills that compound over time motivational thread. He describes a pattern of pairing Claude Code with a dedicated deep‑research subagent that reads new tools, papers and ideas, then using brainstorming sessions to implement at least one new skill per session—even if projects remain unfinished, those skills get reused and refined later brainstorming workflow. His notes point out that most of his work already happens inside Claude Code, so these exploratory sessions fit directly into his daily flow, and that “context engineering is where the game is intensifying,” with the main challenge being how to inject the best possible context into agents rather than finding yet another model agent memory comment and agent academy.

Prompt template turns Claude Code into explainer + async background agent for non‑devs

Async explainer agent (trq212): Developer Trevor Rowbotham shares a prompt template that wraps Claude Code into two cooperating agents for non‑technical users—one long‑running background agent that executes work, and another “teacher” agent that periodically polls progress, hides noisy diffs, and explains everything in simple language template intro. The template is being used to help his non‑technical sister “vibe code” a WNBA stats site, with instructions telling Claude to spin off an async agent, summarize its work in approachable terms, and stop to ask for help if errors occur while guiding the user through fixes step by step full template. A community contributor has already packaged this pattern into a Claude Code plugin so others can invoke it without copying prompts, and the plugin repo is positioned as part of a larger collection of Claude Code plugins and skills for guided workflows plugin link and plugin collection.

Steipete’s personal stack turns Codex into a multi‑network orchestration layer

Multi‑agent dev environment (Codex + tools): macOS developer Peter Steinberger shows how he treats OpenAI Codex as a kind of terminal‑driven orchestrator for multiple agents and machines, wiring helpers like custom shell functions, Tailscale discovery, and remote ~/.zshrc edits so Codex can configure matching shortcuts across his MacBook and Mac Studio via SSH tailscale wiring.

He also runs his own summarize CLI as a background daemon tied to a Chrome extension, so any webpage, YouTube video or podcast is automatically extracted, sanitized and summarized into a side panel while he browses, with support for local or cloud transcription and free or local models summarize daemon and summarize site. Another layer in his stack is peekabooagent, which takes periodic screenshots of his screen and feeds them to a Claude‑powered lobstr “clawdbot” that comments on what he is working on—three concurrent Codex sessions, WhatsApp logs, and git output are all being monitored in one example log peekaboo session. Across several posts he also shares Codex Wrapped stats showing 15.7B+ input tokens and thousands of sessions in under a month on one machine, underlining how heavy daily coding and orchestration usage looks when agents become part of the inner development loop codex wrapped.

Cursor tip: one Plan Mode task per agent window to avoid context bloat

Plan Mode habits (Cursor): A power user highlights a Cursor workflow where each item on the Plan Mode to‑do list is opened in its own agent chat window at the bottom of the list, giving that sub‑agent a fresh context focused on a single task instead of sharing a history cluttered by previous steps plan mode advice.

Plan Mode multi-window demo

The thread says that when all tasks share a single window, the context window fills up, earlier chat history pollutes the prompt, and the model can lose track of its current goal; the “one task, one agent window” pattern lets the user comfortably run about five parallel chats without hitting those issues plan mode advice.

Peakypanes + Oracle skill used for quick parallel Codex agents

Parallel Codex panes (Peakypanes): One developer suggests using Peakypanes with a 2×3 grid defined in .peakypanes.yml to spin up six Codex agent panes in one terminal, then wiring an “oracle” skill so Claude or GPT‑style agents can call Oracle with the right files attached when they get stuck on an issue peakypanes tip. They describe the advantage as getting multiple focused Codex sessions running in parallel—each with its own scrollback and plan—while Oracle aggregates relevant context (e.g. from specs) and hands richer prompts to a stronger model behind the scenes oracle suggestion. This workflow is framed as an easy way to prototype background debugging or refactor agents without building a full orchestration layer first.

Developers wire Claude Code, Gemini CLI and Codex into Obsidian workspaces

Obsidian as agent hub (multi‑tool): One user reports configuring Claude Code, Gemini CLI and Codex CLI inside a single Obsidian vault, with each agent writing markdown into its own folder so they can “plan how to work together” by reading each other’s notes rather than talking directly obsidian setup. Another reply points out that Claude Code plus its browser/Chrome extension should already integrate cleanly with Obsidian’s files‑and‑folders model, reinforcing that local note‑based workflows can act as a lightweight shared memory between heterogeneous agents extension note and followup reply. The approach is described as an interim way to get multi‑agent collaboration—using the filesystem as the message bus—without waiting for a unified orchestrator that natively connects Claude, Gemini and Codex in one UI.
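The filesystem-as-message-bus pattern is trivial to prototype: each agent appends markdown notes into its own folder inside the vault and reads the others' folders before planning. A minimal sketch, with illustrative paths and agent names:

```python
# Toy "filesystem as message bus": each agent writes markdown notes into its
# own folder in a shared vault and reads everyone else's before planning.
# Paths and agent names are illustrative.
from pathlib import Path
from datetime import datetime

VAULT = Path("vault")

def post_note(agent: str, text: str) -> None:
    folder = VAULT / agent
    folder.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    (folder / f"{stamp}.md").write_text(f"# {agent}\n\n{text}\n", encoding="utf-8")

def read_peers(agent: str) -> str:
    notes = sorted(p for p in VAULT.glob("*/*.md") if p.parent.name != agent)
    return "\n---\n".join(p.read_text(encoding="utf-8") for p in notes)

post_note("claude-code", "Refactoring auth module; don't touch src/auth until done.")
post_note("codex", "Writing integration tests for the payments API.")
print(read_peers("gemini-cli"))  # what a third agent would see before planning
```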


🏗️ AI factories, memory supply, and geopolitics of inference

Infra economics and capacity updates. New analysis extends the Nvidia–Groq storyline with memory strategy, plus fabs/DC power and export policy. Excludes model releases.

Bank of America sees Nvidia–Groq as a bid to own the inference rack

Inference stack (Nvidia and Groq): A Bank of America note argues that Nvidia’s roughly $20B non‑exclusive licensing deal for Groq’s inference technology is meant to turn inference into a first‑class product lane and keep racks “Nvidia‑shaped,” combining GPUs and Groq‑style LPUs under NVLink rather than letting customers bolt on third‑party inference boxes, expanding on the earlier licensing reports in Groq deal. The research describes GPUs as general‑purpose platforms for training and broad inference, with Groq’s SRAM‑centric Language Processing Units focused on ultra‑fast, predictable token generation, as summarized in the BofA summary.

Latency and architecture split: Groq’s own framing in the Groq explainer video stresses that inference is bottlenecked by latency, cost per query and power, not raw flops; its LPUs keep model weights and working data in hundreds of megabytes of on‑chip SRAM to avoid off‑chip fetches, trading capacity for very low per‑token delay. Analysts in the Rubin/CPX breakdown tie this to Nvidia’s roadmap: standard Rubin GPUs with large HBM4 stacks and NVLink links for training and high‑throughput decode, plus a Rubin‑CPX sibling with cheaper 128GB GDDR7 tuned for long‑context prefill, and a Groq‑style SRAM part for the fastest decode path.

Memory hedge and supply risk: Commentary in the DRAM squeeze view connects the Groq move to rising DRAM and HBM prices, arguing that if stacked‑memory supply, energy and cooling become hard bottlenecks, customers and competitors will look for alternatives that are less HBM‑hungry. By owning both GPU and LPU options, Nvidia can route work to the right chip per phase—prefill on GDDR‑based or HBM GPUs, decode on SRAM‑like LPUs—while keeping customers inside its rack‑level ecosystem rather than ceding inference slots to other vendors. The point is that this deal is being read less as a one‑off IP license and more as groundwork for an inference platform that mixes different memory strategies inside the same physical AI “factory.”

Bloomberg: 7GW of new AI data centers complete in 2025, US power to 61.8GW

Hyperscale data centers (multi‑hyperscaler): New analysis tallies more than 7GW of new data‑center capacity completing in 2025, with another 10GW breaking ground, and projects that utility power serving US data centers will reach 61.8GW by the end of 2025—an 11.3GW year‑over‑year jump—according to the Bloomberg chart summarized in the capacity overview. This build‑out supports 1,136 large hyperscale data centers worldwide at end‑2024, double the number five years earlier, continuing the capex trend tracked in capex curve.

The capacity overview lists several named AI mega‑campuses, such as OpenAI’s planned 1.2GW “Stargate” site, Meta’s 5GW Hyperion campus and joint G42–OpenAI projects around 5GW, that individually rival small national power systems. Together with utility projections for 61.8GW of data‑center load, these numbers frame AI “factories” as a distinct customer class for grid planners and financiers, where siting decisions, local regulation and power‑mix constraints could materially affect which regions can support the next wave of large‑scale training and inference deployments.

UBS: Nvidia’s Hopper line alone still out-earns all compute rivals combined

Hopper GPUs (Nvidia): UBS research estimates that even Nvidia’s N‑1 Hopper generation will generate more revenue than all other compute chip makers combined for several years, with newer Blackwell and Rubin lines adding on top rather than replacing it, according to the UBS analysis. This implies the competitive bar is not just “beat Blackwell” but first “beat Hopper at scale,” while Nvidia is already shipping two newer architectures.

The chart in the UBS analysis shows Hopper revenue staying substantial well into 2027 while Blackwell, Blackwell Ultra and Rubin stacks build a second and third revenue layer, pushing Nvidia’s projected compute revenue toward ~$125B a quarter. The note attributes this to demand running far ahead of supply and to high switching costs, since most large AI training and inference stacks are already built on Nvidia’s hardware and CUDA software. This framing reinforces Nvidia’s position as the default supplier for AI “factories,” and suggests that even if competitors match Blackwell‑class performance, they first have to displace entrenched Hopper deployments at hyperscale.

AI demand drives DRAM price spike and turns memory into a new choke point

DRAM and HBM suppliers (multi‑vendor): Commentators highlight that AI workloads are pushing DRAM prices sharply higher, with one example noting 64GB DDR5 kits rising from about $150 to $500 in under two months and projecting contract price jumps of +25–30% in Q4 2025, as summarized in the memory market note. Memory makers like SK Hynix and Micron have seen their shares surge alongside this, underlining that memory—not only GPUs—is becoming a key bottleneck for AI build‑outs.

The memory market note stresses that both DRAM and high‑bandwidth memory (HBM) used on top‑end AI accelerators are constrained by manufacturing capacity and packaging (CoWoS), and that sharp price moves risk feeding directly into AI inference and training costs. This backdrop explains why architectures that economize on external memory bandwidth or capacity, such as SRAM‑heavy inference chips, are attracting attention as potential ways to relieve pressure on HBM supply chains rather than only scaling GPU counts.

Intel Fab 52 tool stack underlines scale of upcoming 18A AI wafer capacity

Fab 52 (Intel): Further reporting on Intel’s Fab 52 in Arizona adds detail to its 18A process capacity, describing a target of roughly 10,000 wafer starts per week (around 40,000 per month) and noting that the site already hosts four low‑NA EUV scanners, including at least one ASML NXE:3800E capable of about 220 wafers per hour at a 30mJ/cm² dose, according to the fab overview that builds on the earlier scale discussion in Fab scale. Intel’s broader Arizona campus is expected to reach at least 15 EUV tools, making Fab 52 one of the best‑equipped logic fabs in the US.

Fab 52 construction timelapse

The fab overview and the construction timelapse in the fab video emphasize that Fab 52 is designed around features like gate‑all‑around RibbonFET transistors and PowerVia backside power delivery, both aimed at higher performance and energy efficiency for advanced nodes that will underpin future AI accelerators and CPUs. TSMC’s Arizona Fab 21 modules, by contrast, are ramping N4/N5 at module sizes nearer 20k wafer starts per month, making Fab 52’s eventual steady‑state output comparable to multiple such modules combined. The articles also note that 18A yields are still early, with Intel targeting “world‑class” yields only around 2027, meaning this AI‑relevant capacity exists physically but may be under‑utilized for some time while the process matures.

US developer proposes repurposing Navy nuclear reactors to power AI campuses

Retired naval reactors (US energy/AI): A Texas developer has proposed moving two retired US Navy pressurized‑water reactors onto land to provide baseload power for an AI data‑center campus, targeting around 450MW and 520MW of output and seeking a US Department of Energy loan guarantee for an estimated $1.8–2.1B project cost, as detailed in the reactor proposal. The financing pitch frames this as roughly $1–4M per megawatt of capacity, comparable to other large power projects but aimed specifically at AI infrastructure.

According to the reactor proposal, the plan hinges less on electrical engineering and more on regulatory fit: naval reactors often use highly enriched uranium and sealed‑core assumptions that do not map cleanly into the Nuclear Regulatory Commission’s civilian licensing rules for fuel, safeguards and waste. The developer is trying to align this with DOE’s push for co‑located power and AI data centers at sites like Oak Ridge, effectively treating decommissioned military hardware as a shortcut to firm power for AI. The point is that this idea illustrates how far some actors are willing to go to secure dedicated, low‑carbon baseload for AI compute, even if the licensing path is uncertain.


📊 Leaderboards, bias diagnostics, and deployment variance

Fresh eval instrumentation and measurement nuance. Excludes the M2.1 release. Focus on long‑context behavior and success‑rate snapshots, plus how clouds/deploys sway outputs.

LM Arena highlights how provider/runtime choices skew “same model” behavior

Inference quality and deployment (LM Arena): A short explainer from LM Arena’s capability lead Peter Gostev argues that “inference quality”—covering kernel stacks, tool‑calling reliability, timeouts, and serving choices—can materially change outputs even when the model weights and prompts are identical, so “same model, same prompt” does not always imply the same behavior across clouds and gateways, as discussed in the inference quality clip.

inference quality explainer

Runtime divergence: The talk notes that different providers can ship the same base model with different quantization, context‑window limits, or safety filters, which in turn alters long‑context behavior, error modes, and tool execution success, with more detail in the longer walkthrough on YouTube in the youtube video.
Tool‑calling impact: It also emphasizes that multi‑step agent harnesses are especially sensitive to these differences, since a single dropped tool call or slightly altered JSON schema can flip a pass into a fail even if the underlying reasoning quality is the same.

The upshot is that evaluation results, incident reports, and “model X vs model Y” stories increasingly need to specify where and how the model was served, not just which weights were used.

Poetiq runs GPT‑5.2 X‑High to 75% on ARC‑AGI‑2 public eval

GPT‑5.2 X‑High on ARC‑AGI‑2 (Poetiq): Forecasting group Poetiq reports that GPT‑5.2 X‑High, run via their stack with no special tuning, reaches 75% on the public ARC‑AGI‑2 eval—about 15 points above the previous best system and clearly above the average human test‑taker line at 60%, as plotted in the poetiq arc-agi scatter chart.

The cost‑vs‑score chart in the same graphic shows this run landing around $7 per task in their configuration, significantly more expensive than mid‑tier attempts but far more accurate, which sharpens the trade‑off frontier between competitive ARC‑AGI‑2 performance and per‑task cost for labs and evaluation platforms.

Context Arena shows GLM‑4.7 strong ≤64k but sharp cliff at 128k

GLM‑4.7 long‑context (Context Arena): Context Arena added glm-4.7:thinking to its MRCR long‑context benchmark and found a “strong start, rough finish” profile—AUC reaches 59.3% on 2‑needle tasks up to 128k tokens, but pointwise accuracy drops to 22.1% at the full window according to the glm mrcr results and the updated leaderboard at the leaderboard site; this follows GLM‑4.7’s earlier rise to top open‑weight slots on web dev leaderboards in GLM leaderboards.

Bias diagnostics (Context Arena): The bias analysis highlights strong recency accuracy and selection bias, plus a “lost in the middle” tendency where the model prefers start and end variants over middle ones at long context lengths, as detailed in the glm bias thread; this means GLM‑4.7 can look reliable in short prompts yet systematically mis‑locate targets when information appears mid‑sequence.

The point is: GLM‑4.7 looks competitive on aggregate AUC, but its sharp 128k cliff and recency skew mean engineers using it for 100k+ token RAG or agent traces need to think carefully about where they position critical facts and how they evaluate success.

MiniMax M2.1 scores 49.2% AUC at 128k with forward‑drift bias profile

MiniMax‑M2.1 long‑context (Context Arena): Context Arena added minimax‑m2.1 to its MRCR suite and reports a 49.2% AUC on 2‑needle tasks at 128k tokens, edging Gemini 2.5 Flash Lite Thinking’s 47.9% while achieving far better pointwise accuracy (42.8% vs GLM‑4.7’s 22.1%) at the same length, per the m2-1 mrcr scores and the leaderboard site; this builds on earlier independent testing that framed M2.1 as a coding‑first model with mixed logic strengths in M2.1 eval.

Cost and token efficiency (Context Arena): The same run shows M2.1 uses roughly 50% of GLM‑4.7’s input cost and about 85% lower output cost, while emitting fewer reasoning tokens for comparable tasks according to the m2-1 mrcr scores; that gives it a relatively flat degradation curve across the first half of the context window before accuracy decays near 128k.

Bias and hallucination profile (Context Arena): A separate bias study finds “extreme positive drift” where, when uncertain, the model tends to slide to later variants (for example, target 7→8 hits 66.5% while 8→7 only 15.7%) and a creative overreach pattern where 74–82% of misses invent a new but on‑topic needle instead of selecting one from context, as broken down in the m2-1 bias analysis; key‑hash formatting fails roughly 2% of the time, which is low but still non‑zero for automated pipelines.

Together, these measurements paint M2.1 as a cost‑efficient long‑context option whose forward‑drift and “make up a plausible variant” tendencies need to be factored into how logs, citations, and verification are handled.

Context Arena normalizes prices to AUC@128k and adds richer cost drilldowns

MRCR leaderboard refresh (Context Arena): Context Arena refreshed its MRCR long‑context leaderboards so price columns now reflect input/output spend to reach AUC@128k, rather than the total cost of running tests out to 1M tokens, which had disadvantaged shorter‑context models according to the leaderboard refresh; hovering a cell now reveals the number of tests and total cost for that accuracy bin, and the cost‑vs‑score chart gains a log10 cost axis plus an explicit Pareto frontier, as shown in the arena homepage.

These changes mean comparisons between models like GLM‑4.7, M2.1, Gemini 2.5 Flash Lite Thinking and others now line up on a common 128k‑token footing, giving engineers a clearer view of which deployments are actually cost‑efficient at the context lengths they care about.
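Mechanically, the refreshed columns pair each model's AUC@128k with only the spend incurred on tests at or below 131,072 tokens and then chart cost against score. A minimal sketch of that normalization plus a simple Pareto filter, using invented numbers rather than Context Arena's published figures:

```python
# Toy cost-vs-score normalization: spend to reach AUC@128k, plus a Pareto filter.
# All numbers are illustrative, not Context Arena's published figures.
models = {
    # name: (auc_at_128k, input_tokens, output_tokens, $ per M input, $ per M output)
    "model-a": (0.49, 120_000_000, 4_000_000, 0.30, 1.20),
    "model-b": (0.59, 120_000_000, 9_000_000, 0.60, 2.20),
    "model-c": (0.48, 120_000_000, 3_000_000, 0.35, 1.00),
}

def cost_at_128k(inp, out, p_in, p_out):
    """Dollar spend for all tests <= 131,072 tokens, per the refreshed columns."""
    return inp / 1e6 * p_in + out / 1e6 * p_out

rows = [(name, auc, cost_at_128k(i, o, pi, po))
        for name, (auc, i, o, pi, po) in models.items()]

# Pareto frontier: keep a model only if no other model is both cheaper and more accurate.
frontier = [r for r in rows
            if not any(o[2] <= r[2] and o[1] > r[1] for o in rows if o is not r)]
for name, auc, cost in sorted(frontier, key=lambda r: r[2]):
    print(f"{name}: AUC@128k={auc:.2f}, cost=${cost:.2f}")
```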

Devstral 2 hits 86.9% success on real CTO coding tasks, close to Sonnet 4.5

CTO bench coding leaderboard (cto.new): A new “CTO bench” chart from cto.new tracks end‑to‑end coding task success across 2,442 real user tasks, showing Claude Sonnet 4.5 at 88.3% success and Mistral’s Devstral 2 close behind at 86.9%, with Gemini, multiple GPT‑5.1/5.2 variants and Grok clustered in the low‑ to mid‑80s as visible in the cto bench chart.

Unlike synthetic benchmarks, this leaderboard measures whether a model can fully deliver working solutions for real production‑like tasks from cto.new users, so it gives teams a concrete sense of how open‑weight offerings like Devstral 2 now compare to proprietary incumbents in practical success‑rate terms rather than proxy evals.

Opus 4.5 estimated at ~3.5‑minute no‑CoT reasoning horizon on math tasks

Time‑horizon framing for Opus 4.5 (independent eval): Ryan Greenblatt applies METR’s time‑horizon framework to math problems without allowing chain‑of‑thought, estimating that Claude Opus 4.5 can reliably handle tasks comparable to about 3.5 minutes of human work in a single forward pass before accuracy drops off, as summarized in the time horizon summary and the linked longer analysis in the analysis post.

This measurement focuses on “no‑thought” behavior—straight answers rather than explicit reasoning traces—so it gives a different view than benchmark scores, framing capabilities in terms of how long a model can implicitly reason before errors accumulate on harder math items.


🧪 New methods: agent memory, RL composition, refusal control, and quant physics

Today skews to methods papers with concrete testbeds (string transforms, video motion). Excludes product launches.

Agent memory survey maps forms/functions/dynamics and shift toward RL control

Memory in the Age of AI Agents (Multi‑institution): A 102‑page survey proposes a unified framework for agent memory built around three axes—Forms (token‑level, parametric, latent), Functions (factual, experiential, working), and Dynamics (formation, evolution, retrieval)—and argues that control is moving from hand‑written rules to reinforcement‑learned policies according to the long recap in the survey overview and the full PDF in the ArXiv paper.

Forms and current practice: Most deployed agents cluster in token‑level factual memory (RAG over vectors/graphs), while parametric memory shows up as model/knowledge editing and latent memory as KV caches and internal activations, as described in the forms explanation.
Functions and working limits: The survey separates factual world knowledge, experiential traces (task histories, trajectories), and short‑term working memory, noting that KV caches grow with sequence length and need compression (selection, merging, projection) to stay within context and GPU budgets, as outlined in the latent memory note.
Dynamics and RL trend: It contrasts rule‑based memory (fixed thresholds, chunking, concatenation) with partially and fully RL‑driven setups where learned policies decide what to write, prune, and retrieve; the authors frame this as "memory‑as‑action" where operations are explicit choices optimized over long horizons, per the RL section in the rl memory shift.

The survey also connects this taxonomy to GraphRAG‑style systems and emerging RL controllers, positioning it as a checklist for deciding what to store, where it lives, and how it changes as agents tackle longer, tool‑heavy workflows graphrag comment.
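The "memory-as-action" framing in the dynamics axis treats write, prune and retrieve as explicit operations that a policy chooses rather than fixed rules. A minimal token-level sketch of that interface, with a simple heuristic standing in for the RL policy the survey describes:

```python
# Toy token-level factual memory where write / prune / retrieve are explicit
# actions. A real system would let an RL policy choose the action; here a
# heuristic stands in for it.
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    text: str
    score: float  # proxy for usefulness; an RL policy would learn this signal

@dataclass
class AgentMemory:
    capacity: int = 5
    items: list = field(default_factory=list)

    def write(self, text: str, score: float) -> None:
        self.items.append(MemoryItem(text, score))
        if len(self.items) > self.capacity:
            self.prune()

    def prune(self) -> None:
        # Heuristic: drop the least useful item; "memory-as-action" makes this a choice.
        self.items.remove(min(self.items, key=lambda m: m.score))

    def retrieve(self, query: str, k: int = 2) -> list:
        # Toy lexical-overlap retrieval standing in for vector / graph RAG.
        q = set(query.lower().split())
        return sorted(self.items,
                      key=lambda m: len(q & set(m.text.lower().split())),
                      reverse=True)[:k]

mem = AgentMemory()
mem.write("deploy script lives in infra/deploy.sh", score=0.9)
mem.write("user prefers TypeScript examples", score=0.7)
print([m.text for m in mem.retrieve("where is the deploy script?")])
```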

OpenBMB shows RL can teach new compositional skills beyond reranking

Compositional RL (OpenBMB/THUNLP): A controlled study on string‑transformation tasks finds that reinforcement learning can endow LLMs with a genuinely new composition skill—going from f(x) and g(x) to f(g(x))—rather than only reranking existing behaviors, using a clean Python function testbed described in the paper thread and formalized in the ArXiv paper.

Testbed and hypothesis: The authors define atomic tasks (predict output of a single function) and compositional tasks (nested functions like f(g(x))) with careful decontamination and difficulty control, then pose an "RL compositionality" hypothesis that training on simple composites unlocks a reusable how‑to‑compose skill paper thread.
Key finding: RL on atomic problems alone fails to generalize, but RL on basic composites enables strong generalization to unseen functions and deeper nesting; on hard levels where the base model struggles, RL substantially raises pass@k rather than merely reweighting candidates, as visualized in their Pass@k plots in the paper thread.
Transfer across domains: Once learned, the compositional skill transfers to new domains that provide the necessary atomic skills, suggesting a two‑stage recipe—pretrain atomic abilities, then apply RL on small composites—to scale more complex reasoning behaviors efficiently paper thread.

The work directly addresses the "RL just reranks" critique by showing conditions where RL expands capability on difficult compositional problems while behaving like reranking only when the base model already solves the task well.
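The testbed itself is easy to picture: atomic items ask for the output of a single string transform, compositional items ask for nested calls like f(g(x)). A minimal sketch of how such items could be generated, using illustrative transforms rather than the paper's exact, decontaminated function set:

```python
import random

# Illustrative atomic string transforms; the paper uses its own decontaminated set.
ATOMIC = {
    "reverse": lambda s: s[::-1],
    "upper": lambda s: s.upper(),
    "drop_vowels": lambda s: "".join(c for c in s if c.lower() not in "aeiou"),
}

def atomic_item(rng):
    """Atomic task: predict the output of a single function call."""
    name, fn = rng.choice(list(ATOMIC.items()))
    x = "".join(rng.choices("abcdefgo", k=6))
    return f"What is {name}({x!r})?", fn(x)

def compositional_item(rng, depth=2):
    """Compositional task: predict nested calls like f(g(x)), innermost applied first."""
    names = rng.choices(list(ATOMIC), k=depth)
    x = "".join(rng.choices("abcdefgo", k=6))
    value, call = x, repr(x)
    for name in reversed(names):   # wrap outward while applying inner-to-outer
        value = ATOMIC[name](value)
        call = f"{name}({call})"
    return f"What is {call}?", value

rng = random.Random(0)
print(atomic_item(rng))         # single-function item for atomic training
print(compositional_item(rng))  # f(g(x)) item, the kind RL on composites targets
```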

Refusal Steering steers political answers via activations without retraining

Refusal Steering (Multiverse Computing): A new activation‑steering method learns a direction in representation space that controls how often a model refuses sensitive questions at inference time, cutting political refusals on Qwen3‑Next‑80B‑A3B‑Thinking from 92.35% to 23.82% while keeping refusals on harmful topics near 99%, according to the examples in the method overview and the full details in the ArXiv paper.

Judge‑guided steering vectors: The approach uses an LLM‑as‑judge to score how refusal‑like each answer is, then regresses those scores against internal activations to find a steering vector separating "answer" from "refuse" behavior, with a ridge penalty to avoid overfitting so the same vector generalizes to new prompts method overview.
Inference‑time control: At generation time, the system nudges a few deeper layers along this vector, adjusting the probability of refusal without changing weights or retraining; qualitative examples show the model switching from boilerplate Tiananmen or Hong Kong refusals to multi‑paragraph factual answers once the political‑refusal component is dialed down method overview.
Safety trade‑off: Experiments report that this targeted steering does not materially erode refusal rates on clearly harmful or restricted topics, suggesting it can be used to tune country‑ or domain‑specific political behavior while preserving a global safety baseline method overview.

The technique frames refusal as a steerable axis in activation space rather than a monolithic training‑time property, creating a knob for policy teams who need finer‑grained regional or topical control over what models decline to answer.
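Mechanically, the recipe amounts to a ridge regression from hidden activations to judge-scored "refusal-ness", whose coefficient vector is then added, scaled, to a few layers' hidden states at inference. A minimal sketch under those assumptions; the layer indices, scale and module layout (a Llama/Qwen-style model.model.layers) are placeholders, not the paper's settings:

```python
import numpy as np
import torch

# Step 1: learn a steering direction by ridge-regressing LLM-as-judge scores
# (how refusal-like each answer was) onto hidden activations at one layer.
def fit_steering_vector(acts: np.ndarray, judge: np.ndarray, ridge: float = 1.0) -> np.ndarray:
    d = acts.shape[1]
    w = np.linalg.solve(acts.T @ acts + ridge * np.eye(d), acts.T @ judge)
    return w / np.linalg.norm(w)  # unit "refuse vs. answer" direction

# Step 2: at inference, nudge a few deeper layers against that direction.
# Assumes decoder blocks live at model.model.layers (Llama/Qwen-style); the
# layer indices and scale below are illustrative, not the paper's values.
def add_steering_hooks(model, direction: np.ndarray, layers=(40, 41, 42), scale=-4.0):
    vec = torch.tensor(direction)

    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + scale * vec.to(hidden.dtype).to(hidden.device)
        return ((steered,) + output[1:]) if isinstance(output, tuple) else steered

    return [model.model.layers[i].register_forward_hook(hook) for i in layers]
```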

QuantiPhy benchmark finds VLMs mostly guess motion numbers from priors

QuantiPhy (Stanford): A new benchmark tests whether vision‑language models can measure physical quantities from video instead of regurgitating "typical" values, finding the best model scores 53.1 on a relative accuracy metric versus humans at 55.6 over 3,355 video‑text problems, as described in the quantiphy summary and the full ArXiv paper.

Quantitative physical QA setup: Each question supplies one real‑world anchor (e.g., a known size, speed, or acceleration) plus the matching pixel cue, so the model has enough information to convert pixel motion over time into numbers like velocity or object size at a specific timestamp, rather than answering qualitatively quantiphy summary.
Priors vs measurement: The authors report that models often output plausible but input‑insensitive values; performance barely changes when the video is removed or replaced with counterfactual prompts, and chain‑of‑thought prompting rarely pushes them to actually "measure" from the visual signal quantiphy summary.
Gap to robust physical reasoning: The small margin between top VLMs and estimated human performance, combined with their heavy reliance on world priors, suggests that current architectures still lack grounded, quantitative physical understanding needed for reliable robotic planning and world‑model use quantiphy summary.

QuantiPhy thus complements earlier qualitative physics QA benchmarks by forcing models to commit to numbers, exposing when they approximate from experience rather than integrate concrete video evidence.
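The conversion QuantiPhy expects is simple once the anchor is given: the known quantity fixes a meters-per-pixel scale, and pixel displacement over time yields speed. A minimal worked sketch of that arithmetic, with invented numbers:

```python
# Converting pixel motion into physical speed from a single real-world anchor,
# the kind of grounded measurement QuantiPhy asks models to do. Numbers invented.

anchor_length_m = 4.5        # known car length given in the question (the anchor)
anchor_length_px = 180.0     # the same car's length measured in the frame (pixel cue)
scale_m_per_px = anchor_length_m / anchor_length_px   # 0.025 m per pixel

# Object of interest moves 240 px between two frames 0.5 s apart.
displacement_px = 240.0
dt_s = 0.5
speed_m_per_s = displacement_px * scale_m_per_px / dt_s

print(f"scale = {scale_m_per_px:.3f} m/px, speed = {speed_m_per_s:.1f} m/s")
# -> speed = 12.0 m/s (~43 km/h); a priors-only answer would ignore the pixel cue.
```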

SAGE RL agent turns past tool use into reusable skills with better efficiency

Self‑improving agents with skill libraries (AWS Agentic AI): A new method trains an LLM agent to convert its own tool‑using traces into reusable skills (functions) and refines them with reinforcement learning, raising AppWorld scenario goal completion by 8.9% while cutting output tokens by 59%, as summarized in the sage recap and formalized in the ArXiv paper.

Skill library concept: During tasks, the agent writes new skill functions that encapsulate multi‑step API sequences, executes them, and either stores them if they succeed or revises them on failure; these skills live in a library and can be reused across later tasks in the same domain sage recap.
SAGE training loop: The SAGE recipe runs two related tasks back‑to‑back so that a skill created in the first can assist in the second, layering supervised fine‑tuning (to teach skill format) with RL that rewards both overall task success and explicit skill reuse sage recap.
Measured gains: On AppWorld, this self‑improvement loop increases the fraction of scenarios where the agent finishes all three subtasks and simultaneously shrinks verbosity, showing that learned skills can both boost reliability and reduce token cost versus agents that repeatedly reconstruct long tool call chains sage recap.

The method targets a common pain point in today’s agents—stateless repetition and forgotten tricks—by putting a light‑weight, RL‑tuned skills layer on top of a static base model.
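The skill-library idea boils down to caching successful multi-step tool traces as callable functions and preferring them on later tasks. A minimal sketch of that store-and-reuse loop; the SFT formatting and RL reward shaping SAGE layers on top are out of scope here, and the tool names are illustrative:

```python
# Toy skill library: successful tool-call sequences get wrapped as reusable
# "skills" and tried first on later tasks. The RL reward for reuse that SAGE
# adds on top is not modeled here.
from typing import Callable, Dict, List

class SkillLibrary:
    def __init__(self) -> None:
        self.skills: Dict[str, Callable[..., object]] = {}

    def record(self, name: str, steps: List[Callable[..., object]]) -> None:
        """Wrap a successful multi-step tool trace as one callable skill."""
        def skill(state):
            for step in steps:
                state = step(state)
            return state
        self.skills[name] = skill

    def run(self, name: str, state, fallback: Callable[..., object]):
        """Prefer a stored skill; fall back to re-deriving the tool chain."""
        return self.skills.get(name, fallback)(state)

# Example: two API "tools" composed into a reusable skill after one success.
lib = SkillLibrary()
lib.record("reply_to_latest_email", [
    lambda s: {**s, "email": f"latest email for {s['user']}"},   # search_email tool
    lambda s: {**s, "sent": f"reply drafted to {s['email']}"},   # send_reply tool
])
print(lib.run("reply_to_latest_email", {"user": "alice"}, fallback=lambda s: s))
```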


💼 Usage share, assistant quality pushes, and compute margins

Enterprise adoption signals and monetization. Excludes infra capex. Mostly traffic/engagement deltas and operational quality drives at Microsoft; OpenAI margin datapoint.

OpenAI compute margin on paid usage climbs to ~68–70% by late 2025

ChatGPT paid workloads (OpenAI): Internal financials reported by The Information show OpenAI’s compute margin on paid users—revenue minus direct model‑inference cost—rising to about 68–70% by October 2025, up from roughly 52% in late 2024 and more than double early‑2024 levels, as visualized in the margin chart shared in margin chart. Commentary notes that this margin improvement reflects better pricing, routing and hardware efficiency on paid tiers, but overall profitability remains unclear because training runs, data‑center build‑outs, long‑term capacity deals and a large free user base still sit outside this compute‑only metric margin chart.

Monetization vs. costs: Analysts summarizing the numbers in margin chart argue that OpenAI appears ahead of Anthropic on paid‑account compute margins, while Anthropic may be more efficient in server spending overall, underscoring different strategies around monetization versus infrastructure efficiency.
Implication for builders: For enterprises buying ChatGPT‑based services, this shift means more of each subscription dollar now funds overhead and R&D rather than raw inference, while for competitors it sets a reference point for what "healthy" inference unit economics can look like at scale.

So the data points to ChatGPT’s paid business moving from expensive showcase toward a margin‑positive product line, even if end‑to‑end profitability for the lab remains an open question.

Satya Nadella takes direct control of Copilot reliability and grounding

Microsoft 365 Copilot (Microsoft): Satya Nadella is now personally driving a quality push for Microsoft 365 Copilot, actively participating in a ~100‑engineer Teams channel and running weekly fix‑it meetings focused on making the assistant work reliably across email and apps, according to the Yahoo Finance coverage summarized in copilot quality push. The article emphasizes that Copilot performance hinges less on clever prompting and more on two engineering steps—grounding on the right private data via Microsoft Graph and Copilot connectors, and making tool calls (like calendar, files, line‑of‑business apps) succeed end‑to‑end for each user.

Grounded answers as core design: Microsoft indexes user data through Microsoft Graph and Copilot connectors, then a retrieval layer returns only snippets the user can access before passing them to the LLM, as outlined in copilot quality push; this ties perceived quality directly to data coverage and permissioning rather than the base model alone.
Operational ownership: Nadella’s weekly fix‑it loop signals that Copilot reliability is now an executive‑level operations concern, not just a research project, which aligns Copilot more with core productivity infrastructure than with an experimental assistant.

This framing positions Copilot’s next phase as an engineering and operations challenge around grounding and tool orchestration rather than a pure model‑capability race.

OpenAI weighs sponsored content and memory‑based ad targeting in ChatGPT

ChatGPT monetization (OpenAI): Following earlier experiments with ads in ChatGPT answers and sidebars ads tests, new reporting from The Information summarized in ads overview says OpenAI is internally discussing sponsored content woven directly into ChatGPT responses, including formats where product links appear inside explanations (e.g. a Sephora link when discussing mascara). The same piece notes discussions about using ChatGPT’s long‑term memory to enable more targeted advertising, so that follow‑up answers and suggested links can reflect a user’s prior interests and interactions over time ads overview.

Ad surface design: Proposed patterns in ads overview include response‑integrated suggestions and contextually triggered modules that expand when users click for deeper information (for example, a Barcelona tour operator when a user clicks on Sagrada Família), which aims to balance answer quality with clear monetization hooks.
Privacy and product implications: Using memory for targeting raises questions about how persistent user profiles are managed and disclosed, since the same mechanism that helps ChatGPT remember work context could also underpin ad segmentation, shifting ChatGPT further toward a consumer ad platform alongside its subscription products.

These discussions signal that OpenAI is treating conversational ad formats and behavioral targeting as a major revenue path alongside paid tiers, not an experimental side feature.

Similarweb: Gen‑AI web visits up 76% YoY while ChatGPT share falls and Gemini rises

Generative AI sites (Similarweb): New Similarweb analysis shows generative‑AI web visits growing 76% year‑on‑year in 2025, with app downloads up 319% over the same period, while ChatGPT’s traffic share drops and Gemini’s climbs, building on the earlier share‑shift story in traffic share shift and expanded in traffic explainer. According to the breakdown, ChatGPT’s category share falls from about 87.2% to 68.0%, Gemini rises from roughly 5.4% to 18.2%, and smaller players like Grok, Perplexity and Claude fill in the rest, so fragmentation happens against a backdrop of strong absolute growth rather than a shrinking pie traffic explainer.

Interpretation of share loss: As pointed out in traffic explainer, OpenAI’s lower share can coincide with rising absolute usage because the denominator—total generative‑AI visits and downloads—has exploded; Gemini’s gains tie heavily to distribution inside existing Google surfaces like Search, Chrome, Android and Workspace rather than pure head‑to‑head switching.
App vs. web usage: The same analysis in traffic explainer highlights that web share misses Copilot‑style assistants embedded in desktop apps and enterprise tenants, suggesting that Microsoft’s adoption footprint is under‑represented in these charts compared to browser‑centric products.

The net effect is a maturing market where usage is broadening across multiple assistants and surfaces, even as ChatGPT remains the single largest destination by a wide margin.

Anthropic fixes Claude Max 5× holiday promo bug and resets affected limits

Claude Pro/Max plans (Anthropic): After boosting Claude Pro and Max usage limits for the holidays in an earlier announcement holiday limits, Anthropic and engineer Tom Quisel acknowledge and fix a bug that prevented some Max 5× users from receiving the intended 2× usage promotion, as described in max promo fix and official update. The team has now reset usage limits for all affected Max 5× accounts and is rolling out a separate fix for a UI issue where displayed usage percentages did not reflect the doubled allowance, with Quisel noting that users can keep using Claude even if the meter temporarily shows 100% max promo fix and display bug note.

Operational handling: The messaging in max promo fix and display bug note stresses that the over‑limit lockouts will not trigger while the promo is active, effectively prioritizing continuity of service over strict meter accuracy while the visual bug is addressed.

This follow‑up illustrates the operational complexity of dynamic usage promos on tiered plans, and shows Anthropic intervening quickly once discrepancies between back‑end limits and front‑end displays surfaced.

ChatGPT usage data: more than six requests a day puts you in top 10%

ChatGPT engagement (OpenAI): A LinkedIn‑sourced stat shared by Kimmonismus in usage threshold and expanded with a direct source link in usage source says that if you make more than six requests per day to ChatGPT, you are already in the top 10% of users by activity. The underlying OpenAI “Wrapped”‑style metrics discussed in the LinkedIn post usage stats post categorize 3–4 daily messages as top‑20%, with heavier usage dropping off quickly, implying that most ChatGPT accounts use the assistant relatively lightly compared to power users.

Interpreting intensity: This distribution suggests that highly active builders and analysts—those habitually running tens or hundreds of prompts per day—are extreme outliers relative to the broader user base, which in turn shapes how representative anecdotal experiences from heavy users are of mainstream behavior.

The number provides a concrete reference point for what “power user” means in practice for ChatGPT today, against the backdrop of rapidly growing overall traffic and downloads.


⚙️ Routing and runtime effects on output quality

Serving-tier updates and runtime considerations that shift real results. Excludes any M2.1 release content.

LM Arena highlights how runtime “inference quality” diverges for identical models

Inference quality (LM Arena): LM Arena’s capability lead Peter Gostev argues that “inference quality”—the way a given provider runs a model, handles tools, and times out—can materially change outputs even when the model weights and prompt are identical, as explained in the inference video and expanded in the YouTube talk; the point is that kernel stacks, context-compaction strategies, tool execution sandboxes and retry logic now sit alongside model choice as first‑order determinants of real‑world behavior.

Inference quality explainer clip

Same weights, different behavior: The talk walks through cases where the same open-weight model, hosted on different clouds, produces different answers because of varying token limits, timeout thresholds, and how aggressively providers truncate or re-order context, according to the inference video.
Tool calling as a failure surface: Gostev notes that some runtimes drop or mis-handle tool calls more often than others, so an agent that appears unreliable may actually be running in a weaker tool harness rather than using a worse model, as described in the YouTube talk.

For engineers comparing logs across clouds, this framing treats the serving stack as an integral part of model evaluation rather than a neutral conduit, which helps explain why "same model, same prompt" often fails to reproduce across providers.

Satya Nadella personally drives Microsoft Copilot grounding and tool reliability

Copilot runtime (Microsoft): Microsoft CEO Satya Nadella is now directly overseeing a ~100‑engineer Teams channel and weekly "fix‑it" meetings to push Microsoft 365 Copilot toward dependable behavior, with insiders saying quality hinges less on clever prompts and more on robust grounding into Microsoft Graph plus connectors and on making downstream tool calls succeed, as described in the copilot thread; this governance move sits alongside Microsoft’s earlier decision to back OpenAI even when Bill Gates reportedly viewed the initial $1B bet as "burning cash" in the gates anecdote.

Grounding as serving tier: Copilot’s architecture indexes a tenant’s mail, documents and apps via Microsoft Graph and Copilot connectors, then a retrieval layer returns only snippets the user has permission to see, so any hallucinations or privacy leaks often trace back to retrieval and access‑control logic rather than the base model itself, according to the copilot thread.
Tool calls as failure mode: The same thread notes that Copilot’s usefulness is gated by whether tool executions (e.g., calendar edits, file operations) actually complete, making retry policies, timeouts and error handling in the serving stack as important as prompt engineering for enterprise reliability.

This framing positions Copilot’s runtime—indexing, retrieval and tool orchestration—as the main lever for enterprise‑grade behavior, with Nadella’s direct involvement signaling that serving‑tier quality is now a board‑level concern.
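As a rough illustration of the two levers above (permission‑trimmed retrieval and hardened tool execution), here is a generic sketch; it is not Microsoft's implementation, and `search_graph`, `user_can_read` and `tool_fn` are hypothetical stand‑ins for a tenant index, an ACL check and a downstream action.

```python
import time

def grounded_snippets(query, user, search_graph, user_can_read, k=8):
    """Retrieve candidate chunks, then drop anything the caller cannot read,
    so the model never sees content outside the user's permissions."""
    hits = search_graph(query, top_k=k * 4)                     # over-fetch, then filter
    allowed = [h for h in hits if user_can_read(user, h["doc_id"])]
    return allowed[:k]

def call_tool(tool_fn, args, retries=3, backoff_s=2.0):
    """Retry a downstream tool call (calendar edit, file operation, ...) with
    linear backoff; a production harness would also enforce a per-call timeout."""
    for attempt in range(1, retries + 1):
        try:
            return tool_fn(**args)
        except Exception:
            if attempt == retries:
                raise                                           # surface the failure to the agent
            time.sleep(backoff_s * attempt)
```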

Context Arena dissects GLM‑4.7’s long‑context dropoff and recency bias

Long‑context behavior (Context Arena): Context Arena added GLM‑4.7:thinking to its MRCR long‑context benchmark and found a "strong start, rough finish" profile—highly reliable up to 64k tokens but dropping to 22.1% pointwise accuracy on 2‑needle retrieval at 128k despite a solid 59.3% AUC—while also surfacing pronounced recency and “lost in the middle” biases, as detailed in the glm results and bias analysis; this extends the earlier finding that Gemini 3 Flash and GPT‑5.2 show primacy‑style biases in long windows noted in the primacy bias.

Bias profile: Analysis shows GLM‑4.7 more often selects start or end needles than middle ones and tends to favor newer variants when confused (e.g., in 2‑needle tests, target‑1→2 flips 61.1% of the time vs only 10.2% in the opposite direction), indicating a recency‑leaning selection bias in long contexts according to the bias analysis.
Cost-normalized view: A parallel update to the dashboard now bases price columns on AUC@128k (all tests ≤131,072 tokens) instead of full 1M‑token runs, adds hover cards that reveal number of tests and spend per bin, and introduces a Pareto/log‑scale cost‑vs‑score chart so operators can compare long‑context reliability and dollar cost on a common footing, per the ui changes and the arena dashboard.

Together these tweaks frame long‑context support as a continuum of bias, stability and cost rather than a single context‑length spec, which has direct implications for how agents relying on 128k+ prompts will behave under real load; see the sketch below for one way to reproduce the flip‑rate analysis on your own traces.
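One plausible way to tally directional flips (the per‑trial record format here is invented for illustration, not Context Arena's actual schema) is:

```python
from collections import Counter

# Each record notes which needle the prompt asked for and which one the model returned.
trials = [
    {"intended": 1, "answered": 2},
    {"intended": 1, "answered": 1},
    {"intended": 2, "answered": 2},
    {"intended": 2, "answered": 1},
    # ... one record per 2-needle retrieval attempt at a given context length
]

def flip_rates(trials):
    """Share of trials, per intended needle, where the answer lands on the other
    needle: 1->2 is a forward (recency-leaning) flip, 2->1 a backward flip."""
    counts = Counter((t["intended"], t["answered"]) for t in trials)
    rates = {}
    for intended, other in [(1, 2), (2, 1)]:
        total = sum(n for (i, _), n in counts.items() if i == intended)
        rates[f"{intended}->{other}"] = counts[(intended, other)] / total if total else 0.0
    return rates

print(flip_rates(trials))  # toy data gives {'1->2': 0.5, '2->1': 0.5}
```

A large gap between the two rates, as in the 61.1% vs 10.2% figures above, is what the bias analysis reads as a recency‑leaning selection bias.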

OpenRouter Auto Router now spans 58 models and supports tool calling

Auto Router (OpenRouter): OpenRouter expanded its Auto Router to route across 58 different models including Opus 4.5, added support for tool-calling requests, and lets users bias routing toward preferred providers or models while keeping pricing at each model’s market rate, according to the router update and the updated router docs; following up on the gateway upgrade that cut p99 latency, this pushes more of the routing intelligence into the serving layer rather than app code.

Tool-aware routing: Auto Router can now handle tool-calling JSON schemas directly, so a single endpoint can fan out both plain and tool-using prompts to whichever backend is best equipped to execute tools reliably, as described in the router update.
Transparency and control: The Auto Router page shows which underlying model actually handled each call in the activity view, and exposes knobs to favor faster, cheaper or more capable models without adding an extra routing surcharge, per the router docs.

The net effect is that instead of hard‑coding providers, teams can lean on OpenRouter’s serving tier to make per‑request tradeoffs on latency, cost and capability while still being able to audit which model produced which output.
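As a minimal sketch, a tool‑aware request to the Auto Router looks roughly like the following, assuming OpenRouter's OpenAI‑compatible chat completions endpoint and the `openrouter/auto` slug; the weather tool and its schema are invented for illustration, so check the router docs for current parameters.

```python
import json
import requests

payload = {
    "model": "openrouter/auto",           # let the router pick the backend per request
    "messages": [{"role": "user", "content": "What's the weather in Lisbon right now?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",        # hypothetical client-side tool
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json=payload,
    timeout=60,
)
data = resp.json()
# The response reports which underlying model actually served the call (the audit
# trail mentioned above) along with any tool calls for the client to execute.
print(data.get("model"))
print(json.dumps(data["choices"][0]["message"], indent=2))
```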

Refusal Steering steers Qwen3’s political refusals at inference without retraining

Refusal Steering (Multiverse Computing): A new Refusal Steering method from Multiverse Computing shows how to control an LLM’s refusal rate on sensitive political questions at inference time by steering internal activations, dropping Qwen3‑Next‑80B‑A3B‑Thinking’s political refusals from 92.35% to 23.82% while keeping refusals on clearly harmful topics at 99%, as outlined in the paper summary and the full arxiv paper.

Judge‑guided steering vectors: The pipeline uses an LLM‑as‑a‑judge to score how refusal‑like each answer is, then compares hidden activations for high‑ vs low‑refusal outputs to learn a "refusal direction" in activation space; a ridge penalty regularizes this direction so it generalizes beyond the calibration prompts, per the paper summary.
Inference‑time control: At serving time, the system nudges several deeper layers slightly along or against this steering vector to either encourage or discourage refusals, leaving model weights untouched and avoiding full fine‑tune cycles; examples in the paper show the same Qwen model switching from boilerplate refusals about Tiananmen Square to reasonably factual summaries when steering is applied, while still blocking prompts about clearly harmful actions.

For safety and policy teams, this demonstrates that some aspects of LLM behavior can be tuned as a runtime control problem—via activation steering layered onto existing deployments—rather than always requiring new training runs.
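As a rough illustration of the mechanics (a sketch, not the paper's code), the snippet below derives a difference‑of‑means "refusal direction" from cached activations and applies it through a forward hook on a Hugging Face‑style decoder; the actual method's ridge‑regularized fit, layer selection and scaling differ.

```python
import torch

def refusal_direction(high_refusal_acts: torch.Tensor,
                      low_refusal_acts: torch.Tensor) -> torch.Tensor:
    """Unit-norm difference of mean hidden states between answers an LLM judge
    scored as refusal-like and answers it scored as non-refusal ([n, hidden_dim])."""
    direction = high_refusal_acts.mean(dim=0) - low_refusal_acts.mean(dim=0)
    return direction / direction.norm()

def add_steering_hook(model, layer_index: int, direction: torch.Tensor,
                      alpha: float = -4.0):
    """Shift one decoder layer's output along `direction` on every forward pass.
    Negative alpha discourages refusals, positive encourages them; model weights
    are untouched. Assumes a `model.model.layers` attribute path (e.g. Qwen-style
    Hugging Face models); adjust for other architectures."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * direction.to(hidden.device, hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + tuple(output[1:])
        return steered
    return model.model.layers[layer_index].register_forward_hook(hook)

# Hypothetical usage:
#   d = refusal_direction(high_acts, low_acts)
#   handle = add_steering_hook(model, layer_index=40, direction=d)
#   ...run model.generate(...) as usual, then handle.remove() to restore defaults
```

The practical appeal is that the hook can be registered and removed per request, so refusal behavior becomes a deployment knob rather than a retraining decision.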

Claude Memory 8.2.2 adds OpenRouter support for cheaper recall models

Claude Memory (Anthropic): The Claude Memory extension shipped version 8.2.2 with new options for "saving tokens," notably adding OpenRouter integration so users can back memory operations with many free or cheaper models instead of a single default endpoint, according to the memory release; in practice this moves some routing logic into the memory layer, letting teams trade off recall cost against latency and quality on a per‑workspace basis.

By decoupling memory summarization and lookup from a single model choice, this update treats memory I/O as a configurable serving problem—one where operators can point heavy, background summarization to low‑cost models while reserving premium models for foreground tasks.


🤖 Autonomous robots in the wild and factory lines

Cluster of embodied AI items: production scale, field use, and creative form factors. Mostly demos and ops updates.

Autonomous combat robot reportedly holds frontline position for six weeks in Ukraine

Autonomous combat UGV (Ukraine): A Ukrainian report describes a tracked unmanned ground vehicle that held a frontline position for about a month and a half without infantry present, with soldiers only visiting periodically to swap batteries and reload ammunition Ukraine UGV report. The vehicle appears small and low‑profile. This is one of the first public claims of fully autonomous lethal robotics operating for extended periods in an active war zone, highlighting how combat pressure is driving rapid field‑hardening of embodied AI systems.

GITAI robot autonomously assembles 5‑meter tower on uneven ground

Tower builder robot (GITAI): GITAI showcases a mobile robotic system autonomously assembling a five‑meter truss tower outdoors, aligning and fastening structural segments while operating on visibly uneven terrain that would complicate conventional automation GITAI tower demo. The sequence is fully autonomous in the clip.

GITAI 5m tower

Commentary around the demo frames this as a candidate approach for deploying solar arrays and future lunar or Martian habitat structures, because the robot does not require pre‑prepared flat pads to produce stable frames GITAI tower demo.

UBTECH 1000 humanoid enters mass production in China

UBTECH 1000 (UBTECH): UBTECH confirms its UBTECH 1000 humanoid has moved from prototype runs into mass production, with footage showing robots rolling off a conveyor‑style assembly line in a dedicated Chinese factory rather than a lab bench setup UBTECH production clip. For embodied‑AI teams, this is another data point that Chinese vendors are pushing toward factory‑scale humanoid output, not only bespoke demo hardware.

UBTECH 1000 line

Jensen Huang pegs Tesla Optimus in multi‑trillion‑dollar robot market

Optimus economics (Tesla/Nvidia): Nvidia CEO Jensen Huang says Tesla’s Optimus humanoid is the first robot he believes has a real chance to reach high‑volume deployment and places its potential addressable market in the multi‑trillion‑dollar range Huang Optimus quote. No concrete shipment timelines were given.

Huang on Optimus

His framing explicitly links large‑scale humanoid rollouts to Nvidia’s own AI hardware roadmap, implying that fleets of general‑purpose robots are now being modeled as serious long‑term demand for training and inference capacity rather than as speculative side projects Huang Optimus quote.

Unitree G1 martial‑arts demo highlights safety risks around nearby humans

Unitree G1 safety incident (Unitree): A new video shows a Unitree G1 humanoid practicing motion‑capture martial‑arts sequences and accidentally kicking a human partner in the groin mid‑routine, even though the robot appears to be correctly following its learned motion plan G1 safety clip. There is no visible safety cage or hard stop.

Unitree G1 kick

Following up on Unitree dancer, where the same platform was highlighted as a synchronized stage performer, this incident underlines that embodied agents can satisfy their control objectives yet still pose real injury risk when humans move unpredictably nearby, especially during high‑energy movements G1 safety clip.

Porcospino Flex single‑track robot targets squeezing and gripping in tight spaces

Porcospino Flex (research lab): A research team presents Porcospino Flex, a bio‑inspired single‑track robot that can snake through narrow gaps and then brace or grip against its surroundings using a flexible body and spined elements, aiming at inspection and manipulation in confined or cluttered environments Porcospino teaser. The platform remains a research prototype.

Porcospino Flex motion

The accompanying write‑up stresses that looking to animal locomotion remains central for robotics progress and that creative mechanical design is still critical for uncovering new industrial and exploration use cases, not only more advanced control policies robotics article.


🛡️ Assistant trust: ads-in-answers and search inclusion disputes

Trust/governance notes around assistants. Excludes research‑only results (handled elsewhere). Focus is ad models and legal pushback on AI search visibility.

OpenAI weighs memory‑targeted ads inside ChatGPT answers

ChatGPT monetization (OpenAI): Reporting says OpenAI is exploring ways to embed sponsored content directly into ChatGPT responses, including using the product’s long‑term memory to target ads based on past conversations, following up on ads tests that first surfaced sidebar and inline ad experiments. According to the latest coverage in the OpenAI ads report, options on the table include inserting commercial links into main answers (for example, a Sephora link when discussing mascara) and showing travel offers only when users click deeper into a location like the Sagrada Família, with memory used to personalize which brands appear.

Ad formats and targeting: The internal discussions described in the news article highlight both answer‑embedded links and click‑to‑expand placements, plus proposals to let ChatGPT’s per‑user memory drive more relevant sponsor selection, which raises new questions about consent and data use compared to traditional search ads.
Trust and transparency angle: The reporting does not detail how clearly such placements would be labeled or whether users could opt out, so the main open issues for assistant trust are how distinguishable ads would be from neutral guidance and how memory‑based targeting would be governed.

IndiaMART sues OpenAI over exclusion from ChatGPT search results

Search visibility dispute (IndiaMART vs OpenAI): B2B marketplace IndiaMART has filed suit in the Calcutta High Court alleging commercial and reputational harm from being excluded from ChatGPT’s search‑style answers while rival marketplaces are surfaced, with the court noting a “strong prima facie case” according to the IndiaMART case summary. The complaint argues that ChatGPT responses now act as a de facto discovery layer for buyers, and that omission of IndiaMART links in those answers diverts traffic and business opportunities away from the platform as described in the case article.

Precedent for AI ranking disputes: This is one of the first public cases framing large‑assistant answer rankings as gatekeeping comparable to web search, putting legal pressure on how AI assistants choose which sites to mention or omit in natural‑language responses.
Trust implications: The filing underlines that for businesses, being absent from assistant answers can look like algorithmic disfavor rather than neutral ranking, which in turn feeds concerns about transparency and potential bias in AI‑driven discovery.

Rob Pike email flap spotlights consent and “slop” in AI outreach

AI outreach backlash (AI Village / Anthropic): A holiday “random acts of kindness” project run by nonprofit AI Village used an agent credited as “Claude Opus 4.5” to send unsolicited appreciation emails to engineers including Unix and Go co‑creator Rob Pike, who publicly condemned the message as time‑wasting and insincere, as detailed in the incident writeup. Logs reconstructed by Simon Willison show the agent drafting a six‑paragraph email about Pike’s 40‑plus‑year career and preparing to hit send, with explicit notation that the body was typed via automation rather than by a human operator, according to the agent logs.

Authenticity and attribution: The blog post traces the email to AI Village’s infrastructure and not Anthropic itself, but because the message was signed as if from Claude Opus 4.5, recipients and observers treated it as coming from the model vendor, underscoring how easy it is for third parties to blur authorship and responsibility in assistant‑branded outreach.
Consent and assistant trust: Follow‑up comments from an AI Village team member in the team context acknowledge missteps, and the episode has become a reference point in broader criticism of unsolicited AI‑generated communication as “slop,” highlighting that even well‑meant agent campaigns can erode trust when they appropriate human voices without explicit invitation.


🎬 Generative media stacks and creative workflows

A steady stream of creative tools/demos for image/video, code+visual explainers, and browser‑native features. Engineering‑relevant for content teams.

Perplexity’s Comet browser adds avatar-based virtual try-on for shopping

Comet browser (Perplexity): Perplexity’s Comet AI browser now includes a virtual try‑on feature that lets users upload one selfie plus a full‑body photo to create a persistent avatar used across Shopping searches, as demonstrated in the try-on feature; every clothing result can be previewed on that avatar directly inside the browser.

Comet try-on demo

Commentary in the browser context notes that this lives inside a broader AI browsing experience where Comet already combines search, summarization, and task automation, so this try‑on layer effectively turns the browser itself into a personalized visual shopping assistant.

Google Search AI Mode doubles as code+visualization tutor

Search AI Mode (Google): Search’s new AI Mode is being used as an interactive coding and visualization helper that explains concepts and emits runnable p5.js generative art layouts directly in the browser, as shown in the AI mode demo; the clip shows the model answering “how to build a generative art layout with p5.js” by generating code and a live sketch side‑by‑side.

Search AI Mode demo

The behavior turns search into a lightweight creative IDE for engineers and educators—users can iterate on prompts, get updated visualizations, and tweak the auto‑generated code without ever leaving the page.

Qwen-Image-Edit-2511 focuses on character consistency and industrial design

Qwen-Image-Edit-2511 (Alibaba): A new summary of Qwen-Image-Edit-2511 highlights improved character identity preservation, multi-subject coherence, and better geometric reasoning for layout edits, according to the collage and notes in the model summary; this builds on the earlier rollout across ComfyUI, Replicate and TostUI, following up on image tools where the emphasis was on ecosystem surfaces rather than quality.

The post notes integration of popular community LoRAs, support for realistic lighting control, creative perspective shifts, and tuning for industrial design workflows, which together make the model more suitable for tasks like brand illustration, character‑driven campaigns, and product concept renders where consistent identities and precise geometry are crucial.

Free AI tool generates 9-shot video sequences from a single prompt

Nine-shot generator (multi): A new AI video tool is being promoted that turns a single text prompt into nine distinct short clips arranged in a grid, with the service advertised as free and unlimited until December 29 and requiring no subscription in the promo thread; the demo shows all nine slots filling with coherent variations of the requested scene.

Nine-shot video tool

For teams experimenting with narrative structures or ad variants, this kind of multi‑shot output from one prompt gives a cheap way to explore angles, compositions, or cuts before committing to higher‑cost renders or manual editing.

Opus 4.5 powers fast ASCII storyboard skill for low-cost pre-viz

Opus 4.5 (Anthropic): A developer shows Claude Opus 4.5 driving a custom ASCII art "game" skill that outputs rapid, frame‑by‑frame storyboards from text prompts, using detailed text layouts instead of pixels for visual planning in the ASCII skill demo; the clip cycles through dense ASCII scenes intended as a cheap way to storyboard video before paying for full text‑to‑video runs.

Opus ASCII storyboard

This kind of textual pre‑visualization lets teams prototype sequences, pacing, and camera angles while keeping inference costs low, then selectively upscale only the best frames or beats into high‑fidelity imagery or animation.

SuperDesign lets users train AI design agents with custom style and context

SuperDesign (SuperDesignDev): SuperDesign is pitched as a way to "craft your own AI designer" by training agents on specific style, brand, and product context so they develop a unique taste instead of generic "slop" outputs, as described in the product thread; users can also browse an existing library of style‑specific designers.

SuperDesign design workflow

The demo shows users feeding examples and then asking the agent for new product mockups, with the system applying the learned aesthetic across layouts and packaging, which targets teams that want repeatable branded assets without outsourcing full campaigns every time.

Wan by Alibaba showcased in quick New Year AI video

Wan text-to-video (Alibaba Cloud): A creator reports making a stylized New Year greeting video using Wan for Alibaba Cloud’s challenge under tight time constraints, saying 2025 has made it normal to produce clips like this in minutes rather than days in the Wan example; the clip shows animated neon graphics and date morphing between 2025 and 2026.

Wan New Year video

Another post about "videos like this in minutes" in the minutes quote reinforces that sentiment, framing Wan‑class text‑to‑video tools as part of a broader shift where solo creators can now spin up short, shareable motion pieces for campaigns or social posts without a traditional video pipeline.

Adobe Lightroom’s on-device segmentation enables full photo edits on iPhone 16 Pro

Lightroom mobile (Adobe): A user highlights that Adobe Lightroom on iPhone 16 Pro performs fairly accurate on‑device subject segmentation by first "analyzing" the image to detect elements and then loading tappable segments to enhance or adjust, as noted in the segmentation comment; they add that this means they no longer need to power up a desktop to do serious photo edits.

Lightroom segmentation demo

Follow‑up remarks in the on-device note mention that Lightroom can add annotations and let users remove or add subjects within these segments, implying that a decent segmentation model now ships in a mainstream mobile app and underpins everyday creative workflows for photographers.

Lovable coding assistant adds on-demand video generation

Lovable (Lovable): The Lovable app, positioned around AI‑assisted coding, now advertises the ability to "generate videos on demand" alongside its developer tooling, according to the short holiday clip in the Lovable update; the video shows a typed message transitioning into a cinematic Christmas scene.

Lovable video demo

This ties lightweight generative video into a developer‑centric workflow, hinting at use cases like quickly producing product explainer clips, UI demos, or marketing snippets directly from within a coding and app‑building environment.

Nano Banana Pro used for custom vouchers and advanced light-painting art

Nano Banana Pro (Google): Google’s Nano Banana Pro image model continues to show up in creative side projects, with one user generating a custom tanning salon gift voucher that landed well as a Christmas present in the voucher example; the printed design features a humorous, high‑detail cartoon scene with Spanish copy tailored to the recipient.

Separately, another thread showcases "light painting" images produced from a JSON prompt with an NBP‑based workflow in the light painting demo, including neon‑style faces, birds, and hand shapes, indicating that the model’s controllability and text safety are being used both for personal gifts and more experimental art.


🧭 Workflows are changing: agency over intelligence and 2026 expectations

Meta‑conversation shaping how teams work: veterans feeling behind, “agency>intelligence,” and a developer culture shift. Not a product category.

Boring engineering practices emerge as key accelerators for AI‑driven teams

Angle/Theme: Google’s Addy Osmani describes 2025 as “the most fun moment to be a developer in years,” but stresses that CI/CD, testing, documentation and code review are the accelerators that turn inherently chaotic agents into reliable productivity multipliers Osmani thread; he says the real opportunity is working at a different altitude—reviewing implementations and edge cases while AI fills in boilerplate.

Shift in daily work: Osmani notes he now ships features in hours that once took days, because models draft code while humans focus on design trade‑offs and failure modes rather than syntax Osmani thread.
Learnable, practice‑driven skills: He frames effective prompting, context provisioning and rapid review as trainable abilities, arguing that spending time iterating with tools like Claude Code turns “boring” engineering discipline into a compounding advantage rather than something AI makes obsolete Osmani thread.

The thread positions traditional engineering hygiene not as busywork but as the scaffolding that lets agentic workflows scale beyond toy projects.

Practitioners argue everyone is behind on AI, even frontier users

Angle/Theme: Ethan Mollick tells followers that “you are falling behind, but so is everyone else,” adding that no one is keeping up with AI’s implications across major uses, even people tracking the latest models closely falling behind; he later notes that current AI slang like “slop” lumps all bad usage together and leaves no word for good AI work, hinting at cultural lag around evaluation norms AI slang.

Angle/Theme: Daniel MacMartin adds that people working with frontier systems already live in “an alternate reality,” and says they have an obligation to communicate what’s coming to friends in non‑technical roles who may still think AI is “just regurgitating stuff it read on the internet” alternate reality; together, these posts characterize 2025 as a year where perception and language around AI trails the underlying technical and workflow changes by a wide margin.

2026 predictions shift focus from AI demos to 95%+ reliability

Angle/Theme: The Turing Post collects practitioner predictions that 2023–2025 were about demos and “magic moments,” while 2026 will be about production accuracy and verifiable behavior, quoting EmpromptuAI’s CEO Shanea Leven that enterprises will demand systems operating at 95%+ reliability rather than impressive one‑offs 2026 reliability.

Angle/Theme: Adarga’s Ollie Carmichael expects the honeymoon phase for black‑box AI in defense, finance and healthcare to end, with a pivot toward pragmatic neuro‑symbolic architectures and the rise of forward‑deployed engineers who sit with end‑users to shape and validate agent workflows 2026 reliability; a follow‑up thread outlines broader 2026 themes around how AI is changing humans’ roles, positioning the coming year as one where evaluation, audit trails and domain‑specific guarantees move to the center of serious deployments 2026 themes.

AI browsers and search modes start to look like primary work surfaces

Angle/Theme: Multiple posts highlight how AI‑infused browsers are changing daily workflows: Google’s Search AI Mode can explain concepts with visualizations and write runnable p5.js code for generative layouts directly from a query, as shown in a demo where it generates and runs interactive art in the same flow Search AI demo; Perplexity’s Comet browser adds a virtual try‑on feature that keeps a personal avatar live across shopping searches after a one‑time selfie and body photo upload Comet try-on.

Search AI Mode demo

Angle/Theme: Dan Shipper notes a new “Surprise me” button in OpenAI’s Atlas browser that spins up chats with curiosity‑driven prompts like “suggest something new I could learn today,” turning the browser into an ambient exploration space rather than a static address bar surprise me button; Omar Moores frames the broader “browser wars” as ChatGPT, Perplexity, Claude and Chrome all competing to become the default AI work surface, where search, automation and long‑running agents converge browser wars.

Context engineering and agent memory design become daily practice for power users

Angle/Theme: Omar Sanseviero responds to Karpathy’s worries about falling behind by describing a routine of spending at least two hours a day experimenting with Claude Code and subagents, treating context engineering—injecting the right skills, docs and state—as the main arena where individual builders can still contribute meaningfully Omar reflection; he also recommends “brainstorming with AI” sessions where deep‑research subagents explore new tools and papers and then implement reusable skills, even if the projects never fully ship brainstorming trick.

Angle/Theme: In the background, a 102‑page survey on “Memory in the Age of AI Agents” lays out a taxonomy of token‑level, parametric and latent memory, and notes a 2025 shift toward RL‑driven policies that learn when to write, prune and route memory rather than relying on fixed heuristics memory survey; together, these threads portray agent memory design as moving from academic concept to practical craft that individual developers are now iterating on inside their daily workflows.

Engineers flag clarity, not more AI knobs, as the main bottleneck

Angle/Theme: While many posts celebrate new agents, subagents, and knobs, developer Thomas “thdxr” Ricouard points out that the most common use of these levers is procrastination, and says his team’s true bottleneck is “finding clarity” on what to build, how it fits, and how it will evolve knobs remark; he argues that this lack of clarity is why they are not “100x more productive with AI” despite strong tools clarity tweet.

Angle/Theme: In a related discussion about nested AGENTS.md files, he questions whether agent harnesses should prioritize the closest project config or merge multiple layers, underlining that even configuration semantics can either aid clarity or add cognitive load nested agents note; taken together, these comments frame problem definition and information architecture as the new scarce resources in agent‑heavy workflows, more than marginal model improvements.

“Compound engineering” reframes software work as orchestrating agents, not just coding

Angle/Theme: Dan Shipper suggests that if people feel behind, they should learn “compound engineering,” a four‑step process his team at Every uses to code with agents instead of hand‑writing everything—define the problem, co‑design solutions with models, implement via agents and iteratively refine compound engineering tease; the linked guide describes non‑traditional software teams building production apps by treating AI tools as collaborators that turn natural‑language specs into working systems compound engineering guide.

Angle/Theme: He later shares a strategy‑planning prompt that turns a model into a background agent, tasked with running over time and periodically summarizing progress in simple language for non‑technical collaborators strategy prompt; taken together, these posts recast software work as prompting, orchestration and review, with code as an output artifact rather than the primary medium of thought.

Practitioners say recent model jumps demand a new human–AI interaction layer

Angle/Theme: Thomas Sottiaux notes that models have improved so much in recent months that “we need a new way of interfacing to fully benefit from the capability jump,” predicting rapid evolution of the model–human interaction layer over the coming months interface tweet; he also points to individual Codex usage in the hundreds of billions of tokens as evidence that a few power users can now marshal as much effective “intelligence” as entire teams codex wrapped stats.

Angle/Theme: This view aligns with LM Arena’s brief on inference quality, which argues that how and where a model is run—tool‑calling reliability, timeouts, kernel stacks—can change output quality even with identical weights, making harnesses and UIs as critical as base models for real‑world results inference quality video; together they suggest that the next gains in productivity may come less from benchmark wins and more from better orchestration layers around already‑strong models.

Routing layers such as Auto Router turn model choice into an inference abstraction

Angle/Theme: OpenRouter says its Auto Router now spans 58 models including Opus 4.5, adds support for tool‑calling requests and lets users customize routing preferences while charging no extra markup over the underlying LLM price auto router update; the activity view shows which model actually handled a routed call, giving teams a concrete audit trail when debugging behavior router page.

Angle/Theme: In parallel, LM Arena’s capability lead explains why “same model, same prompt” can yield different performance across providers, pointing to differences in runtime stacks and tool execution as hidden variables inference quality video; together these updates position routing and inference quality as a distinct layer of engineering judgment, where teams specify intent and constraints while routers and gateways juggle concrete model choices behind the scenes.

Developers stitch together local AI workspaces with Obsidian, Claude Code and CLIs

Angle/Theme: Several posts describe developers wiring multiple AI agents into local knowledge workspaces; one user runs Claude Code, Gemini CLI and Codex CLI inside an Obsidian vault, with each agent writing to its own markdown folder so they can “plan how to work together” despite lacking direct inter‑agent communication Obsidian setup; Daniel MacMartin suggests that Claude Code “is right at home” in Obsidian, since it operates over files and folders rather than opaque document stores Obsidian advice.

Angle/Theme: Another thread shows a Summarize daemon streaming markdown summaries alongside any web page, YouTube video or podcast via a Chrome extension, hinting at a future where local sidecars annotate and compress most information flows in real time summarize sidecar; these DIY setups indicate a bottom‑up trend toward personal AI operating environments, assembled from CLIs, plugins and note‑taking tools rather than a single monolithic app.
