Mistral Devstral 2 hits 72.2% SWE‑Bench Verified – its 24B laptop‑sized sibling rivals giants at 68.0%
Executive Summary
Mistral showed up to the coding race with receipts, not vibes. Devstral 2 (123B params) and Devstral Small 2 (24B) both ship as open‑weight coders with 256K context and FP8 checkpoints, posting 72.2% and 68.0% on SWE‑Bench Verified—within a few points of proprietary staples like Claude 4.5 Sonnet and GPT‑5.1 Codex Max. The twist: the 24B Small model is roughly 28× smaller than some DeepSeek‑class flagships yet lands in the same accuracy band, and it’s Apache 2.0, laptop‑deployable, and privacy‑friendly.
What’s new versus yet another open model drop is the stack around it. Mistral shipped Vibe CLI as an open, repo‑aware terminal agent—plan → read → edit → run → summarize—where all prompts and tools live in Markdown, begging to be forked. Day‑zero support from vLLM (with a dedicated tool‑calling parser), Zed’s new Vibe Agent Server, AnyCoder’s model picker, and Kilo Code’s IDE (free Devstral usage all December, after quietly running a pre‑release “Spectre” build) means you can trial this in real workflows without writing glue.
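The day‑zero vLLM support should make local trials close to a one‑liner. A minimal sketch follows; the Hugging Face model ID is an assumption (check Mistral's model page for the released name), and the flags shown are vLLM's standard Mistral tokenizer and tool‑calling options rather than anything Devstral‑specific confirmed here.

```shell
# Sketch: serve Devstral Small 2 via vLLM's OpenAI-compatible server.
# Model ID is assumed — substitute the actual repo name from Mistral's HF page.
vllm serve mistralai/Devstral-Small-2 \
  --tokenizer-mode mistral \
  --enable-auto-tool-choice \
  --tool-call-parser mistral \
  --max-model-len 262144   # the advertised 256K context, GPU memory permitting
```

Once up, any OpenAI‑compatible client can point at `http://localhost:8000/v1`, so existing agent harnesses need no glue code.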
Builders are already tagging Devstral Small 2 as “SOTTA” (state of the tiny art) and treating it as the default self‑hosted coder, while grumbling that the big model’s license restricts companies making $20M+/month in revenue. Net effect: if you’ve been leaning on DeepSeek or closed coders, Devstral is now a serious, open toggle in your production dropdown.
Top links today
- OpenAI and Anthropic agent standards overview
- Missing Layer of AGI coordination paper
- Omega system for trusted cloud agents
- Excessive chain-of-thought training study
- M4-RAG multilingual multimodal RAG benchmark
- World models with calibrated uncertainty paper
- GRAPE unified transformer position encoding
- Verifier-driven LLM reasoning and diversity paper
- LUNE efficient LLM unlearning via LoRA
- LLM failure modes in agentic simulations
- LLMs vs time-series models for forecasting
- GenAI.mil Pentagon platform using Gemini
- FT on Nvidia H200 export deal to China
Feature Spotlight
Feature: Mistral’s Devstral 2 + Vibe CLI push open‑source coding to SOTA
Mistral ships Devstral 2 (123B) and Devstral Small 2 (24B) plus the Vibe CLI—open SOTA coding with 72.2%/68.0% SWE‑Bench Verified, 256K context, FP8 weights, and a repo‑aware terminal agent.
Biggest cross‑account story today. New open‑weight coding models (123B, 24B) with 256K context and a native terminal agent. Multiple third‑party benchmarks, tools, and day‑0 serving surfaced in the sample.