Mistral Small 4 combines reasoning and non-reasoning modes in one 119B MoE, adds native image input, and expands context to 256K at $0.15/$0.60 per million input/output tokens. It improves sharply over Small 3.2, but still trails similarly sized open peers on several evals.

Artificial Analysis describes Small 4 as a multimodal open-weights release with "hybrid reasoning" in a single model, meaning engineers can switch between reasoning and non-reasoning behavior without swapping to a separate checkpoint. The same post says the model takes image and text input, produces text output, and doubles context from 128K in Small 3.2 to 256K.
The implementation details are concrete enough to matter for deployment. Small 4 is listed at $0.15/$0.60 per million input/output tokens, licensed under Apache 2.0, and available through Mistral's first-party API; Artificial Analysis' model page also notes strong throughput, summarizing output speed at 151.2 tokens per second. The thread adds a self-hosting caveat: at native FP8, the 119B-parameter weights need about 119GB, which is more than the 80GB on a single H100.
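The self-hosting caveat is simple arithmetic, and a small sketch makes the precision tradeoff explicit. The 119B parameter count and the 80GB H100 figure come from the article; the bytes-per-parameter values are standard, the int4 row is a hypothetical quantized deployment (not an official release), and the estimate ignores KV cache and activation memory, which add more on top.

```python
# Rough VRAM estimate for serving Mistral Small 4 weights at different
# precisions. MoE routing activates few parameters per token, but all
# expert weights still need to be resident in memory.

PARAMS = 119e9  # total parameters, per the launch materials

BYTES_PER_PARAM = {
    "fp8": 1.0,   # native release precision
    "bf16": 2.0,
    "int4": 0.5,  # hypothetical quantization, for illustration only
}

def weight_gb(precision: str) -> float:
    """Approximate weight footprint in GB for a given precision."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9

for p in BYTES_PER_PARAM:
    fits = weight_gb(p) <= 80  # single 80GB H100
    print(f"{p}: ~{weight_gb(p):.0f} GB, fits one 80GB H100: {fits}")
```

At FP8 the weights alone come to roughly 119GB, matching the thread's claim that a single 80GB H100 is not enough.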
The clearest gain is over Mistral's own prior small model. Artificial Analysis says reasoning mode jumps 12 points on its Intelligence Index, from 15 on Small 3.2 to 27 on Small 4, while non-reasoning mode reaches 19. On agentic work, the same source reports GDPval-AA improving from 339 Elo to 871, putting Small 4 close to Mistral Large 3 at 880.
The peer comparison is more mixed. Artificial Analysis says 27 still trails open models in the same size class, including gpt-oss-120B at 33, Nemotron 3 Super 120B A12B at 36, and Qwen3.5 122B A10B at 42. On multimodal evals, Small 4 scores 57% on MMMU-Pro, ahead of Mistral Large 3 at 56% but well behind Qwen3.5's 75%, and on hallucination the model's -30 AA-Omniscience score beats the comparable open peers cited in the thread. Artificial Analysis also says its reasoning run used about 52M output tokens versus roughly 78M, 110M, and 91M for those three peers, suggesting a cheaper reasoning profile even if the absolute benchmark ceiling is lower.
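The token-efficiency claim can be made concrete with back-of-envelope cost math. The output price ($0.60 per million output tokens) and approximate token counts come from the article; pricing the peers' runs at Small 4's rate is purely illustrative, since each vendor has its own prices, so this compares token volume at one price, not actual bills.

```python
# Cost of each reasoning benchmark run if every model were billed at
# Small 4's listed output price. Token counts are the article's rough
# figures for the Artificial Analysis Intelligence Index run.

OUTPUT_PRICE_PER_M = 0.60  # USD per million output tokens (Small 4 list price)

runs = {
    "Mistral Small 4 (Reasoning)": 52e6,
    "gpt-oss-120B (high)": 78e6,
    "Nemotron 3 Super 120B A12B (Reasoning)": 110e6,
    "Qwen3.5 122B A10B (Reasoning)": 91e6,
}

def run_cost(tokens: float, price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Output-token cost in USD at a given per-million-token price."""
    return tokens / 1e6 * price_per_m

for name, toks in runs.items():
    print(f"{name}: ~${run_cost(toks):.2f} at Small 4's output price")
```

At a single shared price, Small 4's ~52M-token run is roughly half the output-token volume of Nemotron's ~110M, which is the "cheaper reasoning profile" the thread points to.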
Physical Intelligence says its RL token compresses vision-language-action (VLA) model state into a lightweight signal that an on-robot actor-critic can adapt in minutes. This matters for last-millimeter manipulation, where full-size models are often too slow or too coarse to tune online.
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
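The core trick behind an n-gram regex index can be sketched in a few lines. This is an illustrative minimal version, not Cursor's actual implementation (all names here are made up): literal substrings of the query are broken into trigrams, an inverted index narrows the repo to files containing every trigram, and only those candidates are handed to the real regex engine.

```python
# Minimal trigram-index sketch: cheap candidate filtering before an
# expensive regex scan. Real systems add Bloom filters and compressed
# posting lists; this shows only the filter-then-verify structure.
from collections import defaultdict
import re

def trigrams(s: str) -> set[str]:
    """All length-3 substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self):
        self.posting = defaultdict(set)  # trigram -> file ids (inverted index)
        self.files = {}                  # file id -> contents

    def add(self, file_id: str, text: str) -> None:
        self.files[file_id] = text
        for g in trigrams(text):
            self.posting[g].add(file_id)

    def search(self, literal: str, pattern: str) -> list[str]:
        # Candidate set: files containing every trigram of the query's
        # literal part. A file missing any trigram cannot match it.
        grams = trigrams(literal)
        candidates = set(self.files) if not grams else set.intersection(
            *(self.posting.get(g, set()) for g in grams)
        )
        # Verify survivors with a real regex scan (the expensive step).
        rx = re.compile(pattern)
        return sorted(f for f in candidates if rx.search(self.files[f]))

idx = TrigramIndex()
idx.add("a.py", "def instant_grep(): pass")
idx.add("b.py", "print('hello world')")
print(idx.search("instant", r"instant_\w+"))  # b.py never reaches the regex
```

The speedup comes from the intersection step: posting-list lookups are near-constant time, so the regex engine only ever sees a handful of candidate files instead of the whole repository.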
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
Mistral has released Mistral Small 4, an open-weights model with hybrid reasoning and image input, scoring 27 on the Artificial Analysis Intelligence Index. @MistralAI's Small 4 is a 119B mixture-of-experts model with 6.5B active parameters per token, supporting both reasoning and non-reasoning modes.