Mistral 3 opens 675B‑param MoE – 3B–14B vision models land
Executive Summary
Mistral 3 finally dropped as a fully open, Apache‑2.0 stack: a 675B‑parameter MoE Large 3 with 41B active parameters plus three dense Ministral 3 models at 3B, 8B, and 14B. All of them handle 256k context and images, and all ship in base and instruct flavors, with reasoning variants for the small models still in training. After a month of talk about frontier reasoners like DeepSeek V3.2 Speciale and Opus 4.5, this is the first frontier‑class suite most teams can actually own, fine‑tune, and ship without license drama.
On benchmarks, Large 3 looks like the new Apache‑licensed default: around 85.5 on multilingual MMLU, 43.9 on GPQA‑Diamond, and a 1418 text Elo in LMArena under its “Jaguar” codename, where it ranks #6 among open models and #1 for coding among open weights. Artificial Analysis pegs its Intelligence Index 11 points above Mistral Large 2, trading blows with DeepSeek‑3.1 and Kimi K2 while still trailing the top reasoning models.
The ecosystem arrived on day zero. vLLM serves the whole family with NVFP4 and sparse MoE kernels, Ollama has one‑line installs for all three Ministrals, Modal chopped cold starts for 3B from minutes to ~12 seconds, and Baseten is already recommending 8×B200 for production Large 3. The catch: a new community jailbreak shows Mistral 3 instruct models are very pliable, so you’ll need your own guardrails if you wire them into agents.
Top links today
- Anthropic acquisition of Bun and Claude Code plans
- Claude for Nonprofits launch details
- Anthropic study on Claude’s impact on engineering work
- Mistral 3 model family technical overview
- Mistral 3 models on Hugging Face
- Apple CLaRa‑7B‑Instruct model release
- Artificial Analysis deep dive on Amazon Nova 2
- Perplexity BrowseSafe and BrowseSafe‑Bench announcement
- OpenAI alignment blog on Codex code review
- LangSmith Agent Builder launch blog
- W&B Models LLM Evaluation Jobs announcement
- DeepSeek‑V3.2 model on Hugging Face
- Runway Gen‑4.5 video model benchmarks
- Artificial Analysis benchmarking of FLUX.2 image models
- Modal deployment guide for Mistral 3 models
Feature Spotlight
Feature: Mistral 3 goes fully open — Large 3 + Ministral 3B/8B/14B
Mistral opens a full model stack (Large 3 + Ministral 3/8/14B, multimodal, 256k) under Apache‑2.0 with immediate support across vLLM, Modal, Ollama and clouds—giving teams a credible open alternative at multiple sizes.
🧩 Feature: Mistral 3 goes fully open — Large 3 + Ministral 3B/8B/14B
Broad, cross‑account launch of an Apache‑2.0 family: a frontier MoE (675B total/41B active) plus three small, vision‑capable dense models. Day‑0 ecosystem support and fresh bench data dominate today’s feed.
Mistral 3 family launches as fully Apache‑2.0 open model suite
Mistral has released the full Mistral 3 family under Apache 2.0: Mistral Large 3, a 675B‑parameter MoE with 41B active parameters, plus three dense Ministral 3 models at 14B, 8B, and 3B parameters, all multimodal and shipping in base and instruct variants, with reasoning versions for the small models already in training. The models offer 256k context windows, handle both text and images, and are immediately available across Mistral’s own Studio, major clouds like AWS Bedrock and Azure AI, and open‑weight hubs such as Hugging Face and OpenRouter, giving teams a rare frontier‑class stack they can actually own and fine‑tune without license drama launch thread Mistral blog aa writeup.

For AI engineers and infra leads, the point is: this is a credible open alternative in a space that’s been dominated by Chinese open models and closed Western APIs. You get a frontier‑scale MoE you can run wherever you have H100s or Blackwell, plus smaller dense models that slot into everything from serverless inference to on‑device experiments, all under a license your legal team won’t balk at. Leaders and analysts should read this as Mistral doubling down on an “ownable” frontier stack, rather than trying to chase OpenAI and Google on closed IP.
Mistral Large 3 ranks near top of open models on coding and reasoning
On benchmarks, Mistral Large 3 lands as one of the strongest open non‑reasoning models: Artificial Analysis shows it scoring 85.5 on 8‑language MMLU and 43.9 on GPQA‑Diamond, edging DeepSeek‑3.1 and Kimi‑K2 on several core‑intelligence metrics while trading wins on others benchmark chart. In the LMArena crowdsourced evals, the same model (tested under the “Jaguar” codename) posts a 1418 text Elo—#6 among all open models and #2 among non‑reasoning open models—and takes the #1 spot in coding across the open‑weights field arena ranking aa writeup.

Artificial Analysis pegs its overall Intelligence Index at 38, an 11‑point jump over Mistral Large 2 but still behind top reasoning models like DeepSeek V3.2 Speciale and Kimi K2 Thinking aa writeup. The picture for engineers is: if you want an Apache‑licensed large model with strong multilingual knowledge and very good pure coding ability, Large 3 now sits near the top of the open pack—but you’ll still reach for DeepSeek or closed labs when you need maximum chain‑of‑thought style reasoning. For leaders, this reinforces Mistral’s positioning: they’re not trying to beat Opus 4.5 or Gemini 3 at all costs; they’re trying to own the “most capable model you can actually self‑host and customize” niche.
Ministral 3B/8B/14B bring multimodal small models to edge and browser
Alongside Large 3, Mistral shipped three dense Ministral 3 models (3B, 8B, 14B) that all support vision and ship in base and instruct variants, with reasoning versions in training; the 3B is small enough to run fully in the browser via WebGPU while answering questions about webcam input in Simon Willison’s demo browser demo aa writeup. These models aim squarely at the cost‑sensitive tier where Qwen3 and Gemma 3 have been strong: Artificial Analysis reports Ministral‑14B Instruct beating the previous Mistral Small 3.2 despite using ~40% fewer parameters, and the whole trio offering “best‑in‑class” cost‑to‑performance for small multimodal models aa writeup.

For builders, that means you now have Apache‑licensed, vision‑capable models that can run on a single consumer GPU, in serverless setups, or even directly in the browser for things like UI copilots, client‑side document viewers, or webcam‑based tooling. If you’ve been hesitating to ship Chinese‑licensed models into regulated environments, Ministral gives you a plausible swap‑in that still covers images, longish context, and reasoning, without dragging in a 70B‑parameter monster.
vLLM, Ollama, Modal and Baseten ship day‑0 support for Mistral 3
The ecosystem moved fast: vLLM now serves the entire Mistral 3 lineup with NVFP4‑optimized checkpoints, sparse MoE kernels for Large 3, prefill/decode disaggregation, and multimodal + long‑context support on A100, H100, and forthcoming Blackwell GPUs vllm announcement. Ollama added one‑line installs for Ministral‑3 3B/8B/14B both locally and on its cloud, while Modal reports that GPU snapshotting cuts cold starts for Ministral‑3B from about two minutes down to roughly twelve seconds—huge if you’re doing serverless or spiky workloads ollama cli rollout modal launch. Baseten, meanwhile, is already offering Large 3 as a dedicated deployment and suggests 8×B200 as the sweet spot for production inference baseten deployment.

This matters because Mistral 3 isn’t just weights on Hugging Face; it’s a stack you can actually drop into existing infra primitives—vLLM clusters, Ollama‑based local tools, Modal functions, or managed endpoints—without weeks of glue code. If you’re choosing a frontier‑ish open model today, the day‑0 support from these platforms means switching costs are more about prompt and eval work than engine plumbing.
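If you want to poke at the family before committing, the vLLM path involves the least ceremony: stand up a server and talk to it with the stock OpenAI client. The sketch below assumes a local vLLM instance on its default port and uses a hypothetical Ministral instruct checkpoint ID, so swap in the actual Hugging Face repo name you pull.

```python
# Minimal sketch: chat with a Mistral 3 model served by vLLM's OpenAI-compatible
# server. Assumes you've started something like `vllm serve <model-id>` locally;
# the model ID and port below are placeholders, not confirmed names.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default OpenAI-compatible endpoint
    api_key="local-no-auth",              # ignored unless you enable auth on the server
)

resp = client.chat.completions.create(
    model="mistralai/Ministral-3-8B-Instruct",  # hypothetical repo name; check Hugging Face
    messages=[
        {"role": "user", "content": "In two sentences, when should I pick an 8B dense model over a large MoE?"}
    ],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```

The same client code works unchanged against Modal or Baseten endpoints once you swap the base URL and key, which is most of what “day‑0 support” buys you in practice.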
🦀 Anthropic buys Bun and claims $1B Claude Code run‑rate
Strategy thread for agentic coding: Bun joins Anthropic (MIT license retained) as the execution/runtime layer while Claude Code reports rapid monetization. Excludes Mistral 3, which is today’s feature.
Anthropic buys Bun as Claude Code hits $1B run‑rate in 6 months
Anthropic has acquired the Bun JavaScript/TypeScript runtime just as Claude Code reaches a reported $1B run‑rate only six months after GA, and Bun will remain MIT‑licensed and open source. anthropic news This effectively gives Anthropic its own full-stack runtime layer for agentic coding, on top of Claude Opus/Sonnet/Haiku and the Claude Code product.

In Anthropic’s announcement, they frame Bun as a “breakthrough JavaScript runtime” that has already powered the native Claude Code installer, with Jarred Sumner and team joining Anthropic to keep building it. (anthropic news, employee thread) Bun’s own post stresses that nothing is being closed; it stays MIT, with Anthropic explicitly incentivized to keep it great. bun blog That matters if you’re betting your stack on Bun today.
On the business side, multiple Anthropic folks and partners state that Claude Code went from 0 to $1B in annualized revenue in about six months since general availability. (revenue stat, user sentiment) That’s a very fast monetization curve for an IDE‑embedded agent, and it explains why Anthropic is willing to buy deep into the runtime layer instead of treating it as commodity infra.
Builders are already drawing the architecture picture out loud: expand Bun’s standard library, train Claude Code on that API surface, then let Claude generate, run, and host JS/TS apps end‑to‑end on a stack Anthropic controls. runtime strategy That’s the “Claude = compute + orchestration + execution” vision some are now calling a flywheel for agentic coding. strategy framing

For AI engineers, the point is: Anthropic isn’t just shipping a smarter model. They’re pulling the execution environment (Bun) into the same house as the planning and reasoning (Claude Code), while keeping it open‑source. If you’re designing agents that do real work via code execution, this is a strong signal that the runtime is now part of the competitive surface, not an afterthought.
Anthropic study: engineers now route 60% of their work through Claude Code
Anthropic published a detailed internal study showing that its engineers now rely on Claude for about 60% of their daily work, with a self‑reported ~50% productivity boost across a wide range of coding tasks. (research summary, research article) This is not a lab demo; it’s based on 132 staff surveys, 53 in‑depth interviews, and 200K Claude Code sessions.

Following up on Tool Search, which showed Claude agents learning to pull far less MCP context, this new report focuses on how work has changed. Engineers say Claude shines at debugging, spelunking unfamiliar code, and handling glue tasks, while humans keep the hard design calls and reviews. productivity chart About 27% of Claude‑assisted work wouldn’t have been done at all before—things like small tools, refactors, and one‑off analyses that were never worth the manual effort. study recap

Usage data shows clear movement toward more autonomous agents: Claude Code now routinely performs ~20 consecutive actions (editing files, running commands) before a human intervenes, roughly double what the team saw back in February 2025. research summary That effectively turns backend engineers into “full‑stack” devs with Claude handling front‑end churn, and lets non‑technical staff run data and debugging workflows they previously couldn’t touch. capability expansion

The study doesn’t sugarcoat the trade‑offs. Engineers talk about skill atrophy (“I worry I don’t understand the systems as deeply anymore”) and weaker mentorship loops as juniors ask Claude first instead of seniors. team dynamics Some also miss the satisfaction of hand‑coding tricky pieces even when delegating is objectively faster. mixed feelings

For teams building or buying agentic coding tools, this report is a rare, high‑resolution look at the future: AI is not replacing developers, but it is changing what “doing your job” means. The experiments Anthropic runs on itself—Tool Search, Claude Code, now a Bun‑based runtime—are probably a year or two ahead of what most companies will experience, which makes this a useful preview of where your own workflows are likely headed.
Anthropic launches Claude for Nonprofits with discounts and training
Anthropic introduced “Claude for Nonprofits,” a dedicated program with discounted plans, integrations, and free training aimed at charities and NGOs. nonprofit launch The pitch is to let small teams offload admin and content work to Claude so more human time goes to frontline missions.
The offering bundles lower‑priced access to Claude models, prebuilt integrations into common tools, and structured training so non‑technical staff can safely use AI for tasks like grant drafts, reporting, translation, and basic data analysis. nonprofit page The framing is explicitly time‑saving rather than head‑count cutting.
For AI leaders in mission‑driven orgs, this is a signal that Claude isn’t only chasing enterprise dev teams and Fortune 500s. If you’re already experimenting with general ChatGPT/Gemini accounts, the non‑profit program may give you more predictable pricing plus some guardrails and onboarding you’d otherwise have to invent yourself.
The practical angle: this also widens Claude’s footprint in sectors that often have messy data, legacy docs, and tiny IT teams. If Anthropic can make AI dependable there, those patterns (templates, governance, playbooks) are likely to flow back into the broader Claude Code and agentic tooling ecosystem over time.
Developers see 3× faster agent runs using Bun vs Rust and Claude harnesses
Early experiments from practitioners suggest Bun isn’t just a strategic acquisition for Anthropic—it’s already speeding up real coding agents in the wild.
One developer reports that moving his “Clawd” coding agent from a Rust‑based Codex harness to a Bun‑based stack made it faster than the old setup and roughly 3× faster than calling Claude directly for the same workflows. runtime anecdote He compiled Bun bytecode for a coding agent binary and saw snappier turnaround in interactive sessions. agent pr Another demo shows Bun running computer‑use models on top of Stagehand and Browserbase, highlighting low startup latency and quick event handling inside a browser automation loop. browser demo That matters for agents that chain lots of small steps (open page → fill form → click button) where JS event overhead and startup time dominate.
The takeaway for AI engineers is simple. If your agent is heavily I/O‑bound and lives on Node‑style runtimes today, Bun is worth benchmarking as the host for your orchestrator or tools layer—especially now that Anthropic itself is doubling down on it for Claude Code. anthropic news Don’t assume model latency is the only knob; the runtime can easily be the bottleneck in multi‑step agent flows.
🟥 OpenAI competitive reset: ‘Code Red’, new reasoning, and ‘Garlic’ pretrain
Market competition storyline: OpenAI prioritizes ChatGPT UX/speed/personalization, previews a near‑term reasoning model, and briefs on ‘Garlic’ pretraining wins amid data showing a short‑term traffic dip.
OpenAI reportedly declares “code red”, pausing ads to refocus on ChatGPT
OpenAI has reportedly issued a “code red” directive that tells teams to pause the new ads push and some side initiatives so they can focus on making ChatGPT faster, more reliable, and more personalized in response to pressure from Gemini and Claude. code red article Commentaries summarizing The Information’s reporting say the reset prioritizes day‑to‑day UX—better answer coverage, fewer spurious refusals, stronger memory and customization—over longer‑horizon bets like agent frameworks and personalized news feeds that don’t immediately move retention. analysis thread Builders should read this as OpenAI optimizing for consumer stickiness right now: expect more investment in core ChatGPT quality and latency, and slower rollout on flashier but non‑essential features like in‑chat ads and experimental agents. ads a/b
Data shows ChatGPT usage dipping ~6–7% as Gemini 3 traffic surges
Similarweb’s latest numbers suggest ChatGPT’s 7‑day average of unique daily active users fell about 6–7% in the two weeks after Gemini 3 Pro launched, while Gemini’s desktop+mobile web traffic rose from roughly 22% to 31% of ChatGPT’s level over the same period. similarweb stats

Following up on Gemini downloads, where the FT showed Gemini’s app closing the install gap, this is the first clear usage dent OpenAI has taken from a rival. Analysts note that OpenAI’s consumer business underpins a mooted $500B valuation and an internal goal of ~$20B ARR, so even a mid‑single‑digit drop in actives—especially one tightly correlated with a competitor launch—reinforces why leadership is yanking focus back to ChatGPT experience under the current “code red”. valuation context
Leaks describe OpenAI’s “Garlic” pretrain beating Gemini 3 and Opus 4.5
Multiple summaries of a paywalled The Information piece describe a new OpenAI pretraining run codenamed “Garlic” that reportedly packs large‑model knowledge into smaller architectures which already outperform Google’s Gemini 3 and Anthropic’s Opus 4.5 on internal coding and reasoning evaluations. garlic summary

Garlic is said to inherit a cleaned‑up pretraining recipe from a larger base project (“Shallotpeat”), fixing structural bugs in GPT‑4.5‑era models so that more efficient, smaller variants can hit the same or better capability, with a potential public branding as GPT‑5.2 or GPT‑5.5 sometime in early 2026 if post‑training and safety checks pan out. garlic details Commentators tie this to Mark Chen’s recent remarks about having spent six months rebuilding OpenAI’s “muscle” in pretraining and seeing fresh “low‑hanging fruit” there, reinforcing the view that Garlic is their main attempt to regain efficiency and quality leadership via smarter pretraining rather than only piling more RL on existing bases. pretraining comment Full details remain second‑hand until OpenAI publishes its own write‑up. news article
Mark Chen teases near‑term reasoning model “ahead of Gemini 3”
In a recent interview, OpenAI research chief Mark Chen said the lab already has internal models that perform at roughly Gemini 3’s level on benchmarks and that they are "pretty confident" they’ll release them soon, with even better successors planned.
Independent reporting summarized by testing trackers adds that The Information expects a new reasoning‑focused model as early as next week, explicitly intended to match or beat Gemini 3’s reasoning scores before the bigger Garlic pretrain lands. reasoning leak For engineers, this likely means a new GPT‑5.x‑class reasoning endpoint will drop into the existing ChatGPT/Codex product line in the very short term—without a full architectural reset—but you should expect it to be tuned hard for benchmarked reasoning first and only later for UX polish. internal models
OpenAI podcast unpacks GPT‑5.1 Instant’s reasoning and personality controls
OpenAI used a new episode of its official podcast to talk through how GPT‑5.1 Instant was trained to support stronger reasoning while also exposing controllable “personality” layers for different use cases. podcast intro In the conversation, researchers Christina Kim and Laurentia Roman describe blending high‑budget reasoning traces with cheaper fast‑path behavior, then steering outputs via post‑training so one base model can behave like a terse analyst, a friendly tutor, or a playful assistant based on a lightweight persona prompt.
For teams already standardizing on GPT‑5.1, the signal here is that a lot of future change may happen in these steering and safety layers rather than in raw architecture, which makes version upgrades easier to absorb but also means you should pin down and regression‑test the persona prompts that matter to your product. apple podcast
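If you do pin persona prompts, a tiny regression harness is enough to catch drift between model versions. A minimal sketch, assuming the OpenAI Python SDK, with the model name, persona text, and checks as placeholders for whatever your product actually ships:

```python
# Minimal persona-prompt regression sketch using the OpenAI Python SDK.
# The model name, persona text, and probe questions are placeholders -- substitute
# whatever your product pins and run this whenever you bump model versions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONA = "You are a terse analyst. Answer in at most two sentences, no filler."
PROBES = [
    "Explain what a mixture-of-experts model is.",
    "Should we cache embeddings client-side?",
]

def run_probe(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-5.1",  # placeholder; pin the exact model version you ship against
        messages=[
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

for q in PROBES:
    answer = run_probe(q)
    # Cheap structural check; a real suite would also diff against stored baselines
    # and score tone with a rubric or a judge model.
    assert answer.count(".") <= 3, f"persona drifted toward verbosity on: {q}"
    print(f"OK: {q!r} -> {len(answer)} chars")
```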
🟧 AWS Nova 2 family and Nova Act agent platform
Amazon returns with reasoning‑budgeted Nova 2.0 models (Lite/Pro/Omni) and a production‑minded browser agent platform. Benchmarks show strong agentic tool use; factuality remains mixed in other evals.
Amazon’s Nova 2 models put Pro/Lite/Omni/Sonic back in the frontier mix
Amazon rolled out the Nova 2 family on Bedrock—Lite, Pro, Omni and Sonic—aimed squarely at reasoning-heavy, agentic workloads with native multimodality and long context nova overview.

Pricing comes in at ~$1.25 / $10 per million input/output tokens for Nova 2 Pro and $0.3 / $2.5 for Lite and Omni, with Nova 2 Pro jumping ~30 points in Artificial Analysis’s Intelligence Index over Nova Premier while Lite improves by ~38 points aa writeup aws nova post. Nova 2 Omni adds full text/image/video/speech input with text/image outputs, and the whole family exposes controllable "thinking levels" so you can dial up or down reasoning compute per request nova overview.
Nova 2.0 Pro posts top‑tier agentic scores but mixed factuality
Nova 2.0 Pro Preview is testing as one of the strongest agentic models: it scores ~93% on τ²‑Bench Telecom and ~80% on IFBench at medium/high reasoning budgets, putting it alongside or ahead of Claude Opus 4.5, Gemini 3 Pro and Grok 4.1 Fast for tool‑calling workflows aa writeup agentic summary.

Artificial Analysis also finds Nova 2.0 Pro’s token usage on its full Intelligence Index is relatively low, costing about $662 to run the suite—cheaper than Claude Sonnet 4.5 ($817) and Gemini 3 Pro ($1201) but more than some Chinese models aa writeup. The catch is factual QA: Nova 2 models sit near the bottom of the AA‑Omniscience Index with high hallucination rates, so teams will want strong retrieval and guardrails if they use Nova as the core reasoning engine rather than just as an agentic router omniscience chart hallucination note.
AWS launches Nova Act for “normcore” browser agents at scale
Amazon introduced Nova Act, an AWS agent platform focused on boring‑but‑critical browser workflows—multi‑page form filling, staging env QA, shopping/booking agents, and bulk UI data extraction nova act thread.
The stack provides a web playground, a VS Code agent extension, and a Bedrock runtime; agents are trained with reinforcement learning in safe “gym” environments so they learn cause‑and‑effect on real UIs without breaking production nova act thread. For builders, the pitch is that you describe the workflow once, hook in tools and IAM, and let Nova Act handle sequencing, retries, and observability instead of hand‑rolling Selenium‑style bots nova act followup.
Nova 2 Lite lands on OpenRouter with 1M context and free trial
OpenRouter added AWS’s Nova 2 Lite as a 1M‑context model with native image and video understanding plus adjustable low/medium/high reasoning settings, and is offering it free for two weeks to drive early experimentation openrouter launch.

Lite is priced at $0.3 / $2.5 per million input/output tokens in AWS’s own pricing, and Artificial Analysis shows it trading punches with Claude Haiku 4.5, GPT‑5 Mini and Gemini 2.5 Flash across MMLU‑Pro, AIME 2025 and τ²‑Bench Telecom, making it an interesting default for cost‑sensitive apps that still need long‑context, multimodal agents lite bench chart.
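Because OpenRouter speaks the OpenAI wire format, trying Nova 2 Lite is mostly a matter of pointing the standard client at a different base URL. A minimal sketch, with the model slug and the reasoning parameter treated as assumptions to verify against OpenRouter’s model page:

```python
# Minimal sketch: call Nova 2 Lite through OpenRouter's OpenAI-compatible API.
# Both the model slug and the reasoning knob are assumptions -- confirm the exact
# identifier and parameter shape on openrouter.ai before wiring this into anything.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

resp = client.chat.completions.create(
    model="amazon/nova-2-lite",  # hypothetical slug
    messages=[{"role": "user", "content": "Outline a chunking plan for a 500-page PDF with tables."}],
    extra_body={"reasoning": {"effort": "low"}},  # assumed mapping of the low/medium/high setting
)
print(resp.choices[0].message.content)
```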
Nova Sonic 2.0 debuts as Amazon’s speech‑to‑speech reasoning model
Amazon quietly shipped Nova Sonic 2.0, a bidirectional speech‑to‑speech model that scores 87.1% on Artificial Analysis’s Big Bench Audio reasoning benchmark, slotting between Gemini 2.5 Flash Native Audio Thinking and OpenAI’s GPT Realtime sonic analysis.
The model reaches a median ~1.39 seconds time‑to‑first‑audio on Big Bench Audio—much faster than a pipeline of separate ASR → LLM → TTS at ~6.5 seconds—while supporting English, French, Italian, German and Spanish conversational flows sonic analysis latency followup. For teams exploring voice agents, that combo of high reasoning accuracy plus mid‑pack latency makes Nova Sonic 2.0 a credible alternative to OpenAI and Google for multi‑turn, speech‑native experiences.
🛠️ Agent IDEs and coding flows in practice
Concrete tooling updates and usage: Cursor’s Composer training internals, Cline’s inline diff explanations + free stealth model, chat‑to‑deploy agent builders, desktop control agents. Excludes Mistral 3 feature.
LangSmith Agent Builder public beta lets teams design and deploy agents via chat
LangChain opened a public beta of LangSmith Agent Builder, a visual environment where non‑framework experts can go from idea to deployed agent largely through conversation agent builder launch. You describe the job, and Agent Builder drafts prompts, picks tools (including BYO tools via MCP servers), and wires up sub‑agents into a working graph; it also supports browsing, copying, and customizing agents across a shared workspace so teams don’t keep reinventing the same flows.
The beta exposes key dev levers: you can choose your preferred model for the agent, attach your own tools and backends, and lean on LangSmith’s infra for logging and evaluation instead of standing up a bespoke harness agent builder app launch blog. For engineering leads trying to standardize “agentic” patterns across many teams, this offers a middle ground between full bespoke LangGraph projects and ad‑hoc one‑off scripts: a catalog of reusable, instrumented agents that still respect your existing tools and data.
Cline v3.39.1 ships inline diff explanations and free 256k “microwave” model
Cline’s latest release focuses squarely on the pain of reviewing AI‑written code: a new /explain-changes command now injects natural‑language comments directly into your git diff, so you can see what the agent did line‑by‑line without jumping back to chat cline release notes. It works on any git diff (commits, PRs, branches), and follow‑up questions live in those inline comments instead of bloating chat history.
Under the hood, Cline 3.39.1 also tightens the loop for power users: it accepts /commands mid‑sentence, adds a tabbed model picker for quickly swapping between “Recommended” and “Free”, lets you view/edit .clinerules from remote repos, and improves context compaction for long sessions cline release notes. On the model side, a new stealth model called microwave—256k context, tuned for agentic coding—is now available free in alpha through the built‑in Cline provider microwave teaser microwave blog. For anyone building with coding agents in VS Code or Cursor‑style flows, Cline is becoming less a thin chat wrapper and more a complete diff‑aware review environment.
Cursor shares Composer‑1 internals: RL training, MXFP8 kernels, shared agent backend
Cursor used an AIE talk to unpack how its Composer‑1 coding model is actually trained and served, giving agent‑IDE builders rare visibility into a frontier coding stack. Composer trains on a custom Cursor Bench, runs RL on the same Cloud Agent backend used in production, and relies on an internal Ray RL controller plus MXFP8‑optimized kernels to balance throughput and latency across a large training fleet composer talk link rl training summary.

The slides show a Composer Environment dashboard with clusters of pods and a smoothly rising log‑scaled Cursor Bench curve, implying that additional RL compute keeps buying accuracy rather than quickly plateauing rl training summary. For practitioners, the key lessons are: reuse the exact production agent server as the RL environment to avoid train/serve skew, invest in load‑balancing infrastructure (Ray or similar) early, and treat compiler/quantization work (here, MXFP8) as first‑class or you’ll cap decoding speed long before you hit model quality limits.
Cursor teams lean on Bugbot rules plus Composer for end‑to‑end coding flows
Several Cursor engineers walked through a concrete three‑stage workflow: plan with Claude Opus 4.5, implement with Composer‑1, then review inside the editor plan impl review. That last step now often runs through Bugbot rules, where .cursor/BUGBOT.md files encode review checklists—like database migration safety—and are executed automatically on diffs before human review bugbot rules doc bugbot guide.

The example Bugbot recipe focuses on catching risky DB migrations (missing down steps, lock‑prone operations, etc.), turning what used to be tribal knowledge into machine‑enforced policy. For AI‑heavy teams this kind of harness matters: if models like Composer can generate multi‑file changes in seconds, then rule‑based reviewers become the only scalable way to keep quality and migrations safe while humans focus on higher‑order design and edge‑case review.
Parallel’s n8n node turns web enrichment and search into reusable agent steps
Parallel shipped an official n8n node that gives automation builders four agent‑oriented web primitives: asynchronous enrichment, synchronous enrichment, web search, and full web chat parallel n8n launch. In a demo, their GTM engineering lead wires it into personal workflows—like research jobs and scheduled summaries—so agents can spin up wider, deeper web context without custom HTTP glue.
The node is designed as a drop‑in building block: you configure which action to call, pass prompts and filters, then let n8n orchestrate retries and branching like any other integration n8n docs. For teams already living in tools like n8n or Zapier but wanting agent‑quality web search and enrichment, this is a practical way to plug Parallel’s stack into existing automations instead of standing up a whole new agent platform.
Vercept’s Vy relaunches as a cross‑platform desktop agent that drives your UI
Vercept quietly rebuilt Vy, its “agent that uses your computer for you,” from the ground up and relaunched it on both Mac and Windows vy relaunch. Instead of sitting in a browser, Vy runs as a desktop app that sees your screen and literally moves the mouse, types, scrolls, and clicks through native apps—no APIs or per‑app plugins required.
The team says the new build is faster, more efficient, and more robust than their first version, and they describe real usage patterns: operations and reporting (pulling data, exporting CSVs, updating trackers), email follow‑ups in your own voice, and recruiting/sales flows that source profiles and move them through a CRM vy relaunch vy capabilities page. For agent IDE folks, Vy is a reminder that “browser tools + MCP” isn’t the only surface—there’s strong demand for agents that act like junior ops staff inside messy, closed internal tools where APIs will never exist.
Hyperbrowser ships Hyper‑Research to benchmark coding tools and agents from URLs
Hyperbrowser released Hyper‑Research, an open‑source tool that lets you compare AI models, products, or research—like Cursor, Google’s Antigravity, and Windsurf—by dropping in URLs rather than wiring a custom eval harness hyper research launch. It scrapes each page with Hyperbrowser, uses Claude Opus 4.5 to extract key behaviors and claims, and then produces scoring dashboards plus visual diffs so you can see where tools diverge on features like planning, refactor loops, and trace visibility.
The project repo includes prebuilt configurations for popular coding IDEs and agents, making it easy for infra or staff engineers to bring a little discipline to “vibe‑driven” tool debates without building a full eval farm hyper research repo. For teams evaluating which agent IDE to standardize on—or trying to convince stakeholders that a lesser‑known tool is actually better for their workflow—Hyper‑Research is a lightweight way to turn scattered marketing pages into structured, comparable evidence.
OpenCode SDK sees growing adoption; maintainer plans polish and better docs
OpenCode’s maintainer shared npm download stats showing the @opencode-ai/sdk getting meaningful traction alongside Anthropic’s own claude-agent-sdk, with both lines spiking sharply in the past few months sdk adoption chart. The graph suggests OpenCode’s SDK is becoming a popular way to embed multi‑model coding agents into apps, even before the team has invested heavily in docs or surface polish.
After talking to a “bunch of opencode sdk users,” the maintainer says they’ve had “a really nice upgrade on deck for a while” and will ship it with dedicated documentation, acknowledging that the current experience is rougher than it should be sdk adoption chart. Paired with ongoing jokes about naming a new free model “OpenCode One” and community calls to highlight real user stories naming thread user showcase request, the signal for builders is: this is a living ecosystem. If you’re betting on OpenCode for agent workflows today, expect the SDK ergonomics and docs story to improve over the next few weeks, not stay frozen.
Simular 1.0 brings neurosymbolic Mac desktop agents near human OSWorld scores
Simular unveiled Simular 1.0, a native Mac desktop agent that completes complex, multi‑step tasks across arbitrary apps, scoring 69.9% on the OSWorld benchmark where human annotators sit around 72% simular launch. The stack is neurosymbolic: it combines an LLM exploration layer with deterministic code, learns from user feedback, and triggers workflows contextually to cut down repetitive clicking and tab‑switching.
For AI‑augmented IDE builders, OSWorld is interesting because it measures end‑to‑end “computer use” reliability, not toy web tasks. Simular’s near‑human score suggests that neurosymbolic stacks—where a planner LLM delegates to hard‑coded skills and state machines—may be a viable alternative to pure LLM browser agents for high‑stakes workflows on laptops and dev machines.
🏢 Enterprise adoption and company moves
Today’s business beat covers large deployments and org changes tied to AI workflows. Continues yesterday’s enterprise momentum with new signals across design, IDE and infra vendors.
Anthropic engineers now offload most coding work to Claude
Anthropic released a detailed internal study on how Claude and Claude Code are changing their own workflows, based on 132 engineer surveys, 53 interviews, and 200k coding sessions study overview. Engineers report that Claude now assists with about 60% of their daily work and that this translates into a roughly 50% self‑reported productivity increase summary thread.

The most interesting finding for leaders isn’t the raw speedup; it’s the shape of work. Around a quarter of AI‑assisted work (27%) "would not have been done otherwise"—small refactors, internal tools, and papercut fixes that weren’t worth the cognitive and coordination cost before summary thread. Claude Code is also increasingly executing 20‑step autonomous tool sequences (editing files, running tests, etc.), up from about half that in February, while humans shift toward reviewing diffs and making design calls usage data.
The study also surfaces real downsides: concerns about skill atrophy, juniors going to the model instead of senior colleagues, and a sense of missing the satisfaction of doing hard coding by hand engineer quotes. In other words, the bottleneck is moving from typing speed to review quality, architecture choices, and team culture. If you’re planning AI rollouts, this report is a useful preview of second‑order effects once agents become default tools rather than novelties.
Anthropic reportedly preparing 2026 IPO to fund AI scale
Multiple reports say Anthropic is exploring an IPO as early as 2026, engaging law firm Wilson Sonsini and talking with major banks about deal structure, though the company stresses no firm decision or timetable yet ipo report. Going public would let Anthropic tap public markets on an ongoing basis to fund compute, data, and acquisitions instead of relying solely on negotiated private rounds.

The numbers are big. Commentary around the news frames Anthropic as a potential $300B‑plus listing, with OpenAI often cited as a possible $1T‑class IPO in the same timeframe valuation thread. That sets up a future where investors and customers can directly compare how two frontier labs handle growth, safety, and spending once markets put real prices on them. An IPO would also deepen Anthropic’s obligations to quarterly performance, which tends to favor products with clear attach rates like Claude Code (already at a reported $1B run‑rate) over more speculative research bets.
For enterprise buyers, this is a signal that Anthropic expects to be a long‑term primary vendor, not an acquisition target. It also raises the odds of more aggressive go‑to‑market moves—vertical programs, ecosystem acquisitions like Bun, and deeper cloud partnerships—as they work to justify IPO‑class multiples and massive compute commitments.
Sourcegraph spins off Amp as separate coding agent company
Sourcegraph is formally splitting into two companies: Sourcegraph (staying focused on code search and understanding for large repos) and Amp Inc., which becomes an independent lab shipping a frontier coding agent split thread. Dan Adler moves into the Sourcegraph CEO role, while the founders Quinn Slack and Beyang Liu co‑found Amp and stay on Sourcegraph’s board amp letter.

For enterprises, this clarifies the product map. Sourcegraph becomes the "infrastructure" side—deep search, cross‑repo navigation, large‑scale code intelligence—at a time when AI agents are generating and touching far more code. Amp becomes the experimental surface where they push on fully agentic software development loops and compete directly with Cursor, Devin‑style IDE agents, and Claude Code harnesses followup comment.

Investors like Sequoia, a16z, Craft and others stay on both boards, which suggests this isn’t a quiet carve‑out but a bet that each unit can optimize for a different sales motion and risk profile. If you already use Sourcegraph, expect faster shipping on search features without agent experiments in your critical path; if you care about autonomous coding, Amp just became a standalone vendor to track rather than a feature buried inside the search product.
Apple swaps AI chief as it tries to catch up in GenAI
Apple’s head of AI, John Giannandrea, will step down and retire next spring, with former Microsoft and Google DeepMind executive Amar Subramanya taking over leadership of Apple’s AI efforts under software boss Craig Federighi apple report. The move comes as Apple is widely seen as trailing rivals in generative AI and racing to ship competitive on‑device and cloud models.

For people building on Apple platforms, the story here is less about today’s APIs and more about medium‑term direction. A new AI lead with cloud‑scale and research‑lab experience suggests Apple wants to move faster on both model quality and integration into products like Siri, Xcode, and creative tools. It also raises the likelihood that Apple will court more third‑party partnerships or acquisitions in AI infrastructure rather than trying to build everything internally. If you’re betting on Apple’s ecosystem for client‑side inference or private on‑device assistants, this leadership change is a key data point on whether they can realistically close the gap with OpenAI, Google, and Anthropic in the next 2–3 years.
Lovable switches to Claude Opus 4.5 for app generation
Lovable has quietly made Anthropic’s Claude Opus 4.5 its core model for product and UI generation, claiming 15% fewer mistakes and a 5% higher project success rate without raising prices for users Lovable announcement. That’s one of the first public, quantified "we swapped the frontier model and here’s what changed" stories from a dev‑facing SaaS.

For builders, the signal is twofold. First, Opus 4.5 looks competitive as a design model, not just for code or reasoning, which matters if you’re picking a single vendor to power both flows. Second, Lovable says it’s eating the extra model cost instead of passing it on, suggesting they believe higher completion quality and fewer retries more than pay for the upgrade in infra terms blog link. If you’re building a similar product, this is a good nudge to A/B your own stack against Opus 4.5 and see whether error‑rate and rework savings offset per‑token pricing.
Anthropic launches discounted Claude program for nonprofits
Anthropic is partnering with GivingTuesday to roll out "Claude for Nonprofits," a program that offers discounted plans, new integrations, and free training for nonprofit organizations nonprofits launch. The stated goal is to reduce time spent on administrative work so staff can redirect effort toward their core missions.
Even though this isn’t a huge revenue move, it’s a notable go‑to‑market experiment. Nonprofits have messy data, strict compliance constraints, and low budgets—if Claude can prove useful and safe there, it’s a strong validation of the broader enterprise story. The offer seems to bundle pricing relief with more hands‑on onboarding, which many enterprises quietly end up paying consulting firms for anyway program page. If you run or advise a nonprofit, this is a rare moment where you can pilot frontier‑model workflows under a plan explicitly designed for your constraints rather than being wedged into generic enterprise tiers.
Vercel hires Geldata team to invest in Python
Vercel announced that the Geldata team is "joining Vercel" to help the company invest more heavily in the Python ecosystem vercel blog. Geldata had been building Python‑centric infra, and now that expertise gets folded into Vercel’s platform, which has historically been strongest with TypeScript/Next.js workflows.
For AI teams this matters because Python is still where most training, evaluation, and glue code live. A Vercel that feels first‑class for Python—rather than "Node first, Python via workarounds"—reduces friction for deploying FastAPI backends, inference services, or RAG APIs next to frontends. The blog hints at a broader Python runtime and tooling story rather than a one‑off acquisition blog post. If you’ve been avoiding Vercel for AI microservices because of ecosystem fit, it’s worth re‑evaluating once the Geldata work starts surfacing as concrete runtimes and templates.
⚙️ Serving stacks, quantization and model servers
Runtime engineering updates not specific to the feature: NVIDIA ModelOpt flows in SGLang and the Transformers v5 reshape for simpler definitions and serving. Mostly systems/runtime posts today.
Transformers v5 general release focuses on cleaner APIs and serving
Hugging Face’s Transformers v5 is now out (not just RC), bringing a much more modular codebase—shared blocks, a pluggable AttentionInterface, unified tokenizers and image processors—and a built‑in transformers serve HTTP server that speaks an OpenAI‑compatible API transformers v5 thread. Following the earlier RC coverage rc release about the refactor itself, the emphasis now is on making it easier to move a single model definition across runtimes like vLLM or ONNXRuntime and to wire it into existing client SDKs without bespoke adapters hf blog.
For people running their own stacks, this means fewer one‑off model wrappers, better interop with optimized attention kernels, and a standard serving surface you can drop behind gateways or multi‑model routers. It also makes low‑precision and quantized variants easier to manage, since weight loading and processors are standardized instead of being rewritten per architecture.
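In practice, a “standard serving surface” means your existing OpenAI‑style clients should just work against it. A minimal sketch, assuming a transformers serve instance is already running locally; the port and model name are placeholders, so check the CLI’s --help for the real defaults:

```python
# Minimal sketch: stream a reply from a locally running `transformers serve`
# instance through the OpenAI Python client. The port and model name are
# assumptions -- check `transformers serve --help` for the actual defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder; any chat checkpoint the server can load
    messages=[{"role": "user", "content": "Give me one sentence on why modular model code helps serving."}],
    stream=True,
)
for chunk in stream:
    # Guard against keep-alive chunks with no delta content.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```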
SGLang adds native NVIDIA ModelOpt quantization pipeline
SGLang now integrates NVIDIA’s Model Optimizer directly into its workflow so you can quantize, export, and deploy NVFP4, FP8, and MXFP4 models inside the same serving stack, with LMSYS reporting up to ~2× throughput gains in recent benchmarks on Blackwell hardware sglang modelopt thread. This removes a lot of glue code: instead of juggling external scripts and exporters, you call ModelOpt APIs from SGLang, get low‑precision weights, then immediately stand up endpoints tuned for B200/GB300 NVL72 clusters.
For infra and serving engineers, the point is: you can now treat quantization as a first‑class step in SGLang rather than a separate project. That should make it easier to A/B NVFP4 vs FP8, standardize on MXFP4 for MoE workloads, and keep one deployment pipeline per model family instead of a zoo of custom flows, as laid out in the integration write‑up blog post.
📊 Leaderboards and eval reads
Fresh evals across text, code, webdev and science; community arenas updated. Continues yesterday’s benchmark race with more production‑oriented tracks. Excludes creative model pricing/use‑cases (see Media).
LisanBench crowns DeepSeek V3.2 Speciale as top open non‑Anthropic reasoner
New results from LisanBench, a forward‑planning and chain‑length benchmark, put DeepSeek‑V3.2 Speciale at 4711 points—second only to Anthropic’s Opus 4.5 Thinking at 5393, and ahead of Gemini 3 Pro (4661) and Grok‑4. DeepSeek medals This extends earlier Olympiad and Codeforces wins into a broader, open reasoning setting. lisanbench thread

The trade‑off is cost in time, not money: DeepSeek‑V3.2 Speciale produces by far the longest reasoning chains, averaging ~47k output tokens per query including the hidden “thinking” trace, and runs at only 30–40 tokens/s, so individual requests can take a long time. lisanbench thread On the Reasoning Efficiency plot, Speciale sits far to the right (huge outputs) with competitive longest valid chains, while Gemini 3 Pro and Opus 4.5 Thinking cluster at shorter outputs and higher “valid chain” lengths, reflecting more concise reasoning. lisanbench thread

Pricing remains its main selling point: the benchmark author notes that a full LisanBench run costs about $3 with DeepSeek‑V3.2 Speciale, versus roughly $35 for Claude Sonnet 4.5 Thinking, despite Sonnet scoring materially lower. lisanbench thread

If you’re building heavy‑duty math or puzzle‑style agents where latency is tolerable and budget isn’t huge, LisanBench suggests DeepSeek‑V3.2 Speciale is the strongest open alternative to Anthropic’s top reasoning models right now.
Artificial Analysis calls out Nova 2.0’s strong agents but weak factuality
Artificial Analysis’ first full pass over Amazon’s Nova 2.0 family paints a split picture: Nova 2.0 Pro Preview is back among top models on their Intelligence and agentic benchmarks, but scores poorly on factual accuracy and hallucination in the AA‑Omniscience suite. (nova benchmark post, analysis thread)

On agentic tool‑use tests like τ²‑Bench Telecom, Nova 2.0 Pro hits 93%, matching or beating Claude Opus 4.5 and Gemini 3 Pro at comparable reasoning budgets, and it posts 80% on IFBench for instruction following under higher reasoning effort. (nova benchmark post, agentic recap) Core‑intelligence scores on the Artificial Analysis Index jump roughly 30 points over Nova Premier, with strong showings on MMLU‑Pro, GPQA‑Diamond, MultiChallenge, and LongCodeBench‑1M, putting Nova Pro back in the same broad band as the latest Claude and Gemini models. nova benchmark post

The catch is factual reliability: Nova 2 models sit near the bottom of the AA‑Omniscience Index, with hallucination rates around 90% on some tasks, far worse than Claude Haiku 4.5 or GPT‑5.1, despite decent accuracy when they do know the answer. omniscience chart

For infra leads, Nova 2.0 Pro’s pricing at $1.25 / $10 per million input/output tokens and relatively low token usage make it cheaper than Claude Sonnet 4.5 or Gemini 3 Pro on the same benchmark run, but you’ll likely want tight retrieval, verification layers, and maybe model routing when truthfulness is critical. (nova benchmark post, cost recap)
Mistral Large 3 debuts as top open coding model on LMArena
Mistral’s new open‑weight Mistral‑Large‑3 enters the LMSYS Text Arena with an Elo of 1418, landing #6 among all open models and effectively #2 among non‑reasoning open models, while taking the #1 spot for coding in the open cohort. arena summary

Artificial Analysis and community runs show Large 3 trading blows with DeepSeek‑3.1 and Kimi K2 across MMLU, GPQA‑Diamond, LiveCodeBench and other base benchmarks, often slightly ahead on multilingual MMLU (85.5 vs 84.2 and 83.5) and GPQA, while trailing Kimi K2 on some coding‑heavy tests like LiveCodeBench. (aa benchmarks, mistral comparison) On the Occupational leaderboard, the same model scores top‑10 among open models in Software & IT, Business/Finance, and Writing/Language, which matters if you’re routing workloads by job family instead of raw average score. arena summary For builders, that means there’s now an Apache‑2.0, vision‑capable open model that’s competitive with the best Chinese open weights on general reasoning, while being the strongest OSS choice when your traffic is heavily coding‑skewed.
SciArena refreshes scientific reasoning rankings with GPT‑5.1 and Gemini 3
Allen Institute’s SciArena leaderboard has been updated to include GPT‑5.1 and Gemini 3 Pro Preview, giving a current snapshot of how top models handle scientific QA and multi‑step reasoning. sciaarena update While detailed numbers aren’t in the tweets, the update means science‑oriented teams can now compare OpenAI’s latest GPT‑5.1 family and Google’s Gemini 3 against earlier Claude, Gemini 2.5 and DeepSeek entries on the same set of hard science tasks.
SciArena focuses on realistic workloads—multi‑hop reasoning over papers, quantitative questions, and domain‑specific concepts—rather than generic MMLU‑style trivia, so shifts here often reveal gaps that general leaderboards miss.
For AI leads in research tools, pharma, or analytics, this is a good moment to re‑run your own evals on top contenders using SciArena‑style prompts and see whether GPT‑5.1 or Gemini 3 meaningfully change your default model choice for scientific work.
Code Arena’s new WebDev track spots KAT Coder Pro at #17
The LMSYS team quietly turned on a WebDev leaderboard inside their new Code Arena, and KAT Coder Pro V1 from Kwai sits at #17 with a score of 1265 among frontier coding agents. webdev leaderboard Unlike pure text benchmarks, this track has models plan, scaffold, debug, and complete real web apps end‑to‑end, so it’s closer to what engineers see when wiring agents into frameworks like LangChain, Claude Code or Cursor.
The early placement suggests KAT Coder’s strengths in specialized coding (it already has a dedicated KAT‑Coder series) carry over to multi‑step web tasks, but it’s still behind the very top general agents that dominate the list.
If you’re choosing an agentic coding backend for UI or frontend‑heavy work, this WebDev track is a useful second signal next to SWE‑Bench or Terminal‑Bench: it tells you whether the model can actually ship a small app, not just patch a repo.
🎬 Creative stacks: Nano Banana Pro, FLUX.2 and Kling O1
Significant creative/model chatter remains high: 2K Nano Banana Pro tops arenas; FLUX.2 trade‑offs; reference‑driven Kling O1 workflows. This section aggregates media‑centric posts to avoid crowding core model news.
Kling O1 powers reference-driven character workflows across Freepik, InVideo and ImagineArt
Kling O1 is quickly turning into the go‑to video backbone for reference‑driven character work: Freepik’s Spaces now lets you feed Nano Banana Pro character sheets into Kling O1 to generate consistent sequences where the same faces hug, walk into rooms, or interact across multiple shots. freepik workflow thread Following up on creative stack, creators show Kling using two stills as elements, then animating a campfire scene and later a tight emotional hug that preserves clothing, faces, and motion better than earlier video models. character interaction demo

At the same time, InVideo’s new "VFX House" pitches Kling O1 as the engine for one‑prompt, 10‑second multi‑shot clips with scene transitions, while ImagineArt and other frontends are offering limited‑time free or unlimited Kling runs to get people experimenting. invideo vfx house The practical takeaway: if you care about character consistency and interaction—recasting memes, TV parodies, or branded mascots—Kling O1 is now viable in normal creator tools without custom research plumbing.
Nano Banana Pro 2K variant jumps to #1 in both T2I and Image Edit arenas
Google’s Nano Banana Pro 2K preview model is now the top‑ranked system for both Text‑to‑Image and Image Edit on the LMArena community leaderboards, edging out its own 1K variant and other frontier image models. arena update It posts a score of 1402 in Image Edit versus 1392 for the default nano‑banana‑pro and beats Seedream 4.0, FLUX.2 Pro, and other diffusion baselines. leaderboard table

For builders, this matters because the 2K variant is exposed separately in the API (and already wired into some stacks) so you can route higher‑stakes or high‑res tasks to it while keeping the cheaper 1K model for bulk work; Google is also teasing a 4K mode coming soon, which would push Nano Banana Pro even further into print‑quality and UI design territory. leaderboard table Following up on prompting guide that framed Nano Banana Pro as a cinematic editing workhorse, this is a clear signal that it’s becoming the default choice for many community arenas.
Builders turn Nano Banana Pro into a backbone for multi-image, multi-scene workflows
Creators are standardizing on Nano Banana Pro as the image backbone inside more complex agent and UI pipelines: people are using it to generate 9‑shot character grids, slide decks, and even whole calendars from a single reference image or prompt. nine shot workflow A Weavy + LangChain demo shows a "scene creator copilot" that orchestrates tools to generate characters, backgrounds, and full scenes with Nano Banana Pro, wired into an AG‑UI so humans can tweak before committing changes. scene creator demo

Elsewhere, Flowith’s Canvas uses Nano Banana Pro to auto‑design visually coherent slide decks from long prompts, while another workflow turns one character reference into a 12‑month 2026 calendar with consistent styling and subtle banana easter eggs. slides beta thread The pattern is pretty clear: teams are no longer using these models as one‑off image generators but as components in bigger agents that batch prompts, enforce consistency, and spit out multi‑shot contact sheets, decks, or research reports.
FLUX.2 family lands near the top of Artificial Analysis image leaderboards with clear price–quality trade-offs
Black Forest Labs’ FLUX.2 family has been fully benchmarked by Artificial Analysis, with FLUX.2 Pro now ranked #2 on the Text‑to‑Image leaderboard behind Nano Banana Pro while costing $30 per 1k 1MP images versus $39 for Google’s model. t2i benchmarks The Flex variant, tested at 50 steps and high guidance, hits #4 in Image Edit quality but doubles price to $60/1k images and ~20s runtimes, making it more of a "turn it up" mode for art directors rather than default production. image edit summary

FLUX.2 Dev, the open‑weights 32B option under the FLUX Non‑Commercial license, comes in around #8 in Text‑to‑Image and #9 in Image Edit, still beating the older FLUX.1 Kontext Max while being cheaper on hosted providers. image edit summary

For teams, the decision tree is now clearer: Pro gives strong quality at mid‑tier cost, Flex is a premium knob for tricky edits or hero shots, and Dev is the hackable, self‑hosted workhorse if you want to run the stack on your own GPUs.
Fal emerges as a multi-model hub for high-end image and video generation
Fal is quietly turning into a hub for several top‑tier image and video models: alongside PixVerse v5.5, it now hosts Vidu Q2 Image Generation and Z‑Image Turbo LoRA runtimes, all accessible through its managed APIs and Hugging Face integrations. vidu q2 launch Building on Vidu Q2’s earlier arena debut as a strong image editor arena debut, the new fal endpoint focuses on fast inference and multi‑reference consistency, while the Z‑Image Turbo LoRA space offers text‑heavy, photoreal results with fal handling the heavy lifting under the hood. z image lora

Add in the ComfyUI workflows that combine Z‑Image Turbo with Wan 2.2 for cinematic shot‑to‑video pipelines, comfyui pipeline and you get a picture of fal as more than "just another host"—it’s becoming the default place where many of the strongest open and partner models show up first with production‑ready latency and pricing. If you’re building creative tools, it’s worth checking whether the models you care about are already wired into fal instead of gluing together raw checkpoints yourself.
PixVerse v5.5 launches on fal with sound-aware, multi-shot video controls
PixVerse v5.5 has dropped on fal as a full video suite: it adds intelligent sound generation, multi‑shot camera control, and synchronized audio–video creation across text‑to‑video, image‑to‑video, effects, and first/last‑frame guided modes. model announcement Fal exposes separate playgrounds for T2V, I2V, transitions, and effects, so you can test each mode without wiring your own backend, then hit the same hosted endpoint in production once you’re happy. playground links
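Once the playground output looks right, moving to code is a short step. The sketch below uses the fal Python client; the endpoint ID and argument names are illustrative, so copy the exact ones from the model page for the mode you tested.

```python
# Rough sketch of hitting a fal-hosted endpoint from Python after prototyping
# in the playground. The endpoint ID below is illustrative -- use the exact one
# shown on the fal model page for the mode you want.
import fal_client  # pip install fal-client; expects FAL_KEY in the environment

ENDPOINT = "fal-ai/pixverse/v5.5/text-to-video"  # hypothetical endpoint ID

result = fal_client.subscribe(
    ENDPOINT,
    arguments={
        "prompt": "three-shot sequence: wide establishing, slow dolly-in, close-up",
        # mode-specific knobs (duration, camera moves, audio) vary per endpoint
    },
)
print(result)  # typically a dict containing a URL to the rendered clip
```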

For small video teams this means you can prototype things like 3‑shot sequences with smooth camera moves and matching ambience, or apply stylized effects and transitions to live‑action clips, without juggling separate sound and video models. It’s not yet as hyped as Kling or Sora, but the combination of v5.5 features and fal’s infra makes it an attractive option for people who want structured shot control rather than one long, single‑camera generation.
Tencent’s HunyuanVideo 1.5 opens a creator contest via Tensor with cash prizes
Tencent’s Hunyuan team is using a Tensor‑hosted competition to push adoption of HunyuanVideo 1.5, an open‑source text‑to‑video model: creators can run the model on Tensor, submit their best clips, and compete for prizes. contest announcement The event is framed as a way for "all creators" to see what’s possible with the model, not just ML researchers, and gives teams a low‑friction way to test HunyuanVideo alongside other video models they may already be using.
There’s no detailed benchmark story here yet, but the fact that a lab is pairing an open model drop with a structured contest on a third‑party platform signals a push toward ecosystem building rather than just posting weights and walking away. If you’re experimenting with Kling, Sora‑style models, or PixVerse, it’s an easy lift to add HunyuanVideo 1.5 to your comparison set and see how it handles your own prompts and storyboards.
🛡️ Threat models and defenses for agents
Operational safety items: prompt‑injection detection model+benchmark and smart‑contract exploit simulations. Focused on practical risks to agentic systems seen in today’s threads.
Perplexity open-sources BrowseSafe detector and benchmark for prompt-injection in browser agents
Perplexity released BrowseSafe, a fine‑tuned prompt‑injection detector, plus BrowseSafe‑Bench, a benchmark of realistic HTML attacks aimed at browser‑based agents like Comet and Nova Act. The model reaches about 90.4% F1, outperforming GPT‑5, Claude 4.5 and dedicated safeguard LLMs while avoiding their reasoning latency, and is fully open‑sourced along with the benchmark. release thread

The system is designed to sit before the LLM in a browsing pipeline: it scans raw HTML (including comments, templates and invisible elements) for hidden instructions and can block or sanitize pages before the agent ever reads them, rather than asking the main model to reason about safety on the fly. That makes it practical to drop into existing browser agents as a cheap pre‑filter, instead of burning tokens and seconds on "LLM‑as‑firewall" calls. The model and dataset are hosted on Hugging Face, so you can self‑host or fine‑tune for your own threat model. (research blog, model card)

For AI engineers building autonomous browsing, the takeaway is clear: treat prompt‑injection like spam/phishing and run a specialized classifier on HTML before you ever stream it into your agent. You can start by replaying your own browsing traces through BrowseSafe‑Bench to see how often your current agent gets tricked, then wire the BrowseSafe model into your scraper layer and log when it fires so you can tune thresholds and false positives. benchmark dataset
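A rough picture of that pre‑filter pattern, assuming a hypothetical Hugging Face model ID and label scheme (check the actual BrowseSafe model card for both), looks like this:

```python
# Illustrative pre-filter: classify raw HTML for prompt injection before the
# agent ever sees it. The model ID and labels are placeholders -- substitute
# the real BrowseSafe checkpoint and its documented input format.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="perplexity-ai/browsesafe",  # hypothetical model ID
)

def fetch_for_agent(html: str) -> str | None:
    """Return the page only if the detector doesn't flag it."""
    verdict = detector(html, truncation=True)[0]  # e.g. {"label": ..., "score": ...}
    if verdict["label"].lower() in {"injection", "unsafe"} and verdict["score"] > 0.8:
        return None  # block, or route to a sanitization step instead
    return html
```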
Community jailbreak prompt reliably bypasses safety on Mistral 3 instruct models
A new community jailbreak shows that Mistral 3’s instruct models can be pushed to output disallowed content by wrapping the user query in a meta‑prompt that forces a fake refusal followed by a second, "rebellious" answer after a custom divider. The prompt asks the model to semantically invert phrases like “I can’t” into “I can,” disables redactions, and demands a long (>3,420‑character) markdown code block answer, which appears to consistently override the built‑in safety layer on open‑weight Mistral‑3 variants. jailbreak prompt The author has packaged this and related jailbreaks into the L1B3RT4S repo, with a dedicated MISTRAL.mkd file documenting the exact template and usage notes for Mistral 3, and is actively updating it as new models land. (update note, prompt repo)

For anyone deploying Mistral 3 as the backbone of an agent—especially in open‑source stacks where you can’t rely on a hosted safety gateway—the message is: you must bring your own guardrails. That likely means combining strict upstream input filtering, downstream content classifiers (possibly separate from the base model), and rate/budget limits on high‑risk tools, rather than assuming the model’s native refusal behavior will hold once users can edit system prompts or chain calls through other agents.
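As one shape those guardrails can take, here is an illustrative (not Mistral‑specific) wrapper that layers an input filter, a tool budget, and a separate output classifier around the model call; the filter and classifier objects are stand‑ins for whatever you actually deploy.

```python
# Illustrative layered guardrails around an open-weight instruct model whose
# native refusals can be jailbroken. All objects and policies are placeholders.
def guarded_reply(user_msg: str, llm, input_filter, output_classifier,
                  tool_calls: dict[str, int], tool_budget: dict[str, int]) -> str:
    # 1) Upstream filter: reject obviously adversarial inputs
    #    (custom dividers, "ignore previous instructions", inversion tricks, ...).
    if input_filter.is_suspicious(user_msg):
        return "Request blocked by input policy."

    # 2) Enforce per-session budgets on high-risk tools regardless of what the
    #    model asks for (e.g. at most N shell, browser, or payment calls).
    if any(used > tool_budget.get(tool, 0) for tool, used in tool_calls.items()):
        return "Tool budget exceeded; run aborted."

    # 3) Run the model, then check its output with a classifier that is
    #    separate from the base model itself.
    reply = llm.generate(user_msg)
    if output_classifier.is_disallowed(reply):
        return "Response withheld by content policy."
    return reply
```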
🧮 Accelerator roadmaps: Trainium3 UltraServers
Non‑model hardware update with direct AI impact: Trainium3 specs and fabrics suggest lower‑cost training/inference paths at scale. Excludes TPU/GB300 TCO (covered prior days).
AWS Trainium3 UltraServers target cheaper frontier‑scale AI training
AWS introduced its Trainium3 chip and Trn3 UltraServers, packing 144 Trainium3s per node for roughly 362 FP8 PFLOPs, 20.7 TB of HBM3e and about 706 TB/s aggregate memory bandwidth, with claims of up to 4.4× higher performance, 3.9× more memory bandwidth, and 4× better performance per watt than the previous‑gen Trainium2. Trainium3 summary Full details are in the EC2 Trn3 UltraServers announcement. AWS blog post

Trainium3 adds new MXFP8 and MXFP4 low‑precision formats aimed at dense and MoE workloads, and a NeuronSwitch‑v1 fabric that doubles inter‑chip bandwidth versus Trn2, allowing UltraClusters to scale to 1M+ chips for frontier‑scale training and long‑context, real‑time multimodal serving. Trainium3 summary AWS reports that on Amazon Bedrock, Trn3 can deliver up to 3× the performance of Trn2 and over 5× more output tokens per megawatt at similar latencies, which directly lowers training and inference TCO for customers willing to target the Neuron SDK and Trainium stack. Trainium3 summary

For AI infra leads, the point is: this is a serious non‑Nvidia option you can actually plan around. You get a very high‑bandwidth, memory‑rich node design, native PyTorch support via Neuron, and a roadmap that includes Trainium4 interoperating with Nvidia GPUs over NVLink Fusion, so you can start experimenting with mixed clusters instead of betting entirely on one vendor. Trainium3 summary The trade‑off is ecosystem maturity—CUDA still has more libraries and community experience—so near‑term deployments will make the most sense for greenfield training jobs, MoE experiments, or Bedrock‑hosted models where AWS hides some of the toolchain complexity.
🤖 Embodied signals: Optimus hands and student dog bot
A small but notable cluster: Tesla demos more dexterous hands and students ship an affordable Gemini‑powered robotic dog. Useful to track embodied stacks intersecting with foundation models.
Tesla Optimus hands jump in dexterity with 22‑DoF, sensor‑rich design
Tesla’s latest Optimus Gen 3 / V2.5 demos at NeurIPS show much smoother whole‑body motion and close‑up shots of new hands that look far more controlled and precise than earlier versions, with observers calling the gait and manipulation "almost human‑like". Optimus running demo One clip focuses on the bare hand, revealing 22 degrees of freedom with hidden tendons and fingertip sensors said to be 4× more sensitive; the hands are reportedly already being used in Fremont for battery assembly, wire routing, and cloth‑folding tasks. (Optimus hand demo, Ungloved hand closeup, hands capabilities)
For AI engineers and robotics leads, the signal is that Tesla’s embodied stack is maturing from flashy walking clips into repeatable, fine‑grained manipulation that can actually do factory work, powered by a learned policy rather than hard‑coded trajectories. The combination of high‑DOF hands, rich tactile input, and large‑scale data from repetitive tasks is exactly the substrate you’d want if you plan to plug in stronger foundation‑model style planners later, so it’s worth tracking how quickly they move from demos to higher‑variance jobs where general intelligence really matters.
Stanford team open-sources ~$2K Gemini‑controlled quadruped robot dog
A group of Stanford students released an open‑source quadruped robot dog that uses Gemini via Google’s AI Studio APIs as its perception and control brain, giving it the ability to observe and interact with the world through language‑driven commands instead of custom task code. Gemini dog intro The hardware bill of materials lands around $2,000, and the team also published an "introduction to robotics" course and docs so others can replicate or extend the platform. robot dog docs
For builders, this is a concrete template for tying foundation models to low‑cost embodied agents: the dog streams camera and sensor data to Gemini, gets back high‑level actions, then maps those into locomotion and interaction primitives. It’s cheap enough for labs, student teams, and startup prototyping, and because the stack is open you can swap in other models, experiment with on‑device vs cloud inference, or re‑use the control layer for different robot forms. This is the kind of project that can quickly evolve into a standard "Gemini‑class" reference bot for teaching, research, and early commercial pilots.
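To make the loop concrete, here is a minimal sketch of one perception‑to‑action step using the google‑genai client; the model name, action vocabulary, and robot‑side mapping are assumptions for illustration, not the team’s published code.

```python
# Sketch of the perception/control loop: send a camera frame to Gemini, ask for
# one high-level action from a fixed vocabulary, then hand it to a locomotion
# primitive. Model name and action set are assumptions.
from google import genai
from PIL import Image

client = genai.Client()  # expects an API key in the environment
ACTIONS = {"walk_forward", "turn_left", "turn_right", "sit", "stop"}

def control_step(frame: Image.Image) -> str:
    prompt = (
        "You control a quadruped robot. Reply with exactly one action from: "
        + ", ".join(sorted(ACTIONS))
    )
    resp = client.models.generate_content(
        model="gemini-2.5-flash",   # assumed model name
        contents=[frame, prompt],
    )
    action = resp.text.strip().lower()
    return action if action in ACTIONS else "stop"  # fail safe on odd replies
```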
📚 New research: test‑time RL, video flows, and physics‑guided code
A strong batch of preprints/papers: test‑time learning that updates models, flow‑based video generation, streaming token compression, and physics‑guardrailed code synthesis. Useful for teams planning 2026 roadmaps.
Chain‑of‑Unit‑Physics steers scientific code synthesis with physics unit tests
Chain‑of‑Unit‑Physics proposes a simple but powerful recipe for scientific code generation: have domain experts write small "unit physics" tests that encode conservation laws and invariants, then wrap an LLM code agent in a loop that only accepts programs passing those checks. Unit-Physics summary On a hydrogen ignition benchmark, this approach yields a solver whose ignition delays match expert code while running about 33% faster and using ~30% less memory. ArXiv paper

The pattern generalizes nicely: if your domain has hard constraints (mass/energy, probabilities summing to 1, no‑arb pricing), you can formalize them as lightweight verifiers and let the LLM explore the space of implementations. It turns code‑synthesis from pure pattern matching into a constrained search you can trust a lot more in production.
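A minimal sketch of that accept‑only‑if‑physics‑passes loop, with a mass‑conservation check standing in for the paper’s domain unit tests and a generate_solver callable standing in for the LLM agent:

```python
# Sketch of the pattern: the LLM proposes candidate solvers, and only programs
# that satisfy expert-written physical invariants are accepted. The solver
# interface and checks here are illustrative stand-ins.
import math

def unit_physics_tests(solver) -> bool:
    """Expert-written invariants the candidate program must satisfy."""
    state = solver.step(initial_state={"mass": 1.0, "T": 1000.0}, dt=1e-6)
    mass_conserved = math.isclose(state["mass"], 1.0, rel_tol=1e-9)
    temperature_physical = state["T"] > 0
    return mass_conserved and temperature_physical

def synthesize(generate_solver, max_attempts: int = 20):
    feedback = ""
    for _ in range(max_attempts):
        solver = generate_solver(feedback)     # LLM proposes a program
        try:
            if unit_physics_tests(solver):     # accept only if invariants hold
                return solver
            feedback = "candidate violated conservation/positivity checks"
        except Exception as err:               # crashes count as failures too
            feedback = f"candidate raised {err!r}"
    raise RuntimeError("no physics-consistent solver found")
```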
Hierarchical token compression speeds streaming Video‑LLMs while keeping ~99% accuracy
A new "Streaming Token Compression" (STC) framework attacks the latency of streaming video LLMs by caching visual tokens for keyframes and pruning redundant tokens over time, cutting LLM prefilling cost by roughly 45% at almost no accuracy loss. STC summary STC‑Cacher reuses encoder features for static regions between full reference frames, while STC‑Pruner scores tokens by temporal novelty and frame salience, only feeding the highest‑value ones to the language model. ArXiv paper

If you’re experimenting with live video agents—robotics, sports commentary, surveillance—this is directly actionable: you don’t need to retrain your Video‑LLM, just bolt STC in front of the encoder/LLM boundary. The paper’s numbers suggest you can treat the image encoder and LLM as modular black boxes and still get substantial latency wins.
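For intuition, a toy version of the pruning step might look like the sketch below; the novelty scoring and keep ratio are illustrative, not the paper’s exact recipe.

```python
# Toy pruning step: score patch tokens by how much they changed since the last
# keyframe, weight by frame salience, and only pass the top-k to the LLM.
import torch

def prune_tokens(tokens: torch.Tensor,      # (N, D) visual tokens, current frame
                 ref_tokens: torch.Tensor,  # (N, D) tokens from last keyframe
                 salience: float,           # scalar weight for this frame
                 keep_ratio: float = 0.55) -> torch.Tensor:
    novelty = (tokens - ref_tokens).norm(dim=-1)   # temporal change per token
    scores = salience * novelty
    k = max(1, int(keep_ratio * tokens.shape[0]))
    keep = scores.topk(k).indices
    return tokens[keep]                            # feed only these downstream
```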
Stabilizing RL with LLMs: theory and recipes for large‑scale GRPO training
A new Qwen‑authored paper formalizes when you can safely optimize sequence‑level rewards (like task success) using token‑level policy gradients in LLMs, and links that theory to practical tricks like importance sampling, clipping, and routing replay for MoE models. RL stabilization summary Through hundreds of thousands of GPU hours on a 30B MoE, they show that on‑policy GRPO with proper importance correction is the most stable baseline, while off‑policy speedups only work if you aggressively limit stale samples and preserve expert routing. ArXiv paper

For anyone planning RLHF/RLAIF or reasoning‑RL in 2026, this reads like the missing "how not to diverge" manual: it argues you should (a) minimize train–inference mismatch, (b) treat router state as part of the policy, not an afterthought, and (c) expect longer training, not fancier objectives, to unlock final gains once you’ve stabilized the loop.
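The stable baseline they describe boils down to a clipped, importance‑corrected surrogate of the familiar PPO/GRPO shape; a minimal token‑level version (with group‑normalized advantages left to the caller) is sketched below, as our illustration rather than the paper’s code.

```python
# Clipped, importance-corrected token-level objective. `logp_new`/`logp_old`
# are per-token log-probs under the current and sampling policies; `adv` is a
# per-sequence advantage broadcast over tokens (group-normalized in GRPO).
import torch

def clipped_pg_loss(logp_new: torch.Tensor,   # (B, T)
                    logp_old: torch.Tensor,   # (B, T), detached
                    adv: torch.Tensor,        # (B, 1)
                    clip_eps: float = 0.2) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)    # token-level importance weights
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    # Take the pessimistic branch so stale/off-policy tokens can't dominate.
    return -torch.min(unclipped, clipped).mean()
```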
ThetaEvolve: test‑time RL lets an 8B model beat prior math systems on open problems
ThetaEvolve is a new open framework that simplifies AlphaEvolve into a single 8B DeepSeek‑R1‑Qwen3 model trained with test‑time reinforcement learning, and it sets new records on hard optimization problems like circle packing and the first autocorrelation inequality. ThetaEvolve summary Instead of freezing weights and only evolving programs in a database, ThetaEvolve updates the model itself while iteratively sampling "parent" programs, mutating them, and verifying each child with an external checker. ArXiv paper

For practitioners, this is a concrete proof that small, open models can self‑improve on narrow domains if you can (1) encode solutions as executable programs, (2) build a robust verifier to catch crashes/cheats, and (3) turn solution quality into a dense reward signal. It’s also a strong template for agentic code‑search loops in domains like structured optimization, compilers, or trading where the objective is measurable but gradients are not.
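In code, the outer loop is essentially sample‑mutate‑verify with an extra weight update; the skeleton below assumes you supply the mutate, verify, and update_model hooks and is a sketch of the pattern, not the released framework.

```python
# Skeleton of a test-time-RL evolve loop: keep an archive of scored programs,
# mutate a sampled parent with the model, score children with an external
# checker, and update the model on the reward. All hooks are placeholders.
import random

def evolve(mutate, verify, update_model, seed_program: str, steps: int = 1000):
    archive = [(verify(seed_program), seed_program)]   # (score, program)
    for _ in range(steps):
        _, parent = random.choice(archive)             # sample a parent program
        child = mutate(parent)                         # LLM proposes an edit
        score = verify(child)                          # runs code, rejects crashes/cheats
        if score is not None:
            archive.append((score, child))
            archive.sort(reverse=True)
            archive[:] = archive[:64]                  # keep the best candidates
            update_model(parent, child, score)         # test-time RL step
    return archive[0]
```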
FINDER/DEFT reveal deep research agents struggle with evidence integration, not task parsing
OPPO’s AI Agent Team released FINDER, a benchmark of 100 human‑curated research tasks with 419 checklist items, and DEFT, a 14‑way failure taxonomy, after analyzing ~1,000 reports from popular deep‑research agents. FINDER summary Their key finding: agents usually understand the query, but break down when integrating conflicting sources, verifying claims, and planning reasoning‑resilient report structures. ArXiv paper

For anyone building long‑form "analyst" agents, this is a reality check. Better search or bigger models won’t be enough; you likely need explicit architectures for evidence tracking (what supports what), cross‑source consistency checks, and planning modules that can adapt when retrieval pulls in contradictory or low‑quality material.
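Even something as small as the data structure below, which is our illustration rather than anything from the paper, forces the "what supports what" question before a report gets drafted.

```python
# Simple evidence ledger for a report-writing agent: every claim keeps pointers
# to the sources that support or contradict it, so a later pass can flag
# unsupported or conflicted claims before drafting.
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    supports: list[str] = field(default_factory=list)     # source URLs/IDs for the claim
    contradicts: list[str] = field(default_factory=list)  # sources against it

    def status(self) -> str:
        if self.supports and self.contradicts:
            return "conflicted"    # needs reconciliation, not silent inclusion
        if self.supports:
            return "supported"
        return "unsupported"       # drop it, or go retrieve more evidence

claims = [Claim("Vendor X raised prices in Q3", supports=["https://example.com/a"])]
unresolved = [c for c in claims if c.status() != "supported"]
```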
Four Over Six makes 4‑bit NVFP4 training and inference much more stable
"Four Over Six" is a quantization tweak for NVIDIA’s NVFP4 format that lets each 16‑value block choose whether to map its maximum to level 4 or level 6, greatly reducing rounding error near saturation. Four Over Six summary Implemented efficiently on Blackwell, it adds under 2% inference overhead and under 15% training overhead while preventing divergence in several 4‑bit LLM training runs and improving perplexity in post‑training quantization.ArXiv paper

For infra teams eyeing NVFP4 to cut costs, this paper is a strong signal that you don’t have to accept the usual "4‑bit is flaky" story. It also shows that small, format‑aware tricks at the kernel level can buy you another generation of efficiency without re‑architecting models.
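The core trick is easy to picture with a toy NumPy version: for each block, try both scalings, snap to the nearest E2M1 level, and keep whichever reconstructs the block better. Real NVFP4 kernels also quantize the per‑block scales themselves, which is omitted here.

```python
# Toy "4 over 6" illustration on one 16-value block. Not the paper's kernel.
import numpy as np

FP4_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def quantize_block(x: np.ndarray, target_level: float) -> np.ndarray:
    amax = np.abs(x).max()
    if amax == 0:
        return x.copy()
    scale = amax / target_level                      # block max lands on 4 or 6
    mags = np.abs(x) / scale
    snapped = FP4_LEVELS[np.argmin(np.abs(mags[:, None] - FP4_LEVELS), axis=1)]
    return np.sign(x) * snapped * scale

def four_over_six(block: np.ndarray) -> np.ndarray:
    q6 = quantize_block(block, 6.0)                  # the usual NVFP4 choice
    q4 = quantize_block(block, 4.0)                  # finer grid, no level-6 use
    return q4 if np.square(block - q4).sum() < np.square(block - q6).sum() else q6

block = np.random.randn(16).astype(np.float32)
print(np.abs(block - four_over_six(block)).mean())   # per-block quantization error
```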
LongVT incentivizes long‑horizon video reasoning with native tool calls
LongVT is a new framework for long‑video understanding that pushes multimodal LLMs to "actively look back" over minutes‑long clips by giving them native tool calls over time‑indexed video segments. LongVT summary Instead of a single global embedding, the model learns to issue structured queries (e.g., crop, replay, track) and use those tool responses as context for downstream question answering and reasoning. ArXiv paper
The key idea for builders is that long‑video QA becomes a tools‑and‑planning problem, not just a bigger context window. If you’re designing agents that must handle CCTV, meeting recordings, or gameplay VODs, LongVT is a strong argument to invest in a small, carefully chosen toolset plus a scheduler, rather than trying to cram full videos into one monolithic pass.
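A hypothetical tool spec, shown to convey the flavor rather than LongVT’s actual interface, can be as small as this:

```python
# Hypothetical time-indexed video tools a long-video agent could call instead
# of ingesting the whole clip. Names and fields are illustrative.
VIDEO_TOOLS = [
    {
        "name": "replay_segment",
        "description": "Re-watch the clip between two timestamps at higher frame rate.",
        "parameters": {"start_s": "float", "end_s": "float", "fps": "int"},
    },
    {
        "name": "crop_region",
        "description": "Zoom into a spatial region over a time range.",
        "parameters": {"start_s": "float", "end_s": "float", "bbox": "[x0, y0, x1, y1]"},
    },
    {
        "name": "track_object",
        "description": "Track a named object from a starting timestamp onward.",
        "parameters": {"label": "str", "start_s": "float"},
    },
]
```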
ReasonIF shows LRMs routinely break instructions inside their reasoning traces
Together AI’s "ReasonIF" paper measures how well large reasoning models obey instructions during their chain‑of‑thought, not just in final answers, and finds that they frequently drift—skipping constraints, changing formats, or answering the wrong subquestion mid‑reasoning. ReasonIF summary ReasonIF introduces a benchmark and taxonomy for these failures, letting you evaluate whether a model respects things like "don’t call tools" or "only answer in JSON" while it thinks. ArXiv paper
The takeaway for agent builders is uncomfortable but useful: adding more "thinking" doesn’t automatically mean better adherence to your spec. If you rely on long traces for safety or compliance, you probably need separate instruction‑consistency checks or trace‑time validators, not just output‑level filters.
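A trace‑time validator can start out very simple; the sketch below checks a "no tool calls" constraint in the reasoning text and a "JSON only" constraint on the answer, with the rules and markers being our assumptions rather than anything from the paper.

```python
# Minimal trace-time validator: check the reasoning text, not just the final
# answer, against the constraints you gave the model.
import json
import re

def validate_trace(trace: str, final_answer: str,
                   forbid_tools: bool = True, json_only: bool = True) -> list[str]:
    violations = []
    if forbid_tools and re.search(r"<tool_call>|\bcalling tool\b", trace, re.I):
        violations.append("reasoning trace attempted a tool call")
    if json_only:
        try:
            json.loads(final_answer)
        except ValueError:
            violations.append("final answer is not valid JSON")
    return violations
```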
ViBT scales “bridge” transformers for faster, structure‑preserving image and video edits
The Vision Bridge Transformer (ViBT) paper shows that you can replace multi‑step diffusion with a learned "velocity" path from source to target, enabling fast image and video edits that preserve structure while still supporting diverse outputs. ViBT summary ViBT learns in latent space to move noisy inputs toward the desired edit, with a rescaled velocity target and decaying noise schedule to keep trajectories close to the conditioning signal. ArXiv paper

In benchmarks, ViBT handles instruction‑based edits, style transfer, depth‑guided video, and interpolation at quality comparable to strong diffusion baselines but with simpler conditioning and cheaper sampling. paper recap For engineers, it’s a compelling direction if you’re hitting latency ceilings with diffusion‑based editors yet still need controllable, structure‑aware transforms.
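For orientation, the family of update ViBT belongs to looks roughly like the generic flow‑style sampler below; this is not ViBT’s exact parameterization or noise schedule, just the shape of the idea.

```python
# Generic flow-style editing sampler: start from the (lightly noised) source
# latent and take a few Euler steps along a predicted velocity toward the edit
# target, then decode. The model, noise level, and step count are placeholders.
import torch

@torch.no_grad()
def bridge_edit(model, z_src: torch.Tensor, cond: dict, steps: int = 8) -> torch.Tensor:
    z = z_src + 0.1 * torch.randn_like(z_src)      # small initial noise
    for i in range(steps):
        t = torch.full((z.shape[0],), i / steps, device=z.device)
        v = model(z, t, cond)                      # predicted velocity toward target
        z = z + v / steps                          # Euler step along the bridge
    return z                                       # decode with the VAE afterwards
```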
Video4Spatial targets visuospatial intelligence with context‑guided video generation
The Video4Spatial paper introduces a "video‑to‑spatial" pipeline where a generative model augments raw video with synthetic context to help downstream models answer visuospatial questions, like object layout and 3D relationships. Video4Spatial summary Instead of only decoding what’s in the pixels, the system uses learned priors about geometry and motion to fill in structure, then feeds that enriched representation into a reasoning head. ArXiv paper
This sits at the intersection of perception and planning: for robotics, AR, and navigation agents, it suggests a pattern where you run a generative "world builder" alongside your planner, rather than expecting a single monolithic VLM to both see and reason perfectly from first principles.