Together introduced Mamba-3 and open-sourced kernels for a new MIMO state-space variant that targets decode efficiency and beats Mamba-2, GDN, and Llama 3.2 1B at 1.5B scale. Test it when deployment speed matters more than chasing another generic Transformer baseline.

Mamba-3 is the next Mamba release, but this one is explicitly tuned for deployment-time inference rather than training speed. Together's launch thread frames the problem as decode becoming memory-bound in agentic workloads and inference-heavy RL rollouts, while the linked blog post says Mamba-2 had focused more on training efficiency.
The main architectural change is MIMO, short for multi-input, multi-output. According to Together's paper and repo post, the model swaps the recurrence from a vector outer-product to a matrix multiply, aiming for a "stronger model at the same decode speed." The same post says kernels are open-sourced, with implementations using Triton, TileLang, and CuTe DSL in the public Mamba repository.
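To make the SISO-to-MIMO distinction concrete, here is a minimal numpy sketch of the two state-update shapes. All names, dimensions, and the rank parameter `r` are illustrative assumptions for exposition, not Together's actual kernel code.

```python
import numpy as np

# Assumed shapes for illustration: state (d_state, d_head), MIMO rank r.
d_state, d_head, r = 16, 8, 4
rng = np.random.default_rng(0)

# SISO-style step (Mamba-2 flavor): a rank-1 outer-product update.
def siso_step(H, a, B, x):
    # H: (d_state, d_head), B: (d_state,), x: (d_head,)
    return a * H + np.outer(B, x)

# MIMO-style step (Mamba-3 flavor): a rank-r update via a matrix multiply,
# writing r input channels into the state in one step.
def mimo_step(H, a, B, X):
    # H: (d_state, d_head), B: (d_state, r), X: (r, d_head)
    return a * H + B @ X

H = np.zeros((d_state, d_head))
a = 0.9
H1 = siso_step(H, a, rng.standard_normal(d_state), rng.standard_normal(d_head))
H2 = mimo_step(H, a, rng.standard_normal((d_state, r)),
               rng.standard_normal((r, d_head)))
# The outer product is exactly the r == 1 case of the matrix multiply,
# so the MIMO update strictly generalizes SISO at the same state size.
```

The state that must be held per decode step has the same shape either way, which is the intuition behind the "stronger model at the same decode speed" framing.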
The release claims are strongest at the 1.5B scale. Together's launch thread says Mamba-3 has the fastest prefill plus decode there and outperforms Mamba-2, GDN, and Llama-3.2-1B. The linked paper summary adds a concrete delta: versus Gated DeltaNet at 1.5B, Mamba-3 gains 0.6 points in downstream accuracy, and the MIMO variant adds another 1.2 points.
The shared table in the results screenshot breaks that out: Mamba-3-SISO-1.5B posts 56.4 average accuracy, while Mamba-3-MIMO-1.5B reaches 57.6, alongside stronger scores on Lambada accuracy, HellaSwag, PIQA, and ARC-C. That supports the release's core pitch: not just a faster linear model, but a higher-quality one that keeps decode speed intact.
A practitioner reaction from Cedric Chee's post sums up the engineering angle: the story looks less like replacing Transformers everywhere and more like trying to "win the deployment bottleneck."
Flash-MoE now shows SSD-streamed expert weights pushing a 397B Qwen3.5 variant onto an iPhone at 0.6 tokens per second, extending its earlier laptop demos. Treat it as a memory-tiering prototype rather than a deployable mobile serving target, because speed, heat, and context headroom remain tight.
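The memory-tiering idea behind SSD-streamed experts can be sketched as a small least-recently-used cache in front of a storage loader. This is a toy illustration of the general technique, assuming an LRU policy; the cache policy, the `load_fn` interface, and the capacity are assumptions, not Flash-MoE's actual implementation.

```python
from collections import OrderedDict
import numpy as np

class ExpertCache:
    """Keep a few expert weight tensors resident in RAM; stream the rest."""

    def __init__(self, load_fn, capacity=4):
        self.load_fn = load_fn    # hypothetical loader, e.g. np.load from SSD
        self.capacity = capacity  # experts kept resident at once
        self.cache = OrderedDict()

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as recently used
            return self.cache[expert_id]
        weights = self.load_fn(expert_id)      # miss: stream from storage
        self.cache[expert_id] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return weights

# Only the router's top-k experts per token need to be resident, so a small
# cache can serve a model far larger than device RAM (at SSD-bound speed).
loads = []
def fake_load(eid):
    loads.append(eid)
    return np.zeros(8) + eid

cache = ExpertCache(fake_load, capacity=2)
for eid in [0, 1, 0, 2, 1]:
    cache.get(eid)
print(loads)  # [0, 1, 2, 1] — the second access to expert 0 hit the cache
```

The 0.6 tokens per second figure is consistent with this picture: each decode step pays for whichever selected experts miss the cache and must come off the SSD.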
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
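The core trick behind n-gram candidate filtering can be shown in a few lines: index every trigram of each file, intersect posting lists to prune files, and run the real match only on survivors. This sketch is an assumption-laden simplification (literal queries only, no Bloom filters, invented class and method names), not Cursor's implementation, which also extracts required literals from full regex syntax.

```python
import re
from collections import defaultdict

def trigrams(text):
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of file ids
        self.files = {}                   # file id -> contents

    def add(self, file_id, text):
        self.files[file_id] = text
        for g in trigrams(text):
            self.postings[g].add(file_id)

    def search(self, literal):
        # A matching file must contain every trigram of the query, so
        # intersecting posting lists prunes most files before any scan.
        grams = trigrams(literal)
        if not grams:
            candidates = set(self.files)
        else:
            candidates = set.intersection(*(self.postings[g] for g in grams))
        # Verify with a real match only on the surviving candidates.
        pattern = re.compile(re.escape(literal))
        return sorted(f for f in candidates if pattern.search(self.files[f]))

idx = TrigramIndex()
idx.add("a.py", "def instant_grep(query): ...")
idx.add("b.py", "print('hello world')")
print(idx.search("instant_grep"))  # prints ['a.py']
```

The speedup comes from replacing a full scan of every file with set intersections over small posting lists; Bloom filters serve the same pruning role with a fixed memory budget.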
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
Introducing Mamba-3 🐍 Inference speeds are more important than ever, driven by the rise in agents and inference-heavy RL
Mamba-3 just got released

The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities