Cohere released a 2B speech-to-text model with 14 languages and top Open ASR scores, and upstreamed encoder-decoder optimizations to vLLM in the same launch. It is a self-hosted ASR option, so test accuracy and throughput on your own speech workload.

Transcribe 03-2026 is a self-hostable ASR release aimed squarely at production transcription stacks. In the primary announcement, Cohere describes it as a conformer-based, 2B-parameter model covering 14 languages, while a Hugging Face maintainer highlighted that it is both "quite runnable" and available under Apache 2.0 with Transformers support on day one.
On quality, the public claim is straightforward: the launch thread says the model topped the Open ASR leaderboard. The model page is the canonical artifact for trying the weights, and TechCrunch's linked report adds two concrete numbers absent from the tweets: an average WER of 5.42 on the leaderboard and processing speed of 525 minutes of audio per minute. That same report says Cohere's internal human evals showed a 61% win rate on accuracy, coherence, and usability, while also flagging relatively weaker performance in Portuguese, German, and Spanish.
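WER, the metric behind that 5.42 figure, is word-level edit distance divided by reference word count. A minimal sketch for sanity-checking transcripts on your own data (this is not the leaderboard's scoring code, which also applies text normalization before scoring):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (free on match)
            ))
        prev = cur
    return prev[-1] / len(ref)

print(wer("the model topped the leaderboard",
          "the model top the leaderboard"))  # one substitution over 5 words
```

For real evaluations, a maintained library such as `jiwer` handles normalization and aggregation across a corpus; the point here is only that the metric itself is simple enough to verify by hand.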
The more consequential engineering detail may be the serving work that landed alongside the model. According to the vLLM post, Cohere contributed encoder-decoder optimizations for variable-length encoder batching and packed attention in the decoder, and vLLM is claiming "up to 2x throughput improvement" for speech workloads. vLLM also says those changes benefit all encoder-decoder models on the runtime, not just Cohere Transcribe.
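The intuition behind the batching win: padding every encoder input to the longest utterance in the batch burns compute on padding frames, while packing concatenates inputs and tracks per-utterance offsets. A toy sketch of the accounting, with hypothetical frame counts (this is not vLLM's implementation):

```python
from itertools import accumulate

def padded_tokens(lengths: list[int]) -> int:
    """Tokens processed when every sequence is padded to the longest."""
    return len(lengths) * max(lengths)

def packed_tokens(lengths: list[int]) -> int:
    """Tokens processed when sequences are concatenated with no padding."""
    return sum(lengths)

# Hypothetical per-utterance encoder frame counts in one batch.
lengths = [120, 840, 300, 60]

# Boundary offsets are what a packed-attention kernel uses to stop
# utterances from attending to each other inside the concatenated batch.
offsets = [0, *accumulate(lengths)]  # [0, 120, 960, 1260, 1320]

waste = 1 - packed_tokens(lengths) / padded_tokens(lengths)
print(f"padded={padded_tokens(lengths)} packed={packed_tokens(lengths)} "
      f"waste={waste:.0%}")
```

With skewed utterance lengths like these, over half the padded-batch compute is spent on padding, which is where an "up to 2x" throughput claim becomes plausible; real gains depend on the length distribution of your audio.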
The rollout path is unusually short for a new speech model. The same announcement says support is available day-0 in vLLM, and the attached install snippet [img:1|vLLM install snippet] shows audio extras plus a vllm serve command targeting CohereLabs/cohere-transcribe-03-2026 with remote code enabled. That means teams already serving through vLLM can test both model quality and the new batching path without waiting for a separate backend integration.
Chroma released Context-1, a 20B search agent it says pushes the speed-cost-accuracy frontier for agentic search, with open weights on Hugging Face. Benchmark it against your current search stack before wiring it into production.
breaking: Anthropic said free, Pro, and Max users will hit 5-hour Claude session limits faster on weekdays from 5am to 11am PT, while weekly caps stay the same. Shift long Claude Code jobs off-peak and watch prompt-cache misses.
release: OpenAI rolled out Codex plugins across the app, CLI, and IDE extensions, with app auth, reusable skills, and optional MCP servers. Teams should test plugin-backed workflows and permission models before broad rollout.
release: Cline launched Kanban, a local multi-agent board that runs Claude, Codex, and Cline CLI tasks in isolated worktrees with dependency chains and diffs. Teams can use it as a visual control layer for parallel coding agents on repo chores that split cleanly.
release: Mistral released open-weight Voxtral TTS with low-latency streaming, voice cloning, and cross-lingual adaptation, and vLLM Omni shipped day-0 support. Voice-agent teams should compare quality, latency, and serving cost against closed APIs.
Introducing: Cohere Transcribe – a new state-of-the-art in open source speech recognition.
Cohere just topped Open ASR Leaderboard with a 2B model 👑
> conformer based model
> covers 14 languages
> comes with @huggingface transformers support day-0!
This is a very solid release! Apache 2.0 as well, 2B parameters (i.e. quite runnable), 14 languages, and supported using Transformers already. Great work @cohere 👏