LightOn says its 150M multi-vector retriever is pushing BrowseComp-Plus close to saturation, with results showing search-call behavior and retriever choice matter nearly as much as model size. Retrieval engineers should watch multi-hop setup and tool-calling limits before copying the benchmark.

Chaffin is positioning the result as a retrieval story, not a frontier-model story. His launch post claims Reason-ModernColBERT, a 150M "multi-vector model," now solves BrowseComp-Plus at nearly 90% and beats larger baselines across metrics, while a follow-up says the team has "almost saturated BrowseComp-Plus" despite using "an old model," with more ideas left to test.
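"Multi-vector" here means ColBERT-style late interaction: the query and document each keep one embedding per token, and relevance is the sum, over query tokens, of each token's best match in the document. A minimal sketch of that MaxSim scoring (generic illustration, not Reason-ModernColBERT's actual code):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late-interaction score.

    query_vecs: (num_query_tokens, dim), L2-normalized per token
    doc_vecs:   (num_doc_tokens, dim),   L2-normalized per token
    """
    sim = query_vecs @ doc_vecs.T        # pairwise cosine similarities
    return float(sim.max(axis=1).sum())  # best doc token per query token, summed
```

Because documents are encoded independently of queries, the per-token document vectors can be precomputed and indexed, which is what lets a 150M-parameter encoder compete with much larger rerankers.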
The oracle comparison in the [img:5|oracle retrieval] screenshot explains why that claim matters. In that setup, GPT-4.1 reaches 93.49% when given all labeled positive documents, versus 14.58% with BM25, which suggests the benchmark ceiling is largely reachable if retrieval is strong enough and that the remaining gap is not mostly a corpus-quality problem.
The thread's most useful engineering detail is that retriever quality and tool-use behavior move scores almost as much as the LLM does. The results table shows GPT-4.1 at 14.58% with BM25 and 35.42% with Qwen3-Embed-8B; o3 at 49.28% versus 63.49%; and GPT-5 at 55.90% versus 70.12%. In the same table, Qwen3-32B stays near 10% regardless of retriever and averages under one search call, which Chaffin's tool-calling caveat attributes to weak tool calling rather than pure model size.
Clavié's task example gives the clearest reason single-turn shortcuts are unlikely to transfer: BrowseComp-Plus queries are designed to require chained evidence, and he says "10-15% would be a hard limit" for single-hop approaches. Chaffin's tool-calling note makes the same point from the results side, arguing that models that "struggle to call the search tool" and stay at one or fewer calls post "very bad results."
Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
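Cursor has not published the implementation, but the n-gram trick can be sketched: index which files contain each trigram, intersect posting lists for a literal fragment of the query, and run the full regex only on surviving candidates. A toy version (class and file names are illustrative; real systems also compress posting lists, e.g. with Bloom filters, and handle regexes without long literals):

```python
import re
from collections import defaultdict

def trigrams(text: str) -> set:
    """All 3-character substrings of the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    """Toy candidate filter: trigram -> set of files containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.files = {}

    def add(self, name: str, content: str):
        self.files[name] = content
        for tri in trigrams(content):
            self.postings[tri].add(name)

    def search(self, pattern: str, literal: str):
        # Any match of the regex must contain `literal`, so a file lacking
        # one of the literal's trigrams can be skipped without running re.
        candidates = set(self.files)
        for tri in trigrams(literal):
            candidates &= self.postings.get(tri, set())
        rx = re.compile(pattern)
        return sorted(f for f in candidates if rx.search(self.files[f]))
```

The regex engine then touches only a handful of files instead of the whole repo, which is where the seconds-to-milliseconds gap comes from.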
ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
I really do not know, because I did not run those. However, given the number of search calls, I would think that only 1 would be limiting. Maybe @zijian42chen ran experiments in this direction? Also, maybe there is a bias due to the LLM, but you can see that the models …
We've almost saturated BrowseComp-Plus with a 150M model... but this was an old model and I had a lot of ideas to improve the results 🙁 So maybe it's time to kick off a new challenge and see what's the cheapest setup we can solve BrowseComp-Plus with?
It wouldn't be a very hard experiment to run, but I'm pretty sure that single-turn would not work by design on BC-Plus. The queries are specifically designed to need multi-hop, e.g. this is one query from it: "A Harvard award-winning author wrote an article less than 5 years …