
Google’s BATS lifts BrowseComp accuracy to 24.6% – cuts agent cost 31%
Executive Summary
Google’s new BATS framework is the first agent paper this week that feels directly aimed at your cloud bill. The lightweight Budget Tracker plugin alone matches ReAct‑level accuracy on web tasks with 10 tool calls instead of 100 and trims overall cost by about 31%, simply by exposing a live “query budget remaining” counter inside the agent’s reasoning loop.
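To make the idea concrete, here is a minimal sketch of a budget counter surfaced inside a ReAct-style loop. It is not the paper's implementation: `BudgetTracker`, `run_agent`, and the `decide` and `search` callables are hypothetical names invented for illustration, and the status-line wording is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Hypothetical helper: counts tool calls against a fixed cap."""
    max_tool_calls: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.max_tool_calls - self.used

    def record_call(self) -> None:
        if self.remaining <= 0:
            raise RuntimeError("tool-call budget exhausted")
        self.used += 1

    def status_line(self) -> str:
        # Text shown to the model before each reasoning step (assumed format).
        return (f"[budget] tool calls used: {self.used}/{self.max_tool_calls}, "
                f"remaining: {self.remaining}")


def run_agent(question, tracker, decide, search):
    """Toy ReAct-style loop; `decide` stands in for the model, `search` for the tool."""
    context = [question]
    while tracker.remaining > 0:
        # Expose the live budget so the model can plan around it.
        context.append(tracker.status_line())
        action = decide(context)
        if action["type"] == "answer":
            return action["text"]
        tracker.record_call()
        context.append(search(action["query"]))
    # Budget spent: show the final count and ask for an answer.
    context.append(tracker.status_line())
    final = decide(context + ["Budget exhausted: answer now."])
    return final.get("text", "")
```

The only change from a plain loop is the `status_line()` call before each step, which is why this kind of tracker can be bolted onto an existing agent without touching its tools.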
Full BATS orchestration pushes harder on quality. On BrowseComp, a Gemini‑2.5‑Pro agent with BATS scores 24.6% vs ReAct’s 12.6% under the same 100‑tool cap; BrowseComp‑ZH jumps to 46.0% vs 31.5%, and HLE‑Search to 27.0% vs 20.5%. Planning and self‑verification both become budget‑aware, and the paper introduces a unified cost metric that blends token spend and tool‑call prices so you can reason in dollars, not abstract “steps.”
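As a rough illustration of a cost metric that combines token spend with tool-call prices, the sketch below simply adds the two in dollars. The additive form, the function name `unified_cost`, and every price in the usage example are assumptions made for illustration, not figures or formulas taken from the paper.

```python
def unified_cost(input_tokens, output_tokens, tool_calls, *,
                 usd_per_million_input, usd_per_million_output,
                 usd_per_tool_call):
    """Return an estimated dollar cost for one agent run.

    tool_calls: dict mapping tool name -> number of calls made.
    usd_per_tool_call: dict mapping tool name -> price per call (hypothetical).
    """
    token_cost = (input_tokens / 1_000_000 * usd_per_million_input
                  + output_tokens / 1_000_000 * usd_per_million_output)
    tool_cost = sum(count * usd_per_tool_call[name]
                    for name, count in tool_calls.items())
    return token_cost + tool_cost


# Usage with placeholder prices (not real vendor pricing):
cost = unified_cost(
    1_000_000, 100_000, {"search": 10},
    usd_per_million_input=1.0,
    usd_per_million_output=10.0,
    usd_per_tool_call={"search": 0.01},
)
```

Expressing both terms in the same currency is what lets a 10-call run and a 100-call run be compared directly, instead of counting abstract steps.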
DAIR.AI is already teaching BATS in its agent courses, which is usually a sign a pattern is graduating from research toy to production norm. Paired with Artificial Analysis’ token‑usage charts showing GPT‑5.2‑xhigh burning nearly 2× Sonnet’s tokens for similar work, the direction of travel is clear: agents that ignore budgets are going to feel as dated as models that ignore context windows. Time to make “budget‑aware by design” part of your standard agent spec.
Top links today
- BATS budget-aware tool-use agents paper
- Imaginary RoPE extension for long context
- BEAVER deterministic LLM rule verifier paper
- ARTEMIS AI penetration testing agents paper
- Tom’s Hardware AI semiconductor giga cycle analysis
- Bond Capital trends in artificial intelligence report
- Statista AI investment hubs by region chart
Feature Spotlight
Feature: Budget‑aware agent scaling (BATS)
Google’s BATS makes agents budget‑aware, roughly doubling BrowseComp accuracy vs ReAct under equal budgets (24.6% vs 12.6%) and reaching ReAct‑level accuracy with 10× fewer tool calls. That is clear design guidance for reliable, cheaper web agents.
Coverage across accounts converges on Google’s BATS/Budget Tracker paper, which shows that making agents explicitly aware of their tool‑call budgets lifts accuracy and cuts cost. The results are mostly on web‑agent benchmarks, with concrete deltas against ReAct.