Arena now shows input/output pricing and maximum context window directly on its text leaderboards, along with public material on how votes become research-grade data. Use it to compare rank against cost and context limits when choosing models.

Arena has added price and context columns directly to its text leaderboard. According to the announcement, price is shown as input and output cost per 1M tokens, while context shows the maximum context window.
That matters because the leaderboard is now doing more than rank ordering models by Arena score. The leaderboard page shows those new fields alongside model score, vote count, and license, so teams can compare quality against hard deployment constraints like token budget and long-context support in one place. Arena frames it as a way to compare models "based on what matters for your use case" in the launch post.
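To make the comparison concrete, here is a minimal Python sketch of the kind of filter those columns enable: keep models that satisfy a context requirement and an output-price budget, then rank the survivors by Arena score. The rows, field names, and numbers are invented for illustration; this is not Arena's export format.

```python
# Illustrative only: hypothetical leaderboard rows, not Arena data.
rows = [
    {"model": "model-a", "score": 1310, "in_usd_per_1m": 3.00,
     "out_usd_per_1m": 15.00, "context": 200_000},
    {"model": "model-b", "score": 1285, "in_usd_per_1m": 0.25,
     "out_usd_per_1m": 1.25, "context": 1_000_000},
    {"model": "model-c", "score": 1275, "in_usd_per_1m": 0.10,
     "out_usd_per_1m": 0.40, "context": 128_000},
]

MIN_CONTEXT = 200_000          # need long-context support
MAX_OUT_USD_PER_1M = 5.00      # hard output-token budget

# Filter on the deployment constraints, then rank by score.
eligible = [r for r in rows
            if r["context"] >= MIN_CONTEXT
            and r["out_usd_per_1m"] <= MAX_OUT_USD_PER_1M]

for r in sorted(eligible, key=lambda r: r["score"], reverse=True):
    print(f'{r["model"]}: score {r["score"]}, '
          f'${r["in_usd_per_1m"]}/{r["out_usd_per_1m"]} per 1M tokens, '
          f'{r["context"]:,}-token context')
```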
Arena paired the UI change with public material on its evaluation pipeline. The linked explainer says user prompts are tagged by category, low-quality or suspicious activity is filtered out, and duplicate or manipulative votes are removed before they affect rankings.
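Arena has not released the pipeline's code, so the sketch below only mirrors the three stages the explainer names: category tagging, suspicious-activity filtering, and duplicate-vote removal. Every function, heuristic, and threshold here is a hypothetical stand-in.

```python
# Toy pipeline mirroring the three stages Arena describes; all names,
# heuristics, and thresholds are hypothetical, not Arena's actual code.
from collections import Counter

def tag_category(prompt: str) -> str:
    # Stand-in classifier; Arena's tagger is not public.
    if "def " in prompt or "```" in prompt:
        return "coding"
    if "story" in prompt.lower():
        return "creative_writing"
    return "general"

def looks_suspicious(vote: dict) -> bool:
    # Toy heuristic: implausibly fast judgments.
    return vote["latency_s"] < 1.0

def dedupe(votes: list[dict]) -> list[dict]:
    # Keep only the first vote per (user, prompt) pair.
    seen, kept = set(), []
    for v in votes:
        key = (v["user"], v["prompt"])
        if key not in seen:
            seen.add(key)
            kept.append(v)
    return kept

votes = [
    {"user": "u1", "prompt": "Write a story about rain", "winner": "A", "latency_s": 42.0},
    {"user": "u1", "prompt": "Write a story about rain", "winner": "A", "latency_s": 40.0},
    {"user": "u2", "prompt": "def quicksort(xs): ...", "winner": "B", "latency_s": 0.3},
]

# Dedupe and filter before any rating update sees the votes.
clean = [v for v in dedupe(votes) if not looks_suspicious(v)]
for v in clean:
    v["category"] = tag_category(v["prompt"])
print(Counter(v["category"] for v in clean))
```

The ordering is the point: deduplicating and filtering before aggregation means a single user or a burst of low-quality activity never reaches the rating update at all.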
The follow-up discussion thread adds a little more operational detail. Arena says it tracks many categories beyond the overall score, including "Creative Writing," "Instruction Following," occupational domains, and coding. The same thread says a provider once "switched an endpoint against the policy," and that validation against abuse is "a lot better now since that incident." That does not settle broader benchmark skepticism, but it clarifies that Arena is trying to position the leaderboard as a fresh, multi-category, user-driven benchmark rather than a static test set.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck; a sketch of the underlying trigram trick follows these items.
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
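Since the Instant Grep entry above is the most algorithmic item in this batch, here is an illustrative Python sketch of the trigram technique it describes: build an inverted index from file n-grams, intersect posting lists for a literal the regex must contain, and run the regex only on the surviving candidates. Cursor's implementation also layers Bloom filters for cheap membership tests; that layer, along with the file names, functions, and literal-extraction step below, is a simplified stand-in, not Cursor's code.

```python
# Illustrative trigram candidate filtering, not Cursor's implementation.
import re

def trigrams(text: str) -> set[str]:
    # All overlapping 3-character substrings of the text.
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: trigram -> set of files containing it.
    index: dict[str, set[str]] = {}
    for path, body in files.items():
        for g in trigrams(body):
            index.setdefault(g, set()).add(path)
    return index

def candidates(index: dict[str, set[str]],
               files: dict[str, str],
               literal: str) -> set[str]:
    # A file can only match if it contains every trigram of a
    # literal the regex requires; intersect the posting lists.
    grams = trigrams(literal)
    if not grams:
        return set(files)  # literal too short to prune; scan everything
    return set.intersection(*[index.get(g, set()) for g in grams])

files = {
    "a.py": "def instant_grep(pattern): ...",
    "b.py": "print('hello world')",
    "c.py": "grep_index = build()",
}
index = build_index(files)

pattern = re.compile(r"instant_grep\(")
# "instant_grep" is a literal the regex must contain; only files whose
# trigram sets cover it are actually handed to the regex engine.
for path in candidates(index, files, "instant_grep"):
    if pattern.search(files[path]):
        print("match:", path)
```

The payoff is that the expensive regex engine touches only files whose trigram sets could possibly match, which is what turns a full-repo scan into a handful of targeted searches.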
From Arena's launch post: "Arena leaderboards now include Price and Context. Price is shown as input/output cost per 1M tokens, and Context shows the maximum context window. Compare Arena scores based on what matters for your use case."

From the follow-up thread: "This doesn't cover every single thing we do, but to give you an idea youtube.com/watch?v=omT1oh… - very nice video by @cthorrez in our ML team"