Arena now shows input/output pricing and maximum context window directly on its text leaderboards, along with public material on how votes become research-grade data. Use it to compare rank against cost and context limits when choosing models.

Arena has added price and context columns directly to its text leaderboard. According to the announcement, price is shown as input and output cost per 1M tokens, while context shows the maximum context window.
That matters because the leaderboard is now doing more than rank ordering models by Arena score. The leaderboard page shows those new fields alongside model score, vote count, and license, so teams can compare quality against hard deployment constraints like token budget and long-context support in one place. Arena frames it as a way to compare models "based on what matters for your use case" in the launch post.
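To make the comparison concrete, here is a minimal Python sketch of the kind of filter those columns enable: keep models that satisfy a context requirement and an output-price budget, then rank the survivors by Arena score. The rows, field names, and numbers are invented for illustration; this is not Arena's export format.

```python
# Illustrative only: hypothetical leaderboard rows, not Arena data.
rows = [
    {"model": "model-a", "score": 1310, "in_usd_per_1m": 3.00,
     "out_usd_per_1m": 15.00, "context": 200_000},
    {"model": "model-b", "score": 1285, "in_usd_per_1m": 0.25,
     "out_usd_per_1m": 1.25, "context": 1_000_000},
    {"model": "model-c", "score": 1275, "in_usd_per_1m": 0.10,
     "out_usd_per_1m": 0.40, "context": 128_000},
]

MIN_CONTEXT = 200_000          # need long-context support
MAX_OUT_USD_PER_1M = 5.00      # hard output-token budget

# Filter on the deployment constraints, then rank by score.
eligible = [r for r in rows
            if r["context"] >= MIN_CONTEXT
            and r["out_usd_per_1m"] <= MAX_OUT_USD_PER_1M]

for r in sorted(eligible, key=lambda r: r["score"], reverse=True):
    print(f'{r["model"]}: score {r["score"]}, '
          f'${r["in_usd_per_1m"]}/{r["out_usd_per_1m"]} per 1M tokens, '
          f'{r["context"]:,}-token context')
```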
Arena paired the UI change with public material on its evaluation pipeline. The linked explainer says user prompts are tagged by category, low-quality or suspicious activity is filtered out, and duplicate or manipulative votes are removed before they affect rankings.
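Arena has not released the pipeline's code, so the sketch below only mirrors the three stages the explainer names: category tagging, suspicious-activity filtering, and duplicate-vote removal. Every function, heuristic, and threshold here is a hypothetical stand-in.

```python
# Toy pipeline mirroring the three stages Arena describes; all names,
# heuristics, and thresholds are hypothetical, not Arena's actual code.
from collections import Counter

def tag_category(prompt: str) -> str:
    # Stand-in classifier; Arena's tagger is not public.
    if "def " in prompt or "```" in prompt:
        return "coding"
    if "story" in prompt.lower():
        return "creative_writing"
    return "general"

def looks_suspicious(vote: dict) -> bool:
    # Toy heuristic: implausibly fast judgments.
    return vote["latency_s"] < 1.0

def dedupe(votes: list[dict]) -> list[dict]:
    # Keep only the first vote per (user, prompt) pair.
    seen, kept = set(), []
    for v in votes:
        key = (v["user"], v["prompt"])
        if key not in seen:
            seen.add(key)
            kept.append(v)
    return kept

votes = [
    {"user": "u1", "prompt": "Write a story about rain", "winner": "A", "latency_s": 42.0},
    {"user": "u1", "prompt": "Write a story about rain", "winner": "A", "latency_s": 40.0},
    {"user": "u2", "prompt": "def quicksort(xs): ...", "winner": "B", "latency_s": 0.3},
]

# Dedupe and filter before any rating update sees the votes.
clean = [v for v in dedupe(votes) if not looks_suspicious(v)]
for v in clean:
    v["category"] = tag_category(v["prompt"])
print(Counter(v["category"] for v in clean))
```

The ordering is the point: deduplicating and filtering before aggregation means a single user or a burst of low-quality activity never reaches the rating update at all.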
The follow-up discussion thread adds a little more operational detail. Arena says it tracks many categories beyond the overall score, including "Creative Writing," "Instruction Following," occupational domains, and coding. The same thread says a provider once "switched an endpoint against the policy," and that validation against abuse is "a lot better now since that incident." That does not settle broader benchmark skepticism, but it clarifies that Arena is trying to position the leaderboard as a fresh, multi-category, user-driven benchmark rather than a static test set.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck; a sketch of the underlying trigram trick follows these items.
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
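Since the Instant Grep entry above is the most algorithmic item in this batch, here is an illustrative Python sketch of the trigram technique it describes: build an inverted index from file n-grams, intersect posting lists for a literal the regex must contain, and run the regex only on the surviving candidates. Cursor's implementation also layers Bloom filters for cheap membership tests; that layer, along with the file names, functions, and literal-extraction step below, is a simplified stand-in, not Cursor's code.

```python
# Illustrative trigram candidate filtering, not Cursor's implementation.
import re

def trigrams(text: str) -> set[str]:
    # All overlapping 3-character substrings of the text.
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: trigram -> set of files containing it.
    index: dict[str, set[str]] = {}
    for path, body in files.items():
        for g in trigrams(body):
            index.setdefault(g, set()).add(path)
    return index

def candidates(index: dict[str, set[str]],
               files: dict[str, str],
               literal: str) -> set[str]:
    # A file can only match if it contains every trigram of a
    # literal the regex requires; intersect the posting lists.
    grams = trigrams(literal)
    if not grams:
        return set(files)  # literal too short to prune; scan everything
    return set.intersection(*[index.get(g, set()) for g in grams])

files = {
    "a.py": "def instant_grep(pattern): ...",
    "b.py": "print('hello world')",
    "c.py": "grep_index = build()",
}
index = build_index(files)

pattern = re.compile(r"instant_grep\(")
# "instant_grep" is a literal the regex must contain; only files whose
# trigram sets cover it are actually handed to the regex engine.
for path in candidates(index, files, "instant_grep"):
    if pattern.search(files[path]):
        print("match:", path)
```

The payoff is that the expensive regex engine touches only files whose trigram sets could possibly match, which is what turns a full-repo scan into a handful of targeted searches.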
From Arena's launch post: "Arena leaderboards now include Price and Context. Price is shown as input/output cost per 1M tokens, and Context shows the maximum context window. Compare Arena scores based on what matters for your use case."

From the follow-up thread: "This doesn't cover every single thing we do, but to give you an idea youtube.com/watch?v=omT1oh… - very nice video by @cthorrez in our ML team"