Workflow | March 21, 2026

Autoresearch claims 2718 Elo after 70 experiments on a Rust chess engine

A developer says an autoresearch loop hill-climbed a vibecoded Rust engine to 2718 Elo after running more than 70 experiments under a 500 ms move budget. The real takeaway is the workflow: automated experiment loops can optimize code against a measurable target.

Coding Agents, Benchmarks, Deep Research

TL;DR

Developer Deedy Das says Karpathy-style Autoresearch pushed a "vibecoded Rust chess engine" from "expert" strength to a reported 2718 Elo after running "over 70 experiments" and hill-climbing on score, according to his results thread.
The underlying engine is conventional search, not a newly trained model: Deedy's technical breakdown says it uses negamax alpha-beta search with pruning, iterative deepening, opening books, and a transposition table, all tested at a 500 ms per-move limit.
The practical engineering story is the loop, not chess. The community reaction around Autoresearch frames it as a reusable pattern for "everything with a measurable metric," where the agent proposes code changes and keeps what improves the target.
That pattern may extend beyond fully verifiable tasks. In one practitioner's take, Shreya Shankar says she is "very optimistic" about combining these search loops with qualitative evaluators for more subjective coding-style work.
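
For readers unfamiliar with the search techniques named in the breakdown, here is a minimal Python sketch of negamax with alpha-beta pruning, a transposition table, and iterative deepening under a time budget. It is not Deedy's engine and not Rust: it swaps chess for a toy take-away game (remove 1 to 3 stones, taking the last stone wins) so the example stays self-contained, and it omits opening books. The names `negamax`, `search`, `moves`, and `budget_s` are assumptions made for illustration.

```python
import math
import time

EXACT, LOWER, UPPER = range(3)  # kinds of bound stored in the transposition table


def moves(pile):
    """Legal moves in the toy game: take 1, 2, or 3 stones."""
    return [take for take in (1, 2, 3) if take <= pile]


def negamax(pile, depth, alpha, beta, table, deadline):
    """Score `pile` for the side to move: +1 proven win, -1 proven loss, 0 unknown."""
    if time.monotonic() > deadline:
        raise TimeoutError  # abandon this iteration; the caller keeps the last result
    if pile == 0:
        return -1  # the opponent just took the last stone
    if depth == 0:
        return 0  # depth limit reached; a real engine would call its evaluation here
    alpha_orig = alpha
    entry = table.get(pile)
    if entry is not None and entry[0] >= depth:
        _, flag, value = entry
        if flag == EXACT:
            return value
        if flag == LOWER:
            alpha = max(alpha, value)
        else:
            beta = min(beta, value)
        if alpha >= beta:
            return value
    best = -math.inf
    for take in moves(pile):
        score = -negamax(pile - take, depth - 1, -beta, -alpha, table, deadline)
        best = max(best, score)
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # alpha-beta cutoff: the opponent will avoid this line
    flag = UPPER if best <= alpha_orig else LOWER if best >= beta else EXACT
    table[pile] = (depth, flag, best)
    return best


def search(pile, budget_s=0.5, max_depth=64):
    """Iterative deepening: search deeper until the result is proven or time runs out."""
    deadline = time.monotonic() + budget_s
    table = {}
    best_move = moves(pile)[0]
    for depth in range(1, max_depth + 1):
        try:
            alpha, move_at_depth = -math.inf, best_move
            for take in moves(pile):
                score = -negamax(pile - take, depth - 1, -math.inf, -alpha, table, deadline)
                if score > alpha:
                    alpha, move_at_depth = score, take
        except TimeoutError:
            break  # keep the move from the last completed depth
        best_move = move_at_depth
        if abs(alpha) == 1:
            break  # outcome proven, no need to search deeper
    return best_move
```

In this toy game the winning strategy is to leave the opponent a multiple of four, so `search(10)` picks 2. The 0.5-second default mirrors the 500 ms per-move limit mentioned above, though a real engine would manage its clock with more care.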
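
The propose-and-keep loop described in the TL;DR can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the Autoresearch harness itself: `evaluate` is a stand-in scoring function (in the chess setting it would be an Elo estimate from games at a fixed move budget), `propose` is a random nudge rather than an agent writing code, and the knob names `pruning_margin` and `book_depth` are hypothetical.

```python
import random


def evaluate(params):
    """Stand-in metric, higher is better. Hypothetical: peaks at margin 6, depth 3."""
    return -((params["pruning_margin"] - 6) ** 2) - (params["book_depth"] - 3) ** 2


def propose(params, rng):
    """Return a copy of `params` with one knob nudged by +1 or -1."""
    candidate = dict(params)
    key = rng.choice(sorted(candidate))
    candidate[key] += rng.choice((-1, 1))
    return candidate


def hill_climb(initial, experiments, rng):
    """Run a fixed number of experiments, keeping a change only if the score improves."""
    best, best_score = dict(initial), evaluate(initial)
    log = []
    for run in range(experiments):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        kept = score > best_score
        if kept:
            best, best_score = candidate, score
        log.append((run, score, kept))  # record every experiment, kept or not
    return best, best_score, log


START = {"pruning_margin": 0, "book_depth": 0}
best, best_score, log = hill_climb(START, experiments=70, rng=random.Random(0))
```

The loop only needs a metric that can be computed automatically and compared across runs. Replacing `evaluate` with a qualitative evaluator, as the last point suggests, keeps the same structure but makes the accept-or-reject decision noisier.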