Shopify open-sourced the /autoresearch plugin after an autoresearch loop produced a 53% faster parse-and-render path and 61% fewer allocations in Liquid. Try it if you want agent-driven optimization backed by tests and measurable performance targets.

Shopify says it has open-sourced the /autoresearch plugin for pi, with the launch post framing it simply as “tell it what you want, it will do the rest.” The public release landed alongside a concrete case study: the Liquid thread says the loop was run against Shopify’s 20-year-old Liquid template engine and produced a 53% faster parse-and-render path plus 61% fewer allocations.
The technical pattern was iterative search, not architecture surgery. As the breakdown describes it, the agent “proposes one small change,” benchmarks it, keeps it if it improves the metric, and reverts it if it does not. The accepted changes were small but cumulative: scanning for }} directly instead of invoking regex repeatedly, freezing and reusing string objects in comparisons, detecting single-condition if statements up front, splitting product.title once at parse time instead of every render, skipping per-iteration loop-limit checks when no limit exists, and parsing simple filter names without the full lexer.
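The loop described above is a simple hill climb gated by tests and a benchmark. Here is a minimal sketch of that accept/revert pattern; every name is illustrative (the real plugin drives an agent against Liquid's actual test and benchmark suites), and the toy "codebase" below just multiplies a cost by a factor per accepted change so the loop's behavior is easy to see.

```python
def autoresearch_loop(apply_change, revert_change, propose,
                      tests_pass, benchmark, rounds):
    """Propose one small change at a time; keep it only if the tests
    still pass AND the benchmark improves, otherwise revert it."""
    baseline = benchmark()
    accepted = []
    for _ in range(rounds):
        change = propose()
        apply_change(change)
        if not tests_pass():            # correctness gate (the unit tests)
            revert_change(change)
            continue
        score = benchmark()
        if score < baseline:            # keep only measurable wins
            baseline = score
            accepted.append(change)
        else:
            revert_change(change)       # no improvement: roll it back
    return accepted, baseline


# Toy stand-in for a codebase: each enabled change scales render cost.
# A real run would time a representative parse-and-render workload.
codebase = set()
CHANGES = {"skip_regex": 0.8, "freeze_strings": 0.9, "slower_cache": 1.2}

def cost():
    c = 100.0
    for name in codebase:
        c *= CHANGES[name]
    return c

proposals = iter(CHANGES)
accepted, final = autoresearch_loop(
    apply_change=codebase.add,
    revert_change=codebase.discard,
    propose=lambda: next(proposals),
    tests_pass=lambda: True,
    benchmark=cost,
    rounds=3,
)
# "slower_cache" regresses the benchmark, so the loop reverts it;
# the two genuine wins compound (100 -> 80 -> 72).
```

The key property is that no single change has to be clever: the loop only needs a cheap, trustworthy signal to accumulate many small wins.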
The bigger engineering takeaway is that the workflow was only practical because Liquid already had a strong validation harness. In his notes, Simon Willison highlights “974 unit tests” as the unlock for safely letting an agent try many small performance edits, and he argues that a benchmarking script makes “make it faster” an actionable objective instead of a vague prompt. The attached [img:6|notes screenshot] makes the same point: autoresearch works when the agent can repeatedly test, measure, and discard bad ideas.
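A "benchmarking script" in this sense can be very small: one script that emits a single number the agent tries to drive down. The sketch below is a generic harness, not Liquid's; `render_page` is a hypothetical stand-in for whatever parse-and-render workload you want optimized.

```python
"""Minimal benchmark script: turns "make it faster" into one number."""
import statistics
import timeit


def render_page():
    # Hypothetical workload; a real harness would parse and render
    # representative templates from the codebase under optimization.
    template = "Hello {{ name }}, you have {{ count }} items."
    return template.replace("{{ name }}", "world").replace("{{ count }}", "3")


def score(repeat=5, number=10_000):
    # Median of several timed runs is more stable than a single pass,
    # which matters when the agent compares before/after numbers.
    runs = timeit.repeat(render_page, repeat=repeat, number=number)
    return statistics.median(runs) / number


if __name__ == "__main__":
    print(f"parse+render: {score() * 1e6:.2f} microseconds/iter")
```

Paired with a test suite, this gives the agent exactly the two signals it needs: "still correct?" and "measurably faster?".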
That also explains why this is more interesting than a one-off CEO coding anecdote. The thread says the same plugin can target other measurable objectives such as test speed, bundle size, build times, and Lighthouse scores, which makes it a reusable optimization loop for mature codebases with tests and benchmarks already in place. Even the reaction thread around the PR stayed focused on that pattern: one summary called out a 20-year-old production engine improving by more than 50%, while Willison’s writeup treats the result as evidence that coding agents are now effective at systematic benchmark-driven cleanup in legacy systems.
Claude can now drive macOS apps, browser tabs, the keyboard, and the mouse from Claude Cowork and Claude Code, with permission prompts when it needs direct screen access. That makes legacy desktop workflows automatable, and Anthropic is pairing the push with more background-task support for longer agent loops.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
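The idea behind n-gram indexes for regex search is candidate filtering: only files containing every trigram of a literal fragment of the query can possibly match, so the full regex runs on a tiny subset of the repo. This is a sketch of that technique, not Cursor's implementation; production systems often substitute per-file Bloom filters for the exact posting sets used here.

```python
import re
from collections import defaultdict


def trigrams(text):
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # trigram -> set of file paths
        self.files = {}

    def add(self, path, text):
        self.files[path] = text
        for g in trigrams(text):
            self.postings[g].add(path)

    def search(self, literal, pattern=None):
        # Candidate set: files containing ALL trigrams of the literal.
        grams = trigrams(literal)
        if not grams:                      # query too short to filter
            candidates = set(self.files)
        else:
            candidates = set.intersection(
                *(self.postings[g] for g in grams))
        # Confirm with the real regex only on the candidates.
        rx = re.compile(pattern or re.escape(literal))
        return sorted(p for p in candidates if rx.search(self.files[p]))


idx = TrigramIndex()
idx.add("a.py", "def parse_template(src): ...")
idx.add("b.py", "def render(tree): ...")
idx.search("parse_temp")   # only a.py survives the trigram filter
```

The speedup comes from the intersection step: posting-list lookups replace a linear scan of every file, and the expensive regex engine only ever sees the survivors.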
And the most important part: we open sourced the /autoresearch plugin for pi. Just tell it what you want, it will do the rest. github.com/davebcn87/pi-a…
Published some notes on @tobi's autoresearch PR that improved the performance benchmark scores of the Liquid template language (which Tobi created for Shopify 20 years ago) by a hefty 53% simonwillison.net/2026/Mar/13/li…
The meta-insight here is devastating: Liquid has been battle-hardened by thousands of engineers over 20 years. An AI ran a loop for a couple days and found 6+ architectural inefficiencies none of them caught. Not because it's smarter. Because it never got used to "good…