Google DeepMind and Kaggle opened a global challenge to build cognitive benchmarks across learning, metacognition, attention, executive function, and social cognition. Join if you work on evals and want reusable tasks with human baselines instead of another saturated leaderboard.

DeepMind's launch post says the company is running a global Kaggle competition to "build new cognitive evaluations for AI," with $200,000 in prizes for submitted benchmarks. The official Kaggle competition is positioned as a benchmark-building contest, not a model leaderboard, which makes this more relevant to eval engineers than to model hobbyists.
The concrete scope comes from the organizer thread, which says submissions should measure cognitive capabilities across learning, metacognition, attention, executive functions, and social cognition. The same post argues current AI systems are starting to saturate many existing tests, so the bar now is building tasks that remain discriminative as models improve.
The strongest technical signal is that DeepMind is not just asking for harder questions; it is asking for benchmarks grounded in a broader cognitive framework. A practitioner summary of the release says the framework maps 10 cognitive abilities, includes human baselines for each task, and still lists five abilities with no reliable evals, which points to gaps in today's benchmarking stack rather than just gaps in model scores.
That matters because many labs still publish on different eval suites, making cross-model comparisons noisy. DeepMind's announcement explicitly pitches this as a community effort to "put our framework to the test," suggesting the output they want is portable task design that other researchers and model providers can reuse, not a one-off benchmark stunt.
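To ground what "reusable tasks with human baselines" can mean in practice, here is a minimal sketch of how an eval harness might score tasks against human baselines and flag saturation. Everything in it is hypothetical: the `TaskResult` fields, the ability names, and the 0.05 headroom margin are illustrative assumptions, not part of DeepMind's framework.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """One benchmark task plus scores (hypothetical schema, not DeepMind's)."""
    task_id: str
    ability: str            # e.g. "metacognition", "executive function"
    human_baseline: float   # mean human accuracy on this task, in [0, 1]
    model_score: float      # model accuracy on the same task, in [0, 1]

def headroom(r: TaskResult) -> float:
    """Room left between the model's score and a perfect score."""
    return 1.0 - r.model_score

def is_saturated(r: TaskResult, margin: float = 0.05) -> bool:
    """A task stops discriminating once the model sits within `margin`
    of the ceiling: future, stronger models can no longer be separated."""
    return headroom(r) < margin

def vs_human(r: TaskResult) -> float:
    """Score normalized to the human baseline (1.0 = human parity)."""
    return r.model_score / r.human_baseline

tasks = [
    TaskResult("switch-costs-01", "executive function", human_baseline=0.82, model_score=0.97),
    TaskResult("confidence-cal-03", "metacognition", human_baseline=0.74, model_score=0.51),
]
for t in tasks:
    status = "SATURATED" if is_saturated(t) else "still discriminative"
    print(f"{t.task_id}: vs-human={vs_human(t):.2f}, {status}")
```

The point of the exercise is that "saturated" becomes a per-task, checkable condition rather than a judgment call, which is what lets a shared suite stay useful as models improve.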
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local code-search index built from n-grams, inverted indexes, and Bloom filters that cuts large-repo regex searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
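The announcement names the ingredients but not the implementation, so here is a hedged sketch of the general technique, best known from Russ Cox's trigram indexing for Google Code Search: record which files contain each trigram, intersect posting lists to get a small candidate set, then run the real regex only over those candidates. This toy version handles only literal patterns and omits the Bloom-filter layer Cursor mentions; production systems derive required trigrams from the regex itself.

```python
import re
from collections import defaultdict

def trigrams(text: str):
    """Yield every 3-character substring of `text`."""
    for i in range(len(text) - 2):
        yield text[i:i + 3]

class TrigramIndex:
    """Toy inverted index: trigram -> set of file ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.files = {}

    def add(self, file_id: str, text: str) -> None:
        self.files[file_id] = text
        for tg in trigrams(text):
            self.postings[tg].add(file_id)

    def search_literal(self, pattern: str) -> list[str]:
        """Prune with the index, then confirm candidates with a regex scan.

        A file can only contain `pattern` if it contains every trigram of
        `pattern`, so intersecting posting lists yields a candidate superset.
        """
        tgs = list(trigrams(pattern))
        if tgs:
            candidates = set.intersection(*(self.postings.get(tg, set()) for tg in tgs))
        else:
            candidates = set(self.files)  # pattern too short to prune
        rx = re.compile(re.escape(pattern))
        return [fid for fid in sorted(candidates) if rx.search(self.files[fid])]

idx = TrigramIndex()
idx.add("a.py", "def instant_grep(query): ...")
idx.add("b.py", "print('hello world')")
print(idx.search_literal("instant_grep"))  # ['a.py']
```

The design point matches Instant Grep's claim: the expensive regex pass touches only the candidate set, so latency scales with the number of plausible matches rather than the size of the repo.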
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro produced a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
Help us measure the progress towards AGI (specifically cognitive capabilities) by building benchmarks on @kaggle
How do we measure progress toward AGI? It takes a village – and a bit of healthy competition. 🛠️ We’re launching a global hackathon with @Kaggle to build new cognitive evaluations for AI. With $200k in prizes up for grabs, help us put our framework to the test. Join the …