Together released Open Deep Research v2 and published the hosted app, codebase, blog, and evaluation dataset in a single launch. Use it as a full open reference stack for report-generation agents rather than another closed demo.

Together's announcement bundles four artifacts at once: the Open Deep Research v2 app, the source code, a technical blog post, and an evaluation dataset. In the launch thread, the company frames the project as a way to "generate detailed reports on any topic with open source LLMs," and the follow-up app post sends users directly to the hosted app.
The resources post is the key engineering detail because it breaks the release into runnable pieces: a live hosted app, a public GitHub repo, and a blog post explaining how it was built. That makes v2 inspectable at both the product layer and the implementation layer.
For engineers building agentic research or report-generation systems, the useful part is the packaging. Closed deep-research demos usually expose only the UX; here, Together is publishing the code and the eval dataset alongside the app, so teams can compare prompting, orchestration, and output quality against a concrete baseline rather than reverse-engineering behavior from screenshots.
The hosted app also lowers the cost of evaluation before adoption: teams can test the workflow in the live app and then inspect how it was assembled in the repo. These posts do not include benchmark numbers, but the release does provide a full open reference implementation for long-form, cited research agents.
Introducing v2 of our Open Deep Research app! Generate detailed reports on any topic with open source LLMs. Fully free & open source. We're releasing everything: evaluation dataset, code, app, and blog 🔥
Here are more resources on how we built open deep research & the code! - Blog post explaining it: together.ai/blog/open-deep… - GitHub repo with the code: github.com/Nutlope/open-d…