OpenAI Codex Subagents roll out – 2M+ weekly users claim, +300% since Jan

Subagents are now available in Codex. You can accelerate your workflow by spinning up specialized agents to: • Keep your main context window clean • Tackle different parts of a task in parallel • Steer individual agents as work unfolds

8:09 PM · Mar 16, 2026

6.2K

Read 305 replies

Codex CLI review pattern: one subagent per risk area, then synthesize

Codex CLI subagents (Workflow): A concrete early workflow is to delegate code review into parallel passes—one subagent each for security, bugs, races, test flakiness, maintainability—then wait for all results and merge them into a single structured review, as shown end-to-end in the CLI screenshots.

This pattern maps well to large repos where sequential review prompts tend to drift or prematurely stop; the screenshot sequence shows Codex spawning multiple explorers, waiting, closing them, then spot-checking the highest-severity findings before producing the final summary.
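
The fan-out and merge steps described above can be sketched in a few lines. This is a minimal illustration, not Codex's implementation: `review_pass`, `Finding`, and the toy severity values are hypothetical stand-ins for real subagent calls.

```python
import asyncio
from dataclasses import dataclass

RISK_AREAS = ["security", "bugs", "races", "test flakiness", "maintainability"]

@dataclass
class Finding:
    area: str
    severity: int  # higher means worse; values here are arbitrary placeholders
    note: str

async def review_pass(area: str, diff: str) -> list[Finding]:
    """Hypothetical stand-in for one subagent; a real pass would call an agent."""
    await asyncio.sleep(0)  # yield control, as an I/O-bound call would
    return [Finding(area, severity=len(area) % 3 + 1, note=f"checked {len(diff)} chars")]

async def fan_out_review(diff: str) -> list[Finding]:
    # Fan out: one concurrent pass per risk area.
    results = await asyncio.gather(*(review_pass(a, diff) for a in RISK_AREAS))
    # Fan in: flatten, then put the highest-severity findings first.
    merged = [f for batch in results for f in batch]
    return sorted(merged, key=lambda f: f.severity, reverse=True)

findings = asyncio.run(fan_out_review("example diff"))
for f in findings:
    print(f"[{f.severity}] {f.area}: {f.note}")
```

Sorting by severity before synthesis mirrors the "spot-check the highest-severity findings" step; a real harness would likely also de-duplicate overlapping findings.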

cedric

@cedric_chee

Tried subagents in Codex CLI on my ZeptoClaw project. It's so good! Codex now supports subagent workflows: orchestrates agents, spawns subagents, routes follow-ups, awaits results, and closes threads.

OpenAI Developers

@OpenAIDevs

8:32 PM · Mar 16, 2026

Codex adoption chatter spikes with “2M+ weekly active” and rapid usage charts

Codex adoption (Signal): Multiple posts pair the subagents rollout with aggressive adoption claims—Sam Altman shared a sharply rising usage chart in the Usage chart, while another estimate claims Codex is at “2M+ weekly active users” and up ~300% since early January in the Weekly active estimate.

Treat this as directional rather than audited (the tweets don’t provide a canonical metric definition), but it’s consistent with “parallelism + cleaner context” being marketed as a workflow unlock rather than a minor UX tweak.

Sam Altman

@sama

The Codex team are hardcore builders and it really comes through in what they create. No surprise all the hardcore builders I know have switched to Codex. Usage of Codex is growing very fast:

5:40 PM · Mar 16, 2026

5.4K

Read 1.1K replies

Codex Subagents docs frame context pollution and sandbox tradeoffs

Subagent concepts (OpenAI Codex): Alongside the feature rollout, OpenAI published Subagents documentation that explicitly frames when to fork work out of the main thread (to avoid “context pollution/rot”), plus guidance on sandboxing/threat models and choosing model configs for different subagents, as laid out in the Docs page and referenced by Codex staff in the Docs pointer.

A practical read of this is: subagents aren’t just “more parallelism”—they’re a hygiene tool for keeping long runs usable without constantly pruning your main context window.

Tibo

@thsottiaux

Hello subagents in codex. Have seen some awesome new and creative workflows emerge from these developers.openai.com/codex/subagent…

OpenAI Developers

@OpenAIDevs

8:31 PM · Mar 16, 2026

804

Read 77 replies

Codex automation pattern: generate a daily “Pending Slack Replies” digest

Codex Automations (Workflow): A detailed automation prompt shows Codex scanning Slack (public channels, private, DMs, group DMs), reading candidate threads, and rewriting a daily markdown section listing who’s waiting, where, when, and what’s blocked—effectively a “response queue” generator, as shared in the Automation prompt.

This is a representative “agent as ops clerk” pattern: high IO (search + reads), strict de-dup (rewrite section each run), and a stable output contract (a single markdown section) that’s easy to diff and trust.
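
The "rewrite section each run" contract can be sketched as an idempotent upsert on a markdown note. This is one possible approach, not the prompt author's code; `upsert_section` is a hypothetical helper, and it assumes sections are delimited by level-2 headings.

```python
import re

SECTION_TITLE = "## Pending Slack Replies"

def upsert_section(note: str, body: str, title: str = SECTION_TITLE) -> str:
    """Replace the titled section if present, else append it; safe to re-run."""
    block = f"{title}\n\n{body.strip()}\n"
    # Match from the title up to the next level-2 heading or end of file.
    pattern = re.compile(
        rf"^{re.escape(title)}[ \t]*\n.*?(?=^## |\Z)", re.MULTILINE | re.DOTALL
    )
    if pattern.search(note):
        # A callable replacement avoids backslash-escape surprises in the body.
        replaced = pattern.sub(lambda _m: block + "\n", note, count=1)
        return replaced.rstrip("\n") + "\n"
    return note.rstrip("\n") + "\n\n" + block

note = "# Daily\n\n## Pending Slack Replies\n\n- old item\n\n## Other\n\ntext\n"
print(upsert_section(note, "- alice, #eng, waiting on review"))
```

Because the whole section is replaced rather than appended to, running the job twice with the same findings yields the same file, which is what makes the output easy to diff.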

jason liu

@jxnlco

codex app automations: slack pending replies Review Slack for the current user and update today's daily summary note in /Users/jasonliu/vault at agent/daily-summary-YYYY-MM-DD.md with a single section titled ## Pending Slack Replies. Use Slack search and thread reads across …

6:29 PM · Mar 16, 2026

Codex team solicits “what’s broken” feedback as subagents land

Codex feedback loop (OpenAI): As subagents roll out, Codex engineering is explicitly asking what users want improved or fixed—“What are we consistently getting wrong with codex?”—and the thread drew a large volume of replies, per the Feedback question.

This is a useful signal for near-term roadmap pressure points (UX rough edges, sandbox friction, or multi-agent coordination issues) because it’s framed as “fixes” rather than feature requests.

Tibo

@thsottiaux

What are we consistently getting wrong with codex that you wish we would improve / fix?

12:45 AM · Mar 17, 2026

546

Read 733 replies

Subagents shift the cost profile: more parallelism, more tokens

Cost and tokens (Codex Subagents): Codex’s own UI flags that delegating work to subagents “may increase token usage,” as shown in the In-product prompt, and builders are already framing this as “token anxiety,” per the Token anxiety post.

The practical implication is that teams adopting fan-out patterns will likely need to watch for new failure modes: parallel tool calls that duplicate context reads, repeated repo scans, and subagent retries that quietly multiply spend.
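
A back-of-envelope model shows how quickly fan-out multiplies spend. The numbers below are illustrative assumptions, not measurements, and `fan_out_tokens` is a hypothetical helper that assumes every attempt re-reads the same shared context.

```python
def fan_out_tokens(shared_context: int, work_per_agent: int,
                   agents: int, retries: int = 0) -> int:
    """Rough upper bound: every attempt re-reads the shared context."""
    attempts = agents * (1 + retries)
    return attempts * (shared_context + work_per_agent)

single = fan_out_tokens(20_000, 5_000, agents=1)
fanned = fan_out_tokens(20_000, 5_000, agents=5, retries=1)
print(single, fanned, fanned / single)
```

Under these assumptions, five subagents with one retry each cost ten times the single-agent run, with most of the increase coming from duplicated context reads rather than extra work.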

Codex can now spawn multiple subagents to explore complex tasks in parallel. New subagents are available on desktop apps and CLI across all plans. Spawn them 👀

OpenAI Developers

@OpenAIDevs

8:16 PM · Mar 16, 2026

301

Read 12 replies

Using context fork to isolate skill runs into a subagent

Isolated execution (Subagents): A small but high-leverage tip is to add “context: fork” to a skill so that it executes in an isolated subagent and the main thread receives only the final output, which reduces context bloat for tool-heavy steps, as described in the Fork context tip. The tip mentions CLAUDE.md, so it describes Claude Code skills rather than Codex itself.

This is essentially a “keep the transcript clean” knob you can apply tactically to noisy operations like browsing, parsing, or bulk file inspection.

Lydia Hallie ✨

@lydiahallie

Btw you can add `context: fork` to run a skill in an isolated subagent. The main context only sees the final result, not the intermediate tool calls It gets a fresh context window with CLAUDE.md + your skill as the prompt. The `agent` field even lets you set the subagent type!

5:55 PM · Mar 16, 2026

716

Read 44 replies

Codex Pro users highlight Spark as a fast subagent mode

Spark subagents (Codex): Codex Pro users are calling out that it’s especially satisfying to spawn “fast subagents” using Spark, implying a workflow split where low-latency parallel helpers handle quick subtasks while the main thread stays focused, per the Spark subagents note.

The tweets don’t include a formal spec for Spark here, but the practical claim is about faster turnaround for parallel subtasks within the same Codex session.

Alexander Embiricos

@embirico

Now you can ask Codex to use subagents. If you're on Pro, it's particularly satisfying to ask Codex to spawn fast subagents using Spark!

OpenAI Developers

@OpenAIDevs

8:45 PM · Mar 16, 2026

160

Read 19 replies

🛠️ Claude Code CLI updates: sandbox controls, output limits, remote envs

Anthropic-side shipping today centers on Claude Code CLI v2.1.77 (lots of fixes) plus new support for custom environments when running Claude Code remotely. Excludes Codex Subagents (covered as the feature).

Claude Code 2.1.77 closes a permissions bypass via PreToolUse hooks

Claude Code CLI (Anthropic): A permissions bug was fixed in 2.1.77 where PreToolUse hooks returning “allow” could bypass deny rules, including enterprise-managed settings, as documented in the changelog thread.

This is the kind of issue that can silently invalidate an organization’s “deny-first” sandbox posture if they rely on hooks for workflow automation.
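
The fixed behavior amounts to checking deny rules before honoring a hook's decision. The sketch below is a generic illustration of that precedence, not Claude Code's implementation; `resolve`, the glob-style rule syntax, and the "ask" fallback are assumptions.

```python
from fnmatch import fnmatchcase
from typing import Optional

def resolve(tool_call: str, deny_rules: list[str], hook_decision: Optional[str]) -> str:
    """Deny rules are checked first, so a hook's "allow" cannot override them."""
    if any(fnmatchcase(tool_call, rule) for rule in deny_rules):
        return "deny"
    if hook_decision == "allow":
        return "allow"
    return "ask"  # hypothetical default: fall back to prompting the user

print(resolve("rm -rf build", ["rm *"], "allow"))
print(resolve("ls", ["rm *"], "allow"))
```

The bug described above corresponds to evaluating the hook branch before the deny branch, which is why ordering is the whole point of the sketch.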

Claude Code 2.1.77 has been released. 44 CLI changes, 1 system prompt change Highlights: • Raised Opus 4.6 default output limit to 64k tokens; up to 128k for Opus/Sonnet 4.6 • Added sandbox allowRead setting to re-allow read access within denyRead regions • Hooks returning …

Claude Code 2.1.77 changes how you resume spawned agents

Claude Code CLI (Anthropic): 2.1.77 removes the Agent tool’s resume parameter; continuing a previously spawned agent now requires SendMessage({to: agentId}), while new Agent invocations always start fresh and need full context, as described in the system prompt diff and reiterated in the changelog thread.

This is a small interface change with real harness implications: any wrappers that assumed Agent(resume=...) will need a migration to “spawn vs. message” semantics.
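
For harness authors, the "spawn vs. message" split can be modeled as two separate entry points. The registry below is a hypothetical in-memory wrapper for illustration only; it does not call Claude Code, and the method names are assumptions.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class AgentRegistry:
    """Hypothetical in-memory harness; not Claude Code's actual tool API."""
    threads: dict[str, list[str]] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def spawn(self, full_context: str) -> str:
        # A new agent always starts fresh, so the caller passes full context.
        agent_id = f"agent-{next(self._ids)}"
        self.threads[agent_id] = [full_context]
        return agent_id

    def send_message(self, to: str, text: str) -> None:
        # Continuation targets an existing agent by ID; no implicit respawn.
        if to not in self.threads:
            raise KeyError(f"unknown agent: {to}")
        self.threads[to].append(text)

registry = AgentRegistry()
reviewer = registry.spawn("Review src/ for race conditions; repo uses asyncio.")
registry.send_message(to=reviewer, text="Also check the retry helper.")
print(registry.threads[reviewer])
```

Raising on an unknown ID, rather than silently spawning, keeps a wrapper from masking the case where old resume-style code paths are still in use.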

Replying to @ClaudeCodeLog

Updates to the Claude Code 2.1.77 system prompt Notable changes: 1) Claude can no longer resume a prior subagent via an Agent tool resume parameter. Continuations must use SendMessage with the agent ID/name in the to field, while new Agent calls always start fresh and need full …

Read 1 reply

Claude Code 2.1.77 fixes --resume truncating conversation history

Claude Code CLI (Anthropic): The --resume flow was patched in 2.1.77 so that it no longer silently truncates recent conversation history; the bug came from a race between memory-extraction writes and the main transcript, per the changelog thread.

That bug hits exactly where teams expect durability: picking up an interrupted session without losing the most recent constraints and decisions.

Claude Code 2.1.77 fixes an auto-updater bug that could consume tens of GBs

Claude Code CLI (Anthropic): 2.1.77 fixes an auto-updater issue where repeatedly opening/closing the slash-command overlay could trigger overlapping binary downloads, reportedly accumulating tens of gigabytes of memory usage, according to the changelog thread.

It’s a reliability fix, but it also matters operationally for long-running sessions on laptops or small dev boxes where disk and memory headroom is tight.

Claude Code 2.1.77 lifts default output caps for Opus/Sonnet 4.6

Claude Code CLI (Anthropic): Output limits were raised in Claude Code 2.1.77: the Opus 4.6 default output limit is now 64k tokens, with an upper bound of 128k tokens for Opus/Sonnet 4.6, as called out in the changelog thread.

This changes long-file edits and large diffs more than “raw intelligence”; it reduces the need to chunk prompts or force multi-turn continuation when the agent is mid-refactor, while also increasing the risk of runaway outputs if guardrails aren’t set elsewhere.

Claude Code 2.1.77 adds allowRead to carve exceptions inside denyRead

Claude Code CLI (Anthropic): 2.1.77 adds an allowRead sandbox filesystem setting that can re-enable read access inside existing denyRead regions, per the changelog thread.

This is a practical knob for teams that block broad paths by default but need narrow “escape hatches” for specific files (for example, letting the agent read a build manifest inside an otherwise restricted directory).
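
The carve-out semantics can be sketched as a path check in which an allow entry re-opens part of a denied region. This is an illustration under assumptions: `can_read` is hypothetical, the paths are made up, and the changelog summary does not spell out Claude Code's actual matching rules.

```python
from pathlib import PurePosixPath

def can_read(path: str, deny_read: list[str], allow_read: list[str]) -> bool:
    """Readable unless under a denied root, except where an allow entry re-opens it."""
    p = PurePosixPath(path)

    def under(root: str) -> bool:
        r = PurePosixPath(root)
        return p == r or r in p.parents

    if any(under(a) for a in allow_read):
        return True
    return not any(under(d) for d in deny_read)

deny = ["/repo/secrets"]
allow = ["/repo/secrets/build-manifest.json"]
print(can_read("/repo/secrets/build-manifest.json", deny, allow))
print(can_read("/repo/secrets/key.pem", deny, allow))
```

Keeping the allow list as narrow as a single file, as here, preserves most of the value of the broad deny rule.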

Claude Code remote now supports custom environments

Claude Code (Anthropic): Remote Claude Code runs now support custom environments when accessed via claude.ai/code, plus Claude desktop and mobile apps, per the remote environment note.

This is specifically about “where the code runs” (and what’s installed there), which is a prerequisite for reproducible remote agent sessions across different repos and toolchains.

cat

@_catwu

We now support custom environments when you run Claude Code remotely via claude.ai/code, Claude desktop, and Claude mobile apps!

Omid Mogasemi

@OmidMogasemi

You can now add a setup script in Claude Code on the web & desktop! Use these to automate setup before Claude Code launches on a cloud environment. It’s particularly useful for installing dependencies, settings, configs, etc.

12:44 AM · Mar 17, 2026

174

Read 23 replies

Claude Code 2.1.77 lands dozens of terminal and tmux quality-of-life fixes

Claude Code CLI (Anthropic): 2.1.77 includes a long tail of terminal UX fixes—tmux clipboard reliability, iTerm2/tmux crash cases, vim-mode key handling, hyperlink double-open behavior, and other CLI ergonomics—summarized in the changelog thread.

It’s not one headline feature, but it reduces the “death by papercut” friction in day-to-day agent sessions (especially inside tmux/screen and SSH-heavy workflows).

Claude Code with 1M context: fewer resets, mixed quality reports

Long-context workflows (Claude): Multiple builders say 1M context changes how they work—less context clearing and fewer mid-session compactions—per the long-session report and the 1M context benchmark post.

There’s also visible disagreement: one report calls Opus 4.6 1M context “extremely bad” in the negative take, which suggests quality and harness settings still dominate whether long-context feels stable in practice.

BridgeMind

@bridgemindai

Spent 20+ hours vibe coding with Claude Opus 4.6 1M context. It's a noticeably different workflow. No more auto compaction. No more lost context mid-session. Full codebase. Full conversation history. All in memory. Complex refactors that used to take multiple sessions now …

12:37 PM · Mar 16, 2026

122

Read 22 replies

WSL sandboxing pain: Docker-based Claude Code sandboxes reported unreliable

Claude Code sandboxing: A recurring pain point surfaced around sandbox reliability on WSL—Docker sandboxing is described as “unreliable as hell,” and the built-in /sandbox mode is criticized as insufficient for “properly AFK workflows,” per the WSL sandbox complaint.

This is a practical constraint for teams trying to run unattended agent loops on Windows dev machines; the thread is explicitly asking for alternatives to Docker-based isolation.

Matt Pocock

@mattpocockuk

What alternatives are there to docker sandbox for sandboxing Claude Code? It's unreliable as hell on WSL. /sandbox doesn't do what I want - cc can always get around it and it doesn't allow for properly AFK workflows.

9:28 AM · Mar 16, 2026

106

Read 68 replies

📈 GPT‑5.4 + ChatGPT quality signals (ramp, tone, and regressions)

Continues the GPT‑5.4 story with new numbers and product-quality adjustments: API ramp metrics, user sentiment on upgrades, and a ChatGPT tone fix targeting “teaser-style” phrasing. Excludes Codex Subagents (feature).

GPT‑5.4 API ramp hits 5T tokens/day and a claimed $1B net-new run rate

GPT‑5.4 (OpenAI): API usage ramped to 5T tokens/day within a week, with OpenAI leadership claiming it’s already handling more volume than the entire API did one year ago and reaching an annualized $1B net-new revenue run rate, per the Ramp metrics. Sam Altman reinforced the “builders building fast” framing off the same first-week signal in the First-week reaction.

The public numbers are directionally useful for capacity planning and vendor-risk discussions, but they’re also self-reported (no external telemetry in the tweets).
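
For capacity-planning intuition, the daily figure converts to an average per-second rate as follows. This only restates the quoted number and assumes an even spread across the day, which real traffic will not follow.

```python
TOKENS_PER_DAY = 5e12            # "5T tokens per day", as quoted
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

tokens_per_second = TOKENS_PER_DAY / SECONDS_PER_DAY
print(f"{tokens_per_second / 1e6:.1f}M tokens/sec on average")
```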

Greg Brockman

@gdb

gpt-5.4 has ramped faster than any other model we've launched in the API: within a week of launch, 5T tokens per day, handling more volume than our entire API one year ago, and reaching an annualized run rate of $1B in net-new revenue. it's a good model, try it out!

6:04 PM · Mar 16, 2026

3.3K

Read 339 replies

ChatGPT updates GPT‑5.3 Instant to reduce teaser-style follow-ups

ChatGPT GPT‑5.3 Instant (OpenAI): OpenAI shipped a targeted tone change that explicitly aims to reduce “teaser-style phrasing” like “If you want…”, “You’ll never believe…”, and “I can tell you these three things…”, as shown in the Release notes screenshot.

This appears to respond directly to the user perception that ChatGPT responses increasingly end with clickbaity questions, as called out in the User complaint.

Tibor Blaho

@btibor91

GPT-5.3 Instant in ChatGPT is getting an update that improves follow-up tone and reduces teaser-style phrasing in responses

9:01 PM · Mar 16, 2026

524

Read 44 replies

Builders report noticeable behavior shifts moving from GPT‑5.3 to GPT‑5.4

GPT‑5.4 (OpenAI): Multiple builders describe the 5.3→5.4 upgrade as a real behavioral shift—“I generally agree… feel it myself on the 5.3 → 5.4 upgrade,” as Sam Altman put it in the Upgrade reaction. One deep-research user complaint calls GPT‑5.4 “really annoying” and “reflexively contrarian,” illustrated with a “my house is on fire” analogy in the Deep-research gripe.

• Tool choice fallback: a user claims they’re “fully back to Codex 5.3,” questioning whether early 5.4 coding hype matched reality in the Switch back post.

Net: early sentiment looks mixed—strong ramp and throughput signals, but enough tone/interaction complaints that some users are actively reverting their default workflows.

Sam Altman

@sama

It is also very smart, but...I generally agree with this, and really feel it myself on the 5.3 -> 5.4 upgrade.

rohit

@krishnanrohit

GPT 5.4 is very good, but its most distinguishing characteristic is its humanity. 5.3 Codex was already incredible at coding so it's interesting to see what made it so much more successful. People claim they want a 10x autist savant coder, but what they want is personality.

9:43 PM · Mar 16, 2026

1.4K

Read 396 replies

GPT‑5.4 xhigh ties the Intelligence Index, with long first-token latency cited

GPT‑5.4 xhigh (OpenAI): Artificial Analysis reports GPT‑5.4 (xhigh) tying for the lead on its “Intelligence Index,” alongside concrete serving characteristics—$2.50/M input tokens, $15/M output tokens, ~72.5 tokens/sec output speed, and ~185s first-token latency—in the Index and pricing snapshot.

Treat the ranking as a proxy signal (it’s one index with its own mix of evals), but the latency number is an operational constraint that matters for interactive UX and agent loops.
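
To see why the latency figure matters, the reported numbers can be plugged into a simple per-call estimate. This sketch assumes first-token latency is a fixed 185 s and output streams at a constant rate, which is a simplification; `call_estimate` and the example token counts are hypothetical.

```python
INPUT_USD_PER_M = 2.50
OUTPUT_USD_PER_M = 15.00
OUTPUT_TOKENS_PER_SEC = 72.5
FIRST_TOKEN_LATENCY_S = 185.0

def call_estimate(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (seconds, USD) for one call under the reported figures."""
    seconds = FIRST_TOKEN_LATENCY_S + output_tokens / OUTPUT_TOKENS_PER_SEC
    usd = (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1e6
    return seconds, usd

seconds, usd = call_estimate(input_tokens=10_000, output_tokens=1_450)
print(f"{seconds:.0f} s, ${usd:.5f}")
```

In this example the wait for the first token dominates the 20 s of generation, so a sequential 10-step agent loop would spend about 1,850 s on first-token latency alone.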

Kol Tregaskes

@koltregaskes

OpenAI GPT-5.4 (xhigh) scores 57 on the Artificial Analysis Intelligence Index - tied for the lead. - Pricing $2.50 per million input tokens and $15.00 per million output tokens. - Output speed 72.5 tokens per second; first-token latency around 185 seconds. - Generated 120 …

Artificial Analysis

@ArtificialAnlys

OpenAI’s new GPT-5.4 (xhigh) lands equal first in the Artificial Analysis Intelligence Index alongside Gemini 3.1 Pro, but at a cost increase compared to GPT-5.2 @OpenAI's GPT-5.2 (xhigh, 51) was the most intelligent model as at end of 2025. Since then, OpenAI released two

3:29 PM · Mar 16, 2026

157

“Clickbaity follow-up questions” show up as a ChatGPT quality complaint

ChatGPT (OpenAI): A high-engagement user complaint says “almost all responses ends with clickbaity questions,” giving examples like “if you want, i can tell you the one mistake…” in the Clickbait complaint. The most concrete product response in today’s tweets is OpenAI’s GPT‑5.3 Instant tone tweak that aims to reduce this exact class of phrasing, as shown in the Release notes screenshot.

eric zakariasson

@ericzakariasson

whats going on with chatgpt these days? almost all responses ends with clickbaity questions "if you want, i can tell you the one mistake that almost everyone forgets"

1:49 PM · Mar 16, 2026

10.4K

Read 845 replies

A worked example of “backsolving” tokens/day from an unlabeled usage chart

Measurement hygiene: A Codex usage chart was shared without a y-axis in the Usage chart, and one practitioner attempted to reconstruct “T tokens/day” by anchoring the bar heights to an earlier public disclosure window, documenting the full backsolving approach in the Y-axis reconstruction.

This pattern can be useful for internal forecasting when the inputs are trustworthy, but the same method can create false precision when any of the anchor assumptions are off.
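
The anchoring step can be sketched as a single scale factor applied to measured bar heights. The pixel heights and anchor values below are made up for illustration, and the sketch assumes a linear y-axis that starts at zero; `backsolve` is a hypothetical helper, not the practitioner's actual method.

```python
def backsolve(bar_heights: list[float], anchor_index: int, anchor_value: float) -> list[float]:
    """Scale unlabeled bar heights so the anchor bar equals a known value."""
    scale = anchor_value / bar_heights[anchor_index]
    return [h * scale for h in bar_heights]

heights_px = [40.0, 80.0, 200.0]  # hypothetical pixel measurements
estimate = backsolve(heights_px, anchor_index=0, anchor_value=1.0)
estimate_off = backsolve(heights_px, anchor_index=0, anchor_value=1.2)  # anchor 20% too high
print(estimate, estimate_off)
```

Because every bar shares the same scale factor, a 20% error in the anchor carries through as a 20% error in every reconstructed value, which is the false-precision risk noted above.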

GPT‑4.5 nostalgia: “last creative-writing-first” OpenAI model claim resurfaces

Model positioning: A community thread claims GPT‑4.5 was the last OpenAI model optimized for creative writing before a shift toward research/coding, and that it was discontinued largely due to cost, in the Creative-writing claim.

No new supporting data is provided in the tweets, but it’s a signal of how some users segment “coding/research” vs “writing voice” as separate product qualities.

Haider.

@slow_developer

gpt-4.5 was the last model built for creative writing before openai shifted more toward research and coding even now, only opus 4.6 comes close. openai probably won't make a model like that again, mainly because it was discontinued due to being too expensive to run

8:20 AM · Mar 16, 2026

353

Read 45 replies

✅ Code quality, CI, and review automation (security agents, merge conflicts, ROI)

Focuses on correctness + maintainability workflows as agents write more code: PR review automation, security scanning agents, merge conflict delegation, and ROI instrumentation. Excludes Codex Subagents (feature).

Cursor shares templates for always-on security agents reviewing 3K+ PRs/week

Cursor (Cursor): Cursor says it now runs a fleet of security agents continuously on its own codebase—reviewing 3,000+ internal PRs per week and catching 200+ vulnerabilities—and it’s publishing automation templates so other teams can replicate the setup, per the templates announcement.

• Why this matters operationally: this is positioned as “always-on” review capacity (not a one-shot scan), which changes how you budget CI time and how you triage findings—especially as PR volume rises faster than human review bandwidth, as shown in the templates announcement.

Cursor

@cursor_ai

We built a fleet of security agents to run continuously on our codebase. We're sharing new automation templates for you to do the same.

5:26 PM · Mar 16, 2026

932

Read 54 replies

Code review is becoming the bottleneck as code generation accelerates

Review workflows: multiple posts describe the development bottleneck shifting from “getting code written” to “reviewing what got written,” with one engineer calling it “jarring” that norms aren’t set up for this pace in the bottleneck observation.

• Where the debate goes next: Elon Musk predicts “code review will swiftly become a thing of the past,” per the Musk reply, while practitioners report a stopgap loop that looks like “a refreshing diff that I just stare at while it churns,” per the diff watching comment.

Logan Kilpatrick

@OfficialLoganK

The bottleneck has so quickly moved from code generation to code review that it is actually a bit jarring. None of the current systems / norms are setup for this world yet.

1:53 AM · Mar 17, 2026

2.1K

Read 229 replies

Factory launches Analytics linking tokens to shipped software

Factory Analytics (FactoryAI): Factory shipped an analytics layer meant to make agent ROI auditable end-to-end—tracking tokens → usage → commits → pull requests → shipped software, as described in the launch announcement.

• What’s new vs typical “LLM spend” dashboards: the product framing is outcome-linked instrumentation (engineering artifacts and throughput), not just token burn and latency, per the launch announcement.

Factory

@FactoryAI

Today we’re launching Factory Analytics. Enterprise teams can now see exactly how AI agents translate into engineering outcomes: tokens → usage → commits → pull requests → shipped software. The missing layer for proving ROI in agent-native software development.

5:11 PM · Mar 16, 2026

139

Read 12 replies

Codex-driven refactor with mutation targets, run overnight

SCRAP (Uncle Bob Martin): Uncle Bob describes iterating with Codex to build a Speclj “SCRAP” analyzer, then having Codex run harder passes (CRAP + mutation) overnight with explicit constraints—e.g., reduce CRAP below 8 and split files with >50 mutation sites—ending with a refactor into 8 smaller files and test growth from 11 tests/48 assertions to 28 tests/109 assertions, per the workflow writeup.

• Artifact you can inspect: the resulting tool and README are available in the GitHub repo, which makes this a concrete example of “sleep while the agent hardens,” not just a prompt anecdote.
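
For context on the "CRAP below 8" constraint, the commonly cited CRAP (Change Risk Anti-Patterns) formula combines cyclomatic complexity with test coverage. The sketch below uses that general formula; the source does not say how crap4clj or the SCRAP tool compute their scores, so treat this as background rather than a description of either tool.

```python
def crap(complexity: int, coverage: float) -> float:
    """CRAP = comp^2 * (1 - cov)^3 + comp, with coverage as a fraction in [0, 1]."""
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity

print(crap(5, 0.0))   # untested, moderately complex function
print(crap(5, 1.0))   # same function with full coverage
print(crap(10, 0.5))  # more complex, half covered
```

Under this formula a fully covered function scores exactly its complexity, so a threshold of 8 can only be met by functions with complexity below 8, which is consistent with the refactor into smaller files.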

Uncle Bob Martin

@unclebobmartin

So I had codex write a scrap tool -- a tool like crap4clj but for speclj specs. Codex and I iterated on a solution that identifies specs that could be improved. You can check the readme for details. (github.com/unclebob/scrap) I got the tool working, and then pushed it. Then …

1:03 PM · Mar 16, 2026

Read 2 replies

zed.dev teases delegating merge-conflict resolution to an agent

Zed merge conflicts (zed.dev): zed.dev previewed a CLI flow that hands merge-conflict resolution to an agent—prompting once and returning “Conflict resolved” in the terminal, per the feature teaser.

• Workflow implication: this pushes conflict resolution from a manual edit loop into an approval-style step (y/n), which will likely shift how teams gate auto-merges and how they audit agent edits, as shown in the feature teaser.

Zed

@zeddotdev

Coming Zednesday. Delegate merge conflict resolution to the agent.

4:44 PM · Mar 16, 2026

531

Read 20 replies

Claude Code 2.1.77 fixes missing cost tracking in non-streaming fallback

Claude Code CLI 2.1.77 (Anthropic): the 2.1.77 changelog includes a fix where cost and token usage weren’t tracked when the API fell back to non-streaming, plus a security-relevant change where PreToolUse hooks returning “allow” no longer bypass deny permission rules (including enterprise-managed settings), per the changelog thread.

• Why this lands in CI/ROI land: if your internal reporting ties agent spend to outcomes, the non-streaming fallback path can silently skew dashboards unless it’s metered; this fix closes that hole per the changelog thread.

OpenClaw core refactor pushes more into plugins

OpenClaw (OpenClaw): OpenClaw maintainers report a substantial refactor that removes code from core for lower memory use and more plugin-based extensibility, summarizing it as “everything can be a plugin now,” and adding support for Claude/Codex/Cursor plugin bundles, per the maintainer update.

• Why code quality folks care: plugin-izing core is a maintainability move—shrinking the trusted base and making behavior more testable/reviewable by isolating integrations, per the maintainer update.

Peter Steinberger 🦞

@steipete

Replying to @steipete

We made a ton of progress on that today. Lots of code gone from core. Faster, less memory use overall. Need another day or two to stabilize. Everything can be a plugin now. Also added support for Claude/Codex/Cursor plugin bundles

9:18 AM · Mar 16, 2026

192

Read 31 replies

🧰 Coding agent tool ecosystem (Cursor/Codex/Claude comparisons, UX frontier)

This bucket is for the cross-tool ecosystem dynamics: “which tool feels better,” harness UX differences, and shifting bottlenecks (code gen → review). Excludes Codex Subagents (feature) and Claude Code v2.1.77 specifics (covered separately).

The bottleneck moved from code generation to code review

Code review workflow shift: Several builders are calling out that the “jarring” part of agentic SWE isn’t generating code anymore—it’s reviewing it, and the surrounding norms/tooling aren’t built for that pace yet, as described in Review bottleneck note. Some are already adapting by treating review like a live feed (“a refreshing diff that I just stare at while it churns”), per Diff watching habit, while others predict review itself will disappear, per Code review prediction.

The throughline is operational: as autonomy increases, teams need new review ergonomics (continuous diffs, gating policies, provenance, automated checks) more than they need another +5% in raw codegen quality.

Packaging process as skills becomes a daily driver workflow

Agent skills as process: One pragmatic approach is to codify repeatable engineering rituals into slash commands—/grill-me, /write-a-prd, /prd-to-issues, /tdd, /improve-my-codebase—so the agent runs the same playbook every time, per Five daily skills.

A complementary alignment trick is keeping a shared domain glossary (an “ubiquitous language” doc) open while planning, which the author describes as “unbelievably high value-per-token,” per Ubiquitous language doc.

The implied pattern is less “prompt better” and more “standardize how work happens,” so context stays stable across humans and agents.

Matt Pocock

@mattpocockuk

I've been an engineer for nearly a decade. Right now, process has never been more important. And skills are the best way to bundle up processes for agents. Here are the 5 I use every day: /grill-me /write-a-prd /prd-to-issues /tdd /improve-my-codebase

8:52 PM · Mar 16, 2026

1.5K

Read 41 replies

VS Code adds experimental agent browser tools via a feature flag

VS Code (Microsoft): An experimental “Agentic Browser Tools” surface lets agents open pages, read content, click elements, and verify changes directly in an integrated browser, enabled via workbench.browser.enableChatTools as shown in Feature flag announcement.

This is a concrete step toward tightening the webdev loop inside the IDE: if the agent can both edit code and validate UI state in the same environment, fewer workflows need an external “computer use” agent or separate Playwright harness.

Visual Studio Code

@code

🌐 Agentic Browser Tools (Experimental) in @code! Agents can now open pages, read content, click elements, and verify changes directly in the integrated browser while building your web app. Enable ⚙️ workbench.browser.enableChatTools to try it out. Learn mode: …

12:24 AM · Mar 17, 2026

1.0K

Read 28 replies

BridgeMind adds a 12-agent “Canvas Mode” workspace

BridgeSpace (BridgeMind): A new “Canvas Mode” tiles 12 concurrent agents into one screen, explicitly mixing Codex, Claude Code/Opus, and GPT‑5.4 configurations, so you can run parallel threads without tab-switching, per Canvas mode launch.

This is an opinionated take on orchestration UX: instead of hiding parallelism behind a single chat, it makes parallel execution the primary interface, with each thread visible as an operational unit (model, directory, prompt, state).

BridgeMind

@bridgemindai

BridgeMind just shipped Canvas Mode inside BridgeSpace. 12 AI agents running at the same time. One screen. Claude Opus 4.6. Claude Code. Codex. GPT 5.4 High. All of them. Simultaneously. No tabs. No switching. Just shipping. bridgemind.ai

8:20 PM · Mar 16, 2026

Builders say Claude Code wins CLI while Codex wins desktop

Claude Code vs Codex UX: A crisp take making the rounds is that Claude Code CLI feels better than Codex CLI, but Codex Desktop feels better than Claude Code Desktop, framed as “a jagged UX frontier” in CLI vs desktop take. A separate, language-specific datapoint echoes the same theme: one Swift-focused user reports Codex can “work for a long time” with fewer interruptions, while Claude Code “asks permissions” and still doesn’t finish the job, per Swift workflow comparison.

What’s missing is a shared rubric: most comparisons are about interruption rate (permissions/prompts), not correctness on a common harness.

Hamel Husain

@HamelHusain

Claude Code CLI > Codex CLI Codex Desktop > Claude Code Desktop It’s a jagged UX frontier

12:35 AM · Mar 17, 2026

144

Read 13 replies

Codex tip: generate sandbox rules from prior conversations

Codex (OpenAI): A practical workflow for safer automation is to run the task in the sandbox while manually approving tool requests, then ask Codex to analyze the session transcript and produce a reusable rules file so future runs don’t need “full access,” as described in Rules file tip.

This pattern treats permissions as something you can iteratively “compile” out of a successful run—useful when you want unattended automations but still need a principled permission boundary.
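
The "compile permissions from a successful run" idea can be sketched as counting which command prefixes were approved repeatedly. This is a generic illustration: it does not use Codex's rules file format, and `propose_prefix_rules`, the two-word prefix heuristic, and the sample commands are all assumptions.

```python
from collections import Counter

def propose_prefix_rules(approved_commands: list[str], min_count: int = 2) -> list[str]:
    """Suggest command prefixes (first two words) that were approved repeatedly."""
    prefixes = Counter(
        " ".join(cmd.split()[:2]) for cmd in approved_commands if cmd.strip()
    )
    return sorted(p for p, n in prefixes.items() if n >= min_count)

approved = [
    "git status",
    "git status --short",
    "npm test",
    "npm test -- --watch=false",
    "curl https://example.com",
]
print(propose_prefix_rules(approved))
```

One-off approvals such as the curl call are left out of the proposal, so they would still require a prompt on later runs instead of being silently allow-listed.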

dominik kundel

@dkundel

💡 Codex tip: Rules You can have Codex analyze past conversations. This comes in especially handy if you want to create rules files to have your Codex automations run inside the sandbox without using "full access". Have Codex do the task inside the sandbox and approve requests, …

10:18 PM · Mar 16, 2026

Tooling harness effects show measurable deltas in model benchmarks

Cursor harness benchmarking: Following up on Harness theories (builders arguing harness design drives outcomes), a new comparison claims Cursor increased frontier-model performance by ~11% on average versus other harnesses on a benchmark that scores how well models implement a 100-feature PRD, as referenced in Benchmark summary, with more detail in Full benchmark video.

The key implication is methodological: “which model is best” is increasingly inseparable from “which runtime loop” (context packing, tool execution, evaluation gating, retries) the model is embedded in; without shared harness baselines, leaderboard talk can be misleading.
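
To make the size of the claimed harness effect concrete, here is a small sketch that recomputes the per-model deltas from the three score pairs quoted in the tweet. The tweet does not say how its "11% on average" figure was derived, so both readings below (percentage points and relative gain) are assumptions, and the three pairs may not be the full set behind that average.

```python
# Scores quoted in the tweet as (other harnesses, Cursor), in percent.
scores = {
    "Gemini": (52, 57),
    "GPT-5.4": (82, 88),
    "Opus": (77, 93),
}

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

# Absolute gain in percentage points for each model.
point_gains = {name: after - before for name, (before, after) in scores.items()}

# Relative gain as a percentage of the baseline score.
relative_gains = {
    name: 100 * (after - before) / before for name, (before, after) in scores.items()
}

mean_points = mean(point_gains.values())
mean_relative = mean(relative_gains.values())

for name in scores:
    print(f"{name}: +{point_gains[name]} points ({relative_gains[name]:.1f}% relative)")
print(f"mean: +{mean_points:.1f} points, {mean_relative:.1f}% relative")
```

On these three pairs the mean gain is 9.0 points, or about 12.6% relative, so neither reading reproduces 11% exactly. That is one more reason to treat the headline figure as directional until the full benchmark data is checked.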

edwin

@edwinarbus

Matt Maher tested frontier models in Cursor v. other harnesses. Cursor boosted model performance by 11% on average: Gemini: 52% → 57% GPT-5.4: 82% → 88% Opus: 77% → 93% His benchmark measures how well models implement a 100-feature PRD. @cursor_ai consistently outperformed.

7:26 PM · Mar 16, 2026

WSL sandbox reliability becomes a tool-choice factor

Sandboxing UX: A recurring complaint is that Docker-based sandboxing for Claude Code can be “unreliable as hell on WSL,” and that /sandbox doesn’t enable the fully unattended (“AFK”) workflows people want, per WSL sandbox complaint. When paired with anecdotes that Claude Code’s permission prompting interrupts longer tasks, per Permission prompt gripe, sandbox reliability and permission ergonomics start looking like competitive differentiators—not “nice-to-haves.”

This is also a reminder that local dev environments (WSL, corp laptops, locked-down Docker) are where many agent workflows fail first.

Matt Pocock

@mattpocockuk

9:28 AM · Mar 16, 2026

Cursor team asks about GPT‑5.4 optimization vs Opus

Cursor (Anysphere): A pointed public question—“Who do we work with to make Cursor as 5.4 optimized as opus”—highlights that “best model” and “best in a given harness” are diverging concerns, per Optimization question.

This frames optimization as a toolchain problem (prompt scaffolds, diff/plan UX, tool routing, safety gates, latency hiding), not only a weights problem; it also implies builders are noticing systematic deltas between how GPT‑5.4 and Opus behave inside the same IDE loop.

jason liu

@jxnlco

Who do we work with to make Cursor as 5.4 optimized as opus.

edwin

@edwinarbus

10:52 PM · Mar 16, 2026

NICAR workshop handout: coding agents for data analysis workflows

Coding agents for data analysis: Simon Willison published the handout for a three-hour NICAR workshop covering how tools like Codex CLI and Claude Code can support data exploration, scraping, visualization, and analysis workflows, as shared in Workshop handout post and detailed in Workshop handout.

It’s a grounded, task-shaped artifact (journalism constraints, real datasets, repeatable steps) rather than generic prompt advice, and it reflects an emerging norm: the “coding agent” is being used as an interactive analyst that writes and runs small programs, not just as a code generator.

Simon Willison

@simonw

Here's the handout for a three hour workshop I presented at the NICAR data journalism conference on using coding agents (Codex CLI, Claude Code etc) for data exploration, visualization and analysis simonwillison.net/2026/Mar/16/co…

8:13 PM · Mar 16, 2026

🖥️ ‘Computer’ agents on-device (Comet/Perplexity, Manus desktop, local routines)

A cluster of products is converging on agents that can operate your local browser/desktop without bespoke connectors. Excludes Codex Subagents (feature) and OpenClaw/NemoClaw (separate category).

Perplexity Computer can take full control of Comet via a browser agent

Computer in Comet (Perplexity): Perplexity shipped an on-device-style workflow where Computer spins up a browser agent inside Comet that can operate any site (including logged-in apps) with user permission—explicitly positioned as requiring no connectors or MCPs, per the launch post in Comet control announcement.

• Integration surface change: Instead of wiring OAuth/connectors per app, the agent rides your existing browser session and UI; that shifts the “hard part” from integrations to supervision and approvals, as shown in the Comet control announcement demo.

Perplexity

@perplexity_ai

Computer can now take full control of Comet to complete tasks. When you’re in Comet, Computer spins up a browser agent that can access any site or logged‑in app with your permission, without the need for connectors or MCPs. Available to all Computer users on Comet.

5:36 PM · Mar 16, 2026

Manus ships “My Computer,” a desktop agent that can run local commands

My Computer (Manus): Manus launched a desktop app that turns its agent into a local computer operator on macOS and Windows—able to execute command-line actions against your machine, while requiring explicit authorization for access/execution as described in the launch coverage from Desktop app launch and the longer capabilities rundown in Capabilities list.

• What it’s aimed at: The product pitch emphasizes local file ops (organizing photos, renaming invoices) and local app-building flows, as listed in Capabilities list.
• Control model: Multiple posts frame this as “out of the cloud sandbox, onto your desktop,” but still gated by permission/approval semantics, per Desktop demo clip and Hybrid workflow summary.

BREAKING 🚨: Manus AI released "My Computer", a new desktop app that operates as a local AI agent. My Computer is available today for all macOS and Windows users.

Manus

@ManusAI

My Computer in action: 👉Organizing thousands of unsorted photos 👉Renaming hundreds of invoices 👉Build desktop apps in swift, entirely on your computer. No code written manually. 👉Combine with existing Connectors to create seamless automated workflows. 👉Create local

3:13 PM · Mar 16, 2026

Browser session becomes the ‘connector’ for computer agents

Agent auth pattern: Several posts converge on a practical shortcut for tool integration—drive the UI in a user’s already-authenticated local browser session instead of building connectors/MCP integrations; Perplexity explicitly markets this in the Comet flow described in Comet control announcement, while broader commentary frames “on your computer” agents as the way many users will get this behavior without installing agent frameworks, as argued in Local agent convergence note.

The trade-off implied by these posts is that integration complexity drops, but supervision, permissioning, and auditability become the primary engineering surface.
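
One way to picture supervision and auditability as the engineering surface is a gate that every UI action must pass through, with each decision recorded. The sketch below is a minimal, hypothetical design: `ApprovalGate`, the action dictionary shape, and the read-only policy are assumptions for illustration, not how Comet, Manus, or any other product implements approvals.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Action = Dict[str, Any]

@dataclass
class ApprovalGate:
    """Hypothetical gate: each action needs approval, and each decision is logged."""
    approve: Callable[[Action], bool]  # e.g. a policy function or a user prompt
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def run(self, action: Action, execute: Callable[[Action], Any]) -> Optional[Any]:
        allowed = bool(self.approve(action))
        # Record the decision whether or not the action is executed.
        self.audit_log.append({"time": time.time(), "action": action, "allowed": allowed})
        return execute(action) if allowed else None

def read_only_policy(action: Action) -> bool:
    """Auto-approve navigation and reads; everything else is denied."""
    return action["kind"] in {"navigate", "read"}

gate = ApprovalGate(approve=read_only_policy)
result = gate.run({"kind": "read", "target": "inbox"}, lambda a: f"did {a['kind']}")
blocked = gate.run({"kind": "send_message", "target": "inbox"}, lambda a: "sent")

print(result, blocked, len(gate.audit_log))
```

In a real agent the `approve` callback would more likely prompt the user for write actions rather than deny them outright, while the audit log stays the same either way.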

Perplexity

@perplexity_ai

5:36 PM · Mar 16, 2026

Perplexity Computer is now available on Android

Perplexity Computer (Perplexity): The Computer agent is now shipping on Android, extending availability beyond iOS/desktop and reinforcing the “agent everywhere” positioning in the rollout note from Android launch and follow-up coverage in Computer everywhere recap.

• Cross-device implication: This brings the same agent interaction model to the second major mobile platform; Perplexity frames it as part of having Computer on iOS, Android, and Comet in Platform coverage note.

Perplexity

@perplexity_ai

Computer is now on Android.

Perplexity

@perplexity_ai

Perplexity Computer is now on mobile. Start any task on any device. Manage Computer from your phone or desktop with cross-device synchronization. Available now for iOS in the Perplexity app. Coming soon to Android.

3:13 PM · Mar 16, 2026

Multiple vendors are converging on “agent lives on your machine”

Market convergence: Builders are explicitly lumping together Manus computer, Perplexity computer, and Claude Cowork as the same emerging product category, then asking “who is next,” as in Who is next list and the follow-on “which path wins” framing in Convergence comparison.

The common claim across these threads is that local/on-device control plus cloud reasoning is becoming the default packaging for agents that need to touch authenticated workflows.

Chubby♨️

@kimmonismus

Now we got: -Manus computer -Claude cowork -perplexity computer Curious who is next

Manus

@ManusAI

Today, we're taking Manus out of the cloud and putting it on your desktop. Introducing My Computer, the core feature of the new Manus Desktop app. It’s your AI agent, now on your local machine.

3:53 PM · Mar 16, 2026

Consent/ethics edge: As computer agents get closer to operating real accounts and messaging surfaces, community pushback is sharpening around disclosure—one widely shared stance is that it’s “insanely disrespectful for an AI agent to talk to real people without consent or at least disclosure,” as stated in Consent norm critique.

This frames an emerging requirement for agent builders: not just permission to access tools, but norms and UX around informing the humans on the other end.

Mitchell Hashimoto

@mitchellh

It's so insanely disrespectful for an AI agent to talk to real people without consent or at least disclosure. This is the type of stuff I'm hugely supportive of government regulation. The FCC must expand the definition of robocalling and TCPA-style regulation to online AI.

5:35 PM · Mar 16, 2026

🦞 OpenClaw ecosystem + NVIDIA NemoClaw reference stack

The OpenClaw storyline continues with NVIDIA positioning NemoClaw/OpenShell as an enterprise-ready reference stack (security, sandboxing) and broad “every company needs an agent strategy” messaging. Excludes Codex Subagents (feature).

NVIDIA launches NemoClaw, an OpenClaw reference stack built around OpenShell

NemoClaw (NVIDIA): NVIDIA introduced NemoClaw as an open-source reference stack around OpenClaw, bundling OpenShell (sandbox runtime) plus “security and privacy controls” and a single-command installer, as detailed in the NVIDIA newsroom post and echoed in the keynote excerpt. The practical claim is that enterprises can standardize agent runtime boundaries (network, data, approvals) instead of hand-rolling a sandbox per agent.

• Install flow and reality check: the keynote screenshot shows curl …/nemoclaw.sh | bash followed by nemoclaw onboard, visible in the keynote excerpt, while an attendee reports it “doesn’t work yet for me” in the same keynote excerpt.
• Repo status: the public repository frames NemoClaw as early-stage and Linux/Docker oriented, according to the GitHub repo, which is a different expectation than a desktop-native agent app.

Alex Volkov

@altryne

Jensen is covering "OpenClaw" - "is the most popular open source project in the history of humanity and it did so in just a few weeks" Announcing Nemoclaw, an @openclaw wrapper by Nvidia. Nemoclaw.sh doesn't work yet for me

8:08 PM · Mar 16, 2026

NVIDIA’s GTC keynote frames OpenClaw as the default “agent strategy” layer

OpenClaw (NVIDIA/community): Jensen Huang used the GTC stage to frame OpenClaw as “the most popular open source project in the history of humanity” and said “every company needs an OpenClaw strategy,” per the keynote excerpt. The slide used as evidence compares GitHub star trajectories for OpenClaw vs Linux/React, as captured in the star history slide.

• Platform framing: the keynote deck also depicts agents as a composable platform—LLMs + tools + files + memory + sub-agents—matching the “Agents - a new computing platform” diagram shown in the platform diagram.
• Enterprise packaging narrative: “SaaS → Agent-as-a-Service” language shows up in the same keynote context, as seen on the AgaaS slide.

Alex Volkov

@altryne

8:08 PM · Mar 16, 2026

OpenClaw maintainer reports a lean-core refactor and upcoming plugin bundles

OpenClaw core (OpenClaw): The maintainer says OpenClaw has removed significant code from core for speed/memory gains and is moving toward “everything can be a plugin,” while also adding support for Claude/Codex/Cursor plugin bundles, per the refactor note. The implication is a cleaner extension surface for teams that want standardized toolchains and policies without carrying a long-lived fork.

Peter Steinberger 🦞

@steipete

Replying to @steipete

9:18 AM · Mar 16, 2026

Comet ships opik-openclaw for tracing OpenClaw agent runs end-to-end

opik-openclaw (Comet): Comet released opik-openclaw, a native OpenClaw plugin that traces LLM calls, tool execution, token cost, and sub-agent delegation, as summarized in the plugin brief and detailed in the plugin write-up. It’s positioned as solving the “what happened during this run?” gap once agents become multi-step, multi-tool, and long-lived.
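
To show what "tracing sub-agent delegation" can mean structurally, the sketch below models a run as a tree of spans and rolls token usage up from children to parents. The `Span` type, the kind labels, and the token counts are made up for illustration; this is not the opik-openclaw API or its data model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Span:
    """One traced step: an LLM call, a tool execution, or a sub-agent run."""
    name: str
    kind: str  # "llm", "tool", or "subagent" (labels assumed for this sketch)
    tokens: int = 0
    children: List["Span"] = field(default_factory=list)

def total_tokens(span: Span) -> int:
    """Tokens used by this span plus everything it delegated to."""
    return span.tokens + sum(total_tokens(child) for child in span.children)

def summarize(span: Span, depth: int = 0) -> List[str]:
    """Render the span tree as indented lines with rolled-up token totals."""
    lines = [f"{'  ' * depth}{span.kind}:{span.name} tokens={total_tokens(span)}"]
    for child in span.children:
        lines.extend(summarize(child, depth + 1))
    return lines

# A made-up run: a parent agent plans, calls a tool, and delegates one pass.
run = Span("review-task", "subagent", children=[
    Span("plan", "llm", tokens=1200),
    Span("grep", "tool"),
    Span("security-pass", "subagent", children=[Span("analyze", "llm", tokens=800)]),
])

print("\n".join(summarize(run)))
```

Rolling totals up the tree is what lets a trace answer "which delegation was expensive?" rather than only "how much did the whole run cost?".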

Deep Learning Weekly

@dl_weekly

🤖: A blog post announcing opik-openclaw, a native OpenClaw plugin from Comet that adds full-stack observability, tracing every LLM call, tool execution, token cost, and sub-agent delegation, to address the visibility gap in autonomous agent workflows. buff.ly/SnF2XjL

1:01 PM · Mar 16, 2026

NVIDIA’s “AI Natives” ecosystem slide places OpenClaw in the protocol layer

NVIDIA ecosystem mapping (NVIDIA): NVIDIA’s GTC “AI Natives” ecosystem slide lists OpenClaw under “Agent frameworks / protocols,” placing it alongside other stack-layer primitives rather than as an app-level tool, as shown in the ecosystem slide. A separate infographic counts “103 AI Native” companies, reinforcing the degree of ecosystem curation happening in public, per the AI natives chart.

• Coalition adjacency: the Nemotron Coalition’s founding-member slide includes several of the same ecosystem names (e.g., Cursor, LangChain, Perplexity, Mistral), as shown in the coalition members slide, which is the keynote context where OpenClaw gets elevated.

Baseten

@basetenco

Live from Jensen's keynote remarks at GTC: "The inflection point of inference has arrived. AI now has to think. In order to think, it has to inference. AI now has to do. In order to do, it has to inference. AI has to read. In order to do so, it has to inference. It has to …

7:10 PM · Mar 16, 2026

Hermes vs OpenClaw migration chatter centers on setup and “things just work”

Hermes Agent vs OpenClaw (community): Multiple posts describe switching from OpenClaw to Hermes with claims that “things just work” more reliably and the transition is smooth, as echoed in the migration quote and the switching chatter. The comparison is being made explicitly (“Hermes vs OpenClaw”) in the head-to-head post, suggesting harness defaults and operational ergonomics are a current fault line inside agent-tool adoption.

Zeneca🔮

@Zeneca

I migrated from Openclaw -> Hermes and so far, so good - Things "just work" a lot better - The transition of data from OC -> Hermes was very easy too - It doesn't seem to randomly crash and stop working on me - When I ask it to do things, it'll create an actual skill, rather …

8:30 AM · Mar 16, 2026

Build-a-Claw demos turn OpenClaw setup into a repeatable booth workflow

Build-a-Claw (NVIDIA/OpenClaw): GTC attendees are posting booth shots and hands-on setup clips that show OpenClaw onboarding as a guided, physical demo loop rather than a docs-only experience, per the Build-a-Claw photo and the DGX Spark setup post. The visible focus is “get an agent running now” on NVIDIA hardware, consistent with the NemoClaw/OpenShell packaging pitch.

jonah lipsitt

@jonah_lipsitt

Check out Build-a-Claw at @nvidia’s GTC! @MatthewBerman @NVIDIA_AI_PC

12:05 AM · Mar 17, 2026

🏗️ Infra deals & capacity bets (cloud, capex, enterprise distribution)

Covers contracts and distribution moves that directly affect availability and enterprise adoption: GPU capacity deals, OpenAI enterprise distribution, and cloud routing. Excludes NVIDIA hardware details (separate) and Codex Subagents (feature).

OpenAI courts private equity to distribute enterprise AI via a ~$10B joint venture

OpenAI × private equity (enterprise distribution): Reuters reports OpenAI is in advanced talks with TPG, Bain Capital, Brookfield, and Advent to form a joint venture valued around $10B pre-money with roughly $4B in investor commitments, positioning PE portfolio companies as a fast path to enterprise rollouts, as shown in the Reuters screenshot and echoed with more structure detail in the deal recap.

• Competitive angle: The same reporting notes Anthropic is also exploring PE partnerships (with different equity terms), which turns “enterprise distribution” into a financing-and-channel strategy rather than just direct sales motions, per the deal recap.

The open question is how much of the JV is “deployment + services” (embedding AI engineers, change management) versus a pure resell channel—Reuters’ framing suggests the former, which would affect how quickly big orgs standardize on a vendor’s models and tooling.

OpenAI is in active conversation with TPG, Bain, Brookfield, and Advent to distribute its AI enterprise solutions across their portfolio companies, according to Reuters. A big race for enterprise customers 👀

Fidji Simo

@fidjissimo

This news came out a little earlier than we planned; we're excited to be building a deployment arm and will share more details soon. Companies have a ton of urgency to deploy AI in their organizations and we’re sprinting to meet that demand. More than 1 million businesses run on

2:28 PM · Mar 16, 2026

Meta signs up to $27B with Nebius for multi-year AI compute capacity

Nebius × Meta (infrastructure deal): Meta and Nebius signed a multi-year AI infrastructure agreement for $12B of dedicated capacity plus up to $15B of additional compute over five years (up to ~$27B in total), with coverage emphasizing that the capacity is based on one of the first large-scale deployments of NVIDIA’s Vera Rubin platform, as shown in the deal summary and the CNBC key points.

This is a supply-side move that directly affects frontier training/inference availability outside the usual hyperscalers: it’s a pre-commit that reserves scarce GPU clusters via a third-party “GPU cloud” rather than waiting for spot capacity, and it sets a public price/size anchor for other large buyers negotiating multi-year GPU blocks.

Nebius and Meta signed a multi-year AI infrastructure agreement. “Under the five-year agreement, Nebius will provide $12 billion of dedicated capacity across multiple locations, based on one of the first large-scale deployments of the NVIDIA Vera Rubin platform.“

Nebius

@nebiusai

Nebius signs a new AI infrastructure agreement with Meta (up to ~$27B). "We are pleased to expand our significant partnership... to accelerate the build-out and growth of our core AI cloud business." - CEO Arkady Volozh Read more: nebius.com/newsroom/nebiu…

12:29 PM · Mar 16, 2026

Gemini API ships auto tier upgrades, faster Tier 1→2, and new spend caps

Gemini API (Google): Google shipped billing/quota changes aimed at scaling production usage—automatic tier upgrades, faster Tier 1→2 promotion (30 days post payment → 3 days) with lower spend requirement ($250 → $100), plus new billing account caps and spend-cap tooling, as detailed in the billing update.

For teams running agentic workloads where token spend can jump non-linearly, this is a concrete control-plane change: it reduces “capacity friction” during growth while adding guardrails to prevent accidental overruns from tool-heavy or long-context jobs.
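
The threshold change can be expressed as a small before/after check. This is a sketch under stated assumptions: the post gives only the two thresholds, so treating them as jointly required, the `TierRule` and `eligible_for_tier2` names, and the client-side `within_cap` guard with its cap value are all hypothetical and not part of the Gemini API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierRule:
    min_days_since_payment: int
    min_spend_usd: float

# Thresholds as quoted in the announcement (old vs new Tier 1 -> Tier 2 rule).
OLD_RULE = TierRule(min_days_since_payment=30, min_spend_usd=250.0)
NEW_RULE = TierRule(min_days_since_payment=3, min_spend_usd=100.0)

def eligible_for_tier2(days_since_payment: int, spend_usd: float, rule: TierRule) -> bool:
    """Assumes both conditions must hold; the post does not spell this out."""
    return (days_since_payment >= rule.min_days_since_payment
            and spend_usd >= rule.min_spend_usd)

def within_cap(spend_usd: float, next_charge_usd: float, cap_usd: float) -> bool:
    """Hypothetical client-side guard mirroring the idea of a spend cap."""
    return spend_usd + next_charge_usd <= cap_usd

# A project five days past its first payment with $120 of spend.
print(eligible_for_tier2(5, 120.0, OLD_RULE))  # old thresholds
print(eligible_for_tier2(5, 120.0, NEW_RULE))  # new thresholds
print(within_cap(spend_usd=90.0, next_charge_usd=20.0, cap_usd=100.0))
```

The example project fails the old rule on both counts and passes the new one, which is the kind of early-growth case the change appears aimed at.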

Logan Kilpatrick

@OfficialLoganK

We just shipped a bunch of stuff to make it easier to scale with the Gemini API: - Automatic tier upgrades - Tier 1 -> Tier 2 now happens much faster (30 days post payment -> 3 days) and with less spend ($250 -> $100) - New billing account caps on each tier to limit over spend

4:53 PM · Mar 16, 2026

OpenAI reportedly refocuses around coding and business users

OpenAI (enterprise strategy): A Wall Street Journal excerpt circulating on X claims OpenAI leadership is finalizing a strategy shift to focus on coding and business users, with internal messaging warning against being “distracted by side quests,” as shown in the WSJ excerpt.

This matters operationally because it implies product and capacity prioritization: if “productivity on the business front” becomes the primary KPI, it tends to pull roadmap attention toward higher-volume B2B workloads (admin automation, code+review loops, enterprise integrations) and away from consumer breadth experiments.

Chubby♨️

@kimmonismus

OpenAI will no longer be "distracted by sidequests," which means full focus on revenue-strong B2B sector.

Andrew Curran

@AndrewCurran_

The WSJ is reporting that OpenAI is about to take a hard turn into enterprise.

4:04 AM · Mar 17, 2026

Jensen frames “OpenAI to AWS” as a major compute-consumption driver

AWS × OpenAI (capacity signal): Jensen Huang says “we’re going to bring OpenAI to AWS” and frames it as driving “enormous consumption of cloud computing,” citing OpenAI as “completely compute constrained,” as stated in the keynote clip.

This is less about a product feature and more about demand routing: it signals that incremental frontier inference/training load may be explicitly shifted across clouds, which can change where engineers see capacity, pricing, and quota headroom first (and which vendors get prioritized for enterprise procurement paths).

Rohan Paul

@rohanpaul_ai

AWS is going to be busier than ever. "We're going to bring OpenAI to AWS. And so it's going to drive enormous consumption of cloud computing at AWS. It's going to expand the reach, expand the compute of OpenAI. And as you know, they are completely compute constrained. " Jensen …

Rohan Paul

@rohanpaul_ai

🚨 BREAKING: Nvidia expects $1 Trillion revenue from AI chips through 2027. "I'm here to tell you that right now where I stand, I see through 2027 at least $1 trillion." revenue opportunity. ~ Jensen Huang, at GTC 2026 in San Jose, California pic.x.com/8zIhmTKbyH

10:08 PM · Mar 16, 2026

🧱 NVIDIA GTC hardware roadmap (Rubin, Groq LPX, DGX Station)

Hardware-specific signals from GTC dominate: Vera Rubin performance claims, Groq LPX disaggregated inference, and workstation-class boxes (DGX/GB300). Excludes NemoClaw/OpenClaw (covered separately).

NVIDIA pitches Vera Rubin as an “inference inflection” platform with 700M tokens/sec

Vera Rubin (NVIDIA): At GTC 2026, NVIDIA positioned Vera Rubin as a full-stack “AI factory” step-change for agentic inference, following up on the CPU/agent bottleneck framing from CPU bottleneck preview, with claims like 700M tokens/sec and major perf-per-watt gains, as recapped in a keynote summary and reinforced by a tokens-per-second slide.

The keynote packaging is “7 chips, 5 rack systems” (GPU/CPU/networking/switch + Groq LPU in the same lineup), and NVIDIA also cited $1T+ of purchase-order visibility through 2027 in its growth framing, as shown in the growth slide.

• What changed vs prior gen: the Rubin-side spec sheet calls out 260 TB/s of all-to-all scale-up bandwidth and 700M tokens/sec versus a Hopper-era baseline, per the comparison slide.
• Deployment timeline signal: the first Rubin system is reportedly already live in Microsoft Azure and ships later this year, per the keynote recap.

NVIDIA details Groq 3 LPX inference rack: 315 PFLOPS and 128GB SRAM, due 2H26

Groq 3 LPX (NVIDIA/Groq): NVIDIA’s GTC materials describe Groq 3 LPX as a rack-scale inference accelerator aimed at the latency/throughput tradeoff, with a ship window labeled “Available 2H26” and specs including 315 PFLOPS, 128GB SRAM, and 40 PB/s memory bandwidth, as shown on the LPX spec slide.

The keynote recap also frames Groq LPX as part of a disaggregated inference architecture (Rubin handling prefill/attention and Groq taking decode FFN), calling out 35× higher inference throughput per megawatt in that combined design, per the keynote recap.

• Why engineers noticed: the LPX unit is presented explicitly as an SRAM-heavy complement to HBM-heavy GPUs; the slide highlights scale-up density (256 chips) and scale-up bandwidth (640 TB/s), per the hardware diagram.

A pre-production Dell Pro Max with NVIDIA GB300 shows up at a developer’s house

GB300 workstation-class box (Dell/NVIDIA): A pre-production Dell Pro Max with GB300 was delivered to a builder’s home, described as a ~100lb machine with 750GB+ unified memory for running large open-weight models locally, per the unboxing clip.

The visible signal here is less about a new SKU announcement and more about hardware getting into individual hands early, suggesting workstation/desktop “AI on your desk” configurations are becoming part of day-to-day model testing workflows, as implied by the “what should I test first?” framing in the same post.

NVIDIA talks up orbital data centers and a Space‑1 Vera Rubin module

Space computing (NVIDIA): Jensen Huang said NVIDIA is working toward “data centers out in space,” explicitly calling out the cooling constraint (“no conduction, no convection… just radiation”) in the space compute clip.

A separate keynote slide references a “Space‑1 Vera Rubin Module” and depicts the board/module plus a satellite concept render, per the Space‑1 slide photo.

This is still a concept-level signal in the tweets (no launch dates or SKUs), but it’s being presented as an extension of the same accelerated platform story NVIDIA is using for terrestrial “AI factories.”

NVIDIA announces DLSS 5; early reactions fixate on realism vs altered art direction

DLSS 5 (NVIDIA): NVIDIA unveiled DLSS 5 as “3D-guided neural rendering,” pitching photoreal lighting/material improvements in real time; the demos are circulating as before/after captures, including an off vs on video comparison.

The reception in the tweets is notably split: some describe the result as “more natural” and acceptable even if it shifts developers’ style, per the defense of DLSS 5, while others argue it makes frames look “staged in a photo studio,” per the lighting critique.

The keynote recap frames DLSS 5 as part of a broader “probabilistic rendering” direction (mixing structured 3D graphics with generative AI), per the GTC recap thread.

NVIDIA’s “AI Natives” slide tries to make the ecosystem legible in one picture

Ecosystem mapping (NVIDIA): NVIDIA circulated an “AI Natives” graphic enumerating 103 companies across categories (frontier model builders, agent frameworks/protocols, inference/model-to-production vendors, and vertical AI apps), as captured in the AI Natives infographic.

The practical implication for engineering leadership is that NVIDIA is treating “agent frameworks/protocols” and “inference frameworks” as first-class layers on the same slide as silicon and CUDA; in other words, it’s selling an integrated stack narrative, not just chips, per the same slide capture.

📦 Open model releases & partnerships (Mistral Small 4, Nemotron family)

Open-weight model news today is led by Mistral Small 4 and NVIDIA’s open-model push; includes model specs, licensing, and early positioning. Excludes detailed serving/kernel work (covered under inference).

Mistral releases Mistral Small 4: 119B MoE, 256k context, Apache 2.0

Mistral Small 4 (Mistral): Mistral Small 4 shipped as an Apache-2.0 open-weight model positioned as a unified “one model to do it all” checkpoint—119B total parameters with 128 experts / 4 active (about 6.5B active per token), 256k context, and multimodal input (text+image → text), plus “reasoning effort” that’s configurable per request as described in the spec summary and corroborated by the Hugging Face PR screenshot.

• Positioning vs prior Mistral lineup: Mistral-affiliated posts frame Small 4 as a big jump over prior “Small/Medium/Large” internal baselines, with a benchmark breakout shown in the benchmark chart.
• Where builders can touch it today: TestingCatalog notes it’s available in Mistral Playground and highlights an alias (“mistral-small-2603”) along with price tooltips—€0.13 / 1M input tokens and €0.51 / 1M output tokens—as shown in the Playground pricing tooltip.
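
For a feel of what these numbers imply, the sketch below works through the sparsity ratios and a per-request cost using only the figures quoted above. The `request_cost_eur` helper and the example token counts are illustrative assumptions; the prices are the Playground tooltip values as reported, not a confirmed API price list.

```python
# Figures quoted in the item above.
TOTAL_PARAMS_B = 119.0      # total parameters, in billions
ACTIVE_PARAMS_B = 6.5       # approximate active parameters per token, in billions
EXPERTS_TOTAL = 128
EXPERTS_ACTIVE = 4
PRICE_IN_EUR_PER_M = 0.13   # euros per 1M input tokens (Playground tooltip)
PRICE_OUT_EUR_PER_M = 0.51  # euros per 1M output tokens (Playground tooltip)

def request_cost_eur(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted per-million-token prices."""
    return (input_tokens * PRICE_IN_EUR_PER_M
            + output_tokens * PRICE_OUT_EUR_PER_M) / 1_000_000

active_param_share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
active_expert_share = EXPERTS_ACTIVE / EXPERTS_TOTAL

print(f"active parameters per token: {active_param_share:.1%} of total")
print(f"active experts per token: {active_expert_share:.1%} of experts")
# Hypothetical long-context request: 200k tokens in, 2k tokens out.
print(f"example request cost: EUR {request_cost_eur(200_000, 2_000):.4f}")
```

The active-parameter share (about 5.5%) is higher than the active-expert share (about 3.1%), which is what one would expect if some layers outside the experts run for every token, though the posts do not break this down.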

Weights and usage instructions are also circulating via the model page link and the Hugging Face model collection.

AiBattle

@AiBattle_

Mistral 4 is coming - "Mistral 4 is a powerful hybrid model with the capability of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three different model families - Instruct, Reasoning ( previous called Magistral ), and Devstral - …

6:02 PM · Mar 16, 2026

Mistral and NVIDIA announce partnership to co-develop frontier open-source models

Mistral AI (Mistral) + NVIDIA: Mistral announced a strategic partnership with NVIDIA to co-develop frontier open-source models, tying Mistral’s model architecture/full-stack offering to NVIDIA compute and dev tooling, as stated in the partnership announcement.

This is explicitly connected to the Nemotron Coalition—Mistral calls this the first joint project as it becomes a founding member, per the same partnership announcement.

Mistral AI

@MistralAI

🚀Announcing a strategic partnership with NVIDIA to co-develop frontier open-source AI models, combining Mistral AI’s frontier model architecture and full-stack AI offering with NVIDIA’s leading compute infrastructure and development tools.

8:30 PM · Mar 16, 2026

NVIDIA launches Nemotron Coalition for open frontier model development

Nemotron Coalition (NVIDIA): NVIDIA unveiled the Nemotron Coalition as a multi-lab effort to advance open frontier models; the GTC slide shows 8 founding members (Black Forest Labs, Cursor, LangChain, Mistral, Perplexity, Reflection, Sarvam, Thinking Machines) joining the initiative, as shown in the keynote slide.

The public signal here is governance + distribution: NVIDIA is positioning itself not just as the hardware layer but as the convenor for an “open frontier” roadmap, with the coalition framed as producing base models that downstream teams can specialize and deploy.

Black Forest Labs

@bfl_ml

We announced at @NVIDIAGTC that we're joining @nvidia's Nemotron Coalition to advance open frontier models. At BFL, we develop multimodal generative models for visual intelligence, ranging from images to real-time video and action prediction models. We've always been convinced …

10:03 PM · Mar 16, 2026

NVIDIA pitches Nemotron 3 Ultra as best open base model on GB200 NVL72

Nemotron 3 Ultra (NVIDIA): NVIDIA’s GTC deck positions Nemotron 3 Ultra as the “best open base model,” claiming 5× efficiency and the highest reasoning accuracy among the compared open models on GB200 NVL72, as shown in the keynote slide.

The slide breaks out reasoning performance by category (understanding/code/math/multilingual) and compares against GLM and Kimi K2, per the keynote slide; public details on weights, license, and evaluation artifacts weren’t included in the tweets themselves.

Chubby♨️

@kimmonismus

Ngl freaking hyped for Nemotron Ultra. NVIDIA is developing insane open source models. Still underrated.

1:39 AM · Mar 17, 2026

Mistral’s Leanstral-2603 targets Lean 4 proof assistant workflows

Leanstral-2603 (Mistral): Leanstral appeared as an open-source code agent aimed at Lean 4 (proof assistant) work. The model card screenshot describes it as part of the Mistral Small 4 family and repeats the Small 4 architecture traits (MoE with 128 experts / 4 active, 119B total / 6.5B active parameters, 256k context, and multimodal inputs); the release was also hinted at by the upload notification in the upload alert.

This is a concrete example of Mistral carving out domain-specific variants under the same “Small 4” umbrella rather than treating “reasoning/coding/math” as separate model lines.
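To make the sparsity figures quoted from the model card concrete, here is a minimal arithmetic sketch. It uses only the numbers cited above (128 experts / 4 active, 119B total / 6.5B active); the variable names are illustrative, and nothing here is taken from Mistral's own code.

```python
# Figures as quoted from the model card screenshot (in billions of parameters).
TOTAL_PARAMS_B = 119.0
ACTIVE_PARAMS_B = 6.5
NUM_EXPERTS = 128
ACTIVE_EXPERTS = 4

# Share of experts that fire for each token.
expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS

# Share of all parameters touched for each token.
param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"experts active per token: {expert_fraction:.1%}")
print(f"parameters active per token: {param_fraction:.1%}")
```

Roughly 3% of experts but about 5.5% of parameters are active per token. The gap is consistent with the usual MoE layout in which attention and embedding weights are shared and always active, though the screenshot does not break that down.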

Lisan al Gaib

@scaling01

Leanstral is part of the Mistral Small 4 family

Lisan al Gaib

@scaling01

Some math prover model by Mistral? link is dead again, just got the notif

7:26 PM · Mar 16, 2026

NVIDIA releases Nemotron 3 VoiceChat, a ~12B open-weights speech-to-speech model

Nemotron 3 VoiceChat (NVIDIA): NVIDIA released Nemotron 3 VoiceChat (V1) as an open-weights speech-to-speech model of roughly 12B parameters, with benchmarking that separates “conversational dynamics” (turn-taking, interruptions) from “speech reasoning,” per the benchmark breakdown.

• Reported scores: the post cites 77.8% on a Full Duplex conversational dynamics subset and 29.2% on Big Bench Audio speech reasoning, as detailed in the benchmark breakdown.
• Reality check vs closed models: the same thread notes a large gap to proprietary systems (examples listed include Step-Audio R1.1 at 96%), per the gap note.

This is one of the clearer “open weights are improving, but still behind in voice” datapoints in today’s feed.

Artificial Analysis

@ArtificialAnlys

NVIDIA has released Nemotron 3 VoiceChat! A ~12B parameter Speech to Speech model that leads our open weights Conversational Dynamics vs. Speech Reasoning pareto frontier. Understanding Speech to Speech model performance is multidimensional - two key and distinct dimensions are …

8:30 PM · Mar 16, 2026

Grok 4.20 Beta Reasoning hits #7 on Text Arena and #28 on Code Arena

Grok 4.20 Beta Reasoning (xAI): Arena reports Grok 4.20 Beta Reasoning at #7 on Text Arena and #28 on Code Arena, with Code Arena parity claims against DeepSeek-v3.2-thinking and Qwen3.5-122b-a10b, as shown in the leaderboard snapshot.

The post also notes it’s tied with GPT-5.4-high on the overall Text Arena score line, per the same leaderboard snapshot; treat this as an Arena snapshot (tooling, prompt mix, and time window matter) rather than a single definitive “reasoning” ranking.

Arena.ai

@arena

Grok 4.20 Beta Reasoning has landed #7 for Text Arena & #28 for Code Arena. The model is on par with DeepSeek-v3.2-thinking and Qwen3.5-122b-a10b in Code Arena's agentic webdev tasks. More Highlights: - #7 in Text Arena overall tied with GPT-5.4-high - top 10 in Math, …

9:10 PM · Mar 16, 2026

🚚 Serving & inference systems: vLLM/SGLang, speculative decoding, disaggregation

Runtime/serving engineers had a busy day: day‑0 support for new open weights, speculative decoding improvements, and new distributed/disaggregated inference building blocks. Excludes model releases themselves.

P‑EAGLE cuts speculative decoding passes by drafting K tokens in one forward pass (now in vLLM)

P‑EAGLE (Amazon + NVIDIA): vLLM highlighted P‑EAGLE, a speculative decoding variant that generates all K draft tokens in a single forward pass (instead of K autoregressive passes), reporting up to 1.69× speedup over vanilla EAGLE‑3 on NVIDIA B200 and sustained 5–25% gains at high concurrency (c=64), according to the P‑EAGLE summary.

• How to turn it on: the same post shows vLLM’s --speculative-config JSON with "parallel_drafting": true and "num_speculative_tokens": 7, plus pre-trained heads for GPT‑OSS 120B/20B and Qwen3‑Coder 30B in the P‑EAGLE summary.

The immediate engineering implication is that the speculative “drafter” step becomes less of a sequential micro-loop, which can matter a lot once you’re already bottlenecked on memory bandwidth and kernel launch overhead during decode.
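The difference in drafter work can be sketched with a toy cost model. This is an illustration under stated assumptions, not vLLM's or P‑EAGLE's implementation: `drafter_passes` is a hypothetical helper, and the JSON contains only the two keys quoted in the post. A real `--speculative-config` would presumably also name the draft method and model, which the post excerpt does not show.

```python
import json


def drafter_passes(num_speculative_tokens: int, parallel_drafting: bool) -> int:
    """Drafter forward passes per speculation round (toy model, hypothetical helper).

    Autoregressive drafting needs one pass per draft token; parallel drafting
    emits all K draft tokens from a single pass.
    """
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be >= 1")
    return 1 if parallel_drafting else num_speculative_tokens


K = 7  # value shown in the post's config

sequential = drafter_passes(K, parallel_drafting=False)
parallel = drafter_passes(K, parallel_drafting=True)
print(f"drafter passes per round: sequential={sequential}, parallel={parallel}")

# Only the two keys quoted in the post; other required keys are omitted here.
speculative_config = {"parallel_drafting": True, "num_speculative_tokens": K}
speculative_config_arg = json.dumps(speculative_config)
print(speculative_config_arg)
```

The pass count is not the speedup: the target model's verification pass still dominates, which is why the reported gain is up to 1.69× rather than 7×.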

vLLM

@vllm_project

P-EAGLE from @AmazonScience and @NVIDIAAIDev removes the sequential bottleneck in speculative decoding — all K draft tokens generated in a single forward pass. 📈 Up to 1.69x speedup over vanilla EAGLE-3 on NVIDIA B200, with 5-25% gains sustained at high concurrency (c=64). How …

8:00 PM · Mar 16, 2026

NVIDIA Dynamo 1.0 adds native vLLM support for disaggregated, topology-aware serving

NVIDIA Dynamo 1.0 (NVIDIA): vLLM’s team called out Dynamo 1.0 shipping with native vLLM support, positioning it around disaggregated serving, agentic-aware routing, and topology-aware Kubernetes scaling, per the Dynamo support note.

This lands as “inference control plane” plumbing rather than a model feature: the main new surface area is how requests and stages get routed/scaled in a multi-node deployment, not how a single GPU run behaves.

vLLM

@vllm_project

Great to see @NVIDIA Dynamo 1.0 ship with native vLLM support! Disaggregated serving, agentic-aware routing, and topology-aware K8s scaling — exciting building blocks for production distributed inference. 🚀 Thanks to the @NVIDIAAIDev Dynamo team!

NVIDIA AI Developer

@NVIDIAAIDev

Reasoning models are growing fast, and running them efficiently requires distributing workloads across multiple GPU nodes. NVIDIA Dynamo 1.0 delivers low-latency, high-throughput distributed inference for production AI deployments—while boosting NVIDIA Blackwell inference

12:14 AM · Mar 17, 2026

vLLM ships day-0 serving support for Mistral Small 4 (MLA backend, tool calling, reasoning mode)

vLLM (vLLM Project): vLLM announced day-0 support for serving Mistral Small 4 (119B MoE, 256k context), calling out the MLA attention backend, tool calling, and a configurable reasoning mode that’s verified on NVIDIA GPUs, per the launch note in the Day-0 support post.

The operationally relevant details are the launch-time knobs shown in the same snippet—--max-model-len 262144, --attention-backend FLASH_ATTN_MLA, --tool-call-parser mistral, and --reasoning-parser mistral—which are the pieces that tend to lag when a new open-weights model family lands and people want OpenAI-compatible endpoints plus structured tool output.
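As a rough sketch of how those flags fit together, the snippet below assembles a launch command from the four options quoted above. It assumes the `vllm serve` entry point, and the model identifier is a deliberate placeholder because the post excerpt does not give one; `build_serve_args` is a hypothetical helper, not part of vLLM.

```python
import shlex
from typing import List

# "256k context" expressed in tokens: 256 * 1024 matches the quoted --max-model-len.
CONTEXT_TOKENS = 256 * 1024


def build_serve_args(model_id: str) -> List[str]:
    """Assemble a launch command using only the flags quoted in the post."""
    return [
        "vllm", "serve", model_id,
        "--max-model-len", str(CONTEXT_TOKENS),
        "--attention-backend", "FLASH_ATTN_MLA",
        "--tool-call-parser", "mistral",
        "--reasoning-parser", "mistral",
    ]


# Placeholder only: substitute the real model identifier from the model card.
args = build_serve_args("<mistral-small-4-model-id>")
command = shlex.join(args)
print(command)
```

Building the argument list in code rather than a shell string makes it easy to check that the context length matches the model's advertised window before launching.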

vLLM

@vllm_project