GLM‑5 open weights debut at 744B – $1 input per 1M tokens


Executive Summary

Z.ai launched GLM‑5 (“Pony Alpha”), a text‑only MoE flagship at 744B params trained on 28.5T tokens; it ships with 200K context and up to 128K output, plus MIT‑licensed weights on Hugging Face (BF16 footprint ~1.5TB). Z.ai’s chart claims 77.8 on SWE‑bench Verified and 56.2 on Terminal‑Bench 2.0; Artificial Analysis pegs it at 50 on its Intelligence Index and 63 on its Agentic Index, with the lowest AA‑Omniscience hallucination rate attributed to more abstention; independent eval artifacts aren’t bundled. Distribution is unusually fast: OpenRouter publicly lists it; Ollama Cloud, Modal, vLLM, and SGLang posted day‑0 routes/recipes; W&B Inference offered $20 credits, then users reported the endpoint disappearing.

Z.ai rollout ops: traffic up ~10×; compute “very tight”; phased access and Coding Plan price changes set for Feb 11.
Codex in production: OpenAI describes a 3‑engineer harness steering ~1,500 PR merges; NVIDIA rollout targets ~30k engineers with US‑only processing.

The open‑weights story is colliding with serving reality: FP8 endpoints and tensor‑parallel configs show up immediately, while throughput limits and capacity constraints may dominate long‑horizon agent runs more than token pricing.


Feature Spotlight

GLM‑5 revealed: open weights frontier model + rapid ecosystem availability

GLM‑5 (744B/40B active, MIT) lands as the new top open‑weights contender with strong agentic/coding evals and fast provider rollout—meaning cheaper near‑frontier capability and more viable self/third‑party deployment paths now.




🐎 GLM‑5 revealed: open weights frontier model + rapid ecosystem availability

Dominant story today: Z.ai’s GLM‑5 ("Pony Alpha") drops with open weights, strong agentic/coding benchmarks, aggressive pricing, and day‑0 availability across multiple providers. This category includes GLM‑5 rollout, benchmarks, pricing, and where you can run it; other categories exclude GLM‑5 to avoid duplication.

Artificial Analysis crowns GLM‑5 the top open‑weights model and attributes its low hallucination rate to abstention

GLM‑5 (Artificial Analysis): Artificial Analysis reports GLM‑5 reaches 50 on its Intelligence Index—making it the new open‑weights leader—and posts a large jump on agentic benchmarks plus a major reduction in hallucinations via more frequent abstention, according to the full AA breakdown and the updated score note in score update.

Agentic positioning: AA puts GLM‑5 at 63 on the Agentic Index, near top proprietary systems, as shown in the agentic index chart.
Hallucination framing: GLM‑5 shows the lowest hallucination rate on AA‑Omniscience among models displayed, per the hallucination chart.

AA also notes ~110M output tokens were used to run their full suite vs ~170M for GLM‑4.7, as shown in the token usage chart, which is a practical “how expensive is evaluation” signal even if it’s not a perfect proxy for real workload efficiency.

Z.ai launches GLM‑5 with open weights and a 200K context window

GLM‑5 (Z.ai): Z.ai formally launched GLM‑5—a new flagship MoE model scaling to 744B total params (40B active) and 28.5T pretraining tokens, positioned for long-horizon agentic work, as described in the launch thread and detailed in the Tech blog; it’s already selectable in the Z.ai chat UI per the Z.ai model picker.

The release also pins down operational specs that matter for builders—text-only, 200K context, and 128K max output, as shown in the Context and output card. Following up on architecture hints (DeepSeek-style sparse attention + MoE scaffolding), the public release makes GLM‑5 a concrete option rather than a rumor.

GLM‑5 weights land on Hugging Face under MIT license

GLM‑5 (Hugging Face): The zai-org/GLM‑5 weights are now public on Hugging Face under an MIT License, with community notes emphasizing the native BF16 release size (roughly 1.5TB) and day‑0 compatibility with common tooling, according to the HF availability note and the official Model card.

This matters because MIT licensing makes downstream packaging (agents, fine-tunes, internal deployment) much simpler than research-only terms, but the BF16 footprint sets a high bar for self-hosting unless you rely on FP8 provider endpoints (which show up repeatedly elsewhere in today’s threads, such as the provider availability summary).
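The ~1.5TB figure is consistent with simple arithmetic: 744B parameters at 2 bytes per BF16 weight is roughly 1.49TB before tokenizer and index files. If you do want the raw weights, here is a minimal pull sketch (repo ID taken from the model card; make sure the target volume can actually hold the shards):

```python
from huggingface_hub import snapshot_download

# Downloads ~1.5TB of BF16 shards; point local_dir at storage sized for them.
snapshot_download(
    repo_id="zai-org/GLM-5",
    local_dir="/data/models/glm-5",
)
```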

Z.ai says traffic jumped 10×; GLM‑5 rollout is gated by tight compute and plan repricing

Z.ai (GLM‑5 rollout ops): Z.ai says user traffic increased roughly tenfold and it’s actively scaling capacity, per the scaling note and the standalone traffic update.

It also warns “compute is very tight” and describes a phased rollout (starting with certain paid tiers) plus GLM Coding Plan pricing adjustments effective Feb 11, 2026, as laid out in the pricing and rollout post. The net for teams is that “open weights exist” doesn’t automatically mean “unconstrained capacity,” especially for long-context agent runs.

Arena shows GLM‑5 as the top open model in Text, landing near #11 overall

GLM‑5 (Arena): Arena reports GLM‑5 is now the #1 open model on the Text leaderboard and sits around #11 overall, with a displayed score near 1452 and a still-growing vote count, according to the leaderboard announcement and the screenshot in the rank evidence.

This is a different kind of signal than benchmark charts: it’s a preference-driven arena score with wider noise sources, but it’s also a direct read on how humans are experiencing the model in head-to-head comparisons.

Early GLM‑5 user reports: long-running agent workflows look strong; UI polish is mixed

GLM‑5 (early usage): Early practitioner posts describe GLM‑5 as a noticeable jump over prior GLM releases—one summary says “GLM‑5 feels like a big update” in the voxel comparison, while another frames it as “competitive…level with Opus 4.5” in the vibe check.

24-hour agent run demo

Long-horizon agent behavior: Z.ai is explicitly leaning into “long-task era” narratives, including a reported 24+ hour single-agent run with 700+ tool calls in the long-task clip, which is more representative of real agent harness stress than single-turn codegen.
Frontend/design taste: The same vibe checks that praise agentic performance also call out “taste” gaps in visual/frontend outputs (for example voxel scene completeness) per the voxel comparison—a common pattern when models are tuned primarily for tool-use and long workflows.

Separate threads also flag practical throughput constraints (tens of tokens/sec across providers) as a limiting factor for “agentic” use, as shown in the throughput comparison, even when per-token pricing is aggressive.

Ollama ships GLM‑5 on its cloud with launch commands for agent CLIs

Ollama (GLM‑5): Ollama announced GLM‑5 on Ollama’s cloud with ollama run glm-5:cloud and explicit ollama launch commands that let you point tools like Claude Code, Codex, OpenCode, and OpenClaw at GLM‑5 as the backend, per the Ollama launch thread and the command list shown in app launch list.

Ollama also said it’s increasing capacity due to demand, according to the capacity update, which lines up with the broader “compute is tight” theme across GLM‑5 surfaces.
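For tools that speak the OpenAI API rather than ollama launch, the local daemon’s OpenAI-compatible endpoint works as usual; a minimal sketch, assuming the glm-5:cloud tag from the launch thread:

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API locally; the :cloud tag transparently
# proxies to Ollama's hosted capacity. The API key is required by the client but unused.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="glm-5:cloud",
    messages=[{"role": "user", "content": "Outline a migration plan for this repo."}],
)
print(resp.choices[0].message.content)
```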

SGLang posts a GLM‑5‑FP8 server launch recipe with EAGLE speculative decoding

SGLang (GLM‑5): LMSYS/SGLang announced day‑0 support for GLM‑5 and published a launch_server command targeting zai-org/GLM‑5‑FP8, including EAGLE speculative decoding and the same GLM tool/reasoning parsers seen across other stacks, according to the SGLang cookbook post.

The launch recipe bakes in --tp-size 8 and a fixed memory fraction (--mem-fraction-static 0.85), which are the knobs teams typically end up rediscovering the hard way when trying to stabilize throughput under long-context workloads.
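A sketch of what consuming that recipe looks like, with the flags named in the post reproduced as comments (copy the EAGLE speculative-decoding and parser flags verbatim from the cookbook; they are omitted here rather than guessed):

```python
# Assumed server launch, per the SGLang cookbook post:
#   python -m sglang.launch_server --model-path zai-org/GLM-5-FP8 \
#       --tp-size 8 --mem-fraction-static 0.85   # plus EAGLE + parser flags
from openai import OpenAI

# SGLang serves an OpenAI-compatible API (default port 30000).
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="zai-org/GLM-5-FP8",
    messages=[{"role": "user", "content": "Write a unit test for a retry decorator."}],
)
print(resp.choices[0].message.content)
```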

Modal hosts GLM‑5 as a free-for-a-limited-time endpoint for agent frameworks

Modal (GLM‑5): Modal announced GLM‑5 availability via a hosted endpoint and says it will be free for a limited time, positioning it as a plug-in backend for agent frameworks like OpenClaw/OpenCode, per the Modal announcement and the accompanying writeup in the Modal blog.

Modal GLM-5 endpoint demo

The practical angle here is the reduction in “bring-your-own-serving” friction for a model whose raw BF16 footprint would otherwise push most teams toward managed FP8 providers.

W&B Inference adds GLM‑5 day‑0 with tracing and credits, but availability looks fluid

W&B Inference (GLM‑5): Weights & Biases announced GLM‑5 is live on W&B Inference, describing an OpenAI-compatible API and Weave tracing integration in the W&B launch post, then offered $20 inference credits for early users per the credits offer.

One follow-on note suggests the endpoint may have been pulled or temporarily unavailable (“it’s gone”), per the availability note, which is a reminder that “day‑0” distribution can still be operationally unstable even when the model weights are public.
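If the endpoint is live for you, the announced shape is “OpenAI-compatible plus Weave tracing”; a hedged sketch (the base URL and model ID are assumptions, check the W&B launch post):

```python
import weave
from openai import OpenAI

weave.init("glm5-eval")  # Weave auto-patches the OpenAI client, so calls are traced

client = OpenAI(
    base_url="https://api.inference.wandb.ai/v1",  # assumed W&B Inference endpoint
    api_key="YOUR_WANDB_API_KEY",
)
resp = client.chat.completions.create(
    model="zai-org/GLM-5",  # assumed model ID
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```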


🧰 Codex in production: harness engineering + enterprise rollouts

What’s new today is less about model hype and more about operationalizing Codex: OpenAI describes the harness (tests/linters/observability/UI automation) that makes agent output mergeable, plus enterprise rollout details. Excludes GLM‑5 (covered as the feature).

OpenAI details a Codex harness that merged ~1,500 PRs with zero manual coding

Codex harness engineering (OpenAI): OpenAI described how a 3‑engineer team “steering Codex” shipped a product by opening and merging ~1,500 PRs into a ~1M‑line repo without writing code by hand, by building a tight harness around the agent—tests/linters, repo-specific instructions, isolated environments, UI automation, and observability loops, as outlined in the case study post and the article screenshot.

The practical shift is that throughput comes from automatic, repeatable validation (the harness), not from longer prompts; OpenAI’s writeup calls out patterns like using a concise AGENTS.md that points into a docs/ knowledge base (kept honest via CI), spinning up per‑git‑worktree app environments, driving UI checks via Chrome DevTools Protocol, and exposing logs/metrics/traces so agents can query systems (e.g., LogQL/PromQL) during iteration, per the Harness engineering post.
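One of those patterns is easy to make concrete: “kept honest via CI” can be as simple as failing the build when AGENTS.md references docs that no longer exist. A minimal sketch (the file layout is an assumption, not OpenAI’s actual harness):

```python
"""CI guard: every docs/ path mentioned in AGENTS.md must exist in the repo."""
import re
import sys
from pathlib import Path

text = Path("AGENTS.md").read_text()
refs = sorted(set(re.findall(r"docs/[\w./-]+", text)))
missing = [r for r in refs if not Path(r).exists()]
if missing:
    sys.exit(f"AGENTS.md references missing docs: {missing}")
print(f"AGENTS.md OK: {len(refs)} doc references verified")
```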

OpenAI rolls Codex to ~30k NVIDIA engineers with enterprise controls

Codex at NVIDIA (OpenAI): OpenAI says Codex is rolling out company-wide at NVIDIA to ~30k engineers, with cloud-managed admin controls plus US-only processing and fail-safes, according to the enterprise rollout note.

The operational detail here is the emphasis on jurisdictional processing and admin control surfaces—features teams typically need before they can standardize an agent in a regulated engineering org—as echoed in NVIDIA’s rollout reactions in the rollout graphic.

OpenAI publishes 10 operational tips for multi-hour agent workflows

Shell + Skills + Compaction (OpenAI Devs): OpenAI published a set of practical reliability patterns for multi-hour agent runs—explicitly aimed at long workflows that keep making progress without babysitting—following up on Server compaction (server-side context compression) with more concrete runbook-style guidance in the tips announcement.

Their framing is that you combine a hosted shell for real execution, reusable skills as packaged capabilities, and compaction to keep context stable over hours; the detailed writeup lives in the Tips post.

Codex Alpha desktop app opens Windows waitlist (Linux build also listed)

Codex Alpha app (OpenAI): An early access waitlist for the Codex Alpha desktop app surfaced with Windows as a target OS and Linux also listed as an option, per the waitlist screenshot.

This is a concrete distribution signal that Codex’s “agent app” UX is moving beyond macOS-heavy early adopters into broader enterprise workstation coverage.

Harvey: Codex helps engineers run parallel approaches, then converge on design

Codex at Harvey (OpenAI): OpenAI shared a usage pattern from Harvey—engineers use Codex to explore multiple approaches in parallel and converge faster, shifting human time toward system design and harder decisions, as shown in the Harvey workflow clip.

Harvey parallel approaches

The notable workflow claim is “parallel exploration then converge,” which fits the harness-first theme: agents generate options, while humans arbitrate architecture and tradeoffs.

Altman signals Codex is “winning” faster than expected

Codex adoption (OpenAI): Sam Altman wrote that he expected Codex to “eventually win” but is “pleasantly surprised” it’s happening so quickly, explicitly crediting builders for the acceleration in the Altman comment.

This aligns with broader (if anecdotal) chatter that “nearly all of the best engineers… are switching from claude to codex,” as quoted in the switching claim, and with blunt preference statements like “no idea why people would still be using Claude” in the preference repost.

Codex CLI 0.99 ships /statusline and better concurrent shell execution

Codex CLI 0.99 (OpenAI): A new Codex CLI release adds /statusline to customize the TUI footer metadata and changes shell command handling so direct commands no longer interrupt an in-flight turn, as shown in the release notes screenshot.

These are small, workflow-level improvements, but they target the two things that tend to break “agent as daily driver”: situational awareness (statusline) and terminal concurrency.

OpenAI’s Atlas browser team says Codex wrote over half the codebase

Atlas built with Codex (OpenAI): A long interview clip describes the team building OpenAI’s agentic browser Atlas and claims “more than half of Atlas’s code was written by Codex,” with Codex also used for navigating Chromium, prototyping UI, and learning implementation techniques, per the podcast segment.

Atlas podcast clip

This is another concrete “Codex in production” datapoint: not a benchmark claim, but a statement about how senior engineers are using it inside a large, legacy codebase.

OpenAI presents Codex steering practices at Pragmatic Summit

Pragmatic Summit (OpenAI DevRel): OpenAI’s developer team posted from Pragmatic Summit, pitching what it means to “steer an engineering team in an agent-first world” based on internal Codex usage, with a public demo invite in the summit session clip and follow-up logistics in the demo invite.

Pragmatic Summit clip

This mainly reads as field-positioning: Codex is being framed as an org-level system with steering, not a single-user coding assistant.


🟤 Claude product updates: free plan upgrades + Claude Code in Slack

Anthropic expanded free-tier capabilities and pushed more “work OS” features into Claude/Claude Code surfaces (connectors, skills, compaction, Slack workflows). Excludes GLM‑5 and Codex harness story (feature + Codex category).

Claude Code can run in an open-source sandbox runtime with isolation controls

Claude Code (Anthropic): Claude Code can opt into an open-source sandbox runtime via /sandbox, with both file and network isolation; the post notes Windows support “coming soon,” as described in Sandboxing tip with the runtime linked in Sandbox runtime repo.

This is about fewer permission prompts while keeping containment.

Claude Code plugins can install LSPs, MCPs, and skills via marketplaces

Claude Code (Anthropic): Claude Code’s /plugin flow can install LSPs, MCP servers, skills, and other components; the post also calls out the ability to run company/private marketplaces and check settings into version control, as described in Plugins tip with docs in Plugin marketplace docs.

This is an ops surface. It’s how teams standardize toolchains.

Anthropic’s advanced tool use framing resurfaces: tools as determinism, not browsing

Claude Developer Platform (Anthropic): An Anthropic engineering write-up on “advanced tool use” is being recirculated, framed around saving time/tokens and improving determinism when sites expose bespoke tools, as referenced in Advanced tool use link with details in Engineering post.

This aligns with the broader “tool interfaces beat UI automation” direction, but the tweets don’t include new metrics.

Claude Code exposes Low/Medium/High effort levels via /model

Claude Code (Anthropic): Claude Code supports an explicit effort level selection via /model; the thread frames it as a speed/cost vs quality dial (Low/Medium/High), as outlined in Effort level tip.

This is a product-level acknowledgement that “same model” is not one behavior.

Claude Code’s Slack app gets a public install entry point

Claude Code in Slack (Anthropic): Anthropic shared an install entry point for the Slack app in Install link post, pointing to setup documentation in Slack app docs.

Distribution is the change here. It’s no longer “internal beta only” vibes.

Claude Code status lines let you surface model, context, and cost inline

Claude Code (Anthropic): Claude Code supports custom status lines shown below the composer; the thread calls out showing model, directory, remaining context, and cost, with setup via /statusline, as described in Status line tip and detailed in Status line docs.

This is small but operational. It reduces “what state am I in?” confusion.

Claude Code terminal setup adds shift+enter newlines across more terminals

Claude Code (Anthropic): The customization thread highlights /terminal-setup for enabling shift+enter newlines (avoiding backslash line continuation) when running in IDE terminals and apps like Warp/Alacritty, as described in Terminal config tip and detailed in Terminal setup docs.

This targets a real papercut. It’s about writing multi-line prompts faster.

Claude Code keybindings are fully remappable with live reload

Claude Code (Anthropic): Claude Code allows customizing every keybinding via /keybindings, with settings hot-reloading so you can feel changes immediately, as described in Keybindings tip and documented in Keybindings docs.

This matters for teams standardizing a workflow across editors and terminals.

Plan Mode gets pushback from builders who want a persistent plan artifact

Plan Mode workflows (Claude Code): Alongside the Slack Plan Mode announcement in Plan Mode announcement, a counterpoint argues “plan mode sucks” and describes keeping the plan in a dedicated doc that doesn’t get compacted, as shown in Plan Mode debate screenshot.

This is an emerging split: embed planning in the agent loop vs keep a long-lived human-readable artifact.

Claude Code CLI benchmark shows claude --version ~15× faster in next build

Claude Code CLI (Anthropic): A benchmark claims the next Claude Code version makes claude --version ~15× faster—about 12 ms vs 180 ms—as shown in the hyperfine output shared in CLI benchmark.

It’s small, but it signals continued focus on CLI responsiveness.


🧑‍💻 Cursor & editor copilots: higher limits and model routing ergonomics

Tactical shipping updates for people using Cursor-style IDE agents: quota/limit changes and practical model allocation decisions. Excludes GLM‑5 (feature).

Routing heuristic: Composer/Opus for live iteration, Codex for background work

Model routing (practice): A practical allocation pattern is circulating: use Composer 1.5 + Opus 4.6 for “sync work” (interactive, fast feedback loops) and switch to GPT‑5.3 Codex for “async work” (longer, background-style tasks), as described in the Routing heuristic post—notably framed as a workflow choice, not a benchmark race.

The same post ties into Cursor’s temporary quota headroom—individual plans get 3× more Composer 1.5 than Composer 1, with a limited-time bump to 6× through Feb 16, as stated in the Limits increase note—which makes “use the faster/cheaper model while you’re present” a more viable default for day-to-day iteration.
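As a routing rule this fits in a few lines; a sketch of the heuristic as stated (model names are the post’s labels, not verified API identifiers):

```python
def pick_model(interactive: bool, needs_depth: bool = False) -> str:
    """Sync work optimizes for iteration speed; async work can afford slower, stronger runs."""
    if interactive:
        return "claude-opus-4.6" if needs_depth else "composer-1.5"
    return "gpt-5.3-codex"  # background/long-horizon tasks
```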


🔷 Google Gemini developer surfaces: AI Studio UX + NotebookLM styles + Gemini 3.1 signals

Google’s builder UX got multiple small-but-real workflow updates today (AI Studio navigation/omnibar, design tooling modes) alongside renewed Gemini 3.1 preview chatter. Excludes GLM‑5 (feature).

Gemini 3.1 Pro Preview reference appears in public model listings

Gemini 3.1 Pro Preview (Google): Multiple watchers report seeing “Gemini 3.1 Pro Preview” referenced in model lists, suggesting an upcoming release or staged rollout; one example is a listing screenshot shared in the Artificial Analysis screenshot, with additional sightings echoed in the Model list highlight.

Treat this as a “surface signal,” not a spec drop: the tweets don’t include context window, pricing, or API availability details—only the name appearing in listings, as documented in the Artificial Analysis screenshot.

Google AI Studio redesign focuses on fast resume and an Omnibar

Google AI Studio (Google): A redesigned home page is rolling out that makes it easier to jump back into prior chats and “vibe-coded apps,” check usage, and start new work from a central Omnibar, as shown in the Homepage walkthrough and reiterated in the Build for speed clip.

Homepage navigation demo

Navigation and retrieval: The emphasis is “get back to past chats” and “jump back to a past vibe coding session,” with a global keyboard shortcut (Ctrl + /) called out in the Build for speed clip.
Ops visibility: The new surface highlights usage and project entry points, with a concrete view of the “Jump back in” list and usage panel visible in the UI screenshot.

NotebookLM adds infographic style presets in testing

NotebookLM (Google): NotebookLM is testing infographic customization with an auto-selection mode plus nine explicit styles—sketch, kawaii, professional, anime, 3D clay, editorial, storyboard, bento grid, and bricks—as shown in the Styles preview video.

Infographic style picker demo

This is a small UX change, but it matters for teams using NotebookLM outputs in external docs: it turns “same content, different visual treatment” into a first-class control, per the Styles preview video.

Stitch adds direct export to Figma with editable layers

Stitch (Google): Stitch now supports direct export of generated designs to Figma with editable layers, framed as a long-requested capability in the Export demo.

Export to Figma demo

This changes the handoff path for teams: it turns Stitch output into a native design artifact rather than a screenshot-to-rebuild step, as shown in the Export demo.

Stitch introduces an Ideate mode for solution exploration

Stitch (Google): Stitch gained an Ideate mode positioned as “Bring a problem to solve and see solutions,” expanding beyond redesign-style workflows; the mode picker and prompt framing are visible in the Ideate mode screenshot.

The same UI capture also shows an “Export to Figma” callout in-product, but the tweet’s concrete change is the new Ideate workflow and its intent (“problem → solutions”), as documented in the Ideate mode screenshot.


🧑‍✈️ Agent orchestration & ops tooling: cloud runners, registries, memory, and multi-session UX

Ops-layer tooling for running many agents reliably: cloud agent platforms, registries, agent memory, and multi-session management in editors/terminals. Excludes GLM‑5 provider rollout (feature).

Devin Review hits 40k+ daily runs and adds one-click fixes, merge, and REVIEW.md

Devin Review (Cognition): Two weeks after launch, Devin Review is reportedly running 40,000+ times per day; the team added one-click apply fixes, a merge button, REVIEW.md support, and comment mentions, per the feature update demo.

Devin Review update demo

This is a clear scaling signal for agent ops: PR-level automation is moving from “demo” to “high-volume workflow surface,” and the shipped features are aimed at collapsing the loop from review → edits → merge.

Warp open-sources the Oz Skills pack used for coding-agent automations

Oz Skills (Warp): Following up on cloud agents launch, Warp open-sourced the set of Skills they built into Oz—packaged automations for agentic chores like accessibility audits, docs updates, and test-coverage improvements, as announced in the skills open-source thread.

Oz Skills automation demo

What changes for teams: instead of re-creating “house style” automations per harness, you can install/inspect the same Skill definitions and reuse them across agent runners, per the skills open-source thread and the GitHub repo.

This is another data point that “skills as artifacts” is becoming the portability layer between agent products.

RepoPrompt 2.0 adds built-in agent mode and Codex app-server integration

RepoPrompt 2.0 (RepoPrompt): RepoPrompt shipped v2.0 with a built-in Agent mode that uses its MCP tools more fully, plus first-class support for Codex via its app server, while also supporting Claude Code and Gemini CLI, per the release notes and the changelog link.

This is part of a wider ops trend: “context builder + execution harness” tools are converging into products that sit between your repo and whichever agent you run.

Warp agent adds Skills: save to .agents, browse with /skills, edit with /edit-skill

Warp agent (Warp): Warp’s built-in agent now supports Skills stored in a local .agents/ folder; you can search them with /skills and modify them via /edit-skill in a rich viewer, as shown in the skills support demo.

Warp agent Skills UI

The operational impact is that teams can treat Skills as versionable repo artifacts (reviewable diffs) rather than ad-hoc prompts floating in chat history.

Warp ships an /oz Skill to let other agents manage Oz cloud runs

/oz Skill (Warp): Warp released an /oz Skill that lets other coding agents (Claude Code, Codex, OpenCode, etc.) query Oz cloud-agent runs, update schedules, and modify Docker environment dependencies, as demonstrated in the oz skill demo.

Oz Skill from another agent

This is a concrete interoperability move: orchestration state (runs, schedules, env) becomes tool-callable from whichever harness your team prefers.

Zed v0.223 adds URL-launched Agent Panel and terminal-to-thread capture

Zed v0.223 (Zed): Zed shipped deep multi-session UX improvements for agent workflows: you can open the Agent Panel via a custom URL (zed://agent?prompt=...) and send terminal selections into an agent thread via a context-menu action, as shown in the release demo.

Zed agent panel URL launch

These are small primitives, but they reduce the friction of “turn output into context” when you’re running multiple agent threads and iterating fast.

agent-browser crosses 500k weekly downloads a month after launch

agent-browser (open source): The agent-browser project crossed roughly 500,000 weekly downloads about one month after being launched and open-sourced, according to the downloads screenshot.

For ops-minded teams, that adoption curve suggests “agent-capable browser primitives” are becoming standard dependencies—raising the bar on reliability, observability, and safety defaults for web-task execution.

LangSmith Agent Builder explains its memory system for repeatable autonomous tasks

LangSmith Agent Builder memory (LangChain): LangChain shared how they designed memory into Agent Builder from the start—storing reusable instructions and learning from feedback, with portability via Markdown/JSON formats, as described in the memory deep dive.

Agent memory walkthrough

The practical ops angle is that memory becomes an artifact you can migrate across harnesses (and review), instead of a proprietary per-app toggle that behaves unpredictably.

Zed’s ACP Registry adds Junie (JetBrains) and Kimi CLI agents

ACP Registry (Zed): Zed highlighted growing agent availability via its ACP Registry—calling out new installables including Junie (JetBrains) and Kimi CLI (Moonshot), per the registry screenshot.

The immediate value is operational: “agent choice” moves from per-tool setup to a registry install step, which matters once teams are running multiple specialized agents in parallel.


🔎 Codebase intelligence & context extraction: Q&A over repos, ripping dependencies, and doc parsing

Tools and patterns for turning repos/docs into agent-ingestible context: repo Q&A, targeted code extraction, diagram-to-graph conversion. Excludes GLM‑5 (feature).

DeepWiki MCP plus GitHub CLI is being used to extract small, self-contained modules from large deps

DeepWiki MCP (Context-to-code extraction): Karpathy reports a workflow where an agent uses DeepWiki via MCP plus GitHub CLI to locate the real implementation details inside a dependency, then re-implements only the needed slice with tests—he describes getting ~150 lines of self-contained FP8 training code that let him drop torchao and even run ~3% faster in one case, per his DeepWiki MCP workflow. The point is less “read the repo” and more “give the agent a repo-explainer API, then ask it to carve out a minimal equivalent.”

DeepWiki URL swap turns any GitHub repo into an instant Q&A surface

DeepWiki (Context extraction): A lightweight trick—swap github.com to deepwiki.com—creates auto-generated wiki pages plus repo-grounded Q&A, which Karpathy says often beats stale library docs because “the code is the source of truth,” as described in his DeepWiki usage thread. This is showing up as a practical way to answer implementation questions (e.g., internal FP8 details) without first finding the “right” doc page.

Diagram-to-Mermaid parsing turns dense PDFs into LLM-ingestible graphs

LlamaCloud (LlamaIndex): A diagram parsing feature is being demoed that converts complex diagrams inside PDFs/PowerPoints into Mermaid plaintext so LLMs can reason over structure without “burning” extra vision tokens; the before/after is shown in the Diagram to mermaid example, alongside a pointer to Anthropic’s multi-agent architecture diagram in the Agent architectures report.

This is a direct bridge from visual documentation into graph-shaped context that can be versioned, diffed, and fed into agents.

Agent-assisted code extraction is pushing a ‘bacterial code’ philosophy for libraries and deps

Software malleability (Design signal): Following the same DeepWiki MCP experience, Karpathy argues agents make it economical to “rip out the exact part you need,” which could change how software is written—favoring self-contained, stateless, easy-to-extract modules (“bacterial code”) over tangled dependency graphs, as framed in his Malleable software argument. Scott Wu echoes that as agents write less code directly, interfaces that help humans and agents ask precise questions against reality (code + surrounding context) become the new bottleneck, per his Interface matters take.

Doc Q&A agents are using a virtualized filesystem plus bash to harvest context deterministically

Doc ingestion pattern (agent-browser, json-render): ctatedev describes “Ask AI” endpoints that spin up a virtualized filesystem and run deterministic bash commands to traverse docs, extract relevant files, and assemble context for answers—positioned as a fast, inspectable alternative to black-box browsing, per the Just-bash doc search example.

The same pattern is being applied across multiple doc sites, including the surfaces cited in Project docs and Agent-browser docs.
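The core of the pattern is small; a deterministic harvest sketch (paths and limits are illustrative, not ctatedev’s implementation):

```python
from pathlib import Path

def harvest(docs_root: str, query: str, budget: int = 20_000) -> str:
    """Walk docs in a stable order, keep files matching the query, stop at a byte budget."""
    chunks, used = [], 0
    for path in sorted(Path(docs_root).rglob("*.md")):  # sorted => reproducible runs
        text = path.read_text(errors="ignore")
        if query.lower() in text.lower():
            take = text[: budget - used]
            chunks.append(f"## {path}\n{take}")
            used += len(take)
            if used >= budget:
                break
    return "\n\n".join(chunks)
```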

opensrc CLI adds a one-command “give my agent the source” flow

opensrc (ctatedev): A new CLI flow—npx opensrc <package|repo>—clones the resolved upstream repo at a detected version into a local directory, explicitly positioned as a way to “give it the source” when agents need deeper context, as shown in the CLI output screenshot.

This is a concrete alternative to doc-only context packing: pull the exact source snapshot first, then point tools/agents at a stable filesystem path.

A single repo tries to standardize “Generative UI” building blocks for agentic apps

Generative UI (CopilotKit): CopilotKit published a consolidated resource repo that frames “GenUI” as agentic UI specs and groups three implementation patterns—MCP Apps (sandboxed iframe apps), Google’s A2UI (declarative JSON UI), and CopilotKit’s AG‑UI (state/protocol sync), as captured in the Repo screenshot.

It reads like an attempt to make UI artifacts as portable and inspectable as prompts/tools—so agents and frontends can share a common schema.

Property-based testing is being pitched as the safety rail for dependency extraction refactors

Equivalence testing pattern: A recurring tactic for ripping functionality out of a dependency is to keep a thin bridge to the original implementation and use property-based tests to assert behavioral equivalence across many generated inputs, as suggested in the Property-based testing tip. This pairs naturally with agent-written re-implementations: fast extraction, then high-coverage behavioral checks.
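With Hypothesis the bridge is a few lines; a sketch where old_dep is the original library and extracted is the agent-written slice (both module names hypothetical):

```python
from hypothesis import given, strategies as st

import old_dep     # original dependency, kept importable during the migration
import extracted   # the carved-out reimplementation under test

@given(st.binary(max_size=4096))
def test_behavioral_equivalence(payload):
    # Same generated input must produce the same output from both implementations.
    assert extracted.parse(payload) == old_dep.parse(payload)
```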


🦞 OpenClaw ecosystem: power-user workflows, scaling pains, and trust issues

OpenClaw remains a high-signal community harness, but today’s tweets are about operational friction (usability, rate limits) and trust boundaries (scraping/stargazer spam). Excludes GLM‑5 (feature).

OpenClaw power-user walkthrough shows a “Codex + Opus” operating setup

OpenClaw (open source): A detailed power-user walkthrough shows how OpenClaw gets used as the “glue layer” across daily knowledge work—personal CRM, KB, content pipeline, X search, analytics tracking, automations, backups, and memory—while routing execution across GPT‑5.3 Codex and Opus 4.6, as demonstrated in the Workflow video rundown.

OpenClaw workflow walkthrough

The author also published the exact prompts behind those workflows in a public artifact, as linked in the Prompt pack follow-up and captured in the Prompt pack gist. For teams evaluating agent harnesses, this is a concrete “here’s the scaffolding” example rather than a generic endorsement.

OpenClaw maintainer signals workload pressure and steps back briefly

OpenClaw (open source): The project’s creator publicly said they “need a break,” signaling maintainer bandwidth as a real constraint when a community harness scales quickly, per the Maintainer comment.

The surrounding replies also show users explicitly asking for roadmaps and more features, which frames the “scaling pains” as expectation management as much as engineering throughput, as captured in the Roadmap pressure screenshot.

OpenClaw stargazers reportedly targeted via GitHub scraping for cold email

OpenClaw (open source): A complaint alleges a startup scraped the list of users who starred OpenClaw and emailed them (“I noticed that you starred OpenClaw”), raising a practical trust issue for open-source adoption funnels and the privacy surface of GitHub’s API, per the Stargazer email report.

The post frames this as both a growth tactic and an ecosystem problem—if “who starred what” is easily extractable at scale, dev tools with large star counts become easy outbound targets, as argued in the same Stargazer email report.

ClawHub gets blunt usability feedback: “unusable”

ClawHub (OpenClaw ecosystem): A user posted a short clip calling ClawHub “unusable,” highlighting day-to-day friction in the ecosystem UI layer even when the underlying agent harness is popular, as shown in the ClawHub complaint.

ClawHub usability clip

The critique is about operational UX (interaction lag and control issues), not model quality—useful context for leaders tracking whether agent adoption is being limited by tooling ergonomics rather than capability, per the same ClawHub complaint.

OpenClaw builds a Game Boy Snake clone via a local emulation feedback loop

OpenClaw (open source): A builder reports using OpenClaw with Gemini 3 Flash to generate a Game Boy Snake clone that runs on an emulator—explicitly calling out a “local emulation feedback loop” during development, per the Game Boy build report.

Game Boy demo

This is a crisp example of an agent workflow that benefits from tight run-verify-iterate cycles in a constrained environment (emulator), as described in the same Game Boy build report.

OpenClaw vs Claude Code/Cowork: people ask what’s uniquely enabled

OpenClaw (open source): A thread prompt asks what people do with OpenClaw that they can’t already do with Claude Code or Claude Cowork, which is a useful framing for evaluating whether a third-party harness is adding unique orchestration primitives (scheduling, hooks, multi-tool automation) versus being “just another chat surface,” per the OpenClaw comparison question.

A follow-up question pushes for specifics on “cron jobs and hooks,” implying the differentiator might be operational automation patterns rather than raw coding assistance, as asked in the Cron jobs and hooks ask.

Community claim: OpenClaw passed VS Code in GitHub stars

OpenClaw (open source): A retweeted claim says OpenClaw has surpassed VS Code in GitHub stars (and multiples of other projects), which is a pure adoption-signal datapoint—more about community scale than feature capability—per the Stars comparison claim.

No independent verification artifact is included in the tweet thread, so treat it as directional sentiment about momentum rather than a confirmed metric, based on the same Stars comparison claim.

OpenClaw “sub-agent swarms” demo gets livestreamed and shared

OpenClaw (open source): A livestream/demo shows “sub-agent ready swarms” running via OpenClaw with Orgo, positioning OpenClaw as a coordination harness for multiple concurrent agent threads, per the Swarms livestream note.

A replay link is provided via the same thread’s follow-up, pointing to the YouTube replay.

OpenClaw project builds a prompt library and vector search for better image prompting

OpenClaw (open source): A builder describes using OpenClaw and Gemini 3 Flash to build a system that writes its own image prompts (avoiding “keyword slop”), including a vector search index over 500+ prior prompts for inspiration, per the Prompt generation project.

The artifact shown includes a generated prompt card and an example output, implying the workflow is “retrieve past style → generate new prompt → render,” as documented in the same Prompt generation project.

Honolulu OpenClaw meetup gets scheduled for Feb 13

OpenClaw (community): A local OpenClaw meetup is scheduled in Honolulu, with details shared via the Meetup announcement and the linked Meetup page.

It’s a small but concrete signal that OpenClaw is forming in-person user groups, which tends to correlate with sustained tool adoption beyond online novelty, as implied by the same Meetup announcement.


🧪 Quality, review, and safety rails for agent-written code

Engineering hygiene to keep agent throughput from breaking production: test harnesses, review bottlenecks, sandboxing and flags, and practical verification patterns. Excludes Codex harness engineering details (covered in the Codex category).

Vercel Sandbox adds network egress controls to limit agent data exfiltration

Vercel Sandbox (Vercel): Vercel added network isolation and explicit egress policies, so agent-run code can be constrained to an allowlist of outbound domains; the CLI supports an --allowed-domain flow, as shown in Allowed-domain demo.

Allowed-domain demo

The rollout is also reflected in the Vercel changelog, which describes “advanced egress firewall filtering,” as detailed in Changelog announcement and the linked Changelog post. The practical impact is that teams can move from “ask forgiveness” network access to “prove necessity” network access for agent sandboxes.

Claude Code in Slack ships Plan Mode, and the “plan artifact” debate follows

Claude Code in Slack (Anthropic): Anthropic added Plan Mode to Claude Code’s Slack experience—Claude asks clarifying questions and proposes an implementation plan before proceeding, per the product demo in Slack plan mode demo.

Slack plan mode demo

There’s already pushback that “plan mode sucks,” with an alternative workflow of keeping a persistent plan doc outside compaction and iterating against that artifact, as captured in the discussion screenshot in Plan mode critique.

Slack installation and docs are linked in Slack app install and the Slack app docs, which matters for teams trying to standardize how plans are reviewed before agent execution.

Review throughput emerges as the limiting factor as code generation gets cheap

Code review throughput: A recurring claim is getting stated bluntly—“the bottleneck isn’t compute, it’s biology”—arguing that code generation is approaching machine speed while review remains human-speed, leading to teams “drowning in PRs,” as framed in Review bottleneck post. The same post reframes the skill shift as auditing code quickly (often with LLMs) rather than writing it.

This is less about any single tool and more about an org-level failure mode: agent adoption increases the cost of quality gates unless review workflows and test signals scale with it.

Vercel launches Vercel Flags as a safety valve for agentic shipping

Vercel Flags (Vercel): Vercel shipped Vercel Flags and explicitly frames flags as a way to “de-risk agentic engineering” as teams scale via agents, per the product note in Vercel Flags announcement; docs are live in the Flags docs. The operational point is familiar but newly urgent: when PR throughput spikes, flags become a first-line containment tool for shipping partial agent output without exposing it to everyone.

Vercel also claims heavy internal dogfooding for velocity in the Vercel Flags announcement, which is a useful signal that rollout/rollback is being treated as a default posture rather than an edge-case process.

Zed hard-blocks dangerous shell commands even in chained expressions

Zed Agent permissions (Zed): Zed added hardcoded safety guards that block dangerous commands like rm -rf /, including when they’re buried in chained shell expressions (e.g., ls && rm -rf /), as described in Hardcoded guardrails note. This is a distinct posture from allow/deny lists alone: it’s an invariant that can’t be relaxed via settings.
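The interesting part is catching the command inside chained expressions; an illustrative guard (Zed’s actual matcher isn’t public in the post):

```python
import re

# Block `rm -rf /` in either flag order, even when chained behind other commands.
FORBIDDEN = re.compile(r"\brm\s+(?:-[a-z]*r[a-z]*f|-[a-z]*f[a-z]*r)\s+/(?:\s|$)")

def is_blocked(command: str) -> bool:
    segments = re.split(r"&&|\|\||;|\|", command)  # split on shell chaining operators
    return any(FORBIDDEN.search(seg.strip()) for seg in segments)

assert is_blocked("ls && rm -rf /")
assert not is_blocked("rm -rf ./build")
```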

Property-based equivalence tests as a guardrail for agent refactors

Property-based testing as a safety rail: A practical technique is resurfacing for “rip out dependency / reimplement locally” work—write a bridge that calls the old code and assert equivalence across generated cases, per the suggestion in Property-based testing tip. This pairs well with agent-generated rewrites because it gives a deterministic pass/fail signal that doesn’t rely on subjective review.


📏 Benchmarks & measurement: coding arenas, time-horizon plots, and evaluation gaps

Measurement chatter today is about coding agent leaderboards, model selection signals, and the widening eval gap—not GLM‑5’s scores (those are in the feature).

Windsurf Arena Mode leaderboard points to speed as the winning UX metric

Windsurf Arena Mode (Windsurf): A week in, Arena Mode logged ~40,000 votes and surfaced a consistent preference for “fast but good enough,” with several notable “upsets” called out in the Leaderboard highlights and contextualized in the Leaderboard blog post. This is a measurement signal, not a benchmark claim: it’s explicitly optimizing for human-in-the-loop coding ergonomics rather than pure frontier accuracy.

Upset pattern: The same post highlights Gemini 3 Flash beating 3 Pro, Grok Code Fast beating Gemini 3, and Claude Haiku 4.5 beating GPT-5.2, all as “major upsets” in this arena’s objective function per the Leaderboard highlights.

It’s an early datapoint that “model choice” in IDE workflows is drifting toward latency/iteration-loop preference, even when engineers believe a slower model is smarter.

Code Arena adds multi-file app builds to evaluate agentic web-dev workflows

Arena Code (Arena): Code Arena added multi-file apps, positioning it as a closer proxy for production web-dev agent workflows (project structure, cross-file edits, integration points) rather than single-prompt snippets, as announced in the Multi-file apps announcement.

Multi-file apps demo

Workflow surface: The rollout framing emphasizes “production-ready projects” and “real-world, agentic coding tasks,” which changes what’s being measured versus single-file codegen, as stated in the Multi-file apps announcement.
Where it lives: The entry point is the Code Arena UI at the Code Arena destination, which the follow-up post uses as the canonical surface for trying multi-file comparisons.

METR time-horizon debate shifts from “when” to “how to measure” multi-hour tasks

METR time-horizon plot (METR): A new poll asks whether the METR-style curve hits ~20-hour tasks by Jan 1, 2027 or ~50-hour tasks by 2028, alongside the more operational question: “What would be the right way to measure tasks of that scope?” as posed in the Time-horizon poll.

The core measurement problem being surfaced is scoping: multi-hour “tasks” are rarely single-threaded, and evaluation design needs to decide what counts as success (handoffs, partial credit, tool failures, retries) rather than only extrapolating from shorter task distributions.

OpenHands agentic coding index highlights score vs cost vs runtime tradeoffs

OpenHands agentic coding index (OpenHands): A leaderboard snapshot shows Claude Opus 4.6 leading on average score, but with closely tracked average cost and runtime comparisons that make it harder to treat “#1” as a single dimension, as shown in the OpenHands index post.

What engineers can actually infer: The table format (Average Score / Average Cost / Average Runtime) makes explicit that model selection for agentic coding is a three-way trade (quality, dollars, wall-clock), not a single scalar, as shown in the OpenHands index post.

Open Benchmarks Grants signal more money and coordination for harder evals

Open Benchmarks Grants (SnorkelAI + partners): A partnership announcement frames the core bottleneck as measurement—“the world needs more hard benchmarks”—and points at new funding/coordination mechanisms in the Grants partnership note.

Follow-on signal: Separately, there’s a claim that “a large eval company is starting a task force” to launch something in 1–2 years, as stated in the Eval task force claim.

No specific benchmark spec is described in the tweets, but the combined signal is that eval infrastructure is becoming an organized, staffed effort rather than a community side-project.


⚙️ Inference & serving engineering: throughput, long-context scheduling, and hybrid attention

Serving-side engineering updates beyond the GLM‑5 rollout: cache-aware scheduling, long-context efficiency, and new attention architectures aimed at faster inference. Excludes GLM‑5 day‑0 serving posts (feature).

Cache-aware CPD adds a third tier for long-context serving and claims +40% sustainable throughput

Cache-aware scheduling (Together Research): Together describes cache-aware prefill–decode disaggregation (CPD) as a scheduling fix for long-context inference—separating cold requests that need full prefill from warm follow-ups that can reuse KV cache; they report up to ~40% higher sustainable throughput without changing model weights or hardware, per the CPD thread and the linked technical writeup.

Three-tier serving shape: CPD introduces pre-prefill nodes for cold contexts that write KV state into a distributed cache, while warm requests fetch cached KV blocks via RDMA and skip recomputation—keeping decode isolated and latency-focused, as described in the tier breakdown.

The point is that, as context windows stretch into 100K+ tokens, KV reuse and queueing policy start to dominate TTFT and tail latency under load, which is the core claim in the throughput results.
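A toy version of the routing decision (not Together’s implementation; longest_prefix is a hypothetical cache-index API):

```python
def route(tokens: list[int], kv_index) -> str:
    """Send warm requests straight to decode capacity; keep cold long prompts off it."""
    cached = kv_index.longest_prefix(tokens)  # tokens already materialized in the KV cache
    if cached / max(len(tokens), 1) > 0.8:
        return "decode-pool"        # warm: fetch KV blocks (e.g., via RDMA), skip prefill
    if len(tokens) > 32_000:
        return "pre-prefill-pool"   # cold + long: build KV state off the latency path
    return "prefill-pool"           # cold + short: standard prefill
```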

MiniCPM-SALA claims 3.5× faster 256K inference via sparse+linear hybrid attention

MiniCPM-SALA (OpenBMB): OpenBMB announced MiniCPM-SALA, a 9B model trained with a hybrid Sparse-Linear Attention (SALA) architecture—75% linear attention for global flow and 25% sparse attention for recall; they claim 3.5× inference speedup vs Qwen3-8B at 256K context and support up to 1M context on edge GPUs, per the release thread and the linked model card.

Positional and length generalization: the release highlights a hybrid positional encoding (HyPE) intended to keep behavior stable across varying sequence lengths, as described in the release thread.

Inference-optimization pressure test: OpenBMB also launched the SOAR optimization contest targeting SGLang acceleration for this architecture (single/multi-batch, ultra-long context on consumer VRAM, low latency), per the competition details.
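On the 75/25 split above: one illustrative realization is a fixed stride, one sparse-attention layer in every four, with linear attention elsewhere (OpenBMB’s actual interleaving pattern isn’t specified in the thread):

```python
def attention_schedule(n_layers: int, sparse_every: int = 4) -> list[str]:
    """75% linear / 25% sparse via a fixed stride; purely illustrative."""
    return ["sparse" if (i + 1) % sparse_every == 0 else "linear" for i in range(n_layers)]

print(attention_schedule(8))  # ['linear', 'linear', 'linear', 'sparse', 'linear', ...]
```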

vLLM passes 70K GitHub stars and spotlights Blackwell multi-node serving primitives

vLLM (vLLM Project): vLLM crossed 70K GitHub stars and used the milestone to highlight recent work on large-scale serving—especially production multi-node support on NVIDIA Blackwell with WideEP and expert parallelism, plus broader async scheduling and multimodal streaming work, as summarized in the 70K stars post.

Serving focus: the post frames recent engineering as making “the biggest models” practical to serve at scale (multi-node + expert parallelism), alongside real-time streaming for speech/audio and a “growing multimodal story,” per the 70K stars post.

Ecosystem signal: it also notes the founding of Inferact by core maintainers (inference cost/latency focus), which matters if you’re tracking where vLLM’s production roadmap might concentrate next, per the 70K stars post.

BaseTen’s Kimi K2.5 speed recipe leans on EAGLE-3 speculation and NVFP4 on Blackwell

Kimi K2.5 inference (BaseTen): BaseTen published a concrete recipe for speeding up Kimi K2.5 inference using a custom EAGLE-3 speculator trained on synthetic queries plus INT4→NVFP4 conversion to unlock NVIDIA Blackwell inference, per the performance roundup and the linked technical post.

The engineering takeaway is that they’re stacking speculative decoding plus new low-precision paths (NVFP4) as a combined latency/throughput lever, rather than treating quantization and decoding tricks as separate optimizations, as described in the performance roundup.
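For readers new to the speculation half, the generic loop below shows why a well-fit draft model raises throughput: one target pass can commit several tokens. This is the greedy-verification variant, not EAGLE-3 specifically, and argmax_next/argmax_next_batch are hypothetical interfaces:

```python
def speculative_step(target, draft, ctx: list[int], k: int = 4) -> list[int]:
    """One round of draft-then-verify decoding (greedy variant)."""
    proposal = []
    for _ in range(k):                                   # k cheap draft tokens
        proposal.append(draft.argmax_next(ctx + proposal))
    verified = target.argmax_next_batch(ctx, proposal)   # one target pass checks them all
    accepted = []
    for drafted, correct in zip(proposal, verified):
        accepted.append(correct)                         # equals `drafted` when they match
        if drafted != correct:                           # first divergence: keep the
            break                                        # target's token and stop
    return accepted                                      # 1..k tokens per target pass
```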

Crush adds multi-process management for running multiple agent and serving loops in parallel

Crush (Charm): Crush can now run and manage multiple background processes—multiple web servers, docker swarms, or other long-running jobs inside the terminal UI—as shown in the feature demo.

Managing multiple background processes

This is a pragmatic fit for local inference + eval setups where you’re often juggling several services (gateway, tracing, cache, model server) and want them controllable from one interface, as demonstrated in the feature demo.


🛡️ Security, safety, and platform abuse signals (spam, privacy, and cyber risk)

Today’s security signal is about agent misuse risk, privacy surfaces, and automation flooding—plus governance/safety-org churn. Excludes AI-infrastructure power commitments (covered under infrastructure).

Embedding vectors aren’t “irreversible” anymore: Jina AI inverts embeddings back to text

Embedding inversion (Jina AI): A new demo shows recovering original text from embedding vectors using conditional masked diffusion, challenging the common assumption that stored embeddings are “safe” because they’re non-human-readable; Jina claims ~80% token recovery with a 78M parameter inversion model against Qwen3-Embedding and EmbeddingGemma vectors, as described in the demo overview and method details. This is a privacy and security issue. Embeddings can carry secrets.

Embedding inversion demo

Why this is different: Instead of autoregressive vec2text plus iterative re-embedding, they condition a denoiser on the target embedding via AdaLN-Zero and refine all positions in parallel, as explained in the method details.
Operational implication: Any product logging or sharing embeddings (telemetry, vector DB backups, vendor “debug traces”) may need to treat them like sensitive plaintext, given the inversion capability shown in the demo overview and the linked live demo.

Bot automation is expected to overwhelm more channels within ~90 days

Platform abuse (automation & spam): A circulated prediction argues that in <90 days “all channels we thought were safe from spam & automation” will be flooded, as amplified in the spam prediction RT. This is a direct product risk. It hits support inboxes, community channels, and even internal tooling.

The point is that scaled “agentic” posting changes the baseline. Moderation load and trust signals become core infrastructure, not a side feature, per the framing in the spam prediction RT.

Dual-use anxiety rises as builders call Claude-based work “cyber-weapon level”

Dual-use (Claude Opus 4.6): A developer claims a principal threat researcher told them their Opus‑4.6-driven project can’t be open-sourced because it’s a “nation‑state‑level cyber weapon,” as stated in the cyber weapon comment. That’s a strong signal. The details aren’t provided.

This illustrates the widening gap between “can a model write code?” and “can it produce operational exploit chains,” and it’s part of why teams are increasingly treating model access, logging, and sharing policies as security controls, not just compliance paperwork, as suggested by the tone in the cyber weapon comment.

Signal’s founder repeats: Telegram isn’t a private messenger

Messaging security (Telegram vs Signal): A quote attributed to Signal founder Moxie Marlinspike is resurfacing, stating “Telegram’s not a private messenger,” as shared in the Telegram privacy RT. This matters if teams use chat apps to move model outputs, credentials, incident info, or customer data.

It’s not a new technical disclosure. It’s a reminder about threat models and what “private” means in practice, per the Telegram privacy RT.


🔌 Compute, power, and hardware supply chain for AI buildout

Concrete infra moves affecting capacity and cost: power pricing commitments, custom inference silicon, and datacenter power delivery experiments. Excludes funding/valuation chatter (business category).

ByteDance plans an in-house inference chip and targets 100k units in 2026

ByteDance (Reuters via rohanpaul_ai): ByteDance is reportedly developing an in-house AI inference chip and is in talks with Samsung for manufacturing; the report says ByteDance is targeting at least 100,000 chips in 2026, with a possible ramp toward 350,000, and that access to scarce memory (HBM/DRAM) is part of the discussions, per the Reuters excerpt.

This matters because it reinforces a supply-chain reality builders already feel: GPU availability isn’t the only limiter—memory supply can bottleneck deployments even when compute silicon exists. It also signals more vertical integration pressure on NVIDIA-alternatives and on memory allocation across hyperscalers and large AI buyers.

Anthropic says it will pay 100% of grid upgrade costs tied to its data centers

Anthropic (AnthropicAI): Anthropic says it will cover electricity price increases attributable to its data centers by paying 100% of grid upgrade costs, working to bring new power online, and investing in systems that reduce grid strain, as laid out in the policy post and detailed in the Policy post. This is a direct attempt to pre-empt “AI data centers raise my rates” backlash and permitting friction.

For engineering and infra leads, the practical implication is that power contracts and interconnection work are becoming a first-class part of AI delivery, not a back-office detail; this kind of pledge can shift how projects get approved, where capacity is available, and how costs get allocated across tenants and regions.

Microsoft tests superconducting power cables to move more MW into AI data centers

Power delivery (Microsoft): Microsoft is testing high-temperature superconductor (HTS) cables for AI data centers, citing a factory test and demo around a 3MW superconducting cable; the pitch is much higher power density (claims of ~10× smaller/lighter delivery) by eliminating resistive losses once cooled to around −200°C, as described in the HTS cable thread.

The trade-off highlighted in the same thread is operational: HTS shifts constraints from copper losses to cryogenic cooling reliability, maintenance, and failure handling. If it works, it’s a plausible lever for faster site power-ups and denser rack footprints without needing the same right-of-way and trenching as conventional transmission.

xAI “Macrohard” recirculates as a GW-scale power-and-GPU buildout signal

xAI infrastructure: Posts are recirculating stats about xAI’s “Macrohard” compute site—framed as 1+ GW scale with 12 data halls, 27,000 GPUs, and 200,000+ fabric connections, as shown in the cluster tour clip.

Video: Compute cluster tour clip

A separate graphic making the rounds claims an even larger snapshot—“330K+ GPUs,” “>1GW nameplate power,” and “558 Megapacks = 2,293 MWh,” as shown in the stats graphic.

The numbers conflict across sources, so treat them as directional rather than audited; the consistent throughline is that power delivery and on-site energy storage are being discussed as core scaling primitives, not secondary facilities work.
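The storage figure in the graphic is at least internally checkable; a one-line sanity check (the per-pack comparison point is Tesla’s publicly quoted ~3.9 MWh Megapack capacity, our reference rather than a figure from the post):

```python
# Implied energy per unit from the "558 Megapacks = 2,293 MWh" graphic.
mwh_per_pack = 2293 / 558
print(f"{mwh_per_pack:.2f} MWh per Megapack")
# ~4.11 MWh, slightly above the ~3.9 MWh Tesla typically quotes per Megapack,
# which supports reading the graphic as approximate rather than audited.
```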

Mistral plans a €1.2B AI infrastructure buildout in Sweden for 2027

Mistral (Bloomberg via rohanpaul_ai): Mistral AI is reported to be planning a €1.2B AI infrastructure facility in Sweden targeting 2027 operations, positioning for European government and enterprise demand, per the Bloomberg snippet.

For AI platform leaders, this is a “sovereign compute” signal: Europe is still trying to secure domestic capacity and procurement pathways, which can affect where models are trained/served (data residency) and how quickly regional inference capacity grows relative to US hyperscalers.


💼 Enterprise adoption & capital signals around AI tools

Buyer behavior and capital flows relevant to engineering leaders: who’s paying for which models, overlap/churn, and large strategic investments. Excludes infra buildouts (infrastructure category).

Ramp AI Index shows Anthropic growth is mostly within existing OpenAI customers

Ramp AI Index (enterprise adoption): Ramp spend data shows Anthropic reached 19.5% of U.S. businesses with paid AI subscriptions (up from 16.7%) while OpenAI is at 35.9%; a key nuance is that 79% of Anthropic customers also pay OpenAI and churn is ~4% for both, per the Ramp index analysis.

Buyer behavior: this reads less like vendor displacement and more like “second provider added” inside the same org, as argued in the Ramp index analysis.
Planning implication: multi-model procurement looks normalizing (budgets split across vendors), which tends to push engineering leaders toward routing/benching and vendor redundancy rather than single-stack commitments, per the same Ramp index analysis.
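The overlap is worth making explicit with the figures above (a quick sketch; only the 19.5%/35.9%/79% numbers come from the analysis):

```python
anthropic_share = 0.195          # share of US businesses paying Anthropic
openai_share = 0.359             # share paying OpenAI
overlap_given_anthropic = 0.79   # Anthropic customers who also pay OpenAI

both = anthropic_share * overlap_given_anthropic
anthropic_only = anthropic_share - both
print(f"pay both vendors: {both:.1%} of businesses")   # ~15.4%
print(f"Anthropic only:   {anthropic_only:.1%}")       # ~4.1%
```

In other words, roughly 15% of businesses already pay both vendors and only ~4% are Anthropic-exclusive, which is the arithmetic behind the “second provider added” reading.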

Blackstone reportedly increases Anthropic stake to about $1B at a ~$350B valuation

Anthropic funding signal (Blackstone): Blackstone is reportedly increasing its Anthropic investment to about $1B, with an estimated $350B valuation, according to a Reuters item shared in the Reuters screenshot.

This fits as an incremental datapoint on the broader late-stage funding appetite for frontier model providers—especially relevant for engineering leaders forecasting medium-term pricing stability, enterprise support capacity, and long-term model roadmap continuity.


🎬 Generative video, image, and voice models: quality jumps and workflow stacks

High volume of creative-model evidence: Seedance 2.0 clips, realtime world/video claims, and voice latency improvements—useful for teams shipping media features. Excludes drug design/biomed topics.

Seedance 2.0 clips dominate “text-to-video feels solved” chatter

Seedance 2.0 (ByteDance): Following up on Hype questions (consistency/bias concerns), today’s feed is packed with “one-shot” anime-style outputs—people are explicitly calling it “passed the video Turing test” while highlighting the economics (a 10-minute clip reportedly taking ~8 hours and costing ~$60) in posts like Cost breakdown and Long clip example.

Video: Manga to anime demo

Range of prompts: examples span manga→anime adaptation in a single go per Manga to anime claim, plus short comedic/character acting setups (otter sitcom variants) shown in Prompted sitcom clip.
Production signal: users are framing this as a compute-demand shock (“explosion of demand for compute”) in Manga to anime claim, but there’s no official provider/SDK surface in the tweets to validate workflows beyond demos.

The dominant mood is excitement; the main missing piece is trustworthy access and repeatable tooling outside China.
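For teams budgeting, the quoted economics pencil out like this (a sketch; only the ~$60 / ~8 hours / 10-minute figures come from the posts):

```python
clip_minutes = 10
cost_usd = 60
wall_hours = 8

print(f"${cost_usd / clip_minutes:.2f} per minute of video")       # $6.00/min
print(f"${cost_usd / (clip_minutes * 60):.2f} per second")          # $0.10/s
print(f"{wall_hours * 60 / clip_minutes:.0f}x real time to render") # 48x
```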

ElevenLabs adds Expressive Mode to ElevenAgents for more human calls

ElevenLabs ElevenAgents (ElevenLabs): ElevenLabs shipped “Expressive Mode” for its agent voice stack—positioned as more emotional, context-aware delivery and real-time turn-taking across 70+ languages, per Expressive mode details. Separately, builders keep fixating on latency (“voice but especially latency”) in reactions like Latency reaction, and the company is doubling down on “voice replaces outdated interfaces” messaging in its Summit keynote clip.

Video: Latency voice demo

The engineering takeaway is that speech agents are getting judged less on raw fidelity and more on conversational timing and interruption behavior (where most stacks still feel brittle).

Seedance 2.0 access gets messy: scam warnings and “wrapper” claims

Seedance 2.0 (ByteDance): Continuing Access notes (BytePlus+VPN access chatter), creators are now explicitly warning that “wrapper” platforms may falsely claim exclusive access and that people can get scammed, as argued in Wrapper scam warning. A recurring theme is that the model is “not currently available outside China,” while third parties advertise “unlimited access” anyway, per Wrapper site promo.

Video: Access revoked reaction

Operational risk: the guidance is to wait for “trusted platforms” in Wrapper scam warning, which matters because the same workflows that make the clips look real also make phishing/fake-hosting easy.

Net: even if the model quality is real, distribution uncertainty is a practical blocker for teams trying to ship features with predictable uptime and terms.

Local video generation stack: Nano Banana stills → LTX-2 animation

Local video workflow (LTX-2 + ComfyUI): A concrete “consumer GPU” stack is being shared: generate stills, then animate with LTX-2 locally, with reported generation times of ~6–10 minutes on a 4070 Ti in Local consumer GPU demo. The thread frames it as a repeatable loop—iterate on frames, reuse references, then run i2v—rather than a single prompt-and-pray approach.

Video: Local video montage

This is the kind of workflow detail that matters more than headline model quality: it’s about how you actually amortize prompt/search time across multiple shots on local hardware.
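The loop described above translates to something like this (a minimal sketch; the helper functions are stubs standing in for a local pipeline such as ComfyUI graphs driving a still-image model and LTX-2 i2v, not real APIs):

```python
# Sketch of the stills-first, animate-second loop; all helpers are stubs.

def generate_still(prompt: str, refs: list[str]) -> str:
    return f"still({prompt})"              # stub: would return an image path

def pick_best(candidates: list[str]) -> str:
    return candidates[0]                   # stub: human-in-the-loop selection

def animate_i2v(keyframe: str, prompt: str) -> str:
    return f"clip({keyframe})"             # stub: the ~6-10 min/clip i2v step

def make_shot(prompt: str, refs: list[str], takes: int = 4) -> str:
    # Iterate cheaply on stills first; pay the expensive i2v cost only once.
    stills = [generate_still(prompt, refs) for _ in range(takes)]
    keyframe = pick_best(stills)
    return animate_i2v(keyframe, prompt)

print(make_shot("otter newsroom, dusk lighting", refs=[]))
```

Reusing the chosen keyframe as a reference for subsequent shots is what amortizes the prompt/search time across a multi-shot sequence.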

PixVerse R1 surfaces as a “real-time interactive worlds” video model

PixVerse R1 (PixVerse): A new model branded as “real-time interactive worlds in 720P” is being circulated via a launch claim in RT launch blurb, with additional detail that it targets near-instant response by cutting sampling to 1–4 steps using an “Instantaneous Response Engine,” per the summary in Realtime pipeline notes.

This reads like an attempt to make video generation feel more like a game loop (latency-first), but the tweets don’t include a technical report, evals, or reproducible demos—so performance/quality tradeoffs vs longer-sample models are still unclear.
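The latency claim is mostly step-count arithmetic; a sketch under an assumed per-step cost (only the 1–4 step figure comes from the posts):

```python
per_step_s = 0.05   # assumed per-denoising-step cost; not from the posts

for steps in (30, 4, 1):   # a typical sampler vs. the claimed 1-4 step regime
    print(f"{steps:>2} steps -> ~{steps * per_step_s * 1000:.0f} ms")
```

Under any fixed per-step cost, a 30→4 step cut is ~7.5× lower latency; the unanswered question is how much quality survives per step.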

Prompting pattern: shorter prompts can beat constraint-heavy ones in image editing

Prompt discipline (image models): A long practitioner note argues that adding many constraints to an image-edit prompt often makes results worse—producing “face-in-hole” artifacts—while a short schematic instruction can yield more natural outputs, based on a Nano Banana-style identity swap scenario in Prompt minimalism essay.

The core claim is that modern models already have strong defaults, and over-specification forces the model to optimize for satisfying every clause rather than realism; it’s framed as analogous to over-directing a skilled chef.

Grok adds multi-reference image blending and web image display in voice

Grok (xAI): Two image-surface updates have been spotted: Grok web can combine three reference images into a new image per Three-reference feature, and Grok voice mode can display “real-time images from the internet” as shown in Voice web images.

This is a product signal that xAI is trying to collapse “search + show + speak” into one loop; the open question for builders is whether these are powered by a stable tool API or remain UI-only behaviors that can’t be integrated into agent workflows reliably.

Qwen Chat patches a Qwen-Image 2.0 bug affecting ordering and consistency

Qwen-Image 2.0 (Alibaba/Qwen): Qwen says it patched a Qwen Chat bug that affected (1) ordering for classical Chinese poem image generation and (2) character consistency during image editing, with the fix announced in Bugfix announcement.

This kind of “small” fix matters operationally: it targets two common production pain points for image features—layout/sequence fidelity and identity consistency across edit passes—without implying a new model release.


📄 Research & technical writeups: agents for math, tiny GPTs, UI world models, and interpretability

Paper-and-implementation heavy posts today: math/science agents, minimal GPT implementations, GUI world modeling, and interpretability methods. Excludes bioscience/drug discovery items.

DeepMind’s Aletheia shows a verifier-driven loop for research-level math work

Aletheia (Google DeepMind): DeepMind shared results and workflows around an internal math research agent powered by an “advanced version of Gemini Deep Think,” emphasizing a generator→verifier loop (with reviser feedback) for research problems, as introduced in the DeepMind research post and the Aletheia paper announcement. This matters because it’s a concrete reference design for “research agents” that couple long-horizon search with explicit verification, not just prompting.

Workflow pattern: The agent architecture is explicitly framed as generator/candidate solution→verifier with branches for “critically flawed” (loop back) vs “minor fixes” (reviser), as diagrammed in the DeepMind research post; a minimal loop sketch follows this list.
Reported math performance: A shared leaderboard screenshot claims Aletheia 91.9% on IMO‑ProofBench Advanced with a breakdown that includes 100% on IMO 2024+, as shown in the leaderboard screenshot. Treat this as provisional until DeepMind publishes a canonical eval artifact, since the most detailed numbers in the feed are secondary commentary.
Primary artifact: The Aletheia writeup is available via the Aletheia paper PDF, which is the cleanest place to verify the “open problems / publishable outputs” claims summarized in the Aletheia paper announcement.
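As a reference shape (not DeepMind’s code), the generator→verifier→reviser loop reduces to something like this, with `llm` as a hypothetical stand-in for whatever model backs each role:

```python
def llm(role: str, prompt: str) -> str:
    # Stub so the sketch runs; a real loop would call Gemini Deep Think here.
    return "ACCEPT" if role == "verifier" else f"[{role} draft]"

def solve(problem: str, max_rounds: int = 5) -> str | None:
    candidate = llm("generator", problem)
    for _ in range(max_rounds):
        verdict = llm("verifier", f"{problem}\n\nCandidate:\n{candidate}")
        if verdict.startswith("ACCEPT"):
            return candidate                      # verified solution
        if verdict.startswith("MINOR"):
            candidate = llm("reviser", f"{candidate}\n\nIssues:\n{verdict}")
        else:                                     # critically flawed: restart
            candidate = llm("generator", f"{problem}\n\nAvoid:\n{verdict}")
    return None  # abstain rather than emit an unverified proof

print(solve("Show that the sum of two even integers is even."))
```

The design-relevant detail is the final `return None`: budget exhaustion ends in abstention, not in shipping an unverified candidate.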

Code2World predicts next UI state by generating renderable code

Code2World (paper): A “GUI world model” approach that predicts the next UI state by generating renderable code (e.g., HTML/CSS) rather than emitting pixels directly, as highlighted in the paper share. The engineering implication is a more testable interface for UI prediction: you can diff, lint, and even run a renderer to validate state transitions.

The paper can be accessed via the paper page, and the figure in the paper share shows the core loop: current GUI → code generation → next GUI.
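The testability argument is the interesting part: because the predicted state is code, it can be checked mechanically before anything trusts it. A sketch (the `predict_next_ui` call is a hypothetical model invocation; the check uses Python’s stdlib HTML parser, which is lenient, so real pipelines would also lint, diff, or render):

```python
from html.parser import HTMLParser

def predict_next_ui(current_html: str, action: str) -> str:
    # Hypothetical stand-in for the world model's code-generation step.
    return "<html><body><button>Submitted</button></body></html>"

class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags: list[str] = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def validated_next_state(current_html: str, action: str) -> str:
    nxt = predict_next_ui(current_html, action)
    collector = TagCollector()
    collector.feed(nxt)                 # parse the generated code
    assert "body" in collector.tags     # cheap structural check; diff/render go here
    return nxt

print(validated_next_state("<html><body><button>Submit</button></body></html>",
                           "click button"))
```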

Karpathy’s microgpt distills GPT training + inference into ~243 lines

microgpt (Andrej Karpathy): A minimal, dependency-free Python implementation trains and runs a GPT in ~243 LOC, framing “everything else” in modern stacks as efficiency scaffolding, as described in the microgpt gist intro and reiterated in the math ops breakdown. It strips the architecture and loss down to primitive ops (+, *, **, log, exp) and uses a tiny scalar autograd engine (micrograd) plus Adam, per the microgpt gist intro.

For the reference implementation, see the GitHub gist, with a single-page mirror in the microgpt page.
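The core trick—reverse-mode autodiff over scalar primitive ops—fits in a few lines. Here is a compressed micrograd-style sketch (in the same spirit, not Karpathy’s exact code):

```python
import math

class Value:
    """Scalar with reverse-mode autodiff, in the spirit of micrograd."""
    def __init__(self, data, children=()):
        self.data, self.grad = data, 0.0
        self._children, self._grad_fn = children, None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad                # d(a+b)/da = 1
            other.grad += out.grad               # d(a+b)/db = 1
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._grad_fn = grad_fn
        return out

    def exp(self):
        out = Value(math.exp(self.data), (self,))
        def grad_fn():
            self.grad += out.data * out.grad     # d(e^a)/da = e^a
        out._grad_fn = grad_fn
        return out

    def backward(self):
        topo, seen = [], set()
        def build(v):                            # topological order of the graph
            if v not in seen:
                seen.add(v)
                for child in v._children:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            if v._grad_fn:
                v._grad_fn()

x = Value(2.0)
y = x * x + x          # y = x^2 + x, so dy/dx = 2x + 1 = 5 at x = 2
y.backward()
print(x.grad)          # 5.0
```

Everything on top—attention, layernorm, Adam—composes these same primitives, which is the “efficiency scaffolding” framing in the gist.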

UI-Venus-1.5 report benchmarks GUI grounding and navigation for agents

UI‑Venus‑1.5 (technical report): A benchmark-heavy report on GUI agents covering grounding and navigation tasks (Android/web), with scores shown across multiple suites, as posted in the report share. This matters for teams evaluating “computer use” agents because it provides a comparable set of tasks/metrics (grounding vs navigation) rather than collapsing everything into one aggregate.

The report entry point is the technical report page, while the report share includes navigation bars for AndroidWorld/AndroidLab/WebVoyager-style tasks and radar plots for grounding.

LatentLens maps visual tokens to human descriptions for interpretability

LatentLens (paper): A method for making visual tokens in LLM/VLM stacks more interpretable by retrieving the most similar contextualized text descriptions to a given visual token, positioned as a practical “what does this token mean?” probe, per the paper share. This is aimed at interpretability workflows where you want human-legible concepts without training a bespoke concept classifier.

The paper entry point is the paper page, with the diagram in the paper share showing the pipeline: precompute text reps → encode visual tokens → top‑k text descriptions → VLM judge feedback.
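The retrieval core is plain nearest-neighbor search in a shared embedding space; a numpy sketch (embeddings and descriptions here are made up, and the real method uses contextualized text representations plus a VLM judge on the output):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy shared-space embeddings standing in for the VLM's own representations.
descriptions = ["a dog's ear", "blue sky", "car wheel", "grass texture"]
text_reps = rng.standard_normal((4, 8))
visual_token = rng.standard_normal(8)

def top_k_descriptions(v, T, names, k=2):
    # Cosine similarity between one visual token and every text description.
    sims = T @ v / (np.linalg.norm(T, axis=1) * np.linalg.norm(v) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [(names[i], float(sims[i])) for i in top]

print(top_k_descriptions(visual_token, text_reps, descriptions))
```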

OPUS proposes iterative data selection to improve pretraining efficiency

OPUS (paper): A data-selection method for LLM pretraining that argues for selecting training data every iteration (not a one-time filter) to improve sample efficiency and downstream performance, as shared in the paper post. For teams doing large-scale pretraining, this is a concrete proposal for shifting optimization effort from architecture tweaks to continual dataset curation.

The paper landing page is the paper page, and the paper post is the main claim artifact shown in the feed (performance vs tokens, with an “8× efficiency” annotation).
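The distinction from one-shot filtering is easiest to see in loop structure; a sketch (the scoring function is a placeholder, not OPUS’s actual criterion):

```python
import random

def utility(example: str, model_state: dict) -> float:
    # Placeholder scorer; OPUS's actual selection criterion is in the paper.
    return random.random()

def train(pool: list[str], steps: int, batch_size: int = 4) -> dict:
    model_state = {"updates": 0}
    for _ in range(steps):
        # Key idea: re-score the pool every iteration, because which examples
        # are useful shifts as the model learns (vs. a one-time upfront filter).
        batch = sorted(pool, key=lambda ex: utility(ex, model_state),
                       reverse=True)[:batch_size]
        model_state["updates"] += 1     # stand-in for a gradient step on `batch`
    return model_state

train(pool=[f"doc_{i}" for i in range(100)], steps=3)
```

The cost of this design is the repeated scoring pass, which is presumably where the claimed efficiency gains have to outweigh the selection overhead.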


🤖 Robotics & embodied AI: deployment reality vs hype

Embodied AI updates span real deployment constraints (navigation vs manipulation) and fast-moving surgical/teleop robot claims. Excludes any wet-lab or bioscience research content.

Epoch AI: navigation is deployed; manipulation transfer is the bottleneck

Robot deployment reality (Epoch AI Research): A new review breaks robot autonomy down into 14 concrete task areas and lands on an unglamorous split—navigation is already in broad commercial use, while manipulation is mostly stuck in narrow, engineered settings, as summarized by task capability review and extended in navigation deployment examples and warehouse picking caveat. The repeated constraint is transfer: many systems work when the environment is designed around the robot, but evidence for generalizing to new objects/homes is thin, per transfer bottleneck note and household transfer framing.

The review also flags that pretrained robot foundation models are becoming the default for harder manipulation regimes, with Toyota Research Institute results cited in pretrained model results and a task-by-task classification linked in full report link.

TeleAI TextOp: streaming text commands for humanoid control with balance policy

TextOp (TeleAI): A new framework turns natural-language commands into real-time humanoid motion while letting you change instructions mid-action; it uses a two-level setup where a high-level model streams motion trajectories and a low-level policy maintains balance, according to TextOp summary.

Video: Humanoid follows streaming text commands

This is a practical design point for embodied agents: “instruction following” becomes a continuously updated control signal, not a one-shot plan, which is closer to how interactive robot deployments get supervised in the real world.
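In control-loop terms, the two-level split looks like this (a sketch; the tick structure and helper calls are assumptions, not TeleAI’s interfaces):

```python
import queue

commands: queue.Queue[str] = queue.Queue()   # streaming text instructions
commands.put("walk forward")

def plan_trajectory(instruction: str) -> list[str]:
    # Stand-in for the high-level model streaming motion targets.
    return [f"{instruction}/waypoint_{i}" for i in range(2)]

def balance_policy(waypoint: str) -> str:
    # Stand-in for the low-level policy that keeps the humanoid upright
    # while tracking whatever the current trajectory says.
    return f"torques({waypoint})"

instruction = "stand"
for tick in range(4):
    if tick == 2:
        commands.put("turn left")            # operator interjects mid-action
    if not commands.empty():
        instruction = commands.get()         # replan on the newest text
    for wp in plan_trajectory(instruction):
        balance_policy(wp)                   # balance runs every tick regardless
    print(tick, instruction)
```

The separation is the point: the high-level plan can be swapped mid-action while the balance policy never stops running.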

5G teleop surgery demo claims ~5,000km remote operation in ~1 hour

Remote surgery demo (China): A reported teleoperation case had surgeons in Shanghai operate on a patient in Kashgar—nearly 5,000km away—using a China-made 5G surgical robot, with the procedure described as taking about 1 hour with minimal blood loss and no complications, per the teleop surgery clip.

Video: Teleop robot surgery footage

For robotics teams, the operational question is less “can it move precisely” and more the full stack: network guarantees, failure modes, fallback procedures, and how much autonomy vs. strict teleop is actually in play.
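One hard constraint is computable from the distance alone: light in optical fiber travels at roughly 200,000 km/s, so 5,000km sets a physics floor on round-trip latency before any radio, routing, or processing overhead:

```python
distance_km = 5_000
fiber_speed_km_s = 200_000      # ~2/3 of c in glass

one_way_ms = distance_km / fiber_speed_km_s * 1000
print(f"one-way: {one_way_ms:.0f} ms, round trip: {2 * one_way_ms:.0f} ms minimum")
# ~25 ms one way, ~50 ms RTT at best; real paths add routing and stack delays,
# which is why control-loop design and fallback behavior matter so much here.
```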

Fei-Fei Li: embodied, spatial intelligence is the missing gap

Embodied intelligence framing (Fei-Fei Li): Fei-Fei Li argues that human intelligence is fundamentally physical—built on navigating 3D space, anticipating motion, and dealing with friction and consequences—unlike current AI systems that are primarily linguistic, as quoted in embodied intelligence clip.

Video: Fei-Fei on embodied intelligence

This maps cleanly onto what robotics teams see in practice: the hard part is not producing text plans, it’s grounding them in perception, control, and uncertainty.

Surgery-on-a-grape demo resurfaces as a benchmark for fine manipulation

Surgical manipulation signal: A short clip of a robot performing a very high-precision procedure on a grape is circulating as a visceral “dexterity” benchmark, via grape surgery demo.

Video: High-precision grape procedure

It’s a reminder that impressive micro-dexterity demos don’t automatically translate to robust manipulation in unstructured settings—especially when sensing, tooling, and task variance change.

Optimus surgery timeline claims spread, with little operational detail

Optimus speculation (Tesla): A prediction that Tesla’s Optimus will outperform human surgeons “in 3 years at scale” is being reshared, alongside a clip attributed to Elon Musk dismissing medical school as “pointless,” per Optimus surgeon claim.

There’s no accompanying evidence in the tweet about validation protocols, deployment constraints, or regulatory path, so it reads more like timeline signaling than an engineering update.
