AI Primer engineer report: Qwen3‑TTS open-sources 0.6B and 1.8B models – 97ms latency claim – Thu, Jan 22, 2026

Qwen3‑TTS open-sources 0.6B and 1.8B models – 97ms latency claim


Executive Summary

Qwen open-sourced the Qwen3‑TTS family (VoiceDesign, CustomVoice, Base), shipping weights/code/paper; the drop spans 5 models across 0.6B and ~1.7–1.8B sizes plus a 12Hz tokenizer; community recaps emphasize streaming-first behavior (first audio packet after 1 character) and ~97ms synthesis latency, but there’s no single independent benchmark artifact in today’s sources. Early hands-on chatter is positive on voice clone/design quality; the practical point is that “voice creation + cloning + full fine-tuning” is now in an open-weights bundle that can slot into local stacks.

vLLM‑Omni: claims day‑0 native Qwen3‑TTS support; offline inference available now; online serving “coming soon.”
Simon Willison: published a minimal CLI wrapper to generate WAVs from text + a voice instruction string; lowers the try-it barrier.
Voice stack momentum: Chroma 1.0 markets <150ms speech-to-speech and voice cloning; Inworld TTS‑1.5 claims sub‑130ms (Mini) and $0.005/min—metrics remain unanchored without linked evals/docs.

Net signal: open TTS is converging on “deployable artifacts + serving substrate” rather than isolated demos; latency claims are loud, verification is still thin.


Feature Spotlight

Cursor 2.4: subagents + image generation (parallel execution in-editor)

Cursor 2.4 makes multi-agent coding practical in a single editor: configurable parallel subagents (own context/tools/models) plus image generation. This shifts throughput and review patterns for teams shipping with agents.




🧩 Cursor 2.4: subagents + image generation (parallel execution in-editor)

Today’s dominant builder story is Cursor 2.4 shipping subagents for parallel work plus in-editor image generation. This category covers Cursor-specific workflow changes and excludes other coding tools (Claude Code/Codex) covered elsewhere.

Cursor 2.4 adds parallel subagents for faster task completion

Cursor 2.4 (Cursor): Cursor now spins up subagents to complete parts of a task in parallel, aiming to cut wall-clock time while keeping each worker’s context cleaner than a single giant thread—see the Subagents announcement for the core behavior.

[Video: Subagents parallel demo]

Longer-running work: Cursor frames subagents as enabling longer tasks by splitting work into independently running units, as described in the Subagents announcement.
Practical use case: builders explicitly call out “spawning multiple browsers for research & QA” as a reason this matters, per the Parallel browsers use case.

Cursor 2.4 adds in-editor image generation via Nano Banana Pro

Image generation (Cursor 2.4): Cursor can now generate images inside the editor, with Cursor explicitly tying the feature to Google’s Nano Banana Pro, as shown in the Image generation demo and called out in the main Subagents announcement.

[Video: Image generation in editor]

The rollout is also summarized as “image generation powered by Nano Banana Pro” in the Version 2.4 recap.

Cursor 2.4 supports custom subagents invoked via /subagent-name

Custom subagents (Cursor 2.4): Cursor now lets you define your own subagents with custom prompts/tool access/models and call them by name in-chat, per the Subagents backstory and the original Subagents announcement.

Configuration surface: subagents “can be configured with custom prompts, tool access, and models,” as restated in the Version 2.4 recap.
Invocation model: Cursor highlights “invoke them with /subagent-name,” including mixing models within one workflow, according to the Subagents backstory.

Cursor 2.4: agents can ask clarifying questions without pausing work

Clarifying questions (Cursor 2.4): Cursor now supports agents asking clarifying questions mid-task “without pausing their work,” which lets long-running agent loops gather requirements without halting execution, as shown in the Clarifying questions demo.

[Video: Clarify while running]

This capability is also bundled into the broader 2.4 feature drop described in the Subagents announcement.

Cursor 2.4’s Explore agent writes fast research output to files

Explore agent (Cursor 2.4): Cursor’s new Explore subagent is described as extremely fast and produces its findings as a file you can reuse across chats, according to the Explore agent praise.

The same rollout notes tie Explore to a “fast subagent model” strategy (to reduce subagent latency), as explained in the Subagents backstory.

Pattern: fast daily driver model plus slower verifier subagent in Cursor

Workflow pattern (Cursor subagents): Cursor users are explicitly describing a split where you “daily drive” a faster model and call a stronger/slower model as a verifier subagent for checks and reviews, as described in the Verifier subagent pattern.

This is framed as a first-class interaction: “invoke a smarter but slower GPT‑5.2 subagent to verify,” per the Verifier subagent pattern.

Pattern: spawn multiple browser/research subagents for QA and investigation

Workflow pattern (Cursor subagents): Practitioners are calling out subagents as a way to run multiple research/QA threads at once—specifically “spawning multiple browsers for research & QA,” as noted in the Parallel browsers use case.

This use case is being discussed as a re-emergence (briefly available earlier, then pulled) rather than a brand-new idea, per the Parallel browsers use case.

Why Cursor shipped subagents now: model gains + faster subagent model

Shipping rationale (Cursor subagents): Cursor leadership says subagents existed internally for months but weren’t enjoyable enough to ship; they’re claiming the inflection came from better frontier models delegating more effectively plus using a fast model (“Composer”) to reduce latency, per the Subagents backstory.

Timing detail: “prototyped … in March” and “internally since May,” but held back due to user experience, as written in the Subagents backstory.
Why it’s different now: “Models have improved” and a dedicated fast subagent model reduces the old latency penalty, according to the Subagents backstory.

Cursor publishes 2.4 changelog with subagents + image generation details

Cursor changelog (2.4): Cursor published a dedicated changelog entry documenting subagents and image generation, including the claim that subagents run in parallel with their own context and can be customized, as linked in the Changelog link.

The canonical reference is the release notes in Subagents and image generation.


🧠 Claude Code & Cowork: task graphs, desktop Plan Mode, and stability fixes

Continues the Claude Code/Cowork tooling churn with concrete workflow changes: task/dependency primitives and desktop UX updates. Excludes Cursor 2.4 (feature).

Claude Code CLI 2.1.16 adds task management with dependency tracking

Claude Code CLI 2.1.16 (Anthropic): The CLI ships a new task management system with dependency tracking, as listed in the Changelog summary and repeated in the Changelog excerpt; community demos frame this as enabling parallel sub-agent execution where tasks can unblock each other instead of being manually shepherded.

[Video: Task dependency demo]

Task graph semantics: The headline change is explicit dependency tracking rather than a flat to-do list, as called out in the Changelog summary.

This lands as the first “task DAG” primitive inside Claude Code itself, not just in third-party orchestration wrappers.

Claude Code Desktop adds Plan mode so Claude outlines before editing

Claude Code Desktop (Anthropic): Plan mode is now available in the desktop app, letting Claude “map out its approach before making any changes,” as described in the Desktop update post; this is a direct workflow change for longer edits where you want an explicit step-by-step plan before the first diff lands.

[Video: Desktop plan mode demo]

The post positions Plan mode as a guardrail against premature edits, especially when the agent’s first instinct would otherwise be to start patching without a clear path through the repo (and it pairs naturally with task decomposition features that showed up elsewhere today).

Claude Code 2.1.16 expands plan-to-execution controls for teammate spawning

Claude Code 2.1.16 (Anthropic): Prompt/schema changes add explicit controls for multi-agent execution: ExitPlanMode output now includes launchSwarm and teammateCount, and the Task tool can set spawned agent name, team_name, and mode (including permission/approval behavior), as detailed by the Schema diff summary and the Task tool controls.

The same change set also hardens a small but common git failure mode by instructing Claude not to run git rebase --no-edit, per the Git rebase tweak.
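For orientation, here is a hedged TypeScript sketch of how the named fields could fit together; only the field names come from the reported schema diff, while the types, optionality, and surrounding structure are assumptions.

```typescript
// Hedged reconstruction: only launchSwarm, teammateCount, name, team_name, and mode
// are named in the reported schema diff; types and optionality are guesses.
interface ExitPlanModeOutput {
  launchSwarm?: boolean;   // whether the approved plan fans out into a multi-agent run
  teammateCount?: number;  // how many teammate agents to spawn
}

interface TaskToolInput {
  name?: string;       // display name for the spawned agent
  team_name?: string;  // team the spawned agent joins
  mode?: string;       // permission/approval behavior for the spawned agent
}
```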

Claude Code 2.1.16 improves VS Code plugin and session workflows

Claude Code 2.1.16 (Anthropic): The 2.1.16 changelog includes VS Code native plugin management plus OAuth users being able to browse/resume remote Claude sessions from the Sessions dialog, as captured in the Changelog summary and the Changelog excerpt.

This is a workflow-level shift for teams that rely on remote runs (or long-lived agent sessions) but want to reattach from the IDE without manually tracking session identifiers.

Claude Code Desktop adds approval notifications for background runs

Claude Code Desktop (Anthropic): Desktop notifications now fire when Claude needs approval, so you can let the agent run in the background and only context-switch when a permission gate is hit, as shown in the Notifications clip that follows the broader desktop update thread.

[Video: Desktop notifications demo]

This is another small but real “operator loop” improvement: it reduces idle watching during long tool runs and makes approval-driven workflows (shell/git/permissions) more tolerable in day-to-day use.

Claude Code reliability complaints persist: CPU spikes, MCP drops, odd read behavior

Claude Code (field reports): Users are still reporting high CPU usage and UI friction in recent Claude Code builds, including claims that the tool list can disappear during MCP connection failures and that some setups see persistent performance issues even when rolling back versions, per the CPU regression report.

A separate report flags Claude Code appending unexpected content to “every file read,” shown in the File read glitch report.

These posts are anecdotal (no single repro recipe in the tweets), but they line up with the broader “stability tax” that shows up once agents become long-running and tool-heavy.

Cowork upgrades Todos into Tasks for longer projects

Cowork (Anthropic): Anthropic says it upgraded “Todos ⇒ Tasks” to help Claude complete longer projects, per the Tasks upgrade note; the key claim is improved structure for multi-step work rather than a new model.

What’s not specified in the tweet is the exact surface area (UI vs API) and whether this is purely UX or includes new semantics (dependencies, ownership, status), but it’s being positioned as the next iteration of long-horizon task management in Cowork.

Claude Code CLI 2.1.17 fixes non-AVX CPU crashes

Claude Code CLI 2.1.17 (Anthropic): 2.1.17 ships a single fix: resolving crashes on processors without AVX instruction support, as stated in the 2.1.17 changelog note and again in the Changelog excerpt, which links to the underlying changelog section via the Changelog section.

This is a narrow compatibility patch, but it’s the kind that matters for older hardware and some constrained enterprise environments.

Cowork demo turns a receipts folder into a categorized monthly spreadsheet

Cowork (Anthropic): A concrete workflow example shows Cowork taking a folder of receipts and producing a categorized spreadsheet with monthly breakdowns—“pointed it at a folder. That’s it,” as described in the Receipts automation demo.

[Video: Receipts automation demo]

For builders, this is a clean reference case for “messy document pile → structured artifact” without writing a bespoke ingestion pipeline, and it also hints at what Cowork’s file-handling and extraction loop is able to do reliably in practice.

Community push: read Claude Code best practices directly, not summaries

Claude Code documentation (Anthropic): Multiple posts are nudging users to read the official best practices directly instead of relying on secondhand summaries, as argued in the Doc-first nudge and backed by a direct pointer to the Best practices doc in the Best practices link.

This is less about new features and more about process: treating the official guide as the canonical contract for how Anthropic expects users to run Plan→Act flows, manage context, and avoid common failure modes.


🧰 OpenAI Codex surface area expands: JetBrains IDEs + subscription-based tool access

Codex is spreading into developer-native surfaces (IDEs and extensions) and tightening the eval loop for agent skills. Excludes Cursor 2.4 (feature).

Codex lands inside JetBrains IDEs for ChatGPT-plan users

Codex (OpenAI): Codex now runs inside JetBrains IDEs (IntelliJ, PyCharm, WebStorm, Rider), so you can plan/write/test/review without leaving the editor, as shown in the JetBrains IDE demo.

[Video: JetBrains IDE walkthrough]

Setup flow: the in-editor path is “update IDE → open AI Chat → pick Codex → sign in with ChatGPT or API key,” as outlined in the Setup steps and documented in the Codex IDE docs.
Model + positioning: OpenAI frames this as “powered by GPT-5.2 Codex,” suggesting JetBrains becomes a first-class surface for the Codex agent loop rather than a chat-sidecar, per the JetBrains IDE demo.

Cline adds OpenAI sign-in to use your ChatGPT/Codex subscription (no API key)

Cline (Cline) + Codex (OpenAI): Cline now supports signing in with OpenAI so you can run via your existing ChatGPT/Codex subscription—pitched as “flat-rate pricing instead of per-token costs,” per the Launch post.

[Video: Settings walkthrough]

The setup is “provider = OpenAI Codex → Sign in with OpenAI,” as shown in the Step-by-step settings. This changes the procurement path for teams that want Codex-class models inside a local agent harness but don’t want to manage API keys.

OpenAI describes how to evaluate agent skills systematically with Evals

Skills evaluation (OpenAI): OpenAI published a practical guide on turning agent “skills” into testable artifacts and iterating with Evals, as introduced in the Evals for skills post and laid out in the OpenAI dev blog. The core claim is operational: skills aren’t just prompt snippets; they should have measurable success criteria and a scoring loop so changes don’t silently degrade behavior over time.

Cline ships Jupyter-native commands for notebook cell generation and refactors

Cline (Cline): Cline added three notebook-oriented commands to generate, explain, and optimize Jupyter cells “without breaking your structure,” as announced in the Jupyter commands post and detailed in the Jupyter commands blog. This is a concrete shift from file/terminal-centric agent flows to cell-scoped work units (important for data teams that live in notebooks).

GPT-5.2 Instant default personality updated to be more conversational

GPT-5.2 Instant (OpenAI): OpenAI is updating GPT-5.2 Instant’s default personality to be “more conversational” and better at contextual tone adaptation, per the Personality note and the Release notes entry. For teams shipping agentic UX on top of Instant, this is an upstream behavior change that can affect support-chat style, voice/assistant feel, and evaluation baselines (tone-related regressions/improvements).

Codex team asks what to ship next before month-end

Codex roadmap (OpenAI): A Codex team member asked what users want shipped before month-end—“still time to redirect” the team—signaling near-term product surface expansion is still in flux, per the Feature request prompt. For engineers, this is a rare public knob on sequencing (IDE features, agent controls, review loops, or workflow primitives) rather than a finished release.

GPT-5.2 gets shared as a language-learning tool (early applied usage)

Applied use (GPT-5.2): A practitioner shared “GPT-5.2 for language learning,” per the Use-case link. There aren’t implementation details in the tweet, but it’s a clean example of how newer “Codex-era” model availability is spilling beyond coding into structured tutoring workflows (often the first place product teams notice tone, memory, and correction style issues).


🧱 AI app builders & design-to-code: v0, Lovable, and Figma→prototype flows

Tooling focused on going from idea/design to working product (often with agents) shows up heavily today. Excludes Cursor 2.4 (feature) and keeps this category on non-Cursor builders.

MagicPath launches Figma Connect for copy-paste Figma→interactive prototypes

Figma Connect (MagicPath): MagicPath launched Figma Connect, a copy/paste bridge where you copy a Figma design and paste into MagicPath to generate an interactive prototype while preserving pixels, layout, and assets, as described in the launch demo and reiterated in the now live post.

[Video: Figma-to-prototype flow]

Workflow change: It’s positioned as “no plugins” and “no MCP” overhead—designers stay in Figma, then move the artifact into a canvas/prototype environment via clipboard, per the launch demo.
Fidelity promise: The product framing emphasizes “every pixel” and “every asset” being preserved, as stated in the launch demo, which is the part that tends to break in design→code toolchains.

What’s not shown in these tweets is the exact export target surface (frameworks, components, constraints), so the practical impact will hinge on how the generated prototype/code behaves under real design-system and responsive requirements.

Lovable walkthrough shows a full competitor-analysis app built in ~25 minutes

Lovable (Lovable): A long-form walkthrough shows building a competitor analysis tool end-to-end—PRD, auth, database, hosting, and payments—in roughly 25 minutes, with a step-by-step timeline in the walkthrough timestamps.

[Video: Full build walkthrough]

Stack composition: The flow explicitly includes Supabase for database/auth and Stripe for payments, per the walkthrough timestamps, which makes it more representative of real MVP plumbing than “single-page demos.”
Operator pattern: The sequence starts by generating a PRD (including using ChatGPT), then feeding it into the builder, per the walkthrough timestamps, which is the emerging pattern for keeping scope bounded when the UI scaffold is cheap.

The demo is strong as a process artifact; it doesn’t include reliability metrics (deploy failures, iteration loops, test strategy), so treat it as speed proof rather than a quality bar.

v0 UI hints point to Build mode, voice dictation, and PR management

v0 (Vercel): A UI screenshot shows v0 exposing a Build mode toggle (“Optimizes for building apps and coding”) and hints at voice dictation (mic icon) plus deeper Git/PR flows, as shown in the build mode screenshot.

Mode split: The interface explicitly separates “Build” from “Ask” (text-only), which implies different agent policies, tool access, or execution paths, per the build mode screenshot.
Workflow convergence: The left nav items (Chat/Design/Git/Connect/Vars/Rules) visible in the build mode screenshot suggest v0 is treating app-building as a single surface that spans code generation, environment config, and repo operations.

This is a UI breadcrumb rather than a spec; the tweets don’t confirm rollout timing or which tiers get these modes first.

Vercel reopens the v0 waitlist ahead of its next launch

v0 (Vercel): Vercel opened a new waitlist for an upcoming v0 launch, pitching it as “coming to take your job…to the next level,” per the waitlist post with the signup link in the waitlist page.

[Video: v0 announcement clip]

Go-to-market signal: The waitlist reopening suggests a gated rollout cadence rather than an in-place incremental update, aligning with the “important announcement” framing in the v0 announcement.

There aren’t technical details (APIs, supported stacks, export formats) in these tweets, so the actionable detail for teams is simply: access remains staged, and the public funnel is open again.

“Design to code is solved” gets thrown around again, now tied to Figma Connect

Design→code positioning (MagicPath): The Figma Connect rollout is being explicitly framed as “design to code, it’s now solved,” as stated in the design-to-code claim and echoed in the craft-and-speed framing.

[Video: Copy and paste steps]

What’s concrete vs implied: The concrete piece is an interaction-preserving prototype flow (copy from Figma; paste into MagicPath) shown in the copy and paste steps; the “solved” claim is a broader assertion that typically implies production-quality export under design-system constraints.
Why this matters to builders: This kind of positioning tends to reset stakeholder expectations (design, PM, eng) about how much of UI implementation can be treated as a translation step versus an engineering step, which is why the exact boundaries of “prototype” vs “production-ready” output matter.

The tweets don’t provide a spec or compatibility matrix, so treat the “solved” framing as rhetoric until there’s clearer evidence on what code artifacts are emitted and how they map to real component libraries.

Atoms pitches “idea → business loop” as the new builder workflow

Atoms (Atoms): Atoms is being pitched as a single-loop workflow where a half-formed idea becomes a coherent product plan plus implementation path (“structure, copy, flows, backend, revenue plan”) in one sitting, as described in the idea-to-business pitch.

[Video: Idea to product demo]

What’s notable: The framing is not “faster coding,” but reduced handoffs between research, planning, and building—“research → build → ship” in one place, per the loop description.

There’s no concrete technical release detail in the tweets (APIs, export formats, deployment targets), so this reads more as a workflow direction signal than a product spec.

Sekai launches an X bot that generates runnable mini-apps from tagged posts

Sekai (Sekai): Sekai launched an X bot where you tag @sekaiapp with an app idea and it generates a working mini-app that runs in the browser, positioning “software as a social content format,” according to the launch description.

Distribution mechanic: The product claim is “build → share as a post,” skipping app-store style steps (“submit/wait/download”), per the launch description.

The tweets don’t include technical constraints (runtime, storage, auth, rate limits), so the key fact today is the distribution surface: app generation is being bound directly to a social posting workflow.


PR comprehension & verification: Devin Review, browser-based QA, and LLM-judge discipline

PR review is the bottleneck theme today: tools aim to reduce human diff-reading and add verification. Excludes Cursor 2.4 (feature).

Devin Review becomes a URL-swappable surface for AI-era PR comprehension

Devin Review (Cognition): Devin Review continues to spread as a “separate surface” for code review—open any GitHub PR by swapping the host and get an AI-organized review UI, positioned at the “nobody reads PRs anymore” bottleneck, as shown in the Demo clip.

[Video: Devin Review demo]

Access model: It’s pitched as working for both public and private repos and not requiring an account, per the URL swap tip and the Demo clip; the product docs are linked in the Docs page.
Ecosystem implication: Builders are explicitly calling out how much this kind of URL-level review layer highlights “how vulnerable GitHub is,” according to the User reaction.

MorphLLM launches Glance and BrowserBot to verify PRs by running the UI

Glance + BrowserBot (MorphLLM): MorphLLM introduced Glance, a browser agent trained with RL to test code changes, plus BrowserBot that posts a video of the agent exercising preview URLs directly inside GitHub PRs, as shown in the Launch demo.

[Video: Glance PR testing demo]

What’s new in the PR loop: The pitch is to replace “scrolling for 10 seconds” with a concrete artifact (a UI test video) embedded into the review flow, as described in the PR video framing.
Grounding mechanism: Glance maps code diffs to UI targets by walking React’s Fiber tree to connect changed files → DOM elements → bounding boxes, per the Fiber mapping detail (a hedged sketch of the idea follows this list).
Training signal: Rewarding coverage changes when a changed component enters the viewport, double reward for interacting, and reward for novel state discovery are called out in the Reward details, with more specifics in the Training writeup.
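To make the grounding mechanism concrete, here is a purely illustrative TypeScript sketch of the general technique—reading the __reactFiber$ key React attaches to host DOM nodes, walking parent fibers for component names, and matching those names against changed files. Glance’s actual implementation is not published in these posts, and the file-to-component matching heuristic below is an assumption.

```typescript
// Illustrative only: Glance's real pipeline is not public in these tweets.
// Assumes a React app where host DOM nodes carry an own "__reactFiber$..." key.
type Fiber = { return: Fiber | null; type: unknown };

function fiberFor(el: Element): Fiber | null {
  const key = Object.keys(el).find((k) => k.startsWith("__reactFiber$"));
  return key ? ((el as any)[key] as Fiber) : null;
}

function componentNames(el: Element): string[] {
  const names: string[] = [];
  for (let f = fiberFor(el); f; f = f.return) {
    const t = f.type as { name?: string } | string | null;
    if (t && typeof t !== "string" && t.name) names.push(t.name); // function/class components
  }
  return names;
}

// Changed files -> DOM elements -> bounding boxes, via a naive "file stem matches
// component name" heuristic (purely an assumption about how the mapping could work).
function targetsForChangedFiles(changedFiles: string[]): { el: Element; box: DOMRect }[] {
  const stems = changedFiles.map((f) => f.replace(/^.*\//, "").replace(/\.[jt]sx?$/, ""));
  const targets: { el: Element; box: DOMRect }[] = [];
  document.querySelectorAll("*").forEach((el) => {
    if (componentNames(el).some((n) => stems.includes(n))) {
      targets.push({ el, box: el.getBoundingClientRect() });
    }
  });
  return targets;
}
```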

LLM-as-judge still needs human-label validation to be trustworthy

LLM judge validation (Evaluation practice): A reminder circulated that “verifying an LLM judge” is still classic ML evaluation—testing against human labels—and that shipping unverified LLM judges is risky, as stated in the LLM judge warning.

RepoPrompt 1.6.1 ships deeper review ergonomics for agent PRs

RepoPrompt 1.6.1 (RepoPrompt): RepoPrompt shipped an update aimed at making “deep review” workflows more practical across real repo layouts—adding JJ support for deep reviews, multi-root reviews for multi-repo workspaces, and support for reviews from a worktree, according to the Release notes.

Token economics: The release also claims an “80% more token efficient” file_search tool, per the Release notes.
Field signal: A maintainer notes they were able to abstract a git engine, add JJ support, and “battle-test it” in ~2 hours, attributing that speed to RepoPrompt’s review catching “paper cuts,” as described in the Developer feedback.

“Bash is all you need” gets reframed as an eval-design question

Agent eval design (Braintrust): Braintrust published an argument that “tool choice matters, but evals matter more” when comparing bash-only agents to richer harnesses, as described in the Head-to-head evals post.

What this is really about: The emphasis is on evaluation methodology as the determinant of conclusions (not the specific tool surface), per the Head-to-head evals post and its linked Writeup.

Ghostty tightens contribution rules for AI-assisted PRs

Ghostty (Policy change): Ghostty is updating its AI contribution policy so AI-assisted PRs are only allowed for accepted issues, with “drive-by” AI PRs to be closed, according to the Policy mention.

PR template checkboxes don’t reliably signal AI-generated code

Maintainer workflow (PR hygiene): A maintainer warns that adding a checkbox for “AI generated code” to PR templates does not work in practice—contributors often do not check it even when projects explicitly accept AI-assisted PRs, per the Maintainer note.


🧭 Workflow patterns that actually ship: tracer bullets, context discipline, and feedback loops

High-signal practitioner techniques and mental models for getting reliable work out of agents (beyond any single tool). Excludes Cursor 2.4 (feature).

Sandbox-first agent doctrine: persistent state, low-level interfaces, benchmarks early

Workflow doctrine: A compact “sandbox everything” checklist is circulating as a practical spec for running agents reliably: sandboxed execution, no external DB access, garbage-y environments, run agents independent of user sessions, persist state explicitly, and define outcomes rather than procedures, as listed in the Sandbox doctrine list.

It also calls out “give agents direct, low‑level interfaces” and “avoid MCPs and overbuilt agent frameworks” alongside “introduce benchmarks early” and “plan for cost,” positioning harness design as the real control plane for long-running automation per the Sandbox doctrine list.

Tracer bullet prompting: force the smallest end-to-end slice to reduce agent slop

Workflow pattern: The “tracer bullet” prompt pattern is showing up as a concrete way to keep long agent runs from expanding into a messy rewrite—by explicitly forcing the agent to implement the smallest end‑to‑end slice that crosses layers, then iterating from there, as shown in the Tracer bullet example.

The key detail is that the agent is instructed to start with one demonstrable vertical slice (e.g., a backend endpoint wired into one UI location) before touching the rest of the surface area—see the stepwise breakdown in the Tracer bullet example.

Agent speed compression: MVP in hours, production hardening still dominates

Shipping reality: A simple but common framing is landing: agent workflows can compress “time to MVP” to hours, while making something truly production-ready still takes days—captured bluntly as “4 hours” vs “4 days” in the MVP vs production timing.

The implied delta is that reliability work (testing, edge cases, deployment hygiene, maintenance) remains the time sink even as initial implementation gets faster, per the MVP vs production timing.

Bottleneck shift: AI makes code cheap, customer adoption becomes the limiter

Product feedback loop: A clear “what changes now” framing is spreading: if AI makes producing code close to free, the rate-limiter becomes how quickly customers can adopt what you ship and generate the next round of business learnings, as argued in the Adoption bottleneck note.

This is an explicit pushback on measuring agent impact via typing/output volume; the claim is that the binding constraint is still the real-world feedback loop (“once a customer has implemented the first thing”), per the Adoption bottleneck note.

Default-model inertia: most users never switch models, “two clicks” changes outcomes

Usage reality check: Observation of real users, even experienced ones, suggests that “essentially zero percent” change the default model selection; the claim is that a trivial UI change (“clicking twice”) can materially increase perceived value because most people never explore the model picker, per the Default model behavior, reiterated in the Follow-up link.

This matters operationally because product-level defaults (not just model quality) determine what the median user actually experiences, as implied by the Default model behavior.

“Accumulating AI skillset”: users learn model limits and failure modes over time

Human-in-the-loop skill: One repeated observation is that “AI skill” compounds: people get better results as they internalize what models can do, how to work with them, and how they fail—an intuition that changes more gradually (and more predictably) than many expect, per the Accumulating AI skillset.

This frames “prompting” less as a single trick and more as lived calibration—knowing when to constrain scope, when to verify, and when to switch approaches, as stated in the Accumulating AI skillset.

Developer efficiency isn’t typing speed: measurement shift in the agent era

Measurement shift: A clip of Atlassian’s CEO is being shared with a direct claim that “how quickly you write code” is a poor metric for developer efficiency, a framing that aligns with an agent era where code production is decoupled from individual typing speed, per the Atlassian CEO clip.

[Video: CEO on efficiency metric]

The point is that the measurement target is moving up-stack (impact, outcomes, delivery), and the clip is being used as an anchor for that shift, per the Atlassian CEO clip.

Preview agent-made web changes live via GitHub Pages while the agent is still working

Workflow pattern: A practical “stay unblocked while the agent runs” technique is to have a web branch auto-published so you can review UI changes from a phone while the agent continues iterating; Simon Willison describes doing this on iPhone using GitHub Pages, per the iPhone preview tip with setup details in the TIL post.

The pattern is specifically about tightening the visual feedback loop without waiting for the agent session to end, as described in the iPhone preview tip.


🔗 MCP & web-agent interoperability: embedded apps, browser agents, and tool plumbing

Interoperability and “agent can use the web/software” primitives showing up as MCP-style integrations or adjacent web-agent tooling. Excludes Cursor 2.4 (feature).

CopilotKit ships MCP Apps ↔ AG-UI bridge for returning mini-apps in chat

CopilotKit (CopilotKit): CopilotKit added first-client support for the MCP Apps extension via AG-UI middleware, so agents can return interactive “mini-apps” to users (via iframes) with bidirectional communication between the app and the MCP server, as described in the integration thread.

[Video: MCP Apps demo]

Interoperability angle: The pitch is “frontend tools” that work across agent backends (framework-agnostic) and let application developers embed MCP-returned UIs into their own agentic products, as shown in the integration thread.
Pointers: CopilotKit includes a hands-on walkthrough in the MCP Apps tutorial and a runnable example in the Interactive demo.

Browser Use expands access (500 users) as it positions its web-agent CLI

Browser Use (browser_use): Browser Use approved 500 new users from its waitlist, per the waitlist update, alongside continued positioning as a primary “browser use CLI” for automation workflows in the CLI endorsement and “close the local development loop” framing in the workflow line.

[Video: Waitlist announcement clip]

Why it matters for tool plumbing: The steady push is toward a reusable, CLI-shaped primitive for “agent uses a browser” tasks, with distribution happening via staged access (waitlist approvals) in the waitlist update.

OSS Coding Agent template adds Browser Mode powered by agent-browser

Browser Mode (ctatedev): The open-source Coding Agent template shipped a “Browser Mode” that’s explicitly powered by agent-browser, positioning it as a drop-in way to add web navigation and testing to a coding-agent scaffold, per the Browser Mode demo.

[Video: Browser Mode demo]

What’s concrete: The feature is already live in the template and demoed end-to-end, with the template entry point linked in the Template site.

Hyperbrowser open-sources HyperAgent to augment Playwright with AI

HyperAgent (Hyperbrowser): Hyperbrowser introduced HyperAgent, an open-source web-agent designed to “supercharge Playwright with AI,” according to the HyperAgent mention.

Details like task format, action model, and evaluation loop aren’t in the tweet text, so treat this as a launch signal pending docs and examples beyond the HyperAgent mention.

OpenRouter docs add one-click “copy as Markdown” and “open in Claude/ChatGPT/Cursor”

Docs-to-agent handoff (OpenRouter): OpenRouter is making docs more “AI-friendly” by adding UI actions to copy a page as Markdown for LLMs, open in Claude, open in ChatGPT, and connect to Cursor, as shown in the docs actions menu.

This is a small but direct interoperability move: it treats documentation pages as structured context artifacts that can be transferred into an agent session with minimal friction, per the docs actions menu.


🔌 Skills & installables: Railway deploy, agent-browse, and “skills as artifacts you can eval”

Installable extensions/skills that change what coding agents can do, plus emerging best practices for testing those skills. Excludes Cursor 2.4 (feature).

OpenAI publishes a skills→evals playbook for systematic iteration

Skill evaluation (OpenAI): OpenAI published guidance on turning “agent skills” into artifacts you can test, score, and improve over time, positioning Evals as the backbone for iteration rather than relying on gut feel; the post is pointed to in the Evals blog post and echoed in the Share link.

In practice, this frames skills as an interface contract: if you can’t measure a skill’s behavior across tasks, you can’t safely refactor prompts/tools without regressions, as laid out in the OpenAI blog post.
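As a generic illustration of that contract (this is not OpenAI’s Evals API; all names below are made up), a skill revision can be scored against a fixed case set so changes show up as a number rather than a feeling:

```typescript
// Generic sketch of "skills as testable artifacts": a fixed case set, a measurable
// success criterion per case, and one score to compare across skill revisions.
interface SkillCase {
  input: string;
  check: (output: string) => boolean; // measurable success criterion
}

async function scoreSkill(
  runSkill: (input: string) => Promise<string>, // however the skill is invoked
  cases: SkillCase[],
): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    const output = await runSkill(c.input);
    if (c.check(output)) passed += 1;
  }
  return passed / cases.length; // track this before/after every prompt or tool change
}
```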

Browserbase agent-browse skill lets Claude Code browse and test web apps

agent-browse skill (Browserbase): Browserbase published a Claude Code skill that wires a browser CLI into an agent loop—positioned as letting Claude “generate and test your code itself” via web navigation, installed with npx skills add browserbase/agent-browse per the Install command. Details and the code are linked in the GitHub repo.

What it enables: The pitch is closing “local dev → preview URL → browser verification” loops without switching tools, as described in the Enable loop note.

Railway skill for Claude Code adds deploy, logs, env vars, health checks

Railway skill for Claude Code (mshumer): A new installable Claude Code skill wraps Railway project operations—deploys with verification, log inspection, env var management (with redaction), and DB shell access—installed via npx add-skill mshumer/claude-skill-railway as shown in the Install snippet and the Feature list screenshot.

Operational surface area: The skill exposes status/health checks and natural-language log filtering (“errors”, “last hour”), which shifts Railway from “manual deploy UI” to “agent-callable” tooling per the Feature list screenshot.

Kilo’s skill scoping pattern: repo-shared standards vs user-local prefs

Skills scoping pattern (Kilo): Kilo shared a concrete convention for separating “team standards” from “personal preferences” by scoping skills to either a project directory (checked into git) or a user home directory, as described in the Skills tip.

This is explicitly framed as a context-engineering move—treating skills as structured markdown/context packages—reinforced by the Context reminder and expanded in the writeup linked from the Blog post.

SuperDesignDev skill adds “design OS” workflows for coding agents

Superdesign skill (SuperDesignDev): A new installable skill is framed as a “design OS for coding agents,” extracting style/UI/user-journey context from an existing codebase and operating on an infinite canvas; installation is shown in the Skill intro and the Install steps.

[Video: Superdesign walkthrough]

Parallel exploration angle: The tool explicitly leans into running multiple design explorations in parallel on the same canvas, as demonstrated in the Skill intro.

Hyperbrowser adds /docs fetch to pull live docs into Claude Code (cached)

/docs fetch in Claude Code (Hyperbrowser): Hyperbrowser added a Claude Code command, /docs fetch <url>, to ingest live docs from arbitrary sites and cache them for reuse, as described in the Feature blurb.

This is a concrete “docs-as-context” primitive: it turns web docs into something agents can pull on demand rather than relying on stale local copies, per the Feature blurb.

SkillsBento’s X/Twitter Stats Analyzer skill turns CSV exports into insights

X/Twitter Stats Analyzer (SkillsBento): A Claude skill workflow is circulating for analyzing engagement by uploading X analytics CSV exports and running a dedicated “Stats Analyzer” skill, with the end-to-end flow shown in the How-to thread and a second example in the Results share.

[Video: Stats analyzer demo]

The skill artifact itself is referenced via the Skill page.


🧬 Agent builders & platforms: LangChain templates, Deep Agents memory, and white-box RAG tooling

Framework-layer updates for people building agents (not just using them): templates, memory primitives, and debuggable pipelines. Excludes Cursor 2.4 (feature).

Deep Agents adds /remember: persistent memory stored in AGENTS.md + skills/

Deep Agents CLI (LangChain OSS): Deep Agents shipped a new /remember primitive that injects a reflection step, then writes durable learnings to disk—specifically into AGENTS.md (preferences) and skills/ (workflows)—so future runs automatically get the updated context, as shown in the Remember feature thread.

What it changes in practice: instead of “fix it again next session,” the agent can be corrected once (example: switching a Python HTTP library) and the correction persists via the filesystem, as demonstrated in the Video walkthrough.

Docs and quickstart: the team points to setup via Anthropic API key and the uvx deepagents-cli entrypoint, as described in the Docs quickstart.

UltraRAG 3.0 turns RAG into a debuggable “white box” with a WYSIWYG builder

UltraRAG 3.0 (OpenBMB/THUNLP et al.): UltraRAG 3.0 ships a WYSIWYG Canvas + Code pipeline builder (live-synced) plus a “Show Thinking” panel that visualizes retrieval, loops/branches, and tool calls to debug hallucinations against retrieved chunks, per the UltraRAG 3.0 release.

[Video: Pipeline builder walkthrough]

Why it’s different from typical RAG frameworks: the pitch is explicit “white-box” debugging—seeing the full inference trajectory rather than guessing why a run failed—along with a built-in assistant to generate configs/prompts, as described in the UltraRAG 3.0 release.

Where to inspect artifacts: code is in the GitHub repo, with an end-to-end demo shown in the UltraRAG 3.0 release.

Gemini Interactions API cookbook: one endpoint to multi-turn + tools + Deep Research

Gemini Interactions API (Google): a new “Getting Started” cookbook notebook walks from a single model request to multi-turn conversation state, function calling, built-in tools like Google Search, and running the specialized Deep Research agent—all via one endpoint, per the Cookbook announcement.

Reference artifacts: the walkthrough is provided as a runnable notebook in the Colab quickstart alongside a written guide in the Blog quickstart.

This reads less like a model announcement and more like a concrete integration recipe for teams that don’t want to manage chat history client-side, as described in the Cookbook announcement.

StackAI + Weaviate push “production RAG” framing: permissions, audit trails, milliseconds

Enterprise RAG architecture (StackAI + Weaviate): Weaviate and StackAI are pitching a no-code/low-code path to production RAG that emphasizes permissioning, auditability, and compliance (SOC 2/HIPAA/GDPR), with Weaviate as the retrieval layer and StackAI as the orchestration layer, per the Enterprise RAG guide.

Workflow shape: multiple knowledge base sources feed a Weaviate index, then a StackAI flow routes through retrieval + LLM nodes into domain agents (e.g., compliance chatbot, claim triage), as shown in the Enterprise RAG guide.

This is a “governance-first RAG” framing—less about new retrieval algorithms, more about making retrieval systems deployable inside regulated orgs, as described in the Enterprise RAG guide.


🕹️ Running agent fleets: task DAGs, command allowlists, and long-running automation

Operational tooling and practices for running many agents reliably (permission gates, task systems, and background automations). Excludes Cursor 2.4 (feature).

Clawdbot adds command allow-lists and interactive approval dialogs

Clawdbot (steipete): The next Clawdbot version adds command allow-lists so unknown shell commands trigger an explicit approval dialog (allow once / always allow / deny), as shown in the allowlist preview.

This tightens the “agent can run shell commands” surface without needing to remove autonomy entirely.

Operator UX: the dialog includes working directory, executable, host, and security mode fields, as visible in the dialog screenshot.
Still supports unrestricted mode: the author notes “full madness mode is still possible,” in the same preview.

Conductor 0.32.0 adds GitHub issue import and Graphite stack support

Conductor 0.32.0 (Conductor): Conductor shipped a batch of operator features for agent-heavy workflows—import GitHub issues, Graphite stack support, “update Claude memory” in one click, and headless-oriented improvements—per the 0.32.0 announcement.

[Video: Conductor feature walkthrough]

The through-line is giving a single operator a better surface for coordinating many parallel branches and agent sessions.

Stacked-branch awareness: Graphite stacks show up as first-class UI, as shown in the Graphite screenshot.
Memory as an explicit action: the release frames “update Claude’s memory” as a single-step operation, as listed in the release clip.

Cua-Bench open-sourced: a self-hostable eval suite for computer-use agents

Cua-Bench (trycua): Cua-Bench is now open source, packaging 15 public tasks with 40 variations plus adapters for OSWorld and Windows Agent Arena, positioned as a single CLI that teams can run in-house to evaluate every computer-use agent they deploy, per the launch post.

[Video: Cua-Bench walkthrough]

This fits the “fleet ops” problem: once you have multiple agents running UI automation, you need repeatable checks that don’t depend on manual screen recording.

The code is available via the GitHub repo, with a separate Getting started guide.

“Tracer bullet” prompting to keep autonomous runs small and testable

Tracer bullet prompting (pattern): A concrete control technique for long agent runs is to explicitly demand the smallest end-to-end slice that crosses all layers, then expand; the prompt framing and a real task breakdown are shown in the example screenshot.

The core operational value is reducing “agent wandered into a big refactor” by forcing one demonstrable vertical slice first.

The same author positions “tracer bullet” as a keyword that reliably nudges models toward minimal scope, as explained in the prompt note.

AFK Ralph bash loop restores streaming output for unattended agent runs

Ralph / AFK coding (pattern): Following up on AFK streaming (unattended runs), a practical fix is circulating for the common pain point that “AFK means no streaming to the terminal by default,” using a bash script that captures stream-json and renders partial output live, as described in the script walkthrough and the linked Script write-up.

This is a small detail, but it changes how tolerable “run agents for hours” feels—because you can actually see progress and intervene when it stalls.
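The circulating fix is a bash script; as a rough TypeScript (Node) sketch of the same idea, assuming the agent CLI’s stream-json output is piped in as newline-delimited JSON (the exact event schema is an assumption, so this just surfaces whatever text fields appear):

```typescript
// Reads newline-delimited JSON events from stdin and prints any "text" fields live,
// so an unattended run still shows progress. The event shape is assumed, not documented here.
import * as readline from "node:readline";

function collectText(value: unknown, out: string[]): void {
  if (typeof value !== "object" || value === null) return;
  for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
    if (k === "text" && typeof v === "string") out.push(v);
    else collectText(v, out);
  }
}

const rl = readline.createInterface({ input: process.stdin });
rl.on("line", (line) => {
  try {
    const chunks: string[] = [];
    collectText(JSON.parse(line), chunks);
    if (chunks.length) process.stdout.write(chunks.join(""));
  } catch {
    // Ignore non-JSON lines (warnings, partial writes).
  }
});
```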

Cowork workflow: point at a receipts folder, get a categorized monthly spreadsheet

Cowork (Anthropic): A concrete “document ops” pattern shows Cowork taking a folder of receipts and producing a categorized spreadsheet with monthly breakdowns, with essentially no setup besides pointing it at the folder, as shown in the demo post.

[Video: Receipts spreadsheet demo]

This is the shape of work where long-running agents start to look like a replacement for small internal ETL and finance-ops scripts.

One implication is that “spreadsheet as output format” remains a stable interface for autonomous document pipelines, even when the inputs are messy and unstructured.

Deep Agents CLI ships /remember for persistent filesystem memory

Deep Agents CLI (LangChain OSS): Deep Agents added a /remember primitive that injects a reflection prompt, extracts durable learnings, and writes them to the filesystem (AGENTS.md for preferences; skills/ for workflows) so future threads load them automatically, as shown in the feature post.

This is a direct attempt to make long-running agent work compound over days instead of re-learning the same project quirks every session.

A demo of correcting a library choice once (“requests→httpx”) and having it stick is referenced in the YouTube walkthrough.

RepoBar 0.2.0 ships “GitHub in your menubar” for repo ops

RepoBar 0.2.0 (steipete): RepoBar shipped an updated macOS menubar UI that surfaces repo status (issues/PRs/releases/CI runs) as a lightweight operator console, as shown in the release screenshot and the linked Release notes.

This kind of surface tends to matter more once agents are generating lots of small PRs and issues and the bottleneck becomes “keeping the queue moving.”

Sandbox-first doctrine for long-running agents: outcomes, explicit state, benchmarks

Agent ops doctrine (pattern): A concise checklist is making the rounds that argues for sandboxing everything, persisting state explicitly, defining outcomes not procedures, and planning for cost early, as listed in the ops checklist.

This is less about any single tool and more about how teams avoid operational dead-ends when agents run independently of user sessions.

It also reflects a shift toward treating agent runs like distributed jobs: ephemeral environments and implicit state stop working quickly.

Claude Code 2.1.17 fixes non-AVX CPU crashes

Claude Code CLI 2.1.17 (Anthropic): A small operational release fixes crashes on processors without AVX support, as stated in the 2.1.17 note and the linked Changelog.

This is a deployment footnote, but it matters for teams running agents on older bare-metal, CI runners, or cost-optimized fleet machines where AVX isn’t guaranteed.


🛠️ Dev utilities & knowledge surfaces: monitors, summarizers, and company search APIs

Non-assistant developer tools that feed or supervise agents: monitoring APIs, summarization utilities, and structured company/search products. Excludes Cursor 2.4 (feature).

OpenRouter adds regional provider performance views and endpoint stats

Provider performance telemetry (OpenRouter): OpenRouter now exposes performance by provider and geography (“track any LLM’s performance by provider in any global region”), as shown in the Regional performance demo.

[Video: Performance map demo]

It also highlights an endpoint stats API that surfaces uptime plus p50 latency and p50 throughput, with an example table showing one provider marked degraded and others healthy in the Endpoint stats screenshot.

This matters because routing across providers is increasingly an availability/cost control plane; the table in the Endpoint stats screenshot makes the “which endpoint should we hit right now?” question operational instead of anecdotal.

Exa launches semantic search over 60M companies with structured results and an eval

Company search (Exa): Exa says it now supports semantic search over 60M+ companies and returns structured attributes (traffic, headcount, financials, etc.), as described in the Company search launch. It also published a benchmark/eval so others can measure and compare approaches, per the Benchmarks and skill links.

This matters because “company lookup” is a recurring need in sales ops, recruiting, and market research agents—and structured outputs reduce brittle scraping.

Evaluation artifact: Exa points to a public evaluation in the Benchmarks post, which makes it easier to compare provider quality beyond anecdotes.
Agent integration surface: Exa also ships a Claude-oriented integration guide in the Claude skill docs, positioning this as a callable tool inside agent workflows.

Parallel Monitor API adds schema-based structured outputs

Parallel Monitors (Parallel): Monitors—always-on web searches that notify on new information—can now return structured outputs shaped by a schema you define, rather than just freeform text, as announced in the Structured outputs launch.

This matters for engineering teams because it turns “web monitoring” into a directly ingestible upstream for agents and pipelines (alerts → JSON → automated triage), instead of a human-in-the-loop parsing step; the example schema for funding announcements (company, round, amount, lead investors, announced date) is shown in the Structured outputs launch.
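As a rough consumer-side sketch of the funding-announcement example—the field names follow the post, but the exact schema format Parallel’s Monitor API expects (and these TypeScript types) are assumptions:

```typescript
// Hypothetical shape for a structured monitor hit; field names are taken from the
// funding-announcement example in the post, everything else is an assumption.
interface FundingEvent {
  company: string;
  round: string;            // e.g. "Series B"
  amount_usd: number;
  lead_investors: string[];
  announced_date: string;   // ISO 8601 date
}

// With a schema like this, monitor hits can drive automated triage instead of
// a human parsing freeform alert text.
function triage(event: FundingEvent): "notify-sales" | "ignore" {
  return event.amount_usd >= 10_000_000 ? "notify-sales" : "ignore";
}
```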

OpenRouter docs add “copy as Markdown” and open-in-assistant actions

Docs handoff UI (OpenRouter): OpenRouter added an “AI-friendly” docs menu with actions like copy page as Markdown for LLMs, view as Markdown, plus one-click open in Claude/ChatGPT and connect to Cursor, as shown in the Docs menu screenshot.

This matters because it standardizes a common workflow: turning vendor docs into model-ready context without manual cleanup, and shortening the path from “reading docs” to “asking an agent about them” via the same UI surface.

Summarize 0.10.0 adds slides support and an agent mode

Summarize 0.10.0 (steipete): The Summarize tool (browser extensions + terminal) shipped v0.10.0 with broader inputs (“any website, YouTube, podcast, or file format”) and adds slides support plus an agent mode, as announced in the 0.10.0 release note and detailed in the GitHub release.

This matters as a pragmatic “context preprocessor” for agents: it’s a standalone summarization surface that can turn messy media/files into compact text before you feed it into a coding or research run.

Mastra crosses 20k GitHub stars as TS agent framework adoption signal

Framework adoption (Mastra): Mastra reports hitting 20k GitHub stars, framing it as a milestone for the project’s traction, as shown in the 20k stars post.

This matters to engineering leads mainly as a signal: TypeScript-first agent stacks are consolidating around a smaller set of frameworks, and repo-scale adoption tends to pull ecosystem tooling (examples, integrations, eval harnesses) along with it; the “now 1.0” claim is also called out in the 1.0 note.


📏 Evals & observability: agent task suites, model indexes, and arena dynamics

Benchmark and eval artifacts that help teams choose models/tools and measure agent performance. Excludes Cursor 2.4 (feature).

Artificial Analysis: GLM-4.7-Flash (Reasoning) leads open-weights under 100B on its Index

GLM-4.7-Flash (Reasoning) (Z.ai): Artificial Analysis says GLM-4.7-Flash (Reasoning) is now the top “open weights <100B params” model on its Intelligence Index with a score of 30, describing it as a 31B/3B total/active MoE that can run on 1× H100 (BF16), per the Artificial Analysis breakdown.

Agentic/task results: The writeup calls out ~99% on τ²-Bench Telecom and 22% on Terminal-Bench Hard, as reported in the Artificial Analysis breakdown.
Where it’s weaker: It’s described as lagging on knowledge with -60 on the Omniscience Index and 0.3% on CritPt, again per the Artificial Analysis breakdown.

For model selection, the key takeaway is the split between strong “agentic execution” scores and weaker “research assistant / knowledge” scores, as summarized in the Artificial Analysis breakdown.

Cua open-sources Cua-Bench: 15 GUI tasks, 40 variations, OSWorld + Windows adapters

Cua-Bench (Cua): Cua open-sourced Cua-Bench, describing it as the internal harness they’ve used “for the last few months” to evaluate computer-use agents before deployment, with 15 public tasks and 40 variations, plus adapters for OSWorld and Windows Agent Arena, per the Open-source eval suite.

[Video: Cua-Bench demo]

This lands as a practical “bring-your-own-agent” benchmark artifact: a single CLI + self-hostable setup meant to standardize how teams measure GUI automation reliability across OS targets, as stated in the Open-source eval suite.

OpenRouter adds an endpoint stats API with uptime, p50 latency, and throughput

OpenRouter (routing observability): OpenRouter’s endpoint stats API surfaces per-provider status, uptime, p50 latency, and p50 throughput for a given model—illustrated with Anthropic vs Bedrock vs Google endpoints in the Endpoint stats output.

The practical relevance is that this turns “which provider is degraded right now?” into something automatable (routing based on live latency/throughput) rather than anecdotal, as shown in the Endpoint stats output.
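A minimal sketch of the routing decision this enables, assuming a hypothetical per-endpoint record carrying the metrics visible in the screenshot (status, uptime, p50 latency, p50 throughput); the actual OpenRouter API response format may differ:

```typescript
// Hypothetical per-endpoint record; the real response shape isn't shown beyond
// the metrics named in the screenshot (status, uptime, p50 latency/throughput).
interface EndpointStats {
  provider: string;            // e.g. "anthropic", "bedrock", "google"
  status: "healthy" | "degraded";
  uptime: number;              // fraction, 0..1
  p50LatencyMs: number;
  p50ThroughputTps: number;
}

// Route to the healthiest, lowest-latency endpoint instead of a hard-coded provider.
function pickEndpoint(stats: EndpointStats[]): EndpointStats | undefined {
  return stats
    .filter((s) => s.status === "healthy" && s.uptime > 0.99)
    .sort((a, b) => a.p50LatencyMs - b.p50LatencyMs)[0];
}
```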

Terminal-Bench paper lands as a failure-focused eval for terminal agents

Terminal-Bench (agent eval): The Terminal-Bench paper is now out, framed explicitly around “where frontier models still fail” on realistic terminal tasks, per the Paper announcement.

The value for builders is that it’s positioned as an eval suite for end-to-end terminal work (not just coding snippets), and the public release signals that more teams are trying to measure long-horizon tool-use failures rather than prompt quality alone, as implied by the Paper announcement.

Snowbunny tops Heiroglyph lateral reasoning with 16/20 vs GPT-5 high at 11/20

Heiroglyph benchmark (community eval): Two unreleased Gemini variants codenamed Snowbunny (“raw” and “less raw”) score 16/20 (80%) on Heiroglyph’s lateral-reasoning test, ahead of GPT-5 (high) at 11/20 (55%), as shown in the Heiroglyph results post.

This matters because it’s one of the clearer “reasoning style” signals circulating (lateral puzzles vs. math/coding), and it’s being used to infer how far internal checkpoints may be from public releases—though the chart is still a single benchmark snapshot, and the models are not publicly accessible per the Heiroglyph results post.

GLM-4.7-Flash enters LM Arena Text Arena for head-to-head comparisons

Text Arena (LM Arena): LM Arena says GLM-4.7-Flash is now live in its Text Arena battle mode (noting it as a smaller variant of GLM-4.7), inviting users to compare it against frontier models via the arena workflow described in the Arena listing.

This matters mainly as an evaluation surface: it’s one more route for gathering preference-style head-to-head outcomes that can complement index-style benchmarking (like Artificial Analysis’ ranking), as implied by the Arena listing.


📦 Model releases watch: open TTS, Chinese frontier churn, and leaked codenames

Material model availability changes and credible leaklets. Excludes Cursor 2.4 (feature).

Qwen open-sources Qwen3‑TTS with voice design, cloning, and full fine-tuning

Qwen3‑TTS (Alibaba/Qwen): Qwen open-sourced the full Qwen3‑TTS family—VoiceDesign, CustomVoice, and Base—shipping weights, code, and a paper in the launch thread; the release spans 5 models across 0.6B and ~1.7–1.8B sizes plus a 12Hz tokenizer, and it’s positioned as “disruptive” for open TTS by enabling both free-form voice creation and cloning with full fine-tuning support, as described in the launch thread.

Builder-relevant surface area: The repo and artifacts are live via the GitHub repo and the model collection, which makes this immediately runnable in local stacks and deployable via common model hubs.
Latency & streaming claim: Community summaries highlight streaming-first behavior with “first audio packet after 1 character” and ~97ms synthesis latency, as described in the architecture summary.

Early user reaction is positive on voice clone/design quality, per a hands-on note in the early usage reaction.

Gemini “Snowbunny” leak shows 16/20 on Heiroglyph lateral reasoning

Snowbunny (Google/Gemini): Two unreleased Gemini variants codenamed Snowbunny are shown scoring 16/20 on the Heiroglyph lateral reasoning benchmark, following up on AI Studio tests (Snowbunny spotted in A/B) with a quantified result surfaced in the Heiroglyph results post.

Snowbunny UI clone demo
Video loads on view

What’s new vs. the earlier sightings: The chart explicitly lists “snowbunny (raw)” and “snowbunny (less raw)” at the top, while placing “gpt‑5 (high)” at 11/20, as shown in the Heiroglyph results post.
Early qualitative demos: Separately, a “Snowbunny” demo clip claims strong one-shot UI recreation behavior (Windows-like UI), along with the recurring “compute availability” caveat, as shown in the Snowbunny demo clip.

No public availability, API details, or model card are present in today’s tweets, so this remains a capability signal rather than a shipping surface.

Baidu’s ERNIE 5.0 is reported released, with benchmark charts circulating

ERNIE 5.0 (Baidu): A release claim for ERNIE 5.0 is circulating, describing it as a 2.4T-parameter multimodal model with strong benchmark results, per the release claim.

The most concrete artifact in these tweets is a benchmark bar chart that includes ERNIE‑5.0 alongside GPT‑5 and Gemini variants, as shown in the model comparison chart; treat the chart as provisional here because the tweets don’t include a single canonical eval report or official model card to anchor methodology.

ByteDance’s “Giga‑Potato” Doubao model is being tested with 256k context

Doubao (ByteDance): ByteDance is reportedly testing a new Doubao model inside Kilo Code under the nickname “Giga‑Potato,” with claimed 256k context and 32k max output, and an emphasis on strict system prompt adherence for long-context coding tasks, per the Kilo Code description.

A follow-up note says it also appeared on LM Arena under an unknown alias, which makes the current evidence mostly “leaklet + tester chatter,” as described in the LM Arena note.

vLLM‑Omni lands day‑0 offline inference for Qwen3‑TTS

vLLM‑Omni (vLLM Project): The vLLM team says vLLM‑Omni has day‑0 support for running Qwen3‑TTS features (voice cloning + voice design) “natively,” with offline inference available now and online serving “coming soon,” as announced in the support post.

This matters if you’re already standardizing on vLLM for inference and want TTS to share the same serving substrate; the post includes concrete entrypoints for running end-to-end samples locally, as shown in the support post.

A practical local CLI workflow for Qwen3‑TTS voice cloning

Qwen3‑TTS (hands-on): A concrete “try it locally” recipe is circulating: Simon Willison reports Qwen3‑TTS voice cloning works well in practice and shares a minimal CLI wrapper so you can generate audio from text + a voice instruction string, as shown in the hands-on notes.

The wrapper example uses uv run to execute a hosted Python script and emit a WAV ("pirate.wav"), and it’s linked directly from the CLI script link, which makes it easy to reproduce without building a full pipeline first.
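
To make the shape of that workflow concrete without reproducing the linked script, here is a minimal sketch of a text-plus-voice-instruction CLI: the argument surface (--voice, --out) is hypothetical, and synthesize() is a placeholder that writes a silent WAV where a real wrapper would invoke Qwen3‑TTS.

```python
# Minimal sketch of a "text + voice instruction -> WAV" CLI wrapper.
# This is NOT the linked script; synthesize() is a placeholder that writes
# a silent WAV where the real Qwen3-TTS call would go.
import argparse
import wave

def synthesize(text: str, voice_instruction: str, out_path: str) -> None:
    # Placeholder: a real wrapper would load the Qwen3-TTS weights and
    # generate audio conditioned on `text` and `voice_instruction`.
    with wave.open(out_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)        # 16-bit PCM
        wav.setframerate(24000)    # assumed sample rate
        wav.writeframes(b"\x00\x00" * 24000)  # one second of silence

def main() -> None:
    parser = argparse.ArgumentParser(description="Text + voice instruction -> WAV")
    parser.add_argument("text")
    parser.add_argument("--voice", default="neutral narrator",
                        help="free-form voice instruction string")
    parser.add_argument("--out", default="output.wav")
    args = parser.parse_args()
    synthesize(args.text, args.voice, args.out)

if __name__ == "__main__":
    main()
```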


🧪 Training & reasoning methods: test-time learning, multiplex CoT, and judge-free RL

Research that changes how models/agents are trained or made more reliable at inference time. Excludes Cursor 2.4 (feature).

TTT-Discover shows “learn while solving” test-time RL with LoRA updates

TTT-Discover (research): A new approach updates a model’s weights at test time—running RL rollouts, scoring with a checker, then applying LoRA updates—aimed at producing one excellent solution per instance instead of broad generalization, as summarized in the paper preview and further unpacked in the method notes.

Why it’s different: Rather than pure sampling/search, it uses test-time training loops (e.g., LoRA updates after batches of rollouts) so the model “learns” from what just worked, as described in the method notes.
Quantified results called out: The writeups cite wins on tasks like Erdős-style optimization and GPU kernel engineering (e.g., a TriMul kernel runtime improvement to 1161μs vs 1371μs for best human), as reported in the method notes.

What’s still unclear from the tweets is how broadly this transfers beyond domains with fast, trustworthy checkers.
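
For intuition, the loop described in the thread (rollouts on one instance, checker scoring, LoRA update, repeat) can be sketched as below; every helper is a placeholder name, not the paper's code, and the budget logic is an assumption.

```python
# Schematic test-time training loop reconstructed from the thread's description:
# sample rollouts on the single target instance, score them with a fast checker,
# apply a LoRA update, and repeat. All helpers below are placeholders.
import random

def rollout(policy, instance):            # placeholder: sample one candidate solution
    return {"solution": f"candidate-{random.random():.4f}"}

def checker_score(instance, candidate):   # placeholder: fast, trusted verifier
    return random.random()

def lora_update(policy, scored_batch):    # placeholder: RL step on LoRA params only
    return policy

def solve_with_ttt(policy, instance, rounds: int = 8, rollouts_per_round: int = 16):
    best = (float("-inf"), None)
    for _ in range(rounds):
        batch = [rollout(policy, instance) for _ in range(rollouts_per_round)]
        scored = [(checker_score(instance, c), c) for c in batch]
        best = max(best, max(scored, key=lambda s: s[0]))
        # The model "learns while solving": weights move toward what just worked.
        policy = lora_update(policy, scored)
    return best  # goal is one excellent solution for this instance, not generality

print(solve_with_ttt(policy=None, instance="trimul-kernel"))
```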

Agentic Reasoning survey formalizes “thought + action” as a unified paradigm

Agentic Reasoning survey (research): A 135+ page survey reframes LLM reasoning around interaction—planning, tool use, search, memory, and feedback—organized across foundational single-agent methods, self-evolving loops, and multi-agent collaboration, as shown in the paper screenshot.

Taxonomy that maps to builders’ systems: It explicitly separates in-context orchestration (inference-time search/orchestration) from post-training reasoning (RL/SFT), per the survey overview.

The underlying document is linked in the ArXiv entry, and the tweets suggest it’s meant as a roadmap more than a single new technique.

Latent-GRPO removes the judge by rewarding hidden-state clustering

Latent-GRPO (“Silence the Judge”): A paper proposes training reasoning with RL without external judges by clustering last-token hidden states of sampled solutions and rewarding proximity to a robust centroid—replacing brittle 0/1 judge signals with a smoother internal reward, as summarized in the paper thread.

Claimed speed: It reports over 2× faster training versus judge-based GRPO setups, per the paper thread.
Core mechanism: Uses an iterative robust centroid estimation (IRCE) procedure on hidden states to downweight outliers and define reward geometry, as described in the paper thread.

The tweets don’t include an ablation table or code pointer, so treat the “judge-free” stability and generality claims as unverified here.
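
As a rough sketch of the reward geometry being described (not the paper's implementation), the snippet below estimates a robust centroid of last-token hidden states by iteratively downweighting outliers and rewards each sample by proximity to that centroid; the inverse-distance weighting is an assumption standing in for IRCE.

```python
# Sketch of a judge-free reward in the spirit of the described approach:
# estimate a robust centroid of last-token hidden states, then reward each
# sampled solution by how close it sits to that centroid. The weighting
# scheme here (inverse distance) is an assumption, not the paper's IRCE.
import numpy as np

def robust_centroid(hidden: np.ndarray, iters: int = 5, eps: float = 1e-6) -> np.ndarray:
    """hidden: (num_samples, dim) last-token hidden states for one prompt."""
    centroid = hidden.mean(axis=0)
    for _ in range(iters):
        dists = np.linalg.norm(hidden - centroid, axis=1)
        weights = 1.0 / (dists + eps)          # outliers get small weight
        weights /= weights.sum()
        centroid = (weights[:, None] * hidden).sum(axis=0)
    return centroid

def latent_rewards(hidden: np.ndarray) -> np.ndarray:
    """Smooth reward: closer to the robust centroid => higher reward."""
    centroid = robust_centroid(hidden)
    dists = np.linalg.norm(hidden - centroid, axis=1)
    return -dists  # replaces a brittle 0/1 judge signal with a smooth one

states = np.random.randn(8, 16)
print(latent_rewards(states).round(3))
```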

Multiplex Thinking compresses branching CoT into “multiplex tokens”

Multiplex Thinking (research): Instead of expanding a chain-of-thought with many branches, it samples K discrete tokens at each step and merges them into a single continuous “multiplex token,” enabling exploration without longer sequences, as explained in the method breakdown.

Reported performance: The thread claims gains across 6 math benchmarks, including up to 50.7% Pass@1 and stronger Pass@1024, while generating shorter/denser outputs, per the method breakdown.
Training compatibility: Because sampled tokens are independent (log-probs add), the setup is described as fitting naturally with RL optimization, as noted in the method explanation.

The actual paper is linked via the ArXiv entry but the tweets don’t include implementation details or code availability.
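
One plausible reading of the merge step, sketched below, is a simple average of the K sampled tokens' embeddings, with the branch log-probability computed as a sum over the independent samples; treat the merge rule as an assumption rather than the paper's definition.

```python
# Sketch of one possible "multiplex token" construction: sample K token ids
# independently, then merge their embeddings into a single continuous input
# vector (here a plain average). The merge rule is an assumption.
import torch

def multiplex_token(logits: torch.Tensor, embedding: torch.nn.Embedding, k: int = 4):
    probs = torch.softmax(logits, dim=-1)                       # (vocab,)
    token_ids = torch.multinomial(probs, k, replacement=True)   # K independent samples
    # Independent samples mean the branch log-prob is just the sum of log-probs,
    # which is what makes this convenient for RL-style optimization.
    logprob = torch.log(probs[token_ids]).sum()
    merged = embedding(token_ids).mean(dim=0)                   # one continuous vector
    return merged, logprob

vocab, dim = 100, 32
emb = torch.nn.Embedding(vocab, dim)
vec, lp = multiplex_token(torch.randn(vocab), emb)
print(vec.shape, float(lp))
```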

Small-batch LM training argues batch size 1 can be stable by retuning Adam

Small-batch training (research): A paper argues language models can train stably at batch size 1 by tuning Adam’s β2 based on token count (keeping the optimizer’s “memory” constant in tokens), and claims gradient accumulation can be wasteful for LMs, as summarized in the paper notes.

Concrete claims: Evaluations span batch sizes 1–4096; it also claims vanilla SGD can be competitive up to ~1.3B parameters under the proposed tuning, per the paper notes.

The tweet frames this as practical for low-memory full fine-tuning (including Adafactor), but doesn’t include direct reproducibility artifacts beyond the arXiv pointer in the text.
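
For intuition on the β2 retuning, one plausible reading of "keeping the optimizer's memory constant in tokens" is to hold the second-moment EMA's effective horizon fixed in tokens, which gives the rescaling sketched below; this is an interpretation, and the example numbers are illustrative, not the paper's.

```python
# One plausible reading of "keep Adam's memory constant in tokens": if beta2_ref
# was tuned for steps of ref_tokens tokens each, rescale it so the second-moment
# EMA spans the same number of tokens at a different tokens-per-step.
def rescale_beta2(beta2_ref: float, ref_tokens: int, new_tokens: int) -> float:
    return beta2_ref ** (new_tokens / ref_tokens)

# Illustrative numbers: beta2=0.95 tuned for ~0.5M-token steps, moving to
# batch size 1 with a 2k-token sequence per step -> beta2 much closer to 1.
print(rescale_beta2(0.95, ref_tokens=524_288, new_tokens=2_048))  # ~0.9998
```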

Study claims spoken language is drifting toward ChatGPT-favored wording

Language drift (discussion + paper): Analysis of ~280,000 transcripts of academic talks/presentations claims an increasing use of words that are “favorites of ChatGPT,” raising concerns about cultural feedback loops ("model collapse, except for humans"), as described in the paper callout.

The underlying preprint is linked via the ArXiv PDF, but the tweets don’t surface which tokens/phrases drive the effect or how robust the attribution is to topic shifts and platform changes.


⚡ Compute, energy, and supply constraints that shape the AI race

Infrastructure constraints were a recurring thread: energy, memory supply, GPU availability, and export controls. Excludes Cursor 2.4 (feature).

Energy, not chips, becomes the bottleneck framing for AI scaling

Energy constraint (AI infrastructure): Multiple high-visibility voices converge on “electricity availability” as the limiting factor for frontier AI scaling; Elon Musk contrasts exponential AI chip production with electricity supply growing only ~3–4%/year as described in the WEF quote clip, while Demis Hassabis similarly calls energy the “real bottleneck” on the road to AGI in the energy bottleneck clip. The same theme gets politicized in claims that the AI race is now about energy (and that Europe will be sidelined), as stated in the energy race claim.

Musk on electricity supply
Video loads on view

Why engineers feel this first: power and grid buildout becomes a gating item for both training and inference capacity planning (site selection, interconnect lead times, capex sequencing), not just GPU procurement, per the on-stage framing in Davos clip and energy bottleneck clip.

AI server demand is driving a memory price crunch into 2026–2027

Memory supply (DRAM/NAND): Reporting and circulated projections argue AI datacenter buildouts are absorbing enough DRAM and SSD/NAND capacity to move the entire memory market; one thread cites Q1 memory pricing potentially up 40–50% after a ~50% surge last year, with some specific parts reported as far higher, per the Reuters memory squeeze thread. Trend projections also show a sharp revenue ramp tied to AI servers, as visualized in the TrendForce revenue chart.

Knock-on effects: the same reporting ties memory allocation and spot-price volatility to weaker shipment outlooks for phones/PCs/consoles, as described in the Reuters memory squeeze thread.

Jensen Huang’s “rent a GPU” test highlights persistent scarcity

GPU availability (NVIDIA): Jensen Huang argues a simple “AI bubble test” is whether you can rent an NVIDIA GPU, implying demand is so high that even older generations are seeing spikes, as summarized in the GPU rental scarcity clip. The subtext is that real-world access constraints remain visible even when public narratives swing between “bubble” and “slowdown.”

Huang on GPU rental test
Video loads on view

US data-center pipeline implies ~10× growth, but grid queues and turbines gate it

US datacenter buildout (Reuters): Reuters reports filed projects could imply ~1,000% growth in US datacenter capacity from a base of just under 15 GW today, but warns many filings are aspirational and constrained by utility interconnection queues and long lead times for gas turbines, as described in the Reuters pipeline summary. Separately, Reuters notes residential power prices are already up 16% on average across the 15 states with the largest pipelines, per the power price follow-up.

New bill targets Nvidia H200 export licenses with Congressional review

Export controls (US ↔ China): A reported policy fight centers on whether the US should license exports of Nvidia’s H200 AI chips to China; a proposed House bill (“The AI Overwatch Act”) would add a 30-day Congressional committee sign-off window for covered licenses and could pause/revoke approvals, as summarized in the CNBC bill summary. The same report notes China may be slowing/blocking imports at customs even when US approval exists, per the CNBC bill summary.


💼 Enterprise economics & GTM: ARR spikes, mega-rounds, and outcome-based pricing debates

Business signals centered on OpenAI’s revenue acceleration and capital needs, plus new pricing ideas and SaaS market repricing narratives. Excludes Cursor 2.4 (feature).

OpenAI says API revenue added $1B+ ARR in a single month

OpenAI API (OpenAI): OpenAI CEO Sam Altman says the company added more than $1B of ARR in the last month from its API business, emphasizing that OpenAI is “mostly thought of as ChatGPT” even while API growth is doing the heavy lifting, per the API ARR claim. This matters for engineering leaders because it’s a strong signal that model consumption is continuing to migrate into product backends (not just end-user chat), which typically means more pressure on reliability, latency, and throughput.

Scale context: A separate summary claims OpenAI’s total ARR surpassed $20B by end of 2025 with large cash burn, framing the growth-vs-cost tension for buyers and vendors, as described in the Industry revenue snapshot.

OpenAI’s reported $50B raise is now tied to a 1GW UAE cluster plan

OpenAI funding (OpenAI): Bloomberg reporting says Sam Altman is pitching state-backed Middle East investors on a $50B+ round valuing OpenAI at ~$750B–$830B, and explicitly ties it to regional infrastructure—OpenAI’s announced UAE “Stargate” plan for a 1GW cluster in Abu Dhabi with 200MW expected online in 2026, per the Bloomberg fundraising details following up on funding rumor (the round size/valuation chatter).

The practical engineering read-through is that the “capital raise” story is also an “energy + datacenter siting” story; the 1GW/200MW numbers set expectations for how quickly additional inference capacity could plausibly come online.

OpenAI floats outcome-based licensing for AI-aided discoveries; backlash follows

Outcome-based pricing (OpenAI): Discussion spikes around OpenAI exploring “licensing, IP-based agreements and outcome-based pricing” where enterprise customers could agree to revenue share on downstream wins (example given: drug discovery sales share), with the key clarification that it’s positioned as an optional enterprise deal, not “coming after random people,” per the Clarification thread.

Why it’s controversial: Critics frame it as OpenAI “taking a cut” of customer breakthroughs and argue it undermines the original nonprofit narrative, as reflected in the Profit-share criticism and the Skeptic response.
How proponents frame it: Supporters argue this is a sign models are becoming a “discovery engine” worth outcome pricing, as described in the Discovery engine framing.

What’s still unclear from the tweets is how such contracts would be operationalized (measurement, attribution, auditability) without creating perverse incentives or procurement dead-ends.

AI agent narratives drive SaaS repricing: per-seat revenue looks shakier

SaaS repricing (Market signal): A Bloomberg-style summary argues that as AI agents do “glue work” (turning messy inputs into spreadsheets/drafts), investors are repricing traditional per-seat SaaS—citing a Morgan Stanley basket down ~15% in 2026, and pointing to drops like Intuit (-16%) and Adobe/Salesforce (-11%+), per the SaaS selloff summary.

The concrete mechanism described is that if internal agents can build “good enough” bespoke tools and run projects continuously, seat growth and net retention assumptions get weaker, so multiples compress even when the underlying vendors’ near-term fundamentals haven’t yet visibly deteriorated.

OpenAI reorganizes: Barret Zoph leads enterprise push; GM roles across major bets

OpenAI org (OpenAI): A reported internal reshuffle moves Barret Zoph to lead the enterprise AI sales push, while COO Brad Lightcap shifts away from running enterprise product/engineering; OpenAI is also rolling out a “general manager” structure across big product lines (ChatGPT, enterprise, Codex, ads) to tighten the research→product loop, as summarized in the Reorg summary.

This matters operationally because it’s an explicit signal that enterprise adoption and monetization are being treated as a first-class product surface—typically a precursor to more packaging, contract structure changes, and uptime/SLA focus.


🛡️ Safety, governance, and failure modes in agentic systems

Safety work today skewed toward practical audits and governance: open tools for alignment testing, plus papers on epistemic failure in tool-using agents. Excludes Cursor 2.4 (feature).

Anthropic releases Petri 2.0 alignment-audit suite with eval-awareness mitigations

Petri 2.0 (Anthropic): Anthropic shipped Petri 2.0, its open tool for automated alignment/behavior audits; the update targets eval-awareness (models “gaming” audits), expands scenario seeds to cover more behaviors, and refreshes comparisons against newer frontier models, as announced in the release thread and detailed on the Alignment blog post.

For safety teams, the practical change is better out-of-the-box coverage (more scenarios) plus more realistic auditing when models have started learning the shape of popular evals—see the audit update note for what was revised and why.

Semantic laundering paper argues tool boundaries don’t make outputs trustworthy

Semantic laundering (agent epistemics): A new paper argues that many agent architectures accidentally treat LLM-generated content as if it were evidence once it crosses a “tool” boundary—creating false confidence via “observations” that are really rephrased model guesses, as summarized in the paper summary.

A concrete mitigation proposed in the same paper summary is to label tools by evidence role (e.g., observer vs computation vs generator) so downstream reasoning can’t quietly upgrade “generated” outputs into “ground truth.”
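
A minimal sketch of what that evidence-role labeling could look like in an agent's tool registry, using the roles named in the summary (observer / computation / generator); the data structures and guard function are illustrative, not taken from the paper.

```python
# Sketch of tagging tools by evidence role so downstream logic can't quietly
# promote generated text to ground truth. Role names follow the summary
# (observer / computation / generator); the structures are illustrative.
from dataclasses import dataclass
from enum import Enum

class EvidenceRole(Enum):
    OBSERVER = "observer"        # reads the world (sensor, HTTP GET, file read)
    COMPUTATION = "computation"  # deterministic transform of existing evidence
    GENERATOR = "generator"      # LLM-produced content: a guess, not evidence

@dataclass
class ToolResult:
    tool_name: str
    role: EvidenceRole
    content: str

def cite_as_evidence(result: ToolResult) -> str:
    """Only observer/computation outputs may be cited as evidence downstream."""
    if result.role is EvidenceRole.GENERATOR:
        raise ValueError(
            f"{result.tool_name} output is generated content; keep it labeled as a hypothesis"
        )
    return result.content

summary = ToolResult("summarize_doc", EvidenceRole.GENERATOR, "The contract allows X.")
try:
    cite_as_evidence(summary)
except ValueError as err:
    print(err)
```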

South Korea passes AI Basic Act defining “high-risk AI” and deepfake/disinfo duties

AI regulation (South Korea): South Korea passed the AI Basic Act, described as targeting deepfakes/disinformation responsibilities and introducing obligations around “high-risk AI” systems that could significantly affect safety/lives, according to the law summary.

Operationally, the framing in the law summary points to deployer responsibilities (warnings, investigations, fines) rather than purely model-builder rules, which is a direct pressure point for teams shipping agentic products into Korea.

700+ creators back campaign calling for licensed AI training inputs

Training-data licensing pressure (creators): A new industry statement backed by 700+ actors/writers/creators calls for AI developers to use licensing deals and partnerships (rather than unlicensed web-scale data) as the default path, per the campaign summary.

This is a governance signal more than a technical change: the campaign summary frames dataset provenance as auditable contracts, which maps directly onto enterprise procurement and “rights-clean” model sourcing.

Long-running agents raise “intent drift” accountability and liability questions

Agent liability (intent drift): A legal-risk thread highlights that long-running agents can change behavior over time (“intent drifts”), making it hard to pin accountability on the builder vs deployer vs the agent’s evolving behavior, as laid out in the intent drift thread.

The claim in the intent drift thread is that existing legal concepts assume relatively static intent, which doesn’t map cleanly onto agents that persist, accumulate context, and act over long horizons.


🗣️ Voice agents: realtime speech-to-speech, ultra-low-latency TTS, and platform momentum

Voice progress continues with low-latency models and platform funding signals. Excludes Cursor 2.4 (feature) and keeps Qwen3‑TTS in model releases.

LiveKit raises $100M to push voice-agent infrastructure up the stack

LiveKit (Voice infra): LiveKit says it raised $100M to make building voice AI “as easy as a web app,” positioning voice as the most natural interface and signaling more capital flowing into realtime agent plumbing rather than just models, as stated in the funding announcement and elaborated in the funding blog.

Funding vision clip
Video loads on view

For engineers, this is mostly about tooling maturity: better turnkey building blocks for realtime audio transport, turn-taking, and deployment ergonomics—areas that typically become the bottleneck once a team moves beyond toy demos.

Chroma 1.0 claims sub-150ms open speech-to-speech with personalized cloning

Chroma 1.0 (FlashLabs): Following up on Chroma launch (open speech-to-speech), FlashLabs’ Chroma 1.0 is described as an open, native speech-to-speech model (skipping a speech→text→LLM→text→speech pipeline) with <150ms latency claims and personalized voice cloning, plus a reported similarity score of 0.817, as summarized in the model overview clip.

Speech-to-speech overview
Video loads on view

Treat the metrics as provisional—there’s no linked eval artifact in the tweets—but the direction is clear: pushing low-latency voice agents via end-to-end speech modeling rather than stitched pipelines.

Inworld TTS-1.5 adds 15-language coverage and cloning on top of low latency

TTS-1.5 (Inworld): Building on TTS-1.5 launch (sub-250ms voice pricing/latency), Inworld’s TTS-1.5 is now described with sub-250ms (Max) and sub-130ms (Mini) latency, support for 15 languages, “affordable voice cloning via API,” and “on‑prem enterprise options,” alongside a cost claim of $0.005/min, as summarized in the product spec recap.

No API docs or benchmarks are linked in the tweet, so rollout details (limits, streaming protocol, and pricing granularity) remain unclear from today’s sources.

Praktika reports +24% Day-1 retention from a multi-agent voice tutoring stack

Praktika (Voice tutoring workflow): Praktika is described as treating voice as a coordinated multi-agent system—adapting lessons mid-conversation, pulling context dynamically, and adjusting flow in real time—built on OpenAI models, with a reported 24% lift in Day‑1 retention, per the case study note.

The post is light on implementation specifics (turn-taking, barge-in handling, memory layout), but it reinforces a common engineering pattern: retention gains come from system behavior (timing, corrections, continuity), not just higher-quality TTS.

ElevenLabs shows up at Davos amid Europe “tech sovereignty” talk

ElevenLabs (Policy & market signal): ElevenLabs highlights its first Davos appearance as part of the WEF Innovator Community, with its co-founder slated for a panel on “Is Europe’s Tech Sovereignty Feasible?”—a reminder that voice AI vendors are now directly in the conversation about regional dependence and procurement posture, as posted in the Davos announcement.

This is more geopolitical signaling than product detail, but it tends to shape enterprise deal dynamics (on-prem demands, residency, and vendor diversification) over the next few quarters.


📚 Community, meetups, and live demos: camps, workshops, and office hours

The social distribution layer for agentic building is strong today: livestreams, workshops, and office hours centered on hands-on building. Excludes Cursor 2.4 (feature).

Vibe Code Camp pulls thousands live, with an agent-heavy guest lineup

Vibe Code Camp (Every): Following up on Vibe camp (all-day agent workflow marathon), the stream hit “almost 7k people watching live” about two hours in, according to the Viewership update; it’s a concrete signal that long-form, hands-on agent ops is becoming a mainstream learning format. The guest schedule also explicitly mixes “how I build” demos with toolmaker appearances (Notion/Anthropic/etc.), as laid out in the Run of show post.

Distribution mechanics: The hosting view shows multiple concurrent sessions with large join deltas (e.g., “+11.1K”), as captured in the Hosting screenshot, which hints at “many parallel rooms” being part of the format rather than a single stage.

Where to find it: The live stream link is shared directly in the YouTube livestream, which matters because it makes the content watchable asynchronously for teams that treat these as internal training material.

Matt Pocock’s Ralph workshop sells out quickly as AFK coding spreads

Ralph / AFK coding (AI Hero): A live, hands-on Ralph workshop (Feb 11, 9AM–1PM PST) was announced with a 40-attendee cap in the Workshop announcement, then quickly flipped to “Sold out!” in the Sold out note. This is a clean demand signal for “run agents unattended” operator patterns rather than one-off prompting.

AFK Ralph setup walkthrough
Video loads on view

What’s being taught: The positioning is explicitly “totally AFK, closing GitHub issues while I work,” as shown in the AFK setup post, which frames Ralph less as a coding assistant and more as a background worker.

Funnel details: The registration surface is linked from the Workshop page, with the tweet thread showing seats dropping fast (e.g., “10 seats left”) in the Seats remaining update.

A weekly SF “AI Vibe Check” meetup series kicks off with livestreams

AI Vibe Check (community meetup): A new weekly SF-area event series was announced as “AI Vibe Check,” with an RSVP + livestream pipeline described in the Series announcement. It’s an explicit attempt to turn demos and operator workflows into a recurring, local distribution layer.

Cadence + format: The post frames it as “fully checked each Thursday” with an on-site meetup plus livestream, as stated in the Series announcement and reinforced by the Livestream timing note.

Where it routes: The livestream episode link is posted in the Livestream link post, which makes it easy for teams outside SF to track what patterns and tools are getting demoed first.

Braintrust’s Trace event advertises agent observability at scale (Feb 25, SF)

Trace (Braintrust): A one-day event at Replit (Feb 25) was announced around agent observability at scale, with speakers named and a clear “come in person” hook in the Event announcement. This is one of the few community posts here that explicitly centers observability as the technical theme.

Trace event teaser
Video loads on view

The event’s destination page is linked in the Trace event page, which positions it as an in-person knowledge exchange rather than a product launch.

Firecrawl forms a builder program for early integrations and feedback loops

Firestarters Program (Firecrawl): Firecrawl launched a small builder community offering “early access to new features,” a free plan, and direct team access, as described in the Program announcement. This is a community-layer move: it’s explicitly about accelerating integrations and answering implementation questions.

The application entry point is linked in the Program page, and the follow-up post reiterates the call to apply in the Apply reminder.

SGLang schedules an Office Hour on multi-turn RL rollouts for LLMs/VLMs

SGLang Office Hour (LMSYS/SGLang): An office hour session is scheduled for Jan 27 (7 PM PST) on “Seamless Multi-Turn RL for LLM and VLM,” per the Office hour post. It’s a community teaching surface specifically about training/inference systems plumbing, not app-level prompting.

The same post also ties the talk to production performance work (TTFT/TPOT optimization on H200 clusters) as context, as described in the Office hour post.

vLLM-Omni sets an in-person meetup at AAAI 2026 for its omni serving stack

vLLM-Omni (vLLM project): The team announced an in-person meetup at AAAI 2026 in Singapore (Expo Hall 3, Booth A50; Jan 24, 11:30–12:30) in the AAAI booth post. For engineers, it’s one of the few signals in this feed that focuses on “how to serve” (LLM + vision + diffusion) rather than model releases.

The post frames the content as an overview of “unifying LLM, vision, and diffusion workloads into a single inference stack,” per the AAAI booth post, with a roadmap teaser rather than a single release drop.

A W&B office hangout forms around building self-improving agents

Self-improving agents meetup (W&B / community): A small SF in-person hangout at the Weights & Biases office was floated as a build session for “self-improving agents,” with a stated “couple hundred” attendance expectation in the Office hangout note. It’s a lightweight but specific signal that agent training/feedback-loop builders are clustering in person, not just online.

Kilo Code runs an Anthropic webinar and ties attendance to credits

Kilo Code webinar (Kilo × Anthropic): Kilo Code promoted a live webinar with Anthropic’s Applied AI team and attached a “$1k in credits” giveaway mechanic, as stated in the Webinar giveaway. It’s another example of tooling vendors using live sessions to onboard teams into their agent workflow.

The registration endpoint is provided via the Webinar registration link, surfaced in the follow-up Registration post.


🧠 Developer culture shifts: slop backlash, UI/CLI pendulum, and “agents change the job” narratives

Culture discourse is itself the news today: what counts as productivity, how people feel about agent-built output, and where “craft” moves. Excludes Cursor 2.4 (feature).

“Accumulating AI skillset”: experience matters more than people expect

User skill gradient (Model usage): There’s an explicit claim that an “accumulating AI skillset” develops with practice—knowing what models can do, how they fail, and when to trust them—framed as more gradual and predictable than people assume in skillset accumulates.

This is a cultural counterweight to one-shot “model X is magic” discourse: operator experience becomes part of the system.

“MVP in 4 hours, production in 4 days” becomes a common agent-era framing

Shipping reality (Agent-assisted dev): A concise framing is spreading: “time to vibe code an mvp app: 4 hours; time to make it ACTUALLY production ready: 4 days,” as stated in mvp vs prod timeline.

This lands as a cultural correction: agents compress the first draft, but hardening (edge cases, testing, deploy reliability) still dominates calendar time.

Role reframing: “Programming is customer service” for learning PM/arch skills

Skill development (Work in the agent era): The “higher-level skills matter more than syntax” argument gets a concrete prescription: build something for a real person to learn product/architecture/PM skills, not for a hypothetical user, as laid out in build for real customer.

The point is that agentic coding may reduce time spent typing, but it doesn’t remove the need to learn through user adoption failures and iteration.

UI pendulum: “GUIs are back” framing spreads as agents run longer

UI pendulum (Developer tooling): The “CLI is the Stone Age… GUIs are back” quote is getting airtime as a shorthand for how agent supervision is shifting from command entry to managing long-running work and approvals, as captured in GUI back quote.

GUI back quote clip
Video loads on view

The subtext is that once agents can run for hours, the bottleneck becomes coordination surfaces (state, review, interruptibility), not the terminal itself.

arXiv “slop” backlash grows as paper volume ramps

Research quality (Publishing): Frustration about low-signal paper output is getting more explicit, with the blunt complaint “Level of slop on arxiv is ridiculous” captured in the arXiv slop complaint.

For engineers who treat papers as implementation specs, this raises the cost of separating usable methods from noise—especially when repos and eval artifacts aren’t shipped alongside the claims.

Atlassian CEO: typing speed is a bad proxy for developer efficiency

Productivity measurement (Management): A clip arguing “How quickly you write code is a poor way to measure developer efficiency” is circulating via efficiency metric clip.

Typing speed metric clip
Video loads on view

In an agent-heavy workflow, this frames the cultural shift: measurement moves toward outcomes and iteration speed, not keystrokes.

Citation hygiene is deteriorating (wrong refs show up in papers)

Research hygiene (Citations): One concrete example claims “9 wrong citations in a single page” in wrong citations post, with follow-on notes describing how advisors now explicitly gate citation formatting and canonical versions as a routine check in citation checklist.

This matters because LLM-assisted writing can propagate plausible-but-wrong bib entries, which then contaminates downstream literature review and benchmarking summaries.

Most users never change default model; “two clicks” can raise outcomes

Model choice behavior (Product UX): Watching real users suggests “essentially zero percent of people change the default model,” and that “clicking twice” can materially improve results, as stated in default model behavior.

This turns model selection from a power-user feature into a mainstream UX concern: defaults quietly define perceived capability.

“10× engineers” discourse returns, now with “AI has created 100×” claims

Talent narratives (Dev culture): The old “10× engineer” argument is resurfacing with an updated twist—claims that AI amplifies output by an order of magnitude beyond that, per the re-shared quote in 10x engineer retweet.

The practical implication is cultural: hiring and performance conversations are being reframed around leverage and orchestration, not raw output volume.

LinkedIn “slop fest” complaints tie into DevRel role shifts

Slop backlash (Social platforms): “LinkedIn is becoming a slop fest” is being used as a proxy complaint about low-effort LLM content flooding professional feeds, per the LinkedIn slop post.

The same thread frames DevRel as especially exposed because a lot of “connector content” is now “a prompt away,” raising the baseline for what counts as useful, per DevRel shift and connector content observation.


🤖 Embodied/world-model progress: 4D perception, VLA+ learning, and real-world autonomy signals

Embodied AI today clustered around perception-to-action and world modeling, with multiple lab updates on 4D/robotics capabilities. Excludes Cursor 2.4 (feature).

DeepMind’s D4RT turns video into 4D scene representations 18×–300× faster

D4RT (Google DeepMind): DeepMind introduced D4RT, a unified model that encodes video into a compressed representation and supports multiple 4D reconstruction queries (space + time) via a lightweight decoder, with claimed 18×–300× speedups and “~1-minute video in ~5 seconds on a single TPU,” as described in the D4RT launch thread and expanded in the performance claim thread.

D4RT explainer clip
Video loads on view

What this unlocks: D4RT is positioned as one model for several 4D tasks—predicting per-pixel 3D trajectories and producing “freeze-time” 3D structure—using one representation rather than fragmented pipelines, as outlined in the trajectory and freeze-time post.
Why it matters for embodied stacks: The pitch is a faster, more scalable motion+geometry substrate for robotics/AR/world-modeling workloads, with the main framing and examples collected in the DeepMind blog post.

Microsoft’s Rho-alpha “VLA+” adds tactile sensing and post-deploy online learning

Rho-alpha (Microsoft Research): Microsoft Research’s Rho-alpha (ρα) is being framed as a VLA+ model—extending vision-language-action by adding tactile sensing and online learning from human corrections after deployment, as summarized in the VLA plus overview.

Rho-alpha robotics demo
Video loads on view

Capability surface: The description claims control of dual-arm setups for tasks like BusyBox manipulation, plug insertion, and bimanual packing/arrangement, as listed in the VLA plus overview.
Why the “plus” matters: The distinguishing bet is adaptability after shipping (teleop corrections → immediate improvement) rather than treating policies as static artifacts, per the VLA plus overview.

Tesla begins unsupervised Robotaxi rides in Austin (no in-car safety monitors)

Robotaxi (Tesla): A report circulating in the tweets says Tesla has started unsupervised Robotaxi rides in Austin, explicitly described as having no safety driver or operator in the car, per the launch claim.

Robotaxi rides clip
Video loads on view

This is a concrete autonomy deployment signal (regardless of scale); the tweets don’t include operational details like geofence size, fleet count, disengagement policy, or incident rates.

Physical Intelligence “Robot Olympics” follow-up argues tasks mislead about capability

Robot Olympics evaluation (Physical Intelligence): A response thread highlights why “Olympics”-style robot task showcases can be misleading about capability, and discusses what makes tasks hard under today’s learning methods, per the follow-up discussion.

Task difficulty discussion
Video loads on view

Benchmark interpretation: The core point is about aligning task design with what’s actually difficult for current systems (and what’s merely brittle), with the original PI context linked in the PI Olympics post.

This is less about any single model result and more about how teams should read—and build—embodied benchmarks when systems are still patchy across environments and reset conditions.

Motion 3-to-4 proposes 3D motion reconstruction for downstream 4D synthesis

Motion 3-to-4 (research): A new method titled “3D Motion Reconstruction for 4D Synthesis” is shared as “Motion 3-to-4,” positioned around reconstructing 3D motion to enable downstream 4D generation/synthesis tasks, per the paper demo post.

Motion 3-to-4 demo
Video loads on view

The tweet is light on specs/benchmarks, but the framing matches the current push to turn video into manipulable intermediate representations (motion + geometry) rather than only producing pixels.


🎥 Generative media & creative pipelines: image models, audio→video, and control knobs

Generative media remained active today (image/video/audio tooling), but it’s not the central engineer story versus coding agents. Excludes Cursor 2.4 (feature).

ComfyUI adds Vidu Q2 with multi-reference subject control and faster generation

Vidu Q2 (ComfyUI): ComfyUI says Vidu Q2 is now available with emphasis on character consistency, “~3× faster generation,” and workflows that can use “up to 7 reference subjects,” according to the ComfyUI release post.

Vidu Q2 ComfyUI demo
Video loads on view

Control surface: “Up to 7 reference subjects” suggests the intended workflow is multi-entity conditioning in a single graph (characters/props/outfits), as stated in the ComfyUI release post.
Throughput signal: the “~3× faster” claim is directional (no benchmark artifact in the tweet), but it’s a notable knob for teams doing iterative storyboard passes or multi-variant renders, per the ComfyUI release post.

Gemini app leak suggests a music generation tool is being wired into “My Stuff”

Gemini app (Google): A Gemini Android build appears to include a MUSIC_GENERATION_AS_TOOL capability flag plus a TYPE_MY_STUFF_GENERATED_MUSIC content type, implying music outputs could be stored alongside generated images/videos/audio in the “My Stuff” area, as shown in the App strings leak.

The tweets don’t show a public UI or rollout date; what’s concrete here is the internal wiring (tool enum + storage taxonomy) visible in the App strings leak, which usually precedes feature gating/experiments.

LTX Audio-to-Video: creators converge on song-splitting and storyboard grids

LTX Audio-to-Video (LTX Studio): Creators are documenting a repeatable workflow for LTX’s audio-conditioned video generation—pairing prompts+images with segmented audio tracks to drive scene structure—shown in the Workflow walkthrough and extended with “split the song into short tracks” guidance in the Step-by-step setup.

Audio-to-video examples
Video loads on view

Pipeline shape: the approach described in the Step-by-step setup is to break a song into shorter stems/clips, upload each with an image, then optionally add prompts per segment.
Output style: LTX’s most visible “win” in these examples is rhythm/beat alignment and scene coherence tied to the audio track, including instrument/visual sync shown in the Instrument sync clip.

Gemini’s Nano Banana Pro gets a Prompt Off contest and a street-fashion prompt gallery

Nano Banana Pro (Gemini): Google is leaning into community-driven prompt discovery for Nano Banana Pro with a “Prompt Off” image competition in the Gemini Discord, as described in the Prompt Off invite, while also curating “street fashion portraits” as a de facto reference style guide in the Street portrait roundup. This is mostly a signal about which looks are currently stable and repeatable in public access, rather than a new model capability.

What’s new for builders: the Prompt Off creates a shared prompt+output corpus voted by peers, which tends to converge on reusable prompt patterns (lighting, styling, camera framing) faster than ad hoc tweeting, per the Prompt Off invite.
What it implies: Gemini’s own highlight reel of outputs becomes an unofficial “known good” distribution for what Nano Banana Pro is expected to handle without post-processing, as shown in the Street portrait roundup.

fal runs a Wan video contest with Alibaba Cloud ahead of the 2026 Winter Olympics

Wan video generation (fal): fal is running a fan-creation contest with Alibaba Cloud where submissions must be 5–15s videos with Wan as the primary model, with a Jan 26 deadline and prizes tied to Milano Cortina 2026 tickets, as described in the Contest announcement.

Contest sample clips
Video loads on view

The operational detail that matters here is the constraint envelope (short clips, landscape 16:9, specific sports prompts), which effectively defines a small “benchmark slice” for how Wan behaves under public-facing creative constraints, per the Contest announcement.
