Where people deep in AI come to stay current.
Sakana AI launched Fugu and Fugu Ultra as OpenAI-compatible orchestration models that route, verify, and synthesize across multiple models. The release matters because Sakana is selling multi-agent coordination as a single endpoint, but it has not fully disclosed model mix or pass-through costs.
A new Human-on-the-Bridge paper argued for front-loading expert judgment into reusable evaluation assets, while practitioners also shared double-run and multi-model review setups. The cluster matters because teams tuning agent harnesses need repeatable ways to measure behavior beyond one-off benchmark scores or subjective PR review.
Morph said its code-serving stack now exposes Qwen, GLM-5.2, MiniMax M3, and DeepSeek v4 with code-tuned speculative decoding. It claims 20-35% higher acceptance than Eagle 3.1 or DFlash, plus kernels for cheaper hardware.
Simon Willison released the first sqlite-utils 4.0 release candidate with a built-in migrations system and nested transactions. The RC introduces minor backward-incompatible changes while expanding SQLite workflow automation for scripts and apps.
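For readers unfamiliar with nested transactions in SQLite, the underlying mechanism is the SAVEPOINT statement. The sketch below uses only Python's standard sqlite3 module, not the sqlite-utils 4.0 API, whose exact interface is not described here. The table name, savepoint name, and function name are illustrative assumptions.

```python
import sqlite3


def run_nested_demo():
    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN / SAVEPOINT / COMMIT statements below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items (name TEXT)")

    conn.execute("BEGIN")  # outer transaction
    conn.execute("INSERT INTO items VALUES ('outer')")

    conn.execute("SAVEPOINT inner_step")  # nested scope
    conn.execute("INSERT INTO items VALUES ('inner')")
    # Undo only the work done since the savepoint; the outer insert survives.
    conn.execute("ROLLBACK TO SAVEPOINT inner_step")
    conn.execute("RELEASE SAVEPOINT inner_step")

    conn.execute("COMMIT")
    rows = [r[0] for r in conn.execute("SELECT name FROM items ORDER BY rowid")]
    conn.close()
    return rows
```

Calling run_nested_demo() leaves only the row written by the outer transaction, which is the behavior a nested-transaction helper would typically wrap.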
Hermes Agent can now self-host Mem0, and the desktop client can attach to headless Hermes instances or start one with the hermes desktop command. The change expands always-on memory and remote control setups outside a laptop session.
jakubkrehel's make-interfaces-feel-better skill passes 30k installs
The skill pack bundles UI, animation, and perf tips into a reusable prompt stack. At 30k installs, it looks like a real community playbook for front-end taste.
Pew says ChatGPT use hit 44% as chatbot adoption goes mainstream
Pew's survey shows chatbot use has gone mainstream, with ChatGPT at 44% and AI search summaries already shaping how people consume information.
dolphin-summarize starts extracting GLM-5.2 architecture from GGUF too
A model-introspection helper that started on safetensors is now guessing architecture details from GGUF too, which broadens its usefulness for model archaeology.
OpenAI appears to be preparing ChatGPT Library for Codex
If the feature lands, saved context and code work should be much easier to connect.
PAI turns personal health data and notes into a Claude Code life dashboard
PAI blends Oura, DNA, labs, and notes into a Claude Code life dashboard. The pitch is that AI can make personal data useful without manual assembly.
BrennerBot turns scientific method into a reusable agent workflow
The workflow relies on digital twins, operators, and traceable experiments instead of one-off chats.
Dax Raad says OpenCode won by optimizing terminal feel, not avoiding framework work
Dax Raad says OpenCode won by optimizing terminal feel, not by avoiding framework work. The lesson is that quality often comes from irrational choices.
Anthropic partner slug suggests Claude Sonnet 5 is close
A partner-side slug suggests Claude Sonnet 5 may be close. It's still rumor territory, but the signal is strong enough that people are watching next week closely.
Recovery Lab simulates disconnects, evictions, and mid-tool-call failures
Recovery Lab simulates disconnects, isolate eviction, mid-tool-call failure, and deploy rollovers so you can test persistence instead of hoping for it.
NickAdobos says let models ask a user question without blocking the loop
A small but useful loop trick: let the model ask the user a question while it keeps working on the parts that do not depend on the answer. That avoids idle time in agent flows.
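One way to picture the trick is with asyncio: start the question as a background task, keep running independent steps, and await the answer only when a step actually needs it. This is a minimal sketch under assumptions, not the setup described in the original post. The functions ask_user and do_step, the step names, and the simulated delays are all hypothetical.

```python
import asyncio


async def ask_user(question):
    # Hypothetical stand-in for a real UI prompt; simulates a slow human reply.
    await asyncio.sleep(0.05)
    return "yes"


async def do_step(name):
    # Hypothetical stand-in for a tool call or model step.
    await asyncio.sleep(0.01)
    return f"done:{name}"


async def agent_loop():
    # Start the question without awaiting it, so the loop is not blocked.
    pending_answer = asyncio.create_task(ask_user("Deploy to staging?"))

    results = []
    # These steps do not depend on the answer, so they run while we wait.
    for step in ["lint", "unit-tests"]:
        results.append(await do_step(step))

    # Block only at the point where the answer is actually required.
    answer = await pending_answer
    if answer == "yes":
        results.append(await do_step("deploy"))
    return results


def run():
    return asyncio.run(agent_loop())
```

The design choice is simply where the await sits: moving it from the point of asking to the point of use is what removes the idle time.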
ClickUp Brain can now propose dedicated agents for recurring bugs
The proposed agents come with triggers, rules, and scope baked in. That's a shift from helper to workflow owner.
Hugging Face gets LocalLaws, a 2.2M-law U.S. legal corpus
LocalLaws gives Hugging Face a 2.2M-law U.S. legal corpus built from OCR and automation. That's a big new substrate for legal RAG and analysis.
Cursor's /automate skill turns one task into a trigger-based workflow
Cursor's /automate skill turns a task into a trigger-based workflow, so repetitive code chores can start running without more hand-holding.
LiteParse parses a SpaceX PDF faster than a screen-record zoom
LiteParse can chew through a SpaceX PDF fast enough to beat a screen-record zoom. That makes it a strong first-pass parser before heavier VLMs.
Samsung says ChatGPT and Codex are rolling out to employees at scale
Samsung says it is rolling ChatGPT and Codex out to employees at scale. That is a strong signal that code assistants are becoming standard enterprise tooling.
Meta AI adds an Artifacts tab for chats, docs, web pages, and slides
Artifacts looks like Meta's attempt to store documents and creations in one workspace. That could make chat outputs easier to reuse instead of losing them in history.
Cursor says Composer got 10 to 20x more compute to train a GPT-sized model
Cursor says Composer got 10 to 20x more compute, enough to train a GPT-sized model from scratch. That's a serious bet on in-house model capability.
Techhalla shares a Grok Imagine prompt recipe for consistent music-video cuts
The prompt uses multiple reference images and shot-by-shot direction to keep Grok Imagine outputs coherent. It's a practical control pattern for video generation.
Aaron Levie says headless agent traffic needs guardrails, logs, and source truth
Levie's point is that headless agent traffic will dwarf human traffic, so logging, source-of-truth data, and access controls become the platform layer.
LangChain shows how to build a Claude Code-style agent with Deep Agents
The article is a concrete recipe for building a Claude Code-like agent with Deep Agents. Useful if you want the harness pattern without copying a whole product.
Yacine MTB says reverse engineering is the most verifiable task for models
The argument is that reverse engineering is unusually verifiable, because the binary and the exploit path either hold up or they don't. Good model fit for code archaeology.
Avichawla's BM25 explainer shows why lexical search still matters in RAG
The guide is a good reminder that embeddings miss exact terms, so BM25 still anchors RAG systems when precision matters.
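To make the exact-term point concrete, here is a small sketch of the standard Okapi BM25 scoring formula in plain Python. It is not taken from the explainer itself. The whitespace tokenizer, the default values k1=1.5 and b=0.75, and the function name are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Naive lowercase whitespace tokenization; real systems use better analyzers.
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

With a query such as an exact error code, only documents containing that literal token receive a nonzero score, which is the precision property that embedding-only retrieval can miss.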
Aravind Srinivas says the model is no longer the product, the harness is
The quote frames models as a substrate and the harness as the actual product, which matches how coding-agent differentiation is happening in practice.
rauchg's team tunes a page by optimizing layout, WebGPU shaders, and scripts
The team says it optimized layout, shaders, and blocking scripts together, then plans to publish the lessons. That's a useful reminder that perf wins are often cross-stack.
Codex loops keep a JetBlue refund chase running until the case closes
Codex can now keep a refund chase alive as a long-running goal, which is useful for any slow external process that needs repeated follow-up.
ctatedev's agent loop toolkit bundles browser checks, worktrees, and API emulators
The thread is a practical shopping list for agent loops: browser verification, worktree isolation, API mocking, and CLI media generation.
Steipete maps the iPhone side button to Ask ChatGPT About My Screen
The iPhone shortcut turns screen-aware help into a one-tap action, which is exactly the kind of friction cut multimodal assistants need.
Emollick says coding agents are too software-brained for knowledge work
He argues coding agents work because code is the source of truth, while research and strategy need process artifacts the current harnesses lose after compaction.
Theo's Codex loop uses Opus as a second opinion after API design
Using Opus as a second opinion after API design improves the first Codex pass. It's a simple loop that catches design mistakes before they ossify into code.
ClawRouter adds a Cloudflare Rust WASM gateway for multi-provider model access
DAIR AI rounds up the open-source tools in the agent RL stack
A repeating loss spike every 500 steps points to a training pathology
Ukraine drone-footage corpus opens 500K hours for multimodal training
Epoch's chart says open-weight models trail the frontier by about four months
Together AI and 5C deploy GB300 NVL72 racks for next-gen inference
pratyushmaini says a 30B MoE can train on 10T tokens for about $500K
Report says Anthropic embedded engineers inside NSA for Mythos cyber ops
Economist excerpt says Mythos breached classified systems in hours
Scott Stevenson says every company needs offensive agents running nonstop
Knowledge comics (知识漫画): educational, biography, tutorial.
Generate images, video, and audio with ComfyUI — install, launch, manage nodes/models, run workflows with parameter injection. Uses the official comfy-cli for lifecycle and direct REST/WebSocket API for execution.
Create HTML-based video compositions, animated title cards, social overlays, captioned talking-head videos, audio-reactive visuals, and shader transitions using HyperFrames. HTML is the source of truth for video. Use when the user wants a rendered MP4/WebM from an HTML composition, wants to animate text/logos/charts over media, needs captions synced to audio, wants TTS narration, or wants to convert a website into a video.
Independent results put GLM-5.2 at the top of the open-model DeepSWE board and near the top on debate and post-train evals. Watch token use and long reasoning traces, which can offset its headline price advantage.
BrowserCode, Hyper, OpenCode, Together, and other vendors added GLM-5.2 soon after release. That turns the open model into a deployable option across coding, browser automation, and hosted chat.
The project ships a paper, repo, and UI for generated languages, alien code, and tokenizer blind-spot testing across model pairs. Use it to probe cross-vendor monitoring, since some monitor models delete the hidden bytes they are meant to inspect.
Hermes now offers a setup path that starts with only a provider, model, file operations, and terminal access. The smaller base gives users a minimal install they can extend manually.
Ollama said it doubled GPU capacity for GLM-5.2 cloud usage and that the model is currently hosted only in the US. The rollout adds capacity as open-model demand climbs, so users should check hosting and privacy details before deploying.
Wafer said its GLM-5.2 deployment leads Artificial Analysis on throughput and latency, and priced usage at $1.20 per million input tokens and $4.10 per million output tokens. Compare serverless and dedicated endpoints if you need speed at scale.
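A quick way to sanity-check per-request cost at the quoted rates is shown below. The two prices come from the item above. The function name and the example token counts are illustrative assumptions, and the sketch ignores any caching discounts or minimums a provider might apply.

```python
INPUT_PRICE_PER_MTOK = 1.20   # USD per million input tokens (quoted rate)
OUTPUT_PRICE_PER_MTOK = 4.10  # USD per million output tokens (quoted rate)


def request_cost(input_tokens, output_tokens):
    # Linear pricing: tokens are billed as a fraction of one million.
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Hypothetical agent run: 200k tokens of context in, 50k tokens out.
example = request_cost(200_000, 50_000)  # about $0.445
```

Because output tokens cost roughly 3.4 times as much as input tokens at these rates, long reasoning traces can dominate the bill even when prompts are large.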
Independent tests put GLM-5.2 near Opus 4.8 and GPT-5.5 on planning and coding, and users shared Claude Code, BrowserCode, dcode, and local-serving recipes. It matters because many engineers are treating it as a daily-driver option for text-heavy coding, though teams still report weaker vision and provider limits.
ComputeSDK published results from its 2026 100k Scale Invitational after weeks of reruns and infra tuning across Modal, Tensorlake, Northflank, Declaw AI, E2B, and Isorun. It matters because sandbox and agent infra claims now have a shared public concurrency target instead of vendor-specific load demos.
lift-pdf released an open-source 9B model for schema-constrained document extraction, with code, pip install, playground access, and a 90.2% score on the team's 225-document bench. It matters because the model claims near-Gemini 3.5 Flash accuracy at 9.5s p50, though coverage is still skewed toward Latin-language docs and commercial-use limits remain.
Builders released reusable loop artifacts this week, including a Loop Library Skill, repo templates, and published control-loop definitions for docs sweeps, onboarding checks, and error triage. It matters because teams are turning one-shot prompting into persistent agent runs with explicit stop conditions and shared repo state.