Qwen3-VL 4B/8B launch FP8 builds – approach 72B performance


Executive Summary

Alibaba just shrank its flagship-style VLM into laptop-friendly sizes: Qwen3-VL arrives in 4B and 8B "Instruct" and "Thinking" variants with FP8 builds and low-VRAM footprints. Why this matters: the team claims these compact models beat Gemini 2.5 Flash Lite and GPT-5 Nano across STEM, VQA, OCR, video, and agent tasks, and often approach Qwen2.5-VL-72B quality – bringing near-flagship vision to everyday hardware without a data-center bill.

The rollout nails developer ergonomics. Day-0 support in MLX-VLM and LM Studio means Apple silicon users can run dense and MoE variants locally on NPU/GPU/CPU, while vLLM adds a production-grade serving path with strong JSON-structured outputs and high throughput, according to early adopters. The 8B variants are already battling on LMArena in both Text and Vision modes, a timely public read after Qwen's strong visual standings last week. Cookbooks ship with task-level recipes for OCR, grounding, multi-image/video, and agent loops, and TRL notebooks show SFT/GRPO fine-tuning on the 4B model in a free Colab – plus Hugging Face Spaces and a lightweight in-browser app to poke the models fast. Kaggle entries round out quick benchmarking.

If you need a bigger yardstick, Ollama Cloud now offers Qwen3-VL-235B for free, making it easy to compare footprint vs capability before standardizing on a tier.

Feature Spotlight

Feature: Qwen3-VL goes compact (4B/8B) with near-flagship VLM

Qwen3-VL 4B/8B deliver near-flagship multimodal ability at a fraction of the VRAM, with FP8, cookbooks, and day-0 ecosystem support – pushing serious VLMs onto laptops/phones and enabling agentic vision at edge costs.


Table of Contents

🧩 Feature: Qwen3-VL goes compact (4B/8B) with near-flagship VLM

Cross-account launch dominates today: Alibaba's Qwen3-VL 4B/8B (Instruct/Thinking) lands with low VRAM, FP8, cookbooks, MLX/LM Studio support, and leaderboard exposure – mostly model release + developer enablement coverage.

Qwen3-VL 4B/8B launch: compact, FP8-ready models rival larger VLMs

Alibaba released Qwen3-VL in 4B and 8B "Instruct" and "Thinking" variants that run in lower VRAM while retaining full capabilities, with FP8 builds for efficient deployment. The team claims the compact models beat Gemini 2.5 Flash Lite and GPT-5 Nano across STEM, VQA, OCR, video and agent tasks, and often approach Qwen2.5-VL-72B performance release thread.

Hugging Face and ModelScope collections, API docs, and cookbooks are live for immediate use Hugging Face collection, ModelScope collection, Thinking API docs, Instruct API docs, Cookbooks. A Kaggle entry rounds out access for benchmarking and demos Kaggle models.

Cookbooks, TRL notebooks and Spaces shorten time-to-value for Qwen3-VL

Developer cookbooks cover OCR, object grounding, multi-image/video, and agent workflows, with the GitHub README pointing straight to task templates cookbook note, GitHub readme. Community notebooks show day-0 fine-tuning (SFT/GRPO) on the 4B model in a free Colab, plus a Hugging Face Space to compare against moondream3 on object detection hf fine-tune, Colab notebook, HF Space demo. A lightweight Hugging Face app for the 4B Instruct model offers a quick way to poke the compact VLM in-browser hf app.

Day-0 Apple silicon path: MLX-VLM and LM Studio run Qwen3-VL locally

Qwen3-VL (dense and MoE) is now supported in MLX-VLM, with maintainers noting optional devices (MLX-CUDA, CPU) and on-device readiness; the community reports LM Studio + MLX runs on Apple silicon as well mlx release, mac usage. Alibaba amplified the "day-0 on NPU, GPU, CPU" positioning, signaling a practical local dev loop for Mac users npu support.

vLLM adds Qwen3-VL, signaling a production-grade serving path

vLLM highlighted Qwen3-VL as one of its most popular multimodal models, making the compact VLMs easy to serve at scale in existing Python inference stacks vLLM note. Early adopters also point to strong JSON-structured outputs and high throughput, useful for API-style multimodal pipelines json claim.
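The structured-output claim is straightforward to exercise. A minimal sketch, assuming a vLLM OpenAI-compatible server hosting the 8B Instruct model: the model name, the receipt schema, and the `guided_json` extra (vLLM's guided-decoding option, passed via `extra_body` in the OpenAI Python client) are illustrative – check your vLLM version's structured-output docs before relying on the exact field.

```python
import json

# Hypothetical schema for an OCR-style extraction task.
RECEIPT_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["vendor", "total"],
}

def build_ocr_request(image_url: str,
                      model: str = "Qwen/Qwen3-VL-8B-Instruct") -> dict:
    """Assemble chat.completions kwargs asking the VLM for JSON that
    matches RECEIPT_SCHEMA; vLLM reads `guided_json` from extra_body."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": "Extract the vendor and total as JSON."},
            ],
        }],
        "extra_body": {"guided_json": RECEIPT_SCHEMA},
    }

payload = build_ocr_request("https://example.com/receipt.png")
print(json.dumps(payload)[:80])
```

With guided decoding the server constrains sampling to schema-valid tokens, which is what makes the "strong JSON outputs" behavior dependable rather than prompt-luck.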

Qwen3-VL-235B free on Ollama Cloud complements the compact family

Ollama made the 235B Qwen3-VL available free on its cloud with ready-to-copy examples (OCR on menus, math with image context), and says smaller models and full on-device runs are coming soon cloud launch, Ollama blog. This provides a large-model counterpart to Alibaba's new compact 4B/8B releases for teams testing capability vs. footprint trade-offs.

Qwen3-VL-8B joins LMArena's Text & Vision battles

The 8B Thinking and Instruct variants are now live in LMArena's Text and Vision modes for head-to-head prompting and community voting arena entry, arena listing. This comes after Qwen3-VL's strong standing on visual tracks, providing a lighter option for side-by-side comparisons Arena standing.


🆕 Search models: GPT-5 web stack lands in the API

Fresh model release focused on web search in dev workflows; excludes Qwen3โ€‘VL compact launch which is covered as the feature.

OpenAI ships gpt-5-search-api: 60% cheaper web search in Chat Completions with domain filtering

OpenAI added a dedicated web search model (gpt-5-search-api) to Chat Completions, priced at $10 per 1,000 calls – about 60% lower than before – and it supports domain filtering like the Responses tool pricing update. The updated docs outline three modes – non-reasoning lookups, agentic search with reasoning, and deep research – with copy-paste snippets for JS, Python, Bash and C# OpenAI docs.

Engineers are already seeing both the undated and dated variants ("gpt-5-search-api-2025-10-14") in the platform model selector, confirming availability beyond the docs model list screenshot. Third-party trackers also flagged the new IDs surfacing across the ecosystem, reinforcing the rollout signal model finder alert.

  • Endpoint parity means domain allowlists/blocks now work the same in Chat Completions as in Responses, simplifying agent plumbing for grounded search.
  • The 60% price drop encourages broader use of agentic and deep-research traces; teams should still watch token budgets and throttle strategies in long sessions.
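A domain-filtered call might be shaped like the sketch below. This is a hedged illustration only: the `web_search_options` / `filters.allowed_domains` field names are assumptions modeled on the Responses web-search tool, so confirm the exact Chat Completions shape against the current OpenAI docs.

```python
# Builds the request body for a grounded-search call; sending it (e.g. via
# the OpenAI client) is omitted so the sketch stays self-contained.

def build_search_request(question: str, allowed_domains: list[str]) -> dict:
    """Chat Completions payload for the dedicated search model, with an
    allowlist restricting which sites results may be drawn from
    (field names assumed, per the lead-in)."""
    return {
        "model": "gpt-5-search-api",
        "messages": [{"role": "user", "content": question}],
        "web_search_options": {
            "filters": {"allowed_domains": allowed_domains},
        },
    }

req = build_search_request(
    "What changed in the latest CPython release?",
    allowed_domains=["python.org", "docs.python.org"],
)
print(req["model"])
```

Pinning the dated variant ("gpt-5-search-api-2025-10-14") in place of the alias is the usual way to freeze behavior for regression-tested agents.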

๐Ÿ› ๏ธ Agentic coding: Codex, subโ€‘agents, and endโ€‘toโ€‘end delegation

Heavy practitioner activity today: Codex CLI education, planโ€‘mode previews, Claude Code subโ€‘agents, Cursor longโ€‘run build, and Factory 1.8 integrations.

Claude Code sub-agents emerge as a best practice for deep repository work

Practitioners report big quality and speed wins by orchestrating Claude Code sub-agents – repo mappers, analyzers, API readers, writers – in parallel, then merging their outputs, outperforming single-loop agents on codebase mapping and authoring subagent demo. The broader "Deep Agents" pattern emphasizes structured planning, orchestrator + specialist sub-agents, agentic search, hybrid memory, and verification to tackle multi-hour tasks design blueprint.

Takeaway: split concerns, keep each agent's context clean, and compose results – especially for docs generation, migrations, and new feature scaffolding.

Factory 1.8: delegate Linear tickets, spin up headless "droid exec" runs, and close incidents via Sentry×Linear

Factory's 1.8 release plugs Droids into more of the dev stack: assign Linear issues for end-to-end completion release thread, mention @Factory in Slack threads to spin up a headless session and track progress Slack hand-off, run non-interactive batch jobs with droid exec for large refactors/tests headless mode, and loop incidents through Sentry with fixes handed back into Linear incident flow. Release notes outline autonomy controls and output formats for CI/CD use release notes. A just-in-time permissions prompt in the terminal tightens safety for file edits and reads permission UI.

Net: Factory is evolving from a chat agent into a delegatable dev assistant embedded across planning, code, and on-call.

OpenAI kicks off a Codex video series and CLI how-to for GPT-5-Codex

OpenAI Developers launched a multi-part video series to help teams get productive with the Codex CLI and GPT-5-Codex, including install and usage patterns (npm i -g @openai/codex) series intro, with a mirrored post reiterating the CLI workflow install recap. For broader context on how Codex is used across real products and events, see OpenAI's behind-the-scenes write-up on using Codex to run DevDay demos and tooling DevDay blog, and the central docs hub for deeper guides and examples Codex hub.

Anthropic's code_execution tool lands in the AI SDK with typed I/O for bash and text edits

A new Anthropic code_execution_20250825 tool with bash execution and text editing is being wired into the AI SDK, with fully typed inputs/outputs to simplify UI surface creation and safer agent loops feature brief. A companion example shows the typed schema intended to streamline frontends and logging typed I/O.

Implication: agent builders can give Claude more reliable, auditable action surfaces for repo ops, scripts, and patching – without bespoke adapters.

Codex "Plan Mode" preview surfaces with iterative planning and a planner model toggle

A Plan Mode PR shows read-only, iterative planning (Shift+Tab), a configurable planner model (/plan-model), and checklisted plan updates – hinting at a first-class planning pass before code edits PR details. Although that PR was later closed without merge, maintainers signal Plan Mode is still coming PR status note. This lands as teams double down on planning-first loops, following up on plan mode adoption in Cursor that cut back-and-forth and spawned background jobs.

Expectations: a dedicated planner improves long-horizon tasks (refactors, migrations) and lets orgs standardize plan quality independently of the coding model.

Codex CLI v0.45 trims tokens by ~10% at equal quality; new usage videos share workflows

OpenAI engineers said Codex CLI v0.45 made interactions ~10% more token-efficient at the same result quality efficiency note. A senior IC also shared a personal walkthrough of day-to-day usage workflow video, and the DevDay write-up details broader Codex applications across stage demos and app development DevDay blog. Teams shipping with Codex can expect lower cost per loop and clearer patterns for multi-tab, multi-task workflows.

Cursor agent runs 24 hours straight to ship a working project management app

A developer ran Cursor's agent in a continuous loop with Sonnet 4.5 (CLI), minimal seed context, and autonomous progress tracking until "completed," yielding a functional PM app and a surprisingly navigable code structure overnight run, with notes on the minimal upfront scope and what improved with a bit more initial context run notes. Screens show the resulting UI and repo layout app snapshot.

Why it matters: end-to-end feasibility for greenfield apps is crossing from demo to practice; the next frontier is hardening – tests, security, and production CI/CD.

Vercel ships a Slack Agent Template and Bolt library to build agents inside Slack

Vercel demonstrated v0 running directly in Slack at Dreamforce and released a Slack Bolt library plus a starter template, letting teams query data, build dashboards, and ship agent actions without leaving chat stage demo, with a template to scaffold Slack agents quickly agent starter.

This reduces the "glue code" needed to bring LLM agents where work already happens – channels, threads, and DMs – with a path to production deploys on Vercel.
  • Resources: the Slack Bolt library and the Agent template.

Braintrust adds remote evals so you can benchmark local agents without moving workloads

Braintrust showed how to hook locally running agents to remote evaluations, so teams can iterate on tasks, datasets, and scorers without redeploying infrastructure how-to guide. This is useful when you want reproducible evals (goldens, regressions) while keeping heavy data and tooling on your box or VPC.

Vercel AI SDK adds Anthropic memory tool integration for agent state management

Vercel noted it is shipping support for Anthropic's memory tool in the AI SDK, making it easier to persist, inspect, and version agent memories in product UIs memory support. For coding agents, this reduces prompt bloat and gives users recovery points when long sessions meander or break.


🚀 Serving speed: GB200 tokens/sec and adaptive speculation

Runtime perf stories concentrate on Blackwell throughput and spec-decoding; mostly systems knobs rather than silicon announcements.

SGLang hits ~26k input / 13k output tok/s per GPU on GB200 NVL72

LMSYS reports SGLang sustaining ~26K input and ~13K output tokens per GPU per second on NVIDIA GB200 NVL72, with up to 4× generation speedups over Hopper in InferenceMAX runs, and SGLang set as the default DeepSeek engine on both NVIDIA and AMD. Following up on the tokens-per-MW lead, GB200's efficiency story now comes with concrete per-GPU throughput measurements. benchmarks overview LMSYS blog post

These results reflect joint system-level optimizations (prefill/decode disaggregation, expert parallelism, FP8 attention, NVFP4 GEMMs) and Blackwell's higher-bandwidth interconnects. The adoption as the default engine in the SemiAnalysis InferenceMAX benchmark underlines runtime maturity beyond a single hardware stack. collaboration note
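To put the per-GPU numbers in rack terms, here is a back-of-envelope check, under the simplifying (and optimistic) assumption that all 72 GPUs of an NVL72 rack sustain the reported rates simultaneously; real serving interleaves prefill and decode, so treat these as ceilings.

```python
GPUS_PER_RACK = 72
INPUT_TPS_PER_GPU = 26_000   # prefill (token read) per GPU per second
OUTPUT_TPS_PER_GPU = 13_000  # decode (token generation) per GPU per second

rack_input_tps = GPUS_PER_RACK * INPUT_TPS_PER_GPU    # aggregate prefill rate
rack_output_tps = GPUS_PER_RACK * OUTPUT_TPS_PER_GPU  # aggregate decode rate
daily_output = rack_output_tps * 86_400               # tokens/day if sustained 24h

print(f"{rack_input_tps:,} in tok/s, {rack_output_tps:,} out tok/s, "
      f"{daily_output / 1e9:.1f}B out tokens/day")
```

Roughly 1.9M input and 0.94M output tokens per second per rack, or on the order of 80B generated tokens per rack-day at full utilization – useful for sizing against the demand curves discussed later in this issue.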

Together AI's ATLAS speculator delivers up to 4× faster inference and ~500 TPS on DeepSeek-V3.1

Together AI unveiled ATLAS, an adaptive speculative decoding system that learns from live traffic to accelerate inference; they show up to 4× faster throughput vs baseline and around 500 TPS on DeepSeek-V3.1, nearly 2× faster than their prior Turbo speculator. results overview ATLAS blog post

Because ATLAS adapts to workload acceptance rates and latency profiles at runtime, it avoids static speculator brittleness and continues improving under production mixes – relevant for teams pushing long-context and multi-turn agent workloads where token budgets dominate cost and latency.

DGX Spark runtime clarified: read vs generation speeds, and where it lands vs 5090/M4 Pro

Simon Willison updated his DGX Spark notes to separate token read (prefill) from token generation speeds, highlighting early-ecosystem wrinkles on ARM64 CUDA and containers for inference workflows. blog update hands-on notes

Independent charts circulating today place Spark's output tokens/sec close to Apple's Mac Mini M4 Pro and below RTX 5090/5080 desktop GPUs; at ~$4,000, it targets small-model local serving rather than top-end desktop cards. This framing helps teams choose hardware based on actual decode throughput rather than mixed prefill+decode numbers. benchmarks chart


🧱 Desktops and racks: DGX Spark and AMD Helios

Concrete hardware updates span desktop Blackwell boxes and rack-scale MI450 systems; excludes runtime perf, which sits under Serving speed.

DGX Spark lands on desks: 1-PFLOP Blackwell box at ~$4k; early buyers flag ARM64 gaps

NVIDIA's DGX Spark is now in customers' hands – Jensen Huang even hand-delivered a unit – with users calling out ~1 PFLOP in a tiny form factor and an expected ~$4,000 price bracket, following up on early runtime coverage that focused on capability demos. OpenAI's Greg Brockman highlighted the size-to-compute leap, while a community review details 128 GB RAM, ~3.7 TB NVMe, and an ARM64 stack that works best today via NVIDIA's official containers as the CUDA-on-ARM ecosystem matures hand delivery and blog post. A comparative chart situates Spark near Apple's Mac Mini M4 Pro on smaller models yet below RTX 5090-class boards for raw headroom, framing Spark as a developer desktop rather than a training rig comparison chart.

For engineers, the takeaway is clear: strong local prototyping hardware, but plan on containerized toolchains and watch for fast-moving ARM64 library support as the ecosystem catches up update note.

Oracle to deploy 50,000 AMD MI450 GPUs on OCI from Q3 2026

Oracle Cloud Infrastructure will roll out 50,000 AMD Instinct MI450 accelerators starting in Q3 2026, with further expansion in 2027 and beyond, giving enterprises a sizable non-NVIDIA public cloud option for training and inference. The deployment uses AMD's Helios rack design – MI450 GPUs, next-gen EPYC CPUs, and Pensando networking – offering pre-wired rack blocks that target scale and serviceability deployment plan.

For AI platform teams, this strengthens a multi-vendor strategy: OCI's MI450 footprint could improve supply diversification and pricing leverage while testing the maturity of ROCm-based toolchains at rack scale; evaluation should include memory-bound workloads that benefit from MI450's HBM4 capacity and bandwidth.

AMD Helios racks: 72 MI450s, ~1.4 exaFLOPS FP8 and 31 TB HBM4 per cabinet

AMD's new Helios rack-scale platform packages 72 Instinct MI450 GPUs into a serviceable cabinet with quick-disconnect liquid cooling, delivering roughly 1.4 exaFLOPS FP8 and ~31 TB of HBM4 per rack. Each MI450 pairs 432 GB HBM4 with ~19.6 TB/s bandwidth, with UALink for in-node scale-up and UEC Ethernet for cross-rack scale-out; the design follows Open Rack Wide conventions to simplify deployment and field service platform details.

For leaders scoping non-NVIDIA capacity, Helios' emphasis on memory capacity per GPU, open fabrics, and rack-level serviceability is the notable angle; it positions AMD's stack as a credible alternative for large training and high-context inference footprints.
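The rack-level and per-GPU figures quoted above are mutually consistent, which is a quick sanity check worth running when evaluating vendor claims. The aggregate-bandwidth and per-GPU FP8 numbers below are derived figures, not AMD-stated specs.

```python
GPUS = 72
HBM4_PER_GPU_GB = 432
BW_PER_GPU_TBPS = 19.6
RACK_FP8_EXAFLOPS = 1.4

rack_hbm_tb = GPUS * HBM4_PER_GPU_GB / 1000           # 31.104 TB -> the "~31 TB" claim
rack_bw_tbps = GPUS * BW_PER_GPU_TBPS                 # aggregate HBM bandwidth per rack
fp8_per_gpu_pflops = RACK_FP8_EXAFLOPS * 1000 / GPUS  # implied FP8 per GPU

print(round(rack_hbm_tb, 3), round(rack_bw_tbps, 1),
      round(fp8_per_gpu_pflops, 1))
```

72 × 432 GB lands at 31.1 TB of HBM4 and roughly 1,411 TB/s of aggregate memory bandwidth per cabinet, with about 19.4 PFLOPS FP8 implied per GPU.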


๐Ÿ—๏ธ AI factories and demand math

Macro infra signals today are investment and usage economics; few pure capex filings, but meaningful scale commitments and tokenโ€‘demand curves.

Hidden โ€œthinking tokensโ€ push usage to ~1.3 quadrillion per month even as unit prices fall

Usage is exploding while price per token drops: WSJ highlights that modelsโ€™ hidden reasoning traces (โ€œthinking tokensโ€) are swelling tokens per answer, driving spend despite cheaper rates WSJ analysis. The ecosystem view shows monthly usage jumping to roughly 1,300T tokens (~1.3 quadrillion) by Oct โ€™25, up from ~9.7T in May, as deep inference and selfโ€‘checking take hold ecosystem map.

For AI leaders, the economics bifurcate into quick vs deep inference. Capacity planning now hinges on acceptance rates for speculative decoding, toolโ€‘use loops, and explicit โ€œthought budgets,โ€ not just list prices.
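The quick-vs-deep split is easy to make concrete. The prices and token counts below are hypothetical, chosen only to illustrate the mechanism: reasoning traces are typically billed as output tokens, so a long hidden trace multiplies the bill even when the visible answer is identical.

```python
def cost_per_answer(in_tok: int, visible_out: int, reasoning_tok: int,
                    price_in: float = 1.25, price_out: float = 10.0) -> float:
    """Dollar cost of one answer at $/1M-token prices (all values
    hypothetical); hidden reasoning is billed at the output rate."""
    billed_out = visible_out + reasoning_tok
    return (in_tok * price_in + billed_out * price_out) / 1_000_000

quick = cost_per_answer(2_000, 300, 0)       # quick inference, no trace
deep = cost_per_answer(2_000, 300, 12_000)   # deep inference, long hidden trace
print(quick, deep, round(deep / quick, 1))   # same visible answer, ~23x the cost
```

This is why "thought budgets" and acceptance-rate tuning now matter as much as list prices: a modest shift in the quick/deep mix can move spend by an order of magnitude at constant per-token pricing.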

OpenAI's compute roadmap now totals ~26 GW across Nvidia, Broadcom and AMD

OpenAI's disclosed arrangements add up to roughly 26 GW of data-center capacity equivalents: a 10 GW Broadcom custom-accelerator program, a 10 GW Nvidia systems LOI (with an anticipated ~$100B Nvidia investment as rollouts happen), and 6 GW of AMD capacity paired with a warrant for up to 160M AMD shares at $0.01 tied to deployments compute summary, following up on the 10 GW term sheet.

The plan targets first racks shipping in 2H 2026 and multi-year build-outs through 2029, implying a full-stack bet (silicon, memory hierarchy, compiler/runtime) to raise perf/W and reduce cost per token at scale.

Google to build $15B AI hub in India with gigawatt-scale data center and subsea landing

Google will invest $15B from 2026–2030 to stand up its first AI hub in India (Visakhapatnam), including a gigawatt-scale AI data center built with AdaniConneX and Airtel and a new subsea cable landing that ties into Google's global network announcement recap.

This adds multi-GW-class capacity in South Asia and shortens latency paths for India's fast-growing AI workloads. The bundle (data center, subsea cable, and "full AI stack") signals a vertically integrated approach that can insulate critical AI services from regional power and network bottlenecks.

Oracle to deploy 50,000 AMD MI450s on OCI starting Q3 2026, expanding in 2027+

Oracle Cloud Infrastructure will add 50,000 AMD Instinct MI450 GPUs beginning in Q3 2026, with further expansion planned into 2027 and beyond deployment note. The move strengthens a non-Nvidia public-cloud lane for large-scale training and inference and offers enterprises vendor diversification during a supply-constrained cycle.

Expect customers to weigh ROCm maturity, cluster fabrics, and model portability alongside price/perf as they evaluate multiโ€‘vendor pipelines.

AMD unveils Helios rack: 72 MI450s, 31 TB HBM4 and ~1.4 exaFLOPS FP8 per cabinet

AMD's Helios rack aims to simplify serviceability at AI-factory scale: 72 MI450 GPUs per cabinet, ~31 TB of HBM4 and ~1.4 exaFLOPS FP8, with UALink for in-node GPU memory sharing, UEC Ethernet for scale-out, and quick-disconnect liquid cooling platform brief.

Helios' memory-heavy design targets throughput on memory-bound training and large-context inference while leaning on open fabrics, a useful counterweight to proprietary interconnect stacks.

Brookfield commits up to $5B to Bloom Energy to finance fuel-cell AI data centers

Infrastructure financier Brookfield Asset Management will provide up to $5B to Bloom Energy to fund fuel-cell-powered AI data centers, expanding on-site generation options beyond grid ties and traditional PPAs funding update.

Fuel cells can reduce interconnect delays, cut transmission risk, and improve siting flexibility for AI factories, though cost curves will hinge on fuel pricing and fleet-scale deployment economics.

AI is lifting GDP via investment before productivity, says WSJ

The WSJ reports that AI's contribution is showing up first in capex, not broad productivity gains: roughly two-thirds of U.S. GDP growth in early 2025 came from software and compute investment, while measured worker-productivity impacts remain mixed; about 10% of firms report AI use WSJ summary.

For AI planners, this implies near-term demand will keep tracking capex cycles (chips, racks, power, data) even as productivity dividends arrive more slowly and unevenly across sectors.


๐Ÿ›ก๏ธ Wellโ€‘being guardrails, ageโ€‘gating and privacy signals

Policy/safety moves dominate discourse: OpenAIโ€™s wellโ€‘being council and adultโ€‘content stance; Californiaโ€™s oneโ€‘click privacy control. Excludes model launches.

OpenAI will relax mental-health refusals and allow erotica for verified adults in December

Sam Altman said ChatGPT's conservative stance on mental-health topics will be eased as new safeguards mature, and that erotica will be permitted for verified adults starting in December as age-gating rolls out broadly Altman thread. The shift emphasizes user choice ("treat adult users like adults") and optional, more expressive personalities, while maintaining crisis protections for at-risk users.

OpenAI creates Expert Council on Well-Being and AI to inform youth safeguards and product design

OpenAI introduced an eight-member Expert Council spanning psychology, psychiatry, and HCI to guide ChatGPT and Sora on healthy interactions across age groups, building on prior work like parental controls and teen distress notifications OpenAI announcement, and outlining scope and members in detail OpenAI blog post. The council is intended to continuously advise on features that affect emotions, motivation, and mental health.

California AB566 mandates one-time browser "do not sell/share" signal and statewide deletion system

California approved AB566, requiring browsers to ship a built-in privacy control that broadcasts a one-click "do not sell/share" signal, plus a statewide data-broker deletion request flow law details.

  • Browser signal deadline: mandatory by Jan 2027; the deletion system begins Jan 2026 with 45-day broker checks thereafter law details.
  • Practical impact: cross-site ad targeting and broker feeds will face wider opt-outs by default, tightening the data available to train and target AI-powered ad systems.

California signs first U.S. chatbot law requiring safeguards; enables lawsuits for harms

Governor Newsom signed SB 243, billed as the first U.S. law regulating AI chatbots, mandating operator safeguards for vulnerable users and allowing lawsuits when failures cause harm news coverage, following up on SB243, which required bots to disclose non-human status and address self-harm protocols. The signature moves the mandate from passage to enforceable law, raising compliance stakes for AI assistants deployed at scale.


🎬 Video/3D tools: Sora 2 vs Veo 3.1, deflicker, and image→3D

Large creative-stack chatter today (video arena moves, deflicker tool, ComfyUI workflows, image→3D). Separate from the Qwen VLM feature.

Sora 2 Pro ties Veo 3 for #1 on Video Arena; Sora 2 (with audio) ranks #3

LMArena's organizers added OpenAI's Sora 2 and Sora 2 Pro to their Text-to-Video leaderboard, noting Sora 2 Pro is the first to tie Veo 3 variants for the top spot while Sora 2 lands at #3 and is praised for synchronized audio generation leaderboard update.

This is the first broad, head-to-head signal that Sora's audio+video pipeline is competitive with Veo 3 on overall quality, and it raises the bar for integrated sound in T2V workflows (see the invite for direct prompting and voting in Discord) Discord invite.

Gemini UI surfaces Veo 3.1 banners and a "fast" model, pointing to imminent release

Screenshots show "New! Video generation just got better with Veo 3.1" banners in Gemini with a Try Now entry point, and a Model Finder card listing veo-3.1 and veo-3.1-fast preview endpoints; one user notes availability appears limited to the U.S. for now Gemini banner, model finder card, UI screenshot. This follows yesterday's third-party endpoint sightings without Google confirmation Veo 3.1 hint.

If this ships as indicated, expect longer durations and better motion consistency to narrow gaps vs Sora 2 on multi-scene coherence (and give Gemini Studio users a native path for video+audio).

Higgsfield launches Enhancer to kill flicker; adds Sora 2 MAX/Pro MAX and a free-run promo

Higgsfield introduced Enhancer, a universal deflicker that cleans up AI-generated and legacy footage, alongside Sora 2 MAX (higher fidelity) and Sora 2 Pro MAX (multi-scene at MAX quality), plus a 1080p Upscale Preview – available with "Unlimited" usage through the end of the week release thread. Creators are already pushing teaser workflows on the platform and highlighting the unlimited window for Ultimate/Creator plans creator promo.

Image→3D in ~2–5 minutes: Hitem3D cuts modeling to ~$0.30–$1.40 per asset with watertight meshes

Hitem3D – built atop Sparc3D geometry and ULTRA3D speed – turns one or a few reference photos into studio-ready, watertight meshes (up to 1536³ detail) in about 2–3 minutes per view, with typical jobs completing in ~2–5 minutes and costing roughly $0.30–$1.40 per model feature breakdown, ArXiv paper. A follow-up thread walks through the view expansion, alignment, triangulation, and texturing path, and calls out a faces/hair variant for character-centric assets how-to thread, official note.

Runway debuts Apps for everyday video work: remove, reshoot, add dialogue, upscale to 4K, restyle

Runway released a set of five AI video Apps – Remove from Video, Reshoot Product, Add Dialogue (with tone), Upscale Video (one-click to 4K), and Change Image Style (incl. relighting/retexturing) – rolling out on the web with more coming weekly and an open call for ideas apps overview, feature list. For production teams, this bundles common cleanup and repurposing tasks without leaving the browser, potentially reducing round-trips to NLEs or specialized plugins.

ComfyUI's 3-minute guide to WAN 2.2 Animate and a new character-replacement workflow

For creators assembling animation pipelines in nodes, ComfyUI shared a concise WAN 2.2 Animate tutorial to get character motion running fast, plus a separate walkthrough on character replacement inside the same graph tutorial, character replacement. The team is soliciting feedback on the workflow ergonomics to refine defaults and examples feedback request.


📊 Leaderboards and puzzlers (non-video)

Smaller eval pulse today: coding/vibe arena standings and NYT Connections deltas. Excludes Qwen leaderboard items (covered in the feature) and video arena (in Media).

DeepSeek-V3.2-Exp leads new Vibe Coding Arena

DeepSeek-V3.2-Exp now tops the BigCodeArena "vibe coding" board, reflecting execution-first, user-voted preferences; following up on execution evals that introduced the arena's run-and-judge method, today's snapshot shows a clear pecking order driven by real usage. See the board preview in leaderboard image.

  • Top 3 today: DeepSeek-V3.2-Exp (≈1107), DeepSeek-V3.1-Terminus (≈1080.9), and qwen3-235b-a22b-instruct-2507 (≈1069.8), all based on raw head-to-head user votes and execution outcomes leaderboard image.

NYT Connections update: GPT-5 Pro 83.9; DeepSeek V3.2 Exp 59.4

A fresh NYT Connections snapshot (last 100 puzzles) puts GPT-5 Pro at 83.9, with DeepSeek V3.2 Exp at 59.4; Claude Sonnet 4.5 Thinking 16K rises to 48.2 while its non-reasoning mode reaches 46.1; GLM-4.6 sits at 24.2. The update also notes GPT-5 Pro trails an earlier o3-pro marker (87.3) in this cut score rundown, and confirms the scope is the most recent 100 puzzles method note, with the benchmark and scripts available in the maintainer's repo GitHub repo.


💼 Enterprise moves: funding, assistants and commerce apps

Notable GTM signals span a large Series B, Salesforce+OpenAI/Slack integrations, a Walmart ChatGPT app, and commentary on AI-driven layoffs.

Salesforce brings Agentforce 360 into ChatGPT; $100M support savings cited

Salesforce expanded its OpenAI partnership so Agentforce 360 apps can run inside ChatGPT, alongside deeper Slack and Codex tie-ins shown at Dreamforce partnership summary, integration graphic, and a Salesforce post company update. Internally, Salesforce says AI support agents are saving ~$100M annually, and Reddit reports 84% faster resolution (46% deflection, 8.9→1.4 min) savings stats. Full partner details are in Salesforce's release Salesforce press release, following up on Slack apps which shipped ChatGPT for Slack.

For enterprises, this tightens the loop between CRM data, Slack workflows, and ChatGPT distribution – lowering the friction to deploy agentic flows where users already work.

Goldman Sachs signals job cuts as AI adoption accelerates under "OneGS 3.0"

Goldman told staff it will constrain headcount growth and make limited reductions this year as part of an AI-driven efficiency push ("OneGS 3.0"), even as overall headcount remains above 2024 levels ai plan memo, bloomberg snapshot.

For AI leaders and HR, it's a visible example of large-bank operational redesign around automation in onboarding, compliance, lending, and vendor flows – shifting the workforce mix while scalability improves.

Walmart launches a shopping app inside ChatGPT

Walmart rolled out a ChatGPT app that lets users browse and buy across Walmart and Sam's Club assortments directly in ChatGPT, signaling mainstream retail's move into agentic commerce walmart details, Bloomberg report, deal roundup.

For commerce leaders, this tests conversion in conversational channels and sets a template for catalog search, bundling, and checkout flows embedded in general-purpose assistants.

Reducto raises $75M Series B, surpasses 1B pages processed

Document AI startup Reducto closed a $75M Series B led by a16z, taking total funding to $108M, and says it has now processed 1B+ pages (~6ร— since its Series A) with plans to double down on model research and productionโ€‘grade tooling funding note.

For AI leaders, this marks accelerating enterprise demand for highโ€‘accuracy OCR+VLM pipelines in regulated workflows (charts, tables, multiโ€‘page docs) and a wellโ€‘funded competitor in the document automation stack.

Slack turns Slackbot into a full AI assistant with private AWS processing

Slack is piloting a revamped Slackbot that plans meetings, summarizes threads, finds files via natural language, and coordinates calendars (Google/Outlook), with AI running in a private AWS environment and wider rollout targeted by endโ€‘2025 product brief.

This positions Slackโ€™s native assistant as an enterprise option alongside thirdโ€‘party botsโ€”important for CIOs weighing data residency, privacy, and changeโ€‘management for knowledge work assistants.

Vercel ships Slack Agent Template and Bolt library to build agents inside Slack

Vercel released a Slack Agent Template plus a Bolt library to quickly build, test and deploy Slack agents, complementing the growing wave of agentic enterprise workflows showcased at Dreamforce template link, slack demo.

For platform teams, this lowers the integration cost to embed RAG, approvals, and code actions directly in Slack with CI/CDโ€‘friendly scaffolding.


๐Ÿงช Reasoning, RL and longโ€‘horizon methods

Dense research day: diffusion guidance, chain robustness, dynamic context windows, memoryโ€‘driven agents, overthinking control, KV compression, and AI peer review.

Deep search agents with sliding context hit ~33.5%

A sliding window that retains assistant thoughts, elides older tool dumps, and preserves latest tool outputs lets agents reason over ~100 turns within 32k, reaching ~33.5% on a tougher multiโ€‘page benchmarkโ€”without external summarizers paper abstract.

  • Training mirrors the runtime view (tool calls replaced by placeholders), followed by RL that rewards only correct final answers, stabilizing long multiโ€‘step sessions paper abstract.
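The windowing described above is easy to picture in code. The sketch below is illustrative, not the paper's implementation: the function name, message schema, and placeholder text are all assumptions.

```python
# Minimal sketch of the sliding-context idea: keep assistant reasoning,
# elide older tool dumps behind placeholders, and retain the latest tool
# outputs. All names here are illustrative assumptions.
def compress_history(messages, keep_last_tools=1, placeholder="[tool output elided]"):
    """Return a compact view of the conversation for the next model call."""
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    recent = set(tool_idx[-keep_last_tools:]) if keep_last_tools else set()
    view = []
    for i, m in enumerate(messages):
        if m["role"] == "tool" and i not in recent:
            view.append({"role": "tool", "content": placeholder})  # elide old dump
        else:
            view.append(m)  # keep assistant thoughts and recent tool outputs
    return view

history = [
    {"role": "assistant", "content": "Search the report for revenue figures."},
    {"role": "tool", "content": "very long page text ..."},
    {"role": "assistant", "content": "Now open the appendix."},
    {"role": "tool", "content": "latest appendix text"},
]
compact = compress_history(history)
```

Because the training data mirrors this same compacted view, the model never sees (or learns to depend on) the elided dumps, which is what keeps ~100-turn sessions inside 32k.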

MUSE learns on the job to set new TAC SOTA

An experienceโ€‘driven loopโ€”plan, execute, reflect, storeโ€”writes hierarchical memories (strategic notes, subโ€‘task SOPs, tool tips) and reuses them across tasks and models, pushing TAC to 51.78% with Geminiโ€‘2.5 Flash paper overview.

  • A separate reflect agent blocks false success, proposes retries, and distills new SOPs; plainโ€‘language memories transfer to other LLMs without retraining paper overview.
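A toy version of the plan→execute→reflect→store loop can make the shape concrete. Everything below, including the memory store, keyword retrieval, and prompts, is an assumption for illustration, not MUSE's actual pipeline.

```python
# Toy plan/execute/reflect/store loop in the spirit of MUSE; the memory
# classes, retrieval, and prompt wording are illustrative assumptions.
class MemoryStore:
    def __init__(self):
        self.strategic, self.sops, self.tool_tips = [], [], []

    def relevant(self, task):
        # Naive keyword retrieval over plain-language memories
        words = set(task.lower().split())
        return [m for m in self.strategic + self.sops
                if words & set(m.lower().split())]

def run_task(task, llm, memory, max_retries=2):
    for _ in range(max_retries + 1):
        plan = llm(f"Plan for: {task}. Lessons: {memory.relevant(task)}")
        result = llm(f"Execute: {plan}")
        verdict = llm(f"Reflect: did this truly succeed? {result}")
        if verdict.lower().startswith("yes"):
            # Distill a reusable SOP so later tasks (or other LLMs) benefit
            memory.sops.append(llm(f"Distill an SOP from: {plan}"))
            return result
    return None  # the reflect step blocked every attempt
```

Because the stored memories are plain language rather than weights, swapping `llm` for a different model keeps the accumulated SOPs usable, which is the transfer property the paper highlights.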

RL finds the heads that matter for reasoning

A tiny gate per attention head mixes a short sliding window with full history; RL raises gates for heads that improve correctness. Highโ€‘gate heads keep full KV while others get trimmed, preserving chain integrity and saving ~20โ€“50% KV with nearโ€‘lossless (or better) accuracy on math/coding paper title page.

  • On a tough math set, the compressed policy even beats the fullโ€‘cache baseline by focusing memory where it counts paper title page.
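The budget arithmetic behind the ~20–50% savings is straightforward to sketch. The gate values, threshold, and window size below are made-up illustrations, not learned quantities from the paper.

```python
# Sketch of per-head KV budgeting: heads whose learned gate exceeds a
# threshold keep the full history; the rest attend through a short sliding
# window. Gate values here are invented for illustration.
def head_context_lengths(gates, seq_len, window=128, threshold=0.5):
    return [seq_len if g > threshold else min(window, seq_len) for g in gates]

gates = [0.9, 0.2, 0.7, 0.1]          # per-head gates, raised by RL when they help
lens = head_context_lengths(gates, seq_len=4096)
kv_saved = 1 - sum(lens) / (len(gates) * 4096)   # fraction of KV cache freed
```

With these made-up gates, roughly 48% of the KV cache is freed, which lands inside the ~20–50% range the paper reports.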

TAG diffusion guidance trims steps and hallucinations

Tangential Amplifying Guidance (TAG) amplifies the tangential component of each diffusion update to keep samples near highโ€‘probability regions, reducing semantic drift; in tests, ~30 TAG steps surpass 100โ€‘step classifierโ€‘free guidance on quality without extra model calls paper thread.

  • Methodologically, TAG keeps the radial (noise) part fixed while modestly boosting the tangential (content) component, improving likelihood and textโ€‘image alignment on standard samplers paper recap.
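The radial/tangential split can be sketched as a simple vector decomposition; the choice of decomposition axis and the scale factor below are assumptions for illustration, not TAG's exact formulation.

```python
import numpy as np

# Sketch of tangential amplification: project the guidance update onto the
# current sample direction (radial), keep that part fixed, and scale up the
# orthogonal remainder (tangential).
def tag_step(x, delta, tangential_scale=1.5):
    x_hat = x / np.linalg.norm(x)
    radial = np.dot(delta, x_hat) * x_hat   # component along x: left as-is
    tangential = delta - radial             # content direction: amplified
    return x + radial + tangential_scale * tangential

x = np.array([1.0, 0.0])
delta = np.array([0.2, 0.4])
x_new = tag_step(x, delta)   # radial part unchanged, tangential part boosted
```

Keeping the radial part fixed means the noise schedule is untouched; only the content-bearing direction of each update gets the extra push.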

Rโ€‘Horizon: RL improves longโ€‘chain reasoning

New details show reinforcement learning on composed chains lifts chain accuracy and yields roughly +7.5 on AIME24, addressing early stopping and format breaks that crater long sequences paper update. This builds on Rโ€‘Horizon, which introduced breadthโ€‘andโ€‘depth chaining to stress longโ€‘horizon reasoning.

  • Gap analysis finds real chain scores fall below independentโ€‘trial expectations as chain length grows; verifiable rewards and GRPO reduce error accumulation paper summary.
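The independent-trial baseline used in that gap analysis is simple to state; the numbers below are illustrative, not from the paper.

```python
# If a model solves single problems with probability p, an n-problem chain
# would succeed with probability p**n absent compounding failures. Realized
# chain scores below this curve point to the extra failure modes (early
# stopping, format breaks) that RL on composed chains targets.
def independent_chain_expectation(p_single, n):
    return p_single ** n

expected = independent_chain_expectation(0.9, 5)   # 0.9**5 = 0.59049
```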

Reasoning shaping curbs overthinking

Group Relative Segment Penalization (GRSP) supervises at the step level: it detects step boundaries (keywords or confidence dips) and penalizes choppy, short segments that correlate with wrong answers, yielding shorter outputs with stable accuracy on math and RL setups paper details.

  • Larger bases still benefit, compressing steps better and stabilizing training versus tokenโ€‘level penalties paper details.
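A toy version of keyword-based step segmentation and a choppiness penalty shows the mechanic; the boundary words, length threshold, and penalty weight are all assumptions, not GRSP's actual choices.

```python
# Illustrative step-level shaping: split a reasoning trace at boundary
# keywords, then penalize traces dominated by very short, choppy segments.
BOUNDARIES = ("First,", "Next,", "Then,", "Finally,")

def segment(trace):
    segs, cur = [], []
    for tok in trace.split():
        if tok in BOUNDARIES and cur:
            segs.append(" ".join(cur))
            cur = []
        cur.append(tok)
    segs.append(" ".join(cur))
    return segs

def choppiness_penalty(trace, min_words=4, weight=0.1):
    segs = segment(trace)
    short = sum(1 for s in segs if len(s.split()) < min_words)
    return weight * short / len(segs)   # fraction of too-short segments
```

GRSP additionally uses confidence dips as boundaries and applies the penalty inside an RL objective; the sketch only shows why segment-level signals differ from token-level ones.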

AI metareview nears human accept/reject accuracy

An ensemble of reviewer personas (empiricist, theorist, pedagogical) plus a metareviewer achieves ~81.8% accept/reject accuracy on 1,963 submissions, close to the ~83.9% human average; AI excels at fact/literature checks but lags on novelty/theory paper summary.

  • Rebuttals can overโ€‘sway agents, so humans stay in the loop; ensembles reduce persona bias versus single reviewers paper summary.
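The ensemble structure can be pictured as persona reviews plus a majority-vote metareviewer; the persona names match the article, but the aggregation rule below is an illustrative assumption, not the paper's pipeline.

```python
from collections import Counter

# Toy persona ensemble: each persona reviews independently, and a simple
# metareviewer takes the majority decision.
PERSONAS = ("empiricist", "theorist", "pedagogue")

def metareview(review, submission):
    votes = [review(persona, submission) for persona in PERSONAS]
    decision, _ = Counter(votes).most_common(1)[0]   # majority decision
    return decision, votes
```

Averaging over personas is what dampens any single reviewer's bias, which is the point the paper makes about ensembles versus single reviewers.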

MPO coโ€‘optimizes words and visuals for MLLMs

Treating prompts as paired text+cue (image/video/molecule) and updating both after singleโ€‘note feedback improves MLLM answers while slashing trial budgets by ~70% compared to textโ€‘only optimizers; strong parents are warmโ€‘started to explore efficiently paper summary.

  • Visual cues are created, edited, or mixed to align attention on the right details, with wins across images, videos, and molecular tasks paper summary.
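One way to picture the warm-started search over paired prompts is a small population loop; the scoring, mutation, and population details below are placeholders, not MPO's method.

```python
import random

# Toy joint optimization over (text, visual-cue) prompt pairs: mutate from
# the strongest parent each round and replace the weakest member, so both
# halves of the pair evolve together under a single feedback signal.
def optimize(score, mutate, seed_pairs, rounds=3, seed=0):
    rng = random.Random(seed)
    pop = list(seed_pairs)
    for _ in range(rounds):
        pop.sort(key=score, reverse=True)
        parent = pop[0]                 # warm-start from the best pair so far
        pop[-1] = mutate(parent, rng)   # evolved (text, cue) replaces weakest
    return max(pop, key=score)
```

Warm-starting from strong parents rather than resampling from scratch is what keeps the trial budget small, the property the paper quantifies as a ~70% reduction.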
