vLLM v0.15.0 adds MoE Multi‑LoRA – +454% tokens/s, -87% TTFT

Executive Summary

vLLM shipped v0.15.0 with Multi‑LoRA serving for MoE models via a new fused_moe_lora kernel. The project claims +454% output tokens/sec (OTPS) and -87% time‑to‑first‑token (TTFT) on GPT‑OSS 20B Multi‑LoRA, plus dense‑model gains such as +99% OTPS on Qwen3 32B. The framing is “one base MoE + many adapters on one GPU,” with compound sparsity (expert routing + adapter selection) handled by fused paths, Split‑K, and CTA swizzling. ROCm users also get AITER attention backends that route prefill, decode, and extend separately; vLLM cites up to 4.4× decode throughput via KV‑layout and model‑specific kernels.
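The percent figures above mix a throughput gain with a latency reduction, so it can help to put them in the same units. The following is a minimal arithmetic sketch that converts each claimed percent change into a multiplier of the baseline. The helper name pct_change_to_multiplier is hypothetical, and the inputs are vLLM's claimed numbers as quoted here, not independent measurements.

```python
def pct_change_to_multiplier(pct: float) -> float:
    """Convert a percent change (e.g. +454 or -87) into a multiplier of the baseline."""
    return 1.0 + pct / 100.0


# Claimed figures quoted in the summary (assumed as given, not re-measured).
otps_mult = pct_change_to_multiplier(454)   # GPT-OSS 20B Multi-LoRA output tokens/sec
ttft_mult = pct_change_to_multiplier(-87)   # GPT-OSS 20B Multi-LoRA time-to-first-token
dense_mult = pct_change_to_multiplier(99)   # Qwen3 32B output tokens/sec

# A latency multiplier below 1.0 means faster; its reciprocal is the speedup factor.
ttft_speedup = 1.0 / ttft_mult

print(f"Multi-LoRA OTPS: {otps_mult:.2f}x baseline")
print(f"Multi-LoRA TTFT: {ttft_mult:.2f}x baseline ({ttft_speedup:.1f}x faster)")
print(f"Qwen3 32B OTPS:  {dense_mult:.2f}x baseline")
```

Read this way, +454% OTPS is about 5.5× the baseline throughput, -87% TTFT leaves first-token latency at 13% of baseline (roughly 7.7× faster), and +99% OTPS is just under 2×.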

• Anthropic–DoW procurement fight: DoW directive language frames Anthropic as a “supply chain risk” with secondary‑boycott‑style contractor restrictions; Anthropic says it will sue and argues 10 USC 3252 would limit scope to DoW contract work; a Trump post calls for government‑wide offboarding with a 6‑month phaseout; Markey signals congressional pressure; Axios claims OpenAI’s red lines were accepted.
• Claude Code 2.1.63: /simplify and /batch ship for parallel refactors/migrations; HTTP hooks now POST/return JSON; auto‑memory/config shared across git worktrees; new opt‑out flag ENABLE_CLAUDEAI_MCP_SERVERS=false.
• Platform churn: Gemini 3 Pro Preview retires Mar 9 and gemini-pro-latest shifts Mar 6; DeepSeek V4 “next week” multimodal claims remain rumor-only.

Anthropic vs DoW escalates: supply‑chain risk threat, federal offboarding order, and court challenge

US DoW moves to blacklist Anthropic as a “supply chain risk,” triggering contractor/cloud shockwaves; Anthropic says customers are mostly unaffected and vows a court fight, raising the stakes for AI procurement, safety terms, and vendor leverage.

Dominant cross-account storyline today: the Department of War moves to label Anthropic a “supply chain risk,” with knock-on implications for cloud hosting and defense contractors; Anthropic publicly commits to fight in court and reiterates red lines on surveillance and autonomous weapons.

🏛️ Anthropic vs DoW escalates: supply‑chain risk threat, federal offboarding order, and court challenge

Hegseth declares Anthropic a “supply chain risk” with secondary-boycott-style language

Department of War (Hegseth): The Secretary of War message circulating today directs the DoW to designate Anthropic a “Supply-Chain Risk to National Security” and claims that “no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic,” as shown in the Directive screenshot and repeated in Legal reading concern.

This framing matters operationally because it reads like a secondary boycott (restrictions on third parties), which is why multiple threads immediately model downstream effects on cloud hosting and defense supply chains rather than treating it as a normal procurement dispute.

Table of Contents

🏛️ Anthropic vs DoW escalates: supply‑chain risk threat, federal offboarding order, and court challenge

Hegseth declares Anthropic a “supply chain risk” with secondary-boycott-style language

Cloud hosting becomes the immediate choke point in the Anthropic “ban” wording

Trump orders federal agencies to cease Anthropic use with a six‑month phaseout

Altman says OpenAI reached DoW agreement for classified deployment with surveillance and weapons red lines

Axios claim: Pentagon accepts OpenAI red lines while still moving against Anthropic

Report: 500+ Google/OpenAI employees urge CEOs to back Anthropic-style red lines

The contract language fight centers on “all lawful purposes” vs explicit veto power

A compact timeline of the Anthropic–Pentagon escalation becomes a shared reference

Sen. Markey calls for congressional action to reverse Anthropic “supply chain risk” move

The dispute is framed as a new political-risk line item for AI labs and datacenters

🧰 Claude Code shipping sprint: /simplify + /batch, HTTP hooks, and worktree-shared memory/config

Claude Code 2.1.63 shares auto-memory and config across git worktrees

Claude Code AskUserQuestion can render markdown snippets in the UI

Claude Code reduces long-session memory by stripping progress payloads

Claude Code Remote Control begins rolling out for Pro users

Claude mobile Google connectors add Calendar actions and Gmail drafts

Claude Code /copy can now default to copying the full response

Claude Code adds an env var to opt out of bundled ClaudeAI MCP servers

Claude Code adds manual paste fallback for MCP OAuth redirects

Claude Code system prompt shifts from “Task tool” to “Agent tool” guidance

Claude Code’s VS Code sessions list gets rename and remove actions

🧑‍💻 Codex product & API momentum: GPT‑5.3‑Codex rollout, desktop multi-agent UX, and model-version breadcrumbs

OpenAI says Codex reached 1.6M weekly users and tripled since Jan 1

GPT-5.3-Codex positioning: strong for automation and backend, weaker on frontend

Responses API `view_image` gets original-resolution support behind a model gate

Codex CLI adds voice input: “Now you can speak to Codex CLI”

Codex Desktop app sentiment: multi-agent UX is improving and sticky

Codex reliability critique: premature “done” claims and fallback-heavy code paths

Conflicting signals: “GPT-5.4 cancelled for now” vs repo references

Codex update 0.106.0 enables an “Ask Question” tool in default modes

DesignArena tests a “Galapagos” model with GPT-5-like output style

Model-switching as a norm: “gear shifter” jokes about controlling Codex levels

🕸️ Agent runners & ops tooling: local swarms, queues, relay layers, and ‘agent computers’

Vercel Queues enters public beta as a durability primitive for agent workloads

Browser Use Cloud can replicate OpenClaw bots via HEARTBEAT + SOUL files

Browser Use Cloud pitches 24/7 agents with scoped auth and workspaces

SemiAnalysis claims Claude Code authors ~4% of public GitHub commits

Superset hits #1 on Product Hunt for running many coding agents locally

Agent Relay frames coordination as Slack primitives for agent teams

Claude climbs to #2 in the iOS App Store, a load and distribution signal

Convex scaling story: ClawHub spike from 5 users/day to 100k users/day

OpenClaw ops pattern: isolate an agent filesystem behind Box + Box CLI

Composio’s parody “model migration” thread spotlights provider compliance quirks

🧭 Agentic coding patterns: supervising teams, context caching, and the tab→agent→team transition

Karpathy: the workflow frontier keeps moving (None→Tab→Agent→Parallel→Teams)

Karpathy’s “program an organization” pattern for multi-agent research work

Measure long-running autonomy with session duration and test deltas

Treat agent “research” as a repo cache with TTL and staleness risks

Environment sanity checks: ask “what directory are you in?” early

PR governance pattern: handling AI-generated PRs from non-engineers

Terminology proposal: “harness” vs “apparatus” to reduce agent-stack ambiguity

🧩 Cursor & IDE agents: cloud VMs, PR demos, and mainstreaming of agent requests

Cursor Cloud Agents spin up per-agent VMs and ship draft PRs with proof

Cursor chart shows agents overtaking tab completion as the default interface

Cursor power users report token consumption rising fast

⚙️ Inference/runtime engineering: multi‑LoRA for MoE, ROCm attention backends, and DeepSeek kernels

vLLM v0.15.0 ships Multi-LoRA for MoE models with a fused_moe_lora kernel

vLLM ROCm AITER attention backends report up to 4.4× decode throughput on AMD

DeepSeek updates DeepGEMM with mHC, early Blackwell SM100 hooks, and FP4 compute

DeepSeek “DualPath” inference proposal pools decode bandwidth to speed prefill KV movement

🧠 Model & platform release stream: Qwen3.5 updates, DeepSeek V4 rumors, and Gemini API deprecations

Artificial Analysis: Qwen3.5 expands with 27B dense + 35B A3B + 122B A10B

DeepSeek V4 rumored for next week with image and video generation

Unsloth fixes Qwen3.5 chat template, improving tool-calls across GGUF quants

Gemini API will retire Gemini 3 Pro Preview on March 9; alias shifts March 6

Gemma arrives on iOS via Google AI Edge Gallery for offline on-device AI

📏 Benchmarks & evals: code review scoring, truthfulness tests, and arena leaderboard churn

IBench chart shows GPT-5.3-Codex at 86% with medium reasoning

Arena open-sources Arena-Rank to make pairwise leaderboards reproducible

BullshitBench expands coverage and claims Anthropic’s 4.5/4.6 series is pulling ahead

Code Arena February 2026: GLM-5 leads open models; #2 is a tie

EyeBench v2 leaderboard posts early spread between Codex 5.3 and Gemini 3.1 Pro

Text Arena February 2026: GLM-5 leads a tight top 3 among open models

🧪 Training & reasoning techniques: Doc‑to‑LoRA, ERL loops, and “dynamic data” RL shift