Qwen3.5 Small 0.8B–9B ships – ~7GB local footprint, multimodal tools


Executive Summary

Alibaba’s Qwen team released open-weights Qwen3.5 Small in 0.8B/2B/4B/9B sizes (plus Base variants), positioning “more intelligence, less compute”; the launch emphasizes native multimodal training, tool-calling, and fast day‑0 packaging across local runtimes. Distribution landed immediately: Ollama added qwen3.5:{0.8b,2b,4b,9b} with “tool calling, thinking, multimodal”; Unsloth shipped GGUFs and a local run/fine-tune guide; LM Studio lists Qwen3.5‑9B with a ~7GB local footprint; an MLX demo shows Qwen3.5‑2B running on an iPhone 17 Pro at 6‑bit quantization.

Qwen3.5 eval chatter: a circulating collage cites 81.7 GPQA Diamond and 87.7 OmniDocBench v1.5 for 9B; mappings/settings vary across retellings, so no single auditable artifact yet.
Cursor business signal: Bloomberg-reported $2B ARR in February; up from $1B three months prior; ~60% enterprise mix.
BullshitBench v2: adds 100 questions and reports 70+ model variants; claims more “reasoning” tokens correlate with worse nonsense detection.

Taken together, the day reads as “local-first weights + agent UX + eval skepticism”: small multimodal models get installed everywhere fast, while reliability/measurement artifacts lag behind the rollout velocity.

Feature Spotlight

Qwen 3.5 Small open-weights: edge-ready multimodal models (0.8B–9B) land everywhere

Qwen3.5 Small (0.8B–9B) compresses “agent-capable” multimodal performance into laptop/phone-sized open weights, immediately impacting local deployment, offline agents, and cost/perf tradeoffs.

🪶 Qwen 3.5 Small open-weights: edge-ready multimodal models (0.8B–9B) land everywhere

High-volume cross-account release of Qwen3.5 Small (0.8B/2B/4B/9B) with strong benchmark claims, multimodal + tool-calling support, and rapid day-0 availability across local runtimes (Ollama/LM Studio/MLX/GGUF guides).

Qwen ships Qwen3.5 Small (0.8B–9B) plus Base weights

Qwen3.5 Small (Alibaba Qwen): The hinted “small Qwens” are now a full release, following up on early hint: Qwen3.5-0.8B, 2B, 4B, and 9B, with Base variants also published; positioning is “more intelligence, less compute,” with native multimodal training and scaled RL called out in the launch thread.

The public packaging emphasis is migration-to-production friendliness (multiple sizes; Base + post-trained), with the canonical entry point being the Hugging Face collection linked in Model collection.

Qwen3.5-2B runs on iPhone 17 Pro via MLX (6-bit quant)

On-device inference (MLX + Qwen3.5-2B): A demo shows Qwen3.5 2B running on an iPhone 17 Pro with MLX optimization and a 6-bit quantization setup, framed as a practical edge-deployment milestone in the on-device demo.

Qwen 2B on iPhone demo

The surrounding chatter treats this as the “edge-ready multimodal” storyline becoming tangible for consumer hardware, with one post explicitly calling 2B-on-phone the “breakthrough that was needed” for local edge models in the on-device reaction, and Qwen’s account separately noting the small series is “already on MLX” in the MLX note.

Qwen3.5-9B/4B benchmark collage sets aggressive “small beats big” narrative

Qwen3.5 Small eval claims: A benchmark collage is circulating that compares Qwen3.5-9B and Qwen3.5-4B across instruction following, reasoning, multilingual knowledge, and several multimodal evals; one widely shared chart shows scores like 81.7 on GPQA Diamond and 87.7 on OmniDocBench v1.5 for Qwen3.5-9B, alongside baselines including gpt-oss variants, GPT-5 nano, and Gemini Flash-Lite, per the benchmarks screenshot.

What’s consistent across posts: The official announcement frames 4B as a “strong multimodal base for lightweight agents” and 9B as “closing the gap with much larger models,” as stated in the launch thread.
What’s still messy: Community retellings vary on details (context length, exact model mappings, and which “GPT-OSS” versions are used), so treat the collage as directional until the underlying eval scripts/settings are clearly pinned down in a single artifact, per the claims thread.

Ollama ships Qwen3.5 Small runners with tool calling and multimodal support

Ollama (Qwen3.5 Small): Ollama added day‑0 runners for the four small models—ollama run qwen3.5:{0.8b,2b,4b,9b}—and says the builds support native tool calling, thinking, and multimodal capabilities, per the Ollama run commands post.

The follow-up post repeats the same contract (“all models support native tool calling, thinking, and multimodal capabilities”) and links to the consolidated library page as the Ollama model page, with the short command list repeated in Ollama details.
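
As a quick smoke test of that contract, here is a minimal sketch using the ollama Python client (pip install ollama), assuming a local Ollama server with one of the new tags pulled; the weather tool and its schema are hypothetical examples, not part of the release:

```python
import ollama  # assumes a local Ollama server with the model already pulled

def get_weather(city: str) -> str:
    """Hypothetical local tool the model may choose to call."""
    return f"Sunny in {city}"

response = ollama.chat(
    model="qwen3.5:4b",  # tag from the Ollama announcement
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)

# If the model emitted a tool call, run it locally and print the result.
for call in response.message.tool_calls or []:
    if call.function.name == "get_weather":
        print(get_weather(**call.function.arguments))
```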

Unsloth publishes Qwen3.5 Small GGUFs, pitching phones and 6–7GB RAM laptops

Unsloth (Qwen3.5 Small GGUF): Unsloth released GGUF variants and a how-to guide, pitching 0.8B/2B/4B for phones and 9B for low-RAM laptops; they also claim the set supports long context (cited as 256K) and a mix of “thinking” and non-thinking variants in their writeup, the GGUF release post.

The distribution angle is the point: GGUF format plus a single-page run guide tends to collapse local testing time, and Unsloth is explicitly framing these as models that can be embedded into edge and agent workflows, per the local GGUF note.

Hugging Face shows fast uptake for Qwen3.5 variants and GGUF packaging

Hugging Face distribution signal (Qwen3.5): The Qwen3.5 family is showing rapid public uptake on Hugging Face, including a screenshot where Unsloth’s Qwen3.5-35B-A3B-GGUF hits #3 trending (with large download counts visible), per the trending screenshot.

This isn’t limited to the 0.8B–9B set, but it’s relevant because it indicates the “local-first” packaging layer (GGUF + runtimes) is compounding demand beyond the official weights list shown in model list screenshot.

Qwen3.5-0.8B/2B early benchmark table fuels “tiny models plateau” talk

Tiny-model reality check (Qwen3.5-0.8B/2B): A shared benchmark table comparing Qwen3.5-2B and Qwen3.5-0.8B against prior small Qwen3 variants argues that capability gains at the smallest sizes look uneven; the accompanying take is that “progress on tiny models has come to a stop,” per the tiny benchmarks table.

The same table also shows “thinking” mode lifts many scores relative to non-thinking for both models, which is consistent with how many small models now rely on inference-time compute to recover capability at low parameter counts, per the same tiny benchmarks table.

Qwen3.5-9B is available in LM Studio, marketed as ~7GB to run locally

LM Studio (Qwen3.5-9B): Qwen’s account says Qwen3.5-9B is available in LM Studio and calls out a “~7GB to run locally” footprint, per the LM Studio availability note.

This is an “installation surface” update more than a model spec change; it matters mainly because LM Studio tends to be a default on-ramp for local evaluation and quick GUI-based model switching.

Early bet: Qwen3.5 Small becomes a base for OCR and computer-use agents

Ecosystem reaction (Qwen3.5 foundation): An emerging builder expectation is that Qwen3.5’s small, multimodal weights will seed a wave of OCR-focused models and agentic computer-use variants downstream, per the downstream prediction.

That prediction is plausibly driven by the combination of “native multimodal” framing in the release messaging and the low-compute deployment surfaces (GGUF, MLX, Ollama) that make fine-tuned derivatives cheaper to iterate on, per the launch thread.


🧑‍💻 Claude Code shipping week: voice input, remote control UX, skills/plugins, and Cowork tasks

Claude Code dominates builder chatter with concrete workflow changes (voice mode rollout, new skills/plugins/connectors, Cowork scheduled tasks, quick mode) alongside reliability hiccups and usage UI changes. Excludes Qwen3.5 Small (feature).

Claude Opus 4.6 plus web search tops Arena’s Search Arena

Claude web search tool (Anthropic): Claude Opus 4.6 plus the web search tool is reported as #1 on Arena’s Search Arena; the API tool’s newer version uses code execution to filter intermediate search results for token savings and relevance, according to the Search Arena claim and the linked API docs.

One operational implication is that search-result post-processing can happen outside the model’s main context window (via code execution) rather than spending model tokens on long result lists, as described in API docs.

Builders are using Claude Code Voice Mode for hands-free CLI work

Claude Code (Anthropic): One early user reports writing “much of my CLI code” via Voice Mode over the last week, framing it as a practical way to stay in-flow while navigating terminal coding tasks, per the Week of voice coding note.

This is a different usage pattern than dictating long prompts: short voice bursts to drive iterative shell-and-edit loops, aligned with the Voice Mode UX shown in the Voice mode rollout clip.

Claude Code calls out auto-memory as a shipped feature

Claude Code (Anthropic): “Auto-memory in Claude Code” is highlighted as one of the week’s shipped items, per the Auto-memory mention post and the surrounding ship list context in Shipping week list.

The tweets don’t include storage format or toggles; they only confirm the existence and shipping timeframe of the feature in Auto-memory mention.

Claude Code Remote Control is being used as a phone-to-server control plane

Claude Code Remote Control (Anthropic): A practitioner describes using Remote Control to edit via the Claude app (macOS or iOS) against a production server “from anywhere,” per the Remote control anecdote mention, building on Remote control (feature becoming available broadly).

The practical shift here is treating the Claude app as the front-end while the code environment stays remote, rather than pairing the agent UI with the local workstation.

Claude for Open Source Program offers Max 20x for 6 months

Claude for Open Source Program (Anthropic): An invite email screenshot shows acceptance into the program with “Claude Max 20x free for 6 months” and an activation flow, as captured in Program invite email.

This reads as a distribution lever aimed at maintainers/contributors rather than a general pricing change; the only concrete terms visible in the email are the 6‑month duration and the “Max 20x” entitlement in Program invite email.

Claude usage page reportedly drops weekly/session stats view

Claude app (Anthropic): Users report the Usage page no longer shows weekly or per-session usage stats and instead surfaces “Extra usage” spend controls (toggle + monthly spend limit + balance), as seen in the screenshot shared in Usage UI screenshot.

The visible fields in Usage UI screenshot suggest the remaining UI is focused on post-limit continuation (“extra usage”) rather than historical consumption breakdown.

Claude Code ships new Skills

Claude Code (Anthropic): Anthropic staff highlight “New skills in Claude Code” as part of the week’s ship list, per the Skills mention post and the consolidated roundup in Shipping week list.

No specific skill names are listed in the tweets, but the change signal is that the Skills catalog/content changed, not just docs.

Claude Cowork adds scheduled tasks

Claude Cowork (Anthropic): Scheduled tasks are called out as a newly shipped capability in Cowork, per the Scheduled tasks mention post and the broader “shipped this past week” roundup in Shipping week list.

The tweets don’t include configuration details; they only confirm the feature exists and was shipped in the last week via Shipping week list.

Claude expands plugins and connectors surfaced in the UI

Claude integrations (Anthropic): Anthropic staff call out “a bunch of new plugins and connectors” as shipped this week, per the Plugins and connectors post and the weekly roundup context in Shipping week list.

The tweets don’t enumerate which connectors; they indicate integration surface area expanded in the same shipping window referenced by Shipping week list.

Claude for Chrome highlights a new Quick mode

Claude for Chrome (Anthropic): “Quick mode in Claude for Chrome” is called out as a shipped item this week in the Anthropic product roundup, per the Quick mode mention post and the broader list in Shipping week list.

The tweets don’t specify triggers/keys; they only confirm the feature exists via Quick mode mention.


🧠 Claude Memory expands: free-plan availability + import/export portability

Memory becomes a mainstream product surface: now available on Claude free plan, with an explicit import flow from other providers and export controls—positioned as reducing switching costs during the current app-store churn. Excludes Claude Code tooling updates (separate category).

Claude makes Memory free, adds import/export so you can switch assistants without losing context

Claude Memory (Anthropic): Following up on Memory migration (copy/paste memory transfer for paid users), Anthropic says Memory is now available on the free plan, with a streamlined way to import saved memories into Claude and the ability to export them whenever you want, as described in the memory announcement and reiterated in the settings pointer; the import entrypoint is explicitly called out in the import guide.

The practical workflow change is that Memory is no longer a paid-only retention feature: the same settings surface (“Settings → Memory”) now serves as the control plane for enabling Memory, running the import flow, and retaining an exit hatch via export, per the memory announcement.

Builders frame Claude’s Memory import as an explicit switching mechanic during churn

Migration narrative (Claude Memory): Posts argue the new import flow is timed to make migration from ChatGPT/Gemini “frictionless,” positioning Memory as a direct switching-cost reducer rather than a passive personalization feature, as shown in the migration explainer.

Memory import walkthrough

Demand signal being cited: The same threads connect importability to claims that Claude “dethroned ChatGPT” in mobile charts, using the app store claim as the concrete artifact.

Treat the app-rank linkage as directional: the tweets attribute causality to Memory/import plus controversy-driven switching, but they don’t provide a clean attribution breakdown between product changes and the surrounding news cycle.


🧰 Codex product signals: outages, Windows tease, hackathons, and fast-mode breadcrumbs tightening

Codex chatter shifts from rumor to operational signals: brief outages and resets, community hackathons, Windows app teaser, and a notable change where GPT‑5.4 coupling text gets scrubbed from CLI help strings. Excludes OpenAI–DoW policy terms (separate category).

OpenAI teases Codex for Windows and opens an interest form

Codex for Windows (OpenAI): OpenAI is teasing a Windows build of the Codex app in a short Windows-logo clip shown in the teaser video.

Windows teaser clip

The campaign includes a BSOD-style splash screen with “Codex app for Windows is coming soon” and an early access flow pointing to the signup form, as shown in the BSOD mock.

No release date is stated in the teaser assets; this is signal rather than a shipped build.

Codex incident briefly blocked requests as “high cyber risk,” fixed in ~8 minutes

Codex (OpenAI): A production issue caused Codex requests to be blocked/rejected as “high cyber risk,” and OpenAI says it was resolved in ~8 minutes as described in the incident note; the team also said they’ll reset rate limits after the disruption, per the same incident note.

This is operationally relevant because it’s a distinct failure mode (security-risk gating) versus normal rate limiting or model errors.

Codex CLI trims “GPT‑5.4” from the /fast command description

Codex CLI fast mode (OpenAI): Following up on Fast mode PR (the earlier /fast-mode breadcrumb), a new commit removes the explicit “toggle Fast mode for GPT‑5.4” wording and replaces it with “toggle Fast mode,” as shown in a diff screenshot.

This looks like tightening the leak surface around model/version naming in user-facing CLI strings, without changing the existence of the /fast command itself.

Codex hits all-time high RPS; team signals capacity ready

Codex traffic (OpenAI): A Codex team member says Codex hit an all-time high in requests/second “yesterday,” with more releases pending, and that “GPUs are on standby,” per the capacity comment. A second post frames repeated request spikes as a sign that “the alternative to agentic coding isn’t coding by hand anymore,” per the usage pattern note.

This is one of the clearer near-real-time demand signals for cloud coding agents, because it’s tied to service-side load rather than app-store rank or anecdotal adoption.

Builder reports Codex Xhigh taking 33+ minutes on a task Opus did in 3

GPT‑5.3 Codex Xhigh latency (OpenAI): A builder reports running the same prompt on Claude Opus 4.6 and “GPT 5.3 Codex Xhigh,” claiming Opus finished in ~3 minutes while Codex was still running after 33 minutes, with the ongoing trace shown in the terminal screenshot.

This is a user-facing performance signal about long “thinking” runs (and their ergonomics), not a benchmark; there’s no repro artifact in the tweet beyond the screenshot.

Codex’s APAC hackathon: 100+ devs, 50+ projects; meetups directory goes live

Codex community (OpenAI): Following up on Hackathon results (the first APAC Codex hackathon), OpenAI now adds concrete scale numbers—“100+ developers” and “50+ projects” built in a day in Singapore—per the event recap.

Winners snapshot: The announced winners include an iOS AR studio app in the 1st place note, a Codex “computer user agent” to hunt illegal gambling sites in the 2nd place note, and a multi-agent policy simulator in the 3rd place note.
Local organizing surface: OpenAI also points people to a Codex meetups/hackathons directory, per the meetups callout and the linked meetups directory.

The project list itself isn’t published in these tweets, but the meetups page is a concrete distribution channel for future events.

Codex CLI 0.107 reportedly hits 503s; rollback behavior emerges

Codex CLI (OpenAI): A user reports Codex “0.107 runs into 503 errors” and says they reverted to 0.105 as captured in the version regression note. Short-term reliability issues also showed up at the service layer earlier, with OpenAI describing a separate “high cyber risk” block incident in the incident note.

These are different failure classes (client build vs service-side gating), but they’re landing in the same week of heavy Codex usage.

GPT‑5.3‑Codex‑Spark starts rolling out to engaged Codex users on Plus

GPT‑5.3‑Codex‑Spark (OpenAI): OpenAI is rolling out a “GPT‑5.3‑Codex‑Spark” variant to “most engaged Codex subs on ChatGPT Plus,” according to the rollout note. A separate builder comment describes Spark as useful for “researching, asking code base questions and quick UI edits,” per the Spark usage comment.

No public model card, pricing, or capability deltas are included in these tweets; the signal here is the product segmentation (Spark as a distinct Codex flavor) and the distribution path (Plus → engaged Codex users).


📈 Cursor & agentic coding market: revenue milestones and “third era” autonomous agents

Business + workflow signals around Cursor and the broader coding-assistant market: ARR milestones, the shift from tab→agent→autonomous cloud agents, and discussion of what developers review when agents ship artifacts instead of diffs. Excludes Codex/Claude specific release notes.

Cursor reportedly hits $2B ARR, doubling in three months

Cursor (Anysphere): A Bloomberg report screenshot shared by builders says Cursor’s annual recurring revenue hit $2B in February, up from $1B three months earlier, with ~60% coming from enterprise customers per the Bloomberg screenshot and the matching recap in ARR recap.

The same source is linked directly in the Bloomberg article, which is the cleanest artifact in this set; no Cursor-side confirmation appears in the tweets provided.

Cursor’s “third era”: autonomous cloud agents that return artifacts, not diffs

Cursor (workflow shift): A circulated summary of Cursor CEO commentary argues dev tooling is moving from tab autocomplete → interactive agents → autonomous cloud agents that run longer tasks and return reviewable artifacts (previews, videos, logs) rather than code diffs, with Cursor claiming 35% of merged PRs already come from these autonomous cloud agents per the Cursor CEO summary.

This frames the engineering bottleneck as artifact review and environment reliability (e.g., flaky tests) rather than typing speed, as described in the same Cursor CEO summary.

AI coding assistants get a $7.5–10B market size estimate

AI coding assistants (market signal): One synthesis thread pegs the current AI coding market at $7.5–10B, citing Cursor’s acceleration and adding rumors that Codex is at ~$1B and a next tier (Cognition/Copilot/Lovable/Replit) at $300–600M per the Market sizing estimate.

The Cursor revenue milestone used as an anchor for those comparisons is the same Bloomberg-screenshot claim that Cursor reached $2B ARR per the Bloomberg screenshot.

Cursor Plan mode uses Mermaid diagrams as a planning aid

Cursor (Plan mode UX): A builder calls out that Cursor’s Plan mode can emit Mermaid diagrams during planning, which they find easier to scan than terminal-only plans—see the concrete “Data Flow” diagram example in the Mermaid plan diagrams.

This is a narrow UX edge, but it maps to a broader pattern: planning outputs that are visual artifacts can be faster to review than long text blocks, as illustrated in Mermaid plan diagrams.


🦞 OpenClaw ecosystem: rapid releases, ACP/subagents, multi-channel adapters, and community scaling

OpenClaw remains a major agent-runner ecosystem topic: big beta drops, ACP agent plumbing, multi-platform messaging/streaming, and community governance/moderation issues. Excludes Qwen model release details (feature) even when mentioned alongside OpenClaw.

OpenClaw beta v2026.3.2 adds a first-class PDF tool and expands adapters

OpenClaw 2026.3.2-beta.1 (OpenClaw): A new beta release adds a first-class pdf tool (native providers for Anthropic/Google with fallbacks), expands SecretRef coverage across credential surfaces, and broadens outbound adapters (Discord/Slack/WhatsApp/Zalo) per the Beta install command, with implementation details in the Release notes.

Doc and file pipelines: The new pdf tool formalizes “PDF as a primitive” in agent workflows (model/provider selection, page/byte limits, extraction fallback), as described in the Release notes.
Credential handling: SecretRef support is extended and unresolved refs “fail fast” on active surfaces, according to the Release notes.
Multi-channel delivery: Shared sendPayload support and chunk-aware text fallback land across messaging adapters, as listed in the Release notes.

OpenClaw documents ACP Agents for running Claude Code/Codex as external runtimes

ACP Agents (OpenClaw): OpenClaw’s Agent Client Protocol (ACP) is documented as the path to run external harnesses (Claude Code, Codex, OpenCode, Gemini CLI) via /acp session management, including thread-bound routing and persistent sessions, as laid out in the ACP Agents docs.

Enablement and rollout: Discussion around “ACP subagents” suggests the feature may require explicit enabling today per the ACP subagents question, with a maintainer noting it will be “on by default” in an upcoming release in the Default-on note.

Telegram response streaming lands, and TeleClaw shows streaming OpenClaw agents

TeleClaw on Telegram: Telegram bots now support response streaming, and a TeleClaw demo shows OpenClaw agents emitting token-by-token replies inside Telegram chats, as shown in the Streaming demo.

Streaming Telegram bot replies

Accessibility and UI semantics: There’s immediate discussion about how well streaming chat UIs work with screen readers, raised in the Accessibility question.

OpenClaw crosses 1,000 contributors on GitHub

OpenClaw community (OpenClaw): The project reached 1,000 contributors, a scale signal for ongoing maintenance capacity and ecosystem growth, as celebrated in the Contributors milestone.

OpenClaw maintainer bans a PR copier and retroactively repairs credits

OpenClaw governance (community ops): The maintainer reports banning a user for copying others’ PRs, then fixing credits and retroactively updating the changelog, per the Maintainer ban note and follow-up in the Credits repair note.

OpenClaw’s meetup calendar shows a fast-growing multi-city builder circuit

OpenClaw meetups (community): A published events calendar lists OpenClaw meetups and workshops across many global cities (including multiple sold-out/waitlist events), indicating sustained organizer and builder throughput, as compiled in the Events calendar.

OpenClaw users ask for per-channel instruction injection without spinning up new agents

Per-thread configuration (OpenClaw): A user asks for a way to inject custom instructions per Telegram topic / Discord channel at the start of a new session—without creating a separate agent with separate memory—suggesting demand for a “per-channel constitution” layer beyond global settings, per the Feature request question.

Vercel signals OpenClaw support as a deployment surface

Vercel + OpenClaw (deployment signal): Vercel publicly highlights support for OpenClaw, framing it as a first-class target for hosted agent frontends and community projects, as shown in the Support post.


Code quality pressure: reviews breaking under agent throughput + evaluation tactics

As PR volume explodes, tweets focus on keeping code mergeable: arguments that traditional human code review is dying, plus practical evaluation heuristics (deterministic graders, lean context files) to prevent agent thrash. Excludes model benchmarks (separate category).

AGENTS.md can cut worst-case agent thrash, per a 124-PR study

AGENTS.md efficiency (Empirical study): Researchers ran identical PR tasks twice across 10 repos / 124 PRs with Codex—once with an AGENTS.md context file and once without—and found median runtime dropped 28.64% and output tokens fell 16.58% while task completion stayed comparable, as summarized in AGENTS.md impact results.

A key nuance in the summary is that the gains were not uniform: the file mainly reduced cost in a small number of very expensive runs, acting as a “guardrail” against worst-case looping rather than a universal speedup.

Manual code review is buckling under agent PR volume

Code review throughput (Latent.Space): A Latent.Space guest post argues that PR volume/size is rising fast enough that “read the diff” reviews don’t scale, and proposes replacing them with a 5-step layered playbook for quality control, as introduced in Guest post announcement and detailed in the Essay.

The core claim is not that quality gates go away, but that teams shift from line-by-line reading toward layered checks (automation + rollout controls) because the review workload is now the bottleneck.

A lightweight evaluation stack for agents: deterministic first, LLM judge later

Agent evaluation (Practical heuristics): A compact checklist lays out how to evaluate agents without overbuilding infra: define success up front (outcome/process/style), start with 20–50 real failures, use deterministic graders first (tests, file existence, command success), then add LLM judges for style, and grade the produced artifacts rather than the path, as written in Evaluation tips.
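
A minimal sketch of that ordering, with hypothetical paths and commands standing in for real project checks: cheap deterministic gates run first, and the (stubbed) LLM judge only runs once they pass, grading the produced artifact rather than the agent's transcript:

```python
import subprocess
from pathlib import Path

def grade_artifact(repo: Path) -> dict:
    results = {}
    # 1. Deterministic graders: cheap and unambiguous, so they run first.
    results["file_exists"] = (repo / "migrations" / "0007_add_index.sql").exists()
    results["tests_pass"] = subprocess.run(
        ["pytest", "-q"], cwd=repo, capture_output=True
    ).returncode == 0
    results["build_ok"] = subprocess.run(
        ["python", "-m", "compileall", "-q", "."], cwd=repo, capture_output=True
    ).returncode == 0

    # 2. Only if every deterministic gate passes is it worth paying for an
    #    LLM judge on style/clarity (stubbed out here).
    if all(results.values()):
        results["style"] = llm_judge_stub(repo)
    return results

def llm_judge_stub(repo: Path) -> str:
    # Placeholder: call your preferred model with a style rubric here.
    return "not_evaluated"

if __name__ == "__main__":
    print(grade_artifact(Path(".")))
```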

Use a fail-off/pass-on feature flag to keep long agent runs honest

Long-run agent control (Red/green loop): A concrete steering technique for long-running coding agents is to require a new feature flag where the same workflow must fail when the flag is off and succeed when it’s on, forcing a tight “red/green” cycle that exposes partial fixes and regressions, as described in Feature-flag trick.

Feature-flag loop demo

This is framed as a guardrail against hours-long thrashing: you get a deterministic fail state first, then a deterministic pass state—so progress is measurable mid-run.
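
A minimal sketch of that gate, assuming a pytest-based project and a hypothetical NEW_PARSER_ENABLED environment flag; the names are illustrative, not from the original post:

```python
import os
import subprocess

def run_workflow(flag_on: bool) -> bool:
    env = dict(os.environ, NEW_PARSER_ENABLED="1" if flag_on else "0")
    proc = subprocess.run(
        ["pytest", "tests/test_new_parser.py", "-q"],
        env=env, capture_output=True,
    )
    return proc.returncode == 0

def red_green_gate() -> bool:
    # Red: with the flag off, the flag-gated workflow must fail; if it
    # passes anyway, the agent likely shipped a partial or leaky fix.
    if run_workflow(flag_on=False):
        print("RED check failed: workflow passes with flag off")
        return False
    # Green: with the flag on, the same workflow must succeed.
    if not run_workflow(flag_on=True):
        print("GREEN check failed: workflow still failing with flag on")
        return False
    print("red/green gate passed: progress is measurable mid-run")
    return True

if __name__ == "__main__":
    raise SystemExit(0 if red_green_gate() else 1)
```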

AGENTS.md guidance: encode repeatable corrections, avoid dumping the wiki

AGENTS.md authoring (Rule hygiene): A suggested structure for AGENTS.md emphasizes writing down repeatable corrections, unique preferences, and intentional constraints—while explicitly avoiding massive info dumps, outdated workflows, and generic mega-prompts, as summarized in AGENTS.md do and don't.

This is basically treating AGENTS.md as a “worst-case thrash preventer” rather than a documentation mirror.

Open-source maintainers are flagging AI-written issue overhead

Maintainer workload (Issue quality): An OSS maintainer reports seeing issues that look AI-generated—verbose, heavily formatted, and longer than the underlying question—arguing that this increases triage/review burden even when the core problem is legitimate, as stated in AI issue critique.

This is a “code quality pressure” cousin: it’s not PR review, but it’s still human attention becoming the limiting factor under higher throughput.

Banning stock phrases is becoming a quality control tool for AI writing

AI writing review hygiene: One practitioner shared a “banned phrases” list for their writing agent (e.g., avoiding filler like “hard truth” and “here’s the thing”) to reduce generic copy and make edits/review more about substance, as shown in Banned phrases list.

This pattern shows up as a cheap way to tighten output distribution when the failure mode is “sounds plausible but says nothing.”


🔌 MCP + agent integrations: skills vs tools, marketplace procurement, and batch learning

Interoperability and integration surfaces: MCP vs Skills framing, agent-driven procurement/install flows, and tools to learn/attach multiple skills in parallel. Excludes OpenClaw core releases (separate category).

Vercel’s CLI becomes an agent procurement path for Marketplace integrations

Vercel CLI for Marketplace (Vercel): Vercel says agents can now autonomously discover and install Marketplace integrations from the CLI—positioning “procurement” (choosing and wiring third-party services) as a first-class agent action surface, per the agent procurement demo and the associated changelog post.

Agent installs integration via CLI

How it’s invoked: the flow combines installing a Vercel CLI skill (npx skills add vercel/vercel --skill vercel-cli) with an agent executing vercel integration add <provider>, as shown in the agent procurement demo and reiterated in the CLI command example.
Why it’s notable: the framing from Vercel leadership is that agents can stop re-building common infra and instead attach existing vendors by command, as argued in the procurement framing.

Weaviate ships Agent Skills repo and a crisp MCP vs Skills mental model

Weaviate Agent Skills (Weaviate): Weaviate published a concrete “MCP vs Agent Skills” distinction—MCP as deterministic JSON-RPC tool interfaces to external systems vs Skills as local, markdown-based behavioral playbooks—and paired it with a new Skills repo you can install via npx skills add weaviate/agent-skills plus a /weaviate:quickstart entrypoint, as laid out in the MCP vs skills explainer.

What changes for agent builders: the post’s framing makes it explicit when to reach for Skills (repeatable reasoning/workflow guidance) vs MCP (API-grade operations and live data), including the “Skill guides the reasoning, MCP performs the action” pattern in the MCP vs skills explainer (sketched in code after this list).
Packaging signal: the install surface (npx skills add …) matches the broader “skills as portable add-ons” workflow that Vercel is also leaning on for agent-driven integrations, as shown in the Marketplace install flow.
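
A sketch of the “Skill guides the reasoning, MCP performs the action” split using the official mcp Python SDK; the server command, skill path, and hybrid_search tool name are hypothetical stand-ins, not Weaviate’s actual interface:

```python
import asyncio
from pathlib import Path

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Skill: a local markdown playbook the agent reads; it shapes the
    # reasoning via the prompt rather than exposing an API.
    playbook = Path(".skills/weaviate/quickstart.md").read_text()
    system_prompt = f"Follow this playbook when querying the store:\n{playbook}"
    print(system_prompt[:120], "...")

    # MCP: a deterministic JSON-RPC tool interface to the external system,
    # which performs the live operation the skill only describes.
    params = StdioServerParameters(command="weaviate-mcp-server", args=[])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "hybrid_search",  # hypothetical tool name
                {"collection": "Docs", "query": "pricing tiers", "limit": 3},
            )
            print(result.content)

asyncio.run(main())
```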

Hyperbrowser shows “/learn batch” to teach agents multiple skills in parallel

Hyperbrowser skill learning (Hyperbrowser): Hyperbrowser is demoing /learn batch, a single command that learns multiple skills concurrently—aimed at reducing the setup/iteration cost of making a coding agent “multi-tool capable,” as described in the learn batch announcement and shown in the HyperPlex demo.

Parallel batch learning demo
Video loads on view

Where it sits: the command is presented as part of HyperPlex, an open-source research agent that “spawns agents, browses, reads, extracts, and returns a cited answer,” according to the HyperPlex demo.
Workflow implication: it treats skills as an installable, learnable layer you can refresh in parallel, rather than a one-at-a-time interactive setup loop, as shown in the learn batch announcement.


📊 Benchmarks & evals: BS detection, repo-level tests, ARC scores, and search arena rankings

Evaluation news clusters around BullshitBench v2, repo-level benchmarks (Repo Bench), ARC-AGI semi-private international model scores, and tool+model leaderboard movement. Excludes release announcements (separate category).

BullshitBench v2 expands to 100 questions and 70+ model variants

BullshitBench v2 (petergostev): The benchmark refresh adds 100 new “nonsense detection” questions split across coding/medical/legal/finance/physics and reports results for 70+ model variants (model + reasoning levels), with replication scripts and judgments published alongside the dataset, per the Release summary and the linked GitHub repo.

Bench results walkthrough

The headline result is that Anthropic models separate sharply from the pack, while Qwen is the other consistent strong performer; the thread also argues that increasing “reasoning” effort often reduces nonsense pushback rates, as summarized in the Release summary.

A compact view of the per-model detection breakdown is shown in the Detection rate chart.

The full interactive breakdown is available in the Data explorer, but the tweets don’t include a single canonical “one-number” score definition beyond the green/amber/red rates.

BullshitBench v2 plots suggest “think harder” can backfire on nonsense prompts

BullshitBench v2 (petergostev): A follow-on analysis plots average reasoning tokens vs. “green rate” and finds a visible negative slope—models that spend more tokens “thinking” tend to accept more nonsense, as visualized in the Reasoning tokens scatter.

Reasoning-mode failure hypothesis: The post notes a plausible mechanism that reasoning models may “try to get to an answer no matter what,” which could reduce refusal/challenge behavior on ill-posed prompts, as discussed in the Reasoning tokens scatter.
Size vs. sparsity: Another set of plots suggests green-rate tracks total parameters more than active parameters (MoE sparsity), based on the Parameter scaling plots.
Over-time comparison: A time-series view shared in the Detection over time chart claims Anthropic’s line rises across releases while Google/OpenAI are flatter on this benchmark.

Treat these correlations as benchmark-specific; the tweets don’t establish whether the “reasoning tokens” metric is normalized across providers or decoding settings.

ARC-AGI-2 Semi-Private posts low scores for non-frontier providers (with costs)

ARC-AGI-2 Semi-Private (ARC Prize Foundation): ARC Prize posted “international model” results with both score and approximate cost per evaluation—Kimi K2.5: 12% ($0.28), MiniMax M2.5: 5% ($0.17), GLM-5: 5% ($0.27), DeepSeek V3.2: 4% ($0.12)—as listed in the Scores and costs post.

The foundation also states Semi-Private testing is limited to providers with trusted data retention agreements, and says Qwen 3 Max Thinking is excluded on that basis, per the Testing policy note.

GLM-5 posts a 76% points rate on Repo Bench (shared run)

Repo Bench (RepoPrompt ecosystem): A shared run shows Z.ai GLM-5 hitting 130.5/170 points (76%) with a 70% pass rate, according to the Repo Bench result screenshot.

The post frames this as “GLM-5 is now #3 on Repo Bench” and even claims it edges Opus 4.6, per the Repo Bench result screenshot; the artifact shown is a single run UI snapshot, not a full reproducible report dump.

Claude Opus 4.6 web search ranks #1 on Arena Search; API docs highlight code-filtering

Claude web search (Anthropic): A post claims Claude Opus 4.6 + the web search tool is now #1 on Arena’s Search Arena, and notes the API tool can use code execution to filter intermediate search results (token/quality trade), per the Search arena ranking note.

Implementation details for web_search_20260209, including “dynamic filtering” requirements and model compatibility, are described in the linked API docs.
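
A hedged sketch of attaching the server-side tool via the Anthropic Python SDK: the tool-type string is the one cited in the post, while the model id and the max_uses field follow the documented web-search tool pattern and should be treated as assumptions:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",            # assumed model id for Opus 4.6
    max_tokens=1024,
    tools=[{
        "type": "web_search_20260209",  # version string cited in the post
        "name": "web_search",
        "max_uses": 5,                  # caps server-side searches per request
    }],
    messages=[{
        "role": "user",
        "content": "What changed in the latest Qwen small-model release?",
    }],
)
print(response.content)
```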

WeirdML: GPT‑5.3 Codex (xhigh) reportedly takes the lead over Opus 4.6

WeirdML (community benchmark): A circulating leaderboard claim says GPT‑5.3 Codex (xhigh) scores 79.3%, ahead of Opus 4.6 (77.9%), and mentions “less than half the price,” per the WeirdML score claim.

No primary benchmark artifact (run logs, dataset spec, grading script) is included in the provided tweets, so this should be read as a headline comparison rather than a fully auditable eval.


🗺️ Practical agentic engineering patterns: exploration, specs/plans, context resets, and prompt variables

Hands-on workflow discussions about getting better outputs from coding agents: forcing global exploration vs “local view,” using SPEC.md/PLAN.md conventions, iterative Figma→code patterns, and variable-driven prompt templates. Excludes CI/review mechanics (separate category).

Being too “local” with coding agents leads to inconsistent changes

Agent scope (coding workflow): Over-specifying changes by only feeding the agent the exact files you want edited can suppress curiosity—no repo search, no prior art—so the agent makes the “easiest possible change” that doesn’t match existing patterns, as described in the Local vs global scope thread.

A practical reframing from the same post is to treat “Explore” as mandatory context gathering (global view) before requesting edits, so the model can align with existing architecture and conventions instead of patching the nearest file.

A concrete Figma-to-code stack: Gemini 3.1 Pro + MCP + browser + rules + resets

Figma-to-code workflow (Gemini): A field-tested stack pairs Gemini 3.1 Pro, the Figma MCP server, and a browser tool, then adds project-specific “Figma Rules,” with an emphasis on frequent context resets and small commits, per Figma MCP to code steps.

The thread links directly to rule authoring guidance and MCP setup docs—see the Custom rules docs and MCP server docs—and also calls out a preferred browser agent implementation in the Browser agent repo.

SPEC.md as destination and PLAN.md as journey for multi-agent work

SPEC/PLAN docs (session hygiene): A lightweight convention is to separate the immutable target from the mutable execution path—“SPEC.md is the destination, PLAN.md is the journey,” as captured in Spec vs plan metaphor.

This tends to reduce rework in longer agent runs because the “what” stays stable while the “how” can be revised without losing alignment.

Use prompt variables to keep outputs consistent while iterating

Prompt variables (prompt discipline): A repeatable technique is to lock the recurring part of a prompt and expose only a small structured “variable block” (e.g., subject/location/style fields) so iteration doesn’t drift, as shown in the Variables prompt tip tutorial.

A more formal example template (with fields like STRUCTURE/COLORS/MATERIALS/LIGHTING/ANGLE) is shared in 3D asset prompt template, illustrating how this approach can keep style stable while swapping specifics.
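
A minimal sketch of the pattern in plain Python, with illustrative field names: the directive is frozen and only the small variable block changes between iterations:

```python
# The fixed directive never changes between runs; only VARIABLES does.
FIXED_DIRECTIVE = (
    "Render a single product photo. Keep composition, lens, and grading "
    "identical across runs; only the VARIABLES block may change."
)

def build_prompt(subject: str, location: str, style: str) -> str:
    variables = {"SUBJECT": subject, "LOCATION": location, "STYLE": style}
    block = "\n".join(f"{k}: {v}" for k, v in variables.items())
    return f"{FIXED_DIRECTIVE}\n\nVARIABLES:\n{block}"

# Iterating swaps only the variable block, so outputs stay consistent.
print(build_prompt("ceramic mug", "oak table by a window", "soft morning light"))
print(build_prompt("ceramic mug", "concrete countertop", "hard studio flash"))
```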

When agents loop on ideas, try the opposite approach

Brainstorming with agents (creative throughput): When a group of humans plus agents keeps revisiting the same options, one proposed intervention is to explicitly notice the loop and “break the frame” by attempting the opposite of the current approach, as summarized in Break the frame heuristic.

The longer writeup adds concrete “frame-breaking” moves (e.g., dropping scaffolding and shifting to an entirely different angle), detailed in the Break the frame playbook.


📑 Doc parsing & retrieval plumbing: PDF reality, layout data, and parsing toolchains

Document AI shows up as a practical pain point: why PDFs are hard to parse, how teams approach layout/figures, and tooling that turns parsing into a first-class primitive. Excludes model benchmarks (separate category).

Why PDF parsing breaks: display coordinates aren’t document structure

PDF parsing (LlamaIndex): PDFs are a rendering format first—internals often look like “draw this glyph at (x,y) with font F,” not “this is a paragraph/table,” which forces parsers to reconstruct semantics from scattered coordinates and sometimes missing Unicode/font mappings, as explained in the PDF parsing breakdown.

VLM-based approaches help by screenshotting pages and reading pixels, but they also discard useful metadata and still struggle on accuracy/cost tradeoffs, per the same PDF parsing breakdown. The thread also notes Word/PPTX are usually easier because the source formats are closer to structured text, not pure display instructions, as described in the PDF parsing breakdown.
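
A toy illustration of why that reconstruction is hard: given glyphs as (x, y, char) triples, a parser has to re-infer lines from baselines, reading order from x positions, and even word boundaries from horizontal gaps; everything here is simplified (no fonts, rotation, or tables):

```python
from itertools import groupby

glyphs = [  # (x, y, char) in page coordinates, arbitrary order
    (72, 700, "H"), (80, 700, "i"), (95, 700, "t"), (101, 700, "h"),
    (107, 700, "e"), (72, 684, "r"), (78, 684, "e"), (120, 700, "!"),
]

def reconstruct(glyphs, line_tol=3, space_gap=10):
    # Group glyphs into lines by quantized baseline y, top of page first,
    # then left to right within each line.
    keyed = sorted(glyphs, key=lambda g: (-round(g[1] / line_tol), g[0]))
    lines = []
    for _, line in groupby(keyed, key=lambda g: round(g[1] / line_tol)):
        text, prev_x = "", None
        for x, _, ch in line:
            # A large horizontal gap is the only hint of a word boundary.
            if prev_x is not None and x - prev_x > space_gap:
                text += " "
            text += ch
            prev_x = x
        lines.append(text)
    return "\n".join(lines)

print(reconstruct(glyphs))  # "Hi the !" then "re": layout back, semantics gone
```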

OpenClaw beta adds a first-class pdf tool with provider backends

OpenClaw v2026.3.2-beta.1 (OpenClaw): The beta introduces a first-class pdf tool with native support for Anthropic and Google PDF providers plus extraction fallback for non-native models, as detailed in the beta release post and the linked release notes.

It also ships knobs for defaults (model, max bytes, pages) alongside docs/tests, according to the beta release post. This frames PDF handling as a tool-level primitive inside the agent runtime, rather than something every project re-implements, as described in the release notes.

LlamaParse exposes layout data for figures and charts

LlamaParse (LlamaIndex): When parsing documents, LlamaParse can return layout data for non-text elements (figures/charts), not just extracted text, as noted in the layout data mention.

That’s a practical enabler for retrieval and downstream UX: you can link citations back to page regions, drive figure-aware chunking, or render “where this came from” overlays instead of treating the PDF as a flat text blob, as implied by the layout data mention.


🛠️ Reliability & infrastructure shocks: outages, failover, and conflict-linked cloud incidents

Today’s infra signal is reliability under stress: Claude outages and recovery notes, plus continued reporting of cloud-region incidents tied to broader conflict, with practitioners emphasizing multi-region failover habits. Excludes model release distribution (feature).

Claude hits elevated errors; API mostly OK, web login/logout impacted

Claude Status (Anthropic): Claude saw “elevated errors” across claude.ai, console, and Claude Code, with Anthropic indicating the API was working as intended while the claude.ai login/logout paths were a major driver of user-facing failures, per the incident notes shown in Status incident timeline and Status banner screenshot.

Timeline detail: the status thread shows an investigate→identify→fix sequence over ~2 hours, including a note that “some API methods are not working” later in the incident, as captured in Status incident timeline.
Recent uptime context: the 90-day view flags a “partial outage” lasting 2 hrs 45 mins, per the tooltip in Uptime tooltip screenshot.

The tweets don’t specify root cause; the only concrete scope called out is auth path instability on the web app.

AWS confirms objects struck UAE data center, triggering fire and AZ disruption

AWS ME-CENTRAL-1 (Amazon Web Services): AWS said “objects” struck its UAE region and caused sparks/fire, impacting one Availability Zone and impairing some services; AWS also emphasized redundancy for customers operating across multiple zones, as reported in Reuters screenshot and echoed in Incident recap.

The tweets don’t provide an official attribution for the “objects,” but they do establish an AWS-confirmed physical incident and a concrete operational impact at the AZ level.

Vercel dxb1 outage becomes multi-AZ; platform excludes Dubai from new deployments

Vercel (dxb1 region): Vercel reported the dxb1 incident worsened into a multi-AZ outage, and they began excluding Dubai from new Function and Routing Middleware creations (Node.js middleware first, then edge runtimes), as described in Multi-AZ outage update.

The update also calls out a temporary compatibility workaround—switching deployments to the recommended Node.js runtime if teams are seeing deploy issues—per Multi-AZ outage update.

Codex falsely blocked requests as high cyber risk; fixed in ~8 minutes

Codex (OpenAI): Codex requests were blocked/rejected after a change flagged traffic as “high cyber risk”; the issue was fixed in ~8 minutes, and the team said they would reset rate limits afterward, per Incident note.

This reads like a classification/abuse-protection regression rather than capacity exhaustion, but the tweet doesn’t include the specific control or threshold that triggered it.


📄 Research papers to track: agent GUI memory, multi-agent topology, diffusion LMs, and transformer theory

A dense research-paper day: agent GUI execution paradigms, multi-agent communication/topology, diffusion language modeling, and scaling theory for Transformers—mostly shared as arXiv links and paper screenshots. Excludes productized tools and benchmarks.

ActionEngine proposes state-machine memory for one-shot GUI automation

ActionEngine (Microsoft + Georgia Tech): A new paper frames a shift from reactive “VLM-per-click” web agents to a programmatic approach: crawl/explore a site to build a persistent state machine (memory graph), then generate an executable script in one shot; the thread claims ~95% success on complex tasks while cutting cost ~11.8× and reducing latency by ~50% compared to iterative screenshot+LLM loops, as described in the Paper screenshot.

Why it’s different: It explicitly separates exploration from execution (planner once, then deterministic code), which aims to reduce failure compounding across 50-step tasks, per the Paper screenshot.

The post doesn’t include the full experimental setup or datasets in-tweet, so treat the metrics as paper-claims until you’ve read the full methods section.
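
A heavily simplified sketch of the paradigm as the thread describes it (not the paper’s code): exploration produces a persistent state graph once, and execution becomes one-shot deterministic planning over that graph, with no per-click model calls:

```python
SITE_GRAPH = {
    # state -> {action: next_state}, learned during a one-time crawl
    "home":    {"click_login": "login", "click_search": "search"},
    "login":   {"submit_credentials": "dashboard"},
    "search":  {"type_query": "results"},
    "results": {"click_first_result": "detail"},
}

def plan(start: str, goal: str) -> list[str]:
    """One-shot BFS over the cached graph; no model calls at execution time."""
    frontier, seen = [(start, [])], {start}
    while frontier:
        state, path = frontier.pop(0)
        if state == goal:
            return path
        for action, nxt in SITE_GRAPH.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    raise ValueError("goal unreachable in cached graph")

# Each action maps to a deterministic selector/step, so a long task replays
# without per-click VLM calls; the model is only needed to rebuild the
# graph when the site changes.
print(plan("home", "detail"))  # ['click_search', 'type_query', 'click_first_result']
```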

AgentConductor adapts multi-agent communication graphs per task and failure

AgentConductor (paper): A new multi-agent coding framework proposes generating a task-specific communication graph (instead of fixed “5 agents always talk the same way”), then evolving that topology across turns when execution fails; the shared summary claims accuracy gains on competition-style programming while cutting token cost by ~68%, as outlined in the Paper screenshot.

Core mechanism: A “manager” agent picks an initial topology based on perceived difficulty, then rewrites the graph using runtime error feedback, according to the Paper screenshot.
Practical implication: The paper positions topology as a first-class tuning knob (who talks to whom, when), rather than only role assignment, per the Paper screenshot.

The tweet summary doesn’t specify the baseline topologies or the evaluation harness details, so the magnitude of the 68% claim depends heavily on experimental controls.
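
A hedged sketch of what “topology as a tuning knob” could look like (names invented, not the paper’s implementation): a manager picks an initial communication graph from perceived difficulty, then rewires it using runtime error feedback:

```python
def initial_topology(difficulty: str) -> dict[str, list[str]]:
    if difficulty == "easy":
        return {"coder": []}          # solo agent, no chatter
    return {
        "planner": ["coder"],
        "coder":   ["tester"],
        "tester":  [],                # linear pipeline for harder tasks
    }

def rewire_on_failure(topology: dict[str, list[str]], error: str) -> dict[str, list[str]]:
    # Example adaptation: runtime errors pull in a debugger agent that
    # feeds back into the coder, turning the pipeline into a loop.
    if "RuntimeError" in error:
        topology["tester"] = ["debugger"]
        topology["debugger"] = ["coder"]
    return topology

topo = initial_topology("hard")
topo = rewire_on_failure(topo, "RuntimeError: test_parse failed")
print(topo)  # who talks to whom is now task- and failure-specific
```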

dLLM revisits diffusion as a “simple” language modeling recipe

dLLM (diffusion language modeling): A new paper pitch argues for diffusion-style generation as an alternative to autoregressive next-token decoding, positioning it as a simplified diffusion LM recipe rather than a heavily engineered hybrid system, as linked in the Paper page.

The share itself is light on details (no results tables or ablations are shown in-tweet), but it’s a clean pointer to track if you’re watching for non-transformer or non-AR inference paths—especially where parallel denoising steps might trade off latency, controllability, or decoding stability differently than token-by-token generation.

Memory Caching: recurrent models that grow memory with cached checkpoints

Memory Caching (RNNs with growing memory): A new arXiv paper proposes “Memory Caching” to give recurrent models an effectively growing memory by checkpointing and reusing past hidden states, aiming to bridge the gap between fixed-state RNN memory and attention’s expanding context; the share points to the full paper PDF in the ArXiv PDF.

The claim is directionally relevant to long-context efficiency debates: it’s an attempt to get scaling memory capacity without paying full attention’s quadratic cost. The tweet doesn’t include benchmark tables, so the real question is where it lands on recall-heavy tasks versus modern attention variants.
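
A toy numpy illustration of the stated idea (not the paper’s method): the recurrent state stays fixed-size, while a cache of checkpointed states grows with sequence length and is consulted by similarity at read time:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 8                      # state width, checkpoint interval
W = rng.normal(size=(d, d)) / np.sqrt(d)

h = np.zeros(d)
cache = []                        # grows ~ (sequence length / k)
for t, x in enumerate(rng.normal(size=(64, d))):
    h = np.tanh(W @ h + x)        # ordinary fixed-state recurrence
    if t % k == 0:
        cache.append(h.copy())    # checkpoint: memory now grows with t

def read_memory(query: np.ndarray) -> np.ndarray:
    # Soft lookup over checkpoints: cheaper than full attention over all
    # past tokens, but recall is limited to what the checkpoints kept.
    C = np.stack(cache)
    weights = np.exp(C @ query)
    return (weights / weights.sum()) @ C

print(read_memory(h).shape)       # (16,): a retrieved summary state
```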

Meta’s transformer “effective theory” paper gets recirculated for scaling guidance

Effective theory of wide and deep Transformers (Meta): A longform theory paper is resurfacing in feeds as a practical reference for how forward/backward signal propagation behaves in residual attention networks, and what that implies for width scaling, initialization, and training hyperparameter scaling; the recap cites NTK analysis plus optimizer behavior (SGD vs AdamW) and empirical validation on vision/language Transformers in the Thread summary, with the canonical preprint linked in the ArXiv abstract.

This is less about new SOTA results and more about having a principled checklist for “why did my deep Transformer blow up or go flat?” across depth/width regimes.


💼 Business & enterprise signals: model commoditization, private-market pricing, and platform growth

Business/market tweets cluster on revenue/ARR milestones, private-market pricing proxies, and strategic moats (data, feedback loops, distribution) as agents spread beyond engineering into org-wide workflows. Excludes tool release notes and benchmarks.

AI coding market sizing thread pegs the category at $7.5–$10B

AI coding market (sizing): One thread estimates the overall AI coding market at $7.5–$10B, citing rumors of Codex at ~$1B and a second tier (Cognition/Copilot/Lovable/Replit) at $300–$600M each, per the Market sizing post.

The same narrative anchors the top end with a Bloomberg-reported Cursor $2B ARR figure and ~60% enterprise mix, as shown in the Bloomberg screenshot share. These are directional numbers with no primary financial filings in the tweets.

Claude download spike coincides with reported 295% ChatGPT uninstall surge

Consumer distribution (Claude vs ChatGPT): A chart attributed to Appfigures shows late-February first-time downloads rising sharply for Claude while ChatGPT declines, with a claim that “ChatGPT uninstalls surged by 295%” after the DoW deal, following up on Cancel wave (early churn/cancellation posts) as echoed in the Download and uninstall chart.

A separate post also claims Claude hit #1 on the U.S. App Store during the same period, as stated in the App Store rank claim.

The AI moat debate shifts from data to feedback and distribution loops

AI moats (Ellison debate): A resurfacing claim attributes model commoditization to shared public internet data, pushing “proprietary datasets” as the moat, as summarized in the Ellison moat quote.

A counter-argument in the Moat counterpoint shifts the defensibility away from raw data toward proprietary feedback, distribution, and proprietary environments that generate reinforcement signals and keep improving systems. It’s a clean articulation of “model weights are less defensible than the loops around them.”

Ventuals perps imply $1.02T OpenAI and $530B Anthropic valuations

Ventuals perps (Ventuals): A synthetic perps market is quoted as implying OpenAI at $1.02T and Anthropic at $530B, framed as ~40% above last-round pricing; the key caveat is that it’s “exposure to private assets” without owning the underlying equity, as described in the Perps valuation post.

Mechanics and limitations: The follow-up thread points to how these contracts work (cash-settled perps on Hyperliquid infra), along with “many issues,” via the Perps docs referenced in the How perps work thread.

Liquidity and mark-setting still look hard to assess from the tweets alone.

McKinsey projects $3T–$5T commerce mediated by shopping agents by 2030

AI commerce agents (McKinsey): A McKinsey projection is summarized as AI agents mediating $3T–$5T of global consumer commerce by 2030, moving from “compare products” to “assemble and execute carts,” as described in the McKinsey summary thread.

The same post describes a six-stage ladder up to agent-to-agent negotiation, and argues that machine-readable catalogs/returns via APIs become a competitive requirement in that framing from the McKinsey summary thread.

Meta reportedly tests shopping inside Meta AI, with product cards and flows

Meta AI (shopping): Meta is described as testing shopping features inside Meta AI in the U.S., positioning it against other AI-commerce efforts, according to the Feature test claim.

Meta AI shopping UI demo

A related write-up claims some requests are routed to Gemini “under the hood,” but that detail is only asserted in the Routing claim alongside the Feature write-up.

MiniMax reports $79M 2025 revenue and a shift from models to platform

MiniMax (HKEX): MiniMax posted its first earnings as a public company, reporting $79M revenue (+159% YoY), gross margin improvement from 12.2% to 25.4%, and 236M+ users plus 214K enterprise clients & developers, as stated in the Earnings snapshot.

It also outlined a 2026 positioning change from “model company” to an “AI platform,” with emphasis on coding, workplace productivity, and multimodal creation in the same Earnings snapshot.

xAI reportedly repays $3B of debt early ahead of a market debut

xAI (finance): Reuters is cited as reporting xAI will repay $3B of debt early, framed as balance-sheet cleanup ahead of a potential public-market debut, in the Reuters screenshot thread.

The post ties the move to data-center buildout financing (prior $5B debt package mentioned) and interest-cost savings, but those details are only asserted in the same Reuters screenshot thread.

Z.ai opens a startups program offering credits and early API access

Z.ai (startup program): Z.ai announced a startups program offering free API credits, “priority rate limits,” and “early API access,” with an application link in the Program announcement and details on the Program page.

A follow-up says there are “issues with the Startup account” and directs applicants to email support, per the Signup issue note.


🎥 Gen media & vision tools: Nano Banana 2 workflows, video leaderboards, and creator pipelines

Creative/vision chatter stays high: DeepMind’s Nano Banana 2 positioning around fast high-quality image creation and editable outputs, plus video-model leaderboard updates and practical prompting workflows. Excludes robotics demos.

Nano Banana 2 pushes aspect ratios, text-in-image, and 2K/4K upscales

Nano Banana 2 (Google DeepMind): DeepMind is positioning Nano Banana 2 around controllable output specs—multiple aspect ratios and upscaling from 512px to 2K/4K—as shown in the Aspect ratio post. It’s also being marketed as faster/cheaper visual creation with “accurate writing directly into images” and on-the-fly localization, per the Product positioning.

Aspect ratio and upscale demo

The pitch here is less “new style” and more “treat images like assets you can ship” (iterate sizes, keep text legible, reuse across channels), as described in the Aspect ratio post.

Kling 3.0 leads Artificial Analysis text-to-video leaderboard (with audio)

Kling 3.0 (KlingAI): A shared Artificial Analysis leaderboard screenshot shows Kling 3.0 1080p (Pro) ranked #1 for text-to-video “with audio” at ELO 1,094, with multiple Kling variants clustered in the top spots, per the Leaderboard screenshot.

The same table lists per-minute API pricing (e.g., $20.16/min for Kling 3.0 1080p Pro and $13.44/min for an Omni 720p Standard variant), as shown in the Leaderboard screenshot.

Nano Banana 2 edit prompting: background swaps and attribute transfer from refs

Nano Banana 2 editing (ProperPrompter): A reference-driven edit workflow is circulating for precise image modifications (swap background to a reference photo; transfer hairstyle/outfit onto a subject), demonstrated in the Editing workflow demo.

Reference-driven edit demo

Prompt minimalism: The shared prompts are intentionally short (e.g., “swap out the background to the reference photo”), as shown in the Prompt examples.
Reference set quality: The thread includes side-by-side reference photos and resulting edits, visible in the Prompt examples.

From Nano Banana 2 image to animated 3D model, then into a game prototype

Creator pipeline (techhalla): A workflow demo shows taking a 2D Nano Banana 2 output and turning it into an animated 3D model, then dropping it into a simple playable game scene, per the 2D to 3D demo.

2D image to 3D animation

Nano Banana 2 pixel-art recipe: reference the style, then isolate the card art

Nano Banana 2 workflow (ProperPrompter): A repeatable prompt pattern is being shared for turning Pokémon card art into dithered pixel art—attach reference images for both the target style and the card, then instruct the model to “isolate and convert” the character while keeping the background, as shown in the Pixel art example and clarified in the Prompt and references.

The thread notes the model may blend references; adding negative constraints like “no charizard” is suggested to reduce that failure mode, per the Prompt and references.
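
As a rough template of the recipe described (the wording here is illustrative, not the exact circulating prompt):

```text
[attach: image A = target dithered pixel-art style reference]
[attach: image B = the Pokémon card]

Isolate the character from the card in image B and convert it to the
dithered pixel-art style of image A, keeping the card's background.
Do not blend in characters from the style reference (e.g., "no charizard").
```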

Runway Gen-4.5 hits #15 in Arena Video with ELO 1218

Runway Gen-4.5 (Arena Video): Arena reports Runway Gen-4.5 tied for #15 in its text-to-video rankings at ELO 1218, stated in the Arena ranking, with the note that the board is driven by side-by-side blind votes rather than a fixed benchmark suite.

Arena also points to an explanation of why this methodology differs from static benchmarks in the Methodology explainer, and it links directly to the public board at the Leaderboard page.

A reusable variables template for image generation prompts (low‑poly asset spec)

Prompt templating (techhalla): A structured “variables + fixed directive” pattern is being shared for image generation—define a small JSON-like block for STRUCTURE/COLORS/MATERIALS/LIGHTING/ANGLE, then keep a long invariant spec for consistency, per the Prompt template.

The examples are tuned for low‑poly 3D game assets (white background, sharp silhouettes, AO/flat shading), with the variable block doing most of the customization, as shown in the Prompt template.
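
A minimal sketch of that pattern, with illustrative field values (only the STRUCTURE/COLORS/MATERIALS/LIGHTING/ANGLE keys come from the shared template):

```python
# "Variables + fixed directive" prompt pattern: a small variable block
# customizes each asset, while a long invariant spec keeps style consistent.
variables = {
    "STRUCTURE": "small wooden watchtower, three legs, side ladder",  # illustrative
    "COLORS": "warm browns, muted green roof",
    "MATERIALS": "rough planks, rope bindings",
    "LIGHTING": "soft top-down, baked ambient occlusion",
    "ANGLE": "3/4 isometric",
}

FIXED_DIRECTIVE = (
    "Low-poly 3D game asset on a plain white background. "
    "Sharp silhouette, flat shading with ambient occlusion, "
    "single centered object, no scene clutter, no text."
)

prompt = "\n".join(f"{k}: {v}" for k, v in variables.items()) + "\n\n" + FIXED_DIRECTIVE
print(prompt)
```

Iterating then means editing only the variables block while the directive stays frozen, which is what keeps outputs consistent across a batch of assets.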

Pollo AI adds Nano Banana 2 as an image model option

Nano Banana 2 distribution (Pollo AI): TestingCatalog says Nano Banana 2 is now usable inside Pollo AI, framed as a fast image model with subject consistency, multilingual text rendering, and wide-format support in the Integration note.

The same thread also shares example prompts (e.g., “Space Banana”) in the Prompt example and a long-form infographic prompt in the Infographic prompt, while separately mentioning a “50% off” promotion in the Discount post.


🏛️ Safety & policy turbulence: DoW contract language, surveillance limits, and government offboarding

The OpenAI/Anthropic–DoW saga continues with new contract-language claims, criticism of loopholes, and government-wide offboarding/retaliation narratives—plus app-store churn and “cancel” sentiment as measurable demand signals. Excludes pure product updates.

OpenAI says it will amend DoW deal to bar domestic surveillance and NSA use

DoW agreement amendment (OpenAI): OpenAI CEO Sam Altman says OpenAI and the Department of War are amending their agreement to add explicit language that the AI system “shall not be intentionally used for domestic surveillance of U.S. persons,” including via “commercially acquired” personal/identifiable information, as laid out in the repost of his internal post in Altman internal update.

Surveillance limit, spelled out: The added clause cites the Fourth Amendment plus the National Security Act of 1947 and FISA (1978), then adds a “for the avoidance of doubt” sentence about banning deliberate tracking/monitoring including via purchased personal data, as shown in the excerpt image shared in Contract excerpt screenshot and echoed in a summary thread in Summary of key clauses.
Intelligence agencies carve-out: Altman says the DoW affirmed OpenAI services will not be used by DoW intelligence agencies (example: NSA) unless a follow-on contract modification occurs, per Altman internal update.
Process correction: Altman explicitly says the Friday rollout was rushed and “looked opportunistic and sloppy,” and he frames the update as iterative deployment with future refinement, as written in Altman internal update.

U.S. Treasury says it is terminating Anthropic/Claude use under presidential direction

Treasury offboarding (U.S. government): Treasury Secretary Scott Bessent says Treasury is “terminating all use of Anthropic products, including…Claude,” explicitly attributing it to presidential direction, as captured in the screenshot thread in Bessent termination post.

The same screenshot bundle embeds Trump’s statement ordering every federal agency to cease Anthropic use and calling out a “six month phase out period” for agencies like the DoW, which makes this an operational offboarding timeline rather than an overnight cutover, as shown in Bessent termination post.

Backlash signals: posts claim ChatGPT uninstalls +295% and Claude hits #1 in US downloads

Backlash metrics (ChatGPT vs Claude): Multiple posts claim a demand shock after the DoW deal—one says “ChatGPT uninstalls surged by 295%,” while another points to Claude overtaking ChatGPT as the #1 U.S. app, as framed in Uninstall surge claim and App Store headline.

The chart shown in Uninstall surge claim attributes its data to Appfigures and notes it tracks first-time downloads (not reinstalls); it shows Claude’s line rising sharply in late February as ChatGPT dips. Separately, posts describe “Cancel ChatGPT” trending and “thousands unsubscribing,” but those claims are presented without a shared primary dataset in Backlash narrative.

DoW contract language debate centers on “intentionally used” and what loopholes remain

Contract wording debate: The amended line “shall not be intentionally used for domestic surveillance” is drawing both loophole-hunting and lawyerly pushback—critics focus on edge cases like “unintentional” surveillance, non-identifiable data, and non-commercially acquired data in Wording loopholes critique, while another take argues “intentional use for domestic surveillance” is already broad but still depends on unreleased defined terms in Defined-term concern.

Both sides converge on the same missing artifact: the full contract text (including any definitions of “domestic surveillance”), which is explicitly called out as unknown in Defined-term concern.

Claim: Congress moving to block federal retaliation against AI companies and address surveillance/weapons

Congress response (U.S.): One update claims Congressman Sam Liccardo plans an amendment to the Defense Production Act to prohibit federal agencies from retaliating against AI companies, and separately claims Senate Democrats are preparing legislation targeting AI-powered mass surveillance and autonomous weapons, as reported in Legislation claim.

No bill text, docket, or statutory language is provided in the tweet, so the concrete scope and enforcement mechanism remain unspecified based on the material in Legislation claim.

OpenAI researcher Aidan McLaughlin posts: “I don’t think this deal was worth it”

OpenAI internal dissent signal: Aidan McLaughlin posts “i personally don’t think this deal was worth it,” a critique of the DoW agreement that’s getting significant attention (one repost screenshot shows 335K+ views), as shown in Deal criticism and amplified via a screenshot in Screenshot repost.

The post is notable mainly as a visible, individual-level objection from someone others describe as working on core models, as stated in Screenshot repost.

Open letter asks DoW and Congress to reverse Anthropic “supply chain risk” designation

Open letter on “supply chain risk”: An open letter is circulating that asks the Department of War and Congress to rescind the “supply chain risk” designation applied to Anthropic, arguing the designation is being used as punishment in a contract dispute rather than for foreign-adversary risk, as promoted in Letter circulation and hosted on the linked Open letter site.

The public artifact here is the letter itself (a signature-gathering page), while the tweets don’t include an official DoW response or a count of signatories at the time captured in Letter circulation.


🎙️ Voice agents beyond the IDE: network-integrated calling + build stacks

Voice-related items that are not “coding assistant voice mode”: carrier/network call assistants, and infrastructure stacks for building and scaling real-time voice agents (models + hosting + WebRTC/SIP adjacent tooling).

Deutsche Telekom shows a network-integrated AI call assistant (ElevenAgents)

Magenta AI Call Assistant (Deutsche Telekom / ElevenLabs): Deutsche Telekom demoed a network-integrated voice assistant (“Magenta AI Call Assistant”) powered by the ElevenAgents platform, positioning it as an assistant that lives in the carrier network rather than in a phone app, per the event note in MWC announcement. It’s described as working on any device that can place a call—including “decades-old landlines”—and offering in-call translation (up to 50 languages), summarization, and action-taking, as stated in Capability details.

Deployment surface: The key engineering claim is that the assistant is embedded into network infrastructure (not an on-device or over-the-top app flow), as described in MWC announcement.
Capabilities called out: Translation (50 languages), intelligent summaries, and in-call actions are explicitly listed in Capability details.

No public API surface, latency targets, or data-retention model are detailed in today’s tweets.

Modal, NVIDIA, and Daily schedule a Nemotron voice-agent build/scaling livestream

Voice agent build stack (Modal + NVIDIA + Daily): Modal announced a livestream for March 3 at 2pm EST on building and scaling real-time voice agents with NVIDIA’s open-source Nemotron models plus Modal hosting and Daily’s realtime calling stack, as posted in Livestream invite. A second announcement frames it as a code walkthrough plus a GPU infrastructure deep dive, explicitly name-checking Pipecat-style architectural patterns, as described in Infra focus note.

The stated agenda and timing are in the YouTube livestream.
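
The tweets don't include code, but the Pipecat-style pattern being name-checked is a streaming pipeline where each stage consumes the previous stage's output; a minimal sketch with hypothetical stubs standing in for the real Daily transport, STT, Nemotron, and TTS pieces (not any vendor's actual API):

```python
import asyncio

async def mic():
    # Stand-in for a realtime transport (WebRTC/SIP); yields audio frames.
    for frame in ("hello", "what's the weather"):
        yield frame
        await asyncio.sleep(0)  # hand back control, as a live stream would

async def stt(frames):
    # Hypothetical speech-to-text stage: audio frames -> transcripts.
    async for frame in frames:
        yield f"transcript({frame})"

async def llm(transcripts):
    # Hypothetical LLM stage (e.g., a hosted Nemotron model): text -> reply.
    async for text in transcripts:
        yield f"reply({text})"

async def tts(replies):
    # Hypothetical text-to-speech stage: reply text -> output audio.
    async for reply in replies:
        yield f"audio({reply})"

async def main():
    # Stages compose as async streams, so audio flows end-to-end with low
    # latency instead of waiting for complete turns at each hop.
    async for out in tts(llm(stt(mic()))):
        print(out)

asyncio.run(main())
```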
