Z.ai opens GLM‑4.6V 106B VLM – 128K context, $0.60 per million input tokens


Executive Summary

Z.ai has thrown down a serious gauntlet in the open vision‑language space: GLM‑4.6V, a 106B‑parameter multimodal model with 128K context, shipped today with public weights, native tool use, and an API priced at $0.60/$0.90 per million input/output tokens. Its 9B sibling, GLM‑4.6V‑Flash, is not only open but free to call via API, giving teams a practical low‑latency option for local or cheap hosted runs.
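For a concrete sense of the developer surface, here is a minimal sketch of calling the model through an OpenAI‑compatible client; the base URL and model ids are assumptions inferred from the announcement's framing, so verify them against Z.ai's docs before running:

```python
# Minimal sketch: calling GLM-4.6V over an OpenAI-compatible API.
# Assumptions (not confirmed by the announcement): the base_url and the
# exact model ids; check Z.ai's docs for the real strings.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.z.ai/api/paas/v4",  # assumed Z.ai endpoint
    api_key="YOUR_ZAI_API_KEY",
)

resp = client.chat.completions.create(
    model="glm-4.6v",  # assumed id; "glm-4.6v-flash" for the free 9B tier
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "Summarize the trend in this chart."},
        ],
    }],
)
print(resp.choices[0].message.content)
```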

What’s new here isn’t just another VLM checkpoint; it’s the stack around it. The model handles long video and document workloads end‑to‑end (think one‑hour matches or ~150‑page reports in a single pass) and bakes in multimodal function calling, so it can pass screenshots and PDFs into tools, hit search or RAG backends, then visually re‑read charts before answering. Benchmarks show 88.8 on MMBench V1.1 and competitive MMMU‑Pro scores, often matching or beating larger open rivals like Qwen3‑VL‑235B and Step‑3‑321B.
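A hedged sketch of that search‑to‑answer loop, reusing the client from the snippet above; the `search_docs` tool schema is illustrative, not part of Z.ai's published API:

```python
# Sketch of the screenshot -> tool -> re-read loop described above.
# The tool schema ("search_docs") is a made-up example, not Z.ai's.
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search a RAG backend and return matching passages.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/report-page.png"}},
        {"type": "text",
         "text": "Cross-check this page's figures against our docs."},
    ],
}]

resp = client.chat.completions.create(
    model="glm-4.6v", messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]  # model opts to search first
# Run search_docs with call.function.arguments, append the result as a
# role="tool" message, then call the API again so the model can visually
# re-read the chart alongside the retrieved passages before answering.
```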

Ecosystem support landed on day zero: vLLM 0.12.0 ships an FP8 recipe with 4‑way tensor parallelism and tool parsers, MLX‑VLM and SGLang already have integrations, and indie apps are using it for OCR‑to‑JSON and design‑to‑code flows. Net effect: wherever you’d normally reach for Qwen or LLaVA, GLM‑4.6V is now a credible toggle in the dropdown rather than a science project.
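For self‑hosters, a rough sketch of the vLLM path under the same caveats (the Hugging Face repo id is a guess, and the published recipe may differ in details such as the tool and reasoning parsers used when serving over HTTP):

```python
# Rough sketch of self-hosting with vLLM's Python API, following the
# FP8 + 4-way tensor-parallel recipe mentioned above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-4.6V",   # assumed HF repo id
    tensor_parallel_size=4,     # 4-way TP, per the recipe
    quantization="fp8",         # FP8 quantization
    max_model_len=131072,       # the advertised 128K context
)

outs = llm.generate(
    ["Describe what native multimodal function calling enables."],
    SamplingParams(max_tokens=128),
)
print(outs[0].outputs[0].text)
```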


Feature Spotlight

Feature: Z.AI’s GLM‑4.6V goes open with native multimodal tool use

Open GLM‑4.6V/Flash add native multimodal function calling and 128K context; day‑0 vLLM support, free Flash tier, and docs make it a practical, low‑latency VLM option for real products.

Cross‑account launch dominates today: the open GLM‑4.6V (106B) and 4.6V‑Flash (9B) add native function calling, 128K multimodal context, day‑0 vLLM serving, docs, and pricing. Many demos stress long‑video/doc handling and design‑to‑code flows.


Table of Contents

🧠 Feature: Z.AI’s GLM‑4.6V goes open with native multimodal tool use

Z.ai launches open GLM‑4.6V and free 4.6V‑Flash with 128K multimodal context

GLM‑4.6V and Flash post strong vision‑language scores vs Qwen and Step‑3

GLM‑4.6V bakes in native multimodal function calling and search‑to‑answer flows

GLM‑4.6V pushes 128K multimodal context to hour‑long videos and large docs

GLM‑4.6V and Flash get rapid support across Hugging Face, MLX‑VLM, SGLang and tools

GLM‑4.6V targets frontend devs with design‑to‑code generation

vLLM ships FP8 GLM‑4.6V recipe with tool and reasoning parsers

Early testers lean into GLM‑4.6V for SVG graphics, coding evals and OCR


🧰 Coding agents in practice: Slack handoff, background workers, routers

Claude Code can now be delegated tasks directly from Slack

OpenRouter’s Body Builder lets devs describe multi‑model calls in plain English

Warp adds model comparison cards and an auto‑routing option

Kilo Code debuts an Adoption Dashboard and leans into Copilot comparisons

RepoPrompt moves MCP to Unix sockets and cuts idle CPU to 0.1%

CodeLayer deep agents run planning phases as background sub‑agents


🏗️ Compute supply and DC finance: H200 to China and neocloud funding

US to license Nvidia H200 exports to China with 25% revenue skim

Fluidstack targets ~$700M raise at ~$7B valuation with Google‑backed DC leases


📊 Evals and telemetry: job‑level rankings, Code Arena, trace fan‑out

Arena debuts Occupational rankings to test models by real jobs

OpenRouter Broadcast pipes LLM traces into Langfuse, LangSmith, Datadog and W&B

DeepSeek V3.2 arrives in Code Arena for live coding battles

Step Game update shows GPT‑5.1 and Gemini 3 Pro leading social reasoning


📈 Enterprise adoption and GTM: OpenAI report and agentic commerce

OpenAI’s 2025 enterprise AI report puts hard numbers on workplace usage

ChatGPT turns Instacart into an in‑chat grocery shopping agent

Hugging Face and Google Cloud move 5 GB in 13 seconds


🧪 Frontier signals beyond GLM: Rnj‑1, Gemini Flash whispers, Grok ETA

LM Arena’s ‘Seahawk’ and ‘Skyhawk’ likely tease Gemini 3 Flash variants

Rnj‑1 open 8B model surges on Hugging Face trending charts

Qwen 3 Next arrives on Ollama for local experimentation

Jina releases 2B VLM claiming SOTA multilingual doc understanding


🛡️ Legal and safety: NYT v. Perplexity, clinic gap, jailbreak datasets

NYT sues Perplexity over paywalled RAG and NYT‑branded hallucinations

Clinical LLMs ace exams but lag badly on real care and safety

Community jailbreak pipeline mass‑generates rich attack prompts

“From FLOPs to Footprints” ties AI training to heavy‑metal footprints

Big Tech–funded AI papers show higher impact and insularity


🔌 MCP interop and agent plumbing

AIGNE paper proposes ‘everything is a file’ abstraction for agent context

Anthropic clarifies how MCP tool calls flow through the context window

mcporter 0.7.1 daemon now hot‑reloads MCP servers on config changes

Amp IDE can now find the exact agent thread that created a file


🎬 Creative stacks: NB Pro workflows, Kling O1 editing, LongCat text fidelity

Kling O1 leans into multimodal video editing, not just text prompts

Meituan’s 6B LongCat-Image rivals 20B+ models in bilingual, text-heavy image work

Nano Banana Pro community is converging on reusable prompt workflows

Pika 2.2 arrives as an API via Fal for apps that need video

Gemini adds NB Pro-powered image resize flow in Thinking mode

NB Pro’s HTML→UI experiment exposes strengths and gaps in layout fidelity


📚 New papers: unified multimodal, realism rewards, agentic video loops

Active Video Perception frames long‑video QA as plan→observe→reflect loops

EMMA proposes a single efficient stack for multimodal understanding, generation, and editing

RealGen uses detector‑guided rewards to push text‑to‑image photorealism

Self‑Improving VLM Judges train themselves without human labels

EditThinker wraps existing image editors with an iterative reasoning layer

MotionV2V edits motion inside videos while keeping appearance fixed

One‑to‑All Animation enables alignment‑free character animation and pose transfer

SpaceControl adds test‑time spatial constraints to 3D generative models

TwinFlow pushes large diffusion models toward one‑step generation


🎙️ Realtime voice and music agents

Lyria Camera turns your phone into a real-time soundtrack generator

ElevenLabs ships real-time Santa voice agent plus AI Christmas music

Builders lean on Gemini Live’s new on-screen visual guidance

Pipecat 0.0.97 tightens voice agent core and adds Gradium models


🦾 Embodied AI in production: farm autonomy and mass humanoids

China doubles down on embodied AI with provincial pilots and big funds

AgiBot reaches 5,000 humanoids in mass production with shared control stack

Honghu T70 electric tractor shows 6‑hour, ±2.5 cm autonomous farm work

Autonomous delivery carts handle grocery routes in rural China
