Z.ai opens GLM‑4.6V, a 106B VLM – 128K context, $0.60 per million input tokens
Executive Summary
Z.ai has thrown down a serious gauntlet in the open vision‑language space: GLM‑4.6V, a 106B‑parameter multimodal model with 128K context, shipped today with public weights, native tool use, and an API priced at $0.60/$0.90 per million input/output tokens. Its 9B sibling, GLM‑4.6V‑Flash, is not only open but free to call via API, giving teams a practical low‑latency option for local or cheap hosted runs.
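At those rates, per‑call cost is easy to estimate. A quick sketch using the published pricing (the token counts below are illustrative, not measured):

```python
# Published GLM-4.6V API pricing, in dollars per million tokens.
INPUT_PER_M = 0.60
OUTPUT_PER_M = 0.90

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single API call."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Illustrative long-document pass: ~100K tokens in, 2K tokens out.
print(round(call_cost(100_000, 2_000), 4))  # → 0.0618
```

Even a context‑saturating document pass stays in the cents range, which is the point of the pricing.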
What’s new here isn’t just another VLM checkpoint; it’s the stack around it. The model handles long video and document workloads end‑to‑end – think one‑hour matches or ~150‑page reports in a single pass – and bakes in multimodal function calling, so it can pass screenshots and PDFs into tools, hit search or RAG backends, then visually re‑read charts before answering. Benchmarks show 88.8 on MMBench V1.1 and competitive MMMU‑Pro scores, often matching or beating larger open rivals such as Qwen3‑VL‑235B and Step‑3‑321B.
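In practice, the multimodal tool‑use flow boils down to a chat request that mixes image parts with tool schemas. A minimal sketch of what such a request body could look like, assuming an OpenAI‑compatible API shape (the model identifier, tool name, and URL here are hypothetical, not taken from Z.ai’s docs):

```python
import json

# Hypothetical request body: a chart image plus a search tool the model may
# call before visually re-reading the chart and answering.
payload = {
    "model": "glm-4.6v",  # assumed model identifier
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What does this revenue chart show? Verify with search."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "web_search",  # hypothetical tool
                "description": "Search the web and return top results.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }
    ],
}

# The serialized payload is what gets POSTed to the chat completions endpoint.
print(json.dumps(payload)[:60])
```

The interesting part is the loop this enables: the model can emit a `web_search` tool call, receive results, and then re‑attend to the image before producing its final answer.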
Ecosystem support landed day‑zero: vLLM 0.12.0 ships an FP8 recipe with 4‑way tensor parallelism and tool parsers, MLX‑VLM and SGLang already have integrations, and indie apps are using it for OCR‑to‑JSON and design‑to‑code flows. Net effect: wherever you’d normally reach for Qwen or LLaVA, GLM‑4.6V is now a credible toggle in the dropdown rather than a science project.
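The day‑zero serving path amounts to a single vLLM launch. A sketch of what the command could look like – the repo name, FP8 checkpoint, and parser value are assumptions following vLLM’s general CLI conventions, so check the vLLM 0.12.0 release notes for the exact recipe:

```shell
# Hypothetical launch: FP8 weights sharded across 4 GPUs, 128K context,
# with tool-call parsing enabled so function calls round-trip cleanly.
vllm serve zai-org/GLM-4.6V-FP8 \
  --tensor-parallel-size 4 \
  --max-model-len 131072 \
  --enable-auto-tool-choice \
  --tool-call-parser glm
```

This exposes an OpenAI‑compatible endpoint, which is what makes the “toggle in the dropdown” framing realistic: existing Qwen or LLaVA client code mostly just points at a new base URL.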
Top links today
- Agentic file system abstraction for context
- EditThinker iterative reasoning for image editors
- From FLOPs to Footprints resource cost paper
- Big Tech funded AI papers analysis
- Clinical LLM performance and safety evaluation
- Fluidstack neocloud financing and valuation report
- IBM reportedly nearing $11B Confluent acquisition
- New York Times lawsuit against Perplexity AI
- Jensen Huang on gradual AI adoption and work
- Jamie Dimon on AI, jobs and workweeks
- Apple leadership shakeup and AI strategy
- Google Gemini smart glasses plans for 2026
- Tech M&A landscape and 2025 deal volume overview
Feature Spotlight
Feature: Z.AI’s GLM‑4.6V goes open with native multimodal tool use
Open GLM‑4.6V/Flash add native multimodal function calling and 128K context; day‑0 vLLM support, free Flash tier, and docs make it a practical, low‑latency VLM option for real products.
The cross‑account launch dominates today: open GLM‑4.6V (106B) and GLM‑4.6V‑Flash (9B) add native function calling and 128K multimodal context, shipping with day‑0 vLLM serving, docs, and pricing. Many demos stress long‑video/document handling and design‑to‑code flows.