Release · March 17, 2026

H Company releases Holotron-12B: 8.9k tok/s on H100 and 80.5% WebVoyager

H Company launched Holotron-12B, an open multimodal model for computer-use agents built on a hybrid SSM-attention stack that targets KV-cache bottlenecks. Benchmark it if you need high-concurrency browser agents and want better throughput without giving up web-task accuracy.

Multimodal · Computer Use · KV Cache · Reinforcement Learning

TL;DR

H Company launched Holotron-12B, an open multimodal model built with NVIDIA for "computer-use agents." The company says it is tuned for web, Android, and mobile interaction workloads rather than generic vision-language chat.
The company says Holotron-12B is post-trained from Nemotron-Nano-12B-v2-VL and uses a hybrid SSM-attention stack that targets the KV-cache bottleneck to support higher concurrency.
On H Company's reported benchmarks, the model reaches 8.9k tokens/s on a single H100, runs at "over 2x" the throughput of Holo2-8B, and improves the WebVoyager score from 35.1% to 80.5%.
H Company also said in the partner update that it has early access to NVIDIA's Nemotron 3 Omni and plans to use its MoE base for future low-latency enterprise agent deployments.
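
To illustrate why a hybrid SSM-attention stack can ease the KV-cache bottleneck mentioned above, here is a minimal back-of-the-envelope sketch. Only attention layers store per-token keys and values, while SSM layers keep a fixed-size state per sequence, so replacing attention layers with SSM layers shrinks the cache that grows with context length. Every number below (layer counts, KV heads, head dimension, memory budget) is a hypothetical placeholder chosen for illustration, not Holotron-12B's published configuration, and the helper names are invented for this example.

```python
def kv_cache_bytes_per_token(attn_layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of KV cache one token occupies.

    Each attention layer stores one key and one value vector per KV head,
    hence the factor of 2. bytes_per_value=2 assumes 16-bit storage.
    """
    return 2 * attn_layers * kv_heads * head_dim * bytes_per_value


def max_cached_tokens(cache_budget_bytes: int, per_token_bytes: int) -> int:
    """Tokens that fit in the budget, summed across all concurrent sequences."""
    return cache_budget_bytes // per_token_bytes


# Hypothetical shapes for illustration only (NOT the real model configuration).
KV_HEADS, HEAD_DIM = 8, 128
full_attention = kv_cache_bytes_per_token(attn_layers=40, kv_heads=KV_HEADS, head_dim=HEAD_DIM)
hybrid = kv_cache_bytes_per_token(attn_layers=6, kv_heads=KV_HEADS, head_dim=HEAD_DIM)

# Assume 40 GiB of GPU memory is left for the cache after loading weights.
budget = 40 * 1024**3

print(f"full attention: {full_attention} B/token, "
      f"{max_cached_tokens(budget, full_attention)} tokens fit")
print(f"hybrid:         {hybrid} B/token, "
      f"{max_cached_tokens(budget, hybrid)} tokens fit")
```

Under these assumed shapes, the hybrid layout needs roughly one seventh of the per-token cache, so the same memory budget holds proportionally more in-flight tokens. The sketch ignores the SSM layers' own per-sequence state, which is constant in context length, and it says nothing about the throughput H Company actually measured.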