Release: March 18, 2026

Chandra OCR 2 opens with 85.9 on olmOCR Bench and 90+ language support

Datalab-to open-sourced Chandra OCR 2, a 4B document model released with a repo, weights, a demo, and a CLI quickstart, and claims a state-of-the-art score of 85.9 on olmOCR Bench. It gives document pipelines a practical multilingual OCR option that can run with local tooling rather than only through hosted APIs.

TL;DR

Datalab-to open-sourced Chandra OCR 2 as a 4B document OCR model. The launch post says it reached “85.9% (sota) on olmocr bench” and added “90+ language support.”
The release includes a distribution post linking to GitHub, Hugging Face, and a demo, plus a CLI quickstart that installs chandra-ocr, starts chandra_vllm, and runs chandra input.pdf ./output.
According to the linked GitHub repo and HF weights, Chandra OCR 2 supports local inference through Transformers or a vLLM server and returns layout-aware outputs in formats including Markdown, HTML, and JSON.
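
The quickstart steps named above can be sketched as a small Python wrapper. This is a minimal sketch, not the project's own tooling: the helper names quickstart_commands and run_conversion are hypothetical, installing with pip is an assumption (the quickstart is only described as installing chandra-ocr), and chandra_vllm is assumed to be a long-running server that should be started in its own terminal. Only the three command names and the input.pdf ./output arguments come from the quickstart; no extra flags are assumed.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def quickstart_commands(input_pdf: str, output_dir: str) -> list[list[str]]:
    """Return the three quickstart steps as argv lists, in order (hypothetical helper)."""
    return [
        # Assumption: the package is installed with pip.
        ["pip", "install", "chandra-ocr"],
        # Assumption: this starts a vLLM server that keeps running,
        # so it is launched separately rather than waited on.
        ["chandra_vllm"],
        # Convert the document and write results into output_dir.
        ["chandra", input_pdf, output_dir],
    ]


def run_conversion(input_pdf: str, output_dir: str) -> bool:
    """Run only the final `chandra` step (hypothetical helper).

    Returns False without running anything if the CLI is not on PATH
    or the input file does not exist.
    """
    if shutil.which("chandra") is None or not Path(input_pdf).is_file():
        return False
    # Creating the directory up front is harmless if the CLI also creates it.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(["chandra", input_pdf, output_dir], check=True)
    return True


if __name__ == "__main__":
    # Print the steps instead of executing them, so the script is safe to run anywhere.
    for argv in quickstart_commands("input.pdf", "./output"):
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe on machines without a GPU or a running server. run_conversion is the only part that invokes the CLI, and it does nothing unless chandra is actually installed and the input file exists.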