Physical Intelligence says its RL token compresses VLA state into a lightweight signal that an on-robot actor-critic can adapt in minutes. This matters for last-millimeter manipulation, where full-size models are often too slow or too coarse to tune online.

Physical Intelligence is pitching the RL token as a narrow interface between a large vision-language-action (VLA) model and a much smaller reinforcement learning module. In the company's research summary, the VLA's high-dimensional embeddings are compressed through an encoder bottleneck into a low-dimensional token, which is optimized to preserve task-relevant information.
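To make the bottleneck idea concrete, here is a minimal Python sketch of the forward pass only. It is not Physical Intelligence's encoder: the function names, the 2048-to-16 dimensions, the single linear layer, and the tanh squashing are all assumptions chosen for illustration, and the weights are random rather than trained, so the step that makes the token preserve task-relevant information is left out entirely.

```python
import math
import random


def make_bottleneck(embed_dim, token_dim, seed=0):
    """Build random, untrained projection weights (hypothetical stand-in for a
    learned encoder): token_dim rows, each with embed_dim entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(embed_dim)  # keeps pre-activations at a moderate size
    return [[rng.gauss(0.0, scale) for _ in range(embed_dim)]
            for _ in range(token_dim)]


def encode_rl_token(weights, embedding):
    """Project one high-dimensional embedding down to a small, bounded token."""
    if any(len(row) != len(embedding) for row in weights):
        raise ValueError("embedding size does not match encoder input size")
    return [math.tanh(sum(w * x for w, x in zip(row, embedding)))
            for row in weights]


if __name__ == "__main__":
    rng = random.Random(1)
    embedding = [rng.gauss(0.0, 1.0) for _ in range(2048)]  # assumed embedding size
    weights = make_bottleneck(embed_dim=2048, token_dim=16)  # assumed token size
    token = encode_rl_token(weights, embedding)
    print(len(embedding), "->", len(token))
```

The point of the sketch is the shape change: whatever the downstream RL module learns, it only ever sees the short token, which is what keeps that module small enough to update quickly.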
That token becomes the input to a small actor-critic policy that runs online. According to the thread, the actor takes both the RL token and the base model's proposed action, then learns a residual correction rather than replacing the full policy. The company's materials show the loop explicitly: rollouts on the robot feed a replay buffer, which drives repeated actor-critic updates tied back to the tokenized state.
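The rollout, replay buffer, and update loop can be sketched in a few dozen lines of Python. Everything below is a toy built on stated assumptions, not the company's implementation: the task is a one-step, one-dimensional alignment problem, the token is a single number standing in for the base policy's signed miss, the actor is linear, the critic is linear over hand-picked quadratic features, and all class names, function names, and hyperparameters are hypothetical. What it does share with the described design is the structure: the actor sees the token and the base action, outputs a bounded residual that is added to the base action, and is trained from replayed transitions with several updates per real step.

```python
import random
from collections import deque

RESIDUAL_LIMIT = 1.5  # hypothetical cap on how far the actor may move the base action


def clip(x, lo, hi):
    return max(lo, min(hi, x))


class ResidualActor:
    """Linear actor over (RL token, base action); outputs a bounded correction."""

    def __init__(self):
        self.w = [0.0, 0.0, 0.0]  # zero init: the base policy is unchanged at first

    def residual(self, token, base_action):
        raw = self.w[0] * token + self.w[1] * base_action + self.w[2]
        return clip(raw, -RESIDUAL_LIMIT, RESIDUAL_LIMIT)


class QuadraticCritic:
    """Q(token, residual), linear in hand-picked quadratic features (toy choice)."""

    def __init__(self):
        self.theta = [0.0, 0.0, 0.0, 0.0]

    def features(self, token, residual):
        return [1.0, token * token, token * residual, residual * residual]

    def value(self, token, residual):
        return sum(t * f for t, f in zip(self.theta, self.features(token, residual)))

    def grad_wrt_residual(self, token, residual):
        return self.theta[2] * token + 2.0 * self.theta[3] * residual


def rollout_step(actor, rng, noise_std):
    """One toy 'last millimeter' attempt; returns a replay transition."""
    target = rng.uniform(-1.0, 1.0)
    token = rng.uniform(-1.0, 1.0)  # toy assumption: token = base policy's signed miss
    base_action = target + token
    explored = actor.residual(token, base_action) + rng.gauss(0.0, noise_std)
    residual = clip(explored, -RESIDUAL_LIMIT, RESIDUAL_LIMIT)
    reward = -((base_action + residual - target) ** 2)
    return token, base_action, residual, reward


def update(actor, critic, batch, critic_lr=0.05, actor_lr=0.02):
    for token, base_action, residual, reward in batch:
        # Critic: regress Q toward the observed reward. Episodes are one step
        # long in this toy, so there is no bootstrapped next-state term.
        error = reward - critic.value(token, residual)
        phi = critic.features(token, residual)
        critic.theta = [t + critic_lr * error * f for t, f in zip(critic.theta, phi)]
        # Actor: nudge the current correction in the direction the critic prefers.
        dq = critic.grad_wrt_residual(token, actor.residual(token, base_action))
        inputs = [token, base_action, 1.0]
        actor.w = [w + actor_lr * dq * x for w, x in zip(actor.w, inputs)]


def mean_miss(actor, seed=123, trials=200):
    """Average absolute distance from the target, evaluated without exploration."""
    rng = random.Random(seed)
    return sum((-rollout_step(actor, rng, 0.0)[3]) ** 0.5 for _ in range(trials)) / trials


def train(steps=500, updates_per_step=8, batch_size=4, seed=0):
    rng = random.Random(seed)
    actor, critic = ResidualActor(), QuadraticCritic()
    buffer = deque(maxlen=2000)
    for _ in range(steps):
        buffer.append(rollout_step(actor, rng, noise_std=0.5))  # rollout on the "robot"
        for _ in range(updates_per_step):  # several off-policy updates per real step
            batch = [rng.choice(buffer) for _ in range(batch_size)]
            update(actor, critic, batch)
    return actor


if __name__ == "__main__":
    print("mean miss before:", mean_miss(ResidualActor()))
    print("mean miss after: ", mean_miss(train()))
```

Two design choices in the sketch mirror the article's description. Starting the actor at zero means the system initially behaves exactly like the base policy, and clipping the residual keeps the learned correction from overriding it. Reusing each stored transition across many updates is what lets a small amount of real robot time translate into a large number of learning steps.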
The practical claim is not broader autonomy but faster refinement at the hardest part of execution. Physical Intelligence's announcement thread frames the failure mode as the "last millimeter," where coarse model outputs are good enough to reach the object but not to complete delicate alignment, insertion, or tool-use steps.
The reported advantage is sample efficiency. The research page says the small RL module can be trained directly on the robot with off-policy updates at hundreds of steps per second, and the thread says that was enough to improve behavior in about 15 minutes of practice. The same sources claim up to 3× faster task execution, fewer mistakes, and in some cases performance faster than human teleoperation.
For engineers, the architectural point is clear: keep the large pretrained VLA as the general controller, then add a compact, fast-learning adapter for precision corrections. According to the research summary, that avoids full-model online tuning while still giving the system a way to specialize during deployment.