UT Austin researchers report that simple sequential fine-tuning with LoRA and on-policy RL can retain prior skills while learning new VLA tasks. Try this baseline before reaching for more complex continual-learning methods.

The core claim is that naive-looking sequential fine-tuning is not as brittle as the continual-learning literature often assumes. In the setup post, the recipe is simply to train the model on one task, then fine-tune on the next, and keep updating the same model as new tasks arrive; the thread opener says that with a large pretrained VLA, LoRA, and on-policy RL, this approach can “prevent catastrophic forgetting” while keeping “strong zero-shot abilities.”
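The recipe above is short enough to sketch end to end. This is a toy, self-contained illustration of the loop, not the paper's code: `Model`, `train_on_policy`, and the task names are all hypothetical stand-ins, and the "adapter" dict merely stands in for LoRA weights that are written while the backbone stays frozen.

```python
# Toy sketch of the sequential recipe: one model, updated in place as tasks
# arrive, with no replay buffer and no explicit anti-forgetting machinery.
from dataclasses import dataclass, field

@dataclass
class Model:
    adapters: dict = field(default_factory=dict)  # stand-in for LoRA weights

def train_on_policy(model, task_name):
    # Stand-in for an on-policy RL phase (the linked repo ships PPO and GRPO).
    # Only a small adapter entry is written; the "backbone" is untouched.
    model.adapters[task_name] = f"lora-weights-for-{task_name}"
    return model

model = Model()
for task in ["pick_cube", "open_drawer", "stack_blocks"]:
    model = train_on_policy(model, task)  # same model, task after task

print(sorted(model.adapters))  # adapters for earlier tasks are still present
```

The point of the sketch is what is absent: no rehearsal data, no regularization toward old weights, just the same model passed through task after task.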
The strongest implementation detail for engineers is that this is presented as a baseline that can rival heavier continual-learning machinery. According to the results post, the simple stack “even beats more complicated continual learning methods” in many cases, while multitask training keeps a small edge of roughly 5%, a gap that closes with extra training on the weakest tasks.
The thread’s explanation is that each component limits destructive updates in a different way. The method summary says pretrained VLAs already carry strong general knowledge, LoRA “updates only small parts of the model,” and on-policy RL lets the model adapt while interacting with the environment instead of making abrupt offline shifts.
The more specific claim is that policy-gradient RL “reduces forgetting” because updates are based on actions from the current policy, so training stays near behaviors the model already performs, per the RL explanation. LoRA then narrows the update space: the LoRA post says it restricts changes to a low-rank subspace, with “rank ≈ 29 instead of ~200,” smaller overall update magnitude, and more evenly distributed changes. Another thread post adds the paper’s intuition that large pretrained models offer many nearly orthogonal “safe directions” for learning new tasks without overwriting old ones.
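The low-rank restriction is easy to see concretely. A minimal numpy sketch, assuming made-up dimensions: instead of learning a full d x d update, LoRA learns two factors B (d x r) and A (r x d), so the applied change ΔW = B @ A has rank at most r. The “rank ≈ 29 instead of ~200” figure is the thread's measurement of the trained model, not something this snippet reproduces.

```python
import numpy as np

d, r = 200, 8                       # hypothetical layer width and LoRA rank
rng = np.random.default_rng(0)

B = rng.normal(size=(d, r)) * 0.01  # LoRA "down" factor
A = rng.normal(size=(r, d)) * 0.01  # LoRA "up" factor
delta_W = B @ A                     # the only change applied to the frozen W

print(np.linalg.matrix_rank(delta_W))  # <= r: updates confined to a subspace
print(delta_W.size, B.size + A.size)   # 40000 full-update params vs. 3200
```

Because the update lives in an r-dimensional subspace of a 200-dimensional space, most directions in weight space, including those encoding earlier tasks, are simply unreachable by the optimizer, which is the “safe directions” intuition in code form.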
This is not just a social thread. The release post links both the arXiv paper and the GitHub repo, and the linked summary says the code covers continual RL for VLA models with benchmark baselines, PPO and GRPO implementations, and evaluation tooling.
For practitioners working on embodied agents, the useful takeaway is less “new algorithm” than “new default baseline.” The release positions sequential fine-tuning plus LoRA plus on-policy RL as the thing to beat first rather than a straw man, and the reported results suggest its main tradeoff is modestly trailing multitask training unless extra compute is spent on the weakest tasks.
Skyler Miao said MiniMax M2.7 open weights are due in roughly two weeks, with updates tuned for agent tasks. Separate replies also confirm multimodal M3, so local-stack builders should watch both the drop and the benchmark setup.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
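The candidate-retrieval idea behind this kind of index can be sketched in a few lines. This is a hedged toy reconstruction of the general trigram-index technique, not Cursor's implementation: extract the literal trigrams a match must contain, intersect their posting lists, and only run the real regex on the surviving files. Real systems often swap the posting sets for Bloom filters to save memory, at the cost of occasional false-positive candidates.

```python
import re
from collections import defaultdict

def trigrams(s):
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self, files):
        self.files = files
        self.postings = defaultdict(set)     # trigram -> set of file ids
        for fid, text in files.items():
            for g in trigrams(text):
                self.postings[g].add(fid)

    def search_literal(self, needle):
        # Candidate set: files containing every trigram of the literal.
        cands = set(self.files)
        for g in trigrams(needle):
            cands &= self.postings.get(g, set())
        # Only candidates get the (comparatively expensive) regex scan.
        rx = re.compile(re.escape(needle))
        return sorted(f for f in cands if rx.search(self.files[f]))

idx = TrigramIndex({
    "a.py": "def parse_config(path): ...",
    "b.py": "def render(tmpl): ...",
    "c.py": "config = parse_config('x')",
})
print(idx.search_literal("parse_config"))  # → ['a.py', 'c.py']
```

The speedup comes from the index pruning the candidate set before any regex runs; on a large repo, most files never see the pattern at all.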
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.