Miles added ROCm support for AMD Instinct clusters and reported GRPO post-training gains on Qwen3-30B-A3B, including AIME rising from 0.665 to 0.729. It matters if you are evaluating rollout-heavy RL jobs off NVIDIA and want concrete throughput and step-time numbers before porting.

Miles has added ROCm support for large-scale RL post-training on AMD Instinct systems, which LMSYS's blog post describes as an end-to-end pipeline for MI300- and MI350-class clusters. The release matters because Miles is not just a trainer: the architecture diagram in LMSYS's thread shows rollout generation and policy optimization split across separate components, coordinated by a scheduler and tied together with Megatron and SGLang.
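The rollout/optimizer split can be pictured as a producer-consumer loop. This is only a toy sketch of the pattern (a queue standing in for Miles's scheduler, and placeholder sampling and reward logic), not Miles's actual implementation:

```python
import queue
import threading

# Hypothetical sketch: a rollout worker (SGLang-style inference) fills a
# queue with sampled trajectories; the policy optimizer (Megatron-style
# trainer) drains it. In Miles a scheduler coordinates many of each
# across the cluster; here one thread of each suffices.
rollouts = queue.Queue(maxsize=8)

def rollout_worker(prompts):
    for p in prompts:
        # Stand-in for sampling a response from the current policy
        # and scoring it; real rewards come from task verifiers.
        rollouts.put({"prompt": p, "response": p[::-1], "reward": len(p)})
    rollouts.put(None)  # sentinel: no more rollouts this step

def trainer():
    batch = []
    while (item := rollouts.get()) is not None:
        batch.append(item)  # in an RL trainer this would feed a policy update
    return batch

t = threading.Thread(target=rollout_worker, args=(["ab", "cde"],))
t.start()
batch = trainer()
t.join()
```

Decoupling the two sides is what lets rollout throughput (the reported bottleneck) scale independently of optimizer step time.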
The implementation details are practical. LMSYS's repo announcement says Miles is open-sourced via the Miles GitHub repo, and the blog summary says deployment is packaged through prebuilt Docker containers for MI300X and MI350X/355X, with ROCm validated end to end. That framing fits the project's pitch that rollout generation "dominates RL compute" on these jobs, making AMD's HBM bandwidth the hardware angle behind the port rather than a generic accelerator-expansion announcement.
The main reported quality gain is on Qwen3-30B-A3B with GRPO, where LMSYS's results thread says AIME rose from 0.665 to 0.729 during training. On the systems side, the same thread reports MI300X rollout throughput of roughly 1.1-1.3k tokens/GPU/s and a mean step time of 388.5 seconds on a single 8-GPU node, using 32x8 sampling with an 8k-token response cap.
That makes this more of a reproducible infrastructure datapoint than a vague hardware claim. The blog summary says Miles also validated multi-turn agentic training on ROCm, and LMSYS's Trainium and Inferentia post places the release in a wider push to run the same serving and rollout stack across AMD GPUs and AWS Trainium/Inferentia rather than keeping SGLang tied to one silicon path.
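For readers new to GRPO: it scores each sampled response relative to the other responses for the same prompt, rather than against a learned value baseline. A minimal sketch of the group-normalized advantage, assuming the reported 32x8 sampling means 32 prompts with 8 responses each (the exact reward shaping Miles uses is not specified here):

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages as in GRPO: normalize each response's
    reward against the mean and std of its own sampling group."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# One prompt group of 8 sampled responses; a 32x8 step would have
# 32 such groups. Rewards here are toy pass/fail scores.
rewards = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
advs = grpo_advantages(rewards)
```

Because the baseline is the group mean, the advantages in each group sum to zero, which is why GRPO needs many rollouts per prompt and why rollout generation dominates the compute budget.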
A pure C and Metal engine streams 209GB of MoE weights from SSD and reports tool-calling support in 4-bit mode on a laptop-class Mac. It is a concrete benchmark for teams exploring expert streaming, quantization, and page-cache tricks on consumer hardware.
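The engine itself is C and Metal, but the core SSD-streaming idea is just memory-mapping the checkpoint and slicing out one expert's weights on demand, letting the OS page cache keep hot experts resident. A toy Python sketch of that pattern, with a hypothetical flat expert layout and a tiny stand-in file (not the project's actual format):

```python
import mmap
import os
import tempfile

# Hypothetical layout: experts stored back-to-back in one weight file;
# each expert is a fixed-size blob we map and read on demand.
EXPERT_BYTES = 4096  # toy size; a real 4-bit expert shard is far larger

def load_expert(mm, idx):
    """Slice one expert's weights out of the memory map. Pages are
    faulted in from SSD on first touch; the OS page cache serves
    repeated loads of hot experts without further disk reads."""
    off = idx * EXPERT_BYTES
    return mm[off:off + EXPERT_BYTES]

# Demo with a temporary file standing in for the 209GB checkpoint.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(EXPERT_BYTES * 4))  # 4 toy "experts"
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    w = load_expert(mm, 2)
    mm.close()
os.unlink(path)
```

Since only the active experts of an MoE layer are touched per token, the working set can stay far below the full 209GB, which is what makes laptop-class RAM viable.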
OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
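Cursor has not published implementation details, but the classic version of this technique is a trigram inverted index: a file can only match a literal query if it contains every trigram of that literal, so the index shrinks the set of files a regex engine must actually scan. A toy sketch of that principle (Bloom filters, which Cursor also mentions, would further compress the per-file trigram sets):

```python
from collections import defaultdict

def trigrams(s):
    """All 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    """Toy inverted index mapping trigram -> set of file ids. A literal
    query can only match files containing all of its trigrams, so the
    index returns a small candidate set instead of every file."""
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, file_id, text):
        for g in trigrams(text):
            self.postings[g].add(file_id)

    def candidates(self, literal):
        grams = trigrams(literal)
        if not grams:
            return None  # query too short to filter: full scan needed
        sets = [self.postings.get(g, set()) for g in grams]
        return set.intersection(*sets) if all(sets) else set()

idx = TrigramIndex()
idx.add(1, "fn instant_grep(query: &str)")
idx.add(2, "def slow_path(): pass")
cands = idx.candidates("instant")  # only file 1 has all trigrams
```

The regex engine then runs only over the candidate files, which is where the seconds-to-milliseconds drop comes from on large repos.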
ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.