The toolkit sweeps contiguous layer ranges in GGUF and llama.cpp-style setups to test whether duplicating them can unlock better reasoning without retraining. Treat the jump as a reproducible experiment, not a settled mechanism: thread commenters dispute whether the effect reflects genuine circuits, routing behavior, or training artifacts.

Toolkit replicating Ng's RYS method to find and exploit reasoning circuits in LLMs by duplicating specific contiguous layer blocks during inference, boosting performance without training. Examples: duplicating layers 7-9 in Qwen2.5-32B improves reasoning by 17%; duplicating layers 12-14 in Devstral-24B raises logical deduction from 0.22 to 0.76 on BBH. Includes sweep.py for circuit discovery, layer_path.py for path modification, and evaluation scripts. Requires llama.cpp and GGUF models; tested on Mistral/Qwen architectures.
The release is a small research toolkit around inference-time model surgery. The toolkit page says it replicates Ng's RYS method by duplicating specific contiguous layer blocks during inference, working with llama.cpp and GGUF models and tested on Mistral- and Qwen-family architectures.
That makes the engineering contribution concrete: instead of retraining or merging checkpoints, the workflow searches for layer spans that are worth repeating at runtime. The bundled tools cover three steps — search, path modification, and eval — via sweep.py, layer_path.py, and evaluation scripts, as described on the repo page and summarized in the Show HN post.
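The search step is simple to picture. A minimal sketch of the idea, with hypothetical helper names (the actual sweep.py and layer_path.py APIs may differ): build a layer execution order that repeats one contiguous span, then score every span of a given width and rank the results.

```python
def duplicated_path(n_layers, start, end):
    """Build a layer execution order that visits the span [start, end] twice.

    The duplicated span is inserted immediately after its original position,
    e.g. n_layers=12, start=7, end=9 -> 0..9, 7, 8, 9, 10, 11.
    """
    path = list(range(n_layers))
    return path[: end + 1] + list(range(start, end + 1)) + path[end + 1 :]


def sweep(n_layers, span_len, score):
    """Try every contiguous span of span_len layers, ranked best-first.

    `score` is a placeholder for a benchmark evaluation (e.g. BBH accuracy
    with the model run along the modified path); it takes a layer path and
    returns a number.
    """
    results = []
    for start in range(n_layers - span_len + 1):
        end = start + span_len - 1
        path = duplicated_path(n_layers, start, end)
        results.append((score(path), (start, end), path))
    return sorted(results, key=lambda r: r[0], reverse=True)
```

The expensive part in practice is the `score` call, which requires a full benchmark run per candidate span; the path construction itself is trivial.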
Posted by xlayn
Useful as a concrete example of inference-time model surgery: duplicating selected contiguous layers in GGUF/llama.cpp, sweeping layer ranges, and checking benchmark impacts. The discussion also surfaces the main engineering questions: whether the effect is a genuine circuit phenomenon, a routing/looping artifact, or a training-time/architecture interaction.
The headline result is large enough to get attention. In the HN summary, the author reports that duplicating three layers in a 24B model pushed logical deduction from 0.22 to 0.76, and the linked repo page gives a second example where duplicating layers 7-9 in Qwen2.5-32B improved reasoning by 17%.
Thread discussion highlights:

- 4bpp (skepticism about mechanism): the proposed explanation "does not pass the smell test"; suggests duplicated layers may be near-identity blocks, or that training/RLHF degraded reasoning in a way duplication partially undoes.
- Lerc (looping/router interpretation): argues the result looks more like a higher-level MoE-style routing problem, with the model's thinking phase made into a looping layer and a router choosing patterns like 13,13,14,14,15,15,16.
- woadwarrior01 (prior art): compares the work to Solar 10.7B and its depth up-scaling technique, noting that the earlier approach repeated layers during continued training.
But the mechanism is very much in dispute. One commenter says the proposed explanation "does not pass the smell test," arguing duplicated layers may be "near-identity blocks" or may undo reasoning damage introduced during training or RLHF. Another suggests the pattern looks more like a routing or looping effect — "a higher-level MoE-style routing problem" with paths such as 13,13,14,14,15,15,16 — rather than evidence of a clean circuit interpretation. A third comparison points to Solar 10.7B depth up-scaling, where repeated layers appeared during continued training, which makes this look more like an inference-time variant of a known idea than a wholly new primitive.
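The looping interpretation is easy to state concretely. A minimal sketch of the interleaved pattern the commenter describes (the function name and parameters are illustrative, not from the toolkit): visit each layer in a span twice before moving on, ending on the span's final layer once.

```python
def looped_path(start, end, repeats=2):
    """Visit each layer in [start, end) `repeats` times, then layer `end` once.

    looped_path(13, 16) reproduces the pattern quoted in the thread:
    13, 13, 14, 14, 15, 15, 16.
    """
    path = []
    for layer in range(start, end):
        path.extend([layer] * repeats)
    path.append(end)
    return path
```

Under this reading, a block duplicated wholesale (7, 8, 9, 7, 8, 9) is just one fixed point in a larger space of repetition schedules a learned router might choose dynamically.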
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
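The core trick behind such indexes is easy to sketch. A toy trigram inverted index in Python (class and method names are illustrative; a production system like Cursor's also layers Bloom filters and extracts literals from full regexes): any file containing a query literal must contain every trigram of that literal, so the index cheaply narrows the set of files that need an exact scan.

```python
from collections import defaultdict


def trigrams(s):
    """All 3-character substrings of s."""
    return {s[i : i + 3] for i in range(len(s) - 2)}


class TrigramIndex:
    """Minimal trigram inverted index: maps each 3-gram to files containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.files = {}

    def add(self, name, text):
        self.files[name] = text
        for gram in trigrams(text):
            self.postings[gram].add(name)

    def candidates(self, literal):
        """Files containing every trigram of `literal` — a superset of true matches."""
        grams = trigrams(literal)
        if not grams:  # query too short to filter; fall back to all files
            return set(self.files)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, literal):
        # Only candidate files are scanned exactly; this is the step the
        # index accelerates on large repos.
        return {n for n in self.candidates(literal) if literal in self.files[n]}
```

The index never produces false negatives for literal queries, only false positives, which the final scan filters out.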
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.