
arXiv:2603.10067v2. Abstract: Muon has recently shown promising results in LLM training. In this work, we study how to further improve Muon. We argue that Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and over-emphasizes the training along noise-dominated directions. Motivated by the Heavy-Tailed Self-Regularization (HT-SR) theory, we propose HTMuon. HTMuon preserves Muon's ability to capture parameter interdependencies while producing heavier-tailed updates and inducing heavier-tailed weight spectra. Experiments on LLM pretrain
The paper proposes HTMuon, a modification of Muon, a recently introduced optimizer for LLM training. It is a sign that core training algorithms are still being refined as LLM development matures.
Incremental improvements in LLM training matter because they directly affect the efficiency, performance, and capabilities of future large language models.
HTMuon is designed to produce heavier-tailed updates and induce heavier-tailed weight spectra, which could lead to more robust and higher-performing LLMs than the original Muon.
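To make the "orthogonalized update" in the abstract concrete, the sketch below shows what orthogonalizing a matrix does to its singular values: they are all pushed to 1, so large and small directions end up weighted equally. This is an illustration under stated assumptions, not the paper's method: it uses the textbook cubic Newton-Schulz iteration rather than the tuned polynomial found in Muon implementations, the function names and the example matrix are hypothetical, and it does not attempt HTMuon, whose update rule is not described in this brief.

```python
# Illustrative sketch only: textbook cubic Newton-Schulz iteration.
# Not the tuned variant used by Muon implementations, and not HTMuon.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_norm(a):
    return sum(x * x for row in a for x in row) ** 0.5

def orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor U V^T of a full-rank square matrix g."""
    norm = frobenius_norm(g)
    # After scaling, every singular value lies in (0, 1], where the iteration converges.
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 (X X^T) X drives each singular value toward 1.
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * xv - 0.5 * yv for xv, yv in zip(xr, yr)]
             for xr, yr in zip(x, xxt_x)]
    return x

# Hypothetical 2x2 "gradient" whose two singular values differ noticeably.
g = [[4.0, 1.0], [2.0, 3.0]]
o = orthogonalize(g)

# If o is orthogonal, o @ o^T is (numerically) the identity matrix,
# which means all singular values of o equal 1.
gram = matmul(o, transpose(o))
```

Because every singular value of the result equals 1, the update carries no information about which directions were originally large or small. That flattening is the property the abstract argues works against heavy-tailed weight spectra.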
- LLM developers
- AI research institutions
- Cloud providers
- Inefficient LLM training methods
Improved performance metrics for Large Language Models using HTMuon.
Faster development cycles and deployment of more capable AI assistants and applications.
Increased competition and innovation in the AI model architecture and training landscape.
Read at arXiv cs.LG