arXiv:2605.26459v1 Announce Type: new Abstract: Muon-style optimizers take a matrix-valued momentum or preconditioned update $B = U \operatorname{diag}(\sigma_1,\ldots,\sigma_r) V^\top$ and replace it with its canonical partial polar factor $\operatorname{Pol}(B) = U V^\top$. This maps every nonzero singular value to one. MuCon is the clipped-Muon variant studied here: it applies singular-value clipping to the same Muon matrix, $D^{\mathrm{MuCon}}\_\tau(B) = \operatorname{MClip}\_\tau(B) = U \operatorname{diag}\bigl(\min\{\sigma\_i,\tau\}\bigr) V^\top, \qquad \tau > 0$. Thus, $\operatorname{MC

Source: arXiv cs.LG — read the full report at the original publisher.

This is a curated wire item. The Continuum Brief does not republish full third-party articles; this entry links to the original source.