arXiv:2605.27465v1 (announce type: cross)

Abstract: The quadratic cost of self-attention in Vision Transformers (ViTs) constitutes a fundamental bottleneck for practical deployment, motivating a vibrant line of research on token reduction. Among existing approaches, token merging (ToMe) has emerged as an elegant training-free solution; yet its design rests on an unspoken premise of token equality, which contravenes the well-documented non-uniformity of self-attention and leads to information loss in high-salience tokens under aggressive compression. We address this limitation with AdaMerge, a to
Source: arXiv cs.AI
