arXiv:2607.01127v1 Announce Type: new Abstract: Quantization has become an invaluable tool to reduce memory requirements and inference speed of modern language models, in particular to make them available for consumer setups and edge devices. While previous work has primarily focused on uniform quantization codebooks, such approaches are prone to suboptimal representations due to low-frequency high-magnitude weights. We introduce Log$_\text{b}$Quant, a novel logarithmic quantization approach with adjustable bases, to adapt to common parameter distributions. We show that our method exhibits sup

Source: arXiv cs.CL — read the full report at the original publisher.

This is a curated wire item. The Continuum Brief does not republish full third-party articles; this entry links to the original source.