QAM-W: Joint 2D Codebook Quantization for LLM Weights via Hadamard Rotation and Activation-Aware Scaling

arXiv:2605.26339v1. Abstract: Scalar post-training quantizers discard pairwise coordinate structure within weight rows. We introduce QAM-W (Quadrature Amplitude Modulation for Weights), a codec that recovers this structure: each row is L2-normalized, block-Hadamard rotated, paired into 2D coordinates, and quantized against a single Lloyd-Max codebook trained on the unit circular Gaussian, with activation-aware per-channel scaling. In a cross-model study spanning five LLMs from four families (1.1B--13B parameters) and eight quantized configurations, the activation-aware varian
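The pipeline named in the abstract (normalize, rotate, pair, quantize) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the block size of 64, the 16-entry codebook, and the rescaling by the square root of the row length are all assumptions made for illustration. "Unit circular Gaussian" is read here as a 2D isotropic Gaussian with unit variance per coordinate, and the activation-aware per-channel scaling is left out because the abstract does not specify it.

```python
import numpy as np

def hadamard(n):
    """Orthonormal n x n Hadamard matrix via the Sylvester construction (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def nearest(points, codebook):
    """Index of the closest codebook entry for each 2D point."""
    d = ((points[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)

def train_codebook(k, n_samples=20000, iters=25, seed=0):
    """Lloyd iterations on samples from a 2D standard Gaussian (assumed training distribution)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, 2))
    c = x[rng.choice(n_samples, k, replace=False)].copy()
    for _ in range(iters):
        idx = nearest(x, c)
        for j in range(k):
            mask = idx == j
            if mask.any():
                c[j] = x[mask].mean(axis=0)  # centroid update
    return c

def quantize_rows(W, codebook, block=64):
    """Hypothetical encoder: returns codebook indices per coordinate pair and per-row norms."""
    rows, cols = W.shape  # cols is assumed to be a multiple of `block`
    H = hadamard(block)
    norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    U = W / norms                                    # L2-normalize each row
    R = (U.reshape(rows, cols // block, block) @ H.T).reshape(rows, cols)  # block rotation
    Z = R * np.sqrt(cols)                            # assumed rescale to roughly unit variance
    idx = nearest(Z.reshape(-1, 2), codebook)        # joint 2D quantization of adjacent pairs
    return idx.reshape(rows, cols // 2), norms

def dequantize_rows(idx, norms, codebook, block=64):
    """Inverse of quantize_rows: look up pairs, undo the rescale, rotation, and normalization."""
    rows, half = idx.shape
    cols = 2 * half
    H = hadamard(block)
    R = codebook[idx].reshape(rows, cols) / np.sqrt(cols)
    U = (R.reshape(rows, cols // block, block) @ H).reshape(rows, cols)
    return U * norms
```

With a 16-entry codebook, each pair costs 4 bits, or 2 bits per weight before the per-row norm is counted. A round trip such as `dequantize_rows(*quantize_rows(W, cb), cb)` returns an approximation of `W` whose error depends on the codebook size.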
Deploying large language models (LLMs) on constrained hardware depends on quantizers that cut weight storage without a large loss in accuracy.
QAM-W matters because it targets structure that scalar quantizers ignore: after rotation, it quantizes pairs of coordinates jointly against one shared 2D codebook instead of rounding each weight independently.
More effective weight quantization shrinks a model's memory footprint, which can in turn lower the compute and energy needed to serve it.
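To make the footprint claim concrete, the sketch below works out storage cost for a 2D codebook scheme. It is illustrative arithmetic only: the codebook size, row length, parameter count, and the assumption of one 16-bit norm stored per row are hypothetical values, not figures from the paper.

```python
import math

def bits_per_weight(codebook_size, row_len, norm_bits=16):
    """Index bits shared by two weights, plus one per-row norm amortized over the row."""
    return math.log2(codebook_size) / 2 + norm_bits / row_len

def footprint_gb(n_params, bits):
    """Weight storage in gigabytes (10^9 bytes) at the given bits per weight."""
    return n_params * bits / 8 / 1e9

# Hypothetical example: 256-entry codebook, rows of 4096 weights, 7e9 parameters.
b = bits_per_weight(256, 4096)
quantized = footprint_gb(7e9, b)
baseline = footprint_gb(7e9, 16)  # 16-bit floating point weights
```

Under these assumptions the index cost is 4 bits per weight and the per-row norm adds under 0.01 bits, so the quantized weights take roughly a quarter of the 16-bit baseline.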
- AI hardware manufacturers
- LLM developers
- Edge AI providers
- Cloud infrastructure providers
- Inefficient AI model architectures
- High-power computing dependency
Improved model inference efficiency and reduced memory footprint for large language models.
Accelerated adoption of LLMs in environments with limited computational resources, such as mobile or edge devices.
Potential for new AI applications and services that were previously economically or technically unfeasible due to high compute requirements.
Read at arXiv cs.LG