
arXiv:2512.06609v3 Announce Type: replace
Abstract: Vector-quantized variational autoencoders (VQ-VAEs) are discrete autoencoders that compress images into discrete tokens. However, they are difficult to train due to discretization. In this paper, we propose a simple yet effective technique dubbed Gaussian Quant (GQ), which first trains a Gaussian VAE under certain constraints and then converts it into a VQ-VAE without additional training. For conversion, GQ generates random Gaussian noise as a codebook and finds the closest noise vector to the posterior mean. Theoretically, we prove that when
Discrete autoencoders remain difficult to train because of discretization, so techniques such as Gaussian Quant that aim for more efficient and stable training are timely.
Improved training stability and efficiency for VQ-VAEs can accelerate the development of discrete token-based AI, impacting various applications from image generation to multimodal learning.
The proposed method, Gaussian Quant, sidesteps direct VQ-VAE training: a Gaussian VAE is first trained under certain constraints and then converted into a VQ-VAE with no additional training, which could reduce computational overhead and widen access.
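The conversion step described in the abstract (draw random Gaussian noise as a codebook, then pick the noise vector closest to the posterior mean) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, codebook size, vector dimension, and the use of squared Euclidean distance as the notion of "closest" are all assumptions made for illustration, and the constraints placed on the Gaussian VAE during training are not modeled here.

```python
import random


def make_codebook(num_codes, dim, seed=0):
    """Draw a fixed codebook of i.i.d. standard Gaussian vectors.

    Hypothetical helper: num_codes, dim and seed are illustrative choices,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_codes)]


def quantize(mean, codebook):
    """Return the index of the codebook vector nearest to `mean`.

    Assumption: "closest" is measured by squared Euclidean distance.
    """
    def sq_dist(code):
        return sum((m - c) ** 2 for m, c in zip(mean, code))

    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


# The codebook is random noise; nothing about it is learned.
codebook = make_codebook(num_codes=256, dim=4, seed=0)

# Stand-in for the posterior mean produced by a trained Gaussian VAE encoder.
posterior_mean = [0.3, -1.2, 0.8, 0.1]

token = quantize(posterior_mean, codebook)  # discrete token (an integer index)
quantized = codebook[token]                 # vector a decoder would receive
```

Because the codebook is regenerated from a seed rather than learned, only the seed and the token indices would need to be stored in a sketch like this one. How the paper handles codebook size and dimension grouping is not stated in the excerpt above.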
- AI researchers and developers
- Companies developing discrete AI models
- Hardware providers for AI training
- Developers reliant on complex, resource-intensive VQ-VAE training methods
Easier and more stable training of discrete autoencoders could lead to faster innovation in AI models that rely on tokenized representations.
Broad adoption of such training-free quantization could reduce the computational requirements for developing models built on discrete tokens, lowering entry barriers.
More efficient discrete autoencoders might enable AI systems to process and generate complex data types with greater fidelity and speed, accelerating progress in fields like synthetic media and advanced perception.
Read at arXiv cs.LG