EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation

arXiv:2605.04062v2. Abstract: Quantization has emerged as a mainstream approach for deploying Large Language Models (LLMs) on resource-constrained devices, yet compressing precision below 4-bit typically causes severe performance degradation or prohibitive retraining costs. In this paper, we propose EdgeRazor, a lightweight framework for LLMs via Mixed-Precision Quantization-Aware Distillation. It contains three modules: Structural Quantization with Mixed Precision for fine-grained control of bit-widths, Layer-Adaptive Feature Distillation that dynamically selects the mos
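To make the abstract's point about sub-4-bit precision concrete, here is a minimal sketch of plain round-to-nearest symmetric quantization applied to synthetic Gaussian weights. It is not EdgeRazor's method: the helper names `quantize_symmetric` and `mse`, the per-tensor scale, and the random weights are all assumptions chosen for illustration only.

```python
import random


def quantize_symmetric(weights, bits):
    """Round-to-nearest symmetric uniform quantization (illustrative only).

    Uses a single per-tensor scale; this is an assumption of the sketch,
    not something taken from the EdgeRazor abstract.
    """
    qmax = 2 ** (bits - 1) - 1  # largest integer level, e.g. 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for w in weights:
        q = round(w / scale)            # map to the nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        dequantized.append(q * scale)   # map back to floating point
    return dequantized


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Synthetic stand-in for one weight tensor (hypothetical data).
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

errors = {
    bits: mse(weights, quantize_symmetric(weights, bits))
    for bits in (8, 4, 3, 2)
}
for bits, err in errors.items():
    print(f"{bits}-bit reconstruction MSE: {err:.6f}")
```

In this toy setting the reconstruction error grows as the bit-width shrinks, because a single outlier-driven scale leaves only a handful of levels for the bulk of the weights. Mixed-precision schemes and distillation-based retraining, as named in the abstract, are two broad ways of countering that effect; the sketch above implements neither.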
The proliferation of Large Language Models (LLMs) creates urgent demand for efficient deployment on edge devices, where compute and energy budgets are tight.
Fitting LLMs within those budgets broadens access to advanced AI capabilities in resource-constrained environments and expands the practical applications of LLMs.
Running sophisticated LLMs efficiently on smaller devices reduces dependence on constant cloud connectivity and high-end hardware, which makes on-device AI practical in more places.
- Edge device manufacturers
- AI application developers
- Sectors requiring on-device AI
- Consumers of AI-powered devices
- Companies reliant solely on cloud-based LLM inference
- Manufacturers of overly specialized, high-power AI accelerators for edge
- Traditional, unoptimized LLMs
More powerful AI features become standard on smartphones, IoT devices, and autonomous systems.
Competition among device manufacturers to integrate advanced, efficient on-device AI increases, accelerating innovation cycles.
New privacy-preserving AI applications become possible as less data needs to be sent to the cloud for processing.
Read at arXiv cs.LG