
arXiv:2510.13537v2. Abstract: On-device deployment of Large Language Models (LLMs) frequently leverages Low-Rank Adapters (LoRAs) to support diverse downstream tasks under tight resource constraints. To address the limited storage capacity of mobile devices, recent works have explored model merging techniques to fuse multiple LoRAs into a single one. In practice, however, LoRAs are often delivered incrementally, as users request support for new tasks (e.g., novel problem types or languages). This scenario introduces a new challenge: on-device online continual mergin
The proliferation of LLMs and the constraints of on-device deployment are driving innovation in efficient model management, making techniques like 'online continual merging' critical for practical application.
This research addresses a core technical challenge for decentralized and resource-constrained AI, enabling more adaptive and self-contained AI systems outside of hyperscale data centers.
The ability to continually merge AI model adapters on-device allows for more flexible, up-to-date, and efficient LLM deployment on mobile and edge devices without constant network dependency.
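To make the setting concrete, here is a minimal sketch of folding a newly delivered LoRA into a single stored adapter while keeping storage fixed. It is a naive baseline (a running mean of the low-rank updates, re-compressed with a truncated SVD), not the method proposed in the paper, whose details are not given in this brief. The function name `merge_incremental` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def merge_incremental(merged, new_adapter, count, rank):
    """Fold one newly arrived LoRA into the running merged adapter.

    Assumption: each adapter is a pair (B, A) with B of shape (d_out, r)
    and A of shape (r, d_in), so its weight update is B @ A.
    `count` is the number of adapters already folded into `merged`.
    """
    B_m, A_m = merged
    B_n, A_n = new_adapter
    # Running mean of the dense updates; earlier adapters are not stored.
    delta = (count * (B_m @ A_m) + B_n @ A_n) / (count + 1)
    # Truncated SVD projects the mean back to the target rank,
    # so the stored adapter never grows as new tasks arrive.
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    B = U[:, :rank] * S[:rank]
    A = Vt[:rank, :]
    return (B, A), count + 1
```

Only one adapter is kept between updates, which matches the storage constraint described above. Two simplifications are worth noting: the sketch materializes the dense update, which may be too costly for large layers on a real device, and each truncation discards information, so the result is only an approximation of the true mean once adapters differ.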
- Mobile device manufacturers
- Edge AI developers
- AI-powered application developers
- On-device LLM end-users
- Cloud-dependent AI service providers (for certain use cases)
- LLM architectures requiring high, continuous bandwidth
On-device LLMs become more capable and independent, reducing reliance on cloud infrastructure for updates and new tasks.
This improved on-device capability could accelerate the development of sophisticated, personalized AI agents running locally.
Greater local AI autonomy might lead to new models of data ownership and privacy, since less raw user data needs to be sent to central servers.
Read at arXiv cs.CL