SIGNAL · AI · Jun 17, 2026, 4:00 AM · Signal 75 · Short term

Online LLM Selection via Constrained Bandits with Time-Varying Demand

arXiv:2606.17489v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly deployed in edge-cloud inference systems to handle diverse user tasks with heterogeneous accuracy, latency, and cost profiles. Selecting the appropriate LLM for each incoming task is critical for ensuring service quality and efficient resource utilization. However, model heterogeneity, stochastic and unknown performance characteristics, and time-varying task demands make static selection strategies inadequate. Real-world deployments often impose hard resource budgets such as monetary expenditure lim

Why this matters

Why now

More LLMs, each with its own accuracy, latency, and cost profile, are being deployed in real-world inference systems, so operators need dynamic selection strategies to manage performance, cost, and resource constraints.

Why it’s important

This research addresses a core operational challenge in deploying LLMs: which model serves each task affects efficiency, cost, and quality of service, and the stakes rise as LLMs become foundational infrastructure.

What changes

Moving from static to dynamic, adaptive LLM selection can use resources more efficiently and keep service quality steadier when task demand varies over time.
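
To make the idea of adaptive selection under a hard budget concrete, here is a minimal sketch of a budget-paced UCB selector. It is not the paper's algorithm, which the truncated abstract does not describe; it is a generic illustration. The model names, accuracies, per-call costs, and the pacing rule are all assumptions made for the example, and it does not model time-varying demand.

```python
import math
import random


class BudgetedUCBSelector:
    """Toy budget-aware UCB over a fixed set of models (all names hypothetical)."""

    def __init__(self, costs, total_budget, horizon):
        self.costs = dict(costs)            # assumed known per-call cost of each model
        self.remaining = float(total_budget)
        self.horizon = horizon              # assumed known number of tasks
        self.t = 0                          # tasks served so far
        self.pulls = {m: 0 for m in self.costs}
        self.mean_reward = {m: 0.0 for m in self.costs}

    def select(self):
        """Return a model name, or None if no model fits the remaining budget."""
        affordable = [m for m, c in self.costs.items() if c <= self.remaining]
        if not affordable:
            return None
        # Pacing rule (an assumption): spend at most the average budget per remaining task.
        rounds_left = max(self.horizon - self.t, 1)
        pace = self.remaining / rounds_left
        within_pace = [m for m in affordable if self.costs[m] <= pace]
        candidates = within_pace or [min(affordable, key=self.costs.get)]
        # Try each candidate once before trusting its estimate.
        for m in candidates:
            if self.pulls[m] == 0:
                return m

        def ucb(m):
            bonus = math.sqrt(2.0 * math.log(self.t + 1) / self.pulls[m])
            return self.mean_reward[m] + bonus

        return max(candidates, key=ucb)

    def update(self, model, reward):
        """Record the observed reward and charge the model's cost to the budget."""
        self.t += 1
        self.remaining -= self.costs[model]
        self.pulls[model] += 1
        n = self.pulls[model]
        self.mean_reward[model] += (reward - self.mean_reward[model]) / n


def simulate(seed=0):
    """Run 100 simulated tasks with made-up accuracies and costs."""
    rng = random.Random(seed)
    true_accuracy = {"small": 0.6, "medium": 0.75, "large": 0.9}
    costs = {"small": 1.0, "medium": 3.0, "large": 10.0}
    selector = BudgetedUCBSelector(costs, total_budget=300.0, horizon=100)
    spent = 0.0
    for _ in range(100):
        model = selector.select()
        if model is None:
            break
        reward = 1.0 if rng.random() < true_accuracy[model] else 0.0
        selector.update(model, reward)
        spent += costs[model]
    return selector, spent
```

Calling `simulate()` returns the selector and the total spend. Because each choice must fit both the remaining budget and the per-task pace, the spend never exceeds the budget in this sketch. A static policy would instead fix one model up front, regardless of what the observed rewards show.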

Winners

· Cloud providers
· AI-powered enterprises
· Users of LLM applications
· Edge computing infrastructure

Losers

· Inefficient LLM deployment strategies
· Enterprises with high compute waste

Second-order effects

Direct

Reduced operational costs and improved application performance for LLM-dependent services.

Second

Accelerated adoption of more complex and diverse LLM architectures due to better management tools.

Third

Increased competition among LLM providers as selection mechanisms become more sophisticated in evaluating real-world performance.

Editorial confidence: 90 / 100 · Structural impact: 60 / 100

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.AI

#cs.LG #cs.AI
