SIGNAL · AI · Jun 30, 2026, 4:00 AM · Signal 75 · Short term

Learning to Route and Schedule LLMs from User Retrials via Contextual Queueing Bandits

arXiv:2602.02061v2. Abstract: Explosive demands for LLMs often cause user queries to accumulate in server queues, requiring efficient routing (query-LLM matching) and scheduling (query prioritization) mechanisms. Several online algorithms are being deployed, but they overlook the following two key challenges inherent to conversational LLM services: (1) unsatisfied users may retry queries, increasing the server backlog, and (2) requests for "explicit" feedback, such as ratings, degrade user experiences. In this paper, we develop a joint routing and scheduling algorithm th

Why this matters

Why now

Explosive demand for LLMs is causing user queries to accumulate in server queues, and unsatisfied users who retry add to that backlog, so routing and scheduling now need dedicated algorithmic treatment.

Why it’s important

Efficient routing and scheduling of LLMs are critical for scaling AI services, improving user experience, and retaining market share in a highly competitive and resource-constrained environment.

What changes

New algorithms that account for user retrials and avoid explicit feedback requests will optimize LLM infrastructure, potentially leading to more seamless and scalable AI services.
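To make the idea of learning from retrials concrete, here is a minimal toy sketch. It is not the paper's algorithm: it swaps in a plain epsilon-greedy bandit, ignores query context entirely, and uses FIFO scheduling. The names `RetrialRouter` and `simulate`, and all parameter values, are hypothetical. The only feedback the router sees is implicit: whether the user retried.

```python
import random
from collections import deque


class RetrialRouter:
    """Toy epsilon-greedy router (an assumption, not the paper's method).

    It learns only from implicit feedback: a retry counts as a failure,
    and the absence of a retry counts as a success.
    """

    def __init__(self, n_models, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.successes = [0] * n_models  # queries served with no retry
        self.attempts = [0] * n_models   # queries served in total

    def satisfaction_rate(self, model):
        # Optimistic default so every model is tried at least once.
        if self.attempts[model] == 0:
            return 1.0
        return self.successes[model] / self.attempts[model]

    def route(self):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.attempts))
        return max(range(len(self.attempts)), key=self.satisfaction_rate)

    def update(self, model, retried):
        self.attempts[model] += 1
        if not retried:
            self.successes[model] += 1


def simulate(true_satisfaction, steps=2000, arrival_prob=0.5, seed=0):
    """Serve one queued query per step; retried queries rejoin the backlog."""
    rng = random.Random(seed)
    router = RetrialRouter(len(true_satisfaction), seed=seed)
    queue = deque()
    for _ in range(steps):
        if rng.random() < arrival_prob:
            queue.append("query")
        if not queue:
            continue
        query = queue.popleft()  # FIFO scheduling, a simplifying assumption
        model = router.route()
        retried = rng.random() > true_satisfaction[model]
        router.update(model, retried)
        if retried:
            queue.append(query)  # the retry increases the backlog
    return router, len(queue)
```

Calling `simulate([0.3, 0.9])` returns the trained router and the final backlog length. The sketch shows why retries matter: each unsatisfying answer both lowers a model's estimated satisfaction rate and puts the query back in the queue, without ever asking the user for a rating.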

Winners

· Cloud providers offering LLM services
· Companies developing LLM router/scheduler software
· Users of LLM-powered applications

Losers

· LLM service providers with inefficient queueing systems
· Companies relying on explicit user feedback for model improvement

Second-order effects

Direct

Improved user satisfaction and reduced operational costs for LLM providers due to more efficient resource utilization.

Second

Accelerated adoption of LLM-powered applications as performance and responsiveness improve, driving further demand for compute infrastructure.

Third

The necessity for sophisticated resource management in AI becomes a new standard, influencing the design of future AI systems and potentially leading to specialized 'AI operations' sectors.

Editorial confidence: 90 / 100 · Structural impact: 60 / 100

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.LG

#cs.LG

Tracked by The Continuum Brief · live intelligence network
