
arXiv:2604.08782v3 Announce Type: replace
Abstract: Large language models (LLMs) suffer significant performance degradation when user instructions and context are distributed over multiple conversational turns, yet multi-turn (MT) interactions dominate chat interfaces. The routine approach of appending full chat history to prompts rapidly exhausts context windows, leading to increased latency, higher computational costs, and diminishing returns as conversations extend. We introduce MT-OSC, a One-off Sequential Condensation framework that efficiently and automatically condenses chat history in
The proliferation of LLMs in chat interfaces has made context-window limits and multi-turn performance degradation a pressing, widely experienced problem.
Handling multi-turn conversations efficiently is critical to deploying LLMs in practice: it shapes both user experience and usefulness across applications.
MT-OSC targets a major bottleneck in sustained natural-language interaction with LLMs, and could improve performance while reducing operational costs.
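To make the cost claim concrete, here is a minimal Python sketch of why re-sending the full chat history gets expensive. It is not MT-OSC: the abstract above is truncated before it describes the method. The function names, the token counts, and the fixed `summary_budget` cap are all illustrative assumptions.

```python
def full_history_prompt_sizes(turn_tokens):
    """Prompt size at each turn when every earlier turn is re-sent verbatim."""
    sizes = []
    running = 0
    for tokens in turn_tokens:
        running += tokens
        sizes.append(running)
    return sizes


def condensed_prompt_sizes(turn_tokens, summary_budget):
    """Prompt size at each turn when earlier turns are replaced by a summary.

    summary_budget is a hypothetical cap on the summary length in tokens;
    it is an assumption for illustration, not a parameter from the paper.
    """
    sizes = []
    history = 0
    for tokens in turn_tokens:
        # The summary never exceeds the budget, however long the history is.
        sizes.append(min(history, summary_budget) + tokens)
        history += tokens
    return sizes


turns = [200] * 10  # ten turns of 200 tokens each (made-up numbers)
full = full_history_prompt_sizes(turns)
condensed = condensed_prompt_sizes(turns, summary_budget=300)

# Last prompt size, then total tokens processed over the whole conversation.
print(full[-1], sum(full))
print(condensed[-1], sum(condensed))
```

With full history, the per-turn prompt grows linearly, so the total tokens processed over the conversation grow quadratically with the number of turns. With a capped summary, the per-turn prompt stays bounded and the total grows linearly.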
- AI developers
- LLM application providers
- SaaS companies integrating LLMs
- Cloud infrastructure providers
- LLM architectures reliant on full history
- Companies with inefficient context management
Increased practical utility and adoption of LLMs in complex, multi-turn conversational agents.
Reduced computational costs for long-running LLM conversations, potentially enabling new business models.
Enhanced user experience with AI conversational interfaces could accelerate the displacement of traditional white-collar workflows.
Read at arXiv cs.CL