More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)

arXiv:2601.21522v2 Announce Type: replace
Abstract: The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@k, the probability of answering a question correctly at least once in k trials. At a fixed budget, a more suitable metric is coverage@cost, the average number of unique questions answered as a function of the total number of attempts. We connect the two metrics and show that the empirically observed power-law behavior in pass@k leads to sublinear growth in coverage@cost (diminishing returns). To solve this problem, we propose Reset-and-Discard (ReD)…
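To make the link between the two metrics concrete, here is a minimal Python sketch under stated assumptions. It is an illustration, not the paper's method. `pass_at_k` is the standard unbiased combinatorial estimator used in code-generation benchmarks. Everything else is hypothetical and chosen only for illustration: the power-law failure model 1 − pass@k = a·k^(−b), the even split of the budget across questions, the helper names `power_law_pass_at_k` and `coverage_at_cost`, and the parameters `a` and `b`. The sketch shows coverage per attempt shrinking as the total number of attempts grows.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n attempts, c of them correct.

    Standard estimator 1 - C(n-c, k) / C(n, k): the probability that a
    random subset of k of the n attempts contains a correct answer.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # math.comb(n - c, k) is 0 when k > n - c, so pass@k is then exactly 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def power_law_pass_at_k(k: int, a: float, b: float) -> float:
    """Hypothetical model: failure rate 1 - pass@k = a * k**(-b)."""
    if not (0.0 < a <= 1.0 and b > 0.0 and k >= 1):
        raise ValueError("need 0 < a <= 1, b > 0, k >= 1")
    return 1.0 - a * k ** (-b)


def coverage_at_cost(total_attempts: int, n_questions: int, a: float, b: float) -> float:
    """Expected unique questions answered when the budget is split evenly,
    so each question gets k = total_attempts // n_questions attempts."""
    k = total_attempts // n_questions
    if k < 1:
        raise ValueError("budget must allow at least one attempt per question")
    return n_questions * power_law_pass_at_k(k, a, b)


if __name__ == "__main__":
    # Empirical pass@k from 10 attempts on one question, 3 of them correct.
    print(f"pass@1 = {pass_at_k(10, 3, 1):.3f}")  # pass@1 = 0.300
    print(f"pass@5 = {pass_at_k(10, 3, 5):.3f}")  # pass@5 = 0.917

    # Illustrative parameters only, not taken from the paper.
    n_questions, a, b = 100, 0.6, 0.5
    for k in (1, 2, 4, 8, 16, 32):
        cost = k * n_questions
        cov = coverage_at_cost(cost, n_questions, a, b)
        # Coverage per attempt shrinks as the budget grows: sublinear growth.
        print(f"cost={cost:5d}  coverage={cov:6.2f}  per_attempt={cov / cost:.3f}")
```

With these illustrative parameters, coverage per attempt falls from 0.400 at a cost of 100 attempts to about 0.028 at 3,200 attempts. This is the diminishing-returns pattern the abstract describes. The even split is the simplest allocation and serves only as a baseline for the illustration.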
The push to make large language models (LLMs) cheaper and more efficient to run is driving new inference techniques.
Getting more correct answers out of a fixed compute budget directly lowers operating costs and improves the scalability of LLM applications.
Methods like Reset and Discard (ReD) could increase the number of questions an LLM answers within a fixed budget, making LLMs cheaper to deploy.
- AI developers and companies
- Cloud computing providers
- Organizations deploying LLMs
- Less efficient LLM inference techniques
- High-cost specialized AI hardware
LLMs can achieve higher performance or coverage for the same computational budget.
Lower operating costs make wider adoption of capable LLMs more feasible.
The economic viability of AI agents and complex autonomous systems improves, accelerating their development and deployment.
Source: arXiv cs.LG