Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

arXiv:2606.09191v1. Abstract: We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any continuous risk functional $\rho$ (CVaR, mean-variance, Sharpe ratio, distortion risk measures, and more) on the class of distributions with bounded density and sub-Gaussian tails, including Gaussian arms. Both this result and its bounded-support counterpart require only c
The paper adds to research on risk-averse decision-making in machine learning, an active area of both academic and practical interest.
Better Thompson Sampling algorithms for risk-averse bandits can make AI systems that operate under uncertainty more robust and sample-efficient, particularly in financial, medical, and control applications.
The result gives an asymptotic optimality guarantee that covers a broad family of risk functionals, which could lead to more reliable and predictable AI agent behavior in critical applications.
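To make the idea named in the abstract concrete, here is a minimal sketch of nonparametric Thompson Sampling with a risk functional as the arm index. It is not the paper's $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, whose details are not given in this brief. The sketch assumes a common construction: each arm's observed rewards are re-weighted with Dirichlet(1, ..., 1) noise, and the arm with the largest re-weighted risk value is pulled. CVaR is used as $\rho$, under the assumed convention that it is the mean of the worst `alpha` fraction of rewards, and no extra anchor observation is added to the history, which is one reading of "anchor-free". All function names, the one-pull-per-arm initialization, and the Gaussian test arms are hypothetical.

```python
import random


def empirical_cvar(values, weights, alpha):
    """Weighted CVaR at level alpha: mean of the worst alpha-fraction of rewards.

    Assumed convention: rewards are "higher is better", so the worst
    outcomes are the smallest values.
    """
    remaining = alpha
    total = 0.0
    for value, weight in sorted(zip(values, weights)):  # worst rewards first
        take = min(weight, remaining)
        total += take * value
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def dirichlet_uniform(k, rng):
    """Sample from Dirichlet(1, ..., 1) by normalizing k Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    norm = sum(draws)
    return [d / norm for d in draws]


def choose_arm(history, alpha, rng):
    """Re-weight each arm's history at random and pick the largest sampled CVaR."""
    best_arm, best_index = 0, float("-inf")
    for arm, rewards in enumerate(history):
        weights = dirichlet_uniform(len(rewards), rng)
        index = empirical_cvar(rewards, weights, alpha)
        if index > best_index:
            best_arm, best_index = arm, index
    return best_arm


def run(means, sigmas, horizon, alpha=0.2, seed=0):
    """Simulate Gaussian arms and return the number of pulls per arm."""
    rng = random.Random(seed)
    # Assumed initialization: pull every arm once.
    history = [[rng.gauss(m, s)] for m, s in zip(means, sigmas)]
    for _ in range(horizon - len(means)):
        arm = choose_arm(history, alpha, rng)
        history[arm].append(rng.gauss(means[arm], sigmas[arm]))
    return [len(rewards) for rewards in history]


if __name__ == "__main__":
    # Arm 0 has the higher mean but far heavier downside risk than arm 1.
    print(run(means=[1.0, 0.8], sigmas=[2.0, 0.2], horizon=500))
```

Swapping `empirical_cvar` for another functional of the weighted empirical distribution (for example a mean-variance score) changes the risk criterion without touching the sampling step, which is the sense in which this style of algorithm can serve many choices of $\rho$.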
- AI researchers
- Quantitative finance
- Autonomous systems developers
- Healthcare AI
- Less efficient bandit algorithms
- Systems with high risk exposure
More sophisticated AI decision-making under risk becomes possible in real-world applications.
Increased adoption of AI agents in high-stakes environments where risk mitigation is paramount.
Financial markets and critical infrastructure become more resilient due to AI-driven risk management using such algorithms.
Read at arXiv cs.LG