arXiv:2605.10299v2 Announce Type: replace Abstract: This paper studies kernelized bandits (also known as Gaussian process bandits) in an adversarial environment, where the reward functions in a known reproducing kernel Hilbert space (RKHS) may be adversarially chosen at each round. We show that the exponential-weight algorithm achieves $\tilde{O}(\sqrt{T \gamma_T})$ adversarial regret, where $T$ and $\gamma_T$ denote the number of total rounds and the maximum information gain, respectively. For squared exponential (SE) and $\nu$-Mat\'ern kernels, we also show algorithm-independent lower bounds
Source: arXiv cs.LG — read the full report at the original publisher.
