Learning What to Recommend: Minimax Optimal Simple Regret in Logistic Bandits

arXiv:2601.21167v2 (Announce Type: replace)

Abstract: We study stochastic logistic bandits with $d$-dimensional action features under the simple-regret objective, where a learner uses $T$ rounds of exploration to output a single final action. The logistic structure is essential here: because the informativeness of an action depends on the local curvature of the sigmoid, actions that are best for immediate reward need not be the most useful for identifying the best final recommendation. We show that the first-order minimax difficulty is governed by $\kappa_*$, the inverse slope of the sigmoid at …
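The curvature argument in the abstract can be illustrated with the standard Fisher information of a single logistic observation, $\dot\sigma(x^\top\theta)\, x x^\top$ with $\dot\sigma(z) = \sigma(z)(1-\sigma(z))$. The Python sketch below is not the paper's method: the parameter vector `theta`, the two candidate actions, and the use of the information matrix's trace as an informativeness score are all hypothetical choices made only for illustration. It shows how the action with the highest expected reward can sit where the sigmoid is nearly flat, and so carry less information than a lower-reward action.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, theta: Vector) -> float:
    """Linear score x^T theta."""
    return sum(xi * ti for xi, ti in zip(x, theta))


def sigmoid(z: float) -> float:
    """Logistic link sigma(z) = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_slope(z: float) -> float:
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def fisher_information(x: Vector, theta: Vector) -> Matrix:
    """Fisher information of one Bernoulli reward at action x: slope * x x^T."""
    w = sigmoid_slope(dot(x, theta))
    return [[w * xi * xj for xj in x] for xi in x]


def trace(m: Matrix) -> float:
    """Sum of diagonal entries (a crude, hypothetical informativeness score)."""
    return sum(m[i][i] for i in range(len(m)))


def greedy_action(actions: List[Vector], theta: Vector) -> Vector:
    """Action with the highest expected reward sigma(x^T theta)."""
    return max(actions, key=lambda x: sigmoid(dot(x, theta)))


def most_informative_action(actions: List[Vector], theta: Vector) -> Vector:
    """Action whose Fisher information has the largest trace."""
    return max(actions, key=lambda x: trace(fisher_information(x, theta)))


if __name__ == "__main__":
    theta = [2.0, 0.0]                   # hypothetical true parameter
    actions = [[3.0, 0.0], [0.5, 0.5]]   # hypothetical action features
    print("greedy:", greedy_action(actions, theta))  # greedy: [3.0, 0.0]
    print("most informative:", most_informative_action(actions, theta))  # most informative: [0.5, 0.5]
```

With these numbers the greedy action has expected reward about 0.998 but sigmoid slope only about 0.0025, while the second action has slope about 0.197. Its information trace is therefore roughly four times larger even though its feature vector is much shorter, which is the tension between immediate reward and informativeness that the abstract describes.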

Source: arXiv cs.LG — read the full report at the original publisher.

This is a curated wire item. The Continuum Brief does not republish full third-party articles; this entry links to the original source.
