SIGNALAI·May 29, 2026, 4:00 AMSignal0Short term

Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk

Source: arXiv cs.LG

Share
Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk

arXiv:2605.29788v1 Announce Type: cross Abstract: Critical sequential decisions are rarely single-timescale: a strategic decision causally shapes the context in which every subsequent tactical choice is made; standard bandit and reinforcement-learning theory does not capture this causal coupling between timescales. We formalise the problem class as Nested Contextual Causal Bandits (NCCBs), a hierarchical SCM where each level's action sets the next level's context distribution, and propose Nested Causal Thompson Sampling (NCTS), which draws one mechanism-factorised belief per episode and acts r

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.LG
Tracked by The Continuum Brief · live intelligence network
Share
The Brief · Weekly Dispatch

Stay ahead of the systems reshaping markets.

By subscribing, you agree to receive updates from THE CONTINUUM BRIEF. You can unsubscribe at any time.