Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

arXiv:2606.20206v1 Announce Type: cross
Abstract: In offline Reinforcement Learning, immediate rewards in logged batch data are often unobserved due to sparse or irregular record-keeping, or censored beyond certain reward values. This issue arises in practical settings, including health care and marketing. We investigate off-policy evaluation (OPE) in finite-horizon Markov decision processes when rewards are missing not at random (MNAR), which breaks ignorability and induces selection bias even after conditioning on states and actions. To address this, we formalize a reward-dependent propensit…
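A short worked identity may help show why conditioning on states and actions is not enough under MNAR. It uses hypothetical notation that does not come from the paper: $\delta_t \in \{0,1\}$ is 1 when the reward at step $t$ is recorded, and $\eta(r,s,a) = \Pr(\delta_t = 1 \mid R_t = r, S_t = s, A_t = a)$ is a reward-dependent propensity, assumed known and strictly positive. This is the textbook inverse-probability-weighting argument, not necessarily the estimator the authors propose.

```latex
\begin{aligned}
\mathbb{E}\!\left[\frac{\delta_t R_t}{\eta(R_t,S_t,A_t)} \,\middle|\, S_t, A_t\right]
  &= \mathbb{E}\!\left[\frac{\mathbb{E}[\delta_t \mid R_t,S_t,A_t]\, R_t}{\eta(R_t,S_t,A_t)} \,\middle|\, S_t, A_t\right]
   = \mathbb{E}[R_t \mid S_t, A_t], \\
\mathbb{E}\!\left[\frac{\delta_t R_t}{\Pr(\delta_t = 1 \mid S_t, A_t)} \,\middle|\, S_t, A_t\right]
  &= \mathbb{E}[R_t \mid \delta_t = 1, S_t, A_t]
   \;\neq\; \mathbb{E}[R_t \mid S_t, A_t] \quad \text{in general under MNAR.}
\end{aligned}
```

The first line shows that weighting each recorded reward by the reward-dependent propensity recovers the full conditional mean. The second shows that a propensity depending only on $(s, a)$ recovers the mean of the recorded rewards, which is biased whenever recording depends on the reward itself. Because $\eta$ depends on the unobserved $R_t$, it cannot be estimated by simply regressing $\delta_t$ on $(S_t, A_t)$; identifying it is the crux of the MNAR problem.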
The paper addresses a long-standing problem in offline reinforcement learning: rewards that are missing from logged data. The problem grows more acute as RL is applied to real-world datasets, which are often messy.
Reliable off-policy evaluation (OPE) under missing rewards matters in settings such as healthcare and marketing, where a policy has to be assessed from logged data before an AI agent is deployed in a high-stakes environment.
This research formalizes a reward-dependent propensity, in which the probability that a reward is recorded may depend on the reward's own value, and uses it to evaluate missingness-aware policies more accurately, potentially reducing selection bias and enabling more robust RL applications.
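The toy simulation below is a minimal sketch of the selection-bias problem and of inverse-propensity reweighting. It is not the paper's method. It assumes a hypothetical two-state, two-action MDP with horizon 3, Gaussian rewards, and a logistic recording probability `obs_prob(r) = sigmoid(r)` that is treated as known; in practice the reward-dependent propensity has to be identified and estimated from data with missing rewards, which is the hard part this sketch skips. All names (`obs_prob`, `mar_weights`, `pdis_estimate`, `simulate_logs`, `true_value`) and parameters are illustrative. It compares per-decision importance sampling under two reward corrections: reweighting by the recording rate per (state, action), valid only if rewards are missing at random, and reweighting by `1 / obs_prob(r)`.

```python
import math
import random

# Hypothetical toy MDP: every number below is an illustrative assumption.
H = 3                                # finite horizon
P_NEXT1 = [[0.2, 0.7], [0.4, 0.9]]   # P(s' = 1 | s, a)
MEAN_R = [[0.5, 1.5], [1.0, 2.0]]    # reward r ~ Normal(MEAN_R[s][a], 1)
BEHAVIOR_P1 = [0.5, 0.5]             # logging policy b(a = 1 | s)
TARGET_P1 = [0.8, 0.6]               # target policy pi(a = 1 | s)


def obs_prob(r):
    """Assumed-known MNAR propensity: P(reward recorded | r) rises with r."""
    return 1.0 / (1.0 + math.exp(-r))


def action_prob(p1, a):
    """Probability of action a under a policy with P(a = 1) = p1."""
    return p1 if a == 1 else 1.0 - p1


def true_value():
    """Exact expected return of the target policy via the forward state distribution."""
    d1, value = 0.5, 0.0             # d1 = P(s_t = 1); s_0 is uniform
    for _ in range(H):
        dist = [1.0 - d1, d1]
        next_d1 = 0.0
        for s in (0, 1):
            for a in (0, 1):
                w = dist[s] * action_prob(TARGET_P1[s], a)
                value += w * MEAN_R[s][a]
                next_d1 += w * P_NEXT1[s][a]
        d1 = next_d1
    return value


def simulate_logs(n, rng):
    """Behavior-policy trajectories of (s, a, r); r is None when unrecorded."""
    logs = []
    for _ in range(n):
        s = 1 if rng.random() < 0.5 else 0
        traj = []
        for _ in range(H):
            a = 1 if rng.random() < BEHAVIOR_P1[s] else 0
            r = rng.gauss(MEAN_R[s][a], 1.0)
            recorded = rng.random() < obs_prob(r)   # missingness depends on r
            traj.append((s, a, r if recorded else None))
            s = 1 if rng.random() < P_NEXT1[s][a] else 0
        logs.append(traj)
    return logs


def mar_weights(logs):
    """1 / P(recorded | s, a) from visit counts (only valid under MAR)."""
    visits = [[0, 0], [0, 0]]
    recorded = [[0, 0], [0, 0]]
    for traj in logs:
        for s, a, r in traj:
            visits[s][a] += 1
            if r is not None:
                recorded[s][a] += 1
    return [[visits[s][a] / recorded[s][a] for a in (0, 1)] for s in (0, 1)]


def pdis_estimate(logs, reward_weight):
    """Per-decision importance sampling; unrecorded rewards contribute 0 and a
    recorded reward r at (s, a) is multiplied by reward_weight(s, a, r)."""
    total = 0.0
    for traj in logs:
        rho = 1.0                                   # cumulative pi/b ratio
        for s, a, r in traj:
            rho *= action_prob(TARGET_P1[s], a) / action_prob(BEHAVIOR_P1[s], a)
            if r is not None:
                total += rho * r * reward_weight(s, a, r)
    return total / len(logs)


rng = random.Random(0)
logs = simulate_logs(50_000, rng)
w_mar = mar_weights(logs)

v_true = true_value()
v_mar = pdis_estimate(logs, lambda s, a, r: w_mar[s][a])        # ignores r-dependence
v_mnar = pdis_estimate(logs, lambda s, a, r: 1.0 / obs_prob(r))  # reward-dependent IPW
print(f"true={v_true:.4f}  MAR-weighted={v_mar:.4f}  MNAR-IPW={v_mnar:.4f}")
```

Because larger rewards are likelier to be recorded, the recorded rewards skew high, so in this toy setup the MAR-weighted estimate should sit noticeably above `true_value()` (4.4445 up to floating-point rounding), while the MNAR-IPW estimate should land close to it. Swapping the known `obs_prob` for an estimated propensity is where a real method would differ, and a misspecified propensity would reintroduce bias.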
- AI/ML researchers
- Healthcare sector
- Marketing analytics
- Reinforcement Learning practitioners
- Organizations relying on biased OPE
- Low-quality data collection practices
Improved OPE methods would allow the efficacy of AI policies to be assessed more reliably in real-world scenarios before deployment.
Safer and more effective AI deployments could accelerate AI adoption in critical sectors where data quality is a known challenge.
The increased practical reliability of RL could contribute to the development of more advanced, generalizable AI agents capable of handling complex, incomplete datasets.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.LG